We present CoLe, a model for cooperative agents that mine knowledge from heterogeneous data. CoLe lets different mining agents cooperate and combines their mined knowledge into knowledge structures that no individual mining agent can produce alone. CoLe organizes the work in rounds so that knowledge discovered by one mining agent can help the others in the next round. We implemented a CoLe-based multi-agent system for mining diabetes data; it includes an agent that uses a genetic algorithm to mine event sequences, an agent that uses a version of the PART algorithm improved for our problem, and a combination agent with methods for producing hybrid rules that contain both conjunctive and sequence conditions. In our experiments, the CoLe-based system outperformed the individual mining algorithms, producing better rules and more rules of a given quality. From the medical perspective, our system confirmed that hypertension is closely related to diabetes, and it also suggested connections that were new to medical doctors.
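The round-based cooperation described above can be illustrated with a minimal sketch. It is not the CoLe implementation: the function `run_rounds`, the two toy miners, the combiner, and the rule labels are all hypothetical stand-ins, chosen only to show how knowledge shared after one round can unlock a discovery in the next.

```python
from typing import Callable, List, Set

# Hypothetical types: a miner sees the data and the knowledge shared so far.
Miner = Callable[[list, Set[str]], Set[str]]
Combiner = Callable[[Set[str]], Set[str]]


def run_rounds(data: list, miners: List[Miner], combine: Combiner,
               max_rounds: int = 5) -> Set[str]:
    """Run miners in rounds, sharing discovered knowledge between rounds."""
    shared: Set[str] = set()
    for _ in range(max_rounds):
        found: Set[str] = set()
        for miner in miners:
            # Each miner only sees knowledge from *previous* rounds.
            found |= miner(data, shared)
        # The combiner builds structures no single miner produces alone.
        found |= combine(found | shared)
        new = found - shared
        if not new:
            break  # nothing new was learned, so stop early
        shared |= new
    return shared


# Toy agents (assumptions for illustration only).
def conjunctive_miner(data: list, shared: Set[str]) -> Set[str]:
    return {"conj:A"}


def sequence_miner(data: list, shared: Set[str]) -> Set[str]:
    # Only finds its rule once the conjunctive rule has been shared.
    return {"seq:B"} if "conj:A" in shared else set()


def combiner(known: Set[str]) -> Set[str]:
    if {"conj:A", "seq:B"} <= known:
        return {"hybrid:A+B"}
    return set()


result = run_rounds([], [conjunctive_miner, sequence_miner], combiner)
print(sorted(result))
```

In this toy run the sequence miner finds nothing in the first round, benefits from the shared conjunctive rule in the second, and the combiner then produces a hybrid rule that neither miner could return on its own.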
We propose a machine learning method that automatically classifies n-grams extracted from a corpus into terms and non-terms. As training features, we use 10 statistics that are common in the term extraction literature. The proposed method, which is applicable to term recognition in multiple domains and languages, can 1) avoid laborious post-processing work (e.g. subjective threshold setting) and 2) handle skewed training data while showing noticeable resilience to domain shift. Experiments are carried out on 6 corpora covering multiple domains and languages: the GENIA and ACLRD-TEC(1.0) corpora serve as the training set, and four TTC subcorpora on wind energy and mobile technology, in both Chinese and English, serve as the test set. The results are promising and indicate that this approach can identify both single-word and multiword terms with reasonably good precision and recall.
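As a rough illustration of the feature step, the sketch below turns candidate bigrams into small statistical feature vectors. The abstract does not list its 10 statistics, so raw frequency and pointwise mutual information (PMI) are used here purely as assumed examples; the function name `bigram_features` and the sample sentence are likewise hypothetical. Vectors of this kind could then be passed to any standard classifier trained on labelled terms and non-terms.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bigram_features(tokens: List[str]) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """Map each adjacent token pair to (frequency, PMI in bits)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of bigram positions
    feats = {}
    for (w1, w2), freq in bigrams.items():
        p_xy = freq / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        # PMI compares the observed co-occurrence with independence.
        feats[(w1, w2)] = (freq, math.log2(p_xy / (p_x * p_y)))
    return feats


# Hypothetical sample text, not taken from any of the corpora named above.
tokens = "wind turbine blade and wind turbine tower".split()
feats = bigram_features(tokens)
for bigram, (freq, pmi) in sorted(feats.items()):
    print(bigram, freq, round(pmi, 3))
```

On such a tiny sample PMI alone does not separate terms from non-terms, which is one reason to combine several statistics and let a trained classifier weigh them instead of setting a threshold by hand.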