Abstract-The paper is devoted to the classification of MEDLINE abstracts into categories that correspond to types of medical interventions, that is, types of patient treatment. This set of categories was extracted from the Clinicaltrials.gov web site. Several classification algorithms were tested, including the Multinomial Naive Bayes, Multinomial Logistic Regression, and Linear SVM implementations from the sklearn machine learning library. Documents were marked (labeled) automatically on the basis of abstracts that contain links to the Clinicaltrials.gov web site. As a result of this automatic marking, 3534 abstracts were marked for training and testing the algorithms mentioned above. The best multinomial classification result was achieved by Linear SVM, with macro-averaged precision 70.06%, recall 55.62%, and F-measure 62.01%, and micro-averaged precision 64.91%, recall 79.13%, and F-measure 71.32%.
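The difference between the macro and micro figures can be illustrated with a minimal sketch. The per-category counts below are hypothetical and are not data from the paper, and the function names are chosen only for this example. The sketch assumes that the macro F-measure is the harmonic mean of the averaged precision and recall; the abstract does not state this, but the reported values are numerically consistent with it.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one category
Counts = Tuple[int, int, int]


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_scores(per_class: Dict[str, Counts]) -> Tuple[float, float, float]:
    """Average per-category precision and recall, then combine them.

    Assumption: F is computed from the averaged P and R, not as the
    mean of per-category F values.
    """
    precisions, recalls = [], []
    for tp, fp, fn in per_class.values():
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    return p, r, f_measure(p, r)


def micro_scores(per_class: Dict[str, Counts]) -> Tuple[float, float, float]:
    """Pool the counts over all categories, then compute P, R and F once."""
    tp = sum(c[0] for c in per_class.values())
    fp = sum(c[1] for c in per_class.values())
    fn = sum(c[2] for c in per_class.values())
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r, f_measure(p, r)


# Hypothetical counts, for illustration only (not data from the paper).
toy_counts = {"Drug": (8, 2, 2), "Device": (1, 1, 3)}
print(macro_scores(toy_counts))
print(micro_scores(toy_counts))

# Harmonic means of the precision/recall pairs reported in the abstract.
print(round(f_measure(70.06, 55.62), 2))  # 62.01
print(round(f_measure(64.91, 79.13), 2))  # 71.32
```

Macro averaging weights every category equally, so rare categories influence it as much as frequent ones; micro averaging weights every decision equally, so frequent categories dominate.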
Based on the results of computational experiments, the best results for clustering abstracts into those that contain and those that do not contain a medical intervention were obtained using the K-means++ algorithm together with LSA, choosing the first 210 factors. The quality of classifying abstracts by subtypes of medical interventions has been improved relative to the existing results [8] using a non-linear SVM algorithm with the "bag of words" model and stop-word removal. The clustering results obtained in this study will help in grouping abstracts by levels of evidence; using the classification by subtypes of medical interventions, it will also be possible to extract information on specific types of interventions from the abstracts.
The paper describes the clustering of article abstracts taken from MEDLINE, the largest bibliographic database of life sciences and biomedical information, into categories that correspond to types of medical interventions, that is, types of patient treatment. Experiments were carried out to evaluate the quality of clustering for the following algorithms: K-means, K-means++, hierarchical clustering, and SIB (Sequential Information Bottleneck), each together with the LSA (Latent Semantic Analysis) and MI (Mutual Information) methods, which allow feature vectors to be selected. The best clustering results were achieved by K-means++ together with LSA when a 210-dimensional space was chosen: Purity = 0.5719, Entropy = 1.3841, Normalized Entropy = 0.6299.
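The three quality measures can be sketched as follows. This is a minimal illustration under assumed definitions, since the abstract does not give its formulas: purity is the share of documents that belong to the majority category of their cluster, entropy is the size-weighted average of the per-cluster category entropy (natural logarithm), and normalized entropy divides that by the logarithm of the number of categories. The ratio of the reported entropy to the reported normalized entropy is about 2.197, which is close to ln 9 and would fit natural logarithms with nine categories, but the source does not state this. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def _label_counts(clusters: Sequence[Hashable],
                  labels: Sequence[Hashable]) -> Dict[Hashable, Counter]:
    """Count how often each true category occurs inside each cluster."""
    if len(clusters) != len(labels):
        raise ValueError("clusters and labels must have the same length")
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for cluster, label in zip(clusters, labels):
        counts[cluster][label] += 1
    return counts


def purity(clusters: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Fraction of documents in the majority category of their cluster."""
    counts = _label_counts(clusters, labels)
    return sum(max(c.values()) for c in counts.values()) / len(labels)


def entropy(clusters: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Size-weighted mean of per-cluster category entropy (natural log)."""
    counts = _label_counts(clusters, labels)
    n = len(labels)
    total = 0.0
    for label_counts in counts.values():
        size = sum(label_counts.values())
        h = -sum((k / size) * math.log(k / size)
                 for k in label_counts.values())
        total += (size / n) * h
    return total


def normalized_entropy(clusters: Sequence[Hashable],
                       labels: Sequence[Hashable]) -> float:
    """Entropy divided by log(number of categories); assumed normalization."""
    n_classes = len(set(labels))
    if n_classes < 2:
        return 0.0
    return entropy(clusters, labels) / math.log(n_classes)


# Hypothetical toy assignment, for illustration only.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_labels = ["drug", "drug", "device", "device", "device", "drug"]
print(purity(toy_clusters, toy_labels))
print(entropy(toy_clusters, toy_labels))
print(normalized_entropy(toy_clusters, toy_labels))
```

Higher purity and lower entropy both indicate clusters that align more closely with the reference categories.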
open-source R package repository. We reproduced in a concise and readable format all the results of the analyses described in "Decision Modelling for Health Economic Evaluation", such as homogeneous and non-homogeneous (with time-varying properties) Markov models, as well as sensitivity and probabilistic uncertainty analysis (where it is possible to specify arbitrary distributions and correlation structures between parameters). Conclusions: This work shows that it is possible to develop complex Markov models easily in the R language without sacrificing transparency, reproducibility or mathematical exactitude. The free and open-source license facilitates code review and improvement of the package by third-party experts. We hope the availability of this package will facilitate the use of script-based approaches to health evaluation modelling and help improve the overall quality and reproducibility of studies in this domain.
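As a rough illustration of what a homogeneous Markov cohort model computes, the sketch below propagates a cohort through a fixed transition matrix. It is written in Python for illustration only and does not reproduce the R package's API; the state names, probabilities, and function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def step(distribution: List[float], transition: Matrix) -> List[float]:
    """Advance the cohort by one cycle: new[j] = sum_i old[i] * P[i][j]."""
    n = len(distribution)
    return [
        sum(distribution[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]


def run_cohort(start: List[float], transition: Matrix,
               cycles: int) -> List[List[float]]:
    """Return the state distribution at cycle 0, 1, ..., cycles."""
    for row in transition:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each transition row must sum to 1")
    trace = [list(start)]
    for _ in range(cycles):
        trace.append(step(trace[-1], transition))
    return trace


# Hypothetical three-state model: healthy, sick, dead (absorbing).
toy_transition = [
    [0.90, 0.08, 0.02],
    [0.00, 0.85, 0.15],
    [0.00, 0.00, 1.00],
]
toy_trace = run_cohort([1.0, 0.0, 0.0], toy_transition, cycles=2)
for cycle, dist in enumerate(toy_trace):
    print(cycle, [round(x, 4) for x in dist])
```

A non-homogeneous model of the kind mentioned above would use a different transition matrix at each cycle instead of a single fixed one.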