Exhaled volatile organic compounds (VOCs) have shown promise in diagnosing chronic obstructive pulmonary disease (COPD) but studies have been limited by small sample size and potential confounders. An investigation was conducted in order to establish whether combinations of VOCs could identify COPD patients from age and BMI matched controls. Breath samples were collected from 119 stable COPD patients and 63 healthy controls. The samples were collected with a portable apparatus, and then assayed by gas chromatography and mass spectroscopy. Machine learning approaches were applied to the data and the automatically generated models were assessed using classification accuracy and receiver operating characteristic (ROC) curves. Cross-validation of the combinations correctly predicted the diagnosis in 79% of COPD patients and 64% of controls and an optimum area under the ROC curve of 0.82 was obtained. Comparison of current and ex smokers within the COPD group showed that smoking status was likely to affect the classification; with correct prediction of smoking status in 85% of COPD subjects. When current smokers were omitted from the analysis, prediction of COPD was similar at 78% but correct prediction of controls was increased to 74%. Applying different analytical methods to the largest group of subjects so far, suggests VOC analysis holds promise for diagnosing COPD but smoking status needs to be balanced.
Of all of the challenges which face the effective application of computational intelligence technologies for pattern recognition, dataset dimensionality is undoubtedly one of the primary impediments. In order for pattern classifiers to be efficient, a dimensionality reduction stage is usually performed prior to classification. Much use has been made of Rough Set Theory for this purpose as it is completely data-driven and no other information is required; most other methods require some additional knowledge. However, traditional rough set-based methods in the literature are restricted to the requirement that all data must be discrete. It is therefore not possible to consider real-valued or noisy data. This is usually addressed by employing a discretisation method, which can result in information loss. This paper proposes a new approach based on the tolerance rough set model, which has the ability to deal with real-valued data whilst simultaneously retaining dataset semantics. More significantly, this paper describes the underlying mechanism for this new approach to utilise the information contained within the boundary region or region of uncertainty. The use of this information can result in the discovery of more compact feature subsets and improved classification accuracy. These results are supported by an experimental evaluation which compares the proposed approach with a number of existing feature selection techniques.
N. Mac Parthal?in, Q. Shen, and R. Jensen. A Distance Measure Approach to Exploring the Rough Set Boundary Region for Attribute Reduction. IEEE Transactions on Knowledge and Data Engineering, vol. 22, no.3, pp. 306--317, 2010.Feature Selection (FS) or Attribute Reduction techniques are employed for dimensionality reduction and aim to select a subset of the original features of a dataset which are rich in the most useful information. The benefits of employing FS techniques include improved data visualisation and transparency, a reduction in training and utilisation times and potentially, improved prediction performance. Many approaches based on rough set theory up to now, have employed the dependency function, which is based on lower approximations as an evaluation step in the FS process. However, by examining only that information which is considered to be certain and ignoring the boundary region, or region of uncertainty, much useful information is lost. This paper examines a rough set FS technique which uses the information gathered from both the lower approximation dependency value and a distance metric which considers the number of objects in the boundary region and the distance of those objects from the lower approximation. The use of this measure in rough set feature selection can result in smaller subset sizes than those obtained using the dependency function alone. This demonstrates that there is much valuable information to be extracted from the boundary region. Experimental results are presented for both crisp and real-valued data and compared with two other FS techniques in terms of subset size, runtimes and classification accuracy.Peer reviewe
A better understanding of the behavior of individual grazing dairy cattle will assist in improving productivity and welfare. Global positioning systems (GPS) applied to cows could provide a means of monitoring grazing herds while overcoming the substantial efforts required for manual observation. Any model of behavioral prediction using GPS needs to be accurate and robust by accounting for inter-cow variation as well as atmospheric effects. We evaluated the performance using a series of machine learning algorithms on GPS data collected from 40 pasture-based dairy cows over 4 mo. A feature extraction step was performed on the collected raw GPS data, which resulted in 43 different attributes. The evaluated behaviors were grazing, resting, and walking. Classifier learners were built using 10 times 10-fold cross validation and tested on an independent test set. Results were evaluated using a variety of statistical significance tests across all parameters. We found that final model selection depended upon level of performance and model complexity. The classifier learner deemed most suitable for this particular problem was JRip, a rule-based learner (classification accuracy=0.85; false positive rate=0.10; F-measure=0.76; area under the receiver operating curve=0.87). This model will be used in further studies to assess the behavior and welfare of pasture-based dairy cows.
Hospital length of stay of patients is a crucial factor for the effective planning and management of hospital resources. There is considerable interest in predicting the LoS of patients in order to improve patient care, control hospital costs and increase service efficiency. This paper presents an extensive review of the literature, examining the approaches employed for the prediction of LoS in terms of their merits and shortcomings. In order to address some of these problems, a unified framework is proposed to better generalise the approaches that are being used to predict length of stay. This includes the investigation of the types of routinely collected data used in the problem as well as recommendations to ensure robust and meaningful knowledge modelling. This unified common framework enables the direct comparison of results between length of stay prediction approaches and will ensure that such approaches can be used across several hospital environments. A literature search was conducted in PubMed, Google Scholar and Web of Science from 1970 until 2019 to identify LoS surveys which review the literature. 32 Surveys were identified, from these 32 surveys, 220 papers were manually identified to be relevant to LoS prediction. After removing duplicates, and exploring the reference list of studies included for review, 93 studies remained. Despite the continuing efforts to predict and reduce the LoS of patients, current research in this domain remains ad-hoc; as such, the model tuning and data preprocessing steps are too specific and result in a large proportion of the current prediction mechanisms being restricted to the hospital that they were employed in. Adopting a unified framework for the prediction of LoS could yield a more reliable estimate of the LoS as a unified framework enables the direct comparison of length of stay methods. Additional research is also required to explore novel methods such as fuzzy systems which could build upon the success of current models as well as further exploration of black-box approaches and model interpretability.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.