Background: Many biology-related studies combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. An effective algorithm that simultaneously extracts decision rules and selects critical features would therefore aid interpretation while preserving prediction performance.

Methods: In this study, we focus on regression problems for biological data where target outcomes are continuous. In general, models constructed with linear regression approaches are relatively easy to interpret. However, many practical biological applications are nonlinear in essence, and a direct linear relationship between input and output can rarely be found. Nonlinear regression techniques can reveal nonlinear relationships in the data, but are generally hard for humans to interpret. We propose a rule-based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from the generated random forests and eliminates unimportant features.

Results: We tested the approach on some biological data sets. The proposed approach is able to construct a significantly smaller set of regression rules using a subset of attributes while achieving prediction performance comparable to that of random forest regression.

Conclusion: The approach demonstrates high potential in aiding prediction and interpretation of nonlinear relationships of the subject being studied.
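The general idea of selecting rules from a random forest with a 1-norm penalty can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn, treats every root-to-leaf path as one rule, uses a plain Lasso as the 1-norm regularized model, and runs on the synthetic `make_friedman1` data set. The helper `leaf_rules`, the tree depth, the forest size, and the penalty `alpha` are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso


def leaf_rules(tree):
    """Return {leaf_id: [(feature, op, threshold), ...]} for one fitted tree."""
    t = tree.tree_
    rules = {}
    stack = [(0, [])]  # (node id, conditions on the path from the root)
    while stack:
        node, conds = stack.pop()
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # scikit-learn marks leaves with child id -1
            rules[node] = conds
            continue
        f, thr = int(t.feature[node]), float(t.threshold[node])
        stack.append((left, conds + [(f, "<=", thr)]))
        stack.append((right, conds + [(f, ">", thr)]))
    return rules


# Synthetic nonlinear regression data (assumption: stands in for biological data).
X, y = make_friedman1(n_samples=300, n_features=10, noise=0.5, random_state=0)

# Shallow trees keep each rule short enough to read.
forest = RandomForestRegressor(n_estimators=20, max_depth=3, random_state=0).fit(X, y)

# Build one binary column per rule: 1 if the sample satisfies the rule, else 0.
rules, columns = [], []
for est in forest.estimators_:
    leaves = est.apply(X)  # leaf id reached by each sample in this tree
    for leaf, conds in leaf_rules(est).items():
        rules.append(conds)
        columns.append((leaves == leaf).astype(float))
R = np.column_stack(columns)

# The 1-norm penalty drives most rule weights to exactly zero.
lasso = Lasso(alpha=0.05, max_iter=10000).fit(R, y)
kept = np.flatnonzero(lasso.coef_)

# Features that never appear in a surviving rule are effectively eliminated.
used_features = sorted({f for i in kept for f, _, _ in rules[i]})

print(f"{len(kept)} of {len(rules)} rules kept; features used: {used_features}")
for i in kept[:5]:
    text = " and ".join(f"x{f} {op} {thr:.2f}" for f, op, thr in rules[i])
    print(f"if {text}: weight {lasso.coef_[i]:+.2f}")
```

Raising `alpha` keeps fewer rules, and with them fewer features, at some cost in accuracy. In practice the penalty would be chosen by cross-validation rather than fixed by hand.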
Here we describe an effective, reproducible, non-invasive method for collecting and analyzing volatile organic compounds in exhaled breath gas samples, designed specifically for use with dogs. Method conditions were optimized using a range of standard chemicals. The method uses a canine mask, a two-way non-rebreathing valve, a Teflon connector, tubing, and a bag for sample collection. Collection is followed by condensation and headspace solid-phase microextraction (SPME) for sample concentration, and by gas chromatography-mass spectrometry for analysis. Custom-made glassware, designed to hold the SPME fiber assembly, was cooled to -10 °C and used to collect the condensate, followed by 2 h of headspace extraction at 37 °C. Standards show limits of detection (LOD) of 0.6-16.8 ppbv, limits of quantification (LOQ) of 2.1-55.8 ppbv, and good linearity, with R² between 0.996 and 0.999 (RSD 10-19%). The method was verified with preliminary results from three dogs, demonstrating that the technique is capable of collecting, identifying, and quantifying volatile organic chemical constituents in different breath samples.
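The abstract does not say how LOD, LOQ, and R² were derived. One widely used convention fits a linear calibration curve to the standards and takes LOD as 3σ/slope and LOQ as 10σ/slope, where σ is the residual standard deviation of the fit. The sketch below assumes that convention. The concentrations and detector responses are hypothetical placeholder numbers, not data from the study, and `calibrate` is an illustrative helper name.

```python
from statistics import mean


def calibrate(conc, signal):
    """Least-squares line signal = slope * conc + intercept, with R^2 and sigma."""
    cx, cy = mean(conc), mean(signal)
    sxx = sum((x - cx) ** 2 for x in conc)
    sxy = sum((x - cx) * (y - cy) for x, y in zip(conc, signal))
    slope = sxy / sxx
    intercept = cy - slope * cx
    resid = [y - (slope * x + intercept) for x, y in zip(conc, signal)]
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((y - cy) ** 2 for y in signal)
    r2 = 1 - ss_res / ss_tot
    # Residual standard deviation with n - 2 degrees of freedom.
    sigma = (ss_res / (len(conc) - 2)) ** 0.5
    return slope, intercept, r2, sigma


# Hypothetical calibration standards: concentration in ppbv, response in arbitrary units.
conc = [5.0, 10.0, 20.0, 50.0, 100.0]
signal = [52.0, 98.0, 205.0, 495.0, 1003.0]

slope, intercept, r2, sigma = calibrate(conc, signal)
lod = 3 * sigma / slope   # assumed convention: 3 sigma / slope
loq = 10 * sigma / slope  # assumed convention: 10 sigma / slope

print(f"R^2 = {r2:.4f}, LOD = {lod:.2f} ppbv, LOQ = {loq:.2f} ppbv")
```

Other conventions exist, for example 3.3σ/slope for LOD or estimates based on signal-to-noise ratio, so the multipliers should be matched to whatever the full paper specifies.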