Gene expression microarrays are a rapidly maturing technology that makes it possible to assay the expression levels of thousands or tens of thousands of genes in a single experiment. We present a new heuristic for selecting relevant gene subsets for subsequent use in classification. Our method is based on the statistical significance of adding a gene from a ranked list to the final subset. The efficiency and effectiveness of our technique are demonstrated through extensive comparisons with other representative heuristics. Our approach shows excellent performance, not only in identifying relevant genes but also in terms of computational cost.
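The abstract does not spell out the procedure, so the following is only a minimal sketch of one way to realize the idea: walk the ranked gene list once and accept a gene only when a paired t-test over cross-validation folds indicates that the enlarged subset is significantly better than the current one. The `evaluate` callback, the single-pass acceptance rule, seeding with the top-ranked gene, and the default critical value are all assumptions for illustration, not the authors' settings.

```python
import math
import statistics
from typing import Callable, Optional, Sequence

# Hypothetical evaluator: returns per-fold accuracies of some classifier trained
# on the given gene subset (e.g. from k-fold cross-validation). It must return
# at least two scores and use the same folds on every call, so scores pair up.
Evaluator = Callable[[list[int]], list[float]]


def paired_t(new: Sequence[float], old: Sequence[float]) -> float:
    """Paired t statistic for the mean per-fold gain of `new` over `old`."""
    diffs = [a - b for a, b in zip(new, old)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        # Identical gain on every fold: treat a constant positive gain as
        # infinitely significant and anything else as no evidence.
        return math.inf if mean > 0 else 0.0
    return mean / (sd / math.sqrt(len(diffs)))


def incremental_ranked_selection(
    ranked_genes: Sequence[int],
    evaluate: Evaluator,
    t_crit: float = 2.262,  # assumed: Student's t, two-sided alpha=0.05, 9 d.f. (10 folds)
) -> list[int]:
    """Walk a ranked gene list once, keeping a gene only if adding it to the
    current subset gives a significant paired improvement in per-fold accuracy."""
    selected: list[int] = []
    best_scores: Optional[list[float]] = None
    for gene in ranked_genes:
        candidate = selected + [gene]
        scores = evaluate(candidate)
        # Assumption: the top-ranked gene is accepted unconditionally as the seed.
        if best_scores is None or paired_t(scores, best_scores) > t_crit:
            selected, best_scores = candidate, scores
    return selected
```

With a toy evaluator in which genes 0 and 2 each add a constant 0.1 to every fold while gene 1 only adds zero-mean noise, `incremental_ranked_selection([0, 1, 2], evaluate)` keeps `[0, 2]`. Because each gene is evaluated once against the current subset, the cost grows linearly with the length of the ranked list rather than combinatorially.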
A groundwater treatment technology based on catalytic reductive dehalogenation has been developed to efficiently destroy chlorinated hydrocarbons in situ using a reactive well approach. The treatment process uses dissolved H₂ as an electron donor, in the presence of a commercial palladium-on-alumina catalyst, to rapidly reduce common chlorinated aliphatics such as trichloroethylene (TCE) and tetrachloroethylene (PCE) to nonchlorinated hydrocarbons such as ethane. Rapid reaction rates permit a treatment unit to be deployed within a dual-screened well bore, so that contaminated groundwater can be drawn from one water-bearing zone, treated within the well bore, and discharged to an adjacent zone in a single pass through the system. A demonstration groundwater treatment system based on this concept was evaluated in a chlorinated-hydrocarbon-contaminated aquifer at a major Superfund site. The system rapidly destroyed a variety of common contaminants such as TCE and PCE and maintained its performance over a 1-year test period. Operation of the treatment system was optimized to maintain catalyst activity and to prevent the formation of intermediate compounds.
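As a rough illustration of the chemistry, assuming complete hydrodechlorination to ethane (TCE: C₂HCl₃ + 4 H₂ → C₂H₆ + 3 HCl; PCE: C₂Cl₄ + 5 H₂ → C₂H₆ + 4 HCl), the sketch below computes the stoichiometric minimum of dissolved H₂ per litre of groundwater. The function name and inputs are hypothetical. The abstract gives no dosing figures, and a real system would need a margin above this minimum, since other reducible species in groundwater can also consume H₂.

```python
C, H, CL = 12.011, 1.008, 35.453  # standard atomic weights, g/mol

# Molar masses of the solvents and of H2, in g/mol (equivalently mg/mmol).
MOLAR_MASS = {"TCE": 2 * C + H + 3 * CL, "PCE": 2 * C + 4 * CL}
H2_MOLAR_MASS = 2 * H

# Moles of H2 consumed per mole of solvent, assuming complete reduction to ethane:
#   C2HCl3 + 4 H2 -> C2H6 + 3 HCl   (TCE)
#   C2Cl4  + 5 H2 -> C2H6 + 4 HCl   (PCE)
H2_PER_MOLE = {"TCE": 4, "PCE": 5}


def h2_demand_mg_per_l(conc_mg_per_l: dict[str, float]) -> float:
    """Hypothetical helper: stoichiometric H2 demand (mg/L) for the given
    solvent concentrations (mg/L), ignoring all competing H2 sinks."""
    h2_mmol = 0.0
    for compound, mg in conc_mg_per_l.items():
        solvent_mmol = mg / MOLAR_MASS[compound]          # mg/L -> mmol/L
        h2_mmol += solvent_mmol * H2_PER_MOLE[compound]   # mmol H2 per litre
    return h2_mmol * H2_MOLAR_MASS                        # mmol/L -> mg/L


print(round(h2_demand_mg_per_l({"TCE": 1.0}), 4))
```

For 1 mg/L of TCE this gives about 0.061 mg/L of H₂; PCE needs one more H₂ per molecule but is heavier, so 1 mg/L of PCE needs a similar amount (about 0.061 mg/L).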
At present, automated data collection tools allow us to collect large amounts of information, though not without associated problems. In this paper, we apply feature selection to several software engineering databases, with the final aim of giving project managers a better global view of the data they manage. We apply attribute selection techniques to several publicly available datasets from the PROMISE repository and use different data mining algorithms for classification to detect faulty modules. The results show that, in general, the reduced datasets maintain or improve prediction capability while using fewer attributes than the original datasets.
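The abstract does not name the attribute selection techniques used. A minimal sketch of one common filter approach, assumed here purely for illustration, ranks each attribute by the absolute Pearson correlation between its values and a 0/1 faulty-module label and keeps the top k. The metric names and values below are hypothetical, not taken from the PROMISE datasets.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def select_top_k(rows, labels, names, k):
    """Rank attributes by |correlation| with the 0/1 defect label; keep the top k."""
    columns = list(zip(*rows))  # one tuple of values per attribute
    score = {name: abs(pearson(col, labels)) for name, col in zip(names, columns)}
    return sorted(names, key=score.__getitem__, reverse=True)[:k]


# Hypothetical module metrics; label 1 = faulty module.
names = ["loc", "cyclomatic", "comment_ratio"]
rows = [
    (120, 14, 0.10), (30, 3, 0.25), (200, 22, 0.05),
    (45, 4, 0.30), (90, 11, 0.20), (20, 2, 0.15),
]
labels = [1, 0, 1, 0, 1, 0]
print(select_top_k(rows, labels, names, 2))
```

On this toy data the call returns `['cyclomatic', 'loc']`; `comment_ratio` is negatively correlated with faults, and its weaker absolute correlation places it third. A filter like this ignores redundancy between attributes, which wrapper or subset-evaluation methods would take into account.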
Decision making has traditionally been based on managers' experience. At present, there are a number of software engineering (SE) repositories, and automated data collection tools allow managers to collect large amounts of information, though not without associated problems. On the one hand, such a large amount of information can overload project managers. On the other hand, a problem with generic project databases, where data are collected from different organizations, is the large disparity among their instances. In this paper, we characterize several software engineering databases by selecting attributes, with the final aim of giving project managers a better global view of the data they manage. We use different data mining algorithms to select attributes from several publicly available datasets (PROMISE repository), and then use different classifiers to detect faulty modules. The results show that, in general, the reduced datasets maintain prediction capability with fewer attributes than the original datasets.
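To check whether a reduced dataset maintains prediction capability, one can compare a classifier's accuracy using all attributes against its accuracy using only the selected ones. The sketch below uses leave-one-out evaluation of a 1-nearest-neighbour classifier as a stand-in for the paper's unspecified classifiers; the six-module dataset, with two informative metrics and one large-scale noise column, is hypothetical.

```python
import math
from typing import Sequence


def loo_1nn_accuracy(rows: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (Euclidean)."""
    correct = 0
    for i, row in enumerate(rows):
        # Predict module i from its nearest other module; ties go to the lower index.
        others = [k for k in range(len(rows)) if k != i]
        nearest = min(others, key=lambda k: math.dist(row, rows[k]))
        if labels[nearest] == labels[i]:
            correct += 1
    return correct / len(rows)


def project(rows, keep):
    """Keep only the attribute columns whose indices appear in `keep`."""
    return [[row[c] for c in keep] for row in rows]


# Hypothetical modules: (metric_a, metric_b, noise); label 1 = faulty.
rows = [
    (8, 9, 10), (9, 8, 90), (7, 8, 50),
    (2, 1, 12), (1, 2, 88), (2, 2, 52),
]
labels = [1, 1, 1, 0, 0, 0]

full = loo_1nn_accuracy(rows, labels)
reduced = loo_1nn_accuracy(project(rows, [0, 1]), labels)
print(f"all attributes: {full:.2f}, selected attributes: {reduced:.2f}")
```

Here the noise column dominates the Euclidean distances, so accuracy is 0.00 with all three attributes and 1.00 with the two informative ones. Real datasets would normally be scaled before computing distances, which this sketch omits.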