We examined the effect of subsample size on the accuracy of information obtained from aquatic macroinvertebrate assemblage samples. Subsamples containing 100 organisms or 300 organisms were compared on the bases of processing time and the ability to discern ecological differences among samples. Independently of subsample size, assemblages differed between study streams, primarily reflecting an intermittent vs. permanent stream difference, and between seasons at most streams. Processing the larger subsamples required, on average, two additional hours. Larger subsamples gave significantly higher estimates of total richness and of Ephemeroptera, Plecoptera, and Trichoptera (EPT) richness, but the relative abundances of many assemblage subsets (e.g., EPT organisms and most functional feeding groups) were similar at both subsample sizes. Larger subsamples did not typically enhance the ability to discriminate between samples from different seasons, but they did distinguish more accurately among streams when differences were subtle. They also appeared to avoid Type I errors in comparisons of compositionally similar reaches within a study stream.
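The contrast between richness and relative abundance reported above can be illustrated with a small simulation. The following is a minimal Python sketch, not the authors' procedure: the taxon names, the counts, and the set of taxa labelled as EPT are all hypothetical, and the sample is synthetic rather than drawn from the study. It repeatedly draws 100- and 300-organism subsamples without replacement and averages taxon richness and the EPT fraction at each size.

```python
import random

# Hypothetical sorted sample of 1000 organisms (NOT data from the study):
# 5 dominant taxa, 10 moderately common taxa, and 10 rare taxa.
counts = {}
for i in range(5):
    counts[f"T{i:02d}"] = 150
for i in range(5, 15):
    counts[f"T{i:02d}"] = 20
for i in range(15, 25):
    counts[f"T{i:02d}"] = 5

# Hypothetical subset of taxa treated as EPT, for illustration only.
EPT_TAXA = {"T00", "T01", "T05", "T15"}

# Expand the counts into one list entry per organism.
sample = [taxon for taxon, n in counts.items() for _ in range(n)]


def richness(subsample):
    """Number of distinct taxa present in the subsample."""
    return len(set(subsample))


def ept_fraction(subsample):
    """Relative abundance of the (hypothetical) EPT taxa."""
    return sum(1 for t in subsample if t in EPT_TAXA) / len(subsample)


def compare_subsample_sizes(sample, sizes=(100, 300), replicates=200, seed=1):
    """Return {size: (mean richness, mean EPT fraction)} over replicates."""
    rng = random.Random(seed)
    results = {}
    for size in sizes:
        rich, ept = [], []
        for _ in range(replicates):
            sub = rng.sample(sample, size)  # fixed-count draw without replacement
            rich.append(richness(sub))
            ept.append(ept_fraction(sub))
        results[size] = (sum(rich) / replicates, sum(ept) / replicates)
    return results


results = compare_subsample_sizes(sample)
for size, (mean_rich, mean_ept) in results.items():
    print(f"n={size}: mean richness {mean_rich:.1f}, mean EPT fraction {mean_ept:.3f}")
```

In this synthetic setup, mean richness rises with subsample size because rare taxa are more likely to be captured in a larger draw, whereas the EPT fraction is a sample proportion whose expected value does not depend on how many organisms are counted.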
Purpose - This paper seeks to present a complete set of graphical and numerical outputs of data mining performed on microarray databases of plant data, as described in earlier research by the authors. A brief description of data mining is also presented, as well as a brief background of previous research. Design/methodology/approach - The paper applies data mining, using SAS Enterprise Miner Version 4, to plant data from the Osmotic Stress Microarray Information Database (OSMID), which is available on the web as both normalized and log(2)-transformed data. Findings - This paper illustrates that useful information about the effects of environmental stress tolerances (ESTs) on plants can be obtained by using data mining. Research limitations/implications - SAS Enterprise Miner was very effective for performing data mining of microarray databases with its modules for cluster analysis, decision trees, and descriptive and visual statistics. Practical implications - The data used from the OSMID database are considered representative of those that could be used for biotech applications such as the manufacture of plant-made pharmaceuticals and genetically modified foods. Originality/value - This paper contributes to the discussion on the use of data mining for microarray databases, specifically for studying the effects of ESTs on plants.