Terminal restriction fragment length polymorphism (T-RFLP) analysis has the potential to be useful for comparisons of complex bacterial communities, especially to detect changes in community structure in response to different variables. To do this successfully, systematic variations have to be detected above method-associated noise, by standardizing data sets and assigning confidence estimates to the relationships detected. We investigated the use of different standardizing methods in T-RFLP analysis of PCR-amplified 16S rRNA genes to elucidate the similarities between the bacterial communities in 17 soil and sediment samples. We developed a robust method for standardizing data sets that appeared to allow detection of similarities between complex bacterial communities. We term this the variable percentage threshold method. We found that inferring the similarities of complex bacterial communities from T-RFLP profiles generated with a single restriction enzyme (RE) may lead to erroneous conclusions. Instead, using multiple REs, each individually, to generate multiple data sets allowed us to determine a confidence estimate for groupings of apparently similar communities and at the same time minimized the effects of RE selection. In conjunction with the variable percentage threshold method, this allowed us to draw confident conclusions about the similarities of the complex bacterial communities in the 17 different samples.
Abstract. Documents are co-derivative if they share content: for two documents to be co-derived, some portion of one must be derived from the other or some portion of both must be derived from a third document. The current technique for concurrently detecting all co-derivatives in a collection is document fingerprinting, which matches documents based on the hash values of selected document subsequences, or chunks. Fingerprinting is currently hampered by an inability to accurately isolate information that is useful in identifying co-derivatives. In this paper we present spex, a novel hash-based algorithm for extracting duplicated chunks from a document collection. We discuss how information about shared chunks can be used for efficiently and reliably identifying co-derivative clusters, and describe deco, a prototype system that makes use of spex. Our experiments with several document collections demonstrate the effectiveness of the approach.
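To make the general idea of chunk-based fingerprinting concrete, the following is a minimal sketch in Python. It is not spex or deco, whose details are not given in this abstract. It assumes that chunks are overlapping word trigrams, that every chunk is selected, and that SHA-256 is the hash. The function names (`chunks`, `fingerprint`, `shared_chunk_counts`) and the sample documents are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def chunks(text, size=3):
    """Split text into overlapping word n-grams (assumed chunking scheme)."""
    words = text.lower().split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def fingerprint(text, size=3):
    """Return the set of hash values of all chunks in the text."""
    return {
        hashlib.sha256(c.encode("utf-8")).hexdigest()
        for c in chunks(text, size)
    }


def shared_chunk_counts(docs, size=3):
    """Count, for each pair of documents, how many chunk hashes they share."""
    # Inverted index: chunk hash -> ids of the documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for h in fingerprint(text, size):
            index[h].add(doc_id)

    # Every hash that occurs in two or more documents contributes
    # one shared chunk to each pair of those documents.
    counts = defaultdict(int)
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            counts[(a, b)] += 1
    return dict(counts)


# Hypothetical sample collection, for illustration only.
docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "a quick brown fox jumps over a sleeping cat",
    "c": "completely unrelated text about soil bacteria",
}
result = shared_chunk_counts(docs)
print(result)
```

In this toy collection, documents "a" and "b" share three trigrams, while "c" shares none with either. Pairs with many shared chunks would be candidates for co-derivation. A practical system would also need a chunk selection strategy and a scoring threshold, which this sketch leaves out.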