In almost every scientific field, measurements are performed over time. These observations lead to a collection of organized data called time series. The purpose of time series data mining is to try to extract all meaningful knowledge from the shape of data. Even if humans have a natural capacity to perform these tasks, it remains a complex problem for computers. In this paper we intend to provide a survey of the techniques applied for time series data mining. The first part is devoted to an overview of the tasks that have captured most of the interest of researchers. Considering that in most cases, time series tasks rely on the same components for implementation, we divide the literature depending on these common aspects, namely representation techniques, distance measures and indexing methods. The study of the relevant literature has been categorized for each individual aspect. Four types of robustness could then be formalized and any kind of distance could then be classified. Finally, the study submits various research trends and avenues that can be explored in the near future. We hope that this paper can provide a broad and deep understanding of the time series data mining research field.
Tagging amplicons with tag sequences appended to PCR primers allows the multiplexing of numerous samples for high-throughput sequencing (HTS). This approach is routinely used in HTS-based diversity analyses, especially in microbial ecology and biomedical diagnostics. However, amplicon library preparation is subject to pervasive sample sequence cross-contaminations as a result of tag switching events referred to as mistagging. Here, we sequenced seven amplicon libraries prepared using various multiplexing designs in order to measure the magnitude of this phenomenon and its impact on diversity analyses. Up to 28.2% of the unique sequences correspond to undetectable (critical) mistags in single- or saturated double-tagging libraries. We show the advantage of multiplexing samples following Latin Square Designs in order to optimize the detection of mistags and maximize the information on their distribution across samples. We use this information in designs incorporating PCR replicates to filter the critical mistags and to recover the exact composition of mock community samples. Being parameter-free and data-driven, our approach can provide more accurate and reproducible HTS data sets, improving the reliability of their interpretations.
The measurement of species diversity represents a powerful tool for assessing the impacts of human activities on marine ecosystems. Traditionally, the impact of fish farming on the coastal environment is evaluated by monitoring the dynamics of macrobenthic infaunal populations. However, taxonomic sorting and morphology-based identification of the macrobenthos demand highly trained specialists and are extremely time-consuming and costly, making them unsuitable for large-scale biomonitoring efforts involving numerous samples. Here, we propose to alleviate this laborious task by developing protist metabarcoding tools based on next-generation sequencing (NGS) of environmental DNA and RNA extracted from sediment samples. In this study, we analysed the response of benthic foraminiferal communities to the variation of environmental gradients associated with salmon farms in Scotland. We investigated the foraminiferal diversity based on ribosomal minibarcode sequences generated by the Illumina NGS technology. We compared the molecular data with morphospecies counts and with environmental gradients, including distance to cages and redox used as a proxy for sediment oxygenation. Our study revealed high variations between foraminiferal communities collected in the vicinity of fish farms and at distant locations. We found evidence for a decrease in species richness at impacted sites, especially visible in the RNA data. We also detected some candidate bioindicator foraminiferal species. Based on this proof-of-concept study, we conclude that NGS metabarcoding using foraminifera and other protists has the potential to become a new tool for surveying the impact of aquaculture and other industrial activities in the marine environment.