Sawsan Kanj scite author profile

Sawsan Kanj

2Publications

31Citation Statements Received

83Citation Statements Given

How they've been cited

How they cite others

Affiliations

Direction de la Recherche Technologique, University of Technology of Compiègne, French National Centre for Scientific Research

Publications

Order By: Most citations

Editing training data for multi-label classification with the k-nearest neighbor rule

Kanj

Abdallah

Denœux

et al. 2015

Pattern Anal Applic

View full text Add to dashboard Cite

International audienceMulti-label classification allows instances to belong to several classes at once. It has received significant attention in machine learning and has found many real world applications in recent years, such as text categorization, automatic video annotation and functional genomics, resulting in the development of many multi-label classification methods. Based on labelled examples in the training dataset, a multi-labelled method extracts inherent information in order to output a function that predicts the labels of unlabelled data. Due to several problems, like errors in the input vectors or in their labels, this information may be wrong and might lead the multi-label algorithm to fail. In this paper, we propose a simple algorithm for overcoming these problems by editing the existing training dataset, and adapting this edited set with different multi-label classification methods. Evaluation on benchmark datasets demonstrates the usefulness and effectiveness of our approach

show abstract

A scalable assembly-free variable selection algorithm for biomarker discovery from metagenomes

et al. 2016

View full text Add to dashboard Cite

BackgroundMetagenomics holds great promises for deepening our knowledge of key bacterial driven processes, but metagenome assembly remains problematic, typically resulting in representation biases and discarding significant amounts of non-redundant sequence information. In order to alleviate constraints assembly can impose on downstream analyses, and/or to increase the fraction of raw reads assembled via targeted assemblies relying on pre-assembly binning steps, we developed a set of binning modules and evaluated their combination in a new “assembly-free” binning protocol.ResultsWe describe a scalable multi-tiered binning algorithm that combines frequency and compositional features to cluster unassembled reads, and demonstrate i) significant runtime performance gains of the developed modules against state of the art software, obtained through parallelization and the efficient use of large lock-free concurrent hash maps, ii) its relevance for clustering unassembled reads from high complexity (e.g., harboring 700 distinct genomes) samples, iii) its relevance to experimental setups involving multiple samples, through a use case consisting in the “de novo” identification of sequences from a target genome (e.g., a pathogenic strain) segregating at low levels in a cohort of 50 complex microbiomes (harboring 100 distinct genomes each), in the background of closely related strains and the absence of reference genomes, iv) its ability to correctly identify clusters of sequences from the E. coli O104:H4 genome as the most strongly correlated to the infection status in 53 microbiomes sampled from the 2011 STEC outbreak in Germany, and to accurately cluster contigs of this pathogenic strain from a cross-assembly of these 53 microbiomes.ConclusionsWe present a set of sequence clustering (“binning”) modules and their application to biomarker (e.g., genomes of pathogenic organisms) discovery from large synthetic and real metagenomics datasets. Initially designed for the “assembly-free” analysis of individual metagenomic samples, we demonstrate their extension to setups involving multiple samples via the usage of the “alignment-free” d2S statistic to relate clusters across samples, and illustrate how the clustering modules can otherwise be leveraged for de novo “pre-assembly” tasks by segregating sequences into biologically meaningful partitions.Electronic supplementary materialThe online version of this article (doi:10.1186/s12859-016-1186-3) contains supplementary material, which is available to authorized users.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Sawsan Kanj

Editing training data for multi-label classification with the k-nearest neighbor rule

A scalable assembly-free variable selection algorithm for biomarker discovery from metagenomes

Contact Info

Product

Resources

About