MLSeq: Machine learning interface for RNA-sequencing data

Göksülük, Dinçer; Zararsız, Gökmen; Korkmaz, Selçuk; Eldem, Vahap; Ozcetin, Erdener; Öztürk, Ahmet; Karaağaoğlu, Ahmet Ergun

doi:10.1016/j.cmpb.2019.04.007

Cited by 44 publications

(41 citation statements)

References 34 publications

“…The 36 samples were then divided into a "training data set" and a "testing data set". The size of both data sets was calculated using an option implemented in MLSeq [88]. Twenty-five samples were defined as the training data set and were used by ML to learn and build algorithms from existing data sets, whereas the remaining 11 were defined as the testing data set.…”

Section: Methods (mentioning)

confidence: 99%

Applications and Trends of Machine Learning in Genomics and Phenomics for Next-Generation Breeding

Esposito, Carputo, Cardi et al. 2019

Plants

Crops are the major source of food supply and raw materials for the processing industry. A balance between crop production and food consumption is continually threatened by plant diseases and adverse environmental conditions. This leads to serious losses every year and results in food shortages, particularly in developing countries. Presently, cutting-edge technologies for genome sequencing and phenotyping of crops combined with progress in computational sciences are leading a revolution in plant breeding, boosting the identification of the genetic basis of traits at a precision never reached before. In this frame, machine learning (ML) plays a pivotal role in data-mining and analysis, providing relevant information for decision-making towards achieving breeding targets. To this end, we summarize the recent progress in next-generation sequencing and the role of phenotyping technologies in genomics-assisted breeding toward the exploitation of the natural variation and the identification of target genes. We also explore the application of ML in managing big data and predictive models, reporting a case study using microRNAs (miRNAs) to identify genes related to stress conditions.

“…There are a certain number of classifiers proposed especially for RNA-Seq data in the literature [6]. The most recent one is the qtQDA classifier proposed by Koçhan et al. [1].…”

Section: Methods (mentioning)

confidence: 99%

A new local covariance matrix estimation for the classification of gene expression profiles in RNA-Seq data

Koçhan, Tütüncü, Giner 2019

Preprint

Background and Objective: Recent developments in next-generation sequencing (NGS) based on RNA-sequencing (RNA-Seq) allow researchers to measure the expression levels of thousands of genes for multiple samples simultaneously. In order to analyze these kinds of data sets, many classification models have been proposed in the literature. Most of the existing classifiers assume that genes are independent; however, this is not a realistic assumption for real RNA-Seq classification problems. For this reason, other classification methods, which incorporate the dependence structure between genes into the model, have been proposed. qtQDA, proposed by Koçhan et al. [1], is one of those classifiers; it estimates the covariance matrix using the maximum likelihood estimator. Methods: In this study, we use another approach, based on the local dependence function, to estimate the covariance matrix to be used in the qtQDA classification model. We investigate the impact of different covariance estimates on RNA-Seq data classification. Results: The performances of the qtQDA classifier based on two different covariance matrix estimates are compared over two real RNA-Seq data sets in terms of classification error rates. The results show that using the local dependence function approach yields a better estimate of the covariance matrix and increases the performance of the qtQDA classifier. Conclusion: Incorporating an accurate covariance matrix into the classification model is a crucial step, particularly for cancer prediction. The local covariance matrix estimate allows researchers to classify cancer patients based on gene expression profiles more accurately. R code for the local dependence function is available at https://github.com/Necla/LocalDependence.

“…To test these algorithms, we used MLSeq (Machine learning interface for RNA-sequencing data), which is an R package including more than 80 machine learning algorithms and a pipeline to classify RNA-seq data including normalization, filtering and transformation steps [18].…”

Section: Classification and Clustering Algorithms Of Machine Learning (mentioning)

confidence: 99%

Current State-of-the-Art of Clustering Methods for Gene Expression Data with RNA-Seq

Jamail, Moussa 2021

Applications of Pattern Recognition

The latest developments in high-throughput cDNA sequencing (RNA-seq) have revolutionized gene expression profiling. This analysis aims to compare the expression levels of multiple genes between two or more samples, under specific circumstances or in a specific cell, to give a global picture of cellular function. Thanks to these advances, gene expression data are being generated at high throughput. One of the primary data analysis tasks for gene expression studies involves data-mining techniques such as clustering and classification. Clustering, which is an unsupervised learning technique, has been widely used as a computational tool to facilitate our understanding of the gene functions and regulation involved in a biological process. Cluster analysis aims to group the large number of genes present in a sample of gene expression profile data, such that similar or related genes are in the same clusters and different or unrelated genes are in distinct ones. Classification, on the other hand, can be used for grouping samples based on their expression profiles. There are many clustering and classification algorithms that can be applied in gene expression experiments; the most widely used are hierarchical clustering, k-means clustering and model-based clustering, which depends on a model to determine the number of clusters. Depending on the data structure, a fitting clustering method must be used. In this chapter, we present the state of the art of clustering algorithms and statistical approaches for grouping similar gene expression profiles that can be applied to RNA-seq data analysis, along with software tools dedicated to these methods. In addition, we discuss challenges in cluster analysis and compare the performance of eight commonly used clustering methods on four different public datasets from recount2.
