Comparing biological information contained in mRNA and non-coding RNAs for classification of lung cancer patients

Smolander, Johannes; Stupnikov, Alexey; Glazko, Galina V.; Dehmer, Matthias; Emmert‐Streib, Frank

doi:10.1186/s12885-019-6338-1

Cited by 18 publications

(17 citation statements)

References 87 publications


“…Recent results for deep learning networks and support vector machines demonstrate that feature selection can negatively affect the prediction performance for high-dimensional genomic data (Smolander et al, 2019). However, whether these results translate to data in other domains remains to be seen.…”

Section: Robustness Issues Of ML and AI Models (mentioning)

confidence: 99%

Ensuring the Robustness and Reliability of Data-Driven Knowledge Discovery Models in Production and Manufacturing

Tripathi, Muhr, Manuel, et al. 2021

Front. Artif. Intell.

Self Cite


The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a widely accepted framework in production and manufacturing. This data-driven knowledge discovery framework provides an orderly partition of the often complex data mining processes to ensure a practical implementation of data analytics and machine learning models. However, the practical application of robust industry-specific data-driven knowledge discovery models faces multiple data- and model development-related issues. These issues need to be carefully addressed by allowing a flexible, customized and industry-specific knowledge discovery framework. For this reason, extensions of CRISP-DM are needed. In this paper, we provide a detailed review of CRISP-DM and summarize extensions of this model into a novel framework we call Generalized Cross-Industry Standard Process for Data Science (GCRISP-DS). This framework is designed to allow dynamic interactions between different phases to adequately address data- and model-related issues for achieving robustness. Furthermore, it emphasizes also the need for a detailed business understanding and the interdependencies with the developed models and data quality for fulfilling higher business objectives. Overall, such a customizable GCRISP-DS framework provides an enhancement for model improvements and reusability by minimizing robustness-issues.


“…For instance, a deep learning method set the record for the classification of handwritten digits of the MNIST data set with an error rate of 0.21% (Wan et al, 2013). Further application areas include image recognition (Krizhevsky et al, 2012a; LeCun et al, 2015), speech recognition (Graves et al, 2013), natural language understanding (Sarikaya et al, 2014), acoustic modeling (Mohamed et al, 2011) and computational biology (Leung et al, 2014; Alipanahi et al, 2015; Zhang S. et al, 2015; Smolander et al, 2019a, b).…”

Section: Introduction (mentioning)

confidence: 99%

An Introductory Review of Deep Learning for Prediction Models With Big Data

Emmert‐Streib, Yang, Han, et al. 2020

Front. Artif. Intell.

Self Cite



Deep learning models stand for a new learning paradigm in artificial intelligence (AI) and machine learning. Recent breakthrough results in image analysis and speech recognition have generated a massive interest in this field because also applications in many other domains providing big data seem possible. On a downside, the mathematical and computational methodology underlying deep learning models is very challenging, especially for interdisciplinary scientists. For this reason, we present in this paper an introductory review of deep learning approaches including Deep Feedforward Neural Networks (D-FFNN), Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), Autoencoders (AEs), and Long Short-Term Memory (LSTM) networks. These models form the major core architectures of deep learning models currently used and should belong in any data scientist's toolbox. Importantly, those core architectural building blocks can be composed flexibly, in an almost Lego-like manner, to build new application-specific network architectures. Hence, a basic understanding of these network architectures is important to be prepared for future developments in AI.


“…Since LUAD is the most frequent lung cancer type, many works have been published for LUAD and control classification. Smolander et al presented a deep learning model using gene expression from coding RNA, and non-coding RNA [25]. They obtained a classification accuracy of 95.97% using coding RNA.…”

Section: Related Work (mentioning)

confidence: 99%

Non-small-cell lung cancer classification via RNA-Seq and histology imaging probability fusion

et al. 2021


Background: Adenocarcinoma and squamous cell carcinoma are the two most prevalent lung cancer types, and their distinction requires different screenings, such as the visual inspection of histology slides by an expert pathologist, the analysis of gene expression or computer tomography scans, among others. In recent years, there has been an increasing gathering of biological data for decision support systems in the diagnosis (e.g. histology imaging, next-generation sequencing technologies data, clinical information, etc.). Using all these sources to design integrative classification approaches may improve the final diagnosis of a patient, in the same way that doctors can use multiple types of screenings to reach a final decision on the diagnosis. In this work, we present a late fusion classification model using histology and RNA-Seq data for adenocarcinoma, squamous-cell carcinoma and healthy lung tissue. Results: The classification model improves results over using each source of information separately, being able to reduce the diagnosis error rate up to a 64% over the isolate histology classifier and a 24% over the isolate gene expression classifier, reaching a mean F1-Score of 95.19% and a mean AUC of 0.991. Conclusions: These findings suggest that a classification model using a late fusion methodology can considerably help clinicians in the diagnosis between the aforementioned lung cancer subtypes over using each source of information separately. This approach can also be applied to any cancer type or disease with heterogeneous sources of information.

