2017
DOI: 10.1093/bioinformatics/btx682

Unsupervised multiple kernel learning for heterogeneous data integration

Abstract: Motivation: Recent high-throughput sequencing advances have expanded the breadth of available omics datasets, and the integrated analysis of multiple datasets obtained on the same samples has allowed researchers to gain important insights in a wide range of applications. However, the integration of various sources of information remains a challenge for systems biology, since the produced datasets are often of heterogeneous types, and generic methods are needed to take their different specificities into account. Res…

Cited by 88 publications (67 citation statements)
References 38 publications
“…However, the class labels of training data samples may not always be available prior to executing the MKL task in some real-world scenarios, such as clustering and dimension reduction. Unsupervised Multiple Kernel Learning (UMKL) determines a linear combination of multiple basis kernels by learning from unlabeled data samples, and the generated kernel can be used in data mining tasks such as clustering and classification, as it is supposed to provide an integrated feature of the input datasets [31]. Thus, to apply multiple kernels to clustering, MKDCI obtains an optimal kernel by the UMKL method.…”
Section: Methods
confidence: 99%
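To make the idea in the statement above concrete, the sketch below combines several precomputed basis kernel matrices into one kernel without using any labels, weighting each kernel by the leading eigenvector of their pairwise Frobenius-cosine similarity matrix (a STATIS-like heuristic). This is only an illustrative assumption, not the exact UMKL algorithm of [31]; the function name combine_kernels_unsupervised is hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def combine_kernels_unsupervised(kernels):
    """Hypothetical STATIS-like heuristic (not the exact algorithm of [31]):
    weight each basis kernel by the leading eigenvector of the pairwise
    Frobenius-cosine similarity matrix between the kernel matrices."""
    m = len(kernels)
    C = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            # Frobenius inner product, normalized to a cosine similarity
            C[i, j] = np.sum(kernels[i] * kernels[j]) / (
                np.linalg.norm(kernels[i]) * np.linalg.norm(kernels[j]))
    # Entries of C are nonnegative for PSD kernels, so the leading
    # eigenvector can be taken nonnegative (Perron-Frobenius).
    _, eigvecs = np.linalg.eigh(C)          # eigenvalues in ascending order
    weights = np.abs(eigvecs[:, -1])
    weights /= weights.sum()                # convex-combination weights
    combined = sum(w * K for w, K in zip(weights, kernels))
    return combined, weights

# Example: three RBF kernels with different bandwidths on toy data
X = np.random.rand(100, 5)
kernels = [rbf_kernel(X, gamma=g) for g in (0.1, 1.0, 10.0)]
K, mu = combine_kernels_unsupervised(kernels)
```

The resulting combined kernel K can then be passed to any kernel-based clustering or embedding method, which is the use case the citing paper describes for MKDCI.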
“…To apply the UMKL method, the input dataset is split into a training set and a test set in a 70:30 ratio by random sampling, i.e., they account for 70% and 30% of the entire input dataset, respectively. From the predefined basis kernels, m kernel matrices are computed on the training data samples, the parameters γ1 and B are estimated by cross-validation on the training data samples, and the above optimization problem can be solved with the algorithm discussed in [31]. Thus, by training on the unlabeled input dataset with the UMKL method, an optimally combined kernel k(·,·) with the weights μt of the predefined basis kernels is learned.…”
Section: Methods
confidence: 99%
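The pipeline described in that statement can be sketched as follows. This is a minimal sketch under stated assumptions: the gamma values stand in for the predefined basis-kernel parameters, and the uniform weight vector is a placeholder for the μt that the UMKL solver of [31] would return after tuning γ1 and B by cross-validation.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics.pairwise import rbf_kernel

X = np.random.rand(200, 10)                 # unlabeled input dataset
# 70:30 random split, as described in the statement above
X_train, X_test = train_test_split(X, test_size=0.30, random_state=0)

gammas = [0.1, 1.0, 10.0]                   # predefined basis-kernel parameters
train_kernels = [rbf_kernel(X_train, gamma=g) for g in gammas]

# Placeholder for the UMKL solver of [31], which would return the
# weights mu_t after estimating gamma_1 and B by cross-validation;
# a uniform combination stands in here.
mu = np.full(len(train_kernels), 1.0 / len(train_kernels))

# The combined kernel k(.,.) evaluated between test and training
# samples, e.g. for out-of-sample embedding or clustering
K_test_train = sum(w * rbf_kernel(X_test, X_train, gamma=g)
                   for w, g in zip(mu, gammas))
```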
“…In this section, it is also possible to detect and remove potential outlier samples. To allow new users to easily test the functionality of MiBiOmics, we provide two example datasets: the breast cancer datasets from The Cancer Genome Atlas (TCGA) [8], which allow users to explore associations between miRNAs, mRNAs and proteins in different breast cancer subtypes; and a dataset from the Tara Oceans Expeditions [9,10] to explore prokaryotic community composition across depths and geographic locations.…”
Section: Data Upload
confidence: 99%
“…Multiple kernel learning (MKL) algorithms have proven to be effective tools for solving learning problems such as classification or regression. Jérôme Mariette et al. [11] applied MKL to heterogeneous breast cancer data and achieved good performance in their experiments. Arezou et al. [12] proposed an MKL method that employs gene expression profiles to predict cancer and achieves satisfactory predictive performance.…”
Section: Introduction
confidence: 99%