A Partitional Approach for Genomic-Data Clustering Combined with K-Means Algorithm

Kenidra, Billel; Bahaj, Mohamed; Beghriche, Abdesselem; Benmounah, Zakaria

doi:10.1109/cse-euc-dcabes.2016.170

Cited by 4 publications

(3 citation statements)

References 8 publications


“…Classification or prediction based on data from a single high throughput source may require machine learning techniques since the number of genes or metabolites will inevitably be larger than the number of samples. Both supervised and unsupervised machine learning methods have been successfully utilized for classification [2–4], regression [5, 6], and identification of latent batch effects [7, 8]. In this paper, we focus on supervised classification of dichotomized survival outcome for various cancer types, specifically discussing support vector machines and multiple kernel learning.…”

Section: Introduction (mentioning)

confidence: 99%

Multiple-kernel learning for genomic data mining and prediction

Wilson et al. 2019

BMC Bioinformatics


Background: Advances in medical technology have allowed for customized prognosis, diagnosis, and treatment regimens that utilize multiple heterogeneous data sources. Multiple kernel learning (MKL) is well suited for the integration of multiple high-throughput data sources. MKL remains under-utilized by genomic researchers, partly due to the lack of unified guidelines for its use and of benchmark genomic datasets. Results: We provide three implementations of MKL in R. These methods are applied to simulated data to illustrate that MKL can select appropriate models. We also apply MKL to combine clinical information with miRNA gene expression data from an ovarian cancer study into a single analysis. Lastly, we show that MKL can identify gene sets that are known to play a role in the prognostic prediction of 15 cancer types using gene expression data from The Cancer Genome Atlas, as well as identify new gene sets for future research. Conclusion: Multiple kernel learning coupled with modern optimization techniques provides a promising learning tool for building predictive models based on multi-source genomic data. MKL also provides an automated scheme for kernel prioritization and parameter tuning. The methods used in the paper are implemented as an R package called RMKL, which is freely available for download through CRAN at https://CRAN.R-project.org/package=RMKL. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2992-1) contains supplementary material, which is available to authorized users.
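The abstract above does not spell out how MKL integrates several data sources. A common formulation, assumed here for illustration, builds one kernel (Gram) matrix per source or per kernel type and combines them as a non-negative weighted sum. The sketch below is not the RMKL API: the function names, the choice of a linear and an RBF kernel, and the hand-picked weights are all hypothetical, and a real MKL method would learn the weights from the data.

```python
import math


def linear_kernel(x, y):
    """Dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative default."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared)


def gram(kernel, samples):
    """Gram matrix: kernel value for every pair of samples."""
    return [[kernel(a, b) for b in samples] for a in samples]


def combine_kernels(grams, weights):
    """Convex combination of Gram matrices (weights >= 0, summing to 1).

    Assumption: weights are supplied by hand here; MKL would optimize them.
    """
    if len(grams) != len(weights):
        raise ValueError("need exactly one weight per Gram matrix")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(grams[0])
    return [
        [sum(w * g[i][j] for w, g in zip(weights, grams)) for j in range(n)]
        for i in range(n)
    ]


# Toy data: two samples with two features each (made up for the example).
samples = [(0.0, 0.0), (1.0, 0.0)]
grams = [gram(linear_kernel, samples), gram(rbf_kernel, samples)]
combined = combine_kernels(grams, [0.25, 0.75])
```

The combined matrix can then be passed to any kernel method that accepts a precomputed Gram matrix; a larger weight indicates that the corresponding kernel contributes more to the final model.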



“…The high intra-cluster similarity should be based on the derived measurement from the data while the low inter-cluster similarity should maintain that elements in the different clusters should have maximum distance. These are intended to achieve beneficial knowledge from the data [8] for decision making and strategizing. Among different types of clustering, the most conventional distinction is whether the set of clusters is hierarchical or partitional [9] where hierarchical is a set of nested clusters while partitional clustering divides the set of data objects into non-overlapping clusters such that each object is in exactly a single cluster [10].…”

Section: Introduction (mentioning)

confidence: 99%

Modified Graph-theoretic Clustering Algorithm for Mining International Linkages of Philippine Higher Education Institutions

Lingaya, Gerardo, Medina 2019

IJACSA


Graph-theoretic clustering uses either a limited neighborhood or the construction of a minimum spanning tree to aid the clustering process. The latter is challenged by the need to identify and then eliminate inconsistent edges in order to obtain the final clusters, detect outliers, and partition the data substantially. This work focused on mining the data of the International Linkages of Philippine Higher Education Institutions by employing a modified graph-theoretic clustering algorithm in which Prim's Minimum Spanning Tree algorithm was used to construct a minimum spanning tree for the internationalization dataset, infusing the properties of a small-world network. Such properties are invoked by computing the local clustering coefficient for the data elements in the limited neighborhood of data points established using the von Neumann Neighborhood. The overall result of the cluster validation using the Silhouette Index, with a score of 0.69, indicates that there is an acceptable structure in the clustering result; hence, the modified MST-based clustering algorithm shows potential. The per-cluster Silhouette scores, the lowest being 0.75, mean that each cluster derived for r=5 by the von Neumann Neighborhood has a strong clustering structure.
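The citation statement in this entry defines partitional clustering as dividing the data into non-overlapping clusters so that each object belongs to exactly one cluster. K-means, the algorithm named in the cited paper's title, is a standard example. The following is a minimal sketch of the textbook (Lloyd's) procedure, not the cited authors' method; the function names and the explicitly supplied starting centroids are assumptions made so that the example is deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, centroids, max_iter=100):
    """Textbook k-means; returns (labels, centroids).

    Assumption: the caller supplies the initial centroids.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins exactly one cluster,
        # the one with the nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centroids.append(mean(members) if members else old)
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centroids = k_means(points, centroids=[(0, 0), (10, 10)])
# labels -> [0, 0, 1, 1]
```

Because every point receives exactly one label, the result is a partition in the sense described above; a hierarchical method would instead return nested clusters.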


“…In data mining, clustering is one important technique [1] to reduce the data by means of categorizing or grouping similar data items together in order to achieve valuable information [2]. The principle is simply to achieve high intra-cluster similarity based on a measure derived from the data itself, and low inter-cluster similarity where elements in separate clusters are maximally apart from each other.…”

Section: Introduction (mentioning)

confidence: 99%

Small-World-Like Structured MST-Based Clustering Algorithm

Lingaya, Gerardo, Medina 2019

IJMLC


Graph-theoretic clustering is a method in which the dataset is represented as a connected undirected graph, with the distances between points as the weights of the links between them. One approach is to construct the Minimum Spanning Tree of this graph; the connected subgraphs that remain after the removal of inconsistent edges are the clusters. However, such methods suffer from drawbacks, including partitioning without sufficient evidence and poor robustness to outliers. Hence, this work aims to modify the Prim's MST-based clustering algorithm to produce a spanning tree of the dataset that infuses the small-world network, thereby invoking its properties (i.e., small mean shortest path length and high clustering coefficient), which manifest inherent or natural clustering characteristics.
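The basic MST-based scheme described in this abstract (build the tree, remove inconsistent edges, read off the connected subgraphs) can be sketched as follows. This is the plain baseline, not the authors' small-world modification. The abstract does not define "inconsistent", so the sketch assumes the simplest rule: the longest tree edges are the inconsistent ones, and the number of clusters is given. All function names are hypothetical.

```python
import math


def prim_mst(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the tree as a list of (weight, u, v) edges.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge linking each node to the tree
    parent = [-1] * n
    edges = []
    if n:
        best[0] = 0.0       # start the tree at node 0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(points, n_clusters):
    """Drop the (n_clusters - 1) longest MST edges; label the components.

    Assumption: 'inconsistent edge' is taken to mean 'longest edge'.
    """
    edges = sorted(prim_mst(points))
    kept = edges[:len(edges) - (n_clusters - 1)]
    root = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for _, u, v in kept:
        root[find(u)] = find(v)
    ids = {}
    # Number the components in order of first appearance.
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
labels = mst_clusters(points, n_clusters=2)
# labels -> [0, 0, 0, 1, 1]
```

Cutting by edge length alone illustrates the drawbacks the abstract mentions: a single outlier creates a long edge and is split off as its own cluster, and a cut is made whether or not the data support one.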

