2019
DOI: 10.1038/s41598-019-39459-w

Tight clustering for large datasets with an application to gene expression data

Abstract: This article proposes a practical and scalable version of the tight clustering algorithm. The tight clustering algorithm provides tight and stable relevant clusters as output while leaving a set of points as noise or scattered points that do not go into any cluster. However, the computational cost of achieving this precise target of tight clusters prohibits it from being used for large microarray gene expression data or any other large data set, which are common nowadays. We propose a pragmatic and sca…

Cited by 13 publications (9 citation statements)
References 31 publications
“…A review of different techniques designed for analysis of microarray data was presented in [18]. Tight clustering algorithm was employed in [19] to minimize time complexity of large microarray gene expression data. An evolutionary uncertain data-clustering algorithm was designed in [20] to determine the similarities among sets of gene expression clusters.…”
Section: Literature Survey
Citation type: mentioning
confidence: 99%
“…The Python3 geospatial data abstraction library may be used for converting satellite image *.tiff files to *.csv files, and R (computer language) is used for unsupervised learning algorithms including those used by AI currently. Increasingly the R libraries can optimize numbers of clusters for pairwise plotting of feature bands and cluster validation (Karmakar et al., 2019; R Core Team, 2022). However, where vast distances need classification for soil physical and chemical properties via index correction or vector algorithms, these big‐data clusters create significant processing bottlenecks (compared with genetic data clustering, which also use this type of approach).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Alternative partial classification rules include the tight clustering algorithm introduced in Tseng and Wong (2005) and extended to deal with large datasets in Karmakar, Das, Bhattacharya, Sarkar, and Mukhopadhyay (2019). In the latter work, when applying the extended tight clustering algorithm to a dataset consisting of more than 50,000 gene expression probes for individuals suffering from psoriasis, more than 30,000 probes were not classified in any of the six clusters that were identified.…”
Section: Introduction
Citation type: mentioning
confidence: 99%