Optimizing graph layout by t-SNE perplexity estimation

Xiao, Chengshan; Hong, Seok‐Hee; Huang, Weidong

doi:10.1007/s41060-022-00348-7

Cited by 8 publications

(2 citation statements)

References 14 publications

“…Multidimensional datasets often become sparse as the number of features increases, which can negatively impact clustering success. To address this, increasing the sample size or using dimensionality reduction algorithms like Principal Component Analysis, Linear Discriminant Analysis, Factor Analysis, t-SNE, and others can help reduce dimensions and improve data density (Gao et al, 2021; Wang et al, 2021; Mair, 2018; Xiao et al, 2023; Groth et al, 2013). Before creating a hypothesis, it is critical to evaluate the impact of each feature on cluster results using methods such as information retrieval.…”

Section: Dataset Preparation Procedures (mentioning)

confidence: 99%

Machine Learning for Enhanced Classroom Homogeneity in Primary Education

Bulut, Dönmez, İnce et al. 2024

International Online Journal of Primary Education

A homogeneous distribution of students in a class is accepted as a key factor for overall success in primary education. A class of students with similar attributes normally increases academic success. It is also a fact that general academic success might be lower in some classes where students have different intelligence and academic levels. In this study, a class distribution model is proposed by applying data science algorithms to a dataset covering a small number of students. With unsupervised and semi-supervised learning methods in machine learning and data mining, a group of students is equally distributed to classes, taking into account some criteria. This model divides a group of students into clusters by considering the students’ different qualitative and quantitative characteristics. A draft study is carried out by predicting the effectiveness and efficiency of the presented approaches. In addition, some process elements such as quantitative and qualitative characteristics of a student, data acquisition style, digitalization of attributes, and creating a future prediction are also included in this study. Satisfactory and promising experimental results are obtained using a set of algorithms over collected datasets for classroom scenarios. As expected, a clear and concrete evaluation between balanced and unbalanced class distributions cannot be performed since these two scenarios for the class distributions cannot be applied at the same time.

“…When the data size is too large, using principal component analysis (PCA) or t-distributed stochastic neighbor embedding (Xiao et al, 2023) can first reduce the size of the data and then apply DBSCAN.…”

Section: DBSCAN With Data Reduced In Size With t-SNE (mentioning)

confidence: 99%