Analyzing the impact of data representations in classification problems using clustering

Farias, Felipe; Ludermir, Teresa Bernarda; Bastos-Filho, Carmelo J. A.; Oliveira, Flavio Rosendo da Silva

doi:10.1109/ijcnn.2019.8851856

Cited by 2 publications

(5 citation statements)

References 11 publications

“…Now we recall the similarity measure (5) and the similarity matrix (6), which where defined and investigated in [12]. Let μ, λ, ω ∈ R be given numbers (weights).…”

Section: Similarity Matrix (mentioning)

confidence: 99%

On Two Approaches to Clustering of Cross-Sections of Rotationally Symmetric Objects

Baczyńska; Kaliszewska; Syga

2022

Lecture Notes in Networks and Systems

We analyze two approaches to clustering 2D shapes representing cross-sections of rotationally symmetrical objects. These approaches are based on two ways of shape representation - contours and silhouettes - and a number of similarity measures which are based on a combination of Procrustes analysis (PA) and Dynamic Time Warping (DTW) as well as on binary matrix analysis. The comparison of efficiency of the proposed approaches is performed on datasets of archaeological ceramic vessels.

“…The problem of suitable data representation has been recognized in the literature, see e.g. [6]. The choice of data representation for the investigated data set has a considerable impact on the possibility to achieve satisfying results.…”

Section: Introduction and Problem Formulation (mentioning)

confidence: 99%

“…We have used clustering techniques to find the essential regions of interest. The clustering process aims to create sub-labels based on the proposal presented in [4]. The primary goal is to enhance the learning process.…”

Section: Our Proposal (mentioning)

confidence: 99%

“…Given the inputs x i ∈ X and the outputs y i ∈ Y , we use GMMs (Gaussian Mixture Models) clustering algorithm to create sub-labels as shown in [4] using the X as inputs. This process create the clusters regarding each label separately and apply the prediction of each created cluster to all the data in order to calculate the scores discussed in the next paragraph.…”

Section: Our Proposal (mentioning)

confidence: 99%

“…Generate Sublabels using calinski_harabasz to select the best clusterizer for each label, retaining only the sublabels which the mean score is greater than the original label. the number of clusters in [2,3,4,5] repeating the process for 2 times, totaling 8 GMMs generation. We decided to deploy the GMM technique since it presented the highest Calinski-Harabasz score [2] when compared to Silhouette Score for related.…”

Section: Our Proposal (mentioning)

confidence: 99%
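The statements above describe scoring candidate clusterings with the Calinski-Harabasz index and keeping the best one. The following is a minimal sketch of that selection step only, not the citing authors' implementation. It assumes the candidate label assignments are already available (for example from GMM fits with different numbers of clusters); the names `calinski_harabasz`, `pick_best`, `points`, and `candidates` are hypothetical and the data is a toy example.

```python
from collections import defaultdict


def calinski_harabasz(points, labels):
    """Between-cluster dispersion over within-cluster dispersion,
    each divided by its degrees of freedom (k - 1 and n - k)."""
    n = len(points)
    dim = len(points[0])
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    k = len(groups)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= number of clusters < number of points")

    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = 0.0
    within = 0.0
    for members in groups.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # Squared distance of the centroid to the overall mean, weighted by cluster size.
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        # Squared distances of the members to their own centroid.
        within += sum(
            sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members
        )
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def pick_best(points, candidates):
    """Return the key of the highest-scoring labelling and all scores.

    `candidates` maps a number of clusters to one label per point
    (assumed to come from an earlier clustering step)."""
    scores = {k: calinski_harabasz(points, labels) for k, labels in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy data: three well-separated pairs of points (illustrative only).
points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
candidates = {
    2: [0, 0, 0, 0, 1, 1],  # merges the first two pairs
    3: [0, 0, 1, 1, 2, 2],  # one cluster per pair
}
best, scores = pick_best(points, candidates)
print(best, scores)
```

A higher index means tighter, better-separated clusters, so on this toy data the three-cluster labelling is preferred over the two-cluster one.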

Clustering for Data-driven Unraveling Artificial Neural Networks

Farias; Ludermir; Bastos-Filho

2020

Anais Do Encontro Nacional De Inteligência Artificial E Computacional (ENIAC 2020)

This work presents an investigation on how to define Neural Networks (NN) architectures adopting a data-driven approach using clustering to create sub-labels to facilitate the learning process and to discover the number of neurons needed to compose the layers. We also increase the depth of the model aiming to represent the samples better, the more in-depth it flows into the model. We hypothesize that the clustering process identifies sub-regions in the feature space in which the samples belonging to the same cluster have strong similarities. We used seven benchmark datasets to validate our hypothesis using 10-fold cross validation 3 times. The proposed model increased the performance, while never decreased it, with statistical significance considering the p-value $< 0.05$ in comparison with a Multi-Layer Perceptron with a single hidden layer with approximately the same number of parameters of the architectures found by our approach.
