Nora Hariadi scite author profile

Nora Hariadi

5 Publications

23 Citation Statements Received

102 Citation Statements Given


Affiliations

University of Indonesia

Publications


The performance of BERT as data representation of text clustering

2022


Text clustering is the task of grouping a set of texts so that texts in the same group are more similar to one another than to those in a different group. Grouping texts manually requires a significant amount of time and labor, so automation using machine learning is necessary. One of the most frequently used methods for representing textual data is Term Frequency Inverse Document Frequency (TFIDF). However, TFIDF cannot consider the position and context of a word in a sentence. The Bidirectional Encoder Representations from Transformers (BERT) model can produce text representations that incorporate the position and context of a word in a sentence. This research analyzes the performance of the BERT model as a data representation for text. Moreover, various feature extraction and normalization methods are applied to the BERT representation. To examine the performance of BERT, we use four clustering algorithms: k-means clustering, eigenspace-based fuzzy c-means, deep embedded clustering, and improved deep embedded clustering. Our simulations show that BERT outperforms the TFIDF method in 28 out of 36 metrics. Furthermore, different feature extraction and normalization methods produced varied performance. The choice of feature extraction and normalization methods must be adapted to the text clustering algorithm used.
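The abstract's point that TFIDF ignores word position can be illustrated with a minimal sketch. This is not the authors' code: the function name `tfidf_vectors`, the whitespace tokenizer, the `log(N / df) + 1` weighting (one common variant among several), and the three sample sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    Uses term frequency times log(N / df) + 1, one common TF-IDF variant.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log(n_docs / df[term]) + 1)
            for term, count in counts.items()
        })
    return vectors


# Illustrative sentences (not from the paper): the first two contain the
# same words in a different order and mean different things.
docs = ["dog bites man", "man bites dog", "stocks fell sharply"]
vectors = tfidf_vectors(docs)

# Word order never enters the computation, so the vectors are identical.
print(vectors[0] == vectors[1])
print(vectors[0] == vectors[2])
```

Because the weights depend only on term counts, the first two sentences receive the same representation even though their meanings differ, which is the limitation a context-aware model such as BERT is meant to address.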


Deep autoencoder-based fuzzy c-means for topic detection

Murfi

Rosaline

Hariadi

2022


The Performance of BERT as Data Representation of Text Clustering

Subakti

Murfi

Hariadi

2021

Preprint


Text clustering is the task of grouping a set of texts so that texts in the same group are more similar to one another than to those in a different group. Grouping texts manually requires a significant amount of time and labor, so automation using machine learning is necessary. The standard method used to represent textual data is Term Frequency Inverse Document Frequency (TFIDF). However, TFIDF cannot consider the position and context of a word in a sentence. The Bidirectional Encoder Representations from Transformers (BERT) model can produce text representations that incorporate the position and context of a word in a sentence. This research analyzes the performance of the BERT model as a data representation for text. Moreover, various feature extraction and normalization methods are applied to the BERT representation. To examine the performance of BERT, we use four clustering algorithms: k-means clustering, eigenspace-based fuzzy c-means, deep embedded clustering, and improved deep embedded clustering. Our simulations show that BERT outperforms the standard TFIDF method in 28 out of 36 metrics. Furthermore, different feature extraction and normalization methods produced varied performance. The choice of feature extraction and normalization methods must be adapted to the text clustering algorithm used.


Rainbow cycles and paths in fan and wheel graph

Fitriani

Sugeng

Hariadi

2018


Eigenvalues of antiadjacency matrix of Cayley graph of Z_n

Daniel

Sugeng

Hariadi

2022

IJC


