2021
DOI: 10.3390/sym13050837
A New Sentence-Based Interpretative Topic Modeling and Automatic Topic Labeling

Abstract: This article presents a new conceptual approach to the interpretative topic modeling problem. It uses sentences as the basic units of analysis, instead of the words or n-grams commonly used in standard approaches. The proposed approach is distinguished by its use of sentence probability evaluations within the text corpus and its clustering of sentence embeddings. The topic model estimates discrete distributions of sentence occurrences within topics and discrete distributions of topic occurrences within the text. Our …
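The clustering step the abstract describes can be sketched in miniature. This is a hedged illustration, not the paper's implementation: the 2-D "sentence embeddings", the cluster count, and the plain k-means with farthest-point initialization are all toy assumptions standing in for the actual encoder and settings.

```python
# Minimal sketch: group toy sentence embeddings into topics via k-means.
import numpy as np

def kmeans(X, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [X[0]]
    for _ in range(k - 1):
        # Add the point farthest from all centroids chosen so far.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        # Assign each embedding to its nearest centroid, then recenter.
        d = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# Six toy "sentence embeddings" forming two well-separated groups.
X = np.array([[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
              [5.0, 5.1], [5.1, 4.9], [4.9, 5.0]])
_, labels = kmeans(X, k=2)
print(labels)  # first three sentences share one topic, last three the other
```

Each cluster then plays the role of a topic; the paper's probabilistic machinery operates on these cluster assignments rather than on individual word counts.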


Cited by 9 publications (6 citation statements)
References 24 publications
“…Each sentence thus has a single topical reference. In order for this to work, it is assumed that sentences with similar themes will also have similar embeddings [30], [31].…”
Section: A Listener and Familiarization
confidence: 99%
“…Bidirectional Encoder Representations from Transformers (BERT) has been widely utilised as a base framework for various applications. Authors in [44] proposed a BERT-based TM strategy, applying k-means to the sentence BERT embedding matrix to generate topics. The centroid of each cluster was taken as the topic label.…”
Section: Neural Network-Based Approaches
confidence: 99%
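The labeling step this excerpt attributes to [44] can be sketched as follows. This is an illustrative approximation under toy assumptions: real sentence-BERT embeddings are high-dimensional, and here "taking the centroid as the label" is rendered as picking the member sentence nearest each centroid, since a centroid itself is not human-readable.

```python
# Sketch: label each cluster by the sentence closest to its centroid.
import numpy as np

def centroid_labels(embeddings, sentences, cluster_ids, k):
    """For each cluster, return the member sentence nearest its centroid."""
    picked = []
    for j in range(k):
        members = np.where(cluster_ids == j)[0]
        centroid = embeddings[members].mean(axis=0)
        dists = np.linalg.norm(embeddings[members] - centroid, axis=1)
        picked.append(sentences[members[dists.argmin()]])
    return picked

# Toy sentences, embeddings, and precomputed cluster assignments.
sentences = ["cats purr", "cats nap", "cats play",
             "stocks fell", "rates rose", "markets rose"]
embeddings = np.array([[0.0, 0.0], [1.0, 0.0], [0.4, 0.0],
                       [5.0, 5.0], [6.0, 5.0], [5.4, 5.0]])
cluster_ids = np.array([0, 0, 0, 1, 1, 1])
print(centroid_labels(embeddings, sentences, cluster_ids, k=2))
# → ['cats play', 'markets rose']
```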
“…Multiple strategies have been used to extract the closest possible term, or "candidate." Authors in [44] created clusters and treated each cluster's centroid as the point closest to all entities in the cluster, so they took the centroid as the candidate. The authors in [26] used an information extraction method based on BM25 and TF-IDF [27] to extract the candidate terms.…”
Section: Candidate Terms
confidence: 99%
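The TF-IDF side of the candidate-extraction strategy mentioned above can be sketched briefly. This is not the method of [26]: the tokenizer, corpus, and scoring details are toy stand-ins, and the BM25 component is omitted.

```python
# Sketch: rank candidate terms per document by tf * idf.
import math
from collections import Counter

def tfidf_candidates(docs, top_n=2):
    """Score each term in each doc by tf * idf; return top terms per doc."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: how many docs contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {t: (c / len(toks)) * math.log(n / df[t])
                  for t, c in tf.items()}
        results.append(sorted(scores, key=scores.get, reverse=True)[:top_n])
    return results

docs = ["the cat sat on the mat",
        "the dog chased the cat",
        "stocks rose as the market rallied"]
print(tfidf_candidates(docs))
```

Terms appearing in every document (like "the") get an idf of zero and drop out of the candidate lists, which is the property that makes TF-IDF useful for surfacing topic-specific terms.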
“…Our approach uses sentence-level topic descriptors rather than key words, and we apply a recent sentence encoder that supports multiple languages (Yang et al., 2020). Kozbagarov et al. (2021) present another approach to generating interpretable topics by combining sentence embeddings with a topic modeling technique, though they use EM (expectation-maximization) instead of LDA and use averaged BERT word embeddings (Devlin et al., 2019) instead of a pretrained sentence encoder. Like us, they cluster the resulting sentence embeddings and estimate the probability of sentence occurrence within texts, treating sentences within each cluster as identical.…”
Section: Related Work
confidence: 99%
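The estimation step this excerpt describes — treating sentences within a cluster as identical — has a simple consequence worth making concrete: the probability of a "sentence" occurring in a text collapses to the relative frequency of its cluster. The cluster assignments below are toy values, not output of the actual model.

```python
# Sketch: per-text occurrence probabilities over sentence clusters.
from collections import Counter

def cluster_occurrence_probs(doc_sentence_clusters):
    """Per document, P(cluster) = cluster count / number of sentences."""
    out = []
    for clusters in doc_sentence_clusters:
        counts = Counter(clusters)
        total = len(clusters)
        out.append({c: n / total for c, n in counts.items()})
    return out

# Two toy documents, each a sequence of sentence-cluster assignments.
docs = [[0, 0, 1, 2], [1, 1, 1, 2]]
print(cluster_occurrence_probs(docs))
# → [{0: 0.5, 1: 0.25, 2: 0.25}, {1: 0.75, 2: 0.25}]
```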