2022
DOI: 10.3390/electronics11142168

Corpus Statistics Empowered Document Classification

Abstract: In natural language processing (NLP), document classification is an important task that relies on a proper thematic representation of the documents. Gaussian mixture-based clustering is widely used to capture rich thematic semantics, but it does not emphasize potentially important terms in the corpus. Moreover, soft clustering assigns every word to every cluster, which introduces long-tail noise that degrades both the thematic representation of documents and their classification. It is more challenging …
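
The long-tail effect described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the one-dimensional "embedding", the three mixture components, and the 0.05 cut-off are all hypothetical values chosen for illustration. The sketch computes the posterior cluster probabilities of a Gaussian mixture for a single word, shows that every cluster receives some nonzero mass, and then zeroes out the small entries and renormalizes.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def soft_assign(x, components):
    """Posterior P(cluster | x) for a 1-D Gaussian mixture.

    components is a list of (weight, mean, std) tuples.
    """
    joint = [w * gaussian_pdf(x, m, s) for (w, m, s) in components]
    total = sum(joint)
    return [j / total for j in joint]


def sparsify(probs, threshold):
    """Zero out probabilities below threshold, then renormalize the rest."""
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    if total == 0.0:
        return kept  # nothing survived the cut-off
    return [p / total for p in kept]


# Hypothetical mixture: three equally weighted clusters (assumed values).
components = [(1 / 3, -4.0, 1.0), (1 / 3, 0.0, 1.0), (1 / 3, 4.0, 1.0)]
word_position = 0.5  # hypothetical 1-D stand-in for a word embedding

soft = soft_assign(word_position, components)
sparse = sparsify(soft, threshold=0.05)  # assumed cut-off

print("soft assignment:  ", soft)
print("sparse assignment:", sparse)
```

In the soft assignment, the word belongs overwhelmingly to the middle cluster, yet the two distant clusters still receive small nonzero probabilities. Summed over a whole vocabulary, such residual entries are the kind of long-tail noise the abstract refers to; thresholding is one simple way to remove them, used here only as an assumption.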

Cited by 0 publications
References 61 publications