2023
DOI: 10.11591/ijeecs.v30.i1.pp246-256

Search and classify topics in a corpus of text using the latent dirichlet allocation model

Abstract: This work aims at discovering topics in a text corpus and classifying the most relevant terms for each of the discovered topics. The process was performed in four steps: first, document extraction and data processing; second, labeling and training of the data; third, labeling of the unseen data; and fourth, evaluation of the model performance. For processing, a total of 10,322 "curriculum" documents related to data science were collected from the web during 2018-2022. The latent Dirichlet allocation (LDA) mode…

Cited by 5 publications (5 citation statements)
References 24 publications

“…severe side effect), from the patient's perspective on the drug under consideration. For a pair of two linked nodes representing two patients, the thickness of the link between them indicates how comparable their semantic content is, which is determined by adding up all the words (terms) in both reviews that appear to be about the same topic and have non-zero weights for TF-IDF (see [55][58]). Green links are those that provide support (are in favor) and red links are those that provide opposition (are against).…”
Section: Ppis Graph Generation
Citation type: mentioning
confidence: 99%
“…searching and classifying topics in a text corpus [30], improving document classification using domain-specific vocabulary [31], and customer opinion mining using Twitter topic modeling and logistic regression [32]. While applicable to a large corpus of documents, LDA makes some rigid assumptions regarding a corpus, suggesting areas for improvisation.…”
Section: Generate Lda Models
Citation type: mentioning
confidence: 99%
“…Finally, the MLP model is characterized as one of the best predictors, this predictor learns a feature from a set of inputs and combines the different features in a set of outputs, the performance rate of this model has been 99%, and it is a result with a high pre-accuracy rate, which allows it to be a reliable option for the prediction of breast cancer. Also, [20], [21] used this model with three clinical factors: age, cancer cell type, and cell surface receptors, obtaining satisfactory results, with a performance rate of 98%. The summary of the analysis of the 6 models used in this work to predict breast cancer is presented in Table V.…”
Section: J Model Training and Testing
Citation type: mentioning
confidence: 99%
“…Using features associated with cancer cell imaging, breast cancer can be predicted using ML models. This field of action is in constant development from two deans to after [19], [20].…”
Section: Introduction
Citation type: mentioning
confidence: 99%