2015
DOI: 10.1007/s10664-015-9402-8

A survey on the use of topic models when mining software repositories

Cited by 163 publications (117 citation statements)
References 219 publications
“…Using topic models, topics are extracted from documents and are used to represent the corpora. A topic is a collection of terms that co-occur frequently in the documents of the corpus, so the documents can be clustered by topics and the entire corpus can be indexed and organized in terms of this discovered semantic structure [7], [13]. Latent Dirichlet Allocation (LDA) is a popular probabilistic topic model [21].…”
Section: B Topic Modeling
Citation type: mentioning
confidence: 99%
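
To make the idea in the statement above concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling, using only the Python standard library. It is not taken from the survey or from the citing paper. The function name `fit_lda`, the hyperparameters `alpha` and `beta`, and the toy documents are all assumptions chosen for illustration. It shows how each document ends up with a mixture over topics and how each topic is summarized by its most frequent terms.

```python
import random


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0, top_n=3):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling (illustrative sketch)."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables: topic counts per document, word counts per topic, totals per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                v = word_id[word]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic in proportion to the conditional probability.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic mixtures; each row sums to 1.
    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    # The most frequent terms of each topic serve as its summary.
    top_terms = [
        [vocab[v] for v in sorted(range(vocab_size), key=lambda v: -topic_word[k][v])[:top_n]]
        for k in range(num_topics)
    ]
    return theta, top_terms


# Hypothetical toy corpus: two bug-related and two UI-related documents.
docs = [
    ["bug", "crash", "fix"],
    ["bug", "fix", "patch"],
    ["menu", "button", "click"],
    ["button", "click", "window"],
]
theta, top_terms = fit_lda(docs, num_topics=2, iterations=50, seed=1)
```

Documents can then be grouped by their dominant topic in `theta`, which is one simple way to index a corpus by the discovered structure. The sampler is randomized, so the topics found on a corpus this small may vary with the seed.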
“…Before topic modeling, several preprocess steps are generally taken to reduce noise and improve the modeling results [13]. Compared to natural language text, Hindle et al [22] reported that text extracted from source code is much more repetitive and predictable.…”
Section: Preprocessing Procedures
Citation type: mentioning
confidence: 99%
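
The statement above does not list the preprocessing steps it refers to. As a hedged illustration, the sketch below applies steps that are commonly used on source code text before topic modeling: extracting identifiers, splitting them on camelCase and underscores, lowercasing, and dropping stop words and very short terms. The stop word list, the length threshold, and the function names are assumptions, not the procedure of the cited work.

```python
import re

# Hypothetical stop list mixing English words and programming keywords.
STOP_WORDS = {
    "the", "a", "an", "of", "to", "and", "in", "is", "for",
    "int", "void", "return", "public",
}


def split_identifier(token):
    """Split an identifier on camelCase boundaries and underscores, then lowercase."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", token).replace("_", " ")
    return spaced.lower().split()


def preprocess(text):
    """Turn raw source text into a list of terms suitable for topic modeling."""
    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)
    terms = []
    for token in tokens:
        for part in split_identifier(token):
            # Drop stop words and terms of one or two characters.
            if part not in STOP_WORDS and len(part) > 2:
                terms.append(part)
    return terms


terms = preprocess("public int getFileName(String file_path)")
# terms == ['get', 'file', 'name', 'string', 'file', 'path']
```

Stemming is another frequently used step but is left out here to keep the sketch dependency-free.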
“…It also has a strong tendency for overfitting, and of even greater consequence, the model is unable to generalize topic mixtures onto previously unseen documents (not part of the training data) [2], [5]. Through correcting these problems with a truly generative model, LDA has seen a surge in popularity and has acted like a springboard for numerous other advancements in IR.…”
Section: A Topic Models In Brief
Citation type: mentioning
confidence: 99%
“…Often, the models are treated as "black box" approaches without regard for the underlying assumptions they are based on. Parameter tuning can prove difficult without a full understanding of the specific technique to be employed [2]. Additionally, the emerging topics are by no means guaranteed to be sensible to a human reader, motivating the use of human knowledge and user interaction as an additional step toward more coherent and sensible results [3].…”
Section: Introduction
Citation type: mentioning
confidence: 99%