2018
DOI: 10.7287/peerj.preprints.26555v1
Preprint

Duplicate Question Detection in Stack Overflow: A Reproducibility Study

Abstract: Stack Overflow has become a fundamental element of the developer toolset. This growing influence has been accompanied by an effort from the Stack Overflow community to maintain the quality of its content. One problem that jeopardizes that quality is the continuous growth of duplicated questions. To address this problem, prior work has focused on automatically detecting duplicated questions. Two important solutions are DupPredictor and Dupe. Despite reporting significant results, neither work provides its implem…

Cited by 1 publication (1 citation statement)
References 20 publications (30 reference statements)
“…k and b are two parameters where k controls non-linear term frequency normalization (saturation), and b controls to what degree document length normalizes term frequency values. CROKAGE uses the same values as used by previous works in Software Engineering [13,14] with best performance. idf(q_i) is the inverse document frequency of keyword q_i and computed as follows:…”
Section: Searching For Relevant Answers
Citation type: mentioning
confidence: 99%
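
The roles of k and b described in the statement above match the standard Okapi BM25 ranking function, so a minimal sketch may help make them concrete. Everything specific here is an assumption rather than something the quoted text confirms: the function name `bm25_scores` is hypothetical, the defaults k=1.2 and b=0.75 are common textbook values (the statement does not give the values CROKAGE uses), and the idf formula is one widely used smoothed variant, since the statement's own formula is elided.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k=1.2, b=0.75):
    """Score each tokenized document in `docs` against a tokenized `query`.

    k: term-frequency saturation (assumed default, not CROKAGE's value).
    b: strength of document-length normalization, from 0 (off) to 1 (full).
    """
    if not docs:
        return []

    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed idf variant (an assumption; the quoted formula is elided).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: b=0 ignores length, b=1 applies it fully.
            norm = 1 - b + b * len(d) / avg_len
            # Saturating term-frequency component controlled by k.
            score += idf * tf[term] * (k + 1) / (tf[term] + k * norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [
        ["java", "string", "split"],
        ["python", "list", "sort"],
        ["java", "java", "string", "concat", "example"],
    ]
    print(bm25_scores(["java", "string"], corpus))
```

With b=0 two documents with the same term counts score identically regardless of length, while with b=1 the shorter one ranks higher; raising the count of a query term yields progressively smaller gains, which is the saturation effect attributed to k.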