2010 Asia Pacific Software Engineering Conference
DOI: 10.1109/apsec.2010.49

Detecting Duplicate Bug Report Using Character N-Gram-Based Features

Cited by 115 publications (56 citation statements)
References 8 publications
“…Their experimental results showed that about two-thirds of the duplicates could be found using natural language processing (NLP) techniques. Sureka and Jalote also proposed a method that used a character N-gram-based model for duplicate bug report identification [15]. This approach differed from word-based duplicate bug report identification methods because they investigated the usefulness of low-level features based on characters, which have many advantages such as natural language independence and robustness against noisy data.…”
Section: Duplicate Detection and Classification of Bug Reports
mentioning
confidence: 99%
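To make the "low-level features based on characters" idea concrete, here is a minimal sketch, assuming overlapping character trigrams over lowercased text, raw counts, and cosine similarity. The n-gram length, weighting, and similarity measure actually used by Sureka and Jalote are not given in this excerpt, and the helper names `char_ngrams` and `cosine` are hypothetical. The example shows why such features need no language-specific tokenizer and tolerate noisy text: a typo-ridden duplicate still shares many trigrams with the original report.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the lowercased text (spaces included).

    Assumption: trigrams and raw counts; no stemming or word tokenization is needed.
    """
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    report = "Crash when opening preferences dialog"
    noisy_duplicate = "Crahs when openning the prefrences dialog"  # deliberate typos
    unrelated = "Toolbar icons render blurry on HiDPI screens"

    profile = char_ngrams(report)
    print(f"noisy duplicate: {cosine(profile, char_ngrams(noisy_duplicate)):.2f}")
    print(f"unrelated:       {cosine(profile, char_ngrams(unrelated)):.2f}")
```

The misspelled report scores well above the unrelated one, even though a word-level matcher would treat "Crahs" and "Crash" as different tokens.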
“…This comment is often marked by a developer. According to previous studies, although centroid-based approaches bring many advantages [6], they also suffer seriously from the problem of inductive bias or model misfit [10,13]. Centroid-based approaches are more susceptible to model misfit because of their assumption that a document should be assigned to the class with which it has the largest similarity [12].…”
Section: Duplication Detection Design
mentioning
confidence: 99%
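The centroid assumption quoted above can be sketched as a nearest-centroid classifier: each class is summarized by the mean of its members' term vectors, and a new document goes to the class whose centroid is most similar under a hard argmax. This is a minimal illustration with hypothetical helper names (`centroid`, `cosine`, `classify`) and toy term weights; it is not the method of any cited paper. The single mean vector per class is where the quoted "model misfit" risk comes from: if a class's documents do not cluster around one point, the centroid can sit far from many of them.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]


def centroid(docs: List[Vector]) -> Vector:
    """Average sparse term vectors into a single class centroid."""
    total: Vector = {}
    for doc in docs:
        for term, weight in doc.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(docs) for term, weight in total.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(doc: Vector, centroids: Dict[str, Vector]) -> str:
    """Assign doc to the class whose centroid is most similar (hard argmax)."""
    return max(centroids, key=lambda label: cosine(doc, centroids[label]))


if __name__ == "__main__":
    # Hypothetical toy training data: class label -> term-weight vectors.
    training = {
        "ui": [{"dialog": 1.0, "button": 1.0}, {"dialog": 1.0, "window": 1.0}],
        "network": [{"socket": 1.0, "timeout": 1.0}, {"proxy": 1.0, "timeout": 1.0}],
    }
    centroids = {label: centroid(docs) for label, docs in training.items()}
    print(classify({"button": 1.0, "window": 1.0}, centroids))
```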
“…Moreover, the SVM model also needs to be retrained whenever a new bug report arrives, which can make the detection process costly. Also in 2010, Ashish Sureka and Pankaj Jalote [10] proposed an n-gram-based feature extraction method that improved the performance of duplicate detection on bug reports. The method is based on the observation that bug reports contain many code compound words.…”
mentioning
confidence: 99%
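The "code compound words" observation can be illustrated by comparing word-level and character-level overlap between two phrasings of the same problem, one using a camel-case identifier and one spelling it out. This is a sketch under stated assumptions (Jaccard overlap of sets, trigrams over lowercased text, hypothetical helper names `word_tokens`, `char_trigrams`, `jaccard`), not the authors' feature extractor. Whole-word tokens treat `NullPointerException` and "null pointer exception" as unrelated, while character trigrams recover most of the shared substrings.

```python
import re
from typing import Set


def word_tokens(text: str) -> Set[str]:
    """Lowercased word tokens; a camel-case identifier stays one token."""
    return set(re.findall(r"\w+", text.lower()))


def char_trigrams(text: str) -> Set[str]:
    """Set of overlapping character trigrams of the lowercased text."""
    s = text.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    compound = "NullPointerException in PrefsDialog"
    spelled_out = "null pointer exception in prefs dialog"
    print(f"word overlap:    {jaccard(word_tokens(compound), word_tokens(spelled_out)):.2f}")
    print(f"trigram overlap: {jaccard(char_trigrams(compound), char_trigrams(spelled_out)):.2f}")
```

At the word level the two strings share only "in", whereas most of their character trigrams coincide.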
“…Bug report deduplication is the querying of similar bug reports in order to cluster and group bug reports that report the same issue. Common tools in bug report deduplication are NLP (Runeson et al. 2007), machine learning (Bettenburg et al. 2008; Sun et al. 2010; Lazar et al. 2014), information retrieval (Sun et al. 2011; Sureka and Jalote 2010) and topic analysis (Alipour 2013; Klein et al. 2014). Zhang et al. (2015) have applied typical bug-deduplication technology to StackOverflow duplicate question detection.…”
Section: Bug Report Deduplication
mentioning
confidence: 99%
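Framing deduplication as "querying of similar bug reports" can be sketched as a generic information-retrieval baseline: index existing reports as TF-IDF vectors, then rank them by cosine similarity to an incoming report and return the top k candidates. This is an illustrative sketch, not the method of any work cited above; the tokenizer, the smoothed IDF formula `log((1 + n) / (1 + df)) + 1`, and the function names (`tfidf_vectors`, `top_k_duplicates`) are assumptions chosen for simplicity.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    """Lowercased alphanumeric word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: List[str]) -> Tuple[List[Vector], Dict[str, float]]:
    """Build TF-IDF vectors for a corpus; returns the vectors and the IDF table."""
    tokenized = [Counter(tokenize(d)) for d in docs]
    df = Counter(term for tf in tokenized for term in tf)  # document frequency
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = [{t: c * idf[t] for t, c in tf.items()} for tf in tokenized]
    return vecs, idf


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_k_duplicates(query: str, reports: List[str], k: int = 3) -> List[Tuple[int, float]]:
    """Rank existing reports by similarity to the query; return (index, score) pairs."""
    vecs, idf = tfidf_vectors(reports)
    # Terms never seen in the indexed reports carry no IDF weight and are dropped.
    q = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scored = [(i, cosine(q, v)) for i, v in enumerate(vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    reports = [
        "Editor crashes when saving large file",
        "Toolbar icons blurry on HiDPI display",
        "Crash on save with very large files in editor",
    ]
    for idx, score in top_k_duplicates("editor crash saving a large file", reports):
        print(f"{score:.2f}  {reports[idx]}")
```

In a real tracker the ranked list would typically be shown to a triager as candidate duplicates rather than merged automatically; swapping `tokenize` for a character n-gram extractor turns this into a character-level variant.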