2021
DOI: 10.17485/ijst/v14i20.312

Identifying Similar Question Pairs Using Machine Learning Techniques

Abstract: Background/Objectives: Every day, millions of people visit question-and-answer sites such as Quora, Reddit, and Stack Overflow, and the demand for new intelligent techniques that help individuals find better solutions is growing. Methods: In our proposed system, the Quora datasets were filtered using SQLite, which takes one-quarter of the time needed to pre-process the same dataset with existing approaches such as Python functions. We used machine learning techniques, namely Random Forest, Logistic Regression, Linear SVM (Suppor…
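The abstract says only that the Quora data were filtered with SQLite; the actual queries are not given. The following is a minimal sketch, assuming the column layout of the public Quora Question Pairs file (id, qid1, qid2, question1, question2, is_duplicate) and two hypothetical cleaning rules (drop pairs with a missing question, keep one copy of each repeated pair). The sample rows and the name filter_pairs are illustrative only, and the sketch uses Python's built-in sqlite3 module.

```python
import sqlite3

# Hypothetical sample rows in the assumed Quora Question Pairs layout:
# (id, qid1, qid2, question1, question2, is_duplicate)
ROWS = [
    (0, 1, 2, "How do I learn Python?", "What is the best way to learn Python?", 1),
    (1, 3, 4, "What is SQLite?", None, 0),
    (2, 5, 6, "How do I learn Python?", "What is the best way to learn Python?", 1),
    (3, 7, 8, "Why is the sky blue?", "How far away is the Moon?", 0),
]


def filter_pairs(rows):
    """Load question pairs into an in-memory SQLite table and return cleaned pairs.

    Assumed filtering rules (not taken from the paper): drop rows where either
    question is missing, and keep only the lowest id of each repeated pair.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE pairs (id INTEGER PRIMARY KEY, qid1 INTEGER, qid2 INTEGER, "
            "question1 TEXT, question2 TEXT, is_duplicate INTEGER)"
        )
        conn.executemany("INSERT INTO pairs VALUES (?, ?, ?, ?, ?, ?)", rows)
        # Both cleaning rules run inside the SQL engine in a single query.
        cur = conn.execute(
            """
            SELECT MIN(id) AS first_id, question1, question2, is_duplicate
            FROM pairs
            WHERE question1 IS NOT NULL AND question2 IS NOT NULL
            GROUP BY question1, question2, is_duplicate
            ORDER BY first_id
            """
        )
        return cur.fetchall()
    finally:
        conn.close()


for row in filter_pairs(ROWS):
    print(row)
```

Pushing the filtering into one SQL query lets SQLite do the scan and grouping instead of a Python loop over rows, which is one plausible source of the speed-up the abstract reports; the sketch does not measure or reproduce that figure.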

References: 13 publications
Cited by 5 publications (3 citation statements)
“…The Gradient Boosting Machine and AdaBoost models achieved the highest train accuracies, outperforming the Decision Tree, Random Forest, and Extra Trees models. These results are in agreement with authors of [17] research, where their goal was to find the best machine learning technique for removing all duplicate questions and increasing user satisfaction. Using a real-time dataset, this work trained and tested four machine learning models to recognize duplicate inquiries.…”
Section: Discussion | Citation type: supporting | Confidence: 91%
“…Anishaa et al [17] proposed a novel approach by filtration of the Quora datasets using SQLite which takes one-quarter the time it takes to pre-process the same dataset using existing methodologies such as python functions. It concluded that XGBoost outperformed the other machine learning approaches discussed, it has also been discovered that pre-processing with SQLite has improved response time.…”
Section: Chandra and Stefanus | Citation type: mentioning | Confidence: 99%
“…After execution, the random, logistic regression, linear SVM, and XGBoost error parameters referred to from the log loss function are found to be, respectively, 0.887, 0.521, 0.654, and 0.357. As a result of the unique pre-processing activities carried out using PL/SQL, which improve response time overall, the result demonstrates that XGBoost is the best model, delivering the greatest accuracy in the shortest period of time [9].…”
Section: Literature Survey | Citation type: mentioning | Confidence: 99%
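The log-loss figures quoted above (0.887, 0.521, 0.654 and 0.357; lower is better) use the standard binary cross-entropy L = -(1/N) * sum_i [y_i ln(p_i) + (1 - y_i) ln(1 - p_i)], where y_i is 1 for a duplicate pair and p_i is the predicted probability that the pair is a duplicate. A minimal plain-Python sketch follows. The labels and probabilities are made-up illustration values, not data from the paper, and the clipping constant eps is an assumed safeguard against log(0).

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; y_prob[i] is the predicted P(is_duplicate = 1)."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) is never evaluated
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Illustrative labels: 1 = duplicate pair, 0 = distinct pair.
labels = [1, 0, 1, 0]
print(log_loss(labels, [0.5] * 4))             # equals ln 2
print(log_loss(labels, [0.9, 0.1, 0.8, 0.3]))  # confident, mostly correct predictions
```

A model that always outputs 0.5 scores exactly ln 2 (about 0.693) whatever the labels, which gives a rough scale for comparing the reported values.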