2020
DOI: 10.1016/j.ipm.2020.102204
A multi-cascaded model with data augmentation for enhanced paraphrase detection in short texts

Abstract: Paraphrase detection is an important task in text analytics with numerous applications such as plagiarism detection, duplicate question identification, and enhanced customer support helpdesks. Deep models have been proposed for representing and classifying paraphrases. These models, however, require large quantities of human-labeled data, which is expensive to obtain. In this work, we present a data augmentation strategy and a multi-cascaded model for improved paraphrase detection in short texts. Our data augm…

Cited by 35 publications (17 citation statements) | References 31 publications
“…It has applications in different domains such as graphs [36], [37], nodes in graphs [38], [39], and electricity consumption [33], [40]. This vector-based representation also achieves significant success in sequence analysis, such as texts [41]- [43], electroencephalography and electromyography sequences [44], [45], networks [46], and biological sequences [32], [47]. However, most of the existing sequence classification methods require the input sequences to be aligned.…”
Section: Literature Review
confidence: 99%
“…Since the dimensionality of data are another problem while dealing with larger sized sequences, using approximate methods to compute the similarity between two sequences is a popular approach [21,27,28]. The fixed-length numerical embedding methods have been successfully used in literature for other applications such as predicting missing values in graphs [29], text analytics [30][31][32], biology [21,27,33], graph analytics [34,35], classification of electroencephalography and electromyography sequences [36,37], detecting security attacks in networks [38], and electricity consumption in smart grids [39]. The conditional dependencies between variables is also important to study so that their importance can be analyzed in detail [40].…”
Section: Literature Review
confidence: 99%
“…It has applications in different domains such as graphs [19,20], nodes in graphs [8,18], and electricity consumption [5,6]. This vector-based representation also achieve significant success in sequence analysis, such as texts [38][39][40], electroencephalography and electromyography sequences [12,42], Networks [4], and biological sequences [10]. However, most of the existing sequence classification methods require the input sequences to be aligned.…”
Section: Related Work
confidence: 99%