Proceedings of the 20th International Conference on Computational Linguistics (COLING '04), 2004
DOI: 10.3115/1220355.1220406

Unsupervised construction of large paraphrase corpora

Abstract: We investigate unsupervised techniques for acquiring monolingual sentence-level paraphrases from a corpus of temporally and topically clustered news articles collected from thousands of web-based news sources. Two techniques are employed: (1) simple string edit distance, and (2) a heuristic strategy that pairs initial (presumably summary) sentences from different news stories in the same cluster. We evaluate both datasets using a word alignment algorithm and a metric borrowed from machine translation. Results …
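A minimal sketch of the two harvesting heuristics the abstract describes. The cluster representation, the similarity thresholds, and the use of difflib's SequenceMatcher ratio as a stand-in for string edit distance are illustrative assumptions, not the paper's exact settings.

```python
# Sketch: harvest candidate paraphrase pairs from a topical news cluster.
# A cluster is assumed to be a list of articles, each a list of sentences.
from difflib import SequenceMatcher
from itertools import combinations

def edit_similar_pairs(cluster, min_ratio=0.8, max_ratio=0.99):
    """Heuristic 1: pair sentences from different stories whose string
    similarity is high but not identical (thresholds are assumptions)."""
    sentences = [(doc_id, s) for doc_id, doc in enumerate(cluster) for s in doc]
    pairs = []
    for (d1, s1), (d2, s2) in combinations(sentences, 2):
        if d1 == d2:
            continue  # only pair sentences drawn from different articles
        ratio = SequenceMatcher(None, s1, s2).ratio()
        if min_ratio <= ratio <= max_ratio:
            pairs.append((s1, s2))
    return pairs

def first_sentence_pairs(cluster):
    """Heuristic 2: pair the initial (presumably summary) sentences of
    different stories in the same temporal/topical cluster."""
    firsts = [doc[0] for doc in cluster if doc]
    return list(combinations(firsts, 2))

# Toy example cluster with two articles on the same event.
cluster = [
    ["Stocks fell sharply on Monday amid rate fears.",
     "Analysts blamed the central bank's comments."],
    ["Shares dropped sharply Monday on rate worries.",
     "The decline was the largest in three months."],
]
print(edit_similar_pairs(cluster))
print(first_sentence_pairs(cluster))
```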

Cited by 538 publications (404 citation statements); references 10 publications.
“…For the purpose of evaluation, we used the Microsoft Shared Paraphrase Corpus (MSPC, [1]). It consists of 5081 pairs of sentences graded with a binary label: 0 for semantically non-similar and 1 for semantically similar.…”
Section: CBSS Evaluation
confidence: 99%
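A rough sketch of scoring a sentence-similarity measure against a binary-labelled paraphrase corpus like the one described above. The tab-separated file layout, the Jaccard overlap standing in for the citing paper's actual CBSS measure, and the 0.5 decision threshold are all assumptions for illustration.

```python
# Sketch: accuracy of a similarity measure against 0/1 paraphrase labels.
def jaccard_similarity(s1, s2):
    """Toy stand-in for any sentence-similarity measure."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def evaluate(path, threshold=0.5):
    """Assumed file format: one pair per line, 'label<TAB>sent1<TAB>sent2'."""
    correct = total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            label, s1, s2 = line.rstrip("\n").split("\t")
            predicted = 1 if jaccard_similarity(s1, s2) >= threshold else 0
            correct += int(predicted == int(label))
            total += 1
    return correct / total  # accuracy over the gold binary labels

# accuracy = evaluate("paraphrase_pairs.tsv")  # hypothetical file name
```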
“…We tried to improve this method by adding a weight to each word, so that words with a larger weight factor more heavily into the overall evaluation of sentence similarity. Next, we introduced a modification of a text similarity scoring function, defined in [8], where the similarity between the input text segments T1 and T2 is determined by using the following scoring function:…”
Section: KBSS Implementation
confidence: 99%
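The scoring function itself is elided in the excerpt above. The sketch below shows only the general idf-weighted, symmetrized form that such text-similarity functions commonly take; whether this matches the exact formula defined in the cited reference [8] is an assumption, and word_sim, doc_freq, and n_docs are placeholders supplied by the caller.

```python
# Sketch: directional, idf-weighted word matching, symmetrized over T1 and T2.
import math

def idf(word, doc_freq, n_docs):
    """Inverse document frequency with add-one smoothing (illustrative)."""
    return math.log((n_docs + 1) / (doc_freq.get(word, 0) + 1))

def directional_score(t1, t2, word_sim, doc_freq, n_docs):
    """For each word in t1, take its best similarity to any word in t2,
    weight by idf, and normalize by the total weight."""
    num = den = 0.0
    for w in t1:
        best = max((word_sim(w, v) for v in t2), default=0.0)
        weight = idf(w, doc_freq, n_docs)
        num += best * weight
        den += weight
    return num / den if den else 0.0

def text_similarity(t1, t2, word_sim, doc_freq, n_docs):
    """sim(T1, T2) = 0.5 * (score(T1 -> T2) + score(T2 -> T1))."""
    return 0.5 * (directional_score(t1, t2, word_sim, doc_freq, n_docs)
                  + directional_score(t2, t1, word_sim, doc_freq, n_docs))

# Example with exact string match as the word-level similarity:
# exact = lambda a, b: 1.0 if a == b else 0.0
# text_similarity("the cat sat".split(), "a cat sat down".split(), exact, {}, 10)
```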
“…Since our solution to IER establishes a relationship between entity definitions and the input text, the tasks of paraphrase recognition (Barzilay and Elhadad, 2003; Dolan et al., 2004) and textual entailment recognition are related to our solution. However, these tasks are fundamentally different in two aspects: 1) both paraphrase recognition and textual entailment recognition are defined at the sentence level, whereas text phrases considered for IER can exist as a sentence fragment or span multiple sentences, and 2) the objective of IER is to find whether a given text phrase mentions an entity, as opposed to determining whether two sentences are similar or entail one another.…”
Section: Related Work
confidence: 99%
“…Paraphrasing: Most previous studies in paraphrasing have focused exclusively on text, and the primary goal has been learning semantic equivalence of phrases that holds out of context (e.g., Barzilay and McKeown, 2001; Pang et al., 2003; Dolan et al., 2004; Ganitkevitch et al., 2013), rather than targeting situated or pragmatic equivalence given a context. Emerging efforts have begun exploring paraphrases situated in video content (Chen and Dolan, 2011), news events (Zhang and Weld, 2013), and knowledge bases (Berant and Liang, 2014).…”
Section: Related Work
confidence: 99%