2011
DOI: 10.7251/jit1101065f

Comparable Evaluation of Contemporary Corpus-Based and Knowledge-Based Semantic Similarity Measures of Short Texts

Abstract: This paper presents methods for measuring the semantic similarity of texts and evaluates different approaches based on existing similarity measures. On the one hand, word similarity was calculated by processing large text corpora; on the other, a commonsense knowledge base was used. Given that a large fraction of the information available today, on the Web and elsewhere, consists of short text snippets (e.g. abstracts of scientific documents, image captions or product descriptions), where commonsense knowle…

Cited by 10 publications (13 citation statements)
References 6 publications
“…The Islam and Inkpen method [10] is close to it, although the inclusion of common-word order information into its similarity score actually leads to slightly worse results. Even though the statistical approach proposed by Furlan et al [11] is quite similar to the one in [10], the corpus used for training the COALS algorithm was rather small, which is why their results are, unsurprisingly, worse.…”
Section: Evaluation and Classification · Type: mentioning
confidence: 99%
“…The performance of the SyMSS model of Oliva et al [19] increases if weighting according to semantic roles is applied, with verbs carrying the greatest weight, subjects and objects a somewhat smaller one, while adverbial complements and other roles are assigned even lower values. The topological approach of Furlan et al [11] reaches maximal accuracy levels by giving the verbs a weight four times greater than the one used for subjects and objects. Nevertheless, this method performs poorly in comparison to the statistical algorithm presented in the same paper because many subjects and objects consist of proper nouns which cannot be found in the ConceptNet knowledge base, effectively rendering those constituents irrelevant in the calculation of the similarity score.…”
Section: Evaluation and Classification · Type: mentioning
confidence: 99%
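
The role weighting described in the statement above can be illustrated with a minimal sketch. It assumes that each sentence pair has already been reduced to one similarity score per grammatical role, and that a role whose words are missing from the knowledge base carries no score and therefore drops out of the calculation. The function name, the role labels and the weighted-average formula are hypothetical; only the 4:1 ratio between verbs and subjects/objects comes from the quoted text.

```python
from typing import Dict, Optional

# Assumed weights: verbs count four times as much as subjects and objects,
# following the ratio mentioned in the citation statement.
ROLE_WEIGHTS: Dict[str, float] = {"verb": 4.0, "subject": 1.0, "object": 1.0}


def weighted_role_similarity(
    role_scores: Dict[str, Optional[float]],
    weights: Dict[str, float] = ROLE_WEIGHTS,
) -> float:
    """Weighted average of per-role similarity scores in [0, 1].

    A score of None marks a role that could not be scored (for example a
    proper noun absent from the knowledge base); such roles are skipped,
    which is an assumption of this sketch.
    """
    usable = {
        role: score
        for role, score in role_scores.items()
        if score is not None and role in weights
    }
    total_weight = sum(weights[role] for role in usable)
    if total_weight == 0:
        return 0.0
    return sum(weights[role] * score for role, score in usable.items()) / total_weight


# The subject is a proper noun with no knowledge-base entry, so only the
# verb and the object contribute to the result.
print(weighted_role_similarity({"verb": 0.5, "subject": None, "object": 1.0}))
```

When subjects and objects are frequently unscorable, the result is determined almost entirely by the verb score, which is consistent with the weakness the statement attributes to missing proper nouns.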
“…This method, besides the semantic word similarity measure, incorporates the string similarity measure, so it performs better with typos, evolving hotwords or different forms of infrequent proper nouns. 22…”
Section: Semantic and String Similarity Incorporation · Type: mentioning
confidence: 99%
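
To illustrate why adding a string similarity measure helps with typos and variant spellings, here is a minimal sketch. The quoted statement does not say how the two measures are combined, so taking the maximum of the two scores is an assumption of this example, as are the toy semantic table and the function names. The string measure uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for whatever string measure the cited method applies.

```python
from difflib import SequenceMatcher
from typing import Dict, FrozenSet

# Hypothetical toy table of semantic scores; a real system would derive
# these from a corpus or a knowledge base.
SEMANTIC_SCORES: Dict[FrozenSet[str], float] = {
    frozenset(("car", "automobile")): 0.9,
}


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] based on matching blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def word_similarity(a: str, b: str) -> float:
    """Combine semantic and string similarity by taking the larger score.

    The max() combination is an assumption made for illustration only.
    """
    if a.lower() == b.lower():
        return 1.0
    semantic = SEMANTIC_SCORES.get(frozenset((a.lower(), b.lower())), 0.0)
    return max(semantic, string_similarity(a, b))


# Synonyms with no shared spelling rely on the semantic score.
print(word_similarity("car", "automobile"))
# A misspelling has no semantic entry but still scores high on the string measure.
print(word_similarity("recieve", "receive"))
```

A word pair with no semantic entry, such as a typo or an infrequent proper noun, still receives a non-zero score from the string measure instead of being treated as unrelated.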