2015
DOI: 10.1007/s10579-015-9319-2

Robust semantic text similarity using LSA, machine learning, and linguistic resources

Abstract: Semantic textual similarity is a measure of the degree of semantic equivalence between two pieces of text. We describe the SemSim system and its performance in the *SEM 2013 and SemEval-2014 tasks on semantic textual similarity. At the core of our system lies a robust distributional word similarity component that combines Latent Semantic Analysis and machine learning augmented with data from several linguistic resources. We used a simple term alignment algorithm to handle longer pieces of text. Additional wrap…
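The abstract's two-level design (a word similarity component, plus a simple term alignment that lifts it to longer texts) can be illustrated with the minimal Python sketch below. It is not the SemSim implementation: the toy two-dimensional vectors in `TOY_VECTORS` stand in for word vectors that a real system would derive from an LSA model, and the helpers `cosine`, `word_sim`, `align_score` and `text_similarity` are hypothetical names chosen for illustration. Each token is aligned to its most similar token in the other text, the best-match scores are averaged, and the two directions are averaged so the measure is symmetric.

```python
import math
from typing import Dict, List

# Hypothetical toy vectors standing in for LSA word vectors.
TOY_VECTORS: Dict[str, List[float]] = {
    "cat": [1.0, 0.0],
    "feline": [0.9, 0.1],
    "sleeps": [0.0, 1.0],
    "naps": [0.1, 0.9],
}


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def word_sim(w1: str, w2: str, vecs: Dict[str, List[float]]) -> float:
    """Word similarity: exact match scores 1.0, unknown words score 0.0."""
    if w1 == w2:
        return 1.0
    if w1 not in vecs or w2 not in vecs:
        return 0.0
    return cosine(vecs[w1], vecs[w2])


def align_score(src: List[str], tgt: List[str], vecs: Dict[str, List[float]]) -> float:
    """Align each source token to its best-matching target token and average."""
    if not src or not tgt:
        return 0.0
    best = [max(word_sim(s, t, vecs) for t in tgt) for s in src]
    return sum(best) / len(src)


def text_similarity(a: str, b: str, vecs: Dict[str, List[float]]) -> float:
    """Symmetric text similarity: mean of the two alignment directions."""
    ta, tb = a.lower().split(), b.lower().split()
    return 0.5 * (align_score(ta, tb, vecs) + align_score(tb, ta, vecs))


if __name__ == "__main__":
    print(text_similarity("the cat sleeps", "the feline naps", TOY_VECTORS))
```

Because a stopword such as "the" matches itself exactly, it inflates the score in this sketch; a fuller system would likely filter or down-weight such tokens.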

Cited by 38 publications (23 citation statements); references 33 publications.
“…This in turn was followed by a classification or regression step to obtain the similarity score (Agirre et al, 2015). Most distributional semantic representations were based on dimensionality reduction algorithms (Han et al, 2013; Kashyap et al, 2015), while word embedding models were based on deep learning (Kenter and de Rijke, 2015; Wu et al, 2014; Socher et al, 2011).…”
Section: Related Work (mentioning; confidence: 99%)
“…Web corpus from the Stanford WebBase project was utilized to build the distributional semantic word representation, and the model was then enhanced by integrating POS information with WordNet. The same system was then extended to the Multilingual Semantic Textual Similarity and Cross-Level Semantic Similarity tasks in SemEval 2014 with a few external resources (Google Translate, Wordnik, and Bing) and showed greater accuracy (Kashyap et al, 2015). In order to represent the sentence pair, high-quality word embeddings were obtained using Word2Vec and GloVe. Further, feature vectors of length 60 were computed using feature functions and evaluated on the Microsoft Research Paraphrase Corpus (MSRP) (Kenter and de Rijke, 2015).…”
Section: Related Work (mentioning; confidence: 99%)
“…A growing number of works in the STS literature rely on resources such as WordNet, FrameNet and VerbNet to integrate linguistic relationships into the STS process (Al-Alwani, 2015; Yousif et al, 2015; Brychcín and Svoboda, 2016; Ferreira et al, 2016; Kashyap et al, 2016; Ferreira et al, 2018). As a complementary aspect, probabilistic techniques, as seen in Vector Space Models (VSM), have motivated studies of their advantages, such as domain independence and the ability to automatically obtain some of the semantic relations between sentences from a space of contexts (Hartmann, 2016; Barbosa et al, 2016; Freire et al, 2016).…”
Section: Introduction (mentioning; confidence: 99%)
“…There is a growing number of curation strategies supported by Natural Language Processing (NLP) and Machine Learning (ML) tasks, and they have become a key source of information for bioinformatics repositories. Protein-protein interaction and regulatory interaction identification, entity association to ontologies, or even directed searches are just some examples of aided curation [6,17,23]. It is important to emphasize that they are focused on facilitating access to specific information patterns.…”
mentioning (confidence: 99%)
“…Thus, it is the user who decides what to focus on: a meaning closer to the original sentence (a higher score), or a broader contextual similarity (a lower score). We decided to apply a frequently used strategy that consists of applying several metrics, each measuring a different aspect of the similarity between both sentences, and then combining their scores into a single one [10,21]. This strategy has proved to be robust to changes in context.…”
mentioning (confidence: 99%)
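The strategy quoted above, several metrics that each capture a different aspect of similarity combined into a single score, can be sketched as a weighted average in Python. The three metrics (token Jaccard overlap, character-bigram Dice coefficient and a token-length ratio) and the weights in `EXAMPLE_WEIGHTS` are hypothetical choices for illustration; the cited works do not necessarily use these metrics, and some systems learn the combination with a regressor rather than fixing the weights by hand.

```python
from typing import Callable, Dict, List, Set


def tokens(s: str) -> List[str]:
    """Lowercased whitespace tokenization."""
    return s.lower().split()


def jaccard(a: str, b: str) -> float:
    """Token-set overlap: |A & B| / |A | B|."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def char_bigrams(s: str) -> Set[str]:
    """Set of adjacent character pairs in the lowercased string."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_bigrams(a: str, b: str) -> float:
    """Dice coefficient over character bigrams: 2|A & B| / (|A| + |B|)."""
    ga, gb = char_bigrams(a), char_bigrams(b)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def length_ratio(a: str, b: str) -> float:
    """Ratio of the shorter to the longer token count."""
    la, lb = len(tokens(a)), len(tokens(b))
    if max(la, lb) == 0:
        return 1.0
    return min(la, lb) / max(la, lb)


# Each metric returns a score in [0, 1] for one aspect of similarity.
METRICS: Dict[str, Callable[[str, str], float]] = {
    "jaccard": jaccard,
    "dice_bigrams": dice_bigrams,
    "length_ratio": length_ratio,
}

# Hypothetical hand-set weights, for illustration only.
EXAMPLE_WEIGHTS: Dict[str, float] = {"jaccard": 0.5, "dice_bigrams": 0.3, "length_ratio": 0.2}


def combined_score(a: str, b: str, weights: Dict[str, float]) -> float:
    """Weighted average of the selected metric scores."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * METRICS[name](a, b) for name, w in weights.items()) / total
```

A fixed weighted average keeps the combination transparent; replacing it with a regressor trained on gold similarity scores is the natural next step when annotated data is available.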