A method for the computation of the semantic similarity and relatedness between natural language words

Anisimov, A. V.; Marchenko, Oleksandr; Kysenko, V. K.

doi:10.1007/s10559-011-9334-2

Cited by 6 publications

(2 citation statements)

References 5 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…The correlation coefficient of the proposed method was estimated at 87%, which is optimal comparing with methods proposed by Resnik [15] with correlation coefficient of 79%, by Anisimo et al [16] with correlation coefficient of 74%, and by Qasim with correlation coefficient of 89%. It can also be focused on article similarities using a specialized WordNet.…”

Section: Resultsmentioning

confidence: 98%

A new approach for finding semantic similar scientific articles

Nasab¹,

Javidan

2015

JACST

View full text Add to dashboard Cite

Calculating article similarities enables users to find similar articles and documents in a collection of articles. Two similar documents are extremely helpful for text applications such as document-to-document similarity search, plagiarism checker, text mining for repetition, and text filtering. This paper proposes a new method for calculating the semantic similarities of articles. WordNet is used to find word semantic associations. The proposed technique first compares the similarity of each part two by two. The final results are then calculated based on weighted mean from different parts. Results are compared with human scores to find how it is close to Pearson's correlation coefficient. The correlation coefficient above 87 percent is the result of the proposed system. The system works precisely in identifying the similarities.

show abstract

Section: Resultsmentioning

confidence: 98%

A new approach for finding semantic similar scientific articles

Nasab¹,

Javidan

2015

JACST

View full text Add to dashboard Cite

show abstract

“…Then 6 words with the highest wt are selected and bonded in alphabetical order in the string. As the signature of the document, CRC32 checksum is calculated for the resulting string (Anisimov et al, 2011).…”

Section: Tf-ifd and Its Modificationsmentioning

confidence: 99%

Establishing Semantic Similarity of the Cluster Documents and Extracting Key Entities in the Problem of the Semantic Analysis of News Texts

Солошенко¹,

Orlova²,

Розалиев³

et al. 2015

MAS

View full text Add to dashboard Cite

This paper is dedicated to the problem of establishing semantic similarity for the documents of the news cluster and extracting key entities from the article's text. The existing methods and algorithms for fuzzy duplicate detection texts are briefly reviewed and analysed, such as TF-IDF and its modifications, Long Sent, Megashingles and Log Shingles, and Lex Rand. The shingles algorithm essence and its main stages are described in detail. Several options of the parallel implementation for the shingles algorithm are presented: for multiprocessor heterogeneous computing systems using CUDA and Open CL and for distributed computing systems using Google App Engine. The parameters of the algorithm (operation time, acceleration) applied to the problem of the semantic analysis for news texts are assessed. In addition, the methods and algorithms for extracting key phrases from the news text are reviewed: graph methods, in particular TextRank, building horizontal visibility graphs, the Viterbi algorithm, types of Markov random fields method, as well as a comprehensive context-sensitive algorithm for news text analysis (a combination of statistical algorithms for extracting key words and algorithms for forming semantic coherence of the text blocks). These methods are analysed from the standpoint of applicability to the news articles analysis. Particular attention is paid to the peculiarities of the news text structure. Although the thematic classification and selection of key entities in text documents are powerful text processing tools, these stages of analysis cannot give a complete picture of the news piece semantics. The paper presents a methodology and a comprehensive analysis of news text, based on a combination of semantic analysis and subsequent text abstracting submitting it in a compressed formatso-called mind map.

show abstract