Word2vec Based System for Recognizing Partial Textual Entailment

Víta, Martin; Kríž, Vincent

doi:10.15439/2016f419

Cited by 5 publications (3 citation statements); references 5 publications.


“…Note that we assume that if the entailment T → H does not hold, then there is at least one facet such that faceted entailment according to T, H does not hold. Although the proposed method includes a certain part of manual work, in case of preparing balanced corpus, half of the work (positive instances) is done automatically and, moreover, the negative instances can be recommended from the list of potential candidates (obtained in the third step) by some simple algorithm like [19].…”

Section: A Description of the Methods (mentioning, confidence: 99%)

From Building Corpora for Recognizing Faceted Entailment to Recognizing Relational Entailment

Víta

2018

Annals of Computer Science and Information Systems

Self Cite


Recognizing textual entailment (RTE) has become a well-established and widely studied task. Partial textual entailment (and faceted textual entailment in particular) is one of the tasks derived from RTE. Although many annotated corpora exist for the original RTE problem, faceted textual entailment is highly neglected in terms of easily accessible corpora. In this paper, we present a semi-automatic approach to deriving corpora for the faceted entailment task from a general RTE corpus using open information extraction (open IE) tools. As a generalization of this approach and of the general principles of open IE, we introduce the notion of relational entailment and describe its basic properties and its relations to other entailment-like problems. We would like to introduce relational entailment as an important task with a potentially wide range of real-world applications.
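The labeling logic in the excerpt quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not Víta's implementation: it assumes the facets of H are already available as (subject, relation, object) triples, for example from an open IE tool, and the names `Facet`, `FacetInstance`, `derive_facet_instances`, and `pair_label_from_facets` are hypothetical. If T → H holds, every facet of H is entailed, so those instances are labeled positive automatically; if it does not hold, at least one facet fails but we do not know which, so the facets are only returned as unlabeled candidates for manual (or heuristic) negative labeling.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical facet representation: an open-IE-style triple extracted from H.
Facet = Tuple[str, str, str]


@dataclass
class FacetInstance:
    text: str
    facet: Facet
    label: Optional[bool]  # True = entailed; None = candidate needing annotation


def derive_facet_instances(text: str, facets: List[Facet], entails: bool) -> List[FacetInstance]:
    """Turn one RTE pair (T, H) into faceted-entailment instances.

    If T -> H holds, every facet of H is entailed, so labels are assigned
    automatically. Otherwise at least one facet is not entailed, but we do
    not know which one, so every facet becomes an unlabeled candidate.
    """
    label: Optional[bool] = True if entails else None
    return [FacetInstance(text, f, label) for f in facets]


def pair_label_from_facets(facet_labels: List[bool]) -> bool:
    """Under the excerpt's assumption, T -> H holds exactly when every facet holds."""
    return all(facet_labels)


# Toy example pair (illustrative only, not taken from an RTE corpus).
t = "Marie Curie, born in Warsaw, won the Nobel Prize in Physics in 1903."
h_facets = [("Marie Curie", "was born in", "Warsaw"), ("Marie Curie", "won", "the Nobel Prize")]
positives = derive_facet_instances(t, h_facets, entails=True)
candidates = derive_facet_instances(t, h_facets, entails=False)
print([i.label for i in positives], [i.label for i in candidates])
```

The asymmetry is the point of the design: positive facet instances come for free from positive RTE pairs, while negative pairs only narrow the search for negative facet instances.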


“…Word embedding is a continuous vector representation of words that encodes the meaning of the word, such that the words that are closer in the vector space are supposed to be similar in the meaning. The use of word embeddings as additional features improves the performance in many NLP tasks, including text classification [22][23][24][25][26][27][28][29][30]. Different Machine Learning algorithms can be trained to derive these vectors, such as Word2Vec [31], FastText [32], Glove [33].…”

Section: Literature Review (mentioning, confidence: 99%)

Supervised and Unsupervised Categorization of an Imbalanced Italian Crime News Dataset

Rollo, Bonisoli

2022

Lecture Notes in Business Information Processing


The automatic categorization of crime news is useful for compiling statistics on the types of crimes occurring in a given area. This task can be treated as a text categorization problem. Several studies have shown that using word embeddings improves outcomes in many Natural Language Processing (NLP) tasks, including text categorization. The aim of this paper is to explore the use of word embeddings for categorizing Italian crime news. The approach is to compare different document pre-processing steps, Word2Vec models, and methods for obtaining word embeddings, including the extraction of bigrams and keyphrases. Supervised and unsupervised Machine Learning categorization algorithms are then applied and compared. In addition, the class imbalance of the input dataset is addressed with the Synthetic Minority Oversampling Technique (SMOTE), which oversamples the elements of the minority classes. Experiments on an Italian dataset of 17,500 crime news articles collected from 2011 to 2021 show very promising results. Supervised categorization proved better than unsupervised categorization, exceeding 80% in both precision and recall and reaching an accuracy of 0.86. Lemmatization, bigrams, and keyphrase extraction, by contrast, turned out not to be decisive. Finally, our model is available on GitHub together with the code we used to extract word embeddings, so the approach can be replicated on other corpora, in Italian or in other languages.
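The idea in the quoted passage, that words closer in the vector space are more similar in meaning, is commonly measured with cosine similarity, and one simple way to turn word embeddings into a document vector is to average them. The sketch below uses tiny made-up 3-dimensional vectors purely for illustration (real Word2Vec vectors are learned from a corpus and typically have 100 to 300 dimensions); the names `EMBEDDINGS`, `cosine`, and `document_vector` are hypothetical, and averaging is only one possible option, not necessarily the method Rollo and Bonisoli selected.

```python
import math
from typing import Dict, List, Sequence

# Toy embeddings: made-up values for illustration only, not trained vectors.
EMBEDDINGS: Dict[str, List[float]] = {
    "robbery": [0.9, 0.1, 0.0],
    "theft": [0.8, 0.2, 0.1],
    "weather": [0.0, 0.1, 0.9],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def document_vector(tokens: List[str], emb: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of in-vocabulary tokens; unknown tokens are skipped."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        dim = len(next(iter(emb.values())))
        return [0.0] * dim  # no known words: fall back to the zero vector
    return [sum(col) / len(vecs) for col in zip(*vecs)]


sim_close = cosine(EMBEDDINGS["robbery"], EMBEDDINGS["theft"])
sim_far = cosine(EMBEDDINGS["robbery"], EMBEDDINGS["weather"])
doc = document_vector(["robbery", "theft", "unknownword"], EMBEDDINGS)
print(f"robbery~theft: {sim_close:.3f}, robbery~weather: {sim_far:.3f}")
```

With these toy values, "robbery" and "theft" score close to 1 while "robbery" and "weather" score close to 0, which is the property the quoted passage relies on.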

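SMOTE, mentioned in the abstract, creates synthetic minority-class samples by picking a minority sample, choosing one of its k nearest minority-class neighbours, and placing a new point at a random position on the segment between them. The pure-Python sketch below shows only that core step under illustrative assumptions (toy 2-D vectors, `k=2`, a fixed seed); the function names are hypothetical and these are not the settings reported by the authors. In practice one would typically use the `SMOTE` class from the imbalanced-learn library instead.

```python
import math
import random
from typing import List

Vector = List[float]


def k_nearest(sample: Vector, pool: List[Vector], k: int) -> List[Vector]:
    """Return the k vectors in `pool` closest to `sample` (Euclidean), excluding `sample` itself."""
    others = [p for p in pool if p is not sample]
    return sorted(others, key=lambda p: math.dist(sample, p))[:k]


def smote(minority: List[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)  # seeded so the output is reproducible
    synthetic: List[Vector] = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbour = rng.choice(k_nearest(base, minority, k))
        gap = rng.random()  # position on the segment from base towards neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D feature vectors standing in for minority-class document embeddings.
minority_docs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_docs = smote(minority_docs, n_new=6, k=2)
print(len(new_docs), "synthetic samples")
```

Because every synthetic point lies between two real minority samples, oversampling stays inside the region the minority class already occupies rather than duplicating points verbatim.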

“…For example, Spanish [6], Arabic [7,8], German [9], and Czech [10], Italian [14], Japanese [15], China [16]. Moreover, some researchers build systems are independent from standard dataset, although the experimental data still refers to the standard dataset [17,18]. These all works indicate that the research in TE field still grows [2].…”

Section: Related Work (mentioning, confidence: 99%)

WERTES: Web as External Resources for Textual Entailment Systems

Abdiansah, Azhari, Sari

2018

IJIES


Research in Textual Entailment (TE) has been widely conducted, mainly on natural-language-based systems, since TE can provide solutions to semantic problems. Researchers usually focus on improving methods and therefore use standard datasets, which are specific to a particular language, primarily English. For low-resource languages, it is very difficult to find datasets for testing TE systems. In this paper, we therefore propose a model that extracts data from the web to serve as a dataset for TE systems. The model can be adapted to cross-language domains with simple modifications. Two datasets were created and used to evaluate the model: DS-100-R, which contains facts, and DS-100-W, which contains non-facts. The model produces a set of sentences that are expected to be relevant to the queries. Several algorithms were created to address problems that arose during the experiments. In the evaluation, the model's accuracy is 79.0% on DS-100-R and 70.0% on DS-100-W, giving an overall accuracy of 74.5%.
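The overall figure in the abstract is consistent with an unweighted mean of the two per-dataset accuracies. Assuming, as the names DS-100-R and DS-100-W suggest but the abstract does not state explicitly, that each dataset contains 100 items, this mean also equals the pooled accuracy over all 200 items:

```latex
\mathrm{Acc}_{\mathrm{overall}}
  = \frac{79.0\% + 70.0\%}{2}
  = \frac{79 + 70}{100 + 100}
  = \frac{149}{200}
  = 74.5\%
```

If the two datasets differed in size, the pooled accuracy would instead be weighted by the number of items in each.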
