Short text categorization exploiting contextual enrichment and external knowledge

Mizzaro, Stefano; Pavan, Marco; Scagnetto, Ivan; Valenti, Martino

doi:10.1145/2632188.2632205

Cited by 12 publications

(4 citation statements)

References 9 publications


“…It also incorporates central sentences from articles found in Wikipedia that are linked with tweet, and lastly, to improve the performance of the fusion, it uses the resultant clusters retrieved from the expanded micro-blog based on the cluster. In Mizzaro et al [21] a method that uses information extracted from the web derived from the same temporal context was proposed. In this method, Wikipedia which acts as an external resource is used to query the words.…”

Section: Related Work (mentioning)

confidence: 99%

Information Retrieval Framework for Digital Resource Objects

Alma’aitah

2019

IJATCSE


Basically, digital resource objects (DRO) suffer from two fundamental issues, namely lack of quality of metadata content and difficulty in accessing metadata content. These lead to a decrease in the performance of the DRO retrieval. With a view to increasing the performance of the DRO retrieval, many components of information retrieval have been enhanced, such as document expansion (DE), retrieval models such as the Dirichlet smoothing (DS) model, and query expansion (QE). Most of these studies have shown that employing IR components (DE, QE or DS) independently to enhance the DRO retrieval has helped to increase the performance of the retrieval. It is assumed that IR components can enhance the performance of the DRO retrieval. Based on this assumption, an information retrieval framework (IRF) for DROs is presented in this paper. The proposed IRF is to address the retrieval problems in DROs and provide an environment for retrieving information from DROs with the highest possible performance. The principal task of IRF is to make all components of IR (DE, DS, and QE) work together to achieve the greatest benefit in improving the retrieval performance. Several experiments were conducted on CHiC2013, which is a collection on cultural heritage. The results show a considerable enhancement over other IR approaches that use the DE method, DS model and QE method independently.


“…The problem of data sparsity in short-text analysis is often handled by contextual enrichment methods. Such methods exploit external sources of semantic knowledge to extend the sparse features of short-text with additional information to make it appear like a long text or a heterogeneous document [37], [38], [39]. Based on this analogy, we consider a set of contextual enrichment methods, that are typically used in short-text analysis, to contextually enrich software requirements with domain-specific data derived from Wikipedia.…”

Section: B Requirements Text (mentioning)

confidence: 99%

Exploiting online human knowledge in Requirements Engineering

Mahmoud

Carver

2015

2015 IEEE 23rd International Requirements Engineering Conference (RE)


Data-driven Natural Language Processing (NLP) methods have noticeably advanced in the past few years. These advances can be tied to the drastic growth of the quality of collaborative knowledge bases (KB) available on the World Wide Web. Such KBs contain vast amounts of up-to-date structured human knowledge and common sense data that can be exploited by NLP methods to discover otherwise-unseen semantic dimensions in text, aiding in tasks related to natural language understanding, classification, and retrieval. Motivated by these observations, we describe our research agenda for exploiting online human knowledge in Requirements Engineering (RE). The underlying assumption is that requirements are a product of the human domain knowledge that is expressed mainly in natural language. In particular, our research is focused on methods that exploit the online encyclopedia Wikipedia as a textual corpus. Wikipedia provides access to a massive number of real-world concepts organized in hierarchical semantic structures. Such knowledge can be analyzed to provide automated support for several exhaustive RE activities including requirements elicitation, understanding, modeling, traceability, and reuse, across multiple application domains. This paper describes our preliminary findings in this domain, current state of research, and prospects of our future work.


“…In this paper the two primary sources for constructing enriched BoW have been identified as Legal statute pertaining to dowry acts [304B, 498, 256] and the vast knowledge accrued by the legal experts over a period of time. The reason for choosing Legal Statute as the one of the external knowledge source [4,5] for constructing the enriched BoW is that statute happens to be the basis for the different sections of IPC (Indian Penal Code). The enriched BoW thus created is a semantic BoW which can be used as a major source of metadata for the researchers whose research area happens to be dowry cases.…”

Section: Introduction (mentioning)

confidence: 99%

Enhancement of Bag-of-Words for Legal documents using Legal Statute

Rao, Krishna, Rao et al.

2016

IOSR


In this paper Legal statute related to dowry acts has been processed to obtain a distinct set of legal keywords which don't have a common occurrence in day-to-day dowry case judgments. This effort coupled with the knowledge of legal experts would give a very much broadened scope for the BoW. These keywords are very rich in concept and well connected to the domain of dowry acts. The earlier work [22] constructed BoW for dowry case notes of judgments. Current work tries to improve the BoW by widening the scope of dowry related cases. This enriches the Bag-of-Words with high probability legal terms taking precedence over low probability non-legal terms. The enriched BoW set when put through any of the similarity measures or machine learning techniques is bound to give better results when compared to earlier BoW [22].
