On indexing, retrieval and the meaning of about

Maron, M. E.

doi:10.1002/asi.4630280107

Cited by 113 publications

(45 citation statements)

References 3 publications

“…Here the aim is to identify the most significant topics; those which the document was written about (Maron, 1977). These index topics can be used to summarize the document and organize it under category-like headings.…”

Section: Related Work (mentioning)

confidence: 99%

Learning to link with wikipedia

Milne

Witten

2008

Proceedings of the 17th ACM Conference on Information and Knowledge Management

This paper describes how to automatically cross-reference documents with Wikipedia: the largest knowledge base ever known. It explains how machine learning can be used to identify significant terms within unstructured text, and enrich it with links to the appropriate Wikipedia articles. The resulting link detector and disambiguator performs very well, with recall and precision of almost 75%. This performance is constant whether the system is evaluated on Wikipedia articles or "real world" documents. This work has implications far beyond enriching documents with explanatory links. It can provide structured knowledge about any unstructured fragment of text. Any task that is currently addressed with bags of words (indexing, clustering, retrieval, and summarization, to name a few) could use the techniques described here to draw on a vast network of concepts and semantics.

“…For example, the degree of representational indeterminacy between the phrase "SSN" and someone's actual social security number in a data base is the probability that someone using the term "SSN" as a search term (in, for example, an SQL query) will not find the retrieval of data or documents linked to social security numbers useful. If we symbolize representational indeterminacy as R Ind then we can formalize its definition as [adapted from Maron, 1977]:…”

Section: An Operational Definition Of Representational Indeterminacy (mentioning)

confidence: 99%

The data-document distinction revisited

Blair

2006

SIGMIS Database

Abstract: The Data Retrieval and Document Retrieval models have a number of differences which influence their design, use and management. This paper discusses the most prominent of these differences and shows that they all arise from the more fundamental problem of representational indeterminacy. Representational indeterminacy is a result of the effects of semantic ambiguity and system size. In the May 1984 issue of Communications of the ACM (CACM) a paper was published that described fundamental distinctions between two types of information retrieval: data retrieval and document/text retrieval. Four principal differences between these two types of retrieval were identified and some of the consequences of these distinctions were discussed. At the time of this paper the dominant model of information management was the data model, and the dominant technology was the data base management system. Document retrieval was largely a research field (known in the ACM, the Association for Computing Machinery, as "Information Retrieval") with…

“…In this process, the mode of reading, although little explored and evaluated in the scientific literature, is the main phase of the operation and is considered a determining factor in its success (Silva & Fujita, 2004). Divergences can also arise between indexers who assign different key terms to the same document, or when a single indexer assigns different key terms to a document at different times (Maron, 1977; Guedes, 1994; Fujita, 1999).…”

Section: Introduction (unclassified)

Uma arquitetura hibrida para a indexação de documentos do Diário Oficial do Município de Cachoeiro de Itapemirim

Xavier

Silva

Gomes

2015

Transinformação

Abstract: Text mining techniques have been widely used to process large volumes of documents. However, there is still a large gap in attempts to define an architecture for transactional systems with elements of computational intelligence. This paper presents a proposed architecture for building a computational system that uses text mining techniques to index content from the database of the Official Gazette (Diário Oficial) of the municipality of Itapemirim, in the state of Espírito Santo, transforming information previously available in natural language into a structured format that can be persisted. To validate the architecture, a prototype was developed in Java and made accessible on the Web. To evaluate the tool, the proposed case study used a base of 22 documents containing 198 normative acts from that Official Gazette, for which good levels of precision and recall were found in information retrieval. This work contributes a hybrid architecture composed of elements of the transactional systems model and elements of text mining, together with the use of software design patterns. Keywords: Diário Oficial de Cachoeiro de Itapemirim. Document indexing. Text mining. Information retrieval.
