Ian H. Witten scite author profile

Arithmetic coding for data compression

Arithmetic coding provides an effective mechanism for removing redundancy in the encoding of data. We show how arithmetic coding works and describe an efficient implementation that uses table lookup as a fast alternative to arithmetic operations. The reduced-precision arithmetic has a provably negligible effect on the amount of compression achieved. We can speed up the implementation further by use of parallel processing. We discuss the role of probability models and how they provide probability information to the arithmetic coder. We conclude with perspectives on the comparative advantages and disadvantages of arithmetic coding.
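The interval-narrowing idea behind arithmetic coding can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the efficient implementation the abstract describes. It uses Python's exact `fractions.Fraction` arithmetic instead of reduced-precision table lookup and assumes a fixed, non-adaptive probability model chosen for the example. It also assumes the decoder is told the message length. The helper names `cumulative`, `encode`, and `decode` are hypothetical.

```python
from fractions import Fraction

def cumulative(probs):
    """Assign each symbol its [low, high) slice of [0, 1), in dict order."""
    ranges, low = {}, Fraction(0)
    for sym, p in probs.items():
        ranges[sym] = (low, low + p)
        low += p
    return ranges

def encode(message, probs):
    """Narrow [0, 1) once per symbol; any number in the final interval identifies the message."""
    ranges = cumulative(probs)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        s_low, s_high = ranges[sym]
        # Shrink the current interval to the sub-slice belonging to this symbol.
        low, width = low + width * s_low, width * (s_high - s_low)
    return low, low + width

def decode(value, length, probs):
    """Invert encode: find the slice containing value, emit its symbol, rescale, repeat."""
    ranges = cumulative(probs)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in ranges.items():
            if s_low <= value < s_high:
                out.append(sym)
                # Map the chosen slice back onto [0, 1) for the next step.
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)

# Illustrative fixed model (an assumption, not taken from the paper).
probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
low, high = encode("abac", probs)
print(low, high)                # 19/64 5/16
print(decode(low, 4, probs))    # abac
```

The final interval has width 1/64, the product of the symbol probabilities, so about six bits suffice to identify a point inside it. Practical coders avoid the ever-growing fractions by working with fixed-precision integers and emitting bits incrementally.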


Learning to link with wikipedia

Milne, Witten (2008)

933

876


This paper describes how to automatically cross-reference documents with Wikipedia: the largest knowledge base ever known. It explains how machine learning can be used to identify significant terms within unstructured text and enrich it with links to the appropriate Wikipedia articles. The resulting link detector and disambiguator performs very well, with recall and precision of almost 75%. This performance is constant whether the system is evaluated on Wikipedia articles or "real world" documents.

This work has implications far beyond enriching documents with explanatory links. It can provide structured knowledge about any unstructured fragment of text. Any task that is currently addressed with bags of words (indexing, clustering, retrieval, and summarization, to name a few) could use the techniques described here to draw on a vast network of concepts and semantics.


Data mining in bioinformatics using Weka

Frank, Hall, Trigg, et al. (2004)

812

596


The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression

Witten, Bell (1991), IEEE Trans. Inform. Theory

499

265




Managing Gigabytes: Compressing and Indexing Documents and Images

Arithmetic coding for data compression
