Claude E. Shannon scite author profile

A new method of estimating the entropy and redundancy of a language is described. This method exploits the knowledge of the language statistics possessed by those who speak the language, and depends on experimental results in prediction of the next letter when the preceding text is known. Results of experiments in prediction are given, and some properties of an ideal predictor are developed.

INTRODUCTION

In a previous paper¹ the entropy and redundancy of a language have been defined. The entropy is a statistical parameter which measures, in a certain sense, how much information is produced on the average for each letter of a text in the language. If the language is translated into binary digits (0 or 1) in the most efficient way, the entropy H is the average number of binary digits required per letter of the original language. The redundancy, on the other hand, measures the amount of constraint imposed on a text in the language due to its statistical structure, e.g., in English the high frequency of the letter E, the strong tendency of H to follow T or of U to follow Q. It was estimated that when statistical effects extending over not more than eight letters are considered, the entropy is roughly 2.3 bits per letter and the redundancy about 50 per cent.

Since then a new method has been found for estimating these quantities, which is more sensitive and takes account of long-range statistics, influences extending over phrases, sentences, etc. This method is based on a study of the predictability of English: how well can the next letter of a text be predicted when the preceding N letters are known? The results of some experiments in prediction will be given, together with a theoretical analysis of some of the properties of ideal prediction. By combining the experimental and theoretical results it is possible to estimate upper and lower bounds for the entropy and redundancy.
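Statistics "extending over not more than eight letters" can be estimated directly from letter counts in a sample text. The Python sketch below is illustrative only and is not the paper's procedure: it assumes a 27-symbol alphabet (the letters A to Z plus a word space), uses a simple plug-in (count-based) estimate, and the helper names `normalize`, `f_n`, and `redundancy` are hypothetical. It computes F_N, the entropy of the next letter given the N-1 letters before it, as the entropy of the N-grams minus the entropy of their (N-1)-letter prefixes, and takes the redundancy as one minus the relative entropy, 1 - H / log2(alphabet size).

```python
import math
from collections import Counter

# Assumption for this sketch: a 27-symbol alphabet, A to Z plus the word space.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "


def normalize(text: str) -> str:
    """Uppercase the text, map non-letters to spaces, and collapse runs of spaces."""
    mapped = "".join(ch if ch in ALPHABET else " " for ch in text.upper())
    return " ".join(mapped.split())


def _entropy(counts: Counter) -> float:
    """Shannon entropy in bits of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def f_n(text: str, n: int) -> float:
    """Plug-in estimate of F_N: bits for the next letter given the preceding N-1 letters.

    F_N = H(N-grams) - H(their (N-1)-letter prefixes). On short samples this
    estimate is biased low, because most long N-grams are seen only once.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    prefixes: Counter = Counter()
    for gram, count in grams.items():
        prefixes[gram[:-1]] += count  # marginal over the first N-1 letters
    return _entropy(grams) - _entropy(prefixes)


def redundancy(entropy_bits: float, alphabet_size: int = len(ALPHABET)) -> float:
    """Redundancy as one minus the relative entropy: 1 - H / log2(alphabet size)."""
    return 1.0 - entropy_bits / math.log2(alphabet_size)


if __name__ == "__main__":
    sample = normalize("The quick brown fox jumps over the lazy dog. " * 20)
    for n in range(1, 5):
        h = f_n(sample, n)
        print(f"F_{n} = {h:.3f} bits/letter, redundancy = {redundancy(h):.2f}")
```

As a consistency check on the figures quoted above, `redundancy(2.3, 26)` is about 0.51, in line with "2.3 bits per letter, redundancy about 50 per cent" for a 26-letter alphabet. On realistic samples the count-based F_N typically falls as N grows, but for large N it collapses toward zero simply for lack of data, which illustrates why a method sensitive to long-range statistics is needed.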
From this analysis it appears that, in ordinary literary English, the long-range statistical effects (up to 100 letters) reduce the entropy to something of the order of one bit per letter, with a corresponding redundancy of roughly 75%. The redundancy may be still higher when structure extending over paragraphs, chapters, etc. is included. However, as the lengths involved are increased, the parameters in question become more…
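The upper and lower bounds mentioned above come from prediction experiments. The abstract does not state them, so the Python sketch below assumes the form usually associated with the guessing game: a subject guesses each next letter until correct, q_i is the relative frequency with which the right letter is found on guess i, and then sum_i i (q_i - q_{i+1}) log2 i <= F_N <= -sum_i q_i log2 q_i. The function names, the 27-symbol default alphabet, and the counts in the usage note are hypothetical illustrations, not experimental data from the paper.

```python
import math
from typing import Sequence


def prediction_bounds(guess_counts: Sequence[int]) -> tuple[float, float]:
    """Lower and upper bounds (bits/letter) on F_N from guessing-game counts.

    guess_counts[i] is how often the correct letter was found on guess i + 1.
    The frequencies q_i must be non-increasing, as they would be for an ideal
    predictor that guesses letters in order of decreasing probability.
    Lower bound: sum_i i * (q_i - q_{i+1}) * log2(i), taking q beyond the end as 0.
    Upper bound: -sum_i q_i * log2(q_i).
    """
    total = sum(guess_counts)
    if total == 0:
        raise ValueError("need at least one observed guess")
    q = [c / total for c in guess_counts] + [0.0]  # append q_{K+1} = 0
    if any(a < b for a, b in zip(q, q[1:])):
        raise ValueError("guess frequencies must be non-increasing")
    # q[i - 1] is q_i and q[i] is q_{i+1} in the 1-indexed formulas above.
    lower = sum(i * (q[i - 1] - q[i]) * math.log2(i) for i in range(1, len(q)))
    upper = -sum(p * math.log2(p) for p in q if p > 0)
    return lower, upper


def redundancy_bounds(guess_counts: Sequence[int], alphabet_size: int = 27) -> tuple[float, float]:
    """Turn entropy bounds into redundancy bounds via R = 1 - H / log2(alphabet size)."""
    lower_h, upper_h = prediction_bounds(guess_counts)
    h_max = math.log2(alphabet_size)
    # A higher entropy bound gives a lower redundancy bound, and vice versa.
    return 1.0 - upper_h / h_max, 1.0 - lower_h / h_max
```

For example, `prediction_bounds([8, 2])` (correct on the first guess 8 times out of 10, on the second guess twice) gives roughly (0.40, 0.72) bits per letter. When every one of 27 guesses is equally likely, both bounds equal log2 27, so the gap between them closes only in extreme cases; in general the experiment brackets the entropy rather than pinning it down.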


A Mathematical Theory of Communication

Shannon, Weaver

1948

15,804

8,170


Communication Theory of Secrecy Systems

Shannon

1949

7,677

4,191


The Mathematical Theory of Communication

Shannon, Weaver, Wiener

1950

5,435

4,083


Communication in the Presence of Noise

Shannon

1949

Proc. IRE

5,548

2,879

