We discuss algorithms for estimating the Shannon entropy h of finite symbol sequences with long-range correlations. In particular, we consider algorithms which estimate h from the code lengths produced by some compression algorithm. Our interest is in describing their convergence with sequence length, assuming no limits on the space and time complexities of the compression algorithms. A scaling law is proposed for extrapolation from finite sample lengths. This is applied to sequences of dynamical systems in non-trivial chaotic regimes, a 1-D cellular automaton, and to written English texts.

Partially random chains of symbols $s_1, s_2, s_3, \ldots$ drawn from some finite alphabet (we restrict ourselves here to finite alphabets, though most of our considerations would also apply to countable ones) appear in practically all sciences. Examples include spins in one-dimensional magnets, written texts, DNA sequences, geological records of the orientation of the magnetic field of the earth, and bits in the storage and transmission of digital data. An interesting question in all these contexts is to what degree these sequences can be "compressed" without losing any information. This question was first posed by Shannon [1] in a probabilistic context. He showed that the relevant quantity is the entropy (or average information content) h, which in the case of magnets coincides with the thermodynamic entropy of the spin degrees of freedom. Estimating the entropy is non-trivial in the presence of complex and long-range correlations. In that case one essentially has to understand these correlations perfectly in order to achieve optimal compression and entropy estimation, and thus estimates of h also measure the degree to which the structure of the sequence is understood.
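As a rough illustration of the idea of estimating h from compression code lengths (a minimal sketch only, not one of the algorithms analyzed in this paper), the following Python snippet uses the general-purpose zlib compressor: the compressed length in bits, divided by the number of symbols, gives an upper-bound estimate of h that approaches the true entropy only to the extent that the compressor captures the correlations in the sequence.

```python
import math
import zlib


def entropy_estimate_bits_per_symbol(symbols):
    """Crude upper-bound estimate of h (bits/symbol) from a compressor's code length.

    Illustration only: zlib is a generic LZ77/Huffman compressor, not one of the
    algorithms discussed here, and its header overhead biases short sequences.
    """
    data = "".join(symbols).encode("ascii")              # map symbols to bytes
    code_length_bits = 8 * len(zlib.compress(data, 9))   # total compressed size in bits
    return code_length_bits / len(symbols)


if __name__ == "__main__":
    import random
    random.seed(0)
    # Binary sequence from a biased coin; its true entropy is
    # h = -p log2 p - (1 - p) log2 (1 - p).
    p = 0.1
    n = 100_000
    seq = ["1" if random.random() < p else "0" for _ in range(n)]
    h_true = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    print(f"compression estimate: {entropy_estimate_bits_per_symbol(seq):.3f} bits/symbol")
    print(f"true entropy:         {h_true:.3f} bits/symbol")
```

For such a finite sample the estimate typically lies somewhat above the true value; how such compression-based estimates converge to h with increasing sequence length is precisely the question addressed in the text.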