The information content of symbolic sequences (such as nucleic or amino acid sequences, but also neuronal firings or strings of letters) can be calculated from an ensemble of such sequences, but because information cannot be assigned to a single sequence, we cannot correlate information with other observables attached to that sequence. Here we show that an information score obtained from multivariate (multiple-variable) correlations within sequences of a ‘training’ ensemble can be used to predict observables of out-of-sample sequences with an accuracy that scales with the complexity of the correlations, showing that functional information emerges from a hierarchy of multi-variable correlations.
This article is part of the theme issue ‘Emergent phenomena in complex physical and socio-technical systems: from cells to societies’.