The information content of symbolic sequences (such as nucleic or amino acid sequences, but also neuronal firings or strings of letters) can be calculated from an ensemble of such sequences, but because information cannot be assigned to single sequences, we cannot correlate information to other observables attached to the sequence. Here we show that an information
score
obtained from multivariate (multiple-variable) correlations within sequences of a ‘training’ ensemble can be used to predict observables of out-of-sample sequences with an accuracy that scales with the complexity of correlations, showing that functional information emerges from a hierarchy of multi-variable correlations.
This article is part of the theme issue ‘Emergent phenomena in complex physical and socio-technical systems: from cells to societies’.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.