Many systems in nature can be described using discrete input–output maps. Without knowing details about a map, there may seem to be no a priori reason to expect that a randomly chosen input would be more likely to generate one output over another. Here, by extending fundamental results from algorithmic information theory, we show instead that for many real-world maps, the a priori probability P(x) that randomly sampled inputs generate a particular output x decays exponentially with the approximate Kolmogorov complexity of that output. These input–output maps are biased towards simplicity. We derive an upper bound P(x) ≲ , which is tight for most inputs. The constants a and b, as well as many properties of P(x), can be predicted with minimal knowledge of the map. We explore this strong bias towards simple outputs in systems ranging from the folding of RNA secondary structures to systems of coupled ordinary differential equations to a stochastic financial trading model.
Significance Why does evolution favor symmetric structures when they only represent a minute subset of all possible forms? Just as monkeys randomly typing into a computer language will preferentially produce outputs that can be generated by shorter algorithms, so the coding theorem from algorithmic information theory predicts that random mutations, when decoded by the process of development, preferentially produce phenotypes with shorter algorithmic descriptions. Since symmetric structures need less information to encode, they are much more likely to appear as potential variation. Combined with an arrival-of-the-frequent mechanism, this algorithmic bias predicts a much higher prevalence of low-complexity (high-symmetry) phenotypes than follows from natural selection alone and also explains patterns observed in protein complexes, RNA secondary structures, and a gene regulatory network.
In 2020, the term ‘infodemic’ rose from relative obscurity to becoming a popular catch-all metaphor, representing the perils of fast, wide-spreading (false) information about the coronavirus pandemic. It featured in thousands of academic publications and received widespread attention from policymakers and the media. In this article, we trace the origins and use of the ‘infodemic’ metaphor and examine the blind spots inherent in this seemingly intuitive term. Drawing from literature in the cognitive sciences and communication studies, we show why information does not spread like a virus and point out how the ‘infodemic’ metaphor can be misleading, as it conflates multiple forms of social behaviour, oversimplifies a complex situation and helps constitute a phenomenon for which concrete evidence remains patchy. We point out the existing tension between the usefulness of the widespread use of the term ‘infodemic’ and its uncritical adoption, which we argue can do more harm than good, potentially diluting the quality of academic work, public discourse and contributing to state overreach in policymaking.
Thousands of new news articles appear daily in outlets in different languages. Understanding which articles refer to the same story can not only improve applications like news aggregation but enable cross-linguistic analysis of media consumption and attention. However, assessing the similarity of stories in news articles is challenging due to the different dimensions in which a story might vary, e.g., two articles may have substantial textual overlap but describe similar events that happened years apart. To address this challenge, we introduce a new dataset of nearly 10,000 news article pairs spanning 18 language combinations annotated for seven dimensions of similarity as SemEval 2022 Task 8. Here, we present an overview of the task, the best performing submissions, and the frontiers and challenges for measuring multilingual news article similarity. While the participants of this SemEval task contributed very strong models, achieving up to 0.818 correlation with gold standard labels across languages, human annotators are capable of reaching higher correlations, suggesting space for further progress.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.