According to Barlow's seminal "efficient coding hypothesis," the coding strategy of sensory neurons should be matched to the statistics of stimuli that occur in an animal's natural habitat. Using an automatic search technique, we here test this hypothesis and identify stimulus ensembles that sensory neurons are optimized for. Focusing on grasshopper auditory receptor neurons, we find that their optimal stimulus ensembles differ from the natural environment, but largely overlap with a behaviorally important sub-ensemble of the natural sounds. This indicates that the receptors are optimized for peak rather than average performance. More generally, our results suggest that the coding strategies of sensory neurons are heavily influenced by differences in behavioral relevance among natural stimuli.
This paper presents a detailed survey of word co-occurrence measures used in natural language processing. Word co-occurrence information is vital for accurate computational treatment of text: it is important to distinguish words that combine freely with other words from words whose ability to form phrases is restricted. The latter words, together with their typical co-occurring companions, are called collocations. To detect collocations, many word co-occurrence measures, also called association measures, are used to identify the high degree of cohesion between words in collocations, as opposed to the low degree of cohesion in free word combinations. We describe these association measures, grouping them into classes according to the approaches and mathematical models used to formalize word co-occurrence.
Linguistics, as the scientific study of human language, aims to describe and explain it. However, the validity of a linguistic theory is difficult to prove because of the volatile nature of language as a human convention and the impossibility of covering all real-life linguistic data. In spite of these problems, computational techniques and modeling can provide evidence to verify or falsify linguistic theories. As a case study, we conducted a series of computer experiments on a corpus of Spanish verb-noun collocations using machine learning methods, in order to test the linguistic claim that collocations in a language do not form an unstructured collection but are language items related via what we call collocational isomorphism, represented by the lexical functions of Meaning-Text Theory. Our experiments allowed us to verify this claim. Moreover, they suggested that semantic considerations are more important than statistical ones in defining the notion of collocation.
Abstract: Linguistics, as the scientific study of human language, seeks to describe and explain it. However, it is difficult to demonstrate the validity of any linguistic theory, both because of the changeable nature of language as a human convention and because it is impossible to examine everything that is spoken and written in real life. Despite these problems, computational techniques and models can provide evidence by which linguistic theories can be confirmed or refuted. In a case study, we carried out a series of computer experiments on a corpus of Spanish verb-noun collocations using machine learning methods, in order to test the linguistic claim that the collocations of a language do not form an unstructured group but are linguistic units related through what we call collocational isomorphism, represented by the lexical functions of Meaning-Text Theory. Our experiments allow us to verify this claim. Moreover, the experiments suggest that semantic considerations are more important than statistical ones in defining collocations. Keywords: Collocations, lexical functions, machine learning.