2005
DOI: 10.1073/pnas.0409746102
Unsupervised learning of natural languages

Abstract: We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The ADIOS (automatic distillation of structure) algorithm relies on a statistical method for pattern …

Cited by 230 publications (233 citation statements), 2007–2023. References 24 publications.
“…These results also increase the credibility of our binning method for assigning 'viral-like' versus 'non-viral' notations, as the identified motifs were found almost exclusively in the 'viral-like' file (statistically significant based on hypergeometric distribution; see Methods and Supplementary Table S3). Interestingly, some of these viral-specific motifs, as well as others in other regions, were also identified independently by the unsupervised de novo motif extraction (MEX) algorithm (Solan et al., 2005; Kunik et al., 2007; a list of different viral-specific D1 peptides identified by the MEX algorithm is presented in Supplementary File S3).…”
Section: Results
confidence: 99%
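The hypergeometric significance test mentioned in the excerpt above can be sketched as follows. This is a minimal illustration of the general technique, not the cited paper's implementation; the counts are toy numbers, not data from that study.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """Upper-tail probability P(X >= k) for X ~ Hypergeometric(N, K, n):
    the chance of seeing at least k motif hits among n 'viral-like'
    sequences, when K of the N total sequences carry the motif."""
    return sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(K, n) + 1)
    ) / comb(N, n)

# Toy numbers: 100 sequences, a motif present in 20 of them; of 10
# 'viral-like' sequences, 8 carry the motif. Is that enrichment
# surprising under random assignment?
p = hypergeom_sf(8, 100, 20, 10)
```

A small `p` here would indicate that the motif is concentrated in the 'viral-like' bin far beyond chance, which is the logic behind the enrichment claim in the excerpt.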
“…Onnis et al [54] showed that when sentences that include common words are presented sequentially or only one sentence apart, segmentation is better than when the same corpus of sentences is presented in a scrambled random order. Note that this result would not follow in models that did not require a learning window, for example, in models where all the data are first acquired and then analysed, as in most computational models for word learning [17][18][19]. Moreover, recent analyses of child-directed speech show that parents and carers behave as if they know that this proximity of sentences with common words is necessary.…”
Section: Predictions, Supportive Evidence and Future Work
confidence: 99%
“…input mechanisms in social learning [16]), but these mechanisms are usually not incorporated into learning models as a way of guiding data selection. For example, computational models for language acquisition use large datasets of child-directed speech without using attentional or communicational cues for data selection [17][18][19]. We believe that much of the learning is already determined by the selection of data to acquire.…”
Section: Introduction
confidence: 99%
“…(Here we intentionally gloss over the distinction between acceptability and grammaticality.) This consideration suggests that the generative performance of a grammar could be measured by two figures: RECALL, defined as the proportion of unfamiliar sentences that a parser based on the grammar accepts, and PRECISION, defined as the proportion of novel sentences generated by the grammar that are deemed acceptable by native-speaker subjects, preferably in a blind, controlled test (Solan et al., 2005). These definitions of recall and precision are related but not identical to those used in NLP (Klein & Manning, 2002).…”
Section: Introduction and Overview
confidence: 99%
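The RECALL and PRECISION measures defined in the excerpt above can be sketched directly from their definitions. The parser and the acceptability judgments below are hypothetical placeholder predicates for illustration only, not the evaluation used in the paper.

```python
def grammar_recall(parser_accepts, unfamiliar_sentences):
    """RECALL: proportion of unfamiliar (held-out) sentences that a
    parser based on the learned grammar accepts."""
    return sum(1 for s in unfamiliar_sentences if parser_accepts(s)) \
        / len(unfamiliar_sentences)

def grammar_precision(generated_sentences, judged_acceptable):
    """PRECISION: proportion of novel sentences generated by the grammar
    that native-speaker subjects judge acceptable."""
    return sum(1 for s in generated_sentences if judged_acceptable(s)) \
        / len(generated_sentences)

# Toy illustration with stand-in predicates.
parser_accepts = lambda s: "the" in s.split()
held_out = ["the cat sat", "dog runs", "the dog barks", "cats"]
recall = grammar_recall(parser_accepts, held_out)          # 2 of 4 accepted

judged_ok = lambda s: len(s.split()) >= 2
generated = ["the cat", "cat", "the dog runs"]
precision = grammar_precision(generated, judged_ok)        # 2 of 3 acceptable
```

Note the asymmetry the excerpt emphasizes: recall is computed against sentences the learner never saw, while precision requires human judgments on sentences the grammar itself produces.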