Commentary on the 6th International Symposium of Animal Functional Genomics

Ajmone‐Marsan, Paolo; Stella, Alessandra

doi:10.1186/s12711-016-0276-z

Cited by 3 publications

(3 citation statements)

References 6 publications


“…Similar to other crops, in maize, less than half of the genome sequence is expected to be shared between inbred lines [4]. Building accurate models from expensive data derived from reference line(s) will enable breeders to project that information to other genotypes for use in genomic selection models and to prioritize regions of the genome to edit using strategies such as CRISPR technology [5, 6].…”

Section: Introduction (mentioning)

confidence: 99%

A k-mer grammar analysis to uncover maize regulatory architecture

Mejía‐Guerra

Buckler

2019

BMC Plant Biol

Background: Only a small percentage of the genome sequence is involved in regulation of gene expression, but to biochemically identify this portion is expensive and laborious. In species like maize, with diverse intergenic regions and lots of repetitive elements, this is an especially challenging problem that limits the use of the data from one line to the other. While regulatory regions are rare, they do have characteristic chromatin contexts and sequence organization (the grammar) with which they can be identified.

Results: We developed a computational framework to exploit this sequence arrangement. The models learn to classify regulatory regions based on sequence features - k-mers. To do this, we borrowed two approaches from the field of natural language processing: (1) “bag-of-words” which is commonly used for differentially weighting key words in tasks like sentiment analyses, and (2) a vector-space model using word2vec (vector-k-mers), that captures semantic and linguistic relationships between words. We built “bag-of-k-mers” and “vector-k-mers” models that distinguish between regulatory and non-regulatory regions with an average accuracy above 90%. Our “bag-of-k-mers” achieved higher overall accuracy, while the “vector-k-mers” models were more useful in highlighting key groups of sequences within the regulatory regions.

Conclusions: These models now provide powerful tools to annotate regulatory regions in other maize lines beyond the reference, at low cost and with high accuracy.

Electronic supplementary material: The online version of this article (10.1186/s12870-019-1693-2) contains supplementary material, which is available to authorized users.
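The "bag-of-k-mers" idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`kmers`, `bag_of_kmers`, `tfidf_vectors`) are hypothetical, the smoothed TF-IDF weighting is assumed here only as one common way to weight words differentially in bag-of-words models, and the classifier that would consume these vectors is omitted.

```python
from collections import Counter
import math


def kmers(seq, k):
    """Return all overlapping k-mers of a DNA sequence, in order."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def bag_of_kmers(seq, k):
    """Count k-mer occurrences, discarding their positions (the 'bag')."""
    return Counter(kmers(seq, k))


def tfidf_vectors(seqs, k):
    """Weight k-mer counts by a smoothed inverse document frequency.

    Assumption: TF-IDF stands in for whatever weighting the paper uses.
    """
    bags = [bag_of_kmers(s, k) for s in seqs]
    vocab = sorted(set().union(*bags))
    n = len(seqs)
    # Document frequency: number of sequences containing each k-mer.
    df = {w: sum(1 for b in bags if w in b) for w in vocab}
    # Smoothed IDF, so k-mers present in every sequence keep weight 1.
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    return vocab, [[b[w] * idf[w] for w in vocab] for b in bags]


if __name__ == "__main__":
    vocab, vectors = tfidf_vectors(["AAAA", "AAAC"], k=2)
    print(vocab)
    print(vectors)
```

Each sequence becomes a fixed-length numeric vector over the shared k-mer vocabulary, which is the form a standard classifier would need to separate regulatory from non-regulatory regions.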

“…After publication of this paper [1], we noted that the “Acknowledgement” section was incomplete and should be as follows:…”

Section: Erratum To: Genet Sel Evol (2016) 48:97 Doi 101186/s12711-0 (mentioning)

confidence: 99%

Erratum to: Commentary on the 6th International Symposium of Animal Functional Genomics

Ajmone‐Marsan

Stella

2017

Genet Sel Evol

Self Cite

“…In maize, less than half of the genome sequence is expected to be shared between inbred lines [4]. Building accurate models from expensive data derived from the maize reference line will enable breeders to broadcast that information to other genotypes for use in genomic selection models and to prioritize regions of the genome to edit using strategies such as CRISPR [5,6].…”

Section: Introduction (mentioning)

confidence: 99%

k-mer grammar uncovers maize regulatory architecture

Mejía‐Guerra

Buckler

2017

Preprint

Only a small percentage of the genome sequence is involved in regulation of gene expression, but to biochemically identify this portion is expensive and laborious. In species like maize, with diverse intergenic regions and lots of repetitive elements, this is an especially challenging problem. While regulatory regions are rare, they do have characteristic chromatin contexts and sequence organization (the grammar) with which they can be identified. We developed a computational framework to exploit this sequence arrangement. The models learn to classify regulatory regions based on sequence features - k-mers. To do this, we borrowed two approaches from the field of natural language processing: (1) "bag-of-words" which is commonly used for differentially weighting key words in tasks like sentiment analyses, and (2) a vector-space model using word2vec (vector-k-mers), that captures semantic and linguistic relationships between words. We built "bag-of-k-mers" and "vector-k-mers" models that distinguish between regulatory and non-regulatory regions with an accuracy above 90%. Our "bag-of-k-mers" achieved higher overall accuracy, while the "vector-k-mers" models were more useful in highlighting key groups of sequences within the regulatory regions. These models now provide powerful tools to annotate regulatory regions in other maize lines beyond the reference, at low cost and with high accuracy.
