Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment

Gotoh, Osamu; Morita, Mariko; Nelson, David R.

doi:10.1186/1471-2105-15-189

Cited by 49 publications

(25 citation statements)

References 51 publications

“…Importantly, the sole reliance on gene models would have resulted in the homologous relationships for over 30 % of the genes going unfound. Central to this is the new crop of protein-to-genome aligners that are refining long established genome annotations [ 33 , 66 ].…”

Section: Discussion (mentioning)

confidence: 99%

Expanding the view on the evolution of the nematode dauer signalling pathways: refinement through gene gain and pathway co-option

et al. 2016

Background: Signalling pathways underlie development, behaviour and pathology. To understand patterns in the evolution of signalling pathways, we undertook a comprehensive investigation of the pathways that control the switch between growth and developmentally quiescent dauer in 24 species of nematodes spanning the phylum.

Results: Our analysis of 47 genes across these species indicates that the pathways and their interactions are not conserved throughout the Nematoda. For example, the TGF-β pathway was co-opted into dauer control relatively late in a lineage that led to the model species Caenorhabditis elegans. We show molecular adaptations described in C. elegans that are restricted to its genus or even just to the species. Similarly, our analyses both identify species where particular genes have been lost and situations where apparently incorrect orthologues have been identified.

Conclusions: Our analysis also highlights the difficulties of working with genome sequences from non-model species as reliance on the published gene models would have significantly restricted our understanding of how signalling pathways evolve. Our approach therefore offers a robust standard operating procedure for genomic comparisons.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2770-7) contains supplementary material, which is available to authorized users.

“…Recently, we have access to huge amounts of sequence data from widely divergent organisms, but the quality of the data is not always high because of the limitations of sequencing technologies. In the case of amino acid sequence data, the difficulty in eukaryotic gene prediction [28–30] also results in errors in data. It might be possible to automatically exclude such problematic data in certain cases, but sometimes, biologically important information is in low-quality sequences, especially when interest is in nonmodel organisms.…”

Section: Interactive Sequence Choice and Visualization (mentioning)

confidence: 99%

MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization

Katoh; Rozewicki; Yamada

2017

Briefings in Bioinformatics

This article describes several features in the MAFFT online service for multiple sequence alignment (MSA). As a result of recent advances in sequencing technologies, huge numbers of biological sequences are available and the need for MSAs with large numbers of sequences is increasing. To extract biologically relevant information from such data, sophistication of algorithms is necessary but not sufficient. Intuitive and interactive tools for experimental biologists to semiautomatically handle large data are becoming important. We are working on development of MAFFT toward these two directions. Here, we explain (i) the Web interface for recently developed options for large data and (ii) interactive usage to refine sequence data sets and MSAs.

“…Unfortunately, the quality of the sequences is not always high, partly due to limitations in sequencing technologies. Moreover, at the amino acid sequence level, a number of errors can be introduced due to difficulty in gene prediction ( Brent, 2005 ; Gotoh et al , 2014 ; Nagy and Patthy, 2013 ; Yandell and Ence, 2012 ). With incorrect reading frames, unrelated amino acid segments can appear in a set of homologous sequences.…”

Section: Introduction (mentioning)

confidence: 99%

A simple method to control over-alignment in the MAFFT multiple sequence alignment program

2016

Motivation: We present a new feature of the MAFFT multiple alignment program for suppressing over-alignment (aligning unrelated segments). Conventional MAFFT is highly sensitive in aligning conserved regions in remote homologs, but the risk of over-alignment is recently becoming greater, as low-quality or noisy sequences are increasing in protein sequence databases, due, for example, to sequencing errors and difficulty in gene prediction.

Results: The proposed method utilizes a variable scoring matrix for different pairs of sequences (or groups) in a single multiple sequence alignment, based on the global similarity of each pair. This method significantly increases the correctly gapped sites in real examples and in simulations under various conditions. Regarding sensitivity, the effect of the proposed method is slightly negative in real protein-based benchmarks, and mostly neutral in simulation-based benchmarks. This approach is based on natural biological reasoning and should be compatible with many methods based on dynamic programming for multiple sequence alignment.

Availability and implementation: The new feature is available in MAFFT versions 7.263 and higher. http://mafft.cbrc.jp/alignment/software/

Contact: katoh@ifrec.osaka-u.ac.jp

Supplementary information: Supplementary data are available at Bioinformatics online.