Genomic Analysis and Geographic Visualization of the Spread of Avian Influenza (H5N1)

Janies, Daniel; Hill, Andrew; Guralnick, Robert; Habib, Farhat; Waltari, Eric; Wheeler, Ward C.

doi:10.1080/10635150701266848

Cited by 64 publications

(91 citation statements)

References 30 publications

“…Evolutionary events such as mutations of molecular sequences are modeled to occur along the branches of the tree, between ancestor and descendant. Phylogenetic trees have many important applications in medical and biological research (see [1] for a summary) ranging from mapping of the emergence of infectious diseases [17] to the tests of whether Caribbean frogs have a common origin or represent multiple independent invasions of the islands [16].…”

Section: Introduction (mentioning)

confidence: 99%

Large-Scale Phylogenetic Analysis on Current HPC Architectures

Ott, Żola, Aluru, et al. 2008, Scientific Programming

Self Cite

Abstract. Phylogenetic inference is considered a grand challenge in Bioinformatics due to its immense computational requirements. The increasing popularity and availability of large multi-gene alignments as well as comprehensive datasets of single nucleotide polymorphisms (SNPs) in current biological studies, coupled with rapid accumulation of sequence data in general, pose new challenges for high performance computing. By example of RAxML, which is currently among the fastest and most accurate programs for phylogenetic inference under the Maximum Likelihood (ML) criterion, we demonstrate how the phylogenetic ML function can be efficiently scaled to current supercomputer architectures like the IBM BlueGene/L (BG/L) and SGI Altix. This is achieved by simultaneous exploitation of coarse- and fine-grained parallelism which is inherent to every ML-based biological analysis. Performance is assessed using datasets consisting of 270 sequences and 566,470 base pairs (haplotype map dataset), and 2,182 sequences and 51,089 base pairs, respectively. To the best of our knowledge, these are the largest datasets analyzed under ML to date. Experimental results indicate that the fine-grained parallelization scales well up to 1,024 processors. Moreover, a larger number of processors can be efficiently exploited by a combination of coarse- and fine-grained parallelism. We also demonstrate that our parallelization scales equally well on an AMD Opteron cluster with a less favorable network latency to processor speed ratio. Finally, we underline the practical relevance of our approach by including a biological discussion of the results from the haplotype map dataset analysis, which revealed novel biological insights via phylogenetic inference.

“…In parallel, major effort has been invested in compiling digital taxonomic name resources and the taxonomic literature (Koning et al, 2005;Sautter et al, 2006). Finally, display and visualization efforts include presenting compiled data on maps, or creating information pages with biodiversity data 'mashed up' with other types of data (Butler, 2006;Janies et al, 2007).…”

Section: Current State (mentioning)

confidence: 99%

The big questions for biodiversity informatics

Peterson, Knapp, Guralnick, et al. 2010, Systematics and Biodiversity

Self Cite

“…Location identifiers are mapped to the geographic center of each country, which is also identified with the standard two-letter country code. […] epidemiological hypotheses concerning HIV and other pathogens such as Influenza A (Janies et al 2007). For instance, the exceptional subtype distributions seen in Tanzania that lead it to cluster with countries in central Africa is consistent with the hypothesis that events such as the Tanzania-Uganda war, which ended in 1979, were responsible for founder events that introduced non-C subtypes into Tanzania, while C arrived later from elsewhere in east Africa (Serwadda et al 1985;Vasan et al 2006).…”

Section: Nonrecombinant HIV-1 Subtypes in Africa (mentioning)

confidence: 99%

GenGIS: A geospatial information system for genomic data

Parks, Porter, Churcher, et al. 2009, Genome Res.

The increasing availability of genetic sequence data associated with explicit geographic and ecological information is offering new opportunities to study the processes that shape biodiversity. The generation and testing of hypotheses using these data sets requires effective tools for mathematical and visual analysis that can integrate digital maps, ecological data, and large genetic, genomic, or metagenomic data sets. GenGIS is a free and open-source software package that supports the integration of digital map data with genetic sequences and environmental information from multiple sample sites. Essential bioinformatic and statistical tools are integrated into the software, allowing the user a wide range of analysis options for their sequence data. Data visualizations are combined with the cartographic display to yield a clear view of the relationship between geography and genomic diversity, with a particular focus on the hierarchical clustering of sites based on their similarity or phylogenetic proximity. Here we outline the features of GenGIS and demonstrate its application to georeferenced microbial metagenomic, HIV-1, and human mitochondrial DNA data sets.
