Phenotypic differences among species have long been systematically itemized and described by biologists in the process of investigating phylogenetic relationships and trait evolution. Traditionally, these descriptions have been expressed in natural language within the context of individual journal publications or monographs. Thus, this rich store of phenotype data has been largely unavailable for statistical and computational comparisons across studies or integration with other biological knowledge. Here we describe Phenex, a platform-independent desktop application designed to facilitate efficient and consistent annotation of phenotypic similarities and differences using Entity-Quality syntax, drawing on terms from community ontologies for anatomical entities, phenotypic qualities, and taxonomic names. Phenex can be configured to load ontologies for different taxonomic groups. The graphical user interface was developed for, and tested by, evolutionary biologists accustomed to working with lists of taxa, characters, character states, and character-by-taxon matrices. Annotation of phenotypic data using ontologies and globally unique taxonomic identifiers will allow biologists to better leverage decades of work in systematics and comparative morphology and contribute to an ever more useful web of linked biological data.
The reality of larger and larger molecular databases and the need to integrate data scalably have presented a major challenge for the use of phenotypic data. Morphology is currently described primarily in discrete publications, entrenched in non-computer-readable text, and requires enormous investments of time and resources to integrate across large numbers of taxa and studies. Here we present a new methodology, using ontology-based reasoning systems working with the Phenoscape Knowledgebase (KB), to automatically integrate large amounts of evolutionary character state descriptions into a synthetic character matrix of neomorphic (presence/absence) data. Using the KB, which includes more than 55 studies of sarcopterygian taxa, we generated a synthetic supermatrix of 1051 variable characters scored for 639 taxa, resulting in over 145,000 populated cells. Of these characters, over 76% were made variable through the addition of inferred presence/absence states derived by machine reasoning over the formal semantics of the source ontologies. Inferred data reduced the missing data in the variable character subset from 98.5% to 78.2%. Machine reasoning also enables the isolation of conflicts in the data, i.e., cells where both presence and absence are indicated; reports regarding conflicting data provenance can be generated automatically. Further, reasoning enables quantification and new visualizations of the data, here, for example, allowing identification of character space that has been undersampled across the fin-to-limb transition. The approach and methods demonstrated here to compute synthetic presence/absence supermatrices are applicable to any taxonomic and phenotypic slice across the tree of life, provided the data are semantically annotated. Because such data can also be linked to model organism genetics through computational scoring of phenotypic similarity, they open a rich set of future research questions into phenotype-to-genome relationships.
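The inference, conflict detection, and missing-data arithmetic described above can be illustrated with a small sketch. It is not the Phenoscape implementation: the `PART_OF` hierarchy, the taxon names, and the two propagation rules (presence of a part implies presence of its wholes; absence of a whole implies absence of its parts) are assumptions chosen for illustration, and a real workflow would rely on an OWL reasoner over the source ontologies.

```python
from itertools import product

# Hypothetical toy part_of hierarchy (part -> whole); not taken from the KB.
PART_OF = {"digit": "autopod", "autopod": "limb"}


def ancestors(entity):
    """Yield every structure that `entity` is transitively part of."""
    while entity in PART_OF:
        entity = PART_OF[entity]
        yield entity


def descendants(entity):
    """Yield every structure that is transitively part of `entity`."""
    for part, whole in PART_OF.items():
        if whole == entity:
            yield part
            yield from descendants(part)


def infer(asserted):
    """Expand asserted {(taxon, entity): state} into {(taxon, entity): set of states}.

    Assumed rules: 'present' propagates up to wholes, 'absent' down to parts.
    """
    cells = {}
    for (taxon, entity), state in asserted.items():
        cells.setdefault((taxon, entity), set()).add(state)
        related = ancestors(entity) if state == "present" else descendants(entity)
        for other in related:
            cells.setdefault((taxon, other), set()).add(state)
    return cells


def conflicts(cells):
    """Return cells where both presence and absence are indicated."""
    return sorted(key for key, states in cells.items() if len(states) == 2)


def missing_fraction(cells, taxa, entities):
    """Fraction of taxon-by-entity cells that carry no state at all."""
    filled = sum((t, e) in cells for t, e in product(taxa, entities))
    return 1 - filled / (len(taxa) * len(entities))


# Hypothetical asserted states, standing in for annotated character data.
asserted = {
    ("TaxonA", "digit"): "present",
    ("TaxonB", "limb"): "absent",
    ("TaxonC", "autopod"): "absent",
    ("TaxonC", "digit"): "present",
}
taxa = ["TaxonA", "TaxonB", "TaxonC"]
entities = ["digit", "autopod", "limb"]

cells = infer(asserted)
print("missing before:", missing_fraction(asserted, taxa, entities))
print("missing after: ", missing_fraction(cells, taxa, entities))
print("conflicts:     ", conflicts(cells))
```

In this toy input, the contradictory annotations for TaxonC surface as conflicting cells, while the inferred states fill cells that no source study scored directly.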
The 14th annual Bioinformatics Open Source Conference (BOSC) was held in Berlin in July 2013, bringing together over 100 bioinformatics researchers, developers and users of open source software. Since its inception in 2000, BOSC has been organised as a Special Interest Group (SIG) satellite meeting preceding the large International Conference on Intelligent Systems for Molecular Biology (ISMB), which is the annual meeting of the International Society for Computational Biology (ISCB). BOSC provides bioinformatics developers with a forum for communicating the results of their latest efforts to the wider research community, and a focused environment for developers and users to interact and share ideas about standards, software development practices, and practical techniques for solving bioinformatics problems. As in previous years, BOSC 2013 was preceded by a Codefest, a two-day hackathon that brings together bioinformatics open source project developers and members of the community and allows them to work collaboratively and achieve greater interoperability between tools developed by different groups. The session topics at BOSC 2013 included several that have been popular in previous years, including Cloud and Parallel Computing, Visualization, Software Interoperability, Genome-scale Data Management, and a session for updates on ongoing open source projects, as well as two new sessions: Translational Bioinformatics, recognizing the growing use of computational biology in medical applications, and Open Science and Reproducible Research. Open Science, a movement dedicated to making all aspects of scientific knowledge production freely available for reuse and extension, not only validates published results by allowing others to reproduce them, but also accelerates the pace of scientific discovery by enabling researchers to more efficiently build on previous work, rather than having to reinvent tools and reassemble data sets.
BOSC typically features two keynote talks by researchers who are influential in some aspect of open source bioinformatics. Our first keynote talk this year was by Cameron Neylon, the Advocacy Director for the Public Library of Science (PLOS), who is a prominent advocate for open science. He discussed the cultural issues that are hindering open science, and how openness in scientific collaborations can generate impact. Our second keynote speaker, Sean Eddy, who is perhaps best known as the author of the HMMER software suite, began his keynote talk with an inspiring history of how he got involved in bioinformatics and proceeded to argue that dedicating effort to thorough engineering in tool development, which is often shunned as incremental, can become the key to creating a lasting impact. With the increasing reliance of more and more fields of biology on computational tools to manage and analyze their data, BOSC is well positioned to stay relevant to life science, and thus life scientists, for many years to come.