The CIPRES Science Gateway is a community web application that provides public access to a set of parallel tree inference and multiple sequence alignment codes run on large computational resources. These resources are made available at no charge to users by the NSF Extreme Science and Engineering Discovery Environment (XSEDE) project. Here we describe the CIPRES RESTful application programmer interface (CRA), a web service that provides programmatic access to all resources and services currently offered by the CIPRES Science Gateway. Software developers can use the CRA to extend their web or desktop applications to include the ability to run MrBayes, BEAST, RAxML, MAFFT, and other computationally intensive algorithms on XSEDE. The CRA also makes it possible for individuals with modest scripting skills to access the same tools from the command line using curl, or through any scripting language. This report describes the CRA and its use in three web applications (Influenza Research Database – www.fludb.org, Virus Pathogen Resource – www.viprbrc.org, and MorphoBank – www.morphobank.org). The CRA is freely accessible to registered users at https://cipresrest.sdsc.edu/cipresrest/v1; supporting documentation and registration tools are available at https://www.phylo.org/restusers.
Advancement of understanding in paleontology and biology has always been hindered by difficulty in accessing comparative data. With current and burgeoning technology, the severity of this hindrance can be substantially reduced. Researchers and museum personnel generating three-dimensional (3-D) digital models of museum specimens can archive them using internet repositories that can then be explored and utilized by other researchers and private individuals without a museum trip. We focus on MorphoSource, the largest web archive for 3-D museum data at present. We describe the site, how to use it most effectively in its current form, and best practices for file formats and metadata inclusion to aid the growing community wishing to utilize it for distributing 3-D digital data. The potential rewards of successfully crowd sourcing the digitization of museum collections from the research community are great, as it should ensure rapid availability of the most important datasets. Challenges include long-term governance (i.e., maintaining site functionality, supporting large amounts of digital storage, and monitoring/updating file to prevent bit rot, which is the slow and random corruption of electronic data over time, and data format obsolescence, which is the problem of data becoming unreadable or ineffective because of the loss of functional software necessary for access), and utilization by the community (i.e., detecting and minimizing user error in creating data records, incentivizing data sharing by researchers and institutions alike, and protecting stakeholder rights to data, while maximizing accessibility and discoverability).MorphoSource serves as a proof-of-concept of how these kinds of challenges can be met. Accordingly, it is generally recognized as the most appropriate repository for large, raw datasets of fossil organisms and/or comparative samples. Its existence has begun to transform data transparency standards because journal reviewers, editors, and grant officers now often suggest or require that 3-D data be made available through this site.
A highly interoperable informatics infrastructure rapidly emerged to handle genomic data used for phylogenetics and was instrumental in the growth of molecular systematics. Parallel growth in software and databases to address needs peculiar to phylophenomics has been relatively slow and fragmented. Systematists currently face the challenge that Earth may hold tens of millions of species (living and fossil) to be described and classified. Grappling with research on this scale has increasingly resulted in work by teams, many constructing large phenomic supermatrices. Until now, phylogeneticists have managed data in single-user, filebased desktop software wholly unsuitable for real-time, team-based collaborative work. Furthermore, phenomic data often differ from genomic data in readily lending themselves to media representation (e.g. 2D and 3D images, video, sound). Phenomic data are a growing component of phylogenetics, and thus teams require the ability to record homology hypotheses using media and to share and archive these data. Here we describe MorphoBank, a web application and database leveraging software as a service methodology compatible with ''cloud'' computing technology for the construction of matrices of phenomic data. In its tenth year, and fully available to the scientific community at-large since inception, MorphoBank enables interactive collaboration not possible with desktop software, permitting self-assembling teams to develop matrices, in real time, with linked media in a secure web environment. MorphoBank also provides any user with tools to build character and media ontologies (rule sets) within matrices, and to display these as directed acyclic graphs. These rule sets record the phylogenetic interrelatedness of characters (e.g. if X is absent, Y is inapplicable, or X-Z characters share a media view). MorphoBank has enabled an order of magnitude increase in phylophenomic data collection: a recent collaboration by more than 25 researchers has produced a database of > 4500 phenomic characters supported by > 10 000 media.
The phenotype represents a critical interface between the genome and the environment in which organisms live and evolve. Phenotypic characters also are a rich source of biodiversity data for tree building, and they enable scientists to reconstruct the evolutionary history of organisms, including most fossil taxa, for which genetic data are unavailable. Therefore, phenotypic data are necessary for building a comprehensive Tree of Life. In contrast to recent advances in molecular sequencing, which has become faster and cheaper through recent technological advances, phenotypic data collection remains often prohibitively slow and expensive. The next-generation phenomics project is a collaborative, multidisciplinary effort to leverage advances in image analysis, crowdsourcing, and natural language processing to develop and implement novel approaches for discovering and scoring the phenome, the collection of phentotypic characters for a species. This research represents a new approach to data collection that has the potential to transform phylogenetics research and to enable rapid advances in constructing the Tree of Life. Our goal is to assemble large phenomic datasets built using new methods and to provide the public and scientific community with tools for phenomic data assembly that will enable rapid and automated study of phenotypes across the Tree of Life.
Scientists building the Tree of Life face an overwhelming challenge to categorize phenotypes (e.g., anatomy, physiology) from millions of living and fossil species. This biodiversity challenge far outstrips the capacities of trained scientific experts. Here we explore whether crowdsourcing can be used to collect matrix data on a large scale with the participation of nonexpert students, or "citizen scientists." Crowdsourcing, or data collection by nonexperts, frequently via the internet, has enabled scientists to tackle some large-scale data collection challenges too massive for individuals or scientific teams alone. The quality of work by nonexpert crowds is, however, often questioned and little data have been collected on how such crowds perform on complex tasks such as phylogenetic character coding. We studied a crowd of over 600 nonexperts and found that they could use images to identify anatomical similarity (hypotheses of homology) with an average accuracy of 82% compared with scores provided by experts in the field. This performance pattern held across the Tree of Life, from protists to vertebrates. We introduce a procedure that predicts the difficulty of each character and that can be used to assign harder characters to experts and easier characters to a nonexpert crowd for scoring. We test this procedure in a controlled experiment comparing crowd scores to those of experts and show that crowds can produce matrices with over 90% of cells scored correctly while reducing the number of cells to be scored by experts by 50%. Preparation time, including image collection and processing, for a crowdsourcing experiment is significant, and does not currently save time of scientific experts overall. However, if innovations in automation or robotics can reduce such effort, then large-scale implementation of our method could greatly increase the collective scientific knowledge of species phenotypes for phylogenetic tree building. For the field of crowdsourcing, we provide a rare study with ground truth, or an experimental control that many studies lack, and contribute new methods on how to coordinate the work of experts and nonexperts. We show that there are important instances in which crowd consensus is not a good proxy for correctness.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.