Semantic heterogeneity is becoming increasingly prominent in bioinformatics domains that deal with constantly expanding, dynamic, and often very large datasets from various distributed sources. Metadata is the key component for effective information integration. Traditional approaches to reconciling semantic heterogeneity rely on standards or mediation-based methods. These approaches have had limited success in addressing the general semantic heterogeneity problem, and by themselves they are unlikely to succeed in bioinformatics domains, where one faces the additional complexity of keeping pace with the speed at which data and semantic heterogeneity are being generated. This paper presents a methodology for reconciling semantic heterogeneity of metadata in bioinformatics data sources. The approach is based on the proposition that by globally monitoring, clustering, and visualizing bioinformatics metadata across disparately created data sources, patterns of practice can be identified. Identifying these patterns can facilitate semantic reconciliation of metadata in current data and mitigate semantic heterogeneity in future data by promoting the sharing and reuse of existing metadata. To instantiate the methodology, a research architecture, MicroSEEDS, is presented, and its implementation and envisioned uses are discussed.
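As a rough illustration of what clustering metadata across independently created sources could look like, the sketch below groups field names that differ only in spelling conventions. It is not the MicroSEEDS implementation, which the abstract does not describe. The source names, field names, the 0.8 similarity threshold, and the greedy grouping strategy are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a field name and drop anything that is not a letter or digit."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def similar(a, b, threshold=0.8):
    """Return True when two field names are close after normalization.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


def cluster_fields(sources, threshold=0.8):
    """Greedily group (source, field) pairs whose names look alike.

    `sources` maps a hypothetical source name to its list of metadata fields.
    Each field joins the first cluster whose representative (first member)
    is similar to it; otherwise it starts a new cluster.
    """
    clusters = []
    for source, fields in sources.items():
        for field in fields:
            for cluster in clusters:
                if similar(field, cluster[0][1], threshold):
                    cluster.append((source, field))
                    break
            else:
                clusters.append([(source, field)])
    return clusters


if __name__ == "__main__":
    # Hypothetical metadata from two independently created sources.
    example = {
        "lab_a": ["Sample_ID", "organism"],
        "lab_b": ["sampleid", "Organism"],
    }
    for group in cluster_fields(example):
        print(group)
```

Clusters that span several sources hint at a shared naming practice. A field that stays alone in its cluster is a candidate for reconciliation or for reuse of an existing name.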
The multiple genome sequence alignment problem is amenable to parallelization, which makes long sequences tractable. Although computing the aligned sequences requires communication, a proper distribution of the work reduces the overall problem to a set of tasks that are solved independently and then merged. A parallel algorithm for the alignment of multiple genome sequences is described. The algorithm is experimentally evaluated in a distributed Grid environment that provides highly scalable, low-cost computation. The Grid environment is compared with a traditional cluster environment to assess its effectiveness for large-scale computational biology problems.
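The idea of splitting the work into independent tasks and merging the results can be sketched with the pairwise scoring step that many progressive alignment methods begin with. This is not the algorithm from the paper, whose details the abstract does not give. The scoring values, the function names, and the `map_fn` parameter are assumptions chosen for illustration.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of two sequences (Needleman-Wunsch).

    Only the previous row of the dynamic-programming table is kept.
    The scoring values are illustrative defaults.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def score_pair(task):
    """One independent task: score a single pair of sequences."""
    i, j, a, b = task
    return i, j, nw_score(a, b)


def pairwise_scores(seqs, map_fn=map):
    """Distribute every sequence pair as a task, then merge the results.

    `map_fn` is a hypothetical hook: the built-in `map` runs the tasks
    serially, while an executor's `map` method would spread them over workers.
    """
    tasks = [
        (i, j, seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    ]
    merged = {}
    for i, j, score in map_fn(score_pair, tasks):
        merged[(i, j)] = score
    return merged


if __name__ == "__main__":
    print(pairwise_scores(["AC", "AG", "AC"]))
```

Because each task depends only on its own pair of sequences, passing `ProcessPoolExecutor().map` from `concurrent.futures` as `map_fn` would run the tasks on separate workers without changing the merge step. Communication is then limited to sending the tasks out and collecting the scores.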