Abstract

The goal of this project was to develop tools to facilitate data transformations between the heterogeneous data sources found throughout biomedical applications. Such transformations are necessary both when sharing data between different groups working on related problems and when querying data spread over different databases, files, and software analysis packages.

We summarize the progress made during the term of this grant in developing the Kleisli query system. Kleisli implements a high-level query language called the Collection Programming Language (CPL). It also contains drivers to access many types of databases found throughout the genomics community, including:
relational databases (Oracle and Sybase); ASN.1; BLAST and FASTA; Shore (and potentially other object-oriented databases); EcoCyc (a Lisp-based metabolic pathways system); SRS; Medline; MMDB; OPWCTL; and the US and IBM patent databases.
The query system is based on a complex data model, which provides a natural encoding of data in this domain. It uses a number of optimization strategies and sophisticated parallel execution strategies to improve query performance. The system has been used for applications with the Center for Bioinformatics at the University of Pennsylvania and within projects at SmithKline Beecham. It also forms the basis of the Tambis system developed at the University of Manchester, UK.

We then describe a re-implementation of Kleisli called K2. K2 is implemented in Java, whereas Kleisli was implemented in the rapid-prototyping language ML. K2 also allows an ontology to be overlaid on the data to aid integration. The report closes with some preliminary observations on the use of XML for data integration.