Objectives
Using agile software development practices, develop and evaluate an architecture and implementation for reliable and user-friendly self-service management of bioinformatic data stored in the cloud.
Materials and methods
Comprehensive Oncology Research Environment (CORE) Browser is a new open-source web application for cancer researchers to manage sequencing data organized in a flexible format in Amazon Simple Storage Service (S3) buckets. It has a microservices- and hypermedia-based architecture, which we integrated with Test-Driven Development (TDD), the iterative writing of computable specifications for how software should work prior to development. Relying on repeating patterns found in hypermedia-based architectures, we hypothesized that hypermedia would permit developing test “templates” that can be parameterized and executed for each microservice, maximizing code coverage while minimizing effort.
Results
After one-and-a-half years of development, the CORE Browser backend had 121 test templates and 875 custom tests that were parameterized and executed 3031 times, providing 78% code coverage.
Discussion
Architecting to permit test reuse through a hypermedia approach was a key success factor for our testing efforts. CORE Browser’s application of hypermedia and TDD illustrates one way to integrate software engineering methods into data-intensive networked applications. Separating bioinformatic data management from analysis distinguishes this platform from others in bioinformatics and may provide stable data management while permitting analysis methods to advance more rapidly.
Conclusion
Software engineering practices are underutilized in informatics. Similar informatics projects will more likely succeed through application of good architecture and automated testing. Our approach is broadly applicable to data management tools involving cloud data storage.