Understanding and interpreting the inherently uncertain nature of complex biological systems, as well as the time to an event in these systems, are notable challenges in the field of bioinformatics. Overcoming these challenges could lead to scientific discoveries, for example by paving the way for the design of new drugs that target specific diseases such as cancer, or by helping to apply more effective treatments for these diseases. In general, reverse engineering these types of biological systems from online datasets is difficult. In particular, finding a unique solution is hard because of the complexity of the systems and the small sample sizes of the datasets. The problem remains unsolved because of this uncertainty and the often intractable solution space of these systems.
The term "uncertainty" describes the application-based margin of significance, validity, and efficiency of inferred or predictive models in their ability to extract characteristic properties and features describing the observed state of a given biological system. In this work, uncertainties within two specific bioinformatics domains are considered, namely "gene regulatory network reconstruction" (in which gene interactions/relationships within a biological entity are inferred from gene expression data) and "cancer survivorship prediction" (in which patient survival rates are predicted based on clinical factors and treatment outcomes). One approach to reducing uncertainty is to apply constraints that have particular relevance to each application domain. In gene network reconstruction, for instance, constraints such as sparsity, stability, and modularity can inform the inferred reconstructions and reduce their uncertainty. In cancer survival prediction, by contrast, the uncertainty lies in determining which clinical features (or feature aggregates) can improve the associated prediction models. The inherent lack of understanding of how, why, and when such constraints should be applied, however, prompts the need for a radically new approach.
In this dissertation, a new approach is thus considered to aid human expert users in understanding and exploring the inherent uncertainties associated with these two bioinformatics domains. Specifically, a novel set of tools is introduced and developed to assist in evidence gathering, constraint definition, and refinement of models toward the discovery of better solutions. This dissertation employs computational approaches, including convex optimization and feature selection/aggregation, in order to increase the chances of finding a unique solution. These approaches are realized through three novel interactive tools that employ tangible interaction in combination with graphical visualization to enable experts to query and manipulate the data. Tangible interaction provides physical embodiments of data and computational functions in support of learning and collaboration. Using these approaches, the dissertation demonstrates: (1) a modified stability constraint for reconstructing gene regulatory networks that improves the accuracy of the predicted networks, (2) a novel modularity constraint (neighbor norm) for extracting available structures in the data, which is validated with the Laplacian eigenvalue spectrum, and (3) a hybrid method for estimating overall survival and inferring effective prognostic factors for patients with advanced prostate cancer that improves the accuracy of survival analysis.