Master's in Information Technologies applied to Biological and Medical Sciences, 2009

Summary

A correct understanding of how biological systems work depends on the study of the mechanisms that regulate gene expression. These mechanisms control when and for how long the information encoded in a gene is used, and can act at several stages of the gene expression process. In the present work, the stage under analysis is transcription, in which the DNA sequence of a gene is transformed into an RNA sequence, which will later give rise to a protein.

The regulation of transcription centres on the action of a class of regulatory proteins called transcription factors. These bind to the DNA strand in the region close to the start of a gene (the promoter region), enhancing or inhibiting the binding of the protein responsible for the transcription process.
Abstract

The understanding of biological systems depends on the study of the mechanisms that regulate gene expression. These mechanisms control when and for how long the information coded in a gene is used, and can act on several of the steps in the gene expression process. In the present work, the step of interest is transcription, where the DNA sequence of a gene is transformed into an RNA sequence, which will later be used to synthesise a protein.

Knowledge about gene regulation is available mainly in the literature. Although there are currently multiple public biological databases, the majority of them contain data on biological entities but not explicitly on gene regulations. In order to provide the scientific community with data on Saccharomyces cerevisiae transcription regulations, a Portuguese public repository maintained by manual curation of scientific literature, named Yeastract, was created.

Given the growing number of papers being published, the development of automatic tools that can help the curation process is of great importance. In the specific case of Yeastract, a tool was needed to help identify papers describing gene regulations of S. cerevisiae. This tool was created with two components: one that identifies transcription factors in the papers' abstracts and verifies whether they describe gene regulations; and another that evaluates whether the hypothetical regulations the paper contains correspond to valid regulations from a biological point of view. This second component was named GREAT, Gene Regulation EvAluation Tool, and is the subject of my work.

The tool I developed uses data obtained exclusively from public biological databases to validate the regulations.
These data are used in the evaluation of three aspects: the participation of a gene and a transcription factor in the same biological process; the existence of the transcription factor binding motif in the gene promoter region; and the experimental method with which the regulation was identified. The output of these features is used by a machine learning method, either regression or model trees, to calculate a confidence score to attribute to each putat...
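
As an illustration of the second aspect, the presence of a binding motif in a promoter region can be checked with a simple pattern search. The sketch below is not taken from GREAT: it assumes the motif is given as an IUPAC consensus string and that the promoter is a plain DNA string, and the function names and sequences are hypothetical. It reports match positions on both strands, in forward-strand coordinates.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_positions(motif, promoter):
    """Find a consensus motif on both strands of a promoter.

    Returns (forward, reverse): two lists of 0-based start positions,
    both expressed in forward-strand coordinates.
    """
    # A lookahead lets overlapping occurrences be reported as well.
    body = "".join(IUPAC[c] for c in motif.upper())
    pattern = re.compile("(?=(" + body + "))")

    promoter = promoter.upper()
    forward = [m.start() for m in pattern.finditer(promoter)]

    # Search the reverse complement, then map each hit back to the
    # forward-strand coordinate where the site begins.
    rc = reverse_complement(promoter)
    reverse = [
        len(promoter) - m.start() - len(motif)
        for m in pattern.finditer(rc)
    ]
    return forward, reverse


# Hypothetical motif and promoter fragment, for illustration only.
forward_hits, reverse_hits = motif_positions("TGASTCA", "GGTGACTCATT")
has_motif = bool(forward_hits or reverse_hits)
```

A boolean such as `has_motif`, or the number of hits, is the kind of value that could serve as one input feature to the scoring step; how GREAT actually encodes this feature is not stated here.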