The biomedical literature holds our understanding of pharmacogenomics, but it is dispersed across many journals. In order to integrate our knowledge, connect important facts across publications and generate new hypotheses, we must organize and encode the contents of the literature. By creating databases of structured pharmacogenomic knowledge, we can make the value of the literature much greater than the sum of the individual reports. We can, for example, generate candidate gene lists or interpret surprising hits in genome-wide association studies. Text mining automatically adds structure to the unstructured knowledge embedded in millions of publications, and recent years have seen a surge in work on biomedical text mining, some of it specific to the pharmacogenomics literature. These methods enable extraction of specific types of information and can also provide answers to general, systemic queries. In this article, we describe the main tasks of text mining in the context of pharmacogenomics, summarize recent applications and anticipate the next phase of text mining applications.
Keywords: BioNLP; classification; curation; data mining; gene-drug relationships; information extraction; information retrieval; machine learning; natural language processing; NLP; pharmacogenetics; pharmacogenomics; text mining

† Author for correspondence: russ.altman@stanford.edu. For reprint orders, please contact: reprints@futuremedicine.com

Financial & competing interests disclosure: The authors acknowledge support from NIH LM07033 (Yael Garten), GM61374 (Yael Garten, Adrien Coulet and Russ B Altman), LM05652 (Russ B Altman) and the National Center for Biomedical Computing (NCBC) National Institutes of Health roadmap initiative; NIH grant U54 HG004028 (Adrien Coulet). The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed. No writing assistance was utilized in the production of this manuscript.

NIH Public Access Author Manuscript
Pharmacogenomics. Author manuscript; available in PMC 2011 August 1.

After several decades of pharmacogenomics research, it is clear that the overall pharmacologic effects of medications are typically not monogenic traits, but are determined by the interactions among several genes encoding proteins involved in numerous pathways [1]. Polygenic determinants of drug response are often difficult to elucidate in clinical studies; however, functional genomics and high-throughput screening methods have recently provided powerful new tools to reveal these interactions. To uncover the relationships between biological systems and drug response, pharmacogenomic researchers must assimilate knowledge from a multitude of disciplines, at levels ranging from genomic, molecular and cellular to tissue, organ and organismal. Therefore, researchers need the ability to que...