Abstract-To identify liver cancer-related genes and understand their interactions, we introduce a method based on literature mining. Genes are extracted with a disease-gene association classifier based on a Bayesian network, and the interaction network among the corresponding proteins is then built. We extracted 464 genes related to liver cancer and found that p53, VEGF, TNF, and AKT, among others, are hub proteins that play important roles in the network. KEGG pathway analysis shows that 19 enriched pathways may participate in liver carcinogenesis. Together, the network and pathway analyses imply the complexity of the occurrence, progression, and prognosis of liver cancer.

Index Terms-Bayesian classifier, genes and pathways analysis, liver cancer, literature mining.
Manuscript received January 24, 2013; revised March 24, 2013. Engineering, Soochow University, P. R. China (e-mail: liuxuan1220@yahoo.cn, zouwei_198107@163.com, jjwang@suda.edu.cn).

I. INTRODUCTION
Over the past ten years, with advances in biomedical science and technology, the volume of biomedical literature has grown exponentially. How to extract the needed knowledge, and discover new knowledge, quickly and efficiently from this mass of articles has become an important research area. The interaction between diseases and related proteins is one of the main research directions and is of great significance for disease prevention, diagnosis, and therapy, as well as for the design of biomedical experiments [1].

Liver cancer is one of the most common malignant tumors in China. It is reported that about 110,000 people in China die of liver cancer every year, which is 55% of the global total. Worse still, the early symptoms of liver cancer are not apparent, and the condition of patients deteriorates quickly: on average, patients survive only about 6 months after diagnosis. Research on liver cancer is therefore necessary.

So far, existing studies have focused on the interactions between individual proteins and liver cancer. The open problem is how to integrate the experimental information already obtained to help analyze molecular interactions, pathways, and their influence on the occurrence, progression, and prognosis of liver cancer.

A literature-mining method for identifying disease-gene associations, which was applied to prostate cancer, uses genes already known to be related to the disease as prior knowledge (seed genes), and an extracted gene must appear in the same sentence as at least one seed gene. This requirement limits the method: a gene that is never mentioned alongside a seed gene cannot be found.

To address this limitation, we aim to extract genes related to liver cancer from the large body of literature without prior knowledge, and to automatically build the interaction network of the corresponding proteins.
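The seed-gene constraint of the prior prostate cancer method can be illustrated with a minimal sketch. The function names, the naive sentence splitter, and the exact-token gene matching below are hypothetical simplifications chosen for illustration; they are not that method's actual implementation, and the example sentences are made up.

```python
import re
from typing import Iterable


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def seed_cooccurring_genes(text: str, candidates: Iterable[str], seeds: Iterable[str]) -> set[str]:
    """Return candidate genes that share at least one sentence with a seed gene.

    Hypothetical illustration of the seed co-occurrence constraint: gene names are
    matched as exact tokens, with no synonym or alias normalization.
    """
    seed_set = set(seeds)
    cand_set = set(candidates) - seed_set  # seed genes themselves are not new findings
    kept: set[str] = set()
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence))
        if tokens & seed_set:  # sentence mentions at least one seed gene
            kept |= tokens & cand_set
    return kept


if __name__ == "__main__":
    # Made-up example text; p53 is used as the only seed gene.
    abstract = ("VEGF expression was elevated together with p53 mutations. "
                "AKT signaling was activated in tumor tissue.")
    print(seed_cooccurring_genes(abstract, candidates={"VEGF", "AKT"}, seeds={"p53"}))
    # prints {'VEGF'}: AKT is never mentioned in a sentence with a seed gene
```

In this toy example AKT is missed only because its sentence contains no seed gene, which is the kind of restriction that extracting genes without priors is meant to avoid.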
II. METHODS
A. Search for Literature on Liver Cancer
By searching the keywords and free-text terms for liver cancer in the MeSH database of NCBI, and by collecting aliases of liver cancer from reference [2], we compiled 14 aliases of liver cancer, such ...