With huge amounts of biomedical data being generated every day, manually extracting statistical information about the chemicals mentioned in such large databases is tedious and time-consuming. Our system, designed mainly for non-expert users, aims to automate data collection and knowledge extraction from chemical literature in a user-friendly and efficient way on the Hadoop platform. The system downloads abstracts related to the disease of interest from the PubMed database. The text of the abstracts is then extensively parsed for chemical entities such as protein/gene names and chemical compound names, which are classified into different classes. This analysis should prove helpful in the biomedical and pharmaceutical industries. Information extraction is performed with the LingPipe API: a training dataset is supplied to LingPipe, which then classifies the extracted bioentities into their respective classes. Because it is deployed on the Hadoop platform, the system is scalable and distributed, and it processes a large number of abstracts in a short time and with high efficiency. The system also provides a user-friendly interface so that non-technical users can work with the Hadoop system easily.
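The parse, classify and count steps described above can be illustrated with a minimal sketch. It is not the system's implementation. The small `LEXICON` dictionary is a hypothetical stand-in for the trained LingPipe model, the function names `map_abstract`, `reduce_counts` and `count_entities` are invented for illustration, and the map and reduce steps run in a single process rather than on Hadoop.

```python
from collections import Counter
from itertools import chain
import re

# Hypothetical toy lexicon standing in for a trained LingPipe classifier.
# The entries and class labels are illustrative assumptions only.
LEXICON = {
    "insulin": "protein/gene",
    "brca1": "protein/gene",
    "metformin": "chemical compound",
    "glucose": "chemical compound",
}

TOKEN = re.compile(r"[A-Za-z0-9]+")


def map_abstract(text):
    """Map step: emit one ((entity, class), 1) pair per lexicon hit."""
    for token in TOKEN.findall(text.lower()):
        label = LEXICON.get(token)
        if label is not None:
            yield (token, label), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts for each (entity, class) key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def count_entities(abstracts):
    """Run the map step over every abstract, then reduce the emitted pairs."""
    mapped = chain.from_iterable(map_abstract(a) for a in abstracts)
    return reduce_counts(mapped)
```

Calling `count_entities` on a list of abstract strings returns a dictionary keyed by `(entity, class)` with mention counts as values. In a real deployment the dictionary lookup would be replaced by the trained classifier, and the two steps would run as distributed map and reduce tasks.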