Background: Identifying relevant research in an ever-growing body of published literature is becoming increasingly difficult. Establishing domain-specific knowledge bases may be a more effective and efficient way to manage and query information within specific biomedical fields. Adopting controlled vocabulary is a critical step toward data integration and interoperability in any information system. We present an open source infrastructure that provides a powerful capacity for managing and mining data within a domain-specific knowledge base. As a practical application of our infrastructure, we presented two applications -Literature Finder and Investigator Browseras well as a tool set for automating the data curating process for the human genome published literature database. The design of this infrastructure makes the system potentially extensible to other data sources.