Document search on PubMed, the pre-eminent database for biomedical literature, relies on the annotation of its documents with relevant terms from the Medical Subject Headings ontology (MeSH) to improve recall through query expansion. Patent documents are another important information source, though they are considerably less accessible. One option for expanding patent search beyond pure keywords is to include classification information: since every patent is assigned at least one class code, it should be possible to use these assignments automatically, in much the same way as the MeSH annotations in PubMed. Developing a system for this task requires a good understanding of the properties of both classification systems. This report describes our comparative analysis of MeSH and the main patent classification system, the International Patent Classification (IPC). We investigate the hierarchical structures as well as the properties of the terms and classes, and we compare the assignment of IPC codes to patents with the annotation of PubMed documents with MeSH terms. Our analysis shows a strong structural similarity between the hierarchies, but significant differences in terms and annotations. The low number of IPC class assignments and the lack of occurrences of class labels in patent texts imply that current patent search is severely limited. To overcome these limits, we evaluate a method for the automated assignment of additional classes to patent documents, and we propose a system for guided patent search based on class co-occurrence information and external resources.
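The idea of using class co-occurrence to guide a search can be illustrated with a minimal sketch. This is not the system described in the report: the function names `cooccurrence_counts` and `suggest_classes` are hypothetical, and the patent-to-class assignments are made-up toy data that use IPC-style codes only as labels. The sketch counts how often two class codes are assigned to the same patent and ranks the classes most often seen alongside a given one.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(patent_classes):
    """Count how often each unordered pair of class codes shares a patent."""
    counts = Counter()
    for codes in patent_classes:
        # Deduplicate and sort so each pair has one canonical (a, b) form.
        for a, b in combinations(sorted(set(codes)), 2):
            counts[(a, b)] += 1
    return counts


def suggest_classes(query_class, counts, top_n=3):
    """Rank the classes that co-occur with query_class, most frequent first."""
    related = Counter()
    for (a, b), n in counts.items():
        if a == query_class:
            related[b] += n
        elif b == query_class:
            related[a] += n
    return [code for code, _ in related.most_common(top_n)]


# Toy assignments (assumption: invented for illustration, not real patents).
patents = [
    ["A61K", "A61P"],
    ["A61K", "A61P", "C07D"],
    ["A61K", "C07D"],
    ["C12N", "A61K"],
    ["A61K", "A61P"],
]

counts = cooccurrence_counts(patents)
suggestions = suggest_classes("A61K", counts)
print(suggestions)  # ['A61P', 'C07D', 'C12N']
```

In a guided search, such a ranking could be offered to the user as candidate classes for broadening a query. A realistic system would likely normalize the raw counts, for example by class frequency, which this sketch omits.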
The rapidly growing body of published scientific work has created a pressing need for more effective processes for reviewing scientific articles and research data and for organizing data journals, as well as for improved tools and techniques for bibliographic analysis and the management of scientometrics. The ongoing EU research project OpenScienceLink aims to address these needs, and to offer a wide range of opportunities for better collaboration between researchers, by introducing a web-based Platform that offers efficient and intelligent applications and services for exploiting open access scientific information in the biomedical domain. The Platform is powered by the semantic and social networking capabilities of three leading-edge background infrastructures, which have been adapted and integrated for the project. In this paper, we present the five pilot services provided by the OpenScienceLink project. All five services are integrated into the web-based OpenScienceLink platform that is publicly accessible at