Motivation: Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora. In this article, we investigate how the recently introduced pre-trained language model BERT can be adapted for biomedical corpora. Results: We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), a domain-specific language representation model pre-trained on large-scale biomedical corpora. With almost the same architecture across tasks, BioBERT largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks when pre-trained on biomedical corpora. While BERT obtains performance comparable to that of previous state-of-the-art models, BioBERT significantly outperforms them on the following three representative biomedical text mining tasks: biomedical named entity recognition (0.62% F1 score improvement), biomedical relation extraction (2.80% F1 score improvement) and biomedical question answering (12.24% MRR improvement). Our analysis results show that pre-training BERT on biomedical corpora helps it to understand complex biomedical texts. Availability and implementation: We make the pre-trained weights of BioBERT freely available at https://github.com/naver/biobert-pretrained, and the source code for fine-tuning BioBERT available at https://github.com/dmis-lab/biobert.
Forages are usually inoculated with homofermentative and facultative heterofermentative lactic acid bacteria (LAB) to enhance lactic acid fermentation, but the effects of such inoculants on silage quality and the performance of dairy cows are unclear. Therefore, we conducted a meta-analysis to examine the effects of LAB inoculation on silage quality and preservation and on the performance of dairy cows. A second objective was to examine the factors affecting the response to silage inoculation with LAB. The studies that met the selection criteria included 130 articles that examined the effects of LAB inoculation on silage quality and 31 articles that investigated dairy cow performance responses. The magnitude of the effect (effect size) was evaluated using raw mean differences (RMD) between inoculated and uninoculated treatments. Heterogeneity was explored by meta-regression and subgroup analysis using forage type, LAB species, LAB application rate, and silo scale (laboratory or farm scale) as covariates for the silage quality responses, and forage type, LAB species, diet type [total mixed ration (TMR) or non-TMR], and the milk yield of the control cows as covariates for the performance responses. Inoculation with LAB (≥10⁵ cfu/g as fed) markedly improved silage fermentation and increased dry matter recovery in temperate and tropical grasses, alfalfa, and other legumes. However, inoculation did not improve the fermentation of corn, sorghum, or sugarcane silages. Inoculation with LAB reduced clostridia and mold growth, butyric acid production, and ammonia-nitrogen in all silages, but it had no effect on aerobic stability. Silage inoculation (≥10⁵ cfu/g as fed) increased milk yield, and the response had low heterogeneity. However, inoculation had no effect on diet digestibility or feed efficiency.
Inoculation with LAB improved the fermentation of grass and legume silages and the performance of dairy cows, but it did not affect the fermentation of corn, sorghum, and sugarcane silages or the aerobic stability of any silage. Further research is needed to elucidate how silage inoculated with homofermentative and facultative heterofermentative LAB improves the performance of dairy cows.
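The effect-size approach described above (raw mean differences pooled across studies) can be sketched in plain Python. The numbers below are illustrative only, not data from the meta-analysis, and the pooling shown is a simple inverse-variance fixed-effect model rather than the authors' exact meta-regression:

```python
import math

def raw_mean_difference(mean_trt, mean_ctl, sd_trt, sd_ctl, n_trt, n_ctl):
    """Raw mean difference (RMD) between inoculated and uninoculated
    treatments, with its sampling variance."""
    rmd = mean_trt - mean_ctl
    var = sd_trt**2 / n_trt + sd_ctl**2 / n_ctl
    return rmd, var

def pool_fixed_effect(effects):
    """Inverse-variance weighted (fixed-effect) pooled RMD.
    `effects` is a list of (rmd, variance) pairs, one per study."""
    weights = [1.0 / v for _, v in effects]
    pooled = sum(w * e for (e, _), w in zip(effects, weights)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se

# Illustrative (made-up) milk-yield comparisons: kg/d means, SDs, cow counts.
studies = [
    raw_mean_difference(29.1, 28.4, 2.0, 2.1, 24, 24),
    raw_mean_difference(31.0, 30.1, 1.8, 1.9, 30, 30),
    raw_mean_difference(27.5, 27.3, 2.2, 2.0, 20, 20),
]
pooled, se = pool_fixed_effect(studies)
print(f"pooled RMD = {pooled:.2f} kg/d (SE {se:.2f})")
```

Studies with smaller sampling variance (larger, less noisy trials) receive proportionally more weight, which is the standard rationale for inverse-variance pooling.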
PubMed® is an essential resource for the medical domain, but useful concepts are either difficult to extract or are ambiguous, which has significantly hindered knowledge discovery. To address this issue, we constructed a PubMed knowledge graph (PKG) by extracting bio-entities from 29 million PubMed abstracts, disambiguating author names, integrating funding data through the National Institutes of Health (NIH) ExPORTER, collecting affiliation history and educational background of authors from ORCID®, and identifying fine-grained affiliation data from MapAffil. Through the integration of these credible multi-source data, we could create connections among the bio-entities, authors, articles, affiliations, and funding. Data validation revealed that the BioBERT deep learning method of bio-entity extraction significantly outperformed the state-of-the-art models based on the F1 score (by 0.51%), with the author name disambiguation (AND) achieving an F1 score of 98.09%. PKG can trigger broader innovations, not only enabling us to measure scholarly impact, knowledge usage, and knowledge transfer, but also assisting us in profiling authors and organizations based on their connections with bio-entities.
The amount of biomedical literature is vast and growing quickly, and accurate text mining techniques could help researchers efficiently extract useful information from it. However, existing named entity recognition models used by text mining tools such as tmTool and ezTag are not effective enough and cannot accurately discover new entities. Traditional text mining tools also do not consider overlapping entities, which are frequently observed in multi-type named entity recognition results. We propose BERN, a neural biomedical named entity recognition and multi-type normalization tool. BERN uses high-performance BioBERT named entity recognition models that recognize known entities and discover new ones. In addition, probability-based decision rules identify the types of overlapping entities. Furthermore, various named entity normalization models are integrated into BERN to assign a distinct identifier to each recognized entity. BERN provides a web service for tagging entities in PubMed articles or raw text. Researchers can use the BERN web service for text mining tasks such as new named entity discovery, information retrieval, question answering, and relation extraction. The application programming interfaces and demonstrations of BERN are publicly available at https://bern.korea.ac.kr.
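The idea of a probability-based decision rule for overlapping entities can be illustrated with a small sketch. This is a hypothetical greedy rule for exposition, not BERN's actual implementation: when spans from different entity-type recognizers overlap, keep the span with the highest model probability.

```python
def spans_overlap(a, b):
    """True if character spans (start, end) a and b overlap."""
    return a[0] < b[1] and b[0] < a[1]

def resolve_overlaps(candidates):
    """Greedy probability-based decision rule: sort candidate entities by
    model probability (descending) and accept a span only if it does not
    overlap an already-accepted span.
    `candidates` is a list of dicts: {"span": (start, end),
    "type": str, "prob": float}."""
    accepted = []
    for cand in sorted(candidates, key=lambda c: c["prob"], reverse=True):
        if all(not spans_overlap(cand["span"], a["span"]) for a in accepted):
            accepted.append(cand)
    return sorted(accepted, key=lambda c: c["span"])

# Two type-specific recognizers tag the same mention with different types;
# the gene reading wins because its probability is higher.
candidates = [
    {"span": (10, 14), "type": "gene",    "prob": 0.97},
    {"span": (10, 14), "type": "disease", "prob": 0.12},
    {"span": (30, 41), "type": "drug",    "prob": 0.88},
]
print(resolve_overlaps(candidates))
```

The non-overlapping drug span survives untouched; only genuinely conflicting spans compete on probability.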