Scientific collaboration is essential for solving problems and fostering innovation. Coauthor network analysis has long been used to study scholars' collaborations, but these studies have not considered different collaboration features simultaneously. In this paper, we present a systematic approach to analyzing differences in the likelihood that two authors will collaborate, as seen from the effects of homophily, transitivity, and preferential attachment. Exponential random graph models (ERGMs) are applied in this research. We find that the different types of publications an author has written play diverse roles in his/her collaborations. An author's tendency to form new collaborations with his/her coauthors' collaborators is strong, and the more coauthors an author has had before, the more new collaborators he/she will attract. We demonstrate that considering the authors' attributes and homophily effects, as well as the transitivity and preferential attachment effects of the coauthorship network in which they are embedded, helps us gain a comprehensive understanding of scientific collaboration.
PubMed® is an essential resource for the medical domain, but useful concepts are either difficult to extract or are ambiguous, which has significantly hindered knowledge discovery. To address this issue, we constructed a PubMed knowledge graph (PKG) by extracting bio-entities from 29 million PubMed abstracts, disambiguating author names, integrating funding data through the National Institutes of Health (NIH) ExPORTER, collecting affiliation history and educational background of authors from ORCID®, and identifying fine-grained affiliation data from MapAffil. Through the integration of these credible multi-source data, we could create connections among the bio-entities, authors, articles, affiliations, and funding. Data validation revealed that the BioBERT deep learning method of bio-entity extraction significantly outperformed the state-of-the-art models based on the F1 score (by 0.51%), with the author name disambiguation (AND) achieving an F1 score of 98.09%. PKG can trigger broader innovations, not only enabling us to measure scholarly impact, knowledge usage, and knowledge transfer, but also assisting us in profiling authors and organizations based on their connections with bio-entities.
The introduction and establishment of nonindigenous species (NIS) through global ship movements pose a significant threat to marine ecosystems and economies. While ballast-vectored invasions have been partly addressed by some national policies and an international agreement regulating the concentrations of organisms in ballast water, biofouling-vectored invasions remain largely unaddressed. Development of additional efficient and cost-effective ship-borne NIS policies requires an accurate estimation of NIS spread risk from both ballast water and biofouling. We demonstrate that the first-order Markovian assumption limits accurate modeling of NIS spread risks through the global shipping network. In contrast, we show that higher-order patterns provide more accurate NIS spread risk estimates by revealing indirect pathways of NIS transfer using Species Flow Higher-Order Networks (SF-HON). Using the largest available datasets of nonindigenous species for Europe and the United States, we then compare SF-HON model predictions against those from networks that consider only first-order connections and those that consider all possible indirect connections without regard to their significance. We show not only that SF-HONs yield more accurate NIS spread risk predictions, but also that there are important differences in NIS spread via the ballast and biofouling vectors. Our work provides information that policymakers can use to develop more efficient and targeted prevention strategies for ship-borne NIS spread management, especially as management of biofouling is of increasing concern.