The organization of scientific papers typically follows a standardized pattern, the well-known IMRaD structure (introduction, methods, results, and discussion). Using the full text of 45,000 papers published in the PLoS series of journals as a case study, this paper investigates, from the viewpoint of bibliometrics, how references are distributed along the structure of scientific papers as well as the age of these cited references. Once the sections of articles are realigned to follow the IMRaD sequence, the position of cited references along the text of articles is invariant across all PLoS journals, with the introduction and discussion accounting for most of the references. It also provides evidence that the age of cited references varies by section, with older references being found in the methods and more recent references in the discussion. These results provide insight into the different roles citations have in the scholarly communication process.
Using the full-text corpus of more than 75,000 research articles published by seven PLOS journals, this paper proposes a natural language processing approach for identifying the function of citations. Citation contexts are assigned based on the frequency of n-gram co-occurrences located near the citations. Results show that the most frequent linguistic patterns found in the citation contexts of papers vary according to their location in the IMRaD structure of scientific articles. The presence of negative citations is also dependent on this structure. This methodology offers new perspectives to locate these discursive forms according to the rhetorical structure of scientific articles, and will lead to a better understanding of the use of citations in scientific articles.
If you would like to write for this, or any other Emerald publication, then please use our Emerald for Authors service information about how to choose which publication to write for and submission guidelines are available for all. Please visit www.emeraldinsight.com/authors for more information. About Emerald www.emeraldinsight.comEmerald is a global publisher linking research and practice to the benefit of society. The company manages a portfolio of more than 290 journals and over 2,350 books and book series volumes, as well as providing an extensive range of online products and additional customer resources and services.Emerald is both COUNTER 4 and TRANSFER compliant. The organization is a partner of the Committee on Publication Ethics (COPE) and also works with Portico and the LOCKSS initiative for digital archive preservation. AbstractPurpose -Scientific abstracts reproduce only part of the information and the complexity of argumentation in a scientific article. The purpose of this paper provides a first analysis of the similarity between the text of scientific abstracts and the body of articles, using sentences as the basic textual unit. It contributes to the understanding of the structure of abstracts. Design/methodology/approach -Using sentence-based similarity metrics, the authors quantify the phenomenon of text re-use in abstracts and examine the positions of the sentences that are similar to sentences in abstracts in the introduction, methods, results and discussion structure, using a corpus of over 85,000 research articles published in the seven Public Library of Science journals. Findings -The authors provide evidence that 84 percent of abstract have at least one sentence in common with the body of the paper. Studying the distributions of sentences in the body of the articles that are re-used in abstracts, the authors show that there exists a strong relation between the rhetorical structure of articles and the zones that authors re-use when writing abstracts, with sentences mainly coming from the beginning of the introduction and the end of the conclusion. Originality/value -Scientific abstracts contain what is considered by the author(s) as information that best describe documents' content. This is a first study that examines the relation between the contents of abstracts and the rhetorical structure of scientific articles. The work might provide new insight for improving automatic abstracting tools as well as information retrieval approaches, in which text organization and structure are important features.
The role of preprints in the scientific production and their part in citations have been growing over the past 10 years. In this paper we study preprint citations in several different aspects: the progression of preprint citations over time, their relative frequencies in relation to the IMRaD structure of articles, their distributions over time, per preprint database and per PLOS journal. We have processed the PLOS corpus that covers 7 journals and a total of about 240,000 articles up to January 2021, and produced a dataset of 8460 preprint citation contexts that cite 12 different preprint databases. Our results show that preprint citations are found with the highest frequency in the Method section of articles, though small variations exist with respect to journals. The PLOS Computational Biology journal stands out as it contains more than three times more preprint citations than any other PLOS journal. The relative parts of the different preprint databases are also examined. While ArXiv and bioRxiv are the most frequent citation sources, bioRxiv’s disciplinary nature can be observed as it is the source of more than 70% of preprint citations in PLOS Biology, PLOS Genetics and PLOS Pathogens. We have also compared the lexical content of preprint citation contexts to the citation content to peer-reviewed publications. Finally, by performing a lexicometric analysis, we have shown that preprint citation contexts differ significantly from citation contexts of peer-reviewed publications. This confirms that authors make use of different lexical content when citing preprints compared to the rest of citations.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.