We report on two recent medium-scale initiatives annotating present-day English corpora for animacy distinctions. We discuss the relevance of animacy for computational linguistics, in particular for generation; the annotation categories used in the two studies; and the inter-annotator reliability for one of the studies.
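For readers unfamiliar with reliability reporting in annotation studies, the sketch below computes Cohen's kappa for two annotators, one common chance-corrected agreement measure. The abstract does not name the metric actually used, and the animacy label set and annotation data here are purely illustrative.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Shown as one common reliability measure; the study's actual
    metric is not specified in the abstract.
    """
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement: probability both annotators pick the same category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical animacy labels for ten noun phrases from two annotators.
annotator_1 = ["human", "human", "animal", "inanimate", "human",
               "inanimate", "animal", "human", "inanimate", "human"]
annotator_2 = ["human", "animal", "animal", "inanimate", "human",
               "inanimate", "human", "human", "inanimate", "human"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.677
```

A kappa near 1 indicates agreement well above what category frequencies alone would predict; values near 0 indicate agreement at chance level.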
The objective of this study was to analyse the words and expressions used in peer reviews of manuscripts that were later published as original research in the BMJ. A secondary aim was to estimate the difference in net sentiment between peer review reports on manuscripts subject to one or more rounds of peer review and review reports on initially rejected manuscripts that were accepted after appeal. This observational study included all peer review reports published in the BMJ from September 2014 until the end of 2017. The study analysed the frequency of specific words in peer review reports for accepted manuscripts, identifying the most commonly occurring positive and negative words and their context, as well as the most common expressions. It also quantified differences in net sentiment in peer review reports between manuscripts accepted after appeal and manuscripts accepted without appeal. The dataset, consisting of 1716 peer review reports, contained 908,932 word tokens. Among the most frequent positive words were "well", "important", and "clear", while the negative words included "risk", "bias", and "confounding". The expressions in which reviewers most often made positive and negative comments included "well-written paper", "well-written manuscript", "this is an important topic", "answers an important question", "high risk of bias", and "selection bias". The sentiment analysis revealed that, compared with manuscripts that were not initially rejected, manuscripts accepted after appeal received review reports with lower scores for joy and positive sentiment and higher scores for words expressing sadness, fear, disgust, and anger. Peer review comments were mainly related to methodology rather than the actual results. Peer review reports on initially rejected manuscripts were more negative and more often included expressions related to a high risk of bias.
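The core pipeline this abstract describes, token counting plus lexicon-based emotion scoring, can be sketched roughly as follows. This is not the authors' code: the tiny lexicon below is invented for demonstration (the emotion categories suggest an NRC-style emotion lexicon, but the study's exact tooling is not stated here), and "net sentiment" is computed as one plausible positive-minus-negative measure.

```python
from collections import Counter
import re

# Toy emotion lexicon; a real study would use a full published lexicon.
# These entries are illustrative only.
EMOTION_LEXICON = {
    "well": {"positive"},
    "important": {"positive", "joy"},
    "clear": {"positive"},
    "risk": {"negative", "fear"},
    "bias": {"negative"},
    "confounding": {"negative"},
}

def tokenize(text):
    """Lowercase word tokens; a real pipeline might use a proper tokenizer."""
    return re.findall(r"[a-z']+", text.lower())

def emotion_scores(text):
    """Count how many tokens carry each emotion/sentiment label."""
    scores = Counter()
    for token in tokenize(text):
        for label in EMOTION_LEXICON.get(token, ()):
            scores[label] += 1
    return scores

def net_sentiment(text):
    """Positive minus negative token counts, one possible net measure."""
    s = emotion_scores(text)
    return s["positive"] - s["negative"]

report = ("This is a well-written manuscript on an important topic, "
          "but there is a high risk of bias.")
print(emotion_scores(report))  # Counter({'positive': 2, 'negative': 2, 'joy': 1, 'fear': 1})
print(net_sentiment(report))   # 0
```

Comparing the averages of such scores between the appealed and non-appealed report groups would yield group differences of the kind the abstract reports.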
This article introduces the Clinton Email Corpus, comprising 33,000 recently released email messages sent to and from Hillary Clinton during her tenure as United States Secretary of State, and presents the results of a first investigation into the effect of status and gender on politeness-related linguistic choices within the corpus, based on a sample of 500 emails. We describe the composition of the corpus and the technical challenges inherent in its creation, then present the 500-email subset, in which all messages are categorized according to sender and recipient gender, position in the workplace hierarchy, and personal closeness to Clinton. The analysis takes the most frequent bigrams in each of these subsets as a starting point for identifying linguistic differences. We find that the main differences relate to the content and function of the messages rather than their tone. Individuals lower in the hierarchy but not in Clinton's inner circle are more often engaged in practical tasks, while members of the inner circle primarily discuss issues and use email to arrange in-person conversations. Clinton herself is generally found to engage neither in extensive politeness nor in overt displays of power. These findings provide further evidence of how corpus linguistics can advance our understanding of workplace pragmatics.
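As a rough illustration of the bigram-frequency starting point described above, the sketch below counts the most frequent bigrams per subset using only the Python standard library. The subset labels and sample messages are invented placeholders; the actual corpus categorizes messages by gender, hierarchy position, and closeness to Clinton, as the abstract describes.

```python
from collections import Counter
import re

def bigrams(text):
    """Yield adjacent word pairs from a lowercased token stream."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return zip(tokens, tokens[1:])

def top_bigrams(messages, n=10):
    """Most frequent bigrams across one subset of email messages."""
    counts = Counter()
    for message in messages:
        counts.update(bigrams(message))
    return counts.most_common(n)

# Hypothetical subsets keyed by (sender group, recipient group);
# the messages are invented for demonstration.
subsets = {
    ("staff", "clinton"): [
        "pls print the attached schedule",
        "pls print for tomorrow",
    ],
    ("inner_circle", "clinton"): [
        "call me when you land, we should discuss this",
    ],
}

for label, messages in subsets.items():
    print(label, top_bigrams(messages, n=3))
```

Contrasting the resulting ranked lists across subsets surfaces the kind of content and function differences, such as task-oriented versus discussion-oriented phrasing, that the study reports.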
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.