Background: With the rapid development of new psychoactive substances (NPS) and changes in the use of more traditional drugs, it is increasingly difficult for researchers and public health practitioners to keep up with emerging drugs and drug terms. Substance use surveys and diagnostic tools need to be able to ask about substances using the terms that drug users themselves are likely to be using. Analyses of social media may offer new ways for researchers to uncover and track changes in drug terms in near real time. This study describes the initial results from an innovative collaboration between substance use epidemiologists and linguistic scientists employing techniques from the field of natural language processing to examine drug-related terms in a sample of tweets from the United States.
Objective: The objective of this study was to assess the feasibility of using distributed word-vector embeddings trained on social media data to uncover previously unknown (to researchers) drug terms.
Methods: In this pilot study, we trained a continuous bag of words (CBOW) model of distributed word-vector embeddings on a Twitter dataset collected during July 2016 (roughly 884.2 million tokens). We queried the trained word embeddings for terms with high cosine similarity (a proxy for semantic relatedness) to well-known slang terms for marijuana to produce a list of candidate terms likely to function as slang terms for this substance. This candidate list was then compared with an expert-generated list of marijuana terms to assess the accuracy and efficacy of using word-vector embeddings to search for novel drug terminology.
Results: The method described here produced a list of 200 candidate terms for the target substance (marijuana). Of these 200 candidates, 115 were determined to in fact relate to marijuana (65 terms for the substance itself, 50 terms related to paraphernalia). This included 30 terms which were used to refer to the target substance in the corpus yet did not appear on the expert-generated list and were therefore considered to be successful cases of uncovering novel drug terminology. Several of these novel terms appear to have been introduced as recently as 1 or 2 months before the corpus time slice used to train the word embeddings.
Conclusions: Though the precision of the method described here is low enough as to still necessitate human review of any candidate term lists generated in such a manner, the fact that this process was able to detect 30 novel terms for the target substance based only on one month’s worth of Twitter data is highly promising. We see this pilot study as an important proof of concept and a first step toward producing a fully automated drug term discovery system capable of tracking emerging NPS terms in real time.
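The query step described in the Methods (ranking vocabulary items by cosine similarity to a seed term) can be illustrated with a minimal sketch. It is not the study's code: the three-dimensional vectors, the placeholder term names, and the helper functions `cosine_similarity`, `nearest_terms`, and `precision` are all assumptions made for illustration, and a real CBOW model would supply vectors with far more dimensions. The last lines compute the share of relevant candidates from the counts reported in the Results (115 of 200).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_terms(embeddings, seed, top_n=3):
    """Rank every term except the seed by cosine similarity to the seed."""
    seed_vec = embeddings[seed]
    scored = [
        (term, cosine_similarity(seed_vec, vec))
        for term, vec in embeddings.items()
        if term != seed
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def precision(relevant, total):
    """Fraction of candidate terms judged relevant by human review."""
    return relevant / total


# Toy vectors invented for this sketch; they are not taken from the study.
toy_embeddings = {
    "seed_term": [0.9, 0.1, 0.0],
    "near_term": [0.8, 0.2, 0.1],
    "mid_term": [0.6, 0.5, 0.1],
    "far_term": [0.0, 0.1, 0.9],
}

candidates = nearest_terms(toy_embeddings, "seed_term", top_n=2)
for term, score in candidates:
    print(f"{term}\t{score:.3f}")

# Counts as reported in the Results: 115 relevant terms among 200 candidates.
print(f"precision = {precision(115, 200):.3f}")
```

In practice the candidate list returned by such a query would still go to human reviewers, as the Conclusions note, because similarity in the embedding space is only a proxy for semantic relatedness.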
This chapter provides an overview of the status of the world’s endangered languages, based primarily on data from the Catalogue of Endangered Languages. Difficulties in identifying and enumerating endangered languages and obstacles to assessing linguistic vitality on a large scale are discussed. Statistical overviews are provided of language endangerment by global region, comparing trends in language endangerment across the world. The availability (or widespread absence) of the kinds of data necessary to assess language endangerment is examined, and we encourage linguists to include these types of data in their field reports and other published work. Finally, widely circulated statistics of language endangerment and death are considered.
This paper provides an acoustic phonetic description of Hawai‘i English vowels. The data comprise wordlist tokens produced by twenty-three speakers (twelve males and eleven females) and spontaneous speech tokens produced by ten of those speakers. Analysis of these vowel tokens shows that while there are similarities between Hawai‘i English and other dialects, the particular combination of vowel realizations in Hawai‘i English is unique to this dialect. Additionally, there are characteristics of the Hawai‘i English vowel system that are not found in other English dialects. These findings suggest that Hawai‘i English is a unique regional variety that warrants further description.