Proceedings of the 15th Workshop on Biomedical Natural Language Processing 2016
DOI: 10.18653/v1/w16-2922
How to Train good Word Embeddings for Biomedical NLP

Abstract: The quality of word embeddings depends on the input corpora, model architectures, and hyper-parameter settings. Using the state-of-the-art neural embedding tool word2vec and both intrinsic and extrinsic evaluations, we present a comprehensive study of how the quality of embeddings changes according to these features. Apart from identifying the most influential hyper-parameters, we also observe one that creates contradictory results between intrinsic and extrinsic evaluations. Furthermore, we find that bigger c…

Cited by 289 publications (240 citation statements)
References 14 publications
“…However, we found that the majority of word similarity datasets fail to predict which representations will be successful in sequence labelling tasks, with only one intrinsic measure, SimLex-999, showing high correlation with extrinsic measures. In concurrent work, we have also observed a similar effect for biomedical domain tasks and word vectors (Chiu et al, 2016). We further considered the differentiation between relatedness (association) and similarity (synonymy) as an explanatory factor, noting that the majority of intrinsic evaluation datasets do not systematically make this distinction.…”
Section: Results (mentioning)
confidence: 66%
“…Before that, we count the frequency of occurrence of each word in the dataset and use these word frequencies to create a dictionary, then express each word by the frequency rank of the corresponding word (Kim, 2014). Next, we train word embeddings according to (Chiu et al, 2016), and at the same time download the embedding sets that have already been trained 2 .…”
Section: System Description (mentioning)
confidence: 99%
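The frequency-ordered dictionary described in the statement above can be sketched as follows. This is a minimal illustration of the technique, not code from the citing paper; the function name and tie-breaking rule are my own choices.

```python
from collections import Counter

def build_index(tokens):
    """Map each word to its rank in a frequency-ordered vocabulary.

    Rank 1 is the most frequent word, so every word is expressed by the
    frequency order of the corresponding word, as the citing paper describes.
    """
    counts = Counter(tokens)
    # Sort by descending frequency; ties broken alphabetically for determinism.
    vocab = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: i + 1 for i, w in enumerate(vocab)}

tokens = "the cat sat on the mat the cat".split()
index = build_index(tokens)          # e.g. {"the": 1, "cat": 2, ...}
encoded = [index[w] for w in tokens]  # each word replaced by its rank
```

Once every word is an integer rank, the sequence can be fed to an embedding layer whose rows are looked up by these indices.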
“…Chiu et al. [57]. The embedding for unknown words was initialized from a uniform (−ε, ε) distribution, where ε was determined such that the unknown vectors have approximately the same variance as that of pre-trained data [58].…”
Section: Machine Learning Based Computational Models (mentioning)
confidence: 99%
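The variance-matching initialization described above follows from Var[U(−ε, ε)] = ε²/3, so choosing ε = √(3·Var[pretrained]) gives unknown-word vectors with approximately the pre-trained variance. The sketch below assumes this standard derivation; the function name and interface are illustrative, not taken from [57] or [58].

```python
import numpy as np

def init_unknown(pretrained, n_unknown, seed=0):
    """Draw unknown-word vectors from U(-eps, eps), with eps chosen so the
    new vectors' variance matches that of the pre-trained embeddings.

    Since Var[U(-eps, eps)] = eps**2 / 3, we set eps = sqrt(3 * var).
    """
    rng = np.random.default_rng(seed)
    eps = np.sqrt(3.0 * pretrained.var())
    dim = pretrained.shape[1]
    return rng.uniform(-eps, eps, size=(n_unknown, dim))

# Stand-in for a loaded embedding matrix (1000 words, 50 dimensions).
pretrained = np.random.default_rng(1).normal(scale=0.1, size=(1000, 50))
unk = init_unknown(pretrained, n_unknown=200)
```

With enough samples, `unk.var()` lands close to `pretrained.var()`, which is the point of matching variances: unknown words start on the same scale as known ones instead of dominating or vanishing in downstream layers.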