Enhancing biomedical word embeddings by retrofitting to verb clusters

Chiu, Billy; Baker, Simon; Palmer, Martha; Korhonen, Anna

doi:10.18653/v1/w19-5014

Cited by 4 publications

(3 citation statements)

References 23 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…For the HOC task, Recall benefited substantially from the retrofitting process, whereas for the CEA task both Precision and Recall improved slightly compared to the baseline. The reason behind the difference is likely because the HOC dataset contains classes that are very sparse (with only a small number of examples), and therefore recall would increase more substantially for these classes at the cost of precision; this has also been observed in prior work with the HOC task [56,63,64].…”

Section: Discussionmentioning

confidence: 79%

See 1 more Smart Citation

BioVerbNet: a large semantic-syntactic classification of verbs in biomedicine

et al. 2021

Self Cite

View full text Add to dashboard Cite

Background Recent advances in representation learning have enabled large strides in natural language understanding; However, verbal reasoning remains a challenge for state-of-the-art systems. External sources of structured, expert-curated verb-related knowledge have been shown to boost model performance in different Natural Language Processing (NLP) tasks where accurate handling of verb meaning and behaviour is critical. The costliness and time required for manual lexicon construction has been a major obstacle to porting the benefits of such resources to NLP in specialised domains, such as biomedicine. To address this issue, we combine a neural classification method with expert annotation to create BioVerbNet. This new resource comprises 693 verbs assigned to 22 top-level and 117 fine-grained semantic-syntactic verb classes. We make this resource available complete with semantic roles and VerbNet-style syntactic frames. Results We demonstrate the utility of the new resource in boosting model performance in document- and sentence-level classification in biomedicine. We apply an established retrofitting method to harness the verb class membership knowledge from BioVerbNet and transform a pretrained word embedding space by pulling together verbs belonging to the same semantic-syntactic class. The BioVerbNet knowledge-aware embeddings surpass the non-specialised baseline by a significant margin on both tasks. Conclusion This work introduces the first large, annotated semantic-syntactic classification of biomedical verbs, providing a detailed account of the annotation process, the key differences in verb behaviour between the general and biomedical domain, and the design choices made to accurately capture the meaning and properties of verbs used in biomedical texts. The demonstrated benefits of leveraging BioVerbNet in text classification suggest the resource could help systems better tackle challenging NLP tasks in biomedicine.

show abstract

Section: Discussionmentioning

confidence: 79%

“…The objective of this evaluation is to apply a standard retrofitting method to change the vector-space of the pretrained word embeddings to better capture the semantics represented by the BioVerbNet classes [56]. We apply retrofitting to our pretrained embeddings (we use the embeddings pre-trained by Chiu et al [57]).…”

Section: Discussionmentioning

confidence: 99%

BioVerbNet: a large semantic-syntactic classification of verbs in biomedicine

et al. 2021

Self Cite

View full text Add to dashboard Cite

show abstract

“…Jha et al [13] leveraged the rich taxonomic knowledge in the biomedical domain to transformed input embeddings into a new space where they are both interpretable and retain their original expressive features. Chiu et al [14] proposed a efficient method to align pretrained embeddings according to semantic verb clusters. Faruqui et al [15] proposed a corpus-based approach that can be used to build semantic lexicons for specific categories.…”

mentioning

confidence: 99%

Refining electronic medical records representation in manifold subspace

et al. 2022

View full text Add to dashboard Cite

Background Electronic medical records (EMR) contain detailed information about patient health. Developing an effective representation model is of great significance for the downstream applications of EMR. However, processing data directly is difficult because EMR data has such characteristics as incompleteness, unstructure and redundancy. Therefore, preprocess of the original data is the key step of EMR data mining. The classic distributed word representations ignore the geometric feature of the word vectors for the representation of EMR data, which often underestimate the similarities between similar words and overestimate the similarities between distant words. This results in word similarity obtained from embedding models being inconsistent with human judgment and much valuable medical information being lost. Results In this study, we propose a biomedical word embedding framework based on manifold subspace. Our proposed model first obtains the word vector representations of the EMR data, and then re-embeds the word vector in the manifold subspace. We develop an efficient optimization algorithm with neighborhood preserving embedding based on manifold optimization. To verify the algorithm presented in this study, we perform experiments on intrinsic evaluation and external classification tasks, and the experimental results demonstrate its advantages over other baseline methods. Conclusions Manifold learning subspace embedding can enhance the representation of distributed word representations in electronic medical record texts. Reduce the difficulty for researchers to process unstructured electronic medical record text data, which has certain biomedical research value.

show abstract

Word embedding for mixed-emotions analysis

MohammadiBaghmolaei

Ahmadi

2022

J Intell Inf Syst

View full text Add to dashboard Cite

Enhancing biomedical word embeddings by retrofitting to verb clusters

Cited by 4 publications

References 23 publications

BioVerbNet: a large semantic-syntactic classification of verbs in biomedicine

BioVerbNet: a large semantic-syntactic classification of verbs in biomedicine

Refining electronic medical records representation in manifold subspace

Word embedding for mixed-emotions analysis

Contact Info

Product

Resources

About