Improving Slot Filling by Utilizing Contextual Information

Veyseh, Amir Pouran Ben; Dernoncourt, Franck; Nguyen, Thien Huu

doi:10.18653/v1/2020.nlp4convai-1.11

Cited by 3 publications

(1 citation statement)

References 17 publications

“…3) that consists of knowledge acquisition, knowledge representation, knowledge incorporation and data-driven ML model layers. In the knowledge acquisition layer, multi-source domain knowledge can be extracted through an information filter [72] or approaches based on natural language-processing technologies such as entity extraction [73], relation extraction [74] and entity–relation extraction [75]. Then, the knowledge representation layer represents the extracted knowledge in the form of feature importance [76], relation rules [77], a physics model [78] or a knowledge graph [79].…”

Section: A Synergistic Data Quantity Governance Flow With Incorporati... (mentioning)

confidence: 99%

Data quantity governance for machine learning in materials science

Liu, Yang, Zou, et al. 2023

National Science Review


Data-driven machine learning is widely employed in the analysis of materials structure-activity relationships, performance optimization and materials design due to its superior ability to reveal latent data patterns and make accurate predictions. However, because materials data acquisition is laborious, machine learning models encounter a mismatch between the high dimension of the feature space and the small sample size (for traditional machine learning models) or between the number of model parameters and the sample size (for deep learning models), which usually results in poor performance. Here, we review the efforts to tackle this issue via feature reduction, sample augmentation, and specific machine learning approaches, and show that the balance between the number of samples and the number of features or model parameters deserves close attention during data quantity governance. Following this, we propose a synergistic data quantity governance flow that incorporates materials domain knowledge. After summarizing the approaches to incorporating materials domain knowledge into the machine learning process, we provide examples of incorporating domain knowledge into governance schemes to demonstrate the advantages and applications of the approach. The work paves the way for obtaining the high-quality data required to accelerate machine-learning-based materials design and discovery.

Task Conditioned BERT for Joint Intent Detection and Slot-Filling

Tavares, Azevedo, Semedo, et al. 2023

Lecture Notes in Computer Science


What Does This Acronym Mean? Introducing a New Dataset for Acronym Identification and Disambiguation

Veyseh, Dernoncourt, Tran, et al. 2020

Proceedings of the 28th International Conference on Computational Linguistics

Self Cite


Acronyms are the short forms of phrases that facilitate conveying lengthy sentences in documents and serve as one of the mainstays of writing. Due to their importance, identifying acronyms and their corresponding phrases (i.e., acronym identification (AI)) and finding the correct meaning of each acronym (i.e., acronym disambiguation (AD)) are crucial for text understanding. Despite recent progress on these tasks, limitations in the existing datasets hinder further improvement. More specifically, the limited size of manually annotated AI datasets and the noise in automatically created AI datasets obstruct the design of advanced, high-performing acronym identification models. Moreover, the existing datasets are mostly limited to the medical domain and ignore other domains. To address these two limitations, we first create a large, manually annotated AI dataset for the scientific domain. This dataset contains 17,506 sentences, which is substantially larger than previous scientific AI datasets. Next, we prepare an AD dataset for the scientific domain with 62,441 samples, which is significantly larger than the previous scientific AD dataset. Our experiments show that the existing state-of-the-art models fall far behind human-level performance on both datasets proposed in this work. In addition, we propose a new deep learning model that utilizes the syntactic structure of the sentence to expand an ambiguous acronym in a sentence. The proposed model outperforms the state-of-the-art models on the new AD dataset, providing a strong baseline for future research on this dataset.

Introduction

Acronyms are shortened forms of a longer phrase. As a running example, in the sentence "The main key performance indicator, herein referred to as KPI, is the E2E throughput" there are two acronyms, KPI and E2E. The acronym KPI refers to the phrase key performance indicator (a.k.a. the long form of the acronym KPI).

In written language, acronyms are prevalent in technical documents, where they help to avoid the repetition of long and cumbersome terms, thus saving text space. For instance, about 15% of PubMed queries include abbreviations, and about 14.8% of all tokens in a clinical note dataset are abbreviations (Islamaj Dogan et al., 2009; Xu et al., 2007; Jin et al., 2019). Considering the widespread use of acronyms in texts, a text processing application, such as question answering or document retrieval, should be able to correctly process the acronyms in the text and find their meanings. To this end, two sub-tasks should be solved:

Acronym Identification (AI): to find the acronyms and the phrases that have been abbreviated by the acronyms in the document. In the running example, the acronyms KPI and E2E and the phrase key performance indicator should be extracted.

Acronym Disambiguation (AD): to find the right meaning for a given acronym in text. In the running example, the systems should be able to find the right meanings of the two acronyms KPI and E2E. Note that while the meaning of KPI is found in the senten...
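The acronym identification sub-task on the running example can be illustrated with a minimal rule-based sketch. This is not the model proposed in the paper: the uppercase-ratio rule, the initial-letter matching, and the names `find_acronyms` and `find_long_form` are all assumptions made here for illustration.

```python
import re

# Running example sentence quoted in the excerpt above.
SENTENCE = ("The main key performance indicator, herein referred to as KPI, "
            "is the E2E throughput")


def tokenize(sentence):
    """Split a sentence into alphanumeric tokens (punctuation is dropped)."""
    return re.findall(r"[A-Za-z0-9]+", sentence)


def find_acronyms(sentence):
    """Assumed heuristic: a token of length >= 2 in which at least half
    of the characters are uppercase letters is an acronym candidate."""
    return [t for t in tokenize(sentence)
            if len(t) >= 2 and 2 * sum(c.isupper() for c in t) >= len(t)]


def find_long_form(sentence, acronym):
    """Assumed heuristic: the long form is a run of consecutive words whose
    initials spell the acronym. Returns None when no such run exists."""
    tokens = tokenize(sentence)
    letters = [c.lower() for c in acronym]
    n = len(letters)
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if acronym in window:
            continue  # skip windows that contain the acronym itself
        if [w[0].lower() for w in window] == letters:
            return " ".join(window)
    return None


for acronym in find_acronyms(SENTENCE):
    print(acronym, "->", find_long_form(SENTENCE, acronym))
```

On the running example, the sketch pairs KPI with key performance indicator and finds no long form for E2E inside the sentence. Resolving an acronym whose long form is absent from the sentence is the job of the disambiguation sub-task, which needs context beyond this kind of pattern matching.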
