“…Traditional security threats have prompted significant exploration into areas such as membership inference attacks (Shi et al., 2023b), backdoor attacks (Shi et al., 2023a; Xu et al., 2023), and others (Wan et al., 2023; Shi et al., 2024). A multitude of studies have extensively examined the trustworthiness of LLMs, including alignment (Wang et al., 2023b; Liu et al., 2023a), truthfulness (e.g., misinformation (Huang and Sun, 2023; Chen and Shu, 2023b,a) and hallucination (Xu et al., 2024; Tonmoy et al., 2024; Huang et al., 2023a)), accountability (He et al., 2024), and fairness (Wang et al., 2023a; Huang et al., 2023c; Bi et al., 2023).…”