Can We Quantify Domainhood? Exploring Measures to Assess Domain-Specificity in Web Corpora

Santini, Marina; Strandqvist, Wiktor; Nyström, Mikael; Alirezai, Marjan; Jönsson, Arne

doi:10.1007/978-3-319-99133-7_17

Cited by 2 publications

(3 citation statements)

References 14 publications

“…Essentially, they allow for an evaluation of the quality of a domain-specific web corpus and can also be used to pre-assess the portability of NLP tools from one domain-specific corpus to a different corpus belonging to another domain. Similar experiments have also been carried out on Swedish corpora with much the same results (Santini et al, 2018), showing that our approach may become a language-independent standardized step in corpus evaluation practice (intrinsic evaluation metrics).…”

Section: Discussion (mentioning)

confidence: 58%

Designing an Extensible Domain-Specific Web Corpus for “Layfication”

Santini, Jönsson, Strandqvist et al. 2019

Advances in Systems Analysis, Software Engineering, and High Performance Computing

Self Cite

In the era of data-driven science, corpus-based language technology is an essential part of cyber physical systems. In this chapter, the authors describe the design and the development of an extensible domain-specific web corpus to be used in a distributed social application for the care of the elderly at home. The domain of interest is the medical field of chronic diseases. The corpus is conceived as a flexible and extensible textual resource, where additional documents and additional languages will be appended over time. The main purpose of the corpus is to be used for building and training language technology applications for the “layfication” of the specialized medical jargon. “Layfication” refers to the automatic identification of more intuitive linguistic expressions that can help laypeople (e.g., patients, family caregivers, and home care aides) understand medical terms, which often appear opaque. Exploratory experiments are presented and discussed.

“…In this experiment, we evaluate how good the performance of the eCare term extractor is to bootstrap a web corpus based on the domain of the use cases. We measure the domainhood (or domain-specificity) against a reference corpus representing general language (see also Santini et al, 2018).…”

Section: Extrinsic Evaluation: Assessing Domainhood (mentioning)

confidence: 99%

Designing an Extensible Domain-Specific Web Corpus for “Layfication”

Santini, Jönsson, Strandqvist et al. 2019

Advances in Systems Analysis, Software Engineering, and High Performance Computing

Self Cite

“…Since each corpus varies in email counts and email lengths, relative term frequencies are used. While relative term frequencies control for corpus size, the scaling can introduce distortions, which complicate statistical tests. For a robustness measure, we sample emails from the larger corpus (Org-2) until the total term count equals the Org-1 and use absolute term frequency.…”

Section: Experiments Datasets and Their Characterization (mentioning)

confidence: 99%

Transfer learning meets sales engagement email classification: Evaluation, analysis, and strategies

Liu, Dmitriev, Huang et al. 2020

Concurrency and Computation

Enterprise email classification in the sales engagement platform is a challenge due to its evolving asynchronous conversational context during the sales process and differences across industries and organizations. This is further exacerbated by the limited amount of labeled emails due to security and privacy constraints. The leaderboard success of using pretrained language models (LMs) such as BERT and various transfer learning techniques promises a paradigm shift to natural language processing, yet the recipe for applying high performance transfer learning (HPTL) in practical applications remains unclear. This article investigates applying HPTL to sales engagement email classification through a series of experiments and analysis. The experiment datasets include two different organizations' emails. The contribution of this paper is 4-fold: (a) analysis and characterization of the email corpora from different organizations; (b) identification of the best combinations of pre-trained LMs under different modeling architectures; (c) study of the impact and trade-off of limited labeled data on the model accuracy and training time; and (d) characterization and study of the impact of different orgs' datasets on the model accuracy. Our results showed that a practical winning recipe that uses BERT fine-tuning with as few as 500 labeled training examples can consistently and significantly outperform all other models evaluated, with reasonable training time.

Keywords: cross-org transfer learning, domain shift, email intent classification, pre-trained language model, sales engagement, transfer learning

1 Introduction

Sales are one of the oldest professions on earth. Until very recently, a typical sales representative (sales rep) got a list of names (leads, or prospects) and manually went through the list one by one, calling and emailing the prospects. The rise of Sales Engagement Platforms (SEPs) such as Outreach, SalesLoft, InsideSales, Groove, and Apollo has rapidly changed this state of affairs, leading to large improvements in rep performance. SEPs encode a company's sales process into a sequence of steps consisting of emails, phone calls, LinkedIn messages, and other tasks. Different sequences are used for different types of prospects, market segments, and so forth. The SEP then ensures consistent execution of these sequences, completely automating some sales tasks (e.g., auto-sending personalized emails and LinkedIn messages), while scheduling the manual tasks (e.g., phone call, custom manual email) and reminding the rep when it is the right time to do them. As a result, every rep can simultaneously perform one-on-one personalized outreach to up to 10× more prospects than before.
