Sofie Labat scite author profile

Sofie Labat

4 Publications

3 Citation Statements Received

47 Citation Statements Given

Publications

A Million Tweets Are Worth a Few Points: Tuning Transformers for Customer Service Tasks

Hadifar, Labat, Hoste et al. 2021

In online domain-specific customer service applications, many companies struggle to deploy advanced NLP models successfully, due to the limited availability of, and noise in, their datasets. While prior research demonstrated the potential of migrating large open-domain pretrained models to domain-specific tasks, the appropriate (pre)training strategies have not yet been rigorously evaluated in such social media customer service settings, especially under multilingual conditions. We address this gap by (i) collecting a multilingual social media corpus containing customer service conversations (865k tweets), (ii) comparing various pipelines of pretraining and finetuning approaches, and (iii) applying them to 5 different end tasks. We show that pretraining a generic multilingual transformer model on our in-domain dataset, before finetuning on specific end tasks, consistently boosts performance, especially in non-English settings.

A Classification-Based Approach to Cognate Detection Combining Orthographic and Semantic Similarity Information

Labat, Lefever 2019

This paper presents proof-of-concept experiments for combining orthographic and semantic information to distinguish cognates from non-cognates. To this end, a context-independent gold standard is developed by manually labelling English-Dutch pairs of cognates and false friends in bilingual term lists. These annotated cognate pairs are then used to train and evaluate a supervised binary classification system for the automatic detection of cognates. Two types of information sources are incorporated in the classifier: fifteen string similarity metrics capture form similarity between source and target words, while word embeddings model semantic similarity between the words. The experimental results show that even though the system already achieves good results by only incorporating orthographic information, the performance further improves by including semantic information in the form of embeddings.
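
The combination of form and meaning features described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the abstract mentions fifteen string similarity metrics without listing them, so the two metrics below (normalized Levenshtein similarity and a Dice coefficient over character bigrams) are stand-ins chosen for illustration, and the embedding vectors are made-up toy values. The function name `cognate_features` is hypothetical.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_levenshtein_similarity(a: str, b: str) -> float:
    """Map edit distance to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def dice_bigram(a: str, b: str) -> float:
    """Dice coefficient over the sets of character bigrams."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a and not grams_b:
        return 1.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 0.0 if norm == 0 else dot / norm


def cognate_features(source: str, target: str,
                     source_vec: list[float], target_vec: list[float]) -> list[float]:
    """Hypothetical feature vector: two orthographic scores plus one semantic score."""
    return [
        normalized_levenshtein_similarity(source, target),
        dice_bigram(source, target),
        cosine(source_vec, target_vec),
    ]


# Toy English-Dutch pair; the embedding values are invented for the example.
features = cognate_features("night", "nacht", [1.0, 0.0], [1.0, 0.0])
```

A feature vector of this kind could then be passed to any supervised binary classifier. Which classifier the paper uses is not stated in the abstract.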

LT3 at SemEval-2020 Task 7: Comparing Feature-Based and Transformer-Based Approaches to Detect Funny Headlines

Vanroy, Labat, Kaminska et al. 2020

This paper presents two different systems for the SemEval shared task 7 on Assessing Humor in Edited News Headlines, sub-task 1, where the aim was to estimate the intensity of humor generated in edited headlines. Our first system is a feature-based machine learning system that combines different types of information (e.g. word embeddings, string similarity, part-of-speech tags, perplexity scores, named entity recognition) in a Nu Support Vector Regressor (NuSVR). The second system is a deep learning-based approach that uses the pre-trained language model RoBERTa to learn latent features in the news headlines that are useful to predict the funniness of each headline. The latter system was also our final submission to the competition and is ranked seventh among the 49 participating teams, with a root-mean-square error (RMSE) of 0.5253.
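
For readers unfamiliar with the metric reported in this abstract, the sketch below shows how a root-mean-square error is computed from predicted and gold scores. The score values are invented for illustration and are unrelated to the shared-task data.

```python
import math


def rmse(predicted: list[float], gold: list[float]) -> float:
    """Root-mean-square error between two equally long score lists."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("score lists must be non-empty and of equal length")
    squared_errors = [(p - g) ** 2 for p, g in zip(predicted, gold)]
    return math.sqrt(sum(squared_errors) / len(gold))


# Invented funniness scores for two headlines.
error = rmse([1.0, 2.0], [1.0, 4.0])
```

Lower values indicate predictions closer to the gold scores.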

A Million Tweets Are Worth a Few Points: Tuning Transformers for Customer Service Tasks

Hadifar, Labat, Hoste et al. 2021 (Preprint)

