Hoyun Song scite author profile

Hoyun Song

4 Publications

4 Citation Statements Received

73 Citation Statements Given

Publications

A Large-scale Comprehensive Abusiveness Detection Dataset with Multifaceted Labels from Reddit

Song¹, Ryu², Lee³ et al. 2021

As users in online communities suffer from severe side effects of abusive language, many researchers have attempted to detect abusive texts from social media, presenting several datasets for such detection. However, none of them contain both comprehensive labels and contextual information, which are essential for thoroughly detecting all kinds of abusiveness in texts, since datasets with such fine-grained features demand a significant amount of annotation, leading to much increased complexity. In this paper, we propose a Comprehensive Abusiveness Detection Dataset (CADD), collected from English Reddit posts, with multifaceted labels and contexts. Our dataset is annotated hierarchically for efficient annotation through crowdsourcing on a large scale. We also empirically explore the characteristics of our dataset and provide a detailed analysis for novel insights. The results of our experiments with strong pre-trained natural language understanding models on our dataset show that our dataset gives rise to meaningful performance, assuring its practicality for abusive language detection.

ELF22: A Context-based Counter Trolling Dataset to Combat Internet Trolls

Lee¹, NA², Song³ et al. 2022

Preprint

Online trolls increase social costs and cause psychological damage to individuals. With the proliferation of automated accounts making use of bots for trolling, it is difficult for targeted individual users to handle the situation both quantitatively and qualitatively. To address this issue, we focus on automating the method to counter trolls, as counter responses to combat trolls encourage community users to maintain ongoing discussion without compromising freedom of expression. For this purpose, we propose a novel dataset for automatic counter response generation. In particular, we constructed a pair-wise dataset that includes troll comments and counter responses with labeled response strategies, which enables models fine-tuned on our dataset to generate responses by varying counter responses according to the specified strategy. We conducted three tasks to assess the effectiveness of our dataset and evaluated the results through both automatic and human evaluation. In human evaluation, we demonstrate that the model fine-tuned on our dataset shows a significantly improved performance in strategy-controlled sentence generation.

Detecting Implicitly Abusive Language by Applying Out-of-Distribution Problem

Shin¹, Song², Park³ 2022

JOK

A Simple and Flexible Modeling for Mental Disorder Detection by Learning from Clinical Questionnaires

Song¹, Shin², Lee³ et al. 2023

Social media is one of the most highly sought resources for analyzing the characteristics of its users' language. In particular, many researchers have utilized various linguistic features of mental health problems from social media. However, existing approaches to detecting mental disorders face critical challenges, such as the scarcity of high-quality data or the trade-off between addressing the complexity of models and presenting interpretable results grounded in expert domain knowledge. To address these challenges, we design a simple but flexible model that preserves domain-based interpretability. We propose a novel approach that captures the semantic meanings directly from the text and compares them to symptom-related descriptions. Experimental results demonstrate that our model outperforms relevant baselines on various mental disorder detection tasks. Our detailed analysis shows that the proposed model is effective at leveraging domain knowledge, is transferable to other mental disorders, and provides interpretable detection results.
