Toxic comments on online platforms are an unavoidable social issue under the cloak of anonymity. Hate speech detection has been actively studied for languages such as English, German, and Italian, for which manually labeled corpora have been released. In this work, we first present 9.4K manually labeled entertainment news comments for identifying Korean toxic speech, collected from a widely used online news platform in Korea. The comments are annotated for both social bias and hate speech, since the two aspects are correlated. The inter-annotator agreement, measured by Krippendorff's alpha, is 0.492 and 0.496, respectively. We provide benchmarks using CharCNN, BiLSTM, and BERT, where BERT achieves the highest score on all tasks. The models generally perform better on bias identification, since hate speech detection is a more subjective task. Additionally, when BERT is trained with bias labels for hate speech detection, the prediction score increases, implying that bias and hate are intertwined. We make our dataset publicly available and open competitions with the corpus and benchmarks.
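For readers unfamiliar with the agreement measure reported above, the following is a minimal sketch of Krippendorff's alpha, assuming nominal (unordered) labels. The abstract does not state which distance metric was used, so the nominal variant, the function name `krippendorff_alpha_nominal`, and the toy labels are illustrative assumptions rather than the authors' procedure.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels (illustrative sketch).

    `units` is a list of lists; each inner list holds the labels that
    different annotators assigned to one item. Items with fewer than
    two labels cannot be paired and are skipped.
    """
    # Coincidence matrix: every ordered pair of labels within an item
    # contributes 1 / (m - 1), where m is the number of labels on it.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for c, k in permutations(labels, 2):
            coincidences[(c, k)] += 1.0 / (m - 1)

    # Marginal totals per category and the overall number of pairable labels.
    totals = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())

    # Observed and expected disagreement (off-diagonal mass only).
    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(
        totals[c] * totals[k] for c in totals for k in totals if c != k
    )
    if expected == 0:
        raise ValueError("alpha is undefined when only one category occurs")

    return 1.0 - (n - 1) * observed / expected


# Hypothetical toy data: three comments, two annotators each.
toy = [["hate", "hate"], ["hate", "none"], ["none", "none"]]
alpha = krippendorff_alpha_nominal(toy)
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance, which gives some context for the reported scores of roughly 0.49.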
In this paper we consider tests for nonlinear time series, which are motivated by the notion of serial dependence. The proposed tests are based on comparisons with the quantile spectral density, which can be considered a quantile version of the usual spectral density function. The quantile spectral density 'measures' the sequential dependence structure of a time series, and is well defined under relatively weak mixing conditions. We propose an estimator for the quantile spectral density and derive its asymptotic sampling properties. We use the quantile spectral density to construct a goodness of fit test for time series and explain how this test can also be used for comparing the sequential dependence structure of two time series. The asymptotic sampling properties of the test statistic are derived under the null and an alternative. Furthermore, a bootstrap procedure is proposed to obtain a finite sample approximation. The method is illustrated with simulations and some real data examples.
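To make the idea of a quantile spectral density concrete, here is a rough sketch under stated assumptions: the series is reduced to indicator processes 1{X_t <= x} and 1{X_t <= y}, their sample cross-covariances are computed, and a lag-window (Bartlett) spectral estimate is formed. The choice of window, the sign convention in the exponent, the 1/(2*pi) normalisation, and all function names are assumptions made for illustration; the abstract does not specify the estimator the authors propose.

```python
import cmath
import math


def indicator_cross_cov(series, x, y, lag):
    """Sample covariance of 1{X_t <= x} and 1{X_{t+lag} <= y} (divisor n)."""
    n = len(series)
    a = [1.0 if v <= x else 0.0 for v in series]
    b = [1.0 if v <= y else 0.0 for v in series]
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    if lag >= 0:
        pairs = zip(a[: n - lag], b[lag:])
    else:
        pairs = zip(a[-lag:], b[: n + lag])
    return sum((u - mean_a) * (w - mean_b) for u, w in pairs) / n


def quantile_spectral_density(series, x, y, omega, bandwidth):
    """Lag-window estimate at frequency omega for thresholds (x, y).

    `bandwidth` is the largest lag used and must be smaller than
    len(series). Bartlett weights are an illustrative choice.
    """
    total = 0j
    for r in range(-bandwidth, bandwidth + 1):
        weight = 1.0 - abs(r) / (bandwidth + 1)
        cov = indicator_cross_cov(series, x, y, r)
        total += weight * cov * cmath.exp(1j * r * omega)
    return total / (2.0 * math.pi)


# Hypothetical deterministic series, used only to exercise the functions.
demo = [math.sin(0.7 * t) + 0.3 * math.cos(2.1 * t) for t in range(200)]
estimate = quantile_spectral_density(demo, 0.0, 0.0, omega=1.0, bandwidth=10)
```

Because the estimate depends only on whether observations fall below the chosen thresholds, it is unaffected by monotone transformations of the data and does not require finite moments, which is consistent with the abstract's remark that the quantity is well defined under relatively weak conditions.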
This study analyzes the political slant of user comments on Korean partisan media. We built a BERT-based classifier, trained with semi-unsupervised deep learning methods, that detects the political leaning of short comments and achieves an F1 score of 0.83. Classifying 21.6K comments, we found a high presence of conservative bias on both conservative and liberal news outlets. Moreover, this study discloses an asymmetry across the partisan spectrum in that more liberals (48.0%) than conservatives (23.6%) comment not only on news stories resonating with their political perspectives but also on those challenging their viewpoints. These findings advance the current understanding of online echo chambers.