Jan Šnajder scite author profile

Jan Šnajder

5 Publications

294 Citation Statements Received

70 Citation Statements Given

Affiliations

Brno University of Technology, University of Zagreb

Publications

Cross-Domain Detection of Abusive Language Online

Karan

Šnajder

2018

We investigate to what extent the models trained to detect general abusive language generalize between different datasets labeled with different abusive language types. To this end, we compare the cross-domain performance of simple classification models on nine different datasets, finding that the models fail to generalize to out-domain datasets and that having at least some in-domain data is important. We also show that using the frustratingly simple domain adaptation (Daumé III, 2007) in most cases improves the results over in-domain training, especially when used to augment a smaller dataset with a larger one.

Back up your Stance: Recognizing Arguments in Online Discussions

Boltużić

Šnajder

2014

129

In online discussions, users often back up their stance with arguments. Their arguments are often vague, implicit, and poorly worded, yet they provide valuable insights into reasons underpinning users' opinions. In this paper, we make a first step towards argument-based opinion mining from online discussions and introduce a new task of argument recognition. We match user-created comments to a set of predefined topic-based arguments, which can be either attacked or supported in the comment. We present a manually-annotated corpus for argument recognition in online discussions. We describe a supervised model based on comment-argument similarity and entailment features. Depending on problem formulation, model performance ranges from 70.5% to 81.8% F1-score, and decreases only marginally when applied to an unseen topic.

Reddit: A Gold Mine for Personality Prediction

Gjurković

Šnajder

2018

Automated personality prediction from social media is gaining increasing attention in natural language processing and social sciences communities. However, due to high labeling costs and privacy issues, the few publicly available datasets are of limited size and low topic diversity. We address this problem by introducing a large-scale dataset derived from Reddit, a source so far overlooked for personality prediction. The dataset is labeled with Myers-Briggs Type Indicators (MBTI) and comes with a rich set of features for more than 9k users. We carry out a preliminary feature analysis, revealing marked differences between the MBTI dimensions and poles. Furthermore, we use the dataset to train and evaluate benchmark personality prediction models, achieving macro F1-scores between 67% and 82% on the individual dimensions and 82% accuracy for exact or one-off accurate type prediction. These results are encouraging and comparable with the reliability of standardized tests.

Constructing Coherent Event Hierarchies from News Stories

Glavaš

Šnajder

2014

News describe real-world events of varying granularity, and recognition of internal structure of events is important for automated reasoning over events. We propose an approach for constructing coherent event hierarchies from news by enforcing document-level coherence over pairwise decisions of spatiotemporal containment. Evaluation on a news corpus annotated with event hierarchies shows that enforcing global spatiotemporal coreference of events leads to significant improvements (7.6% F1-score) in the accuracy of pairwise decisions.

Event graphs for information retrieval and multi-document summarization

Glavaš

Šnajder

2014

Expert Systems with Applications
