The Workshop on Stylistic Variation (StyVa) at EMNLP 2017 is the first of its kind, offering a new venue for bringing together a large but previously underserved and splintered community within computational linguistics. Our goal in creating this workshop was to attract a variety of perspectives on style from traditional areas within NLP, including authorship attribution, author profiling, genre studies, natural language generation, distributional lexicography, and literary and educational applications. To this end, we have defined stylistic variation as broadly as possible: it includes any variation in the phonological, lexical, syntactic, or discourse realization of particular semantic content that is due to differences in extralinguistic variables such as individual speaker, speaker demographics, target audience, genre, etc.

We received 22 submissions, of which we accepted 14 (64%): seven as talks and seven as poster presentations. The submissions were indeed diverse, with at least one in several of the major topic areas listed above. We also noted a clear trend, however: we received several papers on style-sensitive language generation, particularly papers using neural network models. This reflects a broader interest in the field, one we expect to continue. More generally, we are pleased that this workshop has served as a venue for both traditional and cutting-edge approaches to style.

We would like to thank the authors for choosing StyVa as a venue for their excellent work, our invited speakers (Ani Nenkova and Walter Daelemans) for their invaluable contributions, and of course our esteemed Program Committee for their reviews. We would also like to thank the ACL workshop organizing committee for giving us this opportunity to bring together the NLP stylistic community.

We look forward to a great workshop in Copenhagen!

Julian Brooke, Moshe Koppel, and Thamar Solorio
Abstract
As natural language processing research grows, driven largely by the availability of data, we have expanded research from news and small-scale dialog corpora to the web and social media. User-generated data and crowdsourcing have opened the door to investigating human language in various styles, with more statistical power and more real-world applications. In this position/survey paper, I review and discuss seven language styles that I believe are important and interesting to study, covering influential past work, present challenges, and potential future impact.
Top Three Problems
The top three problems in studying language styles are data, data, and data. More specifically, they are the problems of data shortage, data fusion, and data annotation. The data shortage problem has been easing, which is the main reason for the recent surge in research studies on language styles. The data fusion problem is more specific to this area, due to the subtle and often subjective nature of linguistic styles. For instance, while men and women talk in different ways (note this is not the same as ta...