Over the past few years, online social networking sites (Facebook, Twitter, YouTube, Flickr, MySpace, LinkedIn, Metacafe, Vimeo, etc.) have revolutionized the way we communicate with individuals, groups and communities, and have altered everyday practices. The unprecedented volume and variety of user-generated content, together with the user interaction network, create new opportunities for understanding social behavior and building socially intelligent systems.

This 5th workshop attracted several submissions from around the world. Each paper was assigned to four reviewers. For the final workshop program, and for inclusion in these proceedings, nine regular papers were selected. The workshop program features two keynote presentations: one by Kalina Bontcheva, Senior Researcher in the Natural Language Processing Group, Department of Computer Science, University of Sheffield, and one on industrial perspectives on social media monitoring and innovative tools, presented by NLP Technologies, Montreal, Canada.

One of the goals of LASM 2014 was to reflect a wide range of research efforts and results in language analysis, with implications for fields such as natural language processing, computational linguistics, sociolinguistics and psycholinguistics. We invited original and unpublished research papers on all topics related to the analysis of language on social media, including the following:

• What are people talking about on social media?
• How are they expressing themselves?
• Why do they scribe?
• Natural language processing techniques for social media analysis
• How do language and social network properties interact?
• Semantic Web / Ontologies / Domain models to aid in social data understanding
• Characterizing Participants via Linguistic Analysis
• Language, Social Media and Human Behavior

This workshop would not have been possible without the hard work of many people. We would like to thank all Program Committee members and external reviewers for their effort in providing high-quality reviews in a timely manner. We thank all the authors who submitted papers, and the authors of the selected papers for their help with preparing the final copy. We are indebted to the EACL 2014 Workshop co-Chairs. We would also like to thank our industry partners for their support and for making LASM 2014 a successful workshop: NLP Technologies, Microsoft Research and IBM Almaden.
Abstract

User-generated content has become a recurrent resource for NLP tools and applications, and many efforts have recently been made to handle the noise present in short social media texts. Normalisation techniques have proven useful for identifying and replacing lexical variants in some of the most informal genres, such as microblogs. However, annotated data is needed to train and evaluate these systems, and producing it is usually a costly process. Until now, most of these approaches have focused on English and have not taken into account demographic variables such as user location and gender. ...