In this work, we explore the types of triggers that spark trends on Twitter, introducing a typology with the following four types: news, ongoing events, memes, and commemoratives. While previous research has analyzed trending topics over the long term, we look at the earliest tweets that produce a trend, with the aim of categorizing trends early on. This allows us to provide a filtered subset of trends to end users. We experiment with a set of straightforward language-independent features based on the social spread of trends and categorize them using the typology. Our method provides an efficient way to accurately categorize trending topics without the need for external data, enabling news organizations to discover breaking news in real time, or to quickly identify viral memes that might inform marketing decisions, among others. The analysis of social features also reveals patterns associated with each type of trend, such as tweets about ongoing events being shorter as many were likely sent from mobile devices, or memes having more retweets originating from a few trend-setters.

Introduction

Although social media have become mainstream, as shown by the ever-growing number of users, Twitter stands out as the quintessential platform to openly access real-time updates on breaking news and ongoing events. With over 500 million users, Twitter sees a daily stream of more than 400 million short messages known as tweets. These tweets range from one-to-one conversations and chatter to updates of wider interest about current affairs, encompassing all kinds of information.

One of the appealing phenomena of the microblogging service is the fact that certain occurrences of wide interest for a community of users produce a sudden increase in real-time mentions as they unfold. Users live-tweet about sporting events as they watch them on TV, discuss breaking news as they learn about it, or commemorate certain events on a memorial day, among others.
This results in spiky activity associated with the occurrence in question, which produces what is known as a social trend. While these can reveal early on what is going on, a list of social trends includes only the terms being mentioned at that moment; no context is provided about what triggered each trend or the kind of event behind it. Little attention has been paid to researching social trends so as to mine additional characteristics from them and understand why they emerged. Discovering the trigger that produces a social trend can not only help inform users about social trends that match their interests, but also feed third parties with different interests: for instance, news media can be interested in breaking news (Zubiaga, Ji, & Knight, 2013), governments could be interested in tracking issues concerning certain events for security reasons, and marketing professionals might be interested in quickly identifying viral memes to react appropriately. Previous research on the analysis of social trends has focused on long-term analysis o...
User-generated annotations on social bookmarking sites can provide interesting and promising metadata for web document management tasks like web page classification. These user-generated annotations include diverse types of information, such as tags and comments. Nonetheless, each kind of annotation has a different nature and popularity level. In this work, we analyze and evaluate the usefulness of each of these social annotations to classify web pages over a taxonomy like that proposed by the Open Directory Project. We compare each of them separately to content-based classification, and also combine the different types of data to improve performance. Our experiments show encouraging results with the use of social annotations for this purpose, and we found that combining these metadata with web page content improves the classifier's performance even further.
Information Retrieval (IR) approaches for Semantic Web search engines have become very popular in recent years. The popularization of IR libraries such as Lucene, which allow IR implementations almost out of the box, has made it easier to integrate IR into Semantic Web search engines. However, one of the most important features of Semantic Web documents is their structure, since this structure allows us to represent semantics in a machine-readable format. In this paper, we analyze the specific problems of structured IR and how to adapt weighting schemes for semantic document retrieval.
Language identification, the task of determining the language a given text is written in, has progressed substantially in recent decades. However, three main issues remain unresolved: (i) distinguishing similar languages, (ii) detecting multilingualism in a single document, and (iii) identifying the language of short texts. In this paper, we describe our work on the development of a benchmark to encourage further research in these three directions, set forth an evaluation framework suitable for the task, and make a dataset of annotated tweets publicly available for research purposes. We also describe the shared task we organized to validate and assess the evaluation framework and dataset with systems submitted by seven different participants, and analyze the performance of these systems. The evaluation of the results submitted by the participants of the shared task helped us shed some light on the shortcomings of state-of-the-art language identification systems, and gave insight into the extent to which the brevity, multilingualism, and language similarity found in texts degrade the performance of language identifiers. Our dataset of nearly 35,000 tweets and the evaluation framework provide researchers and practitioners with suitable resources to further study the aforementioned issues in language identification within a common setting that enables results to be compared with one another.