Detecting important events in high-volume news streams is a valuable task for a variety of purposes. The volume and rate of online news increase the need for automated event detection methods that can operate in real time. In this paper we develop a network-based approach that makes the working assumption that important news events always involve named entities (such as persons, locations and organizations) that are linked in news articles. Our approach uses natural language processing techniques to detect these entities in a stream of news articles and then creates a time-stamped series of networks in which the detected entities are linked by co-occurrence in articles and sentences. In this prototype, weighted node degree is tracked over time and change-point detection is used to locate important events. Potential events are characterized and distinguished using community detection on KeyGraphs that relate named entities and informative noun-phrases from related articles. This methodology already produces promising results and will be extended in future to include a wider variety of complex network analysis techniques.
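The degree-tracking step described in this abstract can be illustrated with a minimal sketch. It assumes the named entities have already been extracted and that articles are grouped into consecutive time windows. The function names, the window structure and the simple threshold rule (a spike above the recent mean) are illustrative assumptions; the abstract does not specify which change-point detection method the authors use.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, stdev


def weighted_degree_series(windows):
    """Return {entity: [weighted degree in each time window]}.

    `windows` is a list of time windows; each window is a list of articles;
    each article is a set of entity names (assumed to be pre-extracted).
    Every pair of entities co-occurring in an article adds 1 to the weight
    of the edge between them, and so 1 to the weighted degree of each.
    """
    series = defaultdict(lambda: [0] * len(windows))
    for t, articles in enumerate(windows):
        for entities in articles:
            for a, b in combinations(sorted(set(entities)), 2):
                series[a][t] += 1
                series[b][t] += 1
    return dict(series)


def flag_changes(values, history=3, threshold=3.0):
    """Flag window indices where the value spikes above the recent baseline.

    A simple stand-in for change-point detection (an assumption, not the
    authors' method): index t is flagged when values[t] exceeds the mean of
    the previous `history` values by more than `threshold` standard
    deviations. `history` must be at least 2. A zero deviation is replaced
    by 1.0 so that a flat baseline does not flag tiny fluctuations.
    """
    flagged = []
    for t in range(history, len(values)):
        past = values[t - history:t]
        sigma = stdev(past) or 1.0
        if values[t] > mean(past) + threshold * sigma:
            flagged.append(t)
    return flagged


# Hypothetical toy stream: entity "A" suddenly co-occurs with many others.
windows = [
    [{"A", "B"}],
    [{"A", "B"}],
    [{"A", "C"}],
    [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"A", "D"}, {"A", "B", "D"}],
]
series = weighted_degree_series(windows)
events = {entity: flag_changes(values) for entity, values in series.items()}
```

In this toy stream the weighted degree of "A" jumps in the last window, which is the only window the threshold rule flags for that entity.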
The huge volume and velocity of media content published on the Web present a substantial challenge to human analysts. In prior work, we developed a system (network event detection, NED) to assist analysts by detecting events within high-volume news streams in real time. NED can process a heterogeneous stream of news articles or social media user posts, combining text mining and network analysis to detect breaking news stories and generate an easy-to-understand event summary. In this paper, we expand the NED event detection and summarisation approach in two ways. First, we introduce a new approach to named entity disambiguation for tweets, which contain minimal information due to brevity. Second, we apply sentiment analysis techniques to documents associated with a detected event to characterise the event as either broadly 'positive' or 'negative' based on media portrayal. Our expansion focuses on Twitter streams since Twitter has become an important news dissemination platform and is often the site where emerging events are first seen. To test the extended methodology, we apply it here to three data sets related to political elections in the UK and the USA. The addition of sentiment analysis to the NED event detection methodology improves the insight gained by the user by allowing quick evaluation of the perceived impact of an event. This approach may have potential applications in domains where public sentiment is relevant to decision-making around events, such as financial markets and politics.
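The sentiment step in this abstract, which labels a detected event by the tone of its associated documents, can be sketched as follows. The abstract does not say which sentiment analysis technique is used, so this sketch assumes a plain lexicon count. The word lists, the function names and the extra "neutral" label for a zero average are all hypothetical.

```python
import re

# Hypothetical toy lexicons; a real system would use an established resource.
POSITIVE = {"win", "wins", "great", "success", "strong", "good"}
NEGATIVE = {"lose", "loses", "scandal", "crisis", "weak", "bad"}


def document_polarity(text):
    """Score one document in [-1, 1] from counts of lexicon words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def characterise_event(documents):
    """Label an event from the mean polarity of its associated documents.

    Returns (label, mean_polarity). The "neutral" label for a mean of
    exactly zero is an assumption added for this sketch.
    """
    scores = [document_polarity(doc) for doc in documents]
    avg = sum(scores) / len(scores) if scores else 0.0
    if avg > 0:
        label = "positive"
    elif avg < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, avg


# Hypothetical tweets attached to one detected event.
tweets = [
    "Great win for the candidate",
    "A strong result tonight",
    "Minor scandal reported",
]
label, score = characterise_event(tweets)
```

Averaging per-document scores, rather than pooling all words, keeps one long document from dominating the event label.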
Question and answer (Q&A) websites are a medium where people can communicate and help each other. Stack Overflow is one of the most popular Q&A websites about programming, where millions of developers seek help or provide valuable assistance. Activity on the Stack Overflow website is moderated by the user community, utilizing a voting system to promote high-quality content. The website was created in 2008 and has accumulated a large amount of crowd wisdom about the software development industry. Here we analyse this data to examine trends in the grouping of technologies and their users into different sub-communities. In our work we analysed all questions, answers, votes and tags from Stack Overflow between 2008 and 2020. We generated a series of user-technology interaction graphs and applied community detection algorithms to identify the biggest user communities for each year, to examine which technologies those communities incorporate, how they are interconnected and how they evolve through time. The biggest and most persistent communities were related to web development. In general, there is little movement between communities; users tend to either stay within the same community or not acquire any score at all. Community evolution reveals the popularity of different programming languages and frameworks on Stack Overflow over time. These findings give insight into the user community on Stack Overflow and reveal long-term trends in the software development industry.
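The graph construction and community detection described in this abstract can be sketched for a single year as follows. The abstract does not name the community detection algorithms used, so this sketch assumes a small greedy modularity-merging procedure. The record format `(user, tag, score)`, the function names and the toy data are hypothetical, and user and tag names are assumed not to collide.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(records):
    """Build a weighted user-technology graph from (user, tag, score) records."""
    adj = defaultdict(lambda: defaultdict(int))
    for user, tag, score in records:
        adj[user][tag] += score
        adj[tag][user] += score
    return adj


def greedy_communities(adj):
    """Greedily merge communities while a merge increases modularity.

    Starts with every node in its own community. The gain of merging
    communities i and j is proportional to e_ij * 2m - d_i * d_j, where
    e_ij is the edge weight between them, d_i and d_j are their total
    weighted degrees, and 2m is the sum of all weighted degrees. Integer
    scores keep this comparison exact.
    """
    two_m = sum(sum(nbrs.values()) for nbrs in adj.values())
    degree = {node: sum(nbrs.values()) for node, nbrs in adj.items()}
    comms = [{node} for node in sorted(adj)]
    while True:
        best_gain, best_pair = 0, None
        for i, j in combinations(range(len(comms)), 2):
            e_ij = sum(adj[u].get(v, 0) for u in comms[i] for v in comms[j])
            d_i = sum(degree[u] for u in comms[i])
            d_j = sum(degree[v] for v in comms[j])
            gain = e_ij * two_m - d_i * d_j
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:  # no merge improves modularity
            return comms
        i, j = best_pair
        comms[i] |= comms[j]
        del comms[j]


# Hypothetical scores earned by four users on four tags in one year.
records = [
    ("u1", "python", 5), ("u1", "pandas", 3),
    ("u2", "python", 3), ("u2", "pandas", 4), ("u2", "javascript", 1),
    ("u3", "javascript", 6), ("u3", "react", 4),
    ("u4", "react", 5), ("u4", "javascript", 3),
]
communities = greedy_communities(build_graph(records))
```

Repeating this per year and comparing each user's community across consecutive years would give a simple view of movement between communities. The pairwise search is quadratic in the number of communities per merge, so it only suits small illustrative graphs.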