The massive adoption of social media has provided new ways for individuals to express their opinions online. The blogosphere, an inherent part of this trend, contains a vast array of information on a wide variety of topics. It is thus a huge think tank that creates an enormous and ever-changing archive of open-source intelligence. The higher-level aim of the research presented here is to model and mine this vast pool of data in order to extract, exploit and describe meaningful knowledge, and thereby to leverage the (content-related) structures and dynamics of emerging networks within the blogosphere. This paper focuses on the project's initial phase, in which the data of interest must be collected and made available offline for further analysis. Our proprietary, tailor-made feed-crawler meets exactly this need. The main concept, the techniques and the implementation details of the crawler therefore form the focus of this paper and provide the basis for future project phases.
Information about upcoming trends is valuable knowledge for both companies and individuals. Detecting trends for a specific topic is of special interest. According to the latest information, over 200 million blogs exist on the World Wide Web, and millions of posts are published every day. Together, these blogs form an enormous think tank of open-source intelligence. Given the continuously growing nature of the World Wide Web, a primary factor of success is the ability to include the latest data and to cover the complete data set of blogs. The structured as well as unstructured data of blogs are made available offline via a single database for further analysis. This paper describes and evaluates an algorithm to detect trends based on the data published in blog posts.