The explosion of information published on the Web has led to the emergence of a Web syndication paradigm, which transforms the passive reader into an active information collector. Information consumers subscribe to RSS/Atom feeds and are notified whenever a piece of news (an item) is published. The success of Web syndication, now offered on Web sites, blogs, and social media, raises scalability issues, however. There is a vital need for efficient real-time filtering methods across feeds that allow users to effectively follow the information that interests them personally. In this paper, we investigate three indexing techniques for users' subscriptions, based on inverted lists or on an ordered trie. We present analytical models of memory requirements and matching time, and we conduct a thorough experimental evaluation to show the impact of critical workload parameters on these structures.
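To make the idea of indexing subscriptions with inverted lists concrete, the following is a minimal sketch and not the paper's implementation. It assumes that a subscription is a set of keywords and that it matches an item when all of its keywords occur in the item; the abstract does not state these semantics. The class name `InvertedListIndex` and its methods are hypothetical.

```python
from collections import defaultdict


class InvertedListIndex:
    """Hypothetical count-based inverted-list index over keyword subscriptions.

    Assumption: a subscription matches an item when every one of its
    keywords appears in the item.
    """

    def __init__(self):
        # term -> ids of the subscriptions that contain this term
        self.postings = defaultdict(set)
        # subscription id -> number of distinct terms it contains
        self.sizes = {}

    def subscribe(self, sub_id, terms):
        terms = set(terms)
        if not terms:
            raise ValueError("a subscription needs at least one term")
        self.sizes[sub_id] = len(terms)
        for term in terms:
            self.postings[term].add(sub_id)

    def match(self, item_terms):
        # Count, per subscription, how many of its terms occur in the item.
        hits = defaultdict(int)
        for term in set(item_terms):
            for sub_id in self.postings.get(term, ()):
                hits[sub_id] += 1
        # A subscription is satisfied when all of its terms were seen.
        return sorted(s for s, n in hits.items() if n == self.sizes[s])


index = InvertedListIndex()
index.subscribe(1, ["rss", "atom"])
index.subscribe(2, ["twitter"])
index.subscribe(3, ["rss", "filtering", "trie"])
matched = index.match("new rss and atom feeds on twitter".split())
print(matched)
```

In this sketch, matching cost grows with the lengths of the posting lists of the terms that appear in the item, and memory grows with the total number of (term, subscription) pairs. Costs of this kind are what an analytical model of memory and matching time would need to capture.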
Abstract. We are witnessing the widespread adoption of web syndication technologies such as RSS or Atom for the timely delivery of frequently updated Web content. Almost every personal weblog, news portal, or discussion forum nowadays employs RSS/Atom feeds to complement pull-oriented searching and browsing of web pages with push-oriented protocols for delivering web content. Social media applications such as Twitter or Facebook also employ RSS to notify users about newly available posts from their preferred friends. Unfortunately, previous work on the statistical characteristics of RSS/Atom feeds does not provide a precise and up-to-date characterization of feeds' behavior and content, a characterization that could be used to benchmark the effectiveness and efficiency of various RSS processing and analysis techniques. In this paper, we present the first thorough analysis of three complementary features of real-scale RSS feeds, namely publication activity, item structure and length, and the vocabulary of their content, which we believe are crucial for Web 2.0 applications.
Due to their success, social network platforms are considered today a major means of communication. To increase user engagement, they rely on recommender systems that personalize the individual experience by filtering messages according to user interest and/or neighborhood. However, some recent results show that this personalization of content might increase the echo chamber effect and create filter bubbles. These filter bubbles restrict the diversity of opinions in the recommended content. In this paper, we first conduct a thorough study of communities on a large Twitter dataset to quantify how recommender systems affect users' behavior and create filter bubbles. We then propose the Community Aware Model (CAM) to counter the impact of different recommender systems on information consumption. Our results show that filter bubbles concern up to 10% of users and that our model, based on similarities between communities, enhances recommender systems.