Social media data has become crucial to the advancement of scientific understanding. However, even though it has become ubiquitous, just collecting large-scale social media data involves a high degree of engineering skill set and computational resources. In fact, research is often times gated by data engineering problems that must be overcome before analysis can proceed. This has resulted recognition of datasets as meaningful research contributions in and of themselves.Reddit, the so called “front page of the Internet,” in particular has been the subject of numerous scientific studies. Although Reddit is relatively open to data acquisition compared to social media platforms like Facebook and Twitter, the technical barriers to acquisition still remain. Thus, Reddit's millions of subreddits, hundreds of millions of users, and billions of comments are at the same time relatively accessible, but time consuming to collect and analyze systematically.In this paper, we present the Pushshift Reddit dataset. Pushshift is a social media data collection, analysis, and archiving platform that since 2015 has collected Reddit data and made it available to researchers. Pushshift's Reddit dataset is updated in real-time, and includes historical data back to Reddit's inception. In addition to monthly dumps, Pushshift provides computational tools to aid in searching, aggregating, and performing exploratory analysis on the entirety of the dataset. The Pushshift Reddit dataset makes it possible for social media researchers to reduce time spent in the data collection, cleaning, and storage phases of their projects.
This paper describes a structurally-guided framework for the decomposition of a verification task into subtasks, each solved by a specialized algorithm for overall efficiency. Our contributions include the following: (1) a structural algorithm for computing a bound of a statetransition diagram's diameter which, for several classes of netlists, is sufficiently small to guarantee completeness of a bounded property check;(2) a robust backward unfolding technique for structural target enlargement: from the target states, we perform a series of compose-based preimage computations, truncating the search if resource limitations are exceeded; (3) similar to frontier simplification in symbolic reachability analysis, we use induction via don't cares for enhancing the presented target enlargement. In many practical cases, the verification problem can be discharged by the enlargement process; otherwise, it is passed in simplified form to an arbitrary subsequent solution approach. The presented techniques are embedded in a flexible verification framework, allowing arbitrary combinations with other techniques. Extensive experimental results demonstrate the effectiveness of the described methods at solving and simplifying practical verification problems.
In this paper we present the application of generalized retiming for temporal property checking. Retiming is a structural transformation that relocates registers in a circuit-based design representation without changing its actual input-output behavior. We discuss the application of retiming to minimize the number of registers with the goal of increasing the capacity of symbolic state traversal. In particular, we demonstrate that the classical definition of retiming can be generalized for verification by relaxing the notion of design equivalence and physical implementability. This includes (1) omitting the need for equivalent reset states by using an initialization stump, (2) supporting negative registers, handled by a general functional relation to future time frames, and (3) eliminating peripheral registers by converting them into simple temporal offsets. The presented results demonstrate that the application of retiming in verification can significantly increase the capacity of symbolic state traversal. Our experiments also demonstrate that the repeated use of retiming interleaved with other structural simplifications can yield reductions beyond those possible with single applications of the individual approaches. This result suggests that a tool architecture based on re-entrant transformation engines can potentially decompose and solve verification problems that otherwise would be infeasible.
Social media data has become crucial to the advancement of scientific understanding. However, even though it has become ubiquitous, just collecting large-scale social media data involves a high degree of engineering skill set and computational resources. In fact, research is often times gated by data engineering problems that must be overcome before analysis can proceed. This has resulted recognition of datasets as meaningful research contributions in and of themselves.Reddit, the so called "front page of the Internet," in particular has been the subject of numerous scientific studies. Although Reddit is relatively open to data acquisition compared to social media platforms like Facebook and Twitter, the technical barriers to acquisition still remain. Thus, Reddit's millions of subreddits, hundreds of millions of users, and hundreds of billions of comments are at the same time relatively accessible, but time consuming to collect and analyze systematically.In this paper, we present the Pushshift Reddit dataset. Pushshift is a social media data collection, analysis, and archiving platform that since 2015 has collected Reddit data and made it available to researchers. Pushshift's Reddit dataset is updated in real-time, and includes historical data back to Reddit's inception. In addition to monthly dumps, Pushshift provides computational tools to aid in searching, aggregating, and performing exploratory analysis on the entirety of the dataset. The Pushshift Reddit dataset makes it possible for social media researchers to reduce time spent in the data collection, cleaning, and storage phases of their projects.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.