Trading data as a commodity has become increasingly popular in recent years, and data marketplaces have emerged as a new business model where data from a variety of sources can be collected, aggregated, processed, enriched, bought, and sold. They are effectively changing the way data are distributed and managed on the Internet. To get a better understanding of the emergence of data marketplaces, we have conducted several surveys in recent years to systematically gather and evaluate their characteristics. This paper takes a broader perspective and relates data marketplaces as currently discussed in computer science to the neoclassical notions of market and marketplace from economics. Specifically, we provide a typology of electronic marketplaces and discuss their approaches to the distribution of data. Finally, we provide a distinct definition of data marketplaces, leading to a classification framework that can provide structure for the emerging field of data marketplace research.
The survey presented in this work investigates emerging markets for data and is the third of its kind, providing a deeper understanding of this emerging type of market. The findings indicate that data providers focus on limited business models and that data remains individualized and differentiated. Nevertheless, a trend towards commoditization for certain types of data can be foreseen, which offers an outlook on further developments in this area.
Crime prediction is crucial to criminal justice decision makers and efforts to prevent crime. The paper evaluates the explanatory and predictive value of human activity patterns derived from taxi trip, Twitter and Foursquare data. Analysis of a six-month period of crime data for New York City shows that these data sources improve predictive accuracy for property crime by 19% compared to using only demographic data. This effect is strongest when the novel features are used together, yielding new insights into crime prediction. Notably and in line with social disorganization theory, the novel features cannot improve predictions for violent crimes.
Models of discrete-valued outcomes are easily misspecified if the data exhibit zero-inflation, overdispersion or contamination. Without additional knowledge about the existence and nature of this misspecification, model inference and prediction are adversely affected. Here, we introduce a robust discrepancy-based Bayesian approach using the Total Variation Distance (TVD). In the process, we address and resolve two challenges: First, we study convergence and robustness properties of a computationally efficient estimator for the TVD between a parametric model and the data-generating mechanism. Second, we provide an efficient inference method adapted from Lyddon et al. (2019) which corresponds to formulating an uninformative nonparametric prior directly over the data-generating mechanism. Lastly, we empirically demonstrate that our approach is robust and significantly improves predictive performance on a range of simulated and real-world data.
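To make the idea of a TVD between a parametric model and observed discrete data concrete, the following is a minimal sketch. It assumes a simple plug-in estimate that compares the empirical frequencies of the data with the model's probability mass function; the abstract does not specify the authors' estimator, so this should not be read as their method. The function names `poisson_pmf` and `tvd_to_empirical` are hypothetical and introduced only for illustration.

```python
import math
from collections import Counter


def poisson_pmf(y, rate):
    """Poisson probability mass at count y, computed in log space for stability."""
    return math.exp(-rate + y * math.log(rate) - math.lgamma(y + 1))


def tvd_to_empirical(data, pmf):
    """Total variation distance between the empirical distribution of `data`
    and a discrete model given by `pmf` (a function y -> probability).

    Uses TVD = 0.5 * sum_y |p_hat(y) - p_model(y)|. Values never observed
    have p_hat(y) = 0, so together they contribute the model mass that lies
    outside the observed values.
    """
    n = len(data)
    counts = Counter(data)
    gap_on_observed = 0.0
    model_mass_on_observed = 0.0
    for y, c in counts.items():
        p_model = pmf(y)
        gap_on_observed += abs(c / n - p_model)
        model_mass_on_observed += p_model
    # Clamp at zero to guard against floating-point round-off.
    mass_on_unobserved = max(0.0, 1.0 - model_mass_on_observed)
    return 0.5 * (gap_on_observed + mass_on_unobserved)


# Illustrative, made-up counts with many extra zeros (zero-inflation).
counts_with_extra_zeros = [0] * 6 + [1, 2, 2, 3, 3, 3, 4, 5]
distance = tvd_to_empirical(counts_with_extra_zeros, lambda y: poisson_pmf(y, 2.0))
print(f"TVD to Poisson(2.0): {distance:.3f}")
```

Because the TVD is bounded between 0 and 1, a block of outlying observations can shift it only in proportion to their share of the sample, which is one intuition for why a TVD-based discrepancy can be less sensitive to contamination than a likelihood-based fit.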