A challenge for studies assessing routine activities theory is accounting for the spatial and temporal confluence of offenders and targets, given that people move about throughout the day and night. We propose exploiting social media (Twitter) data to construct estimates of the population at various locations at different times of day, and we assess whether these estimates help predict the amount of crime during two-hour time periods over the course of the day. We address this question using crime data for 97,428 blocks in the Southern California region, along with geocoded information on tweets in the region over an eight-month period. The results show that this measure of the temporal ambient population helps explain the level of crime in blocks during particular time periods. The use of social media data appears promising for testing various implications of routine activities and crime pattern theories, given their explicit spatial and temporal nature.
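A minimal sketch of how such a temporal ambient-population measure could be built from geocoded tweets, assuming a simple table of tweets with block identifiers and timestamps (the column names and data here are illustrative, not the study's actual variables):

```python
import pandas as pd

# Hypothetical input: one row per geocoded tweet, with the census block it
# falls in and its timestamp.
tweets = pd.DataFrame({
    "block_id": ["060371011001", "060371011001", "060371012002"],
    "timestamp": pd.to_datetime([
        "2013-03-01 08:15", "2013-03-01 21:40", "2013-03-01 22:05",
    ]),
})

# Bin each tweet into one of twelve two-hour periods (hours 0-1 -> bin 0, ..., 22-23 -> bin 11).
tweets["time_bin"] = tweets["timestamp"].dt.hour // 2

# Ambient-population proxy: tweet counts per block per two-hour period,
# which can then be joined with crime counts for the same block and period.
ambient = (
    tweets.groupby(["block_id", "time_bin"])
          .size()
          .rename("tweet_count")
          .reset_index()
)
print(ambient)
```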
In this paper we address the problem of building user models that can predict the rate at which individuals consume items from a finite set, including items they have consumed in the past and items that are new. This combination of repeat and new item consumption is common in applications such as listening to music, visiting web sites, and purchasing products. We use zero-inflated Poisson (ZIP) regression models as the basis for our modeling approach, leading to a general framework for modeling user-item consumption rates over time. We show that these models are more flexible in capturing user behavior than alternatives such as well-known latent factor models based on matrix factorization. We compare the performance of ZIP regression and latent factor models on three different data sets involving music, restaurant reviews, and social media. The ZIP regression models are systematically more accurate across all three data sets and across different prediction metrics.
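A small illustration of a zero-inflated Poisson regression of the kind this framework builds on, using simulated user-item counts and statsmodels; the covariates and coefficients are made up for the example and are not the paper's feature set:

```python
import numpy as np
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(0)
n = 500

# Illustrative features for a user-item pair: past consumption count and a
# simple popularity score (hypothetical covariates).
past_count = rng.poisson(2, size=n)
popularity = rng.normal(size=n)
X = np.column_stack([np.ones(n), past_count, popularity])

# Simulate zero-inflated counts: with some probability the pair is "inactive"
# (a structural zero); otherwise counts follow a Poisson whose rate depends on X.
active = rng.random(n) > 0.4
rate = np.exp(-0.5 + 0.3 * past_count + 0.2 * popularity)
y = np.where(active, rng.poisson(rate), 0)

# Fit a ZIP regression: a logit model for the zero-inflation part and a
# Poisson regression for the count part.
model = ZeroInflatedPoisson(y, X, exog_infl=np.ones((n, 1)), inflation="logit")
result = model.fit(disp=0)
print(result.summary())
```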
Personalization is increasingly important for a range of applications that rely on location-based modeling. A key aspect of building personalized models is using population-level information to smooth noisy, sparse data at the individual level. In this paper we develop a general mixture model framework for learning individual-level location models, where the model adaptively combines different types of smoothing information. In a series of experiments with Twitter geolocation data and Gowalla check-in data, we demonstrate that the proposed approach can be significantly more accurate than more traditional smoothing and matrix factorization techniques. The improvement in performance over matrix factorization is pronounced and may be explained by the tendency of dimensionality reduction methods to oversmooth and not retain enough detail at the individual level.
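A simplified shrinkage-style sketch of the core idea of blending individual-level and population-level location information; this is only an illustration of adaptive smoothing under assumed pseudo-count weighting, not the paper's full mixture model:

```python
import numpy as np

def smoothed_location_probs(user_counts, population_probs, alpha=5.0):
    """Blend a user's empirical location distribution with a population-level
    distribution. `alpha` acts like a pseudo-count: users with few observations
    lean on the population component, heavy users on their own history."""
    user_counts = np.asarray(user_counts, dtype=float)
    n = user_counts.sum()
    weight = n / (n + alpha)  # adaptive mixing weight
    user_probs = user_counts / n if n > 0 else np.zeros_like(user_counts)
    return weight * user_probs + (1.0 - weight) * np.asarray(population_probs)

# Example: a sparse user (3 check-ins over 4 candidate locations) is pulled
# toward the population distribution over the same locations.
population = np.array([0.4, 0.3, 0.2, 0.1])
user = np.array([2, 1, 0, 0])
print(smoothed_location_probs(user, population))
```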