Temporal information is crucial for recommendation problems because user preferences are naturally dynamic in the real world. Recent advances in deep learning, especially the discovery of various attention mechanisms and newer architectures in addition to the widely used RNNs and CNNs in natural language processing, have allowed for better use of the temporal ordering of items that each user has engaged with. In particular, the SASRec model, inspired by the popular Transformer model in natural language processing, has achieved state-of-the-art results. However, SASRec, just like the original Transformer model, is inherently an un-personalized model and does not include personalized user embeddings. To overcome this limitation, we propose a Personalized Transformer (SSE-PT) model, which outperforms SASRec by almost 5% in terms of NDCG@10 on 5 real-world datasets. Furthermore, after examining some random users' engagement history, we find our model not only more interpretable but also able to focus on recent engagement patterns for each user. Moreover, our SSE-PT model with a slight modification, which we call SSE-PT++, can handle extremely long sequences and outperform SASRec in ranking results with comparable training speed, striking a balance between performance and speed requirements. Our novel application of the Stochastic Shared Embeddings (SSE) regularization is essential to the success of personalization. Code and data are open-sourced at https://github.com/wuliwei9278/SSE-PT.
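For readers unfamiliar with the NDCG@10 metric quoted above, the following is a minimal sketch of how it is commonly computed when each user has exactly one held-out relevant item. The abstract does not describe the evaluation protocol, so the single-item setup and the function names `ndcg_at_k` and `mean_ndcg_at_k` are assumptions made for illustration, not the authors' code.

```python
import math


def ndcg_at_k(rank, k=10):
    """NDCG@k for one user whose single relevant item sits at 0-based `rank`.

    With one relevant item the ideal DCG is 1, so NDCG reduces to the
    discounted gain 1 / log2(rank + 2), or 0 if the item is outside the top k.
    """
    if rank < 0:
        raise ValueError("rank must be non-negative")
    return 1.0 / math.log2(rank + 2) if rank < k else 0.0


def mean_ndcg_at_k(ranks, k=10):
    """Average NDCG@k over users, given each user's 0-based rank."""
    return sum(ndcg_at_k(r, k) for r in ranks) / len(ranks)
```

Under this assumed setup, a held-out item ranked first contributes 1, one ranked second contributes 1/log2(3), and anything ranked eleventh or lower contributes 0.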
We consider the problem of detecting a rectangle of activation in a grid of sensors in d dimensions with noisy measurements. This has applications to massive surveillance projects and anomaly detection in large datasets in which one detects anomalously high measurements over rectangular regions, or more generally, blobs. Recently, the asymptotic distribution of a multiscale scan statistic was established in (Kabluchko, 2011) under the null hypothesis, using non-constant boundary crossing probabilities for locally-stationary Gaussian random fields derived in (Chan and Lai, 2006). Using a similar approach, we derive the exact asymptotic level and power of four variants of the scan statistic: an oracle scan that knows the dimensions of the activation rectangle; the multiscale scan statistic just mentioned; an adaptive variant; and an ε-net approximation to the latter, in the spirit of (Arias-Castro et al., 2005). This approximate scan runs in time near-linear in the size of the grid and achieves the same asymptotic power as the adaptive scan. We complement our theory with some numerical experiments.
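As a rough illustration of the oracle scan described above, the sketch below scans a two-dimensional grid for the h-by-w window with the largest standardized sum. It assumes d = 2 and independent unit-variance noise, so that dividing a window sum by sqrt(h * w) standardizes it. The function name `oracle_scan` is hypothetical, and this brute-force version is not the near-linear-time approximation from the paper.

```python
import math


def oracle_scan(y, h, w):
    """Largest standardized sum over all h-by-w axis-aligned windows of grid y.

    y is a list of equal-length rows; h and w are the (assumed known)
    window dimensions, each at least 1.
    """
    n_rows, n_cols = len(y), len(y[0])
    # s[i][j] holds the sum of y over rows < i and columns < j (integral image).
    s = [[0.0] * (n_cols + 1) for _ in range(n_rows + 1)]
    for i in range(n_rows):
        for j in range(n_cols):
            s[i + 1][j + 1] = y[i][j] + s[i][j + 1] + s[i + 1][j] - s[i][j]
    best = -math.inf
    for i in range(n_rows - h + 1):
        for j in range(n_cols - w + 1):
            # Window sum via inclusion-exclusion on the integral image.
            total = s[i + h][j + w] - s[i][j + w] - s[i + h][j] + s[i][j]
            best = max(best, total / math.sqrt(h * w))
    return best
```

The integral image makes each window sum a constant-time lookup, so the scan costs time proportional to the grid size for a fixed window shape.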
We consider the problem of deciding, based on a single noisy measurement at each vertex of a given graph, whether the underlying unknown signal is constant over the graph or there exists a cluster of vertices with anomalous activation. This problem is relevant to several applications such as surveillance, disease outbreak detection, biomedical imaging, and environmental monitoring. Since the activations in these problems often tend to be localized to small groups of vertices in the graphs, we model such activity by a class of signals that are supported over a (possibly disconnected) cluster with low cut size relative to its size. We analyze the corresponding generalized likelihood ratio (GLR) statistic and relate it to the problem of finding a sparsest cut in the graph. We develop a tractable relaxation of the GLR statistic based on the combinatorial Laplacian of the graph, which we call the graph Fourier scan statistic (GFSS), and analyze its properties. We show how its performance as a testing procedure depends directly on the spectrum of the graph, and use this result to explicitly derive its asymptotic properties on a few significant graph topologies. Finally, we demonstrate theoretically and with simulations that the GFSS can outperform naïve testing procedures based on global averaging and vertex-wise thresholding. We also demonstrate the usefulness of the GFSS by analyzing groundwater arsenic concentrations from a U.S. Geological Survey dataset.
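The link between cut size and the combinatorial Laplacian mentioned above rests on a standard identity: for the 0/1 indicator vector x of a cluster, the quadratic form x^T L x equals the number of edges leaving the cluster. The sketch below checks that identity on a simple, unweighted, undirected graph given as an edge list. The helper names are hypothetical, and the code does not implement the GFSS itself.

```python
def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of a simple undirected graph.

    n is the number of vertices (labelled 0..n-1); edges is a list of
    (i, j) pairs with no self-loops or duplicates (assumed).
    """
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1
        L[j][j] += 1
        L[i][j] -= 1
        L[j][i] -= 1
    return L


def quadratic_form(L, x):
    """Compute x^T L x for a square matrix L and vector x."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))


def cut_size(n, edges, cluster):
    """Number of edges between `cluster` and its complement, via x^T L x."""
    indicator = [1 if v in cluster else 0 for v in range(n)]
    return quadratic_form(laplacian(n, edges), indicator)
```

On a path with vertices 0 to 3, for example, the cluster {0, 1} is separated from the rest by the single edge (1, 2), so the quadratic form evaluates to 1.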
The COVID-19 pandemic presented enormous data challenges in the United States. Policy makers, epidemiological modelers, and health researchers all require up-to-date data on the pandemic and relevant public behavior, ideally at fine spatial and temporal resolution. The COVIDcast API is our attempt to fill this need: operational since April 2020, it provides open access to both traditional public health surveillance signals (cases, deaths, and hospitalizations) and many auxiliary indicators of COVID-19 activity, such as signals extracted from deidentified medical claims data, massive online surveys, cell phone mobility data, and internet search trends. These are available at a fine geographic resolution (mostly at the county level) and are updated daily. The COVIDcast API also tracks all revisions to historical data, allowing modelers to account for the frequent revisions and backfill that are common for many public health data sources. All of the data are available in a common format through the API and accompanying R and Python software packages. This paper describes the data sources and signals, and provides examples demonstrating that the auxiliary signals in the COVIDcast API present information relevant to tracking COVID-19 activity, augmenting traditional public health reporting and empowering research and decision-making.