We present half-life regression (HLR), a novel model for spaced repetition practice with applications to second language acquisition. HLR combines psycholinguistic theory with modern machine learning techniques, indirectly estimating the "halflife" of a word or concept in a student's long-term memory. We use data from Duolingo -a popular online language learning application -to fit HLR models, reducing error by 45%+ compared to several baselines at predicting student recall rates. HLR model weights also shed light on which linguistic concepts are systematically challenging for second language learners. Finally, HLR was able to improve Duolingo daily student engagement by 12% in an operational user study.
There is a widespread intuitive sense that different kinds of information spread differently on-line, but it has been difficult to evaluate this question quantitatively since it requires a setting where many different kinds of information spread in a shared environment. Here we study this issue on Twitter, analyzing the ways in which tokens known as hashtags spread on a network defined by the interactions among Twitter users. We find significant variation in the ways that widely-used hashtags on different topics spread.Our results show that this variation is not attributable simply to differences in "stickiness," the probability of adoption based on one or more exposures, but also to a quantity that could be viewed as a kind of "persistence" -the relative extent to which repeated exposures to a hashtag continue to have significant marginal effects. We find that hashtags on politically controversial topics are particularly persistent, with repeated exposures continuing to have unusually large marginal effects on adoption; this provides, to our knowledge, the first large-scale validation of the "complex contagion" principle from sociology, which posits that repeated exposures to an idea are particularly crucial when the idea is in some way controversial or contentious. Among other findings, we discover that hashtags representing the natural analogues of Twitter idioms and neologisms are particularly non-persistent, with the effect of multiple exposures decaying rapidly relative to the first exposure.We also study the subgraph structure of the initial adopters for different widely-adopted hashtags, again finding structural differences across topics. We develop simulation-based and generative models to analyze how the adoption dynamics interact with the network structure of the early adopters on which a hashtag spreads.
Abstract-In social media settings where users send messages to one another, the issue of reciprocity naturally arises: does the communication between two users take place only in one direction, or is it reciprocated? In this paper we study the problem of reciprocity prediction: given the characteristics of two users, we wish to determine whether the communication between them is reciprocated or not. We approach this problem using decision trees and regression models to determine good indicators of reciprocity. We extract a network based on directed @-messages sent between users on Twitter, and identify measures based on the attributes of nodes and their network neighborhoods that can be used to construct good predictors of reciprocity. Moreover, we find that reciprocity prediction forms interesting contrasts with earlier network prediction tasks, including link prediction, as well as the inference of strengths and signs of network links.
Abstract. Given a graph with billions of nodes and edges, how can we find patterns and anomalies? Are there nodes that participate in too many or too few triangles? Are there close-knit near-cliques? These questions are expensive to answer unless we have the first several eigenvalues and eigenvectors of the graph adjacency matrix. However, eigensolvers suffer from subtle problems (e.g., convergence) for large sparse matrices, let alone for billion-scale ones. We address this problem with the proposed HEIGEN algorithm, which we carefully design to be accurate, efficient, and able to run on the highly scalable MAPRE-DUCE (HADOOP) environment. This enables HEIGEN to handle matrices more than 1000× larger than those which can be analyzed by existing algorithms. We implement HEIGEN and run it on the M45 cluster, one of the top 50 supercomputers in the world. We report important discoveries about near-cliques and triangles on several real-world graphs, including a snapshot of the Twitter social network (38Gb, 2 billion edges) and the "YahooWeb" dataset, one of the largest publicly available graphs (120Gb, 1.4 billion nodes, 6.6 billion edges).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.