The spread of fake news on social media became a public concern in the United States after the 2016 presidential election. We examined exposure to and sharing of fake news by registered voters on Twitter and found that engagement with fake news sources was extremely concentrated. Only 1% of individuals accounted for 80% of fake news source exposures, and 0.1% accounted for nearly 80% of fake news sources shared. Individuals most likely to engage with fake news sources were conservative leaning, older, and highly engaged with political news. A cluster of fake news sources shared overlapping audiences on the extreme right, but for people across the political spectrum, most political news exposure still came from mainstream media outlets.
A major goal in human genetics is to understand the role of common genetic variants in susceptibility to common diseases. This will require characterizing the nature of gene variation in human populations, assembling an extensive catalogue of single-nucleotide polymorphisms (SNPs) in candidate genes and performing association studies for particular diseases. At present, our knowledge of human gene variation remains rudimentary. Here we describe a systematic survey of SNPs in the coding regions of human genes. We identified SNPs in 106 genes relevant to cardiovascular disease, endocrinology and neuropsychiatry by screening an average of 114 independent alleles using 2 independent screening methods. To ensure high accuracy, all reported SNPs were confirmed by DNA sequencing. We identified 560 SNPs, including 392 coding-region SNPs (cSNPs) divided roughly equally between those causing synonymous and non-synonymous changes. We observed different rates of polymorphism among classes of sites within genes (non-coding, degenerate and non-degenerate) as well as between genes. The cSNPs most likely to influence disease, those that alter the amino acid sequence of the encoded protein, are found at a lower rate and with lower allele frequencies than silent substitutions. This likely reflects selection acting against deleterious alleles during human evolution. The lower allele frequency of missense cSNPs has implications for the compilation of a comprehensive catalogue, as well as for the subsequent application to disease association.
Classification trees are widely used in the machine learning and data mining communities for modeling propositional data. Recent work has extended this basic paradigm to probability estimation trees. Traditional tree learning algorithms assume that instances in the training data are homogenous and independently distributed. Relational probability trees (RPTs) extend standard probability estimation trees to a relational setting in which data instances are heterogeneous and interdependent. Our algorithm for learning the structure and parameters of an RPT searches over a space of relational features that use aggregation functions (e.g. AVERAGE, MODE, COUNT) to dynamically propositionalize relational data and create binary splits within the RPT. Previous work has identified a number of statistical biases due to characteristics of relational data such as autocorrelation and degree disparity. The RPT algorithm uses a novel form of randomization test to adjust for these biases. On a variety of relational learning tasks, RPTs built using randomization tests are significantly smaller than other models and achieve equivalent, or better, performance.
There is a growing consensus that online platforms have a systematic influence on the democratic process. However, research beyond social media is limited. In this paper, we report the results of a mixed-methods algorithm audit of partisan audience bias and personalization within Google Search. Following Donald Trump's inauguration, we recruited 187 participants to complete a survey and install a browser extension that enabled us to collect Search Engine Results Pages (SERPs) from their computers. To quantify partisan audience bias, we developed a domain-level score by leveraging the sharing propensities of registered voters on a large Twitter panel. We found little evidence for the "filter bubble'' hypothesis. Instead, we found that results positioned toward the bottom of Google SERPs were more left-leaning than results positioned toward the top, and that the direction and magnitude of overall lean varied by search query, component type (e.g. "answer boxes"), and other factors. Utilizing rank-weighted metrics that we adapted from prior work, we also found that Google's rankings shifted the average lean of SERPs to the right of their unweighted average.
This paper reports on methods and results of an applied research project by a team consisting of SAIC and four universities to develop, integrate, and evaluate new approaches to detect the weak signals characteristic of insider threats on organizations' information systems. Our system combines structural and semantic information from a real corporate database of monitored activity on their users' computers to detect independently developed red team inserts of malicious insider activities. We have developed and applied multiple algorithms for anomaly detection based on suspected scenarios of malicious insider behavior, indicators of unusual activities, high-dimensional statistical patterns, temporal sequences, and normal graph evolution. Algorithms and representations for dynamic graph processing provide the ability to scale as needed for enterpriselevel deployments on real-time data streams. We have also developed a visual language for specifying combinations of features, baselines, peer groups, time periods, and algorithms to detect anomalies suggestive of instances of insider threat behavior. We defined over 100 data features in seven categories based on approximately 5.5 million actions per day from approximately 5,500 users. We have achieved area under the ROC curve values of up to 0.979 and lift values of 65 on the top 50 user-days identified on two months of real data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.