Missing data is an important, but often ignored, aspect of a network study. Measurement validity is affected by missing data, but the level of bias can be difficult to gauge. Here, we describe the effect of missing data on network measurement across widely different circumstances. In Part I of this study (Smith and Moody, 2013), we explored measurement bias due to randomly missing nodes. Here, we drop the assumption that data are missing at random: what happens to estimates of key network statistics when central nodes are more (or less) likely to be missing? We answer this question using a wide range of empirical networks and network measures. We find that bias is worse when more central nodes are missing. With respect to network measures, Bonacich centrality is highly sensitive to the loss of central nodes, while closeness centrality is not; distance and bicomponent size are more affected than triad summary measures; and behavioral homophily is more robust than degree homophily. With respect to types of networks, larger, directed networks tend to be more robust, but the relationship is weak. We end the paper with a practical application, showing how researchers can use our results (translated into a publicly available Java application) to gauge the bias in their own data.
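The simulation logic behind this kind of study can be made concrete. Below is a minimal sketch, not the authors' actual procedure or Java tool: it builds a synthetic network (a Barabási–Albert graph as a stand-in for the empirical networks the paper uses), deletes nodes with probability weighted toward (or away from) high degree, and compares two of the statistics the abstract mentions on the observed subgraph against their true values. All function names and parameters are illustrative assumptions.

```python
# Hedged sketch: non-random missingness via degree-biased node deletion.
import random
import networkx as nx

random.seed(42)
G = nx.barabasi_albert_graph(n=500, m=3)  # stand-in for an empirical network

def drop_biased(G, frac=0.2, toward_central=True):
    """Remove `frac` of nodes, with removal probability weighted by degree
    (toward_central=True) or inverse degree (toward_central=False)."""
    deg = dict(G.degree())
    nodes = list(G.nodes())
    w = [deg[v] + 1 for v in nodes]        # +1 avoids zero weights
    if not toward_central:
        w = [1.0 / x for x in w]           # peripheral nodes more likely missing
    k = int(frac * len(nodes))
    missing = set()
    while len(missing) < k:                # resample until k distinct nodes drop
        missing.update(random.choices(nodes, weights=w, k=k - len(missing)))
    H = G.copy()
    H.remove_nodes_from(missing)
    return H

def stats(G):
    """Two of the measures discussed in the abstract."""
    giant = G.subgraph(max(nx.connected_components(G), key=len))
    return {
        "avg_distance": nx.average_shortest_path_length(giant),
        "largest_bicomponent": max(len(c) for c in nx.biconnected_components(G)),
    }

true_vals = stats(G)
central_missing = stats(drop_biased(G, 0.2, toward_central=True))
peripheral_missing = stats(drop_biased(G, 0.2, toward_central=False))
for key in true_vals:
    print(key,
          "true:", round(true_vals[key], 2),
          "central-missing:", round(central_missing[key], 2),
          "peripheral-missing:", round(peripheral_missing[key], 2))
```

Comparing the "central-missing" column to the true values illustrates the paper's core finding in miniature: deleting central nodes distorts distance- and connectivity-based measures more than random or peripheral deletion does.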
Social biases are encoded in word embeddings. This presents a unique opportunity to study society historically and at scale, and a unique danger when embeddings are used in downstream applications. Here, we investigate the extent to which publicly available word embeddings accurately reflect beliefs about certain kinds of people as measured via traditional survey methods. We find that biases found in word embeddings do, on average, closely mirror survey data across seventeen dimensions of social meaning. However, we also find that biases in embeddings are much more reflective of survey data for some dimensions of meaning (e.g., gender) than others (e.g., race), and that we can be highly confident that embedding-based measures reflect survey data only for the most salient biases.
[Figure: example identities (thug, child, judge, secretary) positioned along dimensions of social meaning, including Potency (weak–strong), Evaluation (bad–good), Gender (man–woman), and race (Black).]
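One common way to extract such dimension scores from embeddings is dimension projection (in the spirit of Bolukbasi et al. and Kozlowski et al.); the paper's exact pipeline may differ, so treat the sketch below as an assumption-laden illustration. It loads publicly available GloVe vectors via gensim's downloader, builds Gender and Potency axes from averaged pole-word differences, and projects the figure's example identities onto them. The pole-word pairs are my own choices, not the paper's.

```python
# Hedged sketch of dimension projection over pretrained embeddings.
# Requires gensim and a network connection for the vector download.
import numpy as np
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # publicly available embeddings

def dimension(pairs):
    """Average the vector differences of pole-word pairs into a unit axis."""
    diffs = [vectors[a] - vectors[b] for a, b in pairs]
    d = np.mean(diffs, axis=0)
    return d / np.linalg.norm(d)

gender = dimension([("woman", "man"), ("she", "he"), ("her", "him")])
potency = dimension([("strong", "weak"), ("powerful", "powerless")])

for word in ["thug", "child", "judge", "secretary"]:
    v = vectors[word] / np.linalg.norm(vectors[word])
    print(f"{word:10s} gender={v @ gender:+.3f} potency={v @ potency:+.3f}")
```

Validating such scores against survey ratings of the same identities is what lets one ask, as the paper does, for which dimensions the embedding-based measure can be trusted.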
Recent advances in artificial intelligence and computer science can be used by social scientists in their study of groups and teams. Here, we explain how developments in machine learning and simulations with artificially intelligent agents can help group and team scholars to overcome two major problems they face when studying group dynamics. First, because empirical research on groups relies on manual coding, it is hard to study groups in large numbers (the scaling problem). Second, conventional statistical methods in behavioral science often fail to capture the nonlinear interaction dynamics occurring in small groups (the dynamics problem). Machine learning helps to address the scaling problem: massive computing power can be harnessed to extend manual codings of group interactions to far larger samples. Computer simulations with artificially intelligent agents help to address the dynamics problem by implementing social psychological theory in data-generating algorithms, allowing for more sophisticated statements and tests of theory. We describe an ongoing research project aimed at computational analysis of virtual software development teams.
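The scaling idea can be illustrated with a few lines of supervised learning: train a classifier on a small hand-coded sample of utterances, then apply it to an arbitrarily large corpus. This is a generic sketch, not the project's actual model or coding scheme; the categories and example messages below are hypothetical placeholders.

```python
# Hedged sketch: scaling manual interaction coding with a text classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical hand-coded utterances from a software team's chat log.
hand_coded = [
    ("I think we should merge this branch today", "task"),
    ("Great job on that fix, really impressive", "socioemotional"),
    ("Can someone review my pull request?", "task"),
    ("Sorry, that was my mistake", "socioemotional"),
]
texts, labels = zip(*hand_coded)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

# The trained model extends the manual codings to uncoded messages at scale.
uncoded = ["Thanks for catching that bug!", "Let's split the testing work"]
print(clf.predict(uncoded))
```

In a real study the training set would be far larger and the model validated against held-out human codings before being applied at scale.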