Researchers rely on metadata systems to prepare data for analysis. As the complexity of data sets increases and the breadth of data analysis practices grows, existing metadata systems can limit the efficiency and quality of data preparation. This article describes the redesign of a metadata system supporting the Fragile Families and Child Wellbeing Study on the basis of the experiences of participants in the Fragile Families Challenge. The authors demonstrate how treating metadata as data (i.e., releasing comprehensive information about variables in a format amenable to both automated and manual processing) can make the task of data preparation less arduous and less error prone for all types of data analysis. The authors hope that their work will facilitate new applications of machine-learning methods to longitudinal surveys and inspire research on data preparation in the social sciences. The authors have open-sourced the tools they created so that others can use and improve them.
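As a rough illustration of what "treating metadata as data" could look like in practice, the sketch below stores variable-level metadata as a plain table that a person can read and a script can query. It is not the authors' released tooling: the column names, the variable names, and the `select_variables` helper are all hypothetical and were invented for this example.

```python
import csv
import io

# Hypothetical metadata table: one row per survey variable.
# Variable names, labels, and columns are invented for illustration;
# they are not taken from the Fragile Families metadata.
METADATA_CSV = """\
name,label,wave,respondent,data_type
m1a4,Age at child's birth,1,mother,continuous
f1b2,Relationship status,1,father,categorical
m2c5,Household income,2,mother,continuous
m2d1,Interviewer notes,2,mother,string
"""


def load_metadata(text):
    """Parse the metadata table into a list of dicts, one per variable."""
    return list(csv.DictReader(io.StringIO(text)))


def select_variables(rows, **criteria):
    """Return names of variables whose metadata match every criterion.

    Values are compared as strings because csv.DictReader yields strings.
    """
    return [
        row["name"]
        for row in rows
        if all(row[field] == str(value) for field, value in criteria.items())
    ]


metadata = load_metadata(METADATA_CSV)

# Automated processing: pick every continuous variable collected in wave 2.
continuous_wave2 = select_variables(metadata, wave=2, data_type="continuous")
print(continuous_wave2)
```

Because the same table can be opened in a spreadsheet or filtered in code, a single release of this kind would serve both manual and automated data preparation.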
Keywords: metadata, survey research, data sharing, quantitative methodology, computational social science
We thank all the participants in the Fragile Families Challenge who shared their data processing scripts and stories with us, particularly Greg Gundersen for creating the first version of machine-actionable Fragile Families metadata. We also thank Ian Lundberg, who hosted several of the getting-started workshops, supported participants throughout the Challenge, and shared his own experiences with survey metadata. Participants in the Princeton Sociology Proseminar provided valuable feedback on a draft of the article, and Brandon Stewart provided catalytic conversation. We thank Ian Fellows for his help with the R package, Greg Gundersen for his help with the Python package, and Cambria Naslund for her assistance with the question text data. Finally, we gratefully acknowledge
Families formed by unmarried parents increased dramatically in the United States during the latter half of the 20th century. To learn more about these families, a team of researchers at Princeton University and Columbia University designed and implemented a large birth cohort study—The Fragile Families and Child Wellbeing Study. This chapter highlights several findings from the study. First, most unmarried parents have “high hopes” for a future together at the time of their child’s birth; but their resources are low and most relationships do not last. Second, unmarried mothers experience high levels of partnership instability and family complexity, both of which are associated with lower-quality parenting and poorer child well-being. Finally, welfare state, child support and criminal justice policies play a large role in the lives of fragile families.