Data Documentation Initiative-Lifecycle (DDI-L) introduced a robust metadata model to support the capture of questionnaire content and flow and, through its support for versioning and provenance via objects such as BasedOn, encouraged the reuse of existing question items. However, the dearth of questionnaire banks that include both question text and response domains has limited the development of an ecosystem of DDI-ready Computer Assisted Interviewing (CAI) tools. Archives hold this information in PDFs associated with surveys, but extracting it efficiently into DDI-Lifecycle is a significant challenge. While CLOSER Discovery has championed the provision of high-quality questionnaire metadata in DDI-Lifecycle, this has primarily been done manually. More automated methods need to be explored to ensure scalable metadata annotation and uplift. This paper presents initial results in engineering a machine learning (ML) pipeline to automate the extraction of questions from PDF survey questionnaires. Using CLOSER Discovery as a ‘training and test dataset’, several machine learning approaches have been explored to classify parsed text from questionnaires so that it can be output as valid DDI items for inclusion in a DDI-L compliant repository. The ML pipeline adopts a continuous build-and-integrate approach, with processes in place to track the various combinations of structured DDI-L input metadata, ML models and model parameters against the defined evaluation metrics, thus enabling reproducibility and comparative analysis of the experiments. Tangible outputs include a map from the various metadata and model parameters to the corresponding evaluation metric values, which enables model tuning as well as transparent management of data and experiments.
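To make the two ideas above concrete (classifying parsed questionnaire lines, and tracking each combination of input configuration, model and parameters against an evaluation metric), here is a minimal sketch under stated assumptions. It uses a tiny hand-written multinomial Naive Bayes classifier; the abstract does not say which models or features were used. The labels (`question`, `response`, `other`), the toy training and test lines, the `lowercase` switch (standing in for a choice of input preprocessing) and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Hypothetical toy data: lines parsed from a questionnaire PDF, labelled by role.
TRAIN = [
    ("How old were you on your last birthday?", "question"),
    ("Do you currently smoke cigarettes?", "question"),
    ("How many people live in this household?", "question"),
    ("Yes", "response"),
    ("No", "response"),
    ("Don't know", "response"),
    ("Page 3 of 12", "other"),
    ("Interviewer: show card A", "other"),
]
TEST = [
    ("How often do you smoke?", "question"),
    ("No", "response"),
    ("Page 4 of 12", "other"),
]


def tokenize(text, lowercase):
    # Split on whitespace, keeping "?" as its own token since it signals a question.
    tokens = text.replace("?", " ? ").split()
    return [t.lower() for t in tokens] if lowercase else tokens


def train_nb(data, lowercase):
    # Count labels and per-label token frequencies for multinomial Naive Bayes.
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in data:
        class_counts[label] += 1
        for tok in tokenize(text, lowercase):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict_nb(model, text, lowercase, alpha):
    # Pick the label with the highest log prior plus additive-smoothed log likelihood.
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for tok in tokenize(text, lowercase):
            score += math.log((word_counts[label][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def run_experiments(lowercase_options, alphas):
    # Map each (preprocessing, model, parameter) combination to its test accuracy.
    results = {}
    for lowercase, alpha in product(lowercase_options, alphas):
        model = train_nb(TRAIN, lowercase)
        correct = sum(
            predict_nb(model, text, lowercase, alpha) == label for text, label in TEST
        )
        key = ("lowercase" if lowercase else "raw", "naive_bayes", alpha)
        results[key] = correct / len(TEST)
    return results


if __name__ == "__main__":
    for config, accuracy in run_experiments([True, False], [0.5, 1.0]).items():
        print(config, f"accuracy={accuracy:.2f}")
```

The resulting dictionary is a toy analogue of the "map" of configurations to metric values described above; with only three test lines its accuracies say nothing about real performance. A production pipeline would more likely rely on an established ML library and an experiment-tracking tool rather than this hand-rolled loop.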
ABSTRACT

Objectives
The aim of this project is to address important issues relevant to children’s health. This will be done by enhancing the information collected in the longitudinal, UK-wide Millennium Cohort Study (MCS) by linking participating children to their routine health records. These issues include: the health service implications of early-life onset of obesity and overweight; the timeliness of immunisations; the association of infections with asthma and allergic disorders in childhood; and the burden of disease due to childhood injuries.

Approach
The MCS comprises information on the social, economic and health-related circumstances of children surveyed at ages 9 months and 3, 5, 7, 11 and 14 years. At the age 7 interview, 12517 (89.1%) of the 14043 adults with parental responsibility consented for information from their child’s routine health records to be released to the MCS (a). Routine health records have been requested for Wales, England and Scotland, to be linked to MCS responses within the Secure Anonymised Information Linkage Databank at Swansea University. Data will be analysed using weights for non-response, non-consent and non-linkage, and the linkage will be reported according to the RECORD guidelines (b).

Results
To date, all 1881 MCS children with valid consent who live or have lived in Wales have been linked by assigning each individual an Anonymous Linking Field (ALF), which can be mapped across multiple datasets without risk of identification (c). Of these children, 1365 (72.3%) had experienced at least one hospital admission by the age of 14 years. The risk of admission by each survey age will be calculated separately for boys and girls, adjusting for non-response at different sweeps. These children have also been linked to their immunisation records (n = 1872), Emergency Department attendances (n = 1276) and available GP records (n = 1151) to enable analyses that address the project objectives.
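The analysis plan mentions weights for non-response, non-consent and non-linkage. The abstract does not describe the weighting model, so the following is only a minimal sketch of one common approach (inverse-probability weighting), assuming each stage's weight is the inverse of an estimated probability of passing that stage and that the three stages combine multiplicatively. The `Child` fields, the per-stage probabilities and the four records are invented for illustration; in practice such probabilities are typically estimated from observed characteristics (for example with logistic regression) and combined with the survey's own design weights.

```python
from dataclasses import dataclass


@dataclass
class Child:
    admitted: bool      # at least one hospital admission by age 14 (hypothetical)
    p_response: float   # hypothetical estimated probability of responding at the sweep
    p_consent: float    # hypothetical estimated probability of consent to linkage
    p_linkage: float    # hypothetical estimated probability of successful linkage


def combined_weight(child):
    # Inverse of the probability of passing all three stages (assumed multiplicative).
    return 1.0 / (child.p_response * child.p_consent * child.p_linkage)


def weighted_proportion(children):
    # Weighted share of children with at least one admission.
    total = sum(combined_weight(c) for c in children)
    admitted = sum(combined_weight(c) for c in children if c.admitted)
    return admitted / total


def unweighted_proportion(children):
    return sum(c.admitted for c in children) / len(children)


# Invented records: the children less likely to be observed carry larger weights.
CHILDREN = [
    Child(admitted=True, p_response=0.5, p_consent=1.0, p_linkage=1.0),   # weight 2
    Child(admitted=False, p_response=1.0, p_consent=1.0, p_linkage=1.0),  # weight 1
    Child(admitted=False, p_response=1.0, p_consent=1.0, p_linkage=1.0),  # weight 1
    Child(admitted=True, p_response=0.5, p_consent=0.5, p_linkage=1.0),   # weight 4
]

if __name__ == "__main__":
    print(f"unweighted: {unweighted_proportion(CHILDREN):.2f}")  # 0.50
    print(f"weighted:   {weighted_proportion(CHILDREN):.2f}")    # 0.75
```

The gap between the two estimates shows why the weighting matters: if the children who are harder to observe differ in their admission rates, an unweighted proportion among linked children can be biased.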
Conclusions
Routine health records are a potentially valuable enhancement to longitudinal studies, allowing the evaluation of questions relevant to public health and health services, and allowing the completeness and consistency of records from these different sources to be addressed.

References
a. Shepherd, P. (2013) Consent to linkage to child health data. ISBN 978-1-906929-59-6
b. Benchimol, E.I. et al. (2015) DOI: 10.1371/journal.pmed.1001885
c. Ford, D.V. et al. (2009) DOI: 10.1186/1472-6963-9-157