BACKGROUND
Online genealogies are promising data sources for demographic research, but their limitations are understudied. This paper takes a critical approach to evaluating the potential strengths and weaknesses of using online genealogical data for population studies. We focus on the FamiLinx dataset, which contains demographic information and kinship ties across multiple countries and centuries.
OBJECTIVE
We propose novel measures to assess the completeness and quality of demographic variables in the FamiLinx data at both the individual and the familial level over the period 1600–1900. Using Sweden as a test country, we investigate how the age-sex distribution and the mortality levels of the digital population extracted from FamiLinx diverge from those of the registered population.
METHOD
We employ descriptive statistics, negative binomial regression modeling, and standard life table techniques for our measures of completeness and quality.
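To make the life table step concrete, the following is a minimal sketch of a standard period life table built from age-specific death rates. It is illustrative only and is not the authors' code: the function name `life_table`, its arguments, the radix of 100,000, and the assumption that deaths fall on average at the midpoint of each closed age interval are all assumptions made here, not details taken from the paper.

```python
def life_table(ages, mx, radix=100_000.0):
    """Build a period life table from age-specific death rates.

    ages  -- start of each age interval; the last interval is open-ended
    mx    -- death rate in each interval (must be > 0 in the open interval)
    radix -- hypothetical starting cohort size (assumption)
    """
    rows = []
    lx = radix
    last = len(ages) - 1
    for i, m in enumerate(mx):
        if i < last:
            n = ages[i + 1] - ages[i]          # interval width
            ax = n / 2                         # assumed: deaths at mid-interval
            qx = min(n * m / (1 + (n - ax) * m), 1.0)  # death probability
            dx = lx * qx                       # deaths in the interval
            Lx = n * (lx - dx) + ax * dx       # person-years lived
        else:
            qx = 1.0                           # everyone dies in the open interval
            dx = lx
            Lx = lx / m                        # person-years in the open interval
        rows.append({"age": ages[i], "qx": qx, "lx": lx, "dx": dx, "Lx": Lx})
        lx -= dx

    # Accumulate person-years from the oldest age down to get life expectancy.
    Tx = 0.0
    for row in reversed(rows):
        Tx += row["Lx"]
        row["ex"] = Tx / row["lx"] if row["lx"] > 0 else 0.0
    return rows
```

Applying the same function to death rates from the genealogical population and from the registered population would yield two comparable series of life expectancy by age, which is one way the divergence in mortality levels could be examined.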
RESULTS
Missing values and inaccuracies in the demographic information from FamiLinx are selective. When one demographic variable is available, researchers can effectively anticipate the availability of other demographic information. The completeness and quality of demographic variables within kinship networks are markedly higher for individuals with more complete and accurate demographic information. Populations from FamiLinx display lower mortality levels than the registered population, and their representativeness improves towards the end of the 19th century.