2019
DOI: 10.1002/sim.8445

The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities

Abstract: Biobanks linked to electronic health records provide rich resources for health‐related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large‐scale biorepositories provide the opportunity to …

citations
Cited by 77 publications
(63 citation statements)
references
References 225 publications
(438 reference statements)
“…An earlier smoking PheWAS estimated the effects of related biases under different strengths of simulated confounding and found that even in the scenario where confounder-smoking status association was very strong (with an odds ratio of 10), there was still no evidence of inflation in false positive rate in the ever smokers [4]. In addition, we acknowledge that misclassification may have occurred at both the level of applying ICD codes and also in the automated process of converting them to phecodes [23,24] Furthermore, MR assumes a linear effect, which would not be able to precisely capture the detrimental effects of smoking intensity if the effect is non-linear. Finally, we acknowledge that this study was carried out in participants of White-British ancestry and other studies are required to confirm these associations and their magnitude in other populations.…”
Section: Discussion (mentioning)
confidence: 88%
“…Large-scale biobanks with hundreds of thousands of genotyped and deeply phenotyped subjects are valuable resources to identify genetic components of complex phenotypes. 1,2 In biobanks, ordinal categorical data is a common type of phenotype, which is often collected from surveys, questionnaires, and testing to measure human behaviors, satisfaction, and preferences. 3,4 For example, a web questionnaire was used for 182,219 UK Biobank participants to collect 150 food and other health behavior related preferences, all of which are ordinal categorical phenotypes based on a 9-point hedonic scale of liking from 1 (extremely dislike) to 9 (extremely like).…”
Section: Main (mentioning)
confidence: 99%
“…As the amount of data collected on a daily basis from hospital health care system keeps increasing, [1] the appeal for leveraging the full potential of these data for research purposes and to investigate clinical questions is also becoming stronger than ever. [2][3][4][5] Yet, EHR data are quite different from research oriented data (e.g. cohort or trial data): i) they are less structured, more heterogeneous, ii) they present finer granularity, iii) data collection is done for health care purpose.…”
Section: Introduction (mentioning)
confidence: 99%