Second generation p-values preserve the simplicity that has made p-values popular while resolving critical flaws that promote misinterpretation of data, distraction by trivial effects, and unreproducible assessments of data. The second-generation p-value (SGPV) is an extension that formally accounts for scientific relevance by using a composite null hypothesis that captures null and scientifically trivial effects. Because the majority of spurious findings are small effects that are technically nonnull but practically indistinguishable from the null, the second-generation approach greatly reduces the likelihood of a false discovery. SGPVs promote transparency, rigor and reproducibility of scientific results by a priori identifying which candidate hypotheses are practically meaningful and by providing a more reliable statistical summary of when the data are compatible with the candidate hypotheses or null hypotheses, or when the data are inconclusive. We illustrate the importance of these advances using a dataset of 247,000 single-nucleotide polymorphisms, i.e., genetic markers that are potentially associated with prostate cancer.
Increasing reliance on electronic medical records at large medical centers provides unique opportunities to perform population level analyses exploring disease progression and etiology. The massive accumulation of diagnostic, procedure, and laboratory codes in one place has enabled the exploration of co-occurring conditions, their risk factors, and potential prognostic factors. While most of the readily identifiable associations in medical records are (now) well known to the scientific community, there is no doubt many more relationships are still to be uncovered in EMR data. In this paper, we introduce a novel finding index to help with that task. This new index uses data mined from real-time PubMed abstracts to indicate the extent to which empirically discovered associations are already known (i.e., present in the scientific literature). Our methods leverage second-generation p-values, which better identify associations that are truly clinically meaningful. We illustrate our new method with three examples: Autism Spectrum Disorder, Alzheimer’s Disease, and Optic Neuritis. Our results demonstrate wide utility for identifying new associations in EMR data that have the highest priority among the complex web of correlations and causalities. Data scientists and clinicians can work together more effectively to discover novel associations that are both empirically reliable and clinically understudied.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.