2012
DOI: 10.1063/1.3675621

Using time-delayed mutual information to discover and interpret temporal correlation structure in complex populations

Abstract: This paper addresses how to calculate and interpret the time-delayed mutual information (TDMI) for a complex, diversely and sparsely measured, possibly non-stationary population of time-series of unknown composition and origin. The primary vehicle used for this analysis is a comparison between the time-delayed mutual information averaged over the population and the time-delayed mutual information of an aggregated population (here, aggregation implies the population is conjoined before any statistical estimates…
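As a rough illustration of the comparison the abstract describes, the sketch below estimates TDMI two ways on a synthetic population: averaged over the individual series, and computed once on the aggregated (conjoined) pairs. The histogram estimator, the bin count, and the two AR(1) sub-cohorts are illustrative assumptions, not the paper's data or estimator.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Plug-in (histogram) estimate of mutual information between paired samples, in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x (column vector)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y (row vector)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def tdmi(series, tau, bins=16):
    """Time-delayed mutual information of a single series at delay tau."""
    return mutual_information(series[:-tau], series[tau:], bins=bins)

rng = np.random.default_rng(0)

def ar1(coef, n=2000):
    """Unit-variance AR(1) surrogate; a stand-in for one individual's time series."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = coef * x[t - 1] + rng.normal(scale=np.sqrt(1.0 - coef**2))
    return x

# A heterogeneous population: two sub-cohorts with opposite lag-1 structure.
population = [ar1(0.9) for _ in range(10)] + [ar1(-0.9) for _ in range(10)]
tau = 1

# (1) TDMI averaged over the population: estimate per series, then average.
averaged = np.mean([tdmi(s, tau) for s in population])

# (2) TDMI of the aggregated population: conjoin all (x_t, x_{t+tau}) pairs, then estimate once.
x_all = np.concatenate([s[:-tau] for s in population])
y_all = np.concatenate([s[tau:] for s in population])
aggregated = mutual_information(x_all, y_all)

print(f"averaged TDMI   ~ {averaged:.3f} nats")
print(f"aggregated TDMI ~ {aggregated:.3f} nats")  # lower here: mixing the cohorts blurs the joint density
```

In this toy setting the aggregated estimate comes out below the population average because conjoining sub-cohorts with conflicting lag structure partially washes out the joint density, which is the kind of discrepancy the two quantities are meant to expose.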

Cited by 41 publications (45 citation statements) | References 24 publications
“…Time-delayed MI has been successfully employed in a wide range of applications to estimate information flow between two time series (Alonso et al, 2010; Jin et al, 2010; Albers and Hripcsak, 2012), including to quantify the functional linkage between brain areas (Ioannides et al, 2000, 2002a, 2004a,b). Consider two time series X and Y, of the same length, but with the second time series sampled with a delay τ relative to the first.…”
Section: Methods
mentioning
confidence: 99%
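A minimal sketch of the construction described in this statement, assuming the second series is a noisy, delayed copy of the first and using a simple binned plug-in estimator; the synthetic pair, the bin count, and the use of scikit-learn's mutual_info_score are illustrative choices, not the cited methodology.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(1)

# Hypothetical pair of series: Y is a noisy copy of X delayed by 5 steps.
n, true_delay = 3000, 5
x_full = np.convolve(rng.normal(size=n + 50), np.ones(10) / 10, mode="same")  # smoothed noise
y_full = np.roll(x_full, true_delay) + 0.05 * rng.normal(size=n + 50)
x, y = x_full[50:], y_full[50:]  # discard the wrap-around introduced by np.roll

def delayed_mi(x, y, tau, bins=16):
    """MI between x_t and y_{t+tau}: bin both series, then use a plug-in estimate (nats)."""
    xs, ys = x[:-tau], y[tau:]
    xb = np.digitize(xs, np.histogram_bin_edges(xs, bins))
    yb = np.digitize(ys, np.histogram_bin_edges(ys, bins))
    return mutual_info_score(xb, yb)

# Scan the delay; the estimate should peak near the true 5-step lag.
for tau in range(1, 11):
    print(f"tau={tau:2d}  MI~{delayed_mi(x, y, tau):.3f}")
```

Scanning τ and reading off the location and height of the peak is the usual way such a curve is interpreted as a symmetric, non-directional indicator of delayed dependence between the two series.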
“…Recent research has shown that naïve EHR statistical analyses can lead to the reversals of cause and effect [16], induction of spurious signals [17], large errors when predicting optimal drug dosage [18], cancellation of temporal signals when aggregating different cohorts [19, 20, 21], and model distortion when not accounting for redundancy in the narrative part of the EHR [22]. …”
Section: Introduction
mentioning
confidence: 99%
“…Our motivation for devising a method for automatically summarizing laboratory data to be used in computational tasks such as phenotyping evolved from four directions: (i) our work on health care process and phenotyping, where we observed and documented how the health care process influences, confounds, and highlights features that are observable from EHR data [4,1,20,2,21,5,22]; (ii) our Bayesian approach to estimating personalized, time-dependent hazard functions that predict the onset of chronic kidney disease—the functions used to model and represent the data were chosen to be Weibull rather than the more standard Gaussian distributions because of the properties of EHR data [18]; (iii) our intuition that the processes generating health care data are relatively sparse [23] and may be summarized and modeled by large contributions from a few dominant features rather than small contributions from all possible features; and (iv) our work translating phenotypic information to clinical settings, where it became clear to us that more simple representations of data, e.g., via single, parameterized families, are more understandable and hence more useful for clinicians than black-box prediction [24,25]. In essence, we wanted to find a way to minimize garbage in for machine learning methods: to translate laboratory data to a summary that was simple, faithful, and interpretable, all while minimizing the amount of human effort necessary to clean and summarize the data, and therefore minimizing the resources needed to use EHR data in a high-throughput setting.…”
Section: Introduction
mentioning
confidence: 99%
“…It has been previously demonstrated that the health care process (Hripcsak and Albers, 2012, 2013), as defined by measurement context (Hripcsak and Albers, 2013; Albers et al, 2012) and measurement patterns (Albers and Hripcsak, 2010, 2012), can influence how EHR data are distributed statistically (Kohane and Weber, 2013; Pivovarov et al, 2014). We construct an algorithm, PopKLD, which is based on information criterion model selection (Burnham and Anderson, 2002; Claeskens and Hjort, 2008) and is intended to reduce and cope with health care process biases and to produce an intuitively understandable continuous summary.…”
mentioning
confidence: 99%
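The PopKLD algorithm itself is not spelled out in this excerpt. Purely as a generic sketch of what information-criterion model selection over candidate parametric families can look like, the snippet below fits a handful of SciPy distributions to synthetic lab-like values and keeps the family with the lowest AIC; the candidate families, the use of AIC, and the synthetic data are assumptions for illustration, not the cited method.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical lab-value sample standing in for one analyte's pooled measurements.
data = rng.gamma(shape=2.0, scale=1.5, size=500)

def aic(dist, sample, **fixed):
    """Akaike information criterion for a maximum-likelihood fit of `dist`."""
    params = dist.fit(sample, **fixed)
    loglik = np.sum(dist.logpdf(sample, *params))
    k = len(params) - len(fixed)  # do not count parameters held fixed
    return 2 * k - 2 * loglik, params

# Candidate families: an illustrative shortlist only.
scores = {
    "normal":    aic(stats.norm, data),
    "lognormal": aic(stats.lognorm, data, floc=0),
    "gamma":     aic(stats.gamma, data, floc=0),
    "weibull":   aic(stats.weibull_min, data, floc=0),
}

best = min(scores, key=lambda name: scores[name][0])
for name, (score, _) in scores.items():
    print(f"{name:10s} AIC = {score:8.1f}")
print("selected family:", best, "with parameters", scores[best][1])
```

The attraction of this kind of summary, as the surrounding quote argues, is that each data stream is reduced to a named family plus a few parameters, which is easier to audit and to hand to a clinician than an opaque feature vector.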