[3,4,5] by adding performance matrices to differentiate between ischemic and non-ischemic heart-rate related ST segment episodes for use with the LTST DB.
Performance measures

Evaluation of an ST segment episode detection algorithm should answer the following questions:
• How well are ST episodes detected?
• How well are ischemic and non-ischemic heart-rate related ST episodes differentiated?
• How reliably are the durations of ST episodes or ischemic ST episodes measured?
• How accurately are ST deviations measured?
• How well will the ST algorithm perform in the real world?

Transient ST segment episodes (the events of interest) are characterized by: 1) number, 2) length, and 3) extrema deviation. When evaluating multi-channel ST-algorithm performance, the ST annotation streams for all leads must be combined into one reference stream using a logical OR function. The fact that at any given time there is either an ST episode or an interval with no ST deviation implies the use of two-by-two performance evaluation matrices. We further assume that all ST episodes are equally important.

Evaluation of ST episode detection algorithms consists of comparing algorithm-annotated episodes with reference-annotated episodes. Algorithm- and reference-annotated episodes may differ considerably in length, there is no one-to-one correspondence between the episodes of the two groups, and non-events cannot be counted.

The sensitivity matrix (see figure 1, left) summarizes how the reference ischemic ST episodes were labelled by the algorithm, i.e., how many of the reference ST episodes were detected, TP_S, and how many were missed, FN. The positive predictivity matrix (figure 1, right) summarizes how many of the algorithm-annotated ST episodes were actually ST episodes, TP_P, and how many were falsely detected, FP. The performance measures used to assess the ability to detect ST episodes depend on the concept of matching [3]. In measuring sensitivity, we declared that a reference ST episode is matched when the period of overlap includes at least one of the extrema of the reference ST episode, or at least one-half of the length of the reference ST episode.
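The logical OR combination of leads and the sensitivity matching rule described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the evaluation software used in this work: the names `RefEpisode`, `combine_leads`, `is_matched` and `sensitivity_counts` are hypothetical, episodes are represented as (start, end) times in seconds, and the period of overlap is assumed to be summed over all algorithm-annotated episodes that intersect a reference episode.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds, with start < end


@dataclass
class RefEpisode:
    """Hypothetical container for one reference-annotated ST episode."""
    start: float
    end: float
    extrema: List[float]  # times of the annotated extrema deviations


def combine_leads(per_lead: List[List[Interval]]) -> List[Interval]:
    """Logical OR of per-lead episode intervals, i.e. the union of all intervals."""
    merged: List[Interval] = []
    for s, e in sorted(iv for lead in per_lead for iv in lead):
        if merged and s <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def overlap(a: Interval, b: Interval) -> float:
    """Length of the intersection of two intervals (0 if they are disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def is_matched(ref: RefEpisode, algo: List[Interval]) -> bool:
    """Sensitivity-side matching of one reference episode."""
    # Criterion 1: the overlap contains at least one extremum of the reference.
    if any(s <= t <= e for t in ref.extrema for (s, e) in algo):
        return True
    # Criterion 2: the overlap covers at least half of the reference length.
    # Assumption: overlap is summed over all (non-overlapping) algorithm episodes.
    total = sum(overlap((ref.start, ref.end), a) for a in algo)
    return total >= 0.5 * (ref.end - ref.start)


def sensitivity_counts(refs: List[RefEpisode], algo: List[Interval]) -> Tuple[int, int]:
    """Return (TP_S, FN): matched and missed reference episodes."""
    tp_s = sum(is_matched(r, algo) for r in refs)
    return tp_s, len(refs) - tp_s


# Made-up example data, for illustration only.
refs = [
    RefEpisode(100, 200, [150]),  # extremum covered by an algorithm episode
    RefEpisode(300, 400, [390]),  # extremum missed, but 60% of the length overlaps
    RefEpisode(500, 600, [550]),  # no overlap at all
]
algo = combine_leads([[(140, 160), (610, 700)], [(300, 360)]])
print(sensitivity_counts(refs, algo))  # (2, 1)
```

From these counts, episode sensitivity follows in the usual way as TP_S / (TP_S + FN). Summing the overlap across algorithm episodes is one possible reading of "period of overlap"; requiring a single algorithm episode to satisfy the half-length criterion would be a stricter alternative.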
In measuring positive predictivity,