Early Warning Systems: The Neglected Importance of Timing
© 2019 Society of Hospital Medicine
a) 50% true cases of sepsis, with a mean time of 35 minutes after clinical time zero;
b) 50% true cases, with a mean time of 60 minutes before clinical time zero (prediction EWS);
c) 50% true cases of sepsis, with a mean time of 1.3 days since clinical time zero, but with 70% of these cases undiagnosed at the time of EWS detection;
d) 50% true cases of sepsis, with a mean time of 1.3 days since clinical time zero, that is, all of these cases were among those already promptly detected and treated through routine clinician oversight.
Each of these situations offers different clinical utility in meeting the hospital objective of increasing early administration of antibiotics. More generally, three dimensions of timing are important for detection systems. The first dimension is the timing of detection relative to time zero. The second is the timing relative to “real-world” clinician detection. The third is timing with respect to the associated clinical objective. For a given PPV, an EWS performs better when it detects a state (1) at, near, or in advance of time zero, (2) prior to clinician detection, and (3) sufficiently in advance of an operational objective to promote change. On the other hand, when an EWS consistently sends alerts after clinician action, it serves a lesser purpose and risks causing alert fatigue; such cases have been described in studies.15
OPERATIONALIZING TIMING IN EWS TRAINING AND EVALUATION
Acknowledging the importance of timing has implications for researchers and health system leaders. Researchers who develop EWS should report how these systems perform relative to both time zero and critical milestones in the clinical course. Operational leadership should understand the trade-off between alert fatigue (through lower PPV at the margin with earlier detection) and the lead time available to implement an intervention. Navigating this trade-off involves a complex organizational decision. The “number needed to evaluate” is one way to quantify this fatigue factor.16 Such a measure gives a sense of the number of cases a clinician will need to evaluate per event. Collaborations between clinical leadership, operational leadership, and data scientists are needed to determine how to evaluate individual systems.
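As an illustration, the number needed to evaluate can be sketched as the number of alerts divided by the number of true-positive alerts, which is the reciprocal of the PPV. The function name and inputs below are hypothetical and serve only to make the arithmetic concrete.

```python
def number_needed_to_evaluate(true_positives: int, false_positives: int) -> float:
    """Alerts a clinician evaluates per true event (assumed here to be 1 / PPV).

    Hypothetical helper for illustration; not taken from any specific EWS.
    """
    if true_positives <= 0:
        raise ValueError("at least one true-positive alert is required")
    if false_positives < 0:
        raise ValueError("false_positives cannot be negative")
    total_alerts = true_positives + false_positives
    return total_alerts / true_positives


# An EWS with a PPV of 50% (50 true alerts out of 100).
print(number_needed_to_evaluate(50, 50))
```

Under this definition, an EWS with a PPV of 50% requires two evaluations per true event, and lowering the alert threshold to gain lead time would typically raise that number.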
A good metric should capture the three important dimensions of timing while remaining intuitive to clinicians and leadership. One graphical option involves plotting the PPVs over time and relative to the clinical state evolution (Figure). This PPV-over-time curve shows when true positives occur relative to the time course of sepsis, including the three major dimensions of timing. This curve can also show a “clinically important window (CIW)”, which is bounded on the right by the latest point in time when recognition could still meet the clinical objective. For sepsis, the window might be bounded at 2.5 hours to meet an objective of antibiotics within three hours, with the assumption that 0.5 hour is needed for a response. For detection systems, the window would be bounded on the left by clinical time zero. The graph can also mark the point by which, according to historical data, most cases of sepsis have been recognized clinically. The Figure depicts an example curve for a detection model.
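One possible way to tabulate such a curve is sketched below. It assumes that each alert carries an offset in hours from clinical time zero when it is a true positive and None when it is a false positive, and it reports the share of all alerts that are true positives falling in each time bin, so that the bins sum to the overall PPV. The function name, input layout, and bin width are illustrative assumptions, not the method used to produce the Figure.

```python
import math
from collections import Counter
from typing import Dict, Optional, Sequence


def ppv_over_time(
    offsets_h: Sequence[Optional[float]], bin_width_h: float = 0.5
) -> Dict[float, float]:
    """Share of all alerts that are true positives, by time bin.

    offsets_h: one entry per alert; hours from clinical time zero to the
        alert for true positives (negative means before time zero),
        None for false positives. This layout is an assumption.
    Returns a mapping from bin start (hours) to share of all alerts.
    """
    if not offsets_h:
        raise ValueError("at least one alert is required")
    if bin_width_h <= 0:
        raise ValueError("bin_width_h must be positive")
    counts = Counter(
        math.floor(offset / bin_width_h) * bin_width_h
        for offset in offsets_h
        if offset is not None
    )
    n_alerts = len(offsets_h)
    return {start: count / n_alerts for start, count in sorted(counts.items())}


# Eight alerts: four true positives at various offsets, four false positives.
curve = ppv_over_time([-1.0, 0.2, 0.4, 3.0, None, None, None, None])
for bin_start, share in curve.items():
    print(f"{bin_start:+.1f} h: {share:.3f}")
```

Summing the shares for bins that fall inside the CIW (here, from time zero to 2.5 hours) gives the portion of the PPV that arrives in time to support the clinical objective.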
The metrics derived from this curve may be used alongside the PPV for training and evaluation. Adjusting the PPV for its relationship to time zero and the CIW makes explicit that there is a time beyond which detection no longer helps achieve the intended intervention. Detection beyond the window should not be credited as a true positive if it fails to facilitate the objective. One option is to credit detection at or before time zero as one and to discount later detection by the delay from time zero. More specifically, a true positive detected within the CIW could be weighted by the time remaining in the window, that is, the difference between the end of the CIW and the moment of detection, divided by the CIW length. This discounted PPV could be displayed alongside the PPV to gauge the temporal dimension of performance and could be used for training.
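The discounting option described above can be sketched as follows. The sketch assumes a CIW that runs from clinical time zero to 2.5 hours, full credit at or before time zero, a linearly decreasing credit inside the window, and no credit after it. The function names and the input layout (hours from time zero for true positives, None for false positives) are hypothetical.

```python
from typing import Optional, Sequence


def timing_credit(detect_h: float, ciw_end_h: float = 2.5) -> float:
    """Credit for one true positive detected detect_h hours after time zero.

    Assumes the CIW starts at time zero and ends at ciw_end_h.
    """
    if ciw_end_h <= 0:
        raise ValueError("ciw_end_h must be positive")
    if detect_h <= 0:
        return 1.0  # at or before time zero: full credit
    if detect_h >= ciw_end_h:
        return 0.0  # beyond the window: no credit
    # Time remaining in the window divided by the window length.
    return (ciw_end_h - detect_h) / ciw_end_h


def discounted_ppv(
    offsets_h: Sequence[Optional[float]], ciw_end_h: float = 2.5
) -> float:
    """Sum of timing credits over true positives, divided by all alerts."""
    if not offsets_h:
        raise ValueError("at least one alert is required")
    credit = sum(
        timing_credit(offset, ciw_end_h)
        for offset in offsets_h
        if offset is not None
    )
    return credit / len(offsets_h)


# Four alerts: one hour early, one hour late, five hours late, one false positive.
alerts = [-1.0, 1.0, 5.0, None]
print(round(discounted_ppv(alerts), 3))
```

In this example, three of the four alerts are true cases, so the conventional PPV is 0.75, whereas the discounted PPV is 0.4 because one detection arrives after the window and another uses up part of it.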
The use of timing places additional demands on validation owing to the need for a time-based gold standard. In such a case, the unit of analysis in system development might not be the patient encounter but rather the patient-hour or patient-15-minute epoch, depending on how frequently the EWS updates risk information and can issue alerts. By contrast, the sepsis detection models used in administrative databases rely on an encounter-level PPV, which provides more limited information compared with real-time EWSs.17 When time zero cannot be measured, alternatives may be used to capture several dimensions of timing; these alternatives include measuring the percentage of cases in which the EWS recognizes the event before clinicians do.15
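The last alternative, the percentage of cases recognized by the EWS before clinicians, can be computed without a measured time zero. The sketch below assumes that each true case has an EWS alert timestamp and an optional clinician recognition timestamp, and that a case never recognized by clinicians counts as detected first by the EWS. Both the data layout and that convention are assumptions made for illustration.

```python
from datetime import datetime
from typing import Optional, Sequence, Tuple

# (EWS alert time, clinician recognition time or None if never recognized)
Case = Tuple[datetime, Optional[datetime]]


def fraction_alerted_before_clinician(cases: Sequence[Case]) -> float:
    """Share of true cases in which the EWS alert precedes clinician recognition."""
    if not cases:
        raise ValueError("at least one true case is required")
    ews_first = sum(
        1
        for alert_time, recognized_time in cases
        if recognized_time is None or alert_time < recognized_time
    )
    return ews_first / len(cases)


cases = [
    (datetime(2019, 1, 1, 10, 0), datetime(2019, 1, 1, 11, 0)),   # EWS first
    (datetime(2019, 1, 1, 12, 0), datetime(2019, 1, 1, 11, 30)),  # clinician first
    (datetime(2019, 1, 1, 9, 0), None),                           # never recognized
    (datetime(2019, 1, 1, 8, 0), datetime(2019, 1, 1, 8, 0)),     # simultaneous
]
print(fraction_alerted_before_clinician(cases))
```

Ties are counted as not earlier, which is the conservative choice when the aim is to show that the EWS adds lead time over routine care.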