Algorithms for Prediction of Clinical Deterioration on the General Wards: A Scoping Review

Journal of Hospital Medicine 16(10). 2021 October;612-619. Published Online First June 25, 2021 | 10.12788/jhm.3630

June 23, 2021|Journal of Hospital Medicine

Roel V Peelen, MD; Yassin Eddahchouri, MD; Mats Koeneman, MSc; Tom H van de Belt, MSc, PhD; Harry van Goor, MD, PhD; Sebastian JH Bredie, MD, PhD

OBJECTIVE: The primary objective of this scoping review was to identify and describe state-of-the-art models that use vital sign monitoring to predict clinical deterioration on the general ward. The secondary objective was to identify facilitators, barriers, and effects of implementing these models.

DATA SOURCES: PubMed, Embase, and CINAHL databases until November 2020.

STUDY SELECTION: We selected studies that compared vital signs–based automated real-time predictive algorithms to current track-and-trace protocols in regard to the outcome of clinical deterioration in a general ward population.

DATA EXTRACTION: Study characteristics, predictive characteristics and barriers, facilitators, and effects.

RESULTS: We identified 1741 publications, 21 of which were included in our review. Two of these were clinical trials, 2 were prospective observational studies, and the remaining 17 were retrospective studies. All of the studies focused on hospitalized adult patients. The reported area under the receiver operating characteristic curve ranged from 0.65 to 0.95 for the outcome of clinical deterioration. Positive predictive value ranged from 0.223 to 0.773, and sensitivity ranged from 7.2% to 84.0%. Input variables differed widely, and predicted endpoints were inconsistently defined. We identified 57 facilitators and 48 barriers to the implementation of these models. We found 68 reported effects, 57 of which were positive.

CONCLUSION: Predictive algorithms can detect clinical deterioration on the general ward earlier and more accurately than conventional protocols, which in one recent study led to lower mortality. Consensus is needed on input variables, predictive time horizons, and definitions of endpoints to better facilitate comparative research.

There are several important limitations across these studies. In a clinical setting, these models would function as a screening test. Almost all studies report an area under the receiver operating characteristic curve (AUROC); however, sensitivity and positive predictive value (PPV) or number needed to evaluate (NNE, defined as 1/PPV) may be more useful than AUROC when predicting low-frequency events with high potential clinical impact.⁴⁴ Assessing the NNE is especially relevant because of its relation to alarm fatigue and the responsiveness of clinicians.⁴³ Alarm fatigue and lack of adequate response to alarms were repeatedly cited as potential barriers to the application of automated scores. A more useful metric might be NNE over a certain timeframe and across a specified number of patients, which would more clearly reflect the associated workload. Future studies should include these metrics as indicators of the usability and clinical impact of predictive models. This review could not assess PPV or NNE systematically because these metrics were reported inconsistently.
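The relation between these screening metrics can be made concrete with a minimal sketch. It assumes alarm outcomes have already been tallied as true positives, false positives, and false negatives. The class name, function names, the example counts, and the workload measure (alarms per 100 patient-days) are illustrative assumptions and are not taken from any of the reviewed studies.

```python
from dataclasses import dataclass


@dataclass
class AlarmCounts:
    """Hypothetical tally of alarm outcomes over an observation period."""
    true_positives: int   # alarms followed by clinical deterioration
    false_positives: int  # alarms not followed by deterioration
    false_negatives: int  # deteriorations that raised no alarm


def sensitivity(c: AlarmCounts) -> float:
    """Fraction of deterioration events that were flagged by an alarm."""
    return c.true_positives / (c.true_positives + c.false_negatives)


def ppv(c: AlarmCounts) -> float:
    """Positive predictive value: fraction of alarms that were true events."""
    return c.true_positives / (c.true_positives + c.false_positives)


def nne(c: AlarmCounts) -> float:
    """Number needed to evaluate, 1/PPV: alarms assessed per true event."""
    if c.true_positives == 0:
        raise ValueError("NNE is undefined when there are no true positives")
    return (c.true_positives + c.false_positives) / c.true_positives


def alarms_per_100_patient_days(c: AlarmCounts, patient_days: float) -> float:
    """Assumed workload measure: all alarms, scaled to 100 patient-days."""
    return 100 * (c.true_positives + c.false_positives) / patient_days


if __name__ == "__main__":
    # Made-up counts for illustration only, not data from the review.
    counts = AlarmCounts(true_positives=20, false_positives=80, false_negatives=5)
    print(f"sensitivity = {sensitivity(counts):.2f}")
    print(f"PPV         = {ppv(counts):.2f}")
    print(f"NNE         = {nne(counts):.1f}")
    print(f"alarms per 100 patient-days = "
          f"{alarms_per_100_patient_days(counts, patient_days=1000):.1f}")
```

With these made-up counts, a clinician would evaluate five alarms for every true deterioration event. Reporting NNE next to an alarm rate per unit of patient time is one possible way to express the workload dimension described above.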

Although the results of our scoping review are promising, there are limited data on clinical outcomes using these algorithms. Only three of five algorithms were used to guide clinical decision-making.²⁵,²⁷,³⁵ Kollef et al²⁷ showed shorter hospitalizations, and Evans et al³⁵ found decreased mortality rates in a multimorbid subgroup. Escobar et al²⁵ found an overall and consistent decrease in mortality in a large, heterogeneous population of inpatients across 21 hospitals. While Escobar et al's findings provide strong evidence that predictive algorithms and structured follow-up on alarms can improve patient outcomes, the authors recognize that not all facilities will have the resources to implement them.²⁵ Dedicated round-the-clock follow-up of alarms has yet to be proven feasible for smaller institutions, and leaner solutions must be explored. The example set by Escobar et al²⁵ should be translated into various settings to prove its reproducibility and to substantiate the clinical impact of predictive models and structured follow-up.

According to expert opinion, the use of high-frequency or continuous monitoring at low-acuity wards and AI algorithms to detect trends and patterns will reduce failure-to-rescue rates.⁴,⁹,⁴³ However, most studies in our review focused on periodic spot-checked vital signs, and none of the AI algorithms were implemented in clinical care (Appendix Table 1). A significant barrier to implementation was uncertainty surrounding how to react to generated alarms.⁹,⁴⁵ As algorithms become more complex and predict earlier, interpretability and causality in general can diminish, and the response to this type of alarm will be different from that of an acute warning from an EWS.

The assessment of predictive algorithm protocols must include their impact on clinical workflow, workload, and resource utilization. Earlier detection of deterioration can potentially allow coordinated alarm follow-up and lead to more efficient use of resources.²⁰,²¹,³¹,⁴³,⁴⁶,⁴⁷

Greater numbers of variables do not always improve the quality of monitoring. For example, in one study, an algorithm combining only heart rate, respiration rate, and age outperformed an EWS that tracked six vital sign measures.²³ Algorithms using fewer variables may facilitate monitoring that is more frequent, less complex, and less error-prone. Leaner measurements may also lead to higher patient and clinician acceptance.⁴³,⁴⁵

The end goal of implementing predictive algorithms on the general ward is to provide timely, reliable, and actionable clinical decision support.⁴³ As shown in a recent study by Blackwell et al,⁴⁸ multiple prediction models for specific clinical events may increase interpretability and performance. Disease-specific algorithms may complement general algorithms for clinical deterioration and enhance overall performance.