Evaluating automated rules for rapid response system alarm triggers in medical and surgical patients

Journal of Hospital Medicine 12(4). 2017 April;217-223 | 10.12788/jhm.2712

March 27, 2017|Journal of Hospital Medicine

Santiago Romero-Brufau, MD; Bruce W. Morlan, MS; Matthew Johnson, MPH; Joel Hickman; Lisa L. Kirkland, MD; James M. Naessens, ScD; Jeanne Huddleston, MD

BACKGROUND

The use of rapid response systems (RRS), which were designed to bring clinicians with critical care expertise to the bedside to prevent unnecessary deaths, has increased. RRS rely on accurate detection of acute deterioration events. Early warning scores (EWS) have been used for this purpose but were developed using heterogeneous populations. Predictive performance may differ in medical vs surgical patients.

OBJECTIVE

To evaluate the performance of published EWS in medical vs surgical patient populations.

DESIGN

Retrospective cohort study.

SETTING

Two tertiary care academic medical center hospitals in the Midwest totaling more than 1500 beds.

PATIENTS

All patients discharged from January to December 2011.

INTERVENTION

None.

MEASUREMENTS

Time-stamped longitudinal database of patient variables and outcomes, categorized as surgical or medical. Outcomes included unscheduled transfers to the intensive care unit, activation of the RRS, and calls for cardiorespiratory resuscitation (“resuscitation call”). The EWS were calculated and updated with every new patient variable entry over time. Scores were considered accurate if they predicted an outcome in the following 24 hours.

RESULTS

All EWS demonstrated higher performance within the medical population as compared to surgical: higher positive predictive value (P < .0001 for all scores) and sensitivity (P < .0001 for all scores). All EWS had positive predictive values below 25%.

CONCLUSIONS

The overall poor performance of the evaluated EWS was marginally better in medical patients when compared to surgical patients. Journal of Hospital Medicine 2017;12:217-223. © 2017 Society of Hospital Medicine

Data Sources

We developed a time-stamped longitudinal database of patient data from the electronic health record, including vital signs, laboratory test results, demographics (age, sex), administrative data (including length of stay), comorbidities, resuscitation code status, and location in hospital, recorded at the minute level throughout each patient’s hospital stay. Physiologically impossible values (eg, blood pressures of 1200 mm Hg) were considered entered in error and eliminated from the database. Time spent in the operating room (OR) or ICU was excluded because the RRS would not be activated in these already highly monitored areas. SAS statistical software (SAS Institute Inc, Cary, North Carolina) was used for database creation.
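The data-cleaning step described above can be illustrated with a minimal Python sketch. The article gives only one example of an impossible value and does not list its cut-offs, so the ranges, the record layout, and the names `PLAUSIBLE_RANGES` and `drop_implausible` are hypothetical; the authors used SAS, not this code.

```python
# Hypothetical plausibility limits. The article does not publish its
# actual cut-offs; these numbers are placeholders for illustration only.
PLAUSIBLE_RANGES = {
    "systolic_bp": (30.0, 300.0),       # mm Hg
    "heart_rate": (10.0, 300.0),        # beats per minute
    "respiratory_rate": (1.0, 80.0),    # breaths per minute
    "temperature": (25.0, 45.0),        # degrees Celsius
}


def drop_implausible(records):
    """Return the records whose value lies inside the assumed range.

    Each record is a dict with "variable" and "value" keys (an assumed
    layout). Variables without a configured range are kept unchanged.
    """
    kept = []
    for rec in records:
        limits = PLAUSIBLE_RANGES.get(rec["variable"])
        if limits is None or limits[0] <= rec["value"] <= limits[1]:
            kept.append(rec)
    return kept


if __name__ == "__main__":
    sample = [
        {"variable": "systolic_bp", "value": 1200.0},  # entry error
        {"variable": "systolic_bp", "value": 118.0},
    ]
    print(drop_implausible(sample))
```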

We applied the current RRS calling criteria in our institution and calculated the Kirkland score,¹¹ along with some of the most widely used early warning scores:¹² Modified Early Warning System (MEWS),¹³ Standardized Early Warning Scoring System (SEWS),¹⁴ Global Modified Early Warning Score (GMEWS),¹⁵ Worthing physiologic scoring system,¹⁶ National Early Warning Score (NEWS),¹⁷ and VitalPAC Early Warning Score (ViEWS).¹⁸ Published thresholds for these scores were used to create rule triggers in the data. To count false positives and true positives, once a trigger was created, all subsequent triggers were ignored until the end of the episode or until 24 hours had elapsed. We calculated triggers in a rolling fashion throughout the episodes of care. Each EWS was updated every time a new parameter was entered into the analytical electronic health record, and the most recent value of each parameter was used to calculate the score. SAS statistical software was used for calculation of scores and identification of outcomes.
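The trigger-suppression rule can be sketched as follows. This is an illustration under stated assumptions, not the authors' SAS implementation: the function name `first_triggers`, the input layout (one episode's time-sorted score updates), and the treatment of a trigger at exactly 24 hours as a new trigger are all assumptions. The strict `>` comparison is also a simplification; individual scores may define their published threshold as "greater than or equal to".

```python
from datetime import datetime, timedelta

# Window during which repeat triggers are ignored, per the text.
SUPPRESSION_WINDOW = timedelta(hours=24)


def first_triggers(scored_times, threshold):
    """Return the trigger times for one episode of care.

    scored_times: list of (timestamp, score) tuples sorted by timestamp,
    one tuple per score update (assumed layout).
    A score above the threshold creates a trigger unless an earlier
    trigger occurred less than 24 hours before it.
    """
    triggers = []
    last_trigger = None
    for ts, score in scored_times:
        if score <= threshold:
            continue  # score does not exceed the threshold
        if last_trigger is not None and ts - last_trigger < SUPPRESSION_WINDOW:
            continue  # still inside the suppression window
        triggers.append(ts)
        last_trigger = ts
    return triggers


if __name__ == "__main__":
    t0 = datetime(2011, 1, 1, 8, 0)
    updates = [
        (t0, 6),
        (t0 + timedelta(hours=2), 7),    # suppressed
        (t0 + timedelta(hours=25), 6),   # new trigger
        (t0 + timedelta(hours=26), 2),   # below threshold
    ]
    print(first_triggers(updates, threshold=4))
```

The end of the episode needs no explicit handling here because the function is called once per episode.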

For our analysis, events were treated as dependent variables, and triggers were independent variables. We calculated the score for each EWS to the minute level throughout our retrospective database. If the score for a specific EWS was higher than the published/recommended threshold for that EWS, an alert was considered to have been issued, and the patient was followed for 24 hours. If the patient had an event in the subsequent 24 hours, or in the hour before the alert (1-hour look-back), the alert was considered a true positive; if not, a false positive. Events that were not preceded by an alert were false negatives, and 24-hour intervals without either an alert or an event were considered true negatives. This simulation exercise was performed for each EWS in both subcohorts (medical and surgical). Clusters of RRS calls followed by transfers to the ICU within 3 hours were considered a single adverse event (an RRS call, as it was the first event to occur) to avoid double counting. We have described how well this simulation methodology⁸ correlates with results from prospective studies.¹⁹
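The alert classification described above can be sketched in Python for a single episode. The names `classify`, `ppv`, and `sensitivity` are hypothetical, and two simplifications are assumptions of this sketch: sensitivity is computed per event (events matched by at least one alert divided by all events), and true negatives are omitted because neither PPV nor sensitivity uses them.

```python
from datetime import datetime, timedelta

LOOK_AHEAD = timedelta(hours=24)  # follow-up window after an alert
LOOK_BACK = timedelta(hours=1)    # 1-hour look-back before an alert


def _matches(alert, event):
    """True if the event falls in [alert - 1 hour, alert + 24 hours]."""
    return alert - LOOK_BACK <= event <= alert + LOOK_AHEAD


def classify(alerts, events):
    """Return (true_pos, false_pos, detected_events, missed_events)."""
    true_pos = sum(1 for a in alerts if any(_matches(a, e) for e in events))
    false_pos = len(alerts) - true_pos
    detected = sum(1 for e in events if any(_matches(a, e) for a in alerts))
    missed = len(events) - detected  # false negatives
    return true_pos, false_pos, detected, missed


def ppv(true_pos, false_pos):
    """Positive predictive value: share of alerts followed by an event."""
    return true_pos / (true_pos + false_pos)


def sensitivity(detected, missed):
    """Share of events that were matched by at least one alert."""
    return detected / (detected + missed)


if __name__ == "__main__":
    t0 = datetime(2011, 1, 1, 8, 0)
    alerts = [t0, t0 + timedelta(hours=48)]
    events = [t0 + timedelta(hours=3), t0 + timedelta(hours=100)]
    tp, fp, det, miss = classify(alerts, events)
    print(tp, fp, det, miss, ppv(tp, fp), sensitivity(det, miss))
```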

Statistical Analysis

To calculate whether results were statistically significant for subgroups, a jackknife method of calculating variance²⁰ was used. The jackknife method calculates variance by repeating the calculations of the statistic leaving out 1 sample at a time. In our case, we repeated the calculation of sensitivity and PPV leaving out 1 patient at a time. Once the simulation method had been run and the false/true positives/negatives had been assigned, calculation of each metric (PPV and sensitivity) was repeated for n subsamples, each leaving out 1 patient. The variance was calculated and 2 Student t tests were performed for each EWS: 1 for PPV and another for sensitivity. SAS statistical software v 9.3 was used for the simulation analysis; R statistical software v 3.0.2 (The R Foundation, Vienna, Austria) was used for the calculation of the statistical significance of results. A univariable analysis was also performed to assess the sensitivity and PPVs for the published thresholds of the most common variables in each EWS: respiratory rate, systolic blood pressure, heart rate, temperature, and mental status as measured by the modified Richmond Agitation Sedation Score.²¹
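As an illustration of the leave-one-patient-out procedure, the sketch below computes a jackknife variance for a pooled ratio such as PPV. The per-patient input layout and the names `pooled_ratio`, `jackknife_variance`, and `difference_statistic` are assumptions. The article does not state the exact form of its t test, so the final statistic is only one plausible construction, and the analysis in the article was run in SAS and R rather than Python.

```python
import math


def pooled_ratio(counts):
    """Pooled ratio from per-patient (numerator, denominator) pairs.

    For PPV, each pair is (true positive alerts, all alerts) for one
    patient; for sensitivity, (detected events, all events).
    """
    numerator = sum(n for n, _ in counts)
    denominator = sum(d for _, d in counts)
    return numerator / denominator


def jackknife_variance(counts):
    """Jackknife variance of the pooled ratio, leaving out 1 patient at a time.

    Assumes the denominator stays positive after any single patient
    is removed.
    """
    n = len(counts)
    leave_one_out = [pooled_ratio(counts[:i] + counts[i + 1:]) for i in range(n)]
    mean = sum(leave_one_out) / n
    return (n - 1) / n * sum((x - mean) ** 2 for x in leave_one_out)


def difference_statistic(counts_a, counts_b):
    """Assumed test statistic: difference in ratios over its jackknife SE."""
    diff = pooled_ratio(counts_a) - pooled_ratio(counts_b)
    se = math.sqrt(jackknife_variance(counts_a) + jackknife_variance(counts_b))
    return diff / se


if __name__ == "__main__":
    medical = [(1, 2), (0, 3), (2, 2), (1, 4)]   # made-up counts
    surgical = [(0, 2), (1, 5), (0, 3), (1, 3)]  # made-up counts
    print(pooled_ratio(medical), jackknife_variance(medical))
    print(difference_statistic(medical, surgical))
```

The counts in the usage example are invented for demonstration and do not come from the study.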

RESULTS

The initial cohort included 60,020 hospitalizations, of which the following were excluded: 2751 because of a lack of appropriate research authorization; 6433 because the patients were younger than 18 years; 2129 as psychiatric admissions; 284 as rehabilitation admissions; 872 as research purposes-only admissions; and 1185 because the patient was never in a general care bed (eg, they were either admitted directly to the ICU, or they were admitted for an outpatient surgical procedure and spent time in the postanesthesia care unit).
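The exclusion counts can be checked against the cohort sizes reported in the article with a few lines of Python. The dictionary keys are shorthand labels introduced here; the numbers are taken from the text.

```python
INITIAL_HOSPITALIZATIONS = 60_020

# Exclusion counts as listed in the text.
EXCLUSIONS = {
    "no research authorization": 2751,
    "younger than 18 years": 6433,
    "psychiatric admission": 2129,
    "rehabilitation admission": 284,
    "research purposes only": 872,
    "never in a general care bed": 1185,
}


def remaining_after_exclusions(initial, exclusions):
    """Subtract every exclusion count from the initial cohort size."""
    return initial - sum(exclusions.values())


if __name__ == "__main__":
    final = remaining_after_exclusions(INITIAL_HOSPITALIZATIONS, EXCLUSIONS)
    print(final)  # 46366
    # The article reports 23,831 medical and 22,535 surgical hospitalizations.
    print(final == 23_831 + 22_535)  # True
```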

Table 1 summarizes patient and trigger characteristics, overall and by subgroup. The final cohort included 75,240 total episodes in 46,366 hospitalizations, from 34,898 unique patients, of which 48.7% were male. There were 23,831 medical and 22,535 surgical hospitalizations. Median length of episode was 2 days both for medical and surgical patients. Median length of stay was 3 days, both for medical and for surgical patients.

Table 1. Patient Characteristics, Events, and Triggers

There were 3332 events in total, of which 1709 were RRS calls, 185 were resuscitation calls, and 1438 were unscheduled transfers to the ICU. The rate of events was 4.67 events per 100 episodes in the aggregate adult population. There were 3.93 events per 100 episodes for surgical hospitalizations, and 5.86 events per 100 episodes for medical hospitalizations (P < .001). The number of cardiorespiratory arrests (CRAs) in our cohort was 0.27 per 100 episodes, 0.128 per hospital bed per year, or 4.37 per 1000 hospital admissions, similar to other reported numbers in the literature.³,²²,²³

The total number of EWS triggers varied greatly between EWS rules, with the volume ranging during the study year from 1363 triggers with the GMEWS rule to 77,711 triggers with the ViEWS score.

Figure. Performance of scores in medical and surgical patients

All scores had PPVs less than 25%. As seen in Table 2 and shown graphically in the Figure, all scores performed better on medical patients (blue) than on surgical patients (yellow). The P value was < .0001 for both PPV and sensitivity. The Worthing score had the highest sensitivity (0.78 for medical and 0.68 for surgical) but a very low PPV (0.04 for medical and 0.03 for surgical), while GMEWS was the opposite: low sensitivity (0.10 and 0.07) but the highest PPV (0.22 and 0.18).

Table 2. Comparison of the Predictive Performance of Widely Used EWS in a Surgical and a Medical Population

The results of the univariable analysis can be seen in Table 3. Most of the criteria performed better (higher sensitivity and PPV) as predictors in the medical hospitalizations than in the surgical hospitalizations.

Table 3. Univariable Analysis in the Medical and Surgical Subpopulations
