Methodological Progress Note: Classification and Regression Tree Analysis

Journal of Hospital Medicine 15(9). 2020 September;549-551. Published Online First March 18, 2020 | 10.12788/jhm.3366

© 2020 Society of Hospital Medicine

WHEN WOULD YOU USE CART ANALYSIS?

This method can be useful in any setting in which an investigator wants to characterize a subpopulation within a larger cohort. Applications include, but are not limited to, risk stratification,6 diagnostics,7 and patient identification for medical interventions.8 Moreover, CART analysis produces visually interpretable predictive models that can be used for front-line clinical decision making.9,10

STRENGTHS OF CART ANALYSIS

CART analysis has been shown to have several advantages over other commonly used modeling methods. First, it is a nonparametric method that can handle highly skewed data and does not require the predictor, or predictors, to take on a predetermined form (their form is instead constructed from the data). This is helpful because many clinical variables vary widely and are unevenly distributed.
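One way to see why skewed predictors pose little difficulty is that a binary split depends only on the ordering of the predictor values, not on their scale. The sketch below is illustrative rather than taken from the article: it assumes Gini impurity as the splitting criterion (a common choice for classification trees), and the variable names and data (`los`, `outcome`) are hypothetical. It finds the best single cutoff for a right-skewed predictor and then for its logarithm, and both yield the same partition of patients.

```python
import math

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Return (threshold, left_indices) for the cutoff minimizing weighted Gini impurity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best = None
    for k in range(1, len(order)):
        lo, hi = values[order[k - 1]], values[order[k]]
        if lo == hi:
            continue  # cannot place a cutoff between tied values
        left = [labels[i] for i in order[:k]]
        right = [labels[i] for i in order[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            # Cutoff is the midpoint between neighboring observed values
            best = (score, (lo + hi) / 2, frozenset(order[:k]))
    return best[1], best[2]

# Hypothetical, heavily right-skewed predictor (e.g., length of stay in days)
los = [1, 2, 2, 3, 4, 5, 7, 9, 30, 120]
outcome = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

thr_raw, left_raw = best_split(los, outcome)
thr_log, left_log = best_split([math.log(v) for v in los], outcome)

# The cutoff value differs between scales, but the patients on each side do not
print(thr_raw, sorted(left_raw) == sorted(left_log))
```

The two extreme values (30 and 120 days) have no special influence on where the cutoff falls, which is the practical sense in which the method tolerates skew.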

Unlike many other modeling techniques, CART can identify higher-order interactions between multiple variables without those interactions being specified in advance; that is, it can handle cases in which one variable affects the nature of an interaction between two other variables. Further, CART can handle multiple correlated independent variables, something logistic regression models classically handle poorly.
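The way recursive partitioning picks up an interaction can be sketched with a minimal tree grower. This is an illustrative toy, not the authors' model or any particular software package: it again assumes Gini impurity, and the predictors (`age`, `lives_alone`) and outcome (`y`) are hypothetical. In the made-up data, living alone matters only for older patients, which is a two-way interaction (the simplest case); the tree recovers it by splitting on one variable and then, within one branch only, on the other.

```python
def gini(rows):
    """Gini impurity of the binary outcome "y" in a list of patient records."""
    if not rows:
        return 0.0
    p = sum(r["y"] for r in rows) / len(rows)
    return 2 * p * (1 - p)

def grow(rows, features, depth=0, max_depth=3):
    """Recursively partition rows with binary splits that minimize weighted Gini."""
    majority = int(2 * sum(r["y"] for r in rows) >= len(rows))
    if gini(rows) == 0.0 or depth == max_depth:
        return {"predict": majority}
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate cutoff between neighboring values
            left = [r for r in rows if r[f] <= t]
            right = [r for r in rows if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no feature has two distinct values left to split on
        return {"predict": majority}
    _, f, t, left, right = best
    return {"feature": f, "threshold": t,
            "left": grow(left, features, depth + 1, max_depth),
            "right": grow(right, features, depth + 1, max_depth)}

def predict(tree, row):
    """Follow the splits from the root down to a terminal node."""
    while "predict" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["predict"]

# Hypothetical cohort: the outcome occurs only in older patients who live alone
rows = [{"age": a, "lives_alone": s, "y": int(a >= 75 and s == 1)}
        for a in (60, 68, 80, 88) for s in (0, 1)]

tree = grow(rows, ["age", "lives_alone"])
print(tree)
```

No interaction term was supplied to the procedure; the second split on `lives_alone` appears only in the older-age branch because that is where it reduces impurity. A logistic regression would need an explicit `age × lives_alone` product term to represent the same structure.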

From a clinical standpoint, the “logic” of the visual-based CART output can be easier to interpret than the probabilistic output (eg, odds ratio) associated with logistic regression modeling, making it more practical, applicable, and easier for clinicians to adopt.10,12 Finally, CART software is easy to use for those who do not have strong statistical backgrounds, and it is less resource intensive than other statistical methods.2

LIMITATIONS OF CART ANALYSIS

Despite these features, CART does have several disadvantages. First, because CART analysis is so easy to perform, “data dredging” can be a significant concern. Its ideal use is with a priori consideration of independent variables.2 Second, while CART is most beneficial in describing links and cutoffs between variables, it may not be useful for hypothesis testing.2 Third, large data sets are needed to perform CART, especially if the investigator is using the split sample approach mentioned above.11 Finally, while CART is the most utilized decision tree methodology, several other decision tree methods exist, including C4.5, CRUISE, QUEST (Quick, Unbiased, Efficient Statistical Trees), and CHAID (Chi-square Automatic Interaction Detection). Many of these allow for splitting into more than two groups and have other features that may be more advantageous to one’s analysis.13
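The sample-size cost of a split sample approach can be made concrete with a short sketch. The proportions, seed, cohort, and cutoff rule below are all assumptions for illustration; the article does not specify them. The cohort is randomly divided into a derivation set, on which a tree would be built, and a validation set, on which it is evaluated, so the tree is fitted to only a fraction of the available patients.

```python
import random

def split_sample(records, derivation_fraction=0.7, seed=0):
    """Randomly partition records into derivation and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * derivation_fraction)
    return shuffled[:cut], shuffled[cut:]

def accuracy(rule, records):
    """Fraction of (predictor, outcome) pairs that the rule classifies correctly."""
    return sum(rule(x) == y for x, y in records) / len(records)

# Hypothetical cohort of (length of stay in days, outcome) pairs
cohort = [(days, int(days >= 5)) for days in range(1, 21)]
derivation, validation = split_sample(cohort)

# Stand-in for a tree fitted on the derivation set only (cutoff is illustrative)
def rule(days):
    return int(days >= 5)

print(len(derivation), len(validation), accuracy(rule, validation))
```

With 20 patients and a 70/30 split, only 14 inform the tree and 6 are left to validate it, which illustrates why this approach calls for larger data sets.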

WHY DID THE AUTHORS USE CART?

Decision trees offer simple, interpretable results of multiple factors that can be easily applied to clinical scenarios. In this case, the authors specifically used classification tree analysis to take advantage of CART’s machine-learning ability to consider higher-order interactions when building their model, as they lacked a priori evidence to guide them in traditional (ie, logistic regression) model construction. Furthermore, CART analysis created an output that logically and visually illustrates which combination of characteristics is most associated with discharge placement and can potentially be used to facilitate discharge planning for future hospitalized patients. To sum up, this machine-learning methodology allowed the investigators to determine which variables, taken together, were the most suitable for predicting their outcome of interest and to present these findings in a manner that busy clinicians can interpret and apply.