Gender and racial biases in Press Ganey patient satisfaction surveys

OBG Management. 2023 July;35(7):SS13-SS19 | doi: 10.12788/obgm.0294

Are we collecting the best data, and enough of it, to evaluate physician performance based on patient satisfaction?

Gender, race, and age bias

Although the rationale behind gathering patient input is important, recent data suggest that patient satisfaction surveys are subject to inherent biases.6,13,14 These biases tend to negatively impact women and non-White physicians, adding to the systemic discrimination against women and physicians of color that already exists in health care.

In a single-site retrospective study performed in 2018 by Rogo-Gupta et al, female gynecologists were found to be 47% less likely to receive top patient satisfaction scores than their male counterparts owing to their gender alone, suggesting that gender bias may impact the results of patient satisfaction questionnaires.13 The authors urged that the results of patient satisfaction surveys be interpreted with great caution until the impact on female physicians is better understood.

A multicenter study by the same group (Rogo-Gupta et al) assessed the same construct across 5 geographically diverse institutions.15 This study confirmed that female gynecologists were less likely to receive a top satisfaction score from their patients (19% lower odds when compared with male gynecologists). The authors also studied the effects of other demographics, including age, race/ethnicity, and race concordance. Older patients (aged ≥63 years) had more than 3-fold higher odds of providing a top satisfaction score than younger patients. Additionally, Asian physicians had significantly lower odds of receiving a top satisfaction score when compared with White physicians, while Asian patients had significantly lower odds of providing a top satisfaction score when compared with White patients. Lastly, in most cases, when underrepresented-in-medicine patients saw an underrepresented-in-medicine physician (race concordance), there was a significant increase in the physician's odds of receiving a top satisfaction score. Asian race concordance, however, actually resulted in a lower likelihood of receiving a top satisfaction score.15
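
Because odds and probabilities are easily conflated, a short sketch may help show what "19% lower odds" could mean in percentage points. The 80% baseline rate for male gynecologists used below is a hypothetical value chosen for illustration, not a figure reported in the study, and the function name is likewise an assumption.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Convert a baseline probability to odds, scale by the odds ratio,
    and convert the result back to a probability."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical baseline: 80% of male gynecologists' surveys are top scores.
baseline = 0.80
# "19% lower odds" corresponds to an odds ratio of 0.81.
female_prob = apply_odds_ratio(baseline, 0.81)

print(f"baseline top-score rate: {baseline:.1%}")
print(f"rate after applying OR 0.81: {female_prob:.1%}")
print(f"gap: {(baseline - female_prob) * 100:.1f} percentage points")
```

Under this assumed baseline, the 19% reduction in odds works out to a gap of only a few percentage points, so a sizable odds ratio and a small raw difference are not contradictory. A different baseline would give a different gap.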

Literature from other specialties supports these findings. Emerging data from other medical specialties also suggest that Press Ganey survey data are subject to inherent biases. For example, the emergency medicine literature has shown discrepancies in patient satisfaction between providers at tertiary inner-city institutions and those serving affluent suburban populations,16 and has shown that worse mortality is actually correlated with better patient satisfaction scores, and vice versa.17

Another study by Sotto-Santiago in 2019 assessed patient satisfaction scores in multiple specialties at a single institution where quality-related financial incentives were offered based on this metric. They found a significant difference in patient satisfaction scores between underrepresented and White physicians, which suggests a potential bias among patients and institutional practices—ultimately leading to pay inequities through differences in financial incentives.18

Percentile differences reveal small gaps in satisfaction ratings

Comparing raw Press Ganey patient satisfaction data with the percentiles associated with these scores yields an interesting finding. In the 2023 multicenter study by Rogo-Gupta et al, the difference in top raw scores between male and female gynecologists appears to be small (3.3%).15 However, in 2020, the difference in top scores separating the top (75th percentile) and bottom (25th percentile) quartiles of physicians was also small, at only 6.9%.

Considering the percentiles, if a provider who scores in the 25th percentile is compared with a colleague who scores in the 75th percentile, the provider may think the reported satisfaction score differences were quite large. This may evoke feelings of decreased self-worth and negatively impact the provider's professional identity or overall well-being, and the provider may seek (or be told to seek) improvement opportunities. Now imagine the provider in question realizes the difference between the 25th percentile and 75th percentile is actually only 6.9%. This information may completely change how the results are interpreted and acted upon by administrators. The interpretation changes further with the understanding that 3.3% of the difference may be due to gender alone, narrowing the gap even further. Providers would understandably become frustrated if measures of success such as reimbursement, financial bonuses or incentives, promotion, or advancement are linked to these results. Such a link violates the value of fairness and does not offer an equitable starting point.

Evolution of the data distribution. Another consideration, as noted by Robert C. Lloyd, PhD, one of the statisticians who helped develop the percentile statistical analysis mapping in 1985, is that the mapping was based on a classic bell-shaped distribution of patient satisfaction survey scores.19 Because hospitals, medical groups, and physicians have been working these past 20 years to achieve higher Press Ganey scores, the data no longer have a bell-shaped distribution. Rather, there are significant clusters of raw scores at the high end with a very narrow response range. When these data are mapped to the percentile spectrum, the resulting percentiles are highly inaccurate.19
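
To illustrate how a narrow band of raw scores can stretch across the whole percentile scale, consider the minimal sketch below. The peer group of 20 physicians, their evenly spaced top-score rates between 88% and 97.5%, and the function name are all hypothetical; this is not Press Ganey's actual percentile method, which is not described here.

```python
def percentile_rank(value: float, peers: list[float]) -> float:
    """Percentage of peer scores that fall strictly below `value`."""
    below = sum(1 for p in peers if p < value)
    return 100.0 * below / len(peers)


# Hypothetical peer group: 20 top-score rates (%) packed into a narrow range.
peers = [88.0 + 0.5 * i for i in range(20)]  # 88.0, 88.5, ..., 97.5

low_scorer = 90.5   # hypothetical physician A
high_scorer = 95.5  # hypothetical physician B

print(f"raw gap: {high_scorer - low_scorer:.1f} points")
print(f"percentile rank of A: {percentile_rank(low_scorer, peers):.0f}")
print(f"percentile rank of B: {percentile_rank(high_scorer, peers):.0f}")
```

In this made-up group, a 5-point difference in raw scores separates the 25th percentile from the 75th, which mirrors how a modest raw gap can look large once it is reported as a percentile.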

Impact of sample size. According to Press Ganey, a minimum of 30 survey responses collected over the designated time period is necessary to draw meaningful conclusions from the data for a specific individual, program, or hospital. Despite this requirement to achieve statistical significance, Sullivan and DeLucia found that the firm often provides comparative data about hospital departments and individual physicians based on a smaller sample size that may create an unacceptably large margin of error.20 Sullivan, for example, said his department may have only 8 to 10 Press Ganey survey responses per month and yet still receives monthly reports from the company analyzing the data. Because of the small sample size, his department ranked in the 1st percentile one month and in the 99th percentile 2 months later.20
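
The effect of sample size on the margin of error can be sketched with the standard normal-approximation formula for a proportion, z × sqrt(p(1 − p)/n). The 80% top-score rate below is a hypothetical value, and the function name is an assumption. The approximation itself is rough at very small n, so the numbers are indicative only.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation 95% interval for a proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Hypothetical observed top-score rate of 80%; not a figure from the article.
observed = 0.80
for n in (10, 30, 300):
    moe = margin_of_error(observed, n)
    print(f"n={n:>3}: {observed:.0%} +/- {moe * 100:.1f} percentage points")
```

With 10 responses, the interval under these assumptions spans roughly 25 percentage points in either direction, far wider than the 6.9% gap separating the top and bottom quartiles noted earlier, which is consistent with month-to-month swings like the one Sullivan described.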

The effect of a high ceiling rate. A psychometrics report for the Press Ganey survey, available from the vendor, provides vague assessments of reliability and validity based on 2,762 surveys from 12 practices across 10 states. This report describes a 12-question version of the survey with “no problems encountered” with missingness and response variability. The report further states that the Press Ganey survey demonstrates construct, convergent, divergent, and predictive validities, and high reliability; however, the underlying data are not made available.1

In response to this report, Presson et al analyzed more than 34,000 surveys from one institution to evaluate the reliability and validity of the Press Ganey survey.21 Overall, the survey demonstrated suitable psychometric properties for most metrics. However, Presson et al noted a high ceiling rate of 29.3% for the total score, with ceiling rates ranging from 55.4% to 84.1% across individual items.21 (Ceiling rates are considered substantial if they occur more than 20% of the time.) Ultimately, a high ceiling rate reduces the power to discriminate between patients who have high satisfaction (everyone is “happy”) and those who are just slightly less than happy, but not dissatisfied. This data quality metric can impact the reliability and validity of a survey, and substantial ceiling rates can notably impact percentile rankings of scores within an institution, offering a possible explanation for the small percentage change between the top and bottom percentiles.
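
A ceiling rate is simply the share of responses that sit at the maximum possible score. The sketch below computes it for a single item, using a made-up set of 10 responses on a 1-to-5 scale and the 20% threshold mentioned above. The data and the function name are hypothetical and are not drawn from Presson et al.

```python
def ceiling_rate(scores: list[int], max_score: int) -> float:
    """Fraction of responses equal to the maximum possible score."""
    if not scores:
        raise ValueError("scores must be non-empty")
    return sum(1 for s in scores if s == max_score) / len(scores)


# Hypothetical responses to one survey item scored from 1 to 5.
responses = [5, 5, 4, 5, 5, 3, 5, 4, 5, 5]
rate = ceiling_rate(responses, max_score=5)

print(f"ceiling rate: {rate:.0%}")
# Threshold from the text: substantial if above 20%.
print("substantial" if rate > 0.20 else "not substantial")
```

When most responses are already at the maximum, as in this example, the item cannot separate very satisfied patients from slightly less satisfied ones, so rankings end up being driven by a small minority of non-maximal responses.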
