Purpose: To establish a gold-standard methodology for accurately extracting progression-free survival (PFS) following Diffuse Large B-Cell Lymphoma (DLBCL) treatment using real-world electronic healthcare record (EHR) data.
Background: Randomized controlled trials using response evaluation criteria have long served as the gold standard for assessing response to therapy and PFS. However, the characteristics of clinical trial participants do not reflect the overall patient population, and formal response evaluation criteria are not used in real-world contexts. Furthermore, real-world data are often unstructured, which hinders accurate comparison of PFS between structured clinical trial data and real-world data, and existing approaches define PFS inconsistently. Despite the importance of assessing PFS in patients outside of controlled clinical trials, no gold-standard method for collecting and validating PFS from real-world evidence has been established.
Methods: Clinicians, programmers, and data scientists collaborated to develop an R Shiny application using Veterans Affairs Corporate Data Warehouse data from the EHR of 352 DLBCL patients. The application takes unstructured data such as clinical notes and facilitates the capture, annotation, and tagging of keywords or phrases indicative of progression, thus allowing accurate determination of the date on which a treating clinician first identified progression.
Data Analysis: To refine data-collection techniques and evaluate whether the application can enable calculation of real-world PFS, we conducted an adaptive, iterative process of reviewing EHR documents and capturing and annotating data until a consistent schema and methodology were established. To validate the annotation schema and methodology, 2 annotators independently annotated 50 patient records, and their annotations were assessed for concordance.
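The abstract does not state which concordance statistic was used. As one illustrative possibility, the sketch below computes raw percent agreement and Cohen's kappa for two annotators who each assign one response label per patient record. The function names and the label values in the usage lines are hypothetical and are not taken from the study.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of records on which the two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of records")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance, from each annotator's marginal label frequencies.
    expected = sum(
        counts_a[label] * counts_b[label] for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout; kappa is undefined,
        # so this sketch treats it as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four records (PD = progressive disease, CR = complete response).
annotator_1 = ["PD", "CR", "PD", "CR"]
annotator_2 = ["PD", "CR", "CR", "CR"]
agreement = percent_agreement(annotator_1, annotator_2)
kappa = cohens_kappa(annotator_1, annotator_2)
```

Kappa is shown alongside raw agreement because raw agreement alone can look high when one label dominates the records.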
Results: We produced an R Shiny application that can capture, annotate, and transform unstructured EHR data into structured data—specifically, treatment lines, cycles, and response criteria with corresponding dates—ready for analysis of PFS. An annotation schema for capturing real-world data was also developed. Mapping of common phrases used by clinicians in real-world practice to response criteria resulted in a dictionary of these phrases.
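To make the idea of a phrase dictionary concrete, the following minimal sketch maps free-text phrases to response categories and derives a PFS interval from dated notes. Everything in it is an assumption for illustration: the phrases, the category codes (CR, PR, SD, PD), and the function names are hypothetical and do not come from the study's dictionary or its R Shiny application, where tagging is performed by human annotators.

```python
from datetime import date

# Hypothetical phrase-to-response dictionary; the study's actual entries are not given.
PHRASE_TO_RESPONSE = {
    "complete remission": "CR",
    "partial response": "PR",
    "stable disease": "SD",
    "disease progression": "PD",
    "relapsed": "PD",
}


def tag_note(text):
    """Return the set of response categories whose phrases appear in a note."""
    lowered = text.lower()
    return {resp for phrase, resp in PHRASE_TO_RESPONSE.items() if phrase in lowered}


def first_progression_date(notes, treatment_start):
    """Earliest note date on or after treatment start that is tagged as progression."""
    dates = [
        note_date
        for note_date, text in notes
        if note_date >= treatment_start and "PD" in tag_note(text)
    ]
    return min(dates) if dates else None


def pfs_days(treatment_start, notes, last_follow_up):
    """Return (days, event_observed); patients without progression are censored."""
    progressed = first_progression_date(notes, treatment_start)
    if progressed is not None:
        return (progressed - treatment_start).days, True
    return (last_follow_up - treatment_start).days, False


# Hypothetical dated notes for one patient.
example_notes = [
    (date(2020, 3, 1), "PET/CT shows complete remission."),
    (date(2020, 9, 15), "Imaging consistent with disease progression."),
]
example_pfs = pfs_days(date(2020, 1, 1), example_notes, date(2021, 1, 1))
```

This sketch uses plain substring matching, so it would mis-tag negated statements such as "no evidence of disease progression", and it does not treat death as a PFS event. Both would need handling in any real analysis.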
Implications: These efforts show that unstructured EHR content can be converted reliably into analyzable data for outcomes such as PFS. Further work will aim to establish a gold-standard methodology.