The identification of patient cohorts for clinical and genomic research is a costly and time-consuming process. This bottleneck adversely affects public health by delaying research findings and, in some cases, by making research costs prohibitively high. To address this issue, leveraging electronic health records (EHRs) to identify patient cohorts has become an increasingly attractive option. With the rapidly growing adoption of EHR systems driven by Meaningful Use, and the linkage of EHRs to research biorepositories, evaluating the suitability of EHR data for clinical and translational research is becoming ever more important, with ramifications for genomic and observational research, clinical trials, and comparative effectiveness studies.

A key component of identifying patient cohorts in the EHR is defining inclusion and exclusion criteria that algorithmically select sets of patients based on stored clinical data, a process commonly referred to as “EHR-driven phenotyping”. Phenotypes are defined over both structured data (demographics, diagnoses, medications, laboratory measurements) and unstructured clinical text (radiology reports, encounter notes, discharge summaries). Phenotyping logic can be quite complex, typically combining Boolean and temporal operators applied to multiple clinical events. In general, phenotyping algorithm development is a multi-disciplinary team effort involving clinicians, domain experts, and informaticians, and is operationalized as database queries and software customized to the local EHR environment.

The typical way to share phenotyping algorithms across institutions is through informal free-text descriptions of algorithm logic, possibly augmented with graphical flowcharts and simple lists of structured codes. This is due to the lack of a widely accepted, standards-based formal information model for defining phenotyping algorithms.
However, implementing a phenotyping algorithm from a free-text description is itself an error-prone and time-consuming process, due to the inherent ambiguities of free text as well as the necessity for human intermediaries to map algorithmic criteria expressed as free text to database queries and code.
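To make the earlier point about phenotyping logic concrete, the minimal sketch below combines Boolean and temporal operators over a toy patient record; the diagnosis code, laboratory threshold, and time window are purely illustrative and are not drawn from any specific published algorithm:

```python
from datetime import date

# A toy patient record built from structured EHR data.
# All codes, thresholds, and time windows here are illustrative only.
patient = {
    "diagnoses": [("250.00", date(2011, 3, 1)), ("250.00", date(2011, 9, 15))],
    "labs": [("HbA1c", 7.2, date(2011, 3, 10))],
    "medications": [("metformin", date(2011, 4, 2))],
}

def has_diabetes_phenotype(p):
    """Boolean + temporal logic: at least two diabetes diagnosis codes,
    AND an HbA1c > 6.5 measured within 365 days of some diagnosis date."""
    dx_dates = [d for code, d in p["diagnoses"] if code == "250.00"]
    if len(dx_dates) < 2:          # Boolean criterion on event counts
        return False
    return any(                    # temporal criterion linking two event types
        value > 6.5 and any(abs((lab_date - dx).days) <= 365 for dx in dx_dates)
        for name, value, lab_date in p["labs"]
        if name == "HbA1c"
    )

print(has_diabetes_phenotype(patient))  # True for this toy record
```

Even this deliberately simple example shows why free-text descriptions of such logic are easy to misread: the counting rule, the threshold, and the time window must all be reproduced exactly at each implementing site.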
To help overcome these challenges, the proposed project will design, build, and promote an open-access community infrastructure for standards-based development and sharing of phenotyping algorithms, and will provide tools and resources for investigators, researchers, and their informatics support staff to implement and execute the algorithms on native EHR data.
By participating in several DHHS/NIH funded projects (eMERGE, SHARPn, PGRN, NCBO, i2b2, Beacon, caBIG), our multidisciplinary team is demonstrably experienced in applying emerging informatics tools and techniques to clinical research, and is uniquely positioned to pursue the proposed research. In particular, we will accomplish the following Specific Aims in this proposal:
- To create a standards-based information model for representing phenotyping algorithms.
- We will investigate and adapt, where necessary, the Quality Data Model (QDM) from the National Quality Forum (NQF) for the modeling and representation of phenotyping algorithms. Expressed using the HL7 Health Quality Measure Format (HQMF), the QDM provides the syntax, grammar, and a set of basic logical and temporal operators needed to unambiguously articulate phenotype definition criteria. In consultation with a panel of clinical phenotyping experts, we will identify phenotypes of interest and implement them using QDM. We will propose extensions to QDM as necessary, relying upon existing standards whenever possible.
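As an illustration of the kind of structured representation Aim 1 targets, the sketch below captures a phenotype's data criteria and population logic in the spirit of QDM; the field names, categories, and value-set labels are hypothetical placeholders, not actual QDM/HQMF syntax:

```python
# A hedged, illustrative sketch of a phenotype definition as structured
# data. Every identifier below (categories, value-set names, operators)
# is hypothetical and stands in for the formal QDM/HQMF constructs.
phenotype = {
    "name": "Type 2 Diabetes (illustrative)",
    "data_criteria": {
        "dx_t2dm": {"category": "Diagnosis, Active", "value_set": "T2DM codes"},
        "lab_a1c": {"category": "Laboratory Test, Result", "value_set": "HbA1c",
                    "attribute": {"result": "> 6.5 %"}},
        "rx_metformin": {"category": "Medication, Active", "value_set": "Metformin"},
    },
    # Population logic: Boolean operators over the named criteria, plus a
    # temporal relationship linking a lab result to a diagnosis.
    "population_criteria": {
        "AND": [
            "dx_t2dm",
            {"OR": ["rx_metformin",
                    {"temporal": ("lab_a1c", "within 365 days of", "dx_t2dm")}]},
        ]
    },
}
```

Separating the data criteria (what events count) from the population logic (how they combine) mirrors the structure a formal model provides, and is what makes machine translation to executable queries feasible.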
- To create an open-access repository and infrastructure for authoring, sharing and accessing computable, standardized phenotyping algorithms.
- We will leverage and extend the open-access, community-based PheKB (Phenotype Knowledgebase) collaborative platform, developed within the eMERGE consortium, to author, validate, and share QDM-based phenotyping algorithms. This platform will be a national resource for the creation, demonstration, evaluation, and evolution of phenotyping algorithms and associated tools for enabling clinical and translational research. For authoring algorithms, we will leverage NQF’s web-based Measure Authoring Tool (MAT), which will be extended for phenotyping algorithm development, providing a user-friendly way of generating documents in the QDM-based specification developed in Aim 1. In collaboration with eMERGE, SHARPn, PGRN, and i2b2 investigators, we will invite and support other organizations that also wish to utilize and evaluate this tool.
- To develop informatics methods and tools for translating phenotyping algorithmic criteria into EHR-based executable queries.
- We will develop tools and resources for automatic translation of QDM-based phenotyping algorithms into executable code and scripts that can be run on existing EHR data. In particular, we will investigate an open-source data analytics and business logic platform, the JBoss® Drools business rules management system, for this task by mapping formal representations of algorithms to executable code. Our objective will be to develop and evaluate automated mappings from all algorithms defined in Aim 1 to heterogeneous EHR systems at Mayo Clinic, Northwestern, and Vanderbilt. We will also engage other academic medical centers and CTSA sites, and provide implementation support for those who wish to conduct evaluations of the phenotype algorithm authoring and execution platform.
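As a minimal sketch of the Aim 3 translation step, the following maps one hypothetical structured criterion to an executable SQL query. The criterion format, table, and column names are assumptions about a local EHR schema; a production translator (e.g., one built on Drools rules) would also handle value-set expansion, temporal operators, and site-specific schema mapping:

```python
# A hypothetical in-memory form of one phenotyping criterion:
# "at least two diagnoses drawn from a given code list".
criterion = {"category": "Diagnosis", "codes": ["250.00", "250.02"],
             "min_occurrences": 2}

def to_sql(c):
    """Translate the criterion into SQL against an assumed local table
    'diagnoses(patient_id, dx_code)'; illustrative only."""
    code_list = ", ".join(f"'{code}'" for code in c["codes"])
    return (
        "SELECT patient_id FROM diagnoses "
        f"WHERE dx_code IN ({code_list}) "
        f"GROUP BY patient_id HAVING COUNT(*) >= {c['min_occurrences']}"
    )

print(to_sql(criterion))
```

The point of the sketch is that once criteria are captured in a formal model, the mapping to site-specific queries becomes a mechanical, auditable transformation rather than a manual reinterpretation of free text.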