Specific Aims

The identification of patient cohorts for clinical and genomic research is a costly and time-consuming process. This bottleneck adversely affects public health by delaying research findings and, in some cases, by making research costs prohibitively high. To address this issue, leveraging electronic health records (EHRs) for identifying patient cohorts has become an increasingly attractive option. With the rapidly growing adoption of EHR systems due to Meaningful Use, and the linkage of EHRs to research biorepositories, evaluating the suitability of EHR data for clinical and translational research is becoming ever more important, with ramifications for genomic and observational research, clinical trials, and comparative effectiveness studies.

A key component of identifying patient cohorts in the EHR is defining inclusion and exclusion criteria that algorithmically select sets of patients based on stored clinical data. This process is commonly referred to as “EHR-driven phenotyping”. Phenotypes are defined over both structured data (demographics, diagnoses, medications, lab measurements) and unstructured clinical text (radiology reports, encounter notes, discharge summaries). Phenotyping logic can be quite complex, and typically includes both Boolean and temporal operators applied to multiple clinical events. In general, phenotyping algorithm development is a multidisciplinary team effort involving clinicians, domain experts, and informaticians, and the resulting algorithm is operationalized as database queries and software customized to the local EHR environment.

Because there is no widely accepted, standards-based formal information model for defining phenotyping algorithms, they are typically shared across institutions as informal free-text descriptions of the algorithm logic, possibly augmented with graphical flowcharts and simple lists of structured codes. However, implementing a phenotyping algorithm from a free-text description is itself an error-prone and time-consuming process, due to the inherent ambiguities of free text as well as the need for human intermediaries to map algorithmic criteria expressed as free text to database queries and code.
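
To make the notion of Boolean and temporal operators over clinical events concrete, the following is a minimal Python sketch of a computable phenotype. It is an illustration only, not a validated algorithm: the `Event` record, the 90-day window, and the codes `DX_EXAMPLE`, `RX_EXAMPLE` and `DX_EXCLUDED` are hypothetical placeholders rather than real terminology codes or criteria from this proposal.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    """One structured clinical event (hypothetical, simplified record)."""
    patient_id: str
    kind: str   # e.g. "diagnosis", "medication"
    code: str   # placeholder code, not a real terminology code
    when: date


def events_with(events: Iterable[Event], kind: str, code: str) -> List[Event]:
    """Return the events that match a given kind and code."""
    return [e for e in events if e.kind == kind and e.code == code]


def within_days_after(first: List[Event], second: List[Event], max_days: int) -> bool:
    """Temporal operator: some `second` event occurs 0..max_days after some `first` event."""
    window = timedelta(days=max_days)
    return any(
        timedelta(0) <= s.when - f.when <= window
        for f in first
        for s in second
    )


def is_case(events: List[Event]) -> bool:
    """Inclusion: diagnosis AND medication within 90 days after it. Exclusion: excluded diagnosis."""
    dx = events_with(events, "diagnosis", "DX_EXAMPLE")
    rx = events_with(events, "medication", "RX_EXAMPLE")
    excluded = events_with(events, "diagnosis", "DX_EXCLUDED")
    return bool(dx) and within_days_after(dx, rx, 90) and not excluded


def select_cohort(all_events: Iterable[Event]) -> List[str]:
    """Group events by patient and return the sorted IDs of patients who meet the criteria."""
    by_patient: Dict[str, List[Event]] = {}
    for event in all_events:
        by_patient.setdefault(event.patient_id, []).append(event)
    return sorted(pid for pid, evs in by_patient.items() if is_case(evs))
```

Even in this toy form, details such as whether the window is inclusive, or whether the medication may precede the diagnosis, must be decided explicitly in code. These are the kinds of details that a free-text description can leave ambiguous.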

To help overcome these challenges, the proposed project will design, build and promote an open-access community infrastructure for standards-based development and sharing of phenotyping algorithms, as well as provide tools and resources for investigators, researchers and their informatics support staff to implement and execute the algorithms on native EHR data.


Through participation in several DHHS/NIH-funded projects (eMERGE, SHARPn, PGRN, NCBO, i2b2, Beacon, caBIG), our multidisciplinary team has gained demonstrable experience in applying emerging informatics tools and techniques to clinical research, and is uniquely positioned to pursue the proposed research. In particular, we will accomplish the following Specific Aims in this proposal:

Phase 1

Aim 1

To create a standards-based information model for representing phenotyping algorithms.
We will investigate and adapt, where necessary, the Quality Data Model (QDM) from the National Quality Forum (NQF) for the modeling and representation of phenotyping algorithms. Expressed using HL7 Health Quality Measure Format (HQMF), the QDM provides the syntax, grammar and a set of basic logical and temporal operators to unambiguously articulate phenotype definition criteria. Consulting with a panel of clinical phenotyping experts, we will identify phenotypes of interest and implement them using QDM. We will propose extensions to QDM as necessary, relying upon existing standards whenever possible.

Aim 2

To create an open-access repository and infrastructure for authoring, sharing and accessing computable, standardized phenotyping algorithms.
We will leverage and extend the open-access, community-based PheKB (Phenotype Knowledgebase) collaborative platform, developed within the eMERGE consortium, to author, validate and share QDM-based phenotyping algorithms. This platform will be a national resource for the creation, demonstration, evaluation and evolution of phenotyping algorithms and associated tools for enabling clinical and translational research. For authoring algorithms, we will leverage NQF’s web-based Measure Authoring Tool (MAT), which will be extended for phenotyping algorithm development, providing a user-friendly way of generating documents in the QDM-based specification developed in Aim 1. In collaboration with eMERGE, SHARPn, PGRN and i2b2 investigators, we will invite and support other organizations that also wish to utilize and evaluate this tool.

Aim 3

To develop informatics methods and tools for translating phenotyping algorithmic criteria into EHR-based executable queries.
We will develop tools and resources for automatic translation of QDM-based phenotyping algorithms into executable code and scripts that can be run against existing EHR data. In particular, we will investigate an open-source data analytics and business logic platform, the JBoss® Drools business rules management system, for this task by mapping formal representations of algorithms to executable code. Our objective will be to develop and evaluate automated mappings from all algorithms defined in Aim 1 to heterogeneous EHR systems at Mayo Clinic, Northwestern, and Vanderbilt. We will also engage other academic medical centers and CTSA sites, and provide implementation support for those who wish to conduct evaluations of the phenotype algorithm authoring and execution platform.
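
The idea of mapping a formal representation to executable code can be sketched in a few lines of Python. This is a stand-in for illustration only: it uses neither Drools nor HQMF, and the dictionary-based criterion format, the operator names, and the codes `DX_A`, `LAB_A_HIGH` and `DX_B` are all assumptions made up for the example.

```python
from typing import Any, Callable, Dict, Set

# Assumption: a patient is modeled as the set of codes present in the record.
Patient = Set[str]
Predicate = Callable[[Patient], bool]


def compile_criterion(node: Dict[str, Any]) -> Predicate:
    """Recursively turn a declarative criterion tree into an executable predicate."""
    op = node["op"]
    if op == "has_code":
        code = node["code"]
        return lambda patient: code in patient
    if op == "not":
        inner = compile_criterion(node["arg"])
        return lambda patient: not inner(patient)
    if op in ("and", "or"):
        parts = [compile_criterion(child) for child in node["args"]]
        combine = all if op == "and" else any
        return lambda patient: combine(part(patient) for part in parts)
    raise ValueError(f"unsupported operator: {op!r}")


# Hypothetical definition: (DX_A OR LAB_A_HIGH) AND NOT DX_B
definition = {
    "op": "and",
    "args": [
        {"op": "or", "args": [
            {"op": "has_code", "code": "DX_A"},
            {"op": "has_code", "code": "LAB_A_HIGH"},
        ]},
        {"op": "not", "arg": {"op": "has_code", "code": "DX_B"}},
    ],
}

matches = compile_criterion(definition)
```

Because the definition is plain data, the same tree could in principle be compiled to other targets (rule-engine rules or database queries) by swapping out the compiler while leaving the shared definition unchanged.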

Phase 2

Aim 1

To harden the Phenotype Modeling and Execution Architecture (PhEMA) by supporting computable phenotype execution using OMOP CDM, PCORnet CDM and FHIR.

We will develop new methods for expanding PhEMA phenotype execution capabilities by supporting OMOP CDM, PCORnet CDM and FHIR. We will implement RESTful services that automatically transform QDM-based phenotype definitions into executable queries conformant with these data models. We will benchmark query performance and run-time execution for multiple eMERGE and PCORnet phenotypes.
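
As a rough sketch of the translation step only (not the RESTful service layer), the Python below turns a simple include/exclude phenotype into a parameterized SQL query against the OMOP CDM `condition_occurrence` table and runs it on an in-memory SQLite database. The function names, the demo data, and the concept IDs 1001, 2002 and 3003 are hypothetical placeholders, not real OMOP concept IDs, and the table is reduced to three of its columns.

```python
import sqlite3
from typing import List, Sequence, Tuple


def to_omop_sql(include: Sequence[int], exclude: Sequence[int]) -> Tuple[str, List[int]]:
    """Build a parameterized query: any included condition, none of the excluded ones."""
    if not include:
        raise ValueError("at least one inclusion concept is required")
    include_marks = ", ".join("?" for _ in include)
    sql = (
        "SELECT DISTINCT person_id FROM condition_occurrence "
        f"WHERE condition_concept_id IN ({include_marks})"
    )
    params = list(include)
    if exclude:
        exclude_marks = ", ".join("?" for _ in exclude)
        sql += (
            " AND person_id NOT IN ("
            "SELECT person_id FROM condition_occurrence "
            f"WHERE condition_concept_id IN ({exclude_marks}))"
        )
        params += list(exclude)
    return sql + " ORDER BY person_id", params


def run_demo() -> List[int]:
    """Execute the generated query on a tiny, made-up condition_occurrence table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE condition_occurrence ("
        "person_id INTEGER, condition_concept_id INTEGER, condition_start_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?)",
        [
            (1, 1001, "2020-01-01"),  # included condition only
            (2, 1001, "2020-03-01"),  # included condition ...
            (2, 2002, "2020-04-01"),  # ... plus an excluded one
            (3, 3003, "2020-05-01"),  # unrelated condition
        ],
    )
    sql, params = to_omop_sql(include=[1001], exclude=[2002])
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

A PCORnet CDM or FHIR target would need its own generator with the same inputs, which is why keeping the phenotype definition separate from any one query dialect matters.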

Aim 2

To extend PhEMA by creating a standards-based information model for representing deep phenotypes.

We will implement extensions to QDM for modeling natural language processing (NLP) logic for extracting high-resolution phenotypes from clinical narratives. This will include, but not be limited to, note section identification, concept extraction, conceptual modifier identification, and relation extraction. We will convene a panel of phenotyping experts to arrive at a consensus set of extensions, use them to define and identify deep phenotypes of interest across multiple EHR systems, and validate our results.
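
Three of the NLP steps named above (section identification, concept extraction, and modifier identification) can be illustrated with a deliberately crude Python sketch. It is a toy under strong assumptions: section headers are lines ending in a colon, concepts come from a two-entry hypothetical lexicon, and negation is detected by a simple cue-word check in the spirit of NegEx-style rules rather than by a real clinical NLP pipeline. Relation extraction is not covered.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical lexicon mapping surface terms to placeholder concept labels.
LEXICON = {"chest pain": "CONCEPT_CHEST_PAIN", "diabetes": "CONCEPT_DIABETES"}
# Simplified negation cues; real systems use much richer rule sets.
NEGATION_CUE = re.compile(r"\b(no|denies|without)\b")


def split_sections(note: str) -> Dict[str, str]:
    """Section identification: a line such as 'History:' starts a new section."""
    sections: Dict[str, List[str]] = {}
    current = "UNLABELED"
    for line in note.splitlines():
        stripped = line.strip()
        if stripped.endswith(":") and stripped[:-1].replace(" ", "").isalpha():
            current = stripped[:-1].upper()
            sections.setdefault(current, [])
        elif stripped:
            sections.setdefault(current, []).append(stripped)
    return {name: " ".join(lines) for name, lines in sections.items()}


def extract_concepts(text: str) -> List[Tuple[str, bool]]:
    """Concept extraction with a negation modifier: returns (concept, is_negated) pairs."""
    found: List[Tuple[str, bool]] = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        for term, concept in LEXICON.items():
            position = lowered.find(term)
            if position == -1:
                continue
            # A cue word before the term in the same sentence marks it as negated.
            negated = NEGATION_CUE.search(lowered[:position]) is not None
            found.append((concept, negated))
    return found
```

Running `split_sections` on a note and then `extract_concepts` on each section yields section-scoped, negation-aware concept mentions, which is the kind of intermediate result an extended information model would need to be able to reference.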

Aim 3

To scale PhEMA by implementing deep learning methods for computational phenotyping.

We will develop methods for unbiased, data-driven phenotype discovery by investigating advanced deep learning techniques. We will compare the performance of such computational approaches with gold-standard, expert-designated, rule-based phenotyping algorithms developed across eMERGE and PCORnet, and incorporate deep learning capabilities for phenotype algorithm authoring and execution in PhEMA.