Difference between revisions of "Data Normalization"

From SHARP Project Wiki

Revision as of 18:40, 29 October 2012

“The complexity of modern medicine exceeds the inherent limitations of the unaided human mind” – David M. Eddy, MD, PhD

{{#ev:youtube| 1OSHKdNYYR8 }}

Dr. Huff on Data Normalization: Stanley M. Huff, M.D.; SHARPn Co-Principal Investigator; Professor (Clinical) - Biomedical Informatics at University of Utah - College of Medicine; and Chief Medical Informatics Officer, Intermountain Healthcare. Dr. Huff discusses why providing patient care at the lowest cost with advanced decision support requires structured and coded data.


Releases - Download, install, configure, and use the software produced.

Clinical Element Models - CEMs are at the core of data normalization.

Presentations - Presentations made or found during the course of this grant that are relevant to this project.

Documents - Documents created by or used by this project.

References - Additional resources relevant to this project.

Introduction to Data Normalization

Clinical Data Normalization

Clinical data comes in many different forms, even for the same piece of information. For example, age could be reported as 40 years for an adult, 18 months for a toddler, or 3 days for an infant. Normalizing clinical data fields fosters a design that stores data efficiently, avoids duplication or repetition, and makes querying easier. Without normalization, data can’t be used as a single dataset.

Un-normalized    Normalized (days)    Normalized (months)
40 years         14,610               480
18 months        548                  18
3 days           3                    0.1

(Approximate values, assuming 365.25 days per year and 365.25/12 days per month.)
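
The unit conversion illustrated in the table above can be sketched in a few lines of Python. This is only a minimal illustration, not the project's pipeline: the function names, the accepted unit spellings, and the conversion factors (365.25 days per year, 365.25/12 days per month) are assumptions made for this example.

```python
# Assumed conventions (not from the source): a year of 365.25 days and an
# average month of 365.25 / 12 days.
DAYS_PER_YEAR = 365.25
DAYS_PER_MONTH = DAYS_PER_YEAR / 12

# Hypothetical lookup of accepted unit spellings to their length in days.
DAYS_PER_UNIT = {
    "day": 1.0, "days": 1.0,
    "month": DAYS_PER_MONTH, "months": DAYS_PER_MONTH,
    "year": DAYS_PER_YEAR, "years": DAYS_PER_YEAR,
}


def age_to_days(text):
    """Parse an age such as '18 months' and return it in days."""
    value, unit = text.strip().lower().split()
    if unit not in DAYS_PER_UNIT:
        raise ValueError(f"unsupported age unit: {unit!r}")
    return float(value) * DAYS_PER_UNIT[unit]


def age_to_months(text):
    """Return the same age expressed in average months."""
    return age_to_days(text) / DAYS_PER_MONTH


for raw in ["40 years", "18 months", "3 days"]:
    print(f"{raw:>10} -> {age_to_days(raw):.0f} days, "
          f"{age_to_months(raw):.1f} months")
```

Once every age is expressed in one unit, values from different sources can be sorted, compared, and aggregated directly.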

The Need for Clinical Models

Detailed clinical models are the basis for retaining computable meaning when data is exchanged between heterogeneous computer systems. Detailed clinical models are also the basis for shared computable meaning when clinical data is referenced in decision support logic.

  • The need for the clinical models is dictated by what we want to accomplish as providers of health care
  • The best clinical care requires the use of computerized clinical decision support and automated data analysis
  • Clinical decision support and automated data analysis can only function against standard structured coded data
  • The detailed clinical models provide the standard structure and terminology needed for clinical decision support and automated data analysis

Data normalization and clinical models are at the heart of secondary use of clinical data. If the data is not comparable between sources, it can’t be aggregated into large datasets and used, for example, to reliably answer research questions or survey populations from multiple health organizations. Without models, there are too many ways to say the same thing.
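
As a rough illustration of "too many ways to say the same thing", the sketch below maps source-specific labels for one lab test onto a single shared concept so that records from different organizations can be pooled. The site names, local labels, and the concept key `serum_potassium` are hypothetical placeholders, not real terminology codes or SHARPn artifacts.

```python
# Hypothetical source-specific labels for one lab test. The canonical key
# "serum_potassium" is an illustrative placeholder, not a real code.
LOCAL_TO_CANONICAL = {
    ("site_a", "K"): "serum_potassium",
    ("site_a", "POTASSIUM, SERUM"): "serum_potassium",
    ("site_b", "Potassium Lvl"): "serum_potassium",
}


def normalize(records):
    """Split records into (normalized, unmapped) lists."""
    normalized, unmapped = [], []
    for rec in records:
        canonical = LOCAL_TO_CANONICAL.get((rec["source"], rec["label"]))
        if canonical is None:
            # No mapping exists yet; keep the record aside for review.
            unmapped.append(rec)
        else:
            normalized.append({**rec, "concept": canonical})
    return normalized, unmapped


records = [
    {"source": "site_a", "label": "K", "value": 4.1},
    {"source": "site_b", "label": "Potassium Lvl", "value": 3.2},
    {"source": "site_b", "label": "Sodium", "value": 140},
]
normalized, unmapped = normalize(records)
print(len(normalized), "records share one concept;", len(unmapped), "unmapped")
```

Keeping unmapped records separate, rather than dropping them, makes gaps in the mapping visible.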

For more details, see our information on Clinical Element Models (CEMs).

Practical modeling issues: Representing coded and structured patient data in EHR systems. Stanley Huff, AMIA Annual Symposium, October 22, 2011.

Clinical Use Cases

In all of these situations, the goal is not just to have the data available for humans to read and understand, but to have the data structured and coded in a way that will allow computers to understand and use the information.

  • Data sharing
  • Real-time decision support
  • Sharing of decision logic
  • Direct assignment of billing codes
  • Bio-surveillance
  • Data analysis and reporting
    • Reportable diseases
    • HEDIS measurements
    • Quality improvements
    • Adverse drug events
  • Clinical research
    • Clinical trials
    • Continuous quality improvement

Real-time, patient-specific decision support

  • Alerts
    • Potassium and digoxin
    • Coagulation clinic
  • Reminders
    • Mammography
    • Immunizations
  • Protocols
    • Ventilator weaning
    • ARDS protocol
    • Prophylactic use of antibiotics in surgery
  • Advising
    • Antibiotic assistant
  • Critiquing
    • Blood ordering
  • Interpretation
    • Blood gas interpretation
  • Management – purpose-specific aggregation and presentation of data
    • DVT management
    • Diabetic report
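
To show why alerts such as the potassium and digoxin example depend on structured, coded data, here is a minimal sketch of such a rule. Everything in it is an assumption made for illustration: the `LabResult` type, the concept name `serum_potassium`, the medication string, and especially the threshold, which is a placeholder and not clinical guidance.

```python
from dataclasses import dataclass


@dataclass
class LabResult:
    concept: str   # normalized concept name (hypothetical)
    value: float
    unit: str


# Illustrative threshold only; not clinical guidance.
LOW_POTASSIUM_MMOL_L = 3.5


def digoxin_potassium_alert(active_medications, labs,
                            threshold=LOW_POTASSIUM_MMOL_L):
    """Return an alert message, or None when the rule does not fire."""
    if "digoxin" not in active_medications:
        return None
    for lab in labs:
        # The rule can only match data that is already coded and in the
        # expected unit; free-text results would be invisible to it.
        if (lab.concept == "serum_potassium"
                and lab.unit == "mmol/L"
                and lab.value < threshold):
            return f"Low potassium ({lab.value} mmol/L) in a patient on digoxin"
    return None


labs = [LabResult("serum_potassium", 3.1, "mmol/L")]
print(digoxin_potassium_alert({"digoxin"}, labs))
```

The rule is a few lines long only because the medication list and lab result arrive in a known structure with known codes and units.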

What Needs to be Modeled?

All data in the patient’s EMR, including:

  • Allergies
  • Problem lists
  • Laboratory results
  • Medication and diagnostic orders
  • Medication administration
  • Physical exam and clinical measurements
  • Signs, symptoms, diagnoses
  • Clinical documents
  • Procedures
  • Family history, medical history and review of symptoms

How are Clinical Models used?

  • Data entry screens, flow sheets, reports, ad hoc queries
    • Basis for application access to clinical data
  • Computer-to-Computer Interfaces
    • Creation of maps from departmental/foreign system models to the standard database model
  • Core data storage services
    • Validation of data as it is stored in the database
  • Decision logic
    • Basis for referencing data in decision support logic
  • Does NOT dictate physical storage strategy
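
One of the uses listed above, validating data as it is stored, can be sketched as follows. The model is a hypothetical, greatly simplified stand-in written as a Python dictionary; it is not a real CEM, and the field names and value sets are invented for this example.

```python
# Hypothetical, greatly simplified model definition (not a real CEM).
HEART_RATE_MODEL = {
    "name": "HeartRateMeasurement",
    "fields": {
        "value": {"type": (int, float), "required": True},
        "unit": {"type": str, "required": True, "allowed": {"beats/min"}},
        "body_position": {"type": str, "required": False,
                          "allowed": {"sitting", "standing", "lying"}},
    },
}


def validate(instance, model):
    """Return a list of problems; an empty list means the instance is valid."""
    problems = []
    fields = model["fields"]
    for name, spec in fields.items():
        if name not in instance:
            if spec["required"]:
                problems.append(f"missing required field: {name}")
            continue
        value = instance[name]
        if not isinstance(value, spec["type"]):
            problems.append(f"wrong type for {name}: {type(value).__name__}")
        elif "allowed" in spec and value not in spec["allowed"]:
            problems.append(f"value not in value set for {name}: {value!r}")
    for name in instance:
        if name not in fields:
            problems.append(f"unknown field: {name}")
    return problems


print(validate({"value": 72, "unit": "beats/min"}, HEART_RATE_MODEL))
print(validate({"value": "72", "unit": "bpm"}, HEART_RATE_MODEL))
```

The model describes what a valid instance looks like and says nothing about tables or files, which is consistent with the point that models do not dictate physical storage strategy.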

Project Team

Thanks go to the Data Normalization team.


  • Build generalizable data normalization pipeline
  • Semantic normalization annotators involving LexEVS
  • Establish a globally available resource for health terminologies and value sets
  • Establish and expand modular library of normalization algorithms
  • Consistent and standardized common model to support large-scale vocabulary use and adoption
  • Support mapping into canonical value sets
  • Normalize the data against CEMs
  • Normalize retrospective data from the EMRs and compare it to normalized data that already exists in our data warehouses (Mayo Enterprise Data Trust, Intermountain).
  • Iteratively test normalization pipelines, including NLP where appropriate, against normalized forms, and tabulate discordance.
  • Use cohort identification algorithms in both EMR data and EDW data.
  • Sharing data through NHIN Connect and/or NHIN Direct
  • Comparison of data processed through SHARP to data in existing Mayo and Intermountain data trust, EDW, AHR
  • Evaluation of NLP outputs and value? Focus on a specific domain: X-rays, operative notes, progress notes, sleep studies?
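
The comparison step in the list above (testing pipeline output against already-normalized forms and tabulating discordance) could be sketched as below. The function name, the record identifiers, and the `{record_id: normalized_value}` input shape are assumptions for illustration, not the project's actual evaluation code.

```python
from collections import Counter


def tabulate_discordance(pipeline_output, reference):
    """Compare two {record_id: normalized_value} mappings.

    Returns a Counter of outcome categories and the list of record ids
    whose values disagree.
    """
    counts = Counter()
    discordant = []
    for record_id in sorted(set(pipeline_output) | set(reference)):
        if record_id not in reference:
            counts["missing_in_reference"] += 1
        elif record_id not in pipeline_output:
            counts["missing_in_pipeline"] += 1
        elif pipeline_output[record_id] == reference[record_id]:
            counts["concordant"] += 1
        else:
            counts["discordant"] += 1
            discordant.append(record_id)
    return counts, discordant


# Hypothetical example data: ages already normalized to days.
pipeline = {"r1": 14610, "r2": 548, "r3": 3}
warehouse = {"r1": 14610, "r2": 543, "r4": 30}
counts, discordant = tabulate_discordance(pipeline, warehouse)
print(dict(counts), discordant)
```

Listing the discordant record ids alongside the counts makes it possible to inspect individual disagreements and decide whether the pipeline or the reference data is at fault.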