Data Normalization Pipeline 1.0 Configuration Files

From SHARP Project Wiki

Overview

The Data Normalization Pipeline configuration files fall into two groups:

  1. Mapping between the models of the incoming data and the CEMs (normalizing the structure)
    • Configuration files are found in resources/config of the pipeline installation
    • Example configuration files are named "Institution + type of object being transformed (Meds, Labs, Admin) + message type (HL7, CDA, CCD) + _ModelMapping.txt", for example IHCMedsHL7_ModelMapping.txt
  2. Mapping between the terminologies used in incoming data and the value sets adopted by SHARPn (normalizing the semantics)
    • Configuration files are found in resources/semanticMapping of the pipeline installation
    • Example configuration files are named "Institution + type of object being transformed (Meds, Labs, Admin) + message type (HL7, CDA, CCD) + _SemanticMapping.txt", for example MayoAdmin_SemanticMapping.txt
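
The naming convention above can be sketched in Python. The helper config_file_name is hypothetical and not part of the pipeline. Treating the message type as optional is an assumption, made only because the MayoAdmin_SemanticMapping.txt example carries none.

```python
# Hypothetical helper for illustration; the pipeline does not ship this function.
def config_file_name(institution, object_type, message_type="", kind="Model"):
    """Build a file name such as IHCMedsHL7_ModelMapping.txt from its parts."""
    if kind not in ("Model", "Semantic"):
        raise ValueError("kind must be 'Model' or 'Semantic'")
    # Assumption: message_type may be left empty, as in MayoAdmin_SemanticMapping.txt.
    return f"{institution}{object_type}{message_type}_{kind}Mapping.txt"


print(config_file_name("IHC", "Meds", "HL7"))              # IHCMedsHL7_ModelMapping.txt
print(config_file_name("Mayo", "Admin", kind="Semantic"))  # MayoAdmin_SemanticMapping.txt
```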

These configuration files are critical components of the Data Normalization Pipeline, and their content depends on the structure and semantics of the incoming data.

Prerequisites

  1. Editing the configuration files requires an existing pipeline (steps 2 to 4 in the overview).
  2. Running the pipeline on real data requires configuring it first. To configure the pipeline you will need a developer who knows:
    • The fields (or schema) of the incoming data AND the schema of secondary use CEM fields in order to establish mappings to secondary use CEM fields
    • The meaning of the values within the fields of the incoming data in order to establish mappings to industry standards

Configure Model Mapping

Model mapping configuration files:

  • Are plain text files that describe how original model elements map to CEM model elements
  • Use # as the first character on a row for comments
  • Divide the parts of each definition with double bars (||)
  • Define one instantiation of a CEM field per row, in the format:
    • Source Root Element||Mapping Type||Source xPath||Target Namespace||Target CEM Root Element||Target xPath
    • Source Root Element is the equivalent SHARPn UIMA type (found in CCDUIMA.xml) that is used to facilitate the mapping.
    • There are five mapping types:
      • C = Constant: the target CEM field is a constant (the Source Root Element and Source xPath are ignored for constants)
      • S = One-to-one mapping: one value from the source instantiates a value in only one CEM instance
      • M = One-to-many mapping: one value from the source instantiates a value in multiple CEM instances
      • I = Inference: the value of a field is inferred from another field (the Source Root Element and Source xPath are taken from the target CEMs rather than from the incoming source data)
      • X = Conditional Inference: an inference that applies only under a condition; currently used for Labs only (the Source Root Element and Source xPath are taken from the target CEMs rather than from the incoming source data)
    • A Source xPath describes a location as a sequence of "parent:child" steps. Starting at the top level of the source object (just below root), write each parent:child pair with a colon between the two names, and separate successive pairs with a pipe (|) until you reach a leaf node. The path tells the pipeline where to read the value.
    • Target Namespace currently always has the same value (org.sharpn.type.cem in the example below).
    • Target CEM Root Element is the CEM type, found by searching the CEM Browser for "Secondary".
    • A Target xPath describes a location as a sequence of "element name:type of that element" steps. Starting at the top of the CEM (just below root), write each element name and its type with a colon between them, and separate successive pairs with a pipe (|) until you reach the leaf node where the value is placed.

For example, the following values would be combined in one row, divided by double bars (||):

  • Source Root Element: org.sharpn.type.ccd.PatientRole
  • Mapping Type: S
  • Source xPath: PatientRole:Patient|Patient:AdministrativeGenderCode|AdministrativeGenderCode:Code
  • Target Namespace: org.sharpn.type.cem
  • Target CEM Root Element: SecondaryUsePatient
  • Target xPath: AdministrativeGender:AdministrativeGender|CD:CD|Translation:CDT|Code:CS|Value:java.lang.String
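
Joined with double bars, the six values above form a single row. The Python sketch below shows one way such a row could be read back into its parts. It is an illustration only: ModelMapping, parse_model_row, and split_path are hypothetical names, not part of the pipeline, and the sketch assumes that no field contains a double bar of its own.

```python
from typing import NamedTuple, Optional

MAPPING_TYPES = {"C", "S", "M", "I", "X"}


class ModelMapping(NamedTuple):
    source_root: str
    mapping_type: str
    source_xpath: str
    target_namespace: str
    target_root: str
    target_xpath: str


def parse_model_row(line: str) -> Optional[ModelMapping]:
    """Parse one row; return None for blank lines and # comment rows."""
    if line.startswith("#") or not line.strip():
        return None
    parts = [p.strip() for p in line.split("||")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    row = ModelMapping(*parts)
    if row.mapping_type not in MAPPING_TYPES:
        raise ValueError(f"unknown mapping type: {row.mapping_type!r}")
    return row


def split_path(xpath: str):
    """Split 'a:b|c:d' into [('a', 'b'), ('c', 'd')]; an empty path gives []."""
    if not xpath:
        return []
    return [tuple(step.split(":", 1)) for step in xpath.split("|")]


# The example values from the list above, joined into one row.
EXAMPLE_ROW = "||".join([
    "org.sharpn.type.ccd.PatientRole",
    "S",
    "PatientRole:Patient|Patient:AdministrativeGenderCode|AdministrativeGenderCode:Code",
    "org.sharpn.type.cem",
    "SecondaryUsePatient",
    "AdministrativeGender:AdministrativeGender|CD:CD|Translation:CDT|Code:CS|Value:java.lang.String",
])

row = parse_model_row(EXAMPLE_ROW)
print(row.mapping_type)              # S
print(split_path(row.source_xpath))
# [('PatientRole', 'Patient'), ('Patient', 'AdministrativeGenderCode'), ('AdministrativeGenderCode', 'Code')]
```

Splitting on the double bar first and on the single pipe second matters here, because the single pipe also appears inside both xPath fields.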

Configure Semantic Mapping

Semantic mapping configuration files:

  • Are plain text files that describe mappings between the source and target concept codes
  • Use # as the first character on a row for comments
  • Define, in each row, the mapping between a source concept code found in the incoming data and its equivalent target concept code in the value sets
  • Divide the source and target with double bars (||), as in Source||Target, and further subdivide each side with ~|~ as follows:
    • Source: Object Hierarchy~|~Local Code
    • Target: Target Code~|~Terminology OID~|~Designated Name~|~Version

For example, the following values would be combined in one row, divided by double bars (||):

  • Source: SecondaryUseNotedDrug ClinicalDrug CD CDT~|~000001
  • Target: 410909~|~2.16.840.1.113883.6.88~|~DIGITALIS LEAF 100 MG Oral~|~SHARPn05222012
    • Here the source clinical drug code 000001 is mapped to the RxNorm code 410909.
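
Joined with double bars, the source and target above form one row. As a minimal sketch, the Python below parses such rows and builds a lookup from (object hierarchy, local code) to the target entry. SemanticMapping, parse_semantic_row, and build_lookup are hypothetical names, not part of the pipeline, and the object hierarchy is kept as an opaque string because its internal separator is not specified here.

```python
from typing import NamedTuple, Optional


class SemanticMapping(NamedTuple):
    object_hierarchy: str
    local_code: str
    target_code: str
    terminology_oid: str
    designated_name: str
    version: str


def parse_semantic_row(line: str) -> Optional[SemanticMapping]:
    """Parse one Source||Target row; return None for blank and # comment rows."""
    if line.startswith("#") or not line.strip():
        return None
    sides = line.strip().split("||")
    if len(sides) != 2:
        raise ValueError(f"expected Source||Target, got {len(sides)} parts")
    source = sides[0].split("~|~")
    target = sides[1].split("~|~")
    if len(source) != 2 or len(target) != 4:
        raise ValueError("expected 2 source fields and 4 target fields")
    return SemanticMapping(*source, *target)


def build_lookup(lines):
    """Map (object hierarchy, local code) to the parsed row."""
    lookup = {}
    for line in lines:
        row = parse_semantic_row(line)
        if row is not None:
            lookup[(row.object_hierarchy, row.local_code)] = row
    return lookup


# The example source and target from the list above, joined into one row.
EXAMPLE_ROWS = [
    "# local drug codes mapped to RxNorm",
    "SecondaryUseNotedDrug ClinicalDrug CD CDT~|~000001"
    "||410909~|~2.16.840.1.113883.6.88~|~DIGITALIS LEAF 100 MG Oral~|~SHARPn05222012",
]

lookup = build_lookup(EXAMPLE_ROWS)
hit = lookup[("SecondaryUseNotedDrug ClinicalDrug CD CDT", "000001")]
print(hit.target_code, hit.terminology_oid)  # 410909 2.16.840.1.113883.6.88
```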

Terminologies

The semantic mapping utilizes a set of terminologies to transform original data semantically. Changes to the semantic mapping configuration file may necessitate changes to the default terminologies that have been provided. For example, you may have concept codes that do not exist in the default value sets.

Use the semantic mapping resources configuration to change the underlying terminologies being used.