Data Normalization Pipeline 1.0 Pre-built

From SHARP Project Wiki

  1. A cloud account for starting instances, which in turn requires a Mayo Clinic VPN account. Only collaborators can use this path at this point.
  2. Industry-standard Linux tools for accessing the system. See the cloud documentation to install these.
  3. Ability to navigate through Linux directories and files.
  4. Running this pipeline on real data requires configuring it. To do so, you will need a developer who knows:
    • The fields (or schema) of the incoming data and the schema of the secondary use CEM fields, in order to establish mappings between them
    • The meaning of the values within the fields of the incoming data in order to establish mappings to industry standards

Instantiate Data Normalization pipeline

The Data Normalization Pipeline can be invoked on its own, that is, without Mirth and NwHIN (the connectivity software), given appropriate input data. This page covers only that case: steps 2, 3, and 4 as seen in the overview. The connectivity software is already installed in the pre-built image, so you do not need to install it. Steps 1 and 5 describe how the connectivity software can be used to feed the pipeline and store the results in something other than MySQL.

  • Use the Instance Management documentation as your guide to launch the image pre-built with the Data Normalization pipeline software.
  • Image ID: emi-11B51218 (your interface may show ami rather than emi)
  • Instance type: m1.small is the smallest instance type. Start with the smallest instance type until you know a larger server is required.
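
As a rough illustration of the launch settings above, the following sketch only prints a launch command; it does not start anything. It assumes a Eucalyptus-style cloud with the euca2ools client installed, which this page does not confirm, and the helper name launch_command and the key pair name are hypothetical. Follow the Instance Management documentation for the authoritative procedure.

```shell
#!/bin/sh
# Values taken from this page.
IMAGE_ID="emi-11B51218"    # may appear as ami-... in your interface
INSTANCE_TYPE="m1.small"   # smallest type; scale up only when required

# Hypothetical helper: print (do not run) a euca2ools launch command.
# $1 is the name of your own key pair (an assumption, not from this page).
launch_command() {
    key_name="$1"
    printf 'euca-run-instances %s -t %s -k %s\n' \
        "$IMAGE_ID" "$INSTANCE_TYPE" "$key_name"
}
```

Calling launch_command mykey prints the command so it can be reviewed before being run by hand.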

The remainder of these instructions may require the following information about the image.

  • <DATA_NORMALIZATION_HOME> is /usr/local/sharpn/SharpnNormalization_XMLTree
  • <MYSQL_HOME> is /usr/local/mysql Admin ID: ??? Password: ???
  • <NWHIN_HOME> is /nhin Admin ID: ??? Password: ???
  • <MIRTH_HOME> is /nhin/mirthconnect Admin ID: ??? Password: ???

NOTE: This image is from the XMLTree based implementation.
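
The paths listed above can be captured as shell variables so that later commands do not need to repeat them. This is a minimal sketch: the variable values come from this page, while the helper check_homes is a hypothetical convenience for confirming that each directory exists on your instance.

```shell
#!/bin/sh
# Install locations as listed on this page for the pre-built image.
DATA_NORMALIZATION_HOME="/usr/local/sharpn/SharpnNormalization_XMLTree"
MYSQL_HOME="/usr/local/mysql"
NWHIN_HOME="/nhin"
MIRTH_HOME="/nhin/mirthconnect"

# Hypothetical helper: report whether each install directory is present.
check_homes() {
    for dir in "$DATA_NORMALIZATION_HOME" "$MYSQL_HOME" "$NWHIN_HOME" "$MIRTH_HOME"; do
        if [ -d "$dir" ]; then
            echo "ok      $dir"
        else
            echo "missing $dir"
        fi
    done
}
```

Running check_homes on the instance prints one line per directory.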

Connect to the cloud instance

Workloads can be run from the command line alone. However, to get started and to understand what is happening, you must connect to your new instance using a mechanism that allows a GUI to be launched from the remote host. The remaining instructions require this.
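
One common mechanism is SSH with X11 forwarding, although this page does not prescribe a specific one. The sketch below assumes that approach; the function name display_status is hypothetical. It reports whether the current session has a display that a GUI can use.

```shell
#!/bin/sh
# Connect from your workstation with X11 forwarding, for example:
#   ssh -X <user>@<instance-address>
# (user and address are placeholders for your own values)

# Hypothetical helper: run on the instance to see whether a GUI can open.
display_status() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "display available: $DISPLAY"
    else
        echo "no display: reconnect with X11 forwarding or use a remote desktop"
    fi
}
```

If display_status reports no display, GUI tools launched in that session will not be able to open a window.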

Test a sample with default configuration

The instance includes samples to help you understand the pipeline. The Collection Processing Engine (CPE) is used to run the pipeline. The CPE is configured by a file called a descriptor. This descriptor defines things like the location of input and output, the mechanism by which to process the collection of input, and what documents are being passed in and processed. The descriptor contains more settings than these, and it is left to the reader to study it in the GUI or in its XML format.

  • ./
  • File -> Open CPE Descriptor
  • SharpNormalization_XMLTree -> desc -> collectionprocessingengine -> MayoLabsHL7CPE.xml
  • Notice the incoming data location (Input Directory) and the location of the output (Output Directory)
  • Use any editor to view the files in the input directory. The descriptor file names and directory names contain a designation of what kind of data is coming in. This is a good practice to use as it may not be easy to discern just by looking at the input files. These designations match the input types you read about in the Data_Normalization_Pipeline_1.0 overview.
  • HL7
  • CCD
  • CDA
  • Press the Play button (looks like a green arrow near the bottom of the interface)
  • Check the data that is now in the output directory; each file is a CEM in XML format. The results are not displayed in the GUI, which is simply a means to run the pipeline. Use any editor to view the files by navigating through the system directories.
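
After a run, a quick check from the command line is to count the XML files in the output directory. This is a minimal sketch: the function name count_cem_files is hypothetical, the directory is whatever the descriptor shows as its Output Directory, and the assumption that output files end in .xml is inferred from the description above rather than confirmed.

```shell
#!/bin/sh
# Hypothetical helper: count the .xml files directly inside a directory.
# $1 is the Output Directory shown in the CPE descriptor.
count_cem_files() {
    dir="$1"
    find "$dir" -maxdepth 1 -type f -name '*.xml' | wc -l | tr -d ' '
}
```

Comparing the count before and after pressing Play shows whether the run produced new CEM files.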

The output results for labs have a special naming convention. You will notice that lab results output file names do not have the HL7 lab categories: coded, narrative, ordinal, quantitative, quantitativeInterval

Administering the software

Starting services
  • Mirth:
    ►cd /nhin/mirthconnect/Mirth\ Connect/
    ►sudo nohup sh mcserver &
  • NwHIN:
    ►sudo /etc/profile.d/nhin-profile.sh
    ►sudo asadmin start-domain domain1
  • MySQL: ►sudo service mysql start
Stopping services
  • Mirth:
    ►ps -aux|grep mirth
    ►sudo kill -9 <number from list>
  • NwHIN: ►sudo asadmin stop-domain domain1
  • MySQL: ►sudo service mysql stop
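
When stopping Mirth, the process listing also shows the grep command itself, which is easy to mistake for the server. The sketch below is one way to extract only the relevant process IDs; the function name mirth_pids is hypothetical, and it assumes the server's command line contains the word mirth, as the grep above implies.

```shell
#!/bin/sh
# Hypothetical helper: read `ps aux`-style output on stdin and print the
# PID column (field 2) of lines that mention mirth.
# The pattern [m]irth matches the text "mirth" but not this awk command's
# own command line, so the filter does not list itself.
mirth_pids() {
    awk '/[m]irth/ { print $2 }'
}

# Usage on the instance:
#   ps aux | mirth_pids
```

Each printed number can then be passed to sudo kill as in the step above.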

Next Steps

The pipeline has an Analysis Engine defined with a set of special parameters. You can see these parameters in the middle of the Collection Processing Engine Configurator GUI.

  • MODEL_MAPPING - Configuration file mapping out how to transform from one model to another
  • SEMANTIC_MAPPING - Configuration file mapping out how to transform terms used in the data
  • Element property - A static property file. Use unchanged. A mechanism to move attributes from the source to the target.
  • LOINC2CEM MAPPING - A static property file. Use unchanged for lab processing. This file is used to tell the pipeline which of the six CEM lab types to use based on the LOINC codes that end up being involved.
  • DOCTYPE - The type of the document or data being sent into the pipeline. The types are HL7, CCD, and CDA.
  • DEID - Reserved for future use.
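
To see which of these parameters a given CPE descriptor mentions, the XML file can be searched from the command line. This is a rough sketch: the function name report_parameters is hypothetical, and it assumes the parameter names appear in the descriptor spelled exactly as listed above, which may not hold for every entry. Only the four names written in capitals without spaces are checked.

```shell
#!/bin/sh
# Hypothetical helper: report which parameter names occur in a descriptor.
# $1 is the path to a CPE descriptor XML file.
report_parameters() {
    descriptor="$1"
    for name in MODEL_MAPPING SEMANTIC_MAPPING DOCTYPE DEID; do
        if grep -q "$name" "$descriptor"; then
            echo "$name: found"
        else
            echo "$name: not found"
        fi
    done
}
```

A name reported as not found may simply be spelled differently in the file, so confirm against the GUI before drawing conclusions.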

When you create your own pipeline, use the sample configuration files only as a starting point. Once you have created your own configuration files, point these parameters to them and save the result to your own CPE descriptor. See the pipeline customizations for details.