Minutes from the Meeting


Area 4 Brief Projects Recap

Chris Chute: Apply tools to data to make it usable. To draw conclusions from data, it must be comparable and consistent. First, heterogeneous / inconsistent data must be normalized; there are two modalities for a generalizable data normalization pipeline: syntactic and semantic. Mapping tables between vocabularies are needed, and LexGrid is a tool to manage this mapping. Second, as we build the tools, can we put them on a common framework? A library of tools and resources that can be used broadly and incrementally sequenced.

Guergana Savova:

Ross Martin: What are our opportunities to synergize the projects and make them greater than the sum of their parts? How do we identify connections between different work streams?

Marshall Shore: UIMA Exploitation. An integrated framework facilitates moving algorithms into IBM products. In 2006, IBM moved the project to Apache to make it open source. Please participate because it is a pure open-source project. How can we make UIMA work better? UIMA-AS (asynchronous scaleout) scales on multi-core architectures, compressing analytic time by scaling out to a farm with multiple stacks of blades. Representing the infrastructure, IBM is participating to ensure that they can facilitate your use of UIMA, and to extend UIMA to deploy the high-performance analytics that will support the uses of the SHARP project.

Chris Chute (for Kent Bailey): Data Quality (Project 5). Acknowledge that the data flow is great, but how do we know that the data is not harmed or distorted through secondary use or through false assumptions about the quality of the data? The biostatistics community has been working on this problem for decades. Can we identify consistent and comparable metrics and processes that can represent data quality? How do you reconcile conflicting data in a clinical record? Is the data moving consistently through the pipeline, or is the pipeline creating inconsistencies in the data stream? Should we impute data, and when is this appropriate? Some algorithms cannot operate with missing data; these require data imputation, but it should be done in an informed way. Given this vision of a pipeline, can we integrate imputation and data quality into the process? In NLP, UIMA has been successful for this. Can we add more functionality into the UIMA data flow?
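
As a purely illustrative sketch of what an "informed" imputation step inside such a pipeline could look like, the hypothetical helper below fills missing values with a cohort median and flags every imputed field, so downstream algorithms that cannot tolerate missing data still know which values were observed and which were filled in:

```python
# Illustrative only: a hypothetical data-quality step that imputes missing
# numeric fields and flags them, so downstream consumers can distinguish
# observed values from imputed ones.
from statistics import median

def impute_with_flags(records, field):
    """Impute missing values of `field` with the cohort median and mark them."""
    observed = [r[field] for r in records if r.get(field) is not None]
    fill = median(observed) if observed else None
    for r in records:
        if r.get(field) is None:
            r[field] = fill
            r.setdefault("imputed_fields", []).append(field)
    return records

patients = [{"id": 1, "sbp": 120}, {"id": 2, "sbp": None}, {"id": 3, "sbp": 140}]
print(impute_with_flags(patients, "sbp"))
```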

Stan Huff: Real-world evaluation framework (Project 6). Evaluate the other five SHARP projects by putting their outputs to practical use. Compare normal retrospective data to the normalized data that exists in the data warehouse. Cohort identification from EMR and EDW data. Detailed clinical models express the logical model of how data is represented (e.g., a blood pressure reading together with patient position and the device used to capture it). The models become a target for the NLP output, defining how the data is represented. Sharing data between institutions through NHIN Connect. What should the structure of a payload look like to be able to share it through NHIN Connect for patient care?
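
To make the idea of a detailed clinical model concrete, here is a toy sketch (not the actual Intermountain Clinical Element Model schema; the class names and codes are placeholders) of how a blood pressure result might be bundled with the qualifiers mentioned above, patient position and capture device:

```python
# Toy sketch of a "detailed clinical model" instance: the measurement plus the
# qualifiers needed to interpret it. Codes below are placeholders, not real
# terminology bindings.
from dataclasses import dataclass

@dataclass
class CodedValue:
    code: str          # placeholder terminology code
    display: str

@dataclass
class BloodPressureModel:
    systolic_mmHg: int
    diastolic_mmHg: int
    patient_position: CodedValue
    device: CodedValue

bp = BloodPressureModel(
    systolic_mmHg=128,
    diastolic_mmHg=82,
    patient_position=CodedValue("position-sitting", "Sitting"),
    device=CodedValue("bp-cuff-auto", "Automatic BP cuff"),
)
print(bp)
```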

Ross Martin: How are we keeping the SHARP community informed and connected throughout the program when we are not in person in Rochester? Please use the wiki!

Get a sign-in to the wiki; Lacey will approve your request. We are encouraging everyone to use, edit and collaborate through the wiki. We want everyone to fill out their bio on the wiki, and those attending in person will get pictures taken. We are posting on Twitter: SHARPn.org

Everyone involved in SHARP has deep knowledge of their subject matter. Please take the time to ask questions of your colleagues, either in real time or on comment cards, so that we can all understand each other's vocabulary. (Introductions - please see the bios)

SHARP Area 1 Security, Carl Gunther, University of Illinois

Collaboration of groups working on security issues with a healthcare focus. The SHARPS project is a collaboration of collaborations addressing different aspects of privacy and security in the healthcare space. A project advisory committee assists with industry uses of the results of the project (Google etc.), security advocates and HIE.

Three sub-SHARPS projects:

1- EHR, hospital, HMO, small practice.

2- HIE, between enterprises and PHRs.

3- TEL, telemedicine for gathering information through devices, communicating through multimedia, etc.

EHR: Self-protecting EHRs (PROT), enterprise outsourcing, encrypting data that goes to third parties for storage. POL: Using encryption to enforce policy decisions for the use of health information, and impacts on regulations for HIT. PAHIS: Privacy-Aware HIS, disciplined techniques for building HIS with verifiable privacy and security assurances.

HIE: RSHIE - 50% of patients are associated with small (3-5 physician) practices; a service-based model for HIE and its security and privacy implications. EBAM: Experience-Based Access Management, a lifecycle model for learning from experience (see below). PHR: consequences and ramifications of shifting information outside of traditional clinical units.

TEL: IMD (see below). REMOTE: mobile monitoring. IMMERSE: online visits. SAFETY: What can we learn from FDA records to understand risks of approved devices?

Cross-cutting themes (there is a lack of areas for proving validity of new ideas): service models, regulations and policy, open validation.

The distinction between security and privacy.

Security - countering threats. How likely are threats in HIT? Non-HIT-specific threats, such as viruses, are an issue. There are also specific threats, analogous to the tainting of Tylenol; for HIT, an example is the inappropriate posting of images.

Privacy - hard to define. Right of the subject to control access to their records - contextual integrity.

IMDs: implanted medical devices that communicate remotely. These could potentially be reprogrammed without authorization (e.g., pacemakers).

EBAM: Reconcile the ideal model (an abstraction of the intended access) with the enforced controls (a subset of the rules of the ideal model).
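
A minimal sketch of that reconciliation idea, with made-up roles and permissions: compute which intended accesses are not enforced and which enforced rules fall outside the ideal model, i.e. the discrepancies that experience and audit review would then drive back into the policy.

```python
# Illustrative only: compare an "ideal" access model against the rules
# actually enforced and report the discrepancies. Roles and actions are
# invented for this sketch.
ideal_model = {
    ("attending_physician", "read_notes"),
    ("attending_physician", "write_orders"),
    ("billing_clerk", "read_demographics"),
}
enforced_rules = {
    ("attending_physician", "read_notes"),
    ("billing_clerk", "read_demographics"),
    ("billing_clerk", "read_notes"),   # over-permissive: not in the ideal model
}

missing_permissions = ideal_model - enforced_rules   # intended but not enforced
excess_permissions = enforced_rules - ideal_model    # enforced but not intended
print("Under-provisioned:", missing_permissions)
print("Over-provisioned:", excess_permissions)
```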

Data deidentification? The SHARPS team tried to avoid the topic, thinking it would be part of SHARP Area 4.

The main activity for Area 4 is taking EMR data, running analytics on unencrypted data, and producing conclusions based on this data. What is the role for SHARPS in how encryption and privacy apply to the original and derived data? Yes - consider the network example of a firewall and network-layer encryption. When data is outsourced to an HIE, the data is encrypted. A system that makes conclusions from HIE data may be impeded by encryption on the HIE. This is a good area of interaction between Area 1 and Area 4.

There is a sharp line between clinical data and research data. How is this taken into account for privacy regulation? There is also a line between research and quality. This hasn't been fully scoped in Area 1. It may be worth being more precise here, and this could be joint work between two SHARP projects.

NLP and normalization technologies need to be brought to the data, potentially through cloud computing. This brings serious P&S issues, both technical and policy-wise. NLP topics could be interesting.

http://sharps.org

SHARP Area 2 Cognitive Support, Debora Simmons, University of Texas

SHARPC is a collaboration of many member institutions. Patient-centered cognitive support: problem-solving and decision-making capabilities.

Systems need to provide information in a way that supports clinicians in their daily work. Major research challenges (see slides)

Work-centered design of process improvements. Identify critical usability problems. Model the impact of HIT on cognitive and organizational aspects of healthcare. Produce validated tools and methods that vendors can use. Increase adoption and meaningful use. Reduce safety risk from unpredictable user behavior.

Cognitive foundations of decision-making. Intermediate constructs in critical care (can also be used in primary care). The prototype uses visualization and NLP. How do people integrate information? Observations, systems.

Context-specific factors for CDS. Barriers to sharing and adoption of CDS are setting- and patient-specific and impact CDS adoption and use. There is a cognitive mismatch in usage between settings.


Clinical summarization of key patient data. Examine an archive of clinical data in the EHR and assemble it in a way that a clinician can use to make decisions. Develop automation methods for this abstraction.

Cognitive Information Design and Visualization. Present EMR data in a meaningful way for clinicians. Design a visualization framework to test tools. Interactive visualization will provide the appropriate info at the right time to the clinician.

Communication of key clinical information. Identify best practices for clinicians to interact with electronically provided information. Aim to reduce communication breakdown, which is a risk to patient safety and a major reason for outpatient malpractice suits.

Please contact them with any questions or connections.

Vendors that provide the services have proprietary issues with how they provide information to clinicians. How do you integrate new research into proprietary systems? Clinicians think the interface is lacking. Modeling the interface and coming up with standards may drive adoption in the marketplace.


http://sharpc.org

SHARP Area 3 Applications Overview, Josh Mandel, Harvard University

SMArt: Substitutable Medical Apps, reusable technologies.

EHR apps should be more like an iPhone: hardware and apps. Developers can build software that conforms to the system. Consumers can demo and switch out apps, not bound by vendors, etc.

Underlying data is the "hardware" in the EHR system. Very difficult to get a 3rd-party application to function on top of your system. Promote a common API to drive a competitive marketplace, reduce costs, increase quality.

How does the SMArt app interact with the platform? Well-structured connection points for specific data domains, meds, problems, demographics, etc. Core services building blocks.

RESTful API (Representational State Transfer): each resource has a unique ID (such as a URL). The API uses four operations: GET, POST, PUT, DELETE. Similar to a browser interface. This is a commonly used, scalable API style.
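
For illustration, the calls below exercise the four verbs against a hypothetical record-centric resource tree; the base URL and paths are invented for this sketch and are not the actual SMArt endpoints.

```python
# A sketch of the four REST verbs against a made-up medication resource.
import requests

BASE = "https://ehr.example.org/records/123"   # hypothetical record URL

meds = requests.get(f"{BASE}/medications")                      # read the med list
new_med = requests.post(f"{BASE}/medications", json={           # create a new entry
    "name": "lisinopril", "dose": "10 mg daily"})
requests.put(f"{BASE}/medications/456", json={                  # replace an entry
    "name": "lisinopril", "dose": "20 mg daily"})
requests.delete(f"{BASE}/medications/456")                      # remove an entry
```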

App Exchange: a central repository for applications. Different applications will access different subsets of information domains (meds, demographics, problems). Until the data are accessible, no one will develop for it. The team will develop wrappers for existing systems to enable developers to use this information (e.g., Cerner, HealthVault, Amalga). The platform exchange exposes heterogeneous data sources. Starting with 3 open-source systems to build wrappers: i2b2, Indivo and CareWeb EMR.

What's in it for the vendors? Why would they build this into their systems? Starting with open source, where the vendors are more interested. The hope is that once the marketplace shows interest, vendors will show more interest. Is there a business model to make this work? Consumers, money exchange, etc.

RESTful API: each data source has a unique ID. Connection to RDF format? The API may return RDF to the user; the format is undecided. It is not just the address, but also the information associated with the unique ID (terminology and data structure). This may be a great touchpoint between two SHARP areas and a great opportunity for synergy. Requires standards.

Metadata is not structured initially with RESTful. How to address this? Use OAuth(?) (a web specification that allows users to control access to their data).

SHARP: An ONC Perspective, Wil Yu

SHARP fits into an overall strategy for HIT innovation for the Federal Government. HIT adoption and infrastructure have been a focus of the Obama Administration from the outset. Health reform requires the innovative work being done through programs including SHARP. This effort has instigated coordination across the Department at a new, high level.

A key mission of ONC is to promote adoption of HIT. This may not be the HIT we are familiar with from the last several years, but improved systems that support meaningful use.

SHARP is one of many programs, including the HITRC, RECs, HIE, Workforce Training and Beacon Communities, addressed in the HITECH Act.

Beacon Communities could be test beds for the outputs of the SHARP research projects.

SHARP awards will help change future HIT markets and radically shift what providers have available to them in the HIT marketplace over the next few years. This program is an example of how programs can change the environment of HIT. To that end, there is an emphasis on rapid transfer of research outputs into HIT products.

What is the vision of the ability to query federated health information? This program is under discussion - Element 3 - with EHR adoption and HIE being the first two elements. What are iterations that the Federal Government can support to advance beyond the first two elements? This is a discussion between the ONC Office of the Chief Scientist and the IOM.

Vendors are a key part of these objectives; how engaged are they in implementing standards and innovations into their products? ONC plans round-table discussions with the private sector so that research outputs can be more quickly introduced into the marketplace.

Reflections on the morning

Ross Martin: Deloitte as a whole is supporting many of the programs at ONC. ONC was a science project before HITECH and has expanded greatly with the passage of HITECH. This is part of the context: as the endeavor matures, it can be refined.

Pete: A governance question. Deidentification of data is a critical part of the overall SHARP endeavor and it appears that it has fallen into a gap between Area 1 and Area 4. This info is important to ONC.

Ross Martin: As cooperative agreements, the projects can evolve to respond to gaps as they are discovered.

Wil: ONC is currently scoping out interactions between SHARP and the Chief Security Officer at ONC, and will be determining how to provide support for these gaps moving forward. Please make sure these issues come to the attention of ONC project officers so they can address them.

Ross Martin: There are many Federal contracts that have yet to be awarded that could potentially address these types of gaps.

Project 3 High Throughput Phenotyping, Jyoti Pathak

The ability to extract clinical information from the EMR for clinical research is becoming much more important as we come into an age where genotyping or whole-genome sequencing is not a limiting information source.

The EMR has some limitations in terms of phenotyping: some data are not captured, and biases are introduced in the collection of the data. These challenges are the basis of the high-throughput phenotyping project for this SHARP program.

EMR-based phenotyping algorithms are being explored in the eMERGE project. http://www.gwas.net

These algorithms are currently developed through an iterative process to define cases and controls. The quality of the algorithm's specification can introduce bias into a data set.

  • Cohort Amplification, Susan Welch, Utah

Create generalizable algorithms to identify research cohorts with the ability to reuse and repurpose in multiple settings and across multiple clinical conditions.

The project will use Intermountain Health's longitudinal clinical data, using attributes of structured data in the EHR, such as ICD-9 or CPT codes. This data is from inpatient as well as ambulatory care. Exemplars are used to help define what profile to search for in the EHR data. Defining a profile from the exemplar requires input from people with domain knowledge. Rules for the profile may not be just abnormal test results or a diagnosis code; they could also be the ordering of specific tests or the provision of counseling on specific behaviors.

  • Concept Frequency-based Cohort Identification, Wei-Qi Wei, Mayo

This project uses the number of times a specific concept is represented in a patient's medical record to classify individuals with a specific disease for a cohort. Concepts from SNOMED-CT.

Pilot project with Type II Diabetes, plan to expand to other diseases and include other data sources as concepts.
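
A toy sketch of the frequency idea (codes and threshold are placeholders, not the project's actual criteria): count mentions of the disease concept in each patient's record and flag patients whose count clears a threshold as candidate cases.

```python
# Illustrative concept frequency-based cohort identification.
from collections import Counter

DIABETES_CONCEPTS = {"44054006"}   # placeholder SNOMED CT code(s) for type 2 diabetes
THRESHOLD = 3                      # minimum number of mentions to call a candidate case

def candidate_cases(patient_concept_mentions, concepts=DIABETES_CONCEPTS, k=THRESHOLD):
    """patient_concept_mentions: dict of patient_id -> list of concept codes."""
    cases = []
    for pid, mentions in patient_concept_mentions.items():
        counts = Counter(mentions)
        if sum(counts[c] for c in concepts) >= k:
            cases.append(pid)
    return cases

records = {"p1": ["44054006", "44054006", "44054006", "38341003"],
           "p2": ["38341003"]}
print(candidate_cases(records))    # ['p1']
```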

  • Commercial Viability, Jeff Tarlowe, Centerphase Solutions, Inc.

Addressing patient recruitment as one of the biggest bottlenecks to clinical research for the pharmaceutical industry.

To demonstrate the commercial viability of the outputs of the SHARP research projects.

Initial focus on clinical trials. Currently, feasibility studies for a clinical trial require providers to answer a questionnaire guessing how many subjects they would be able to enroll. With the ability to identify subjects from EHR data, feasibility can be more accurately estimated.

Centerphase plans to work with multiple academic medical centers and health systems.

The current throughput process for eligibility criteria is still too dependent on manual selection.

  • Proposed Projects for Year 1

(Preliminary ideas for projects, feedback strongly encouraged)

    • Leverage machine learning methods for algorithm development. This involves collaboration with the Data Normalization project.
    • Develop an implementation-independent phenotyping logic representation template. There are open source artifacts that can be leveraged for this research question.
    • Evaluate and adjust for regional biases, practice and population in phenotyping. Coding practices at different sites may bias a cohort.
  • Other Potential Projects
    • Repository for phenotyping algorithms - queryable
    • Probabilistic cohort representation - what to do with patients that are neither 'case' nor 'control'? Can we use a probabilistic representation to avoid throwing away data? (See the sketch below.)
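
A toy sketch of what such a probabilistic cohort representation could look like; the scoring rule is invented purely for illustration.

```python
# Instead of a binary case/control label, keep a membership probability so
# borderline patients are retained with a weight rather than discarded.
def case_probability(n_supporting_mentions, n_contradicting_mentions):
    total = n_supporting_mentions + n_contradicting_mentions
    if total == 0:
        return 0.5                      # no evidence either way
    return n_supporting_mentions / total

cohort = {
    "p1": case_probability(5, 0),       # 1.0  -> clear case
    "p2": case_probability(0, 4),       # 0.0  -> clear control
    "p3": case_probability(2, 1),       # ~0.67 -> borderline, kept with weight
}
print(cohort)
```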

What are deliverables that will enable the dissemination of this information to the marketplace?


Discussion

Project 3: High Throughput Phenotyping Discussion

  • What do we know about how Mayo patients who are part of resources like the EDT want their data to be used?
    • Interested in this type of patient engagement, soliciting their opinion and gaining their partnership.
    • At what level does our society buy into the idea that secondary use is an appropriate way to use health information? Are these research projects being undertaken in an ethical fashion, via Institutional Review Board (IRB) and MN Research Authorization? There is a balance between developing the best tools and respecting the wishes of the people who contributed data.
    • Is there a cross-SHARP way of addressing these types of issues? This was addressed quite a bit at the AHIC Consumer Empowerment Workgroup. There is a broad range of patient responses to this type of use of their data.
  • What happens when the answers that result from the research are very difficult to explain to the public or the broader clinical community? Non-intuitive results.
    • In machine learning, go back to the training data and try to find insights from that set.
  • Diagnostic algorithms have clinical standards already. How precisely can this knowledge be represented by informatics? Start with well-defined algorithms (eMERGE), before moving into other data sources. Can you link phenotype to genotype data?
    • Reject the assertion that every phenotype is the result of a genotype. Here phenotype is being used as the numerator or denominator of a quality measure. What is epidemiologic truth? It is not necessarily being approximated by this phenotyping effort. Clinical data is inherently messier than data collected prospectively under laboratory conditions during the course of an experiment.
    • This is highly related to the data quality discussion tomorrow.
  • It is very difficult to pick up adverse events from two different sites. How do you address reliability across sites?
    • This project will examine how portable the algorithms really are.
    • Validated this in eMERGE that algorithms can be exported to different sites. Have shown reproducibility of cohort selection to a 95% confidence level.
    • Principle of Area 4 that the tools generated are portable.
    • Some stylistic differences at different institutions in terms of notes. The model trained on one dataset may not operate well on another. The tools ought to be portable, but they may need to be re-trained on local datasets.
  • How high-throughput is this now? Does it need to be scaled up?
    • Two major bottlenecks
      • Algorithm development
      • Extracting the relevant values of interest
    • Scalability and performance need to be explored.
    • Can we use UIMA components to execute these algorithms?
  • Under HIPAA, we also have to protect the privacy of the provider. There can be negative feedback from providers: in documenting the process of care, they are not considering an entry to be a final diagnosis for a particular patient.
  • Library of phenotyping algorithms. Is there a small set to be examined now?
    • Have 14 algorithms as part of the eMERGE project.
  • CDISC (Clinical Data Interchange Standards Consortium) can help out, particularly with the pieces described by Centerphase, specifically in terms of an executable protocol. Inclusion/exclusion criteria can be expressed and exported as a CCD based on CDASH.

Project 2 Natural Language Processing

  • Project Lead: Guergana Savova
  • Project Manager: Jay Doughty
    • Project & Team member introduction - Guergana Savova
  1. MPLUS - Templates for parsing sentences in clinical notes.
  2. ONYX - Templates, semantic models.
  3. cTAKES - clinical Text Analysis and Knowledge Extraction System
  4. CLEAR-TK - Built for general NLP, not just clinical. Grounded in linguistic theory (lexical semantics).

1-3 address abstract clinical concepts, 4 addresses more general linguistic structure.

Peter Szolovits - Risk Architecture for Text Engineering (RATE). Tried using GATE without much success. Likes LISP because it is a dynamic programming environment. RATE is similar to those above, but based on a database so there is a persistent representation of the data. Optimized in-core data structures. Integrated with various machine learning tools.

  • Standards for NLP
  1. UMLS
  2. Clinical Element Model - relationships between clinical concepts.

Once the toolset has been built, it will be made available through cloud computing in collaboration with SHARP Area 1 to ensure security.

Evaluations similar to those of the i2b2 project.

Discussion of Synergies Across Projects

  • Is there a convergence to a common framework for NLP? UIMA based? For data normalization, it makes sense for them to fit into a framework.
    • Everyone is using UIMA, just need to convince people it's a good framework and train them to use it effectively.
    • As an intermediate step, interoperability between UIMA and GATE. In the long-run, it needs to be in the same framework.
    • CLEAR-TK is already UIMA compliant.
  • Was there an evaluation about whether UIMA was most appropriate?
    • It is part of Project 4. Normalization works as an annotator in UIMA; it is hard to see how phenotyping rules fit in this paradigm. The evaluation is yet to come.
    • UIMA Type System. SHARP Area 4 can work on this together. Repositories supported as a group.
    • The models are referenced as type models and may need some changes. We want more input and sharing to create new and different models, with a public discussion about what is optimal.
  • Helpful to have coordination between NLP, phenotyping and data normalization to have similar understanding of concepts. Different meanings of the phrase "indicates".
  • Some of this work is experimental, some foundational. Hope to declare some annotation, syntax as standard. Can then use them for training and testing systems. Experiments come as we look at different approaches against the foundational work. Need to decide which models are foundational, which are experimental.
    • Does foundational mean easy and already decided? No, this is a four-year process. Some decisions will come quickly, others will take more work. Try not to all choose different models, because then we will be unable to compare approaches.
    • Can we identify the 90% of things that are easily agreed upon? Natural link between projects 1, 2 and project 3.
  • Laboratory and structure where tools, data sources are all together (infrastructure). Who's in charge of that?
    • This is a core operation for Mayo to facilitate this process. This is an evolving process for SHARP Area 4. We wanted to come together at this face-to-face to get a sense of where all the projects are headed, to be the basis of this common resource. We need to balance resources that are open to the whole project against academic work in progress that will be published.
    • There are ideas and there are patient data.
    • Can we post systems and data on the wiki? A wiki-like space, the wiki is not robust enough.
    • Dual level of access to different types of information.
  • Sometimes it is quite useful to annotate the same data with different approaches for semantic analysis, but it is a lot of work and you wouldn't want to do it manually all the time.
  • We need to do a culminating experiment that brings together all of the projects.
  • How do we scope the range of normalization targets?
    • What is a normalization target? Meaningful use - we will have to conform to the public requirements for terminologies, data standards, and data targets. Goal is to get information into a canonical form, which is MU (even if this is underspecified).
    • Terminologies are in the IFR (http://www.hhs.gov/news/press/2009pres/12/20091230a.html): RxNorm, SNOMED and/or ICD, etc. The usual suspects have been enumerated. There are also de facto information models, N-script, NCPDP, etc. - the method by which a prescription goes from provider to pharmacy. We should use these components as the normalization target. CDA, CCD - the normalization target should be the best possible implementation of something like CCD, even if these can be implemented more loosely in a way that is not computable.
    • Choice of these standards in the IFR were not necessarily made with secondary use of data in mind. So IFR may not be the best target. Regardless, the IFR standards will likely be the data that will be available. Keep in mind the opportunity cost for not pursuing a different standard.
    • Need to remain critical from a methodological perspective. Meaningful use may not answer all the questions about standards and targets for this work.
    • We may be able to identify problems that are difficult but whose solutions present a solid advance, versus those whose solutions do not provide large impact.
  • Back to data repositories:
    • Tools, annotations, UIMA, data: an "Integrated SHARP Sandbox"
    • This is more involved than an OH-NLP-type library. The Sandbox is a shared laboratory workbench. This should be co-developed across all Area 4 projects. An anticipated difficulty is the sub-optimization of capabilities for different projects. Will a single workbench serve all, or are the compromises too significant? We will probably end up with some kind of hybrid common/specific workbench across projects.
    • Could adopt a service-oriented architecture that says everything is a web call. That takes a tremendous server hit. Speed is less important during a discovery phase; when testing, you require higher throughput.
    • http://www.u-compare.org Has a lot of these tools.
    • At the end of the four years, to have something usable for dissemination outside of SHARP Area 4, we need to have something common and usable internally.
  • Deloitte's role in cross-project integration?
    • Small amount of resource, but the role is facilitation and helping to identify synergies across projects. This begins with a community-building process, such as what we are doing here.
    • What support does Deloitte have for specific teams? Deloitte is not the project manager for the workstreams; it will be more focused on the 'glue' between projects. There is also the meeting of Lacey and the project team leads.
    • This meeting will help us define our role. Can also provide the informed consumer perspective.
  • A lot of work has been done to build up NLP to interpret clinical records. What work is being done to change the nature of how data is captured and recorded up front? Medical education?
    • Need to have a solid information model to capture this data well. Structured data may not be able to fully capture the information conveyed by free written language.
    • Are we losing a lot of data through NLP? Would fully structured data capture more of this?
    • Work is being done, but outside the scope of SHARP. Jim Cimino published a paper 20 years ago about the continuum of structured data representation. With the use of CCD/CDA, we are beginning to define a template that clinicians will use to construct a note.
    1. Meaningful use is accelerating this evolution.
    2. Humans aren't very good at structured data entry. Slows providers down tremendously.
    • In Nottingham, a data ombudsman was hired to teach clinicians better data-entry techniques so that quality metrics are recorded in a more uniform fashion.
    • If we can create a set of characteristics around data quality, can incent the collection of higher quality data. Until you can prove the value of the structured data, no one will put effort into collecting more uniform, higher quality data.
    • PROMISE System at the University of Vermont: a touch-screen system that enabled people to capture just about anything in a structured format. If a provider made a compelling argument that they needed to type something in free text, they'd get a keyboard. The system was chucked as soon as its author retired because no one liked it.
    • Very difficult to train people in these types of structured data collection. Language is an innate human instinct. Hard for us not to use language.

Project 1 Data Normalization

  • Project Lead: Chris Chute
  • Project Manager: Lacey Hart
    • Project & Team member introduction - Chris Chute

Standards-conforming: Conform to meaningful use. Comparability and consistency are central.

Two dimensions:

  1. Semantic normalization - LexEVS. National repository? Need to have access to terminologies.
  2. Standardized data elements - clinical element models. Practical data element specifications at Intermountain Health going back to 1967.

Both terminology and data models are required for machine-interpretable information.

LexGrid - at its core is a common data model, so that each institution does not need to undertake its own interpretation of data standards. These are non-proprietary. Well-formed terminologies and value sets (value sets are small pieces of a terminology). Value sets are seen in the numerators and denominators of quality metrics in the meaningful use definitions. There is a vast proliferation of terminologies that need to be managed, versioned and tracked; LexGrid is a tool for this purpose.
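
As an illustration of how a value set drives quality-metric logic, the sketch below (placeholder codes, not a real meaningful use value set, and not the LexGrid API) treats a value set as a small, named, versioned set of codes and reduces denominator membership to a set lookup.

```python
# Illustrative value-set membership check for a toy quality metric.
DIABETES_DX_VALUE_SET = {
    "name": "Diabetes diagnosis codes (illustrative)",
    "version": "2010-06",
    "codes": {"250.00", "250.02", "E11.9"},   # placeholder codes
}

def in_value_set(code, value_set=DIABETES_DX_VALUE_SET):
    return code in value_set["codes"]

# Denominator: patients with any diabetes diagnosis code in their record.
patients = {"p1": ["250.00", "401.9"], "p2": ["401.9"]}
denominator = [pid for pid, codes in patients.items()
               if any(in_value_set(c) for c in codes)]
print(denominator)   # ['p1']
```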

Mapping is tedious and requires human intervention, even with tools to assist. Data normalization is the facilitation of using these maps, not the mapping itself.

Data transformer feeds into the UIMA-based Data Normalizer. (see slide 5)
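
A schematic sketch of that flow in plain Python (not actual UIMA code; the field layout and code map are invented): a syntactic transformer parses the raw message, a semantic normalizer maps the local code to a standard one, and the stages are composed so further annotators can be appended incrementally.

```python
# Illustrative two-stage normalization pipeline.
def syntactic_transform(message):
    """Hypothetical stage: parse a raw delimited message into fields."""
    fields = message.split("|")
    return {"source_code": fields[0], "value": fields[1]}

def semantic_normalize(record, code_map={"GLU": "2345-7"}):   # placeholder code map
    """Hypothetical stage: map a local code to a standard terminology code."""
    record["standard_code"] = code_map.get(record["source_code"], "UNMAPPED")
    return record

PIPELINE = [syntactic_transform, semantic_normalize]

def run(message):
    data = message
    for stage in PIPELINE:
        data = stage(data)
    return data

print(run("GLU|5.4"))   # {'source_code': 'GLU', 'value': '5.4', 'standard_code': '2345-7'}
```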

The ability to turn ill-formed data on an HIE into more well-formed data is important for secondary use, but it is also important for clinicians who want to see information in the format they are used to (the presentation layer). This requires some standardization.

Discussion

  • How do you view the role of commercial vocabularies? There are some in the meaningful use specifications, so you can't avoid proprietary codes altogether. If tools are to be international, note that SNOMED is proprietary; we just have a license within the United States. As an ONC-funded project, we must aim for meaningful use, but we aren't sure how to get around the intellectual property issues.
    • The ability to put in proprietary terminologies flagged with licensing - has this been explored for LexGrid? Some partitioning of proprietary from non-proprietary content has been done at NCI. It allows storage of copyrighted information, but nothing beyond that.
  • Does building LexGrid involve harmonizing the APIs of all the content providers? No, the APIs have been ignored. Content conforms; access is through the APIs that serve all the terminologies.
    • Does this cause semantic problems? Can be a problem with SNOMED. Some functionality is lost. This is less a terminology access problem than a mapping problem.
  • Mayo vocabulary and Utah data model collaboration? There is joint development of a commercialization of the terminology services through General Electric, Mayo and the University of Utah, still open source. This is the representation Utah and GE currently use. At an implementation level, they are working off the same page.
  • Same data can be modeled in multiple ways. Need flexibility to be able to map from one model to the next.
  • Mirth Interface Engines. Use NHIN Connect and Direct very heavily.
    • Can UIMA make a call to an engine off the pipeline? Yes, some annotators have written programs that go over the web.
    • Regenstrief is using this in their HOSS tool. Similar to what is envisioned for Project 1.
  • There are lots of ways data can be ill-formed. How do you address this? Can have tools and widgets, but will likely need some manual curation to determine in what way the data is ill-formed. Message could be ill-formed or content could be ill-formed. Will need an array of tools to address the multitudinous ways data can be ill-formed.
    • VA and DoD mapping for medications. Even when they were using standardized terminologies, could not map between the two systems for some elements. Might feed into project 5 as a data quality issue.
    • Cases where you need to reference past messages to disambiguate. Is there a persistence model? Leverage a data repository? We could, but the vision for initial approaches is to treat messages as isolated occurrences prior to wiring in a call to other data.
    • Other ways to do this? Emory Fry, CDS rules engine. Data persistence can be seen as a "poor man's data repository", ripe for unintended consequences. Can there be a repository call rather than data persistence?
  • Mirth, HL7 GELLO for open source tools for clinical guidelines. Mirth may be interested in collaborating with this type of work.