Annual Gathering/6.12.12Mtg Notes

From SHARP Project Wiki

SHARPn Summit 2012

Tuesday, June 12, 2012 Presentations & Meeting Notes

SHARPn Project Leaders on Domain Milestones


Data Normalization Milestones - Hongfang Liu, Ph.D.
SHARPn Milestones: Natural Language Processing - Guergana Savova, Ph.D.
SHARP High-Throughput Phenotyping - Jyoti Pathak, Ph.D.
Data Quality / Data Heterogeneity: An evolving mission - Kent Bailey, Ph.D.

Meeting Notes:

NLP Methods


Part 1 - MedTagger: A Fast NLP Pipeline for Indexing Clinical Narratives; Siddhartha Jonnalagadda, Ph.D.
Part 2 - Paper: Knowledge-Based vs. Bottom-up Methods for Word Sense Disambiguation in Clinical Notes; Anna Rumshisky, Ph.D.; Rachel Chasin
Part 3 - Poster: Classification of Emergency Department CT Imaging Reports using Natural Language Processing and Machine Learning; Efsun Sarioglu; Kabir Yadav, M.D.; Hyeong-Ah Choi

Meeting Notes:

  • Siddhartha Jonnalagadda (MedTagger)
    • Mayo Clinic’s clinical data warehouse, the Enterprise Data Trust, ensures that Mayo Clinic’s 80 million unstructured documents are processed (initially this took 6 months). To speed this up, Mayo collaborated with IBM to create the DDQB, which allows faster data discovery and querying
    • Improving the Speed
      • Including sentence detection, part-of-speech tagging, context detection, negation detection, etc.
    • Lexical Normalization
      • To normalize the data; download terminology from the UMLS; etc
      • Dictionary lookup – optimized for time and space complexity. All terms are loaded into a tree-like structure that retains state, so terms can be identified and aggregated across words.
    • Accuracy - included a 90.9% match
    • Semantic Groups
      • Take concepts based on 70 groups that allow greater accuracy and querying
    • Pre-release users
      • Enterprise Data Trust, Ask Mayo Expert, Medical Knowledge Summarization, and meTAKES – including some collaborations with the University of Utah, National Library of Medicine and UNC at Chapel Hill
    • Q&A
      • How was the dictionary set up to get the 90.9% accuracy?
        • We used SNOMED and the LOINC index. The data structure is organized so that when, say, 3 words form a term, it stores the words and remembers the terms that span them.
      • Are you using the clinical note indexer?
        • No. We were trying to move to open source. We didn’t want to change anything that is used by many users.
    • Presenter provides a demo and highlights the demo prerequisites and installation requirements [please see slides]
    • Conclusion
      • We have a reasonably fast pipeline to annotate our 80 million clinical notes in less than a month compared to six months previously
      • We demonstrated how the system is friendly to clinicians and data analysts
  • Anna Rumshisky (Knowledge-Based vs. Bottom-Up Methods)
    • There is ambiguity in Clinical Text
      • Word abbreviations, multiple interpretations, and different granularities
    • Approaches to Word Sense Disambiguation
      • Supervised/semi supervised and knowledge based techniques
      • Supervised techniques – separate classification
    • Knowledge-Based Methods – the choice of method within this approach didn’t make much difference
      • This project explored semantic type similarity methods & concept graph methods (path based, IC based, disambiguation subgraph, personalized PageRank)
      • Results were underwhelming: they were below the baseline for the targets.
        • PageRank was not helpful: if one target has many more relations than the other, this skews disambiguation
    • Similarity-Based Approach
      • Target word, taxonomy, best-performing path
    • LDA Model
      • Bayesian Word Sense Induction was used
      • Evaluation Methods – to map targets and label sets
      • Clinical Text Study – performance was higher than the baseline.
    • Conclusions and Future Work
      • LDA outperforms knowledge based methods on test targets
      • Experiment with Bayesian methods
      • Labeling mapping sets for any targets to assist queries
    • Q&A
      • Did you use LDA Word Net?
        • No.
  • Efsun Sarioglu (Classification of Emergency Department CT Imaging Reports)
    • NLP can be helpful for extracting data from EHRs and to support CER
      • Used MedLEE, WEKA, and CART – including use of decision trees
    • Classification Performance – machine learning worked well with NLP
      • Results were comparable to inter-rater performance and prior studies of inter-physician agreement
      • Comparable to real world and classification studies
    • Concluding Remarks & Next Steps
      • A handful of real-world NLP studies have used validated reference standards
      • Translating existing NLP and machine learning technologies to support CER
      • Next Steps
        • Test validation
        • Evaluate different features or classification algorithms
    • Q&A
      • Were t-tests conducted?
        • No, not for this project.
      • Was error analysis conducted for false positive and negative results?
        • Yes, we focused on the false negatives and this included patients with prior injuries.
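
The MedTagger notes above describe a dictionary lookup in which all terms are loaded like a tree and matched across words. One way to picture that idea is a word-level trie with longest-match lookup. The sketch below is only an illustration under that assumption: the class name, the sample terms, and the concept labels are hypothetical and are not taken from MedTagger.

```python
_END = object()  # sentinel key marking the end of a complete term


class TermTrie:
    """Word-level trie: each node maps the next word to a child node."""

    def __init__(self):
        self.root = {}

    def add(self, term, concept):
        node = self.root
        for word in term.lower().split():
            node = node.setdefault(word, {})
        node[_END] = concept

    def find_all(self, text):
        """Return (start, end, concept) for the longest term at each position."""
        words = text.lower().split()
        matches = []
        i = 0
        while i < len(words):
            node, best = self.root, None
            for j in range(i, len(words)):
                node = node.get(words[j])
                if node is None:
                    break
                if _END in node:
                    best = (i, j + 1, node[_END])  # keep extending for a longer term
            if best:
                matches.append(best)
                i = best[1]  # resume after the matched term
            else:
                i += 1
        return matches


trie = TermTrie()
trie.add("heart", "C_HEART")                    # hypothetical concept labels
trie.add("heart attack", "C_MI")
trie.add("congestive heart failure", "C_CHF")
matches = trie.find_all("history of congestive heart failure and heart attack")
# → [(2, 5, 'C_CHF'), (6, 8, 'C_MI')]
```

Each lookup walks the tree one word at a time, so the cost per position is bounded by the longest term rather than by the size of the dictionary.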

HTP - Presentations

Paper - Exploring Patient Data in Context to Support Clinical Research Studies: Research Data Explorer (Adam Wilcox, Chunhua Weng, Sunmoo Yoon, Suzanne Bakken)

Poster - Utilizing Previous Result Sets as Criteria for New Queries within FURTHeR (Dustin Schultz, Richard Bradshaw, Joyce Mitchell)

Poster - Semantic Search Engine for Clinical Trials (Yugyung Lee)

Meeting Notes:

Exploring Patient Data in Context to Support Clinical Research Studies: Research Data Explorer (Adam Wilcox)

  • Adam discussed how we can make data patient centered across care institutions and within communities of care.
  • He shared his Research Data Explorer and described a usability study in which users accessed data from the Washington Heights/Inwood Informatics Infrastructure for Comparative Effectiveness Research (WICER).
  • Lessons learned:
    • User context is important for usability
    • Patient context is important for understanding data

Utilizing Previous Result Sets as Criteria for New Queries within FURTHeR (Dustin Schultz)

  • Dustin described what FURTHeR is, namely, a tool allowing real-time federation of health information from heterogeneous data sources. It leverages standard terminology and a logical federated query language, and uses the i2b2 web client as a front end. This tool works over Utah’s enterprise data trust.
  • The value of the tool is that using these heterogeneous sources provides greater context to data. Researchers can refine their queries and get relevant result sets to perform case/cohort and other patient studies.

Semantic Search Engine for Clinical Trials (Sourav Jana)

  • Sourav discussed why recruitment to clinical trials is difficult and how semantic search can help this activity.
  • In the traditional model, there is a lot of legwork; we want to make it more intelligent via intelligent matchmaking leveraging semantic clustering and eligibility criteria ontologies.
  • Workflow:
    • Semantic and model based clustering -> Intelligent matchmaking (eligibility ontology, patient database) -> patient enrollment (search engine with criteria search, suggestion, subjects satisfying selected criteria, potential subjects based on relaxed constraints).
  • Conclusions:
    • Our work presents a data driven approach to generate a minimal set of eligibility criteria for clinical trials
    • It also presents a semantic model for enrolling patients in clinical trials.

SE MN Beacon - In the field Med Rec


  • 900AM-1000AM, Innovation Lab, Rm 415

Using PH-Doc - Deb Castellanos

Meeting Notes:

Medication Reconciliation

    • Sean Murphy (cTAKES drug NER tool)
      • Legacy Drug Type Systems – status, frequency, duration, route, dosage, strength, form, date, etc.
      • Drug NER Functionality
        • Explored narrative vs. list configuration
          • Implemented a way to keep track of narrative and list configurations
          • List is set as the default
          • Performance implications include throughput speed and when your named entities get pulled
        • Multiple Drug Mentions – no dictionary/resource to conduct comparisons. Therefore, if we found an adjacent drug mention, the mentions would share those attributes
        • Multiple Drug Signature Elements – if there is more than one mention, we guess the larger span to avoid getting lower dosages/information
        • Subsection handling for special change status phrases – including looking at the gradient list to determine how medications should be handled
      • Q&A
        • How would you distinguish two brand names?
          • We don’t care about the brand name or ingredient. We are trying to make sure that a drug is not left in the cold. We have no understanding of the branding or ingredient at this level.
      • Overview of algorithm – major challenges
        • Dosage Status change- still needs improvement
        • Problems with Drug NE Extraction – misspellings, orphaned or misaligned drug elements
        • Performance issues related to both throughput and specificity – no normalization at look up, poor return even when permutations are increased, etc.
    • Sunghwan Sohn (MedER Medication)
      • Medication Information Extraction
        • Medical information annotation – dosage, strength, frequency, route, duration, form
      • drug NER
        • Includes UIMA framework, functionality, and technique
      • What to improve in drug NER
        • Normalization
          • Matches for ingredient and brand name
          • Variation of attribute description
        • Matching - 3 word tokens due to 3 permutations on lookups
      • MedER: Medical Extraction and Retrieval
        • UIMA framework focuses on mapping, user customizability, current medications (list format)
      • MedER Method
        • (1) Reviewed Dictionary Look up – brand name
        • (2) Medication Attribute Extraction
        • (3) Link medication with attributes
        • (4) Normalize medication to clinical drug name
        • (5) Matching
      • Improvements
        • Mismatch for fully completed medications
          • Medication does not start with IN or BN
          • No dosage form definition
      • Summary
        • Medication maps to RxNorm
        • Normalization capability
        • User customizability
        • Efficient dictionary look up
      • Future
        • Provide look for open source
        • Test and expand narrative sections
        • Expand normalization capability
        • Implement medication status change
      • Q&A
        • Is this information coming from the narrative field?
          • In the clinical note, there is a section called medication. These medications are important for clinical research. We are focused on the section first. In narrative we might have more specific conditions. We also wrote a paper for i2b2 to monitor patterns. We can use these patterns to support the narrative sections.
    • Jorge Herskovic (Pan-SHARP)
      • Medication Reconciliation – process to ensure patients receive the correct medication
        • This is important to prevent patient harm, to improve medication management and intervention, and to avoid discrepancies between reality and medication records
        • Skilled personnel with enough time are needed to avoid mistakes
        • MedRec is time consuming for clinicians
          • Can take 3 hours with some patients
          • 5-60 mins per admission
        • Reconciliation is in the spotlight
          • Meaningful use
          • National Patient Safety Goal from the Joint Commission
          • Most do MedRec poorly or not at all
        • ML Trends Research Results
          • Haven’t been able to demonstrate effectiveness
        • Improving medication reconciliation
          • Recognize it is a human task
          • Make reconciliation easier and faster
        • Pan-SHARP project
          • Year 1: Provided a demo to show integration of SHARP capabilities including
            • MDPnP: device data (data collection)
            • SHARPn: Data models and normalized data
            • SMART: Application Platform
            • SHARPC: User interface, reconciliation algorithms
            • SHARPS: security, attribution, provenance data
          • PanSHARP organization
            • Distributed team, highly collaborative, and are leveraging existing SHARP technology as much as possible
          • Q&A
            • There is a tendency to copy and paste prescriptions. Do you have a way to see when a prescription is really changed or updated?
              • We can do something like that. There are legitimate reasons for copying and pasting. The best way is to look at fulfillment data to determine when the medication was handled and when it was handed over. We are going to try to include fulfillment data, which is the best source of data that we have. You will never have exact fulfillment data even if this information is recorded, because a patient can later spit out his/her medication. One of our pie-in-the-sky ideas is for the MDPnP folks to develop an in-home dispenser that can be used remotely to ensure patients are receiving the medications they should be getting.
    • Jessica Nadler
      • Jessica provides an overview of the need and challenges associated with MedRec
      • Jessica then discusses the SHARP process and highlights each SHARP’s capabilities
      • Q&A
        • You can drag and drop but can you also make changes?
          • Yes, it’s very straightforward.
          • We looked to see what pharmaceutical companies were involved and we can match drugs. When this fails, we use an NLM approach that extracts for semantic types, etc. We use this to build profiles on how the drug is considered in the literature. When there is a match they will be highly unlikely.
          • Users are listed as patients or physicians, etc.
            • There are many users. Clinical users are the top users. We can also allow patients to match what they are supposed to be taking vs. what they are taking. You can have a patient do this in a waiting room. Pharmacists can also use this because they will probably start doing more reconciliation. We built a general mapping solution and are building solutions for different markets in the future.
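
The MedER method listed above starts with a dictionary lookup, then extracts medication attributes and links them to the medication. A minimal Python sketch of those first three steps for a single medication-list line follows. Everything in it is an assumption for illustration: the two-drug dictionary, the regular expressions, and the function name are hypothetical, and they do not reproduce MedER or the cTAKES drug NER rules.

```python
import re

DRUG_NAMES = {"lisinopril", "metformin"}  # hypothetical mini-dictionary

# Hypothetical patterns for a few of the attributes named in the notes.
ATTRIBUTE_PATTERNS = {
    "strength": re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|g|ml)\b", re.I),
    "route": re.compile(r"\b(?:oral|by mouth|po|iv|topical)\b", re.I),
    "frequency": re.compile(
        r"\b(?:once|twice|three times)\s(?:daily|a day)\b|\b(?:bid|tid|qd)\b", re.I
    ),
}


def extract_medication(line):
    """Return a dict for one medication-list line, or None if no known drug."""
    # Step 1: dictionary lookup on the alphabetic tokens of the line.
    tokens = re.findall(r"[A-Za-z]+", line.lower())
    drug = next((t for t in tokens if t in DRUG_NAMES), None)
    if drug is None:
        return None
    # Steps 2 and 3: extract attributes and attach them to the drug found.
    record = {"drug": drug}
    for name, pattern in ATTRIBUTE_PATTERNS.items():
        match = pattern.search(line)
        if match:
            record[name] = match.group(0).lower()
    return record


record = extract_medication("Lisinopril 10 mg by mouth once daily")
# → {'drug': 'lisinopril', 'strength': '10 mg', 'route': 'by mouth', 'frequency': 'once daily'}
```

Linking here is trivial because each list line holds one medication. Narrative text with several adjacent drug mentions, as discussed in the drug NER talk, would need span-based linking instead.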

Applied NLP & Information Extraction future exploration


Part I: NLP for Clinical Decision Support - Kavishwar Wagholikar, MBBS, Ph.D.
  • Two clinical areas are studied for point-of-care decision support: cervical cancer and colonoscopy.
  • Earlier work was not about decision support at the point of care. This work attempts to address this.
  • The guidelines for MDs to use are complex and may only be used correctly 50% of the time.
  • Many parts of the medical record are combined to support this work - not solely free text.
  • An interesting crowd-sourcing technique was used to test the proper decisions (recommendations from the clinicians). Clinicians gave a mock recommendation for a test real-world scenario, which was compared with the automated system.
  • Observation of the clinic, patients, and treatment room was used to identify where the most accurate information comes from; that information was placed into the system.
  • This system may be difficult to adapt to another institution. Work-flow may be different. Data sources may be different. NLP may not be trained the same.
Part II: Biomedical Informatics and Clinical NLP in Translational Science Research - Piet de Groen, M.D.
  • Real patient cases were examined in which patients brought their own records on paper and CDs. Yikes.
  • Time series of events was the goal of this work, but back then this was impossible. But simply finding "Lipitor" in the record on a time line was very helpful.
  • Colon cancer was studied using 13 years of data. The idea is to find the right physician, because the miss rate is indicative of an issue. Software can make it possible for the same person to see the history of what they had reported, and NLP plays a big part in this.
  • The idea is to get the data in the hands of the clinicians in order to entice them to be the best they can be.
Poster: Enabling Medical Experts to Navigate Clinical Text for Cohort Identification - Stephen Wu, Ph.D.
  • The goal is to bring NLP systems, typically for developers, to medical experts - called meTAKES (medical expert text analysis knowledge extraction system).
  • Requirements vary greatly among experts. A system must support use-case-specific needs along with a comprehensive approach.
  • The interface is currently in its infancy. There is a lot of growth potential using text characteristics, semantic relationships, human computer interaction, building in structured data, and APIs.
  • The relation to phenotyping seems like something we should pursue.
  • Timeline again seems to be a factor. How do we take a snapshot and combine it with future snapshots?
Part 4: Future Exploration of NLP Needs in Clinical Settings - facilitated session
  • Discussion took place after each talk.

Data Quality

    • Kent Bailey (Data Quality)
      • Kent provides an overview
        • User Cases
          • Clinical research – case-control studies, requiring equal numbers of cases and controls
          • Basic Scientists – lab scientists, chemists – they just want high specificity and value, but they don’t care about sensitivity
          • Business professionals – recruit for clinical trials; no need for sensitivity, since patient screening is involved
          • Epidemiologists – looking for high sensitivity
        • Different questions, different needs for sensitivity and specificity
          • Consider statistical power as a function of sensitivity and specificity
          • Optimal ROC Point?
            • Depends on true cases
        • Summary
          • Data quality is suitable for the question
          • Different users have different goals
          • Optimal ROC point
        • Q&A
          • How does the need for sensitivity, specificity, and quality differ for specialists?
            • Depends on the quality metrics, this would vary based on individual and group comparisons
    • Signal-to-Noise Ratio
      • Implications
        • Power to find associations
        • Optimal choice of measure
        • Optimal weighting
      • Data Quality Implications
        • Create a signal-to-noise measure for quantitative variables
        • Batch-level characteristic
        • Assign to datum or provide information about variables
        • Generalization to quantitative variables?
      • Summary
        • Standard statistical components of variance are helpful
        • Characterize variation across subjects
        • Implications for power, weighting
    • Susan Welch (Data Quality and Heterogeneity)
      • Conducted analysis to assess missed and patient visit locations/insurance coverage
      • Conclusions
        • Selections are helpful
          • Understand bias among selected cases
          • Important to compare across organizations
        • Unintended consequences of selection
          • Miss early or severe cases
          • Miss patients by artifacts

Phenotyping Tools Demo & Code Sprint

Part I: Phenoportal Demonstration - Jyoti Pathak, Ph.D., Dingcheng Li, Ph.D.
  • Examples of the phenotyping software were shown and discussed.
  • Authoring software is being considered. Right now it is simply a pointer to an external editor.
  • Current testing is on simulated data, not actual CEM data. The base eventually used could be CEMs from the data normalization pipeline execution.
  • Question and answer
    • How many of the transforms were XSLT transforms? Translation is through UIMA mechanisms.
    • Does the workflow represent separate queries at each step? Yes, there are rules separate for each step. They run in order depending on the previous step. The result is a set of patients.
    • Can the rules run for just certain encounters? Yes.
    • Is the process dependent on how you write the HTML request? Yes, it runs in the order in which you write it and does not try to optimize. That is the user's job.
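
The Q&A above describes the workflow as separate rules per step that run in the order written, each depending on the previous step, with a set of patients as the result. A minimal Python sketch of that execution model follows. The patient records, the rule predicates, and the function name are hypothetical and do not reflect the Phenoportal implementation or its rule language.

```python
def run_workflow(patients, rules):
    """Apply rules in the given order; each rule sees only the survivors
    of the previous one. Returns the set of remaining patient ids."""
    selected = dict(patients)
    for rule in rules:
        selected = {pid: rec for pid, rec in selected.items() if rule(rec)}
    return set(selected)


# Hypothetical simulated patients.
patients = {
    "p1": {"age": 67, "diagnoses": {"diabetes"}, "hba1c": 8.1},
    "p2": {"age": 45, "diagnoses": {"diabetes"}, "hba1c": 6.2},
    "p3": {"age": 70, "diagnoses": set(), "hba1c": 5.4},
}

# Hypothetical rules, executed exactly in the order listed.
rules = [
    lambda rec: "diabetes" in rec["diagnoses"],
    lambda rec: rec["hba1c"] >= 7.0,
]

cohort = run_workflow(patients, rules)
# → {'p1'}
```

For pure filters like these, reordering the rules changes only how much work each step does, not the final set. Because the sketch makes no attempt to reorder them, putting the most selective rule first is left to whoever writes the workflow, in line with the answer above.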
Part II: A Knowledge-Driven Workbench for Predictive Modeling - Peter Haug, M.D.; Xinzi Wu; John Holmen; Matthew Ebert; Robert Hausam; Jeffrey Ferarro
  • Discussion and demo of the Analytic Workbench, a mechanism to help assemble the resources for the data your research will run on, that is, the things that start your research:
    • Medical knowledge, terminologies, statistics.
  • A new terminology is used to represent this knowledge.
  • The system is currently a prototype.
  • Issues: errors in the original data, multiplicity of recording the original data, complexity.
  • Question and Answer
    • Are the criteria built into the ontology? Yes, the queries are then built from this.
    • How would this perform on non-normal levels of conditions in patients? Statistical uses can handle this.
Part III: Clinical analytics driven care coordination for 30-day readmission / 360 Fresh Demonstration - Ramesh Sairamesh, Ph.D.
  • 360 Fresh is a company doing things with phenotyping.
  • The methods used were covered, followed by the demo.
  • Real time analysis of all the data available for the patient in an attempt to determine risk of being readmitted.
  • Support at point of care can also be presented for further care like suggestions for nutrition, weight management, home care, etc.
  • Data for these systems are housed within the institution, not at the business.
  • Human and machine data are combined for the real-time support. It is critical that intake from all the people affecting the patient is collected.
  • Many large hospitals are already using this. The developers preset the selections for the clinician depending on the disease they are looking into.
  • Things like negation are taken care of under the covers.
  • Choosing which dictionaries to use is part art, part science. Dictionaries, models, and rules all go together, but the choice is based on past clinical knowledge and study design.