Difference between revisions of "SHARP Project Wiki:Project Background"

From SHARP Project Wiki

Revision as of 21:44, 17 October 2012


{{#ev:youtube|qy-d36wAtsU}} SHARP program from Dr. Chuck Friedman: Charles P. Friedman, PhD, Chief Scientific Officer for Information Technology at the Office of the National Coordinator for Health Information Technology (ONC) in the U.S. Department of Health and Human Services, speaks about the Strategic Health IT Advanced Research Projects (SHARP) Programs.

Program Organization

SHARP Program Organization


Area 4: Themes & Projects

Area 4: Project Initiation Meeting Slides PDF

Area 4: Project Progress Update Sept 3, 2010

Area 4: 2010 Progress Report (04/01/2010-12/31/2010)

Project Proposal


We propose research that will generate a framework of open-source services that can be dynamically configured to transform EHR data into standards-conforming, comparable information suitable for large-scale analyses, inferencing, and integration of disparate health data. We will apply these services to phenotype recognition (disease, risk factor, eligibility, or adverse event) in medical centers and population-based settings. Finally, we will examine data quality and repair strategies with real-world evaluations of their behavior in Clinical and Translational Science Awards (CTSAs), health information exchanges (HIEs), and Nationwide Health Information Network (NwHIN) connections.

We have assembled a federated informatics research community committed to open-source resources that can scale industrially to address barriers to the broad-based, facile, and ethical use of EHR data for secondary purposes. We will collaborate to create, evaluate, and refine informatics artifacts that advance the capacity to efficiently leverage EHR data to improve care, generate new knowledge, and address population needs. Our goal is to make these artifacts available to the community of secondary EHR data users in the form of open-source tools, services, and scalable software. In addition, we have partnered with industry developers who can make these resources available through commercial deployment. We propose to assemble modular services and agents from existing open-source software to improve the utilization of EHR data for a spectrum of use cases, and to focus on three themes: Normalization, Phenotypes, and Data Quality/Evaluation. Each of our six projects spans one or more of these themes, and together they constitute a coherent ensemble of related research and development. Finally, these services will have open-source deployments as well as commercially supported implementations.

There are six strongly intertwined, mutually dependent projects: 1) Semantic and Syntactic Normalization; 2) Natural Language Processing (NLP); 3) Phenotype Applications; 4) Performance Optimization; 5) Data Quality Metrics; and 6) Evaluation Frameworks. The first two projects align with our Data Normalization theme; Phenotype Applications and Performance Optimization span themes 1 and 2 (Normalization and Phenotyping); and the last two projects correspond to our third theme, Data Quality/Evaluation.

Narrative PDF


Intro to SHARP audio file


ONC & PAC Reports

2012 Annual Progress Report (01/01/2012-12/31/2012)

2012 Semi-Annual Progress Report (01/01/2012-06/30/2012)

2011 Annual Progress Report (01/01/2011-12/31/2011)

2011 Semi-Annual Progress Report (01/01/2011-06/30/2011)

Dr. Friedman Site Visit Presentation (09/03/2010)

2010 Annual Progress Report (04/01/2010-12/31/2010)

ARRA Reports

Recovery.gov ARRA Reports


  1. Aberdeen J. NLP techniques for clinical record de-identification, presentation to AcademyHealth Annual Research Meeting, Seattle, June 12-14, 2011.
  2. Chapman W, Nadkarni P, Hirschman L, D’Avolio L, Savova G, Uzuner O. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. Journal of the American Medical Informatics Association. 2011:1-4. doi:10.1136/amiajnl-2011-000465.
  3. Choi J, Palmer M. Getting the most out of Transition-based Dependency Parsing. In the Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT 2011, June 19 - 24, 2011, Portland, OR.
  4. Choi J, Palmer M. Transition-based Semantic Role Labeling Using Predicate Argument Clustering. In the Proceedings of RELMS 2011: Relational Models of Semantics, held in conjunction with ACL-HLT 2011, June, 2011, Portland, OR.
  5. Chute CG, Pathak J, Savova GK, Bailey KR, Schor MI, Hart LA, Beebe CE, Huff SM. The SHARPn Project on Secondary Use of Electronic Medical Record Data: Progress, Plans and Possibilities. AMIA 2011 (paper).
  6. Clark C. Recent efforts in clinical NLP: Uncertainty discovery through NLP, presentation to Natural Language Processing Workshop, i2b2 Academic Users Group, Boston, June 28, 2011.
  7. Conway MA, Berg RL, Carrell D, Denny JC, Kho AN, Kullo IJ, Linneman JG, Pacheco JA, Pessig PL, Rasmussen L, Weston N, Chute CG, Pathak J. Analyzing Heterogeneity and Complexity of Electronic Health Record Oriented Phenotyping Algorithms. AMIA 2011 (paper).
  8. Conway MA, Pathak J. Analyzing the Prevalence of Hedges in Electronic Health Record Oriented Phenotyping Algorithms. AMIA 2011 (poster).
  9. Dligach D, Palmer M. Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling. In the Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT 2011, June 19 - 24, 2011, Portland, OR.
  10. Dligach D, Palmer M. Reducing the Need for Double Annotation. In the Proceedings of the Fifth Linguistic Annotation Workshop (LAW V) held in conjunction with ACL-HLT 2011, June, 2011, Portland, OR.
  11. Hirschman L. Evaluation as a driver in Software Communities, presentation to Workshop on Designing an Ecosystem for Clinical NLP, Integrating Data for Analysis, Anonymization and Sharing (iDASH), University of California, San Diego, May 2-3, 2011.
  12. Liu H, Wagholikar K, Wu S. Using SNOMED CT to encode summary level data - a corpus analysis. AMIA CRI 2012.
  13. MITRE System for Clinical Assertion Status Classification, JAMIA 2011; Published Online First: 22 April 2011 doi:10.1136/amiajnl-2011-000164.
  14. Rea S, Pathak J, Savova GK, Oniki TA, Westberg L, Beebe CE, Tao C, Parker CG, Haug PJ, Huff SM, Chute CG. Building a Robust, Scalable and Standards-Driven Infrastructure for Secondary Use of EHR Data: The SHARPn Project. Second stage of review at JAMIA.
  15. Savova G, Olson J, Murphy S, Cafourek V, Couch F, Goetz M, Ingle J, Suman V, Chute C, Weinshilboum R. The electronic medical record and drug response research: automated discovery of drug treatment patterns for endocrine therapy of breast cancer. Journal of the American Medical Informatics Association. 2011.
  16. Savova GK, Chapman WW, Elhadad N, Palmer M. Shared annotated resources for the clinical domain. AMIA Annual Symposium 2011 (panel).
  17. Sohn S, Kocher J-P, Chute CG, Savova GK. Drug side effect extraction from clinical narratives of psychiatry and psychology patients. JAMIA 2011; 18:i144-i149.
  18. Sohn S, Wu S. Dependency Parser-based Negation Detection in Clinical Narratives. AMIA CRI 2012.
  19. Tao C, Parker CG, Oniki TA, Pathak J, Huff SM, Chute CG. An OWL Meta-Ontology for Representing the Clinical Element Model. AMIA 2011 (paper).
  20. Tao C, Welch SR, Wei WQ, Oniki TA, Parker CA, Pathak J, Huff SM, Chute CG. Normalized Representation of Data Elements for Phenotype Cohort Identification in Electronic Health Record. AMIA 2011 (poster).
  21. Torii M, Wagholikar K, Liu H. Using machine learning for concept extraction on clinical documents from multiple data sources. JAMIA 2011 Sep-Oct; 18(5):580-7.
  22. Wagholikar K, Torii M, Jonnalagadda S, Liu H. Feasibility of pooling annotated corpora for clinical concept extraction. AMIA CRI 2011.
  23. Wu ST, Kaggal VC, Savova GK, Liu H, Dligach D, Zheng J, Chapman WW, Chute CG. Generality and Reuse in a Common Type System for Clinical Natural Language Processing. Proceedings of the First International Workshop on Managing Interoperability and compleXity in Health Systems. Glasgow, Scotland. 2011.
  24. Wu S, Liu H. Semantic Characteristics of NLP-extracted Concepts in Clinical Notes vs. Biomedical Literature. Proceedings of the Annual AMIA Fall Symposium. Washington DC. 2011.
  25. Wu S, Liu H, Li D, Tao C, Musen M, Chute CG, Shah N. UMLS Term Occurrences in Clinical Notes: A Large-scale Corpus Analysis. AMIA CRI 2012.
  26. Wu S, Wagholikar K, Sohn S, Kaggal V, Liu H. Empirical Ontologies for Cohort Identification. Text REtrieval Conference. 2011.
  27. Zheng J, Chapman W, Miller T, Lin C, Crowley R, Savova G. In Press. A system for coreference resolution for the clinical narrative. Journal of the American Medical Informatics Association.
