URIProposal

From CIMI

Proposed approach to URI use for CIMI modeling effort.

Introduction

One of the foundational components of interoperability is the ability to unambiguously name the various resources (classes, individuals, models, model instances, model fragments, ...) to be exchanged. Naming involves three aspects:

  1. The "tacit" context - the context as understood by the community, organization, collection of individuals, etc.
  2. A namespace - "A namespace provides the scope within which the scoped identifier uniquely identifies the identified item."[1]
  3. The scoped identifier - "identifier of an identified item within a specified namespace"[1]
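As a rough illustration of the namespace / scoped identifier distinction, the sketch below splits a URI into those two parts. The splitting rule (last `#` if present, otherwise last `/`) is an assumption made for this example only; it is a common convention, not something the cited standard or any publisher mandates, and `split_uri` is a hypothetical helper name.

```python
from typing import Tuple


def split_uri(uri: str) -> Tuple[str, str]:
    """Split a URI into (namespace, scoped identifier).

    Assumption: the namespace ends at the last '#' when the URI has one,
    otherwise at the last '/'. Real namespaces need not follow this rule.
    """
    sep = "#" if "#" in uri else "/"
    namespace, _, scoped_id = uri.rpartition(sep)
    return namespace + sep, scoped_id


print(split_uri("http://snomed.info/id/74400008"))
print(split_uri("http://www.w3.org/2004/02/skos/core#Concept"))
```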

Our job, in part, is to minimize the tacit context component - to make the identifiers as explicitly unambiguous as possible. Were this the only goal, the solution would be simple: DCE UUIDs[2] can be generated on almost all modern computers and are, for practical purposes, guaranteed to be unique.[3] We could generate a new UUID every time we encountered a new resource and, while these identifiers would unambiguously name an associated resource, a given resource might end up being assigned 0, 1 or 100 different identifiers. This, in turn, leads to two additional requirements:

  1. It should be easy to determine whether a given resource has already been assigned an identifier.
  2. To the extent possible, each resource should be assigned exactly one identifier.
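The UUID problem described above can be sketched in a few lines. The example uses Python's standard `uuid` module and random (version 4) UUIDs; the two "parties" are hypothetical and stand for any two people or systems that independently mint an identifier for the same resource.

```python
import uuid

# Two independent parties each mint a fresh identifier for the SAME resource.
id_from_party_a = uuid.uuid4()
id_from_party_b = uuid.uuid4()

# Each identifier is unique on its own, but nothing connects the two,
# so the single resource now has two unrelated names.
print(id_from_party_a)
print(id_from_party_b)
print("same identifier?", id_from_party_a == id_from_party_b)
```

Neither party can tell from the identifier alone that the other has already named the resource, which is why uniqueness by itself does not satisfy the two requirements.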

To meet the first requirement above, we need (a) a formula for constructing a URI from an identifier and (b) a way to determine whether the resulting URI is valid. By itself, however, this solution is still insufficient. You can produce a formula based on purlz, I can produce another formula based on ISO OIDs, and who is to say which of our formulas is the correct one? The way to address this impasse is to:

  1. Have the organization responsible for the resources themselves declare how URIs are to be formed for their resources.
  2. Base the URIs on the http: protocol and an Internet domain name owned by the responsible organization.
  3. Have the organization host a service that responds when presented with a URI in a way that makes it clear that the URI is sanctioned by the organization.

As an example, the World Health Organization has declared that URIs for ICD-10 classes will be formed as http://id.who.int/icd/release/10/code/{ICD-10 code}. This uses the http: protocol and a domain name that is owned by the World Health Organization. Anyone can therefore construct a URI for any ICD-10 code (e.g. http://id.who.int/icd/release/10/code/A00.0) and, if there is any doubt, de-reference it to get validation from the WHO that it is indeed recognized.
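A minimal sketch of the "formula" half of this approach, using the WHO pattern quoted above. The function name `icd10_uri` and the whitespace trimming are assumptions of this example; the de-referencing step is not shown because the text does not specify how the WHO service responds.

```python
# Pattern as declared in the text; only the {code} placeholder name is ours.
ICD10_TEMPLATE = "http://id.who.int/icd/release/10/code/{code}"


def icd10_uri(code: str) -> str:
    """Build the URI for an ICD-10 code from the published pattern."""
    return ICD10_TEMPLATE.format(code=code.strip())


print(icd10_uri("A00.0"))  # http://id.who.int/icd/release/10/code/A00.0
```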

We still need to maintain the distinction between the resource itself and a description of the resource. A given resource may have many descriptions in many different formats. Descriptions may originate from the parent organization (e.g. the description of ICD-10 A00.0 comes from the WHO) or may come from secondary resources (e.g. IHTSDO to ICD-10 Map, usage notes for a given clinic, annotations, etc.). For this reason it is critical to differentiate the URI for the resource from the URI (usually URL) of a description of the resource. As an example, the CTS2 syntax states that you can ask a service for all known descriptions of a resource using the URL http://<service base>/entitybyuri?uri={resource uri}. For instance, to find out what the CTS2 service hosted at Mayo says about the SNOMED-CT code "74400008", one would use: http://informatics.mayo.edu/cts2/services/py4cts2/entitybyuri?uri=http://snomed.info/id/74400008
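The difference between the resource URI and the URL of a description can be made concrete with a small helper that builds the CTS2 query shown above. `entity_by_uri_url` is a hypothetical name. By default it reproduces the unencoded form used in the text; the optional percent-encoding of the `uri` parameter is an assumption of this sketch (general URL hygiene), not something the text says CTS2 requires.

```python
from urllib.parse import quote


def entity_by_uri_url(service_base: str, resource_uri: str, encode: bool = False) -> str:
    """Build the URL that asks a CTS2 service what it knows about a resource.

    resource_uri identifies the resource; the returned URL identifies one
    service's descriptions of it.
    """
    value = quote(resource_uri, safe="") if encode else resource_uri
    return f"{service_base.rstrip('/')}/entitybyuri?uri={value}"


MAYO_BASE = "http://informatics.mayo.edu/cts2/services/py4cts2"
APPENDICITIS = "http://snomed.info/id/74400008"

print(entity_by_uri_url(MAYO_BASE, APPENDICITIS))
print(entity_by_uri_url(MAYO_BASE, APPENDICITIS, encode=True))
```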

Known URIs

Code System

A Code System represents a collection of codes that describe "concept" resources (classes, categories, "terms" in some contexts, etc.). Code Systems evolve over time by adding, modifying and removing resource descriptions. The URI for a code system refers to the aspects of the code system that are version independent - who publishes it, what it is for, how often it is released, rules that govern access to the resource, etc.

SNOMED CT

SNOMED CT has two different "flavors" of resource that fall under the rubric "code system". The first is the notion of an "edition" or "extension", such as the SNOMED CT International Edition, the SNOMED CT US Extensions, etc. The second is that of a module - a collection of statements asserted by a particular resource. The following identifier patterns are in the final stages of being approved by IHTSDO:

| Flavor | Pattern | Example | Resource |
|---|---|---|---|
| Extension | http://snomed.info/sct/{sctid for "root" module} | http://snomed.info/sct/900000000000207008 | SNOMED CT International Edition |
| Module | http://snomed.info/id/{sctid for module} | http://snomed.info/id/900000000000207008 | SNOMED CT Core Module |
| | | http://snomed.info/id/900000000000012004 | SNOMED CT model component |

ICD and other WHO classifications

| Pattern | Example | Resource |
|---|---|---|
| http://id.who.int/icd/release/{release} | http://id.who.int/icd/release/9 | ICD-9 |
| | http://id.who.int/icd/release/9-CM | ICD-9-CM |
| | http://id.who.int/icd/entity | ICD Linearization Layer (part of the ICD-11 project) |
| | http://id.who.int/icd/11/morbidity | ICD-11 Morbidity Linearization |

Code Systems in the UMLS but NOT defined by the source agency

| Pattern | Example | Resource |
|---|---|---|
| http://umls.nlm.nih.gov/sab/{RSAB} | http://umls.nlm.nih.gov/sab/LNC | LOINC |
| | http://umls.nlm.nih.gov/sab/MTH | UMLS Metathesaurus |

HL7 Code Systems

The HL7 code systems, while present in the UMLS, are represented there as a monolithic system - there is no separate code system for ActCode or AdministrativeGender. HL7 has proposed the following:

| Pattern | Example | Resource |
|---|---|---|
| http://hl7.org/vocab/v3/{CodeSystemName} | http://hl7.org/vocab/v3/AdministrativeGender | HL7 V3 Administrative Gender codes |

Other Code Systems

The remaining code systems tend to come from the W3C space and, with the exception of Language and MIME Type, have well-known URIs.

| URI | Description |
|---|---|
| http://www.w3.org/2004/02/skos/core#prefLabel | SKOS "preferred label" property |
| http://www.w3.org/2004/02/skos/core#Concept | SKOS Concept class |
| http://www.w3.org/1999/02/22-rdf-syntax-ns#type | RDF type property |
| http://www.w3.org/2000/01/rdf-schema#subPropertyOf | RDF Schema "subPropertyOf" property |
| http://www.w3.org/2002/07/owl#Class | OWL Class class |
| http://purl.org/dc/terms/ | Dublin Core Terms |
| (TBD) | IETF RFC 5646 Language Code (http://tools.ietf.org/html/rfc5646) |
| http://www.w3.org/2001/XMLSchema | XML Schema Data Types |
| http://www.iana.org/assignments/media-types/ | IANA Media Types (proposed - not official with IANA) |

Code System Versions

A Code System Version represents a "release" or "snapshot" of a code system at a point in time. It is a Code System Version that actually makes assertions about entities. Code System Versions form a partial order, but one cannot assume that (a) the ordering is linear or (b) dates or timestamps can be used to determine which came first. The relationship between two versions should be stated explicitly, as predecessor / successor relationships.
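The sketch below illustrates what a partial order with explicit predecessor links looks like in practice. The version URIs under http://example.org/ and the `PREDECESSORS` table are invented for this example; `precedes` simply follows the stated predecessor links and never compares dates or version strings.

```python
from typing import Dict, List, Set

BASE = "http://example.org/cs/version/"  # hypothetical namespace

# Each version lists its direct predecessor(s). B and C both branch from A,
# and D merges them, so the ordering is not linear.
PREDECESSORS: Dict[str, List[str]] = {
    BASE + "A": [],
    BASE + "B": [BASE + "A"],
    BASE + "C": [BASE + "A"],
    BASE + "D": [BASE + "B", BASE + "C"],
}


def precedes(earlier: str, later: str) -> bool:
    """True if `earlier` is reachable from `later` via predecessor links."""
    seen: Set[str] = set()
    stack = list(PREDECESSORS.get(later, []))
    while stack:
        version = stack.pop()
        if version == earlier:
            return True
        if version not in seen:
            seen.add(version)
            stack.extend(PREDECESSORS.get(version, []))
    return False


print(precedes(BASE + "A", BASE + "D"))  # A is an ancestor of D
print(precedes(BASE + "B", BASE + "C"))  # B and C are on separate branches
print(precedes(BASE + "C", BASE + "B"))
```

Versions B and C are incomparable: neither precedes the other, which is exactly the case that a date-based comparison would get wrong.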

SNOMED CT

| Flavor | Pattern | Example | Resource |
|---|---|---|---|
| Extension | http://snomed.info/sct/{sctid for "root" module}/version/{release id} | http://snomed.info/sct/900000000000207008/version/20120731 | SNOMED CT International Edition July 2012 Release |
| Module | http://snomed.info/id/{sctid for module}/version/{release id} | http://snomed.info/id/900000000000207008/version/20120731 | SNOMED CT Core Module July 2012 Release |
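Going the other way, a versioned SNOMED CT URI of the shape shown above can be taken apart with a regular expression. This is a sketch only: `parse_sct_version_uri` is a hypothetical helper, and the assumption that both the sctid and the release id consist solely of digits is drawn from the examples in the table, not from a published IHTSDO rule.

```python
import re
from typing import Optional, Tuple

# Assumption: sctid and release id are digit strings, as in the examples.
SCT_VERSION_RE = re.compile(r"^http://snomed\.info/(sct|id)/(\d+)/version/(\d+)$")


def parse_sct_version_uri(uri: str) -> Optional[Tuple[str, str, str]]:
    """Return (flavor, module sctid, release id), or None if the URI does not match."""
    match = SCT_VERSION_RE.match(uri)
    if match is None:
        return None
    flavor = "extension" if match.group(1) == "sct" else "module"
    return flavor, match.group(2), match.group(3)


print(parse_sct_version_uri("http://snomed.info/sct/900000000000207008/version/20120731"))
print(parse_sct_version_uri("http://snomed.info/id/900000000000207008/version/20120731"))
print(parse_sct_version_uri("http://snomed.info/id/74400008"))  # unversioned, so None
```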

ICD and other WHO classifications

| Pattern | Example | Resource |
|---|---|---|
| http://id.who.int/icd/release/{release}/{version} | http://id.who.int/icd/release/9/2008 | ICD-9 2008 Version |
| | http://id.who.int/icd/release/9-CM/2010 | ICD-9-CM 2010 |
| | http://id.who.int/icd/entity | ICD Linearization Layer (TBD) |
| | http://id.who.int/icd/11/2016/morbidity | ICD-11 Morbidity Linearization 2016 Version |

Code Systems in the UMLS but NOT defined by the source agency

| Pattern | Example | Resource |
|---|---|---|
| http://umls.nlm.nih.gov/sab/{RSAB}/version/{VSAB} | http://umls.nlm.nih.gov/sab/LNC/version/LNC238 | LOINC version 238 |
| | http://umls.nlm.nih.gov/sab/MTH/version/MTH2012AB | UMLS Metathesaurus 2012AB release (note: not finalized with NLM) |

HL7 Code Systems

The HL7 version identifier is drawn from the version identifier of the HL7 model.

| Pattern | Example | Resource |
|---|---|---|
| http://hl7.org/vocab/v3/{CodeSystemName}/version/{hl7 version id} | http://hl7.org/vocab/v3/AdministrativeGender/212 | HL7 V3 Administrative Gender codes version 212 |

Other Code Systems

Version identifiers in other code systems such as SKOS and RDF follow a variety of patterns.

| URI | Description |
|---|---|
| http://www.w3.org/2004/02/skos/core# | SKOS |
| http://www.w3.org/1999/02/22-rdf-syntax-ns# | RDF |
| http://www.w3.org/2000/01/rdf-schema# | RDF Schema |
| http://www.w3.org/2002/07/owl# | OWL 1 |
| http://purl.org/dc/terms/ | Dublin Core Terms |
| http://www.loc.gov/standards/iso639-2/ | ISO 639-2 language codes (proposed - not official via Library of Congress) |
| http://www.w3.org/2001/XMLSchema | XML Schema Data Types |
| http://www.iana.org/assignments/media-types/ | IANA Media Types (proposed - not official with IANA) |


Concept Identifiers

Concept (entity, class, property, individual, category and, in some contexts, "term") URIs are typically (but not necessarily) constructed from the namespace of the "primary" or "defining" code system. Note, however, that a connection between the scoping namespace of a concept URI and a code system cannot be assumed. It is possible for a concept namespace to be completely separate from the defining code system. Note also that version identifiers should not be part of a concept identifier, as the question that needs to be asked is "what does version V of code system S say about concept C?".
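One way to picture why the version stays out of the concept identifier is to key assertions by the pair (code system version URI, concept URI). The sketch below does this with a plain dictionary. The second release id (20130131) and the description strings are placeholders invented for the example, and `what_does_version_say` is a hypothetical helper.

```python
from typing import Dict, Optional, Tuple

SCT_2012_07 = "http://snomed.info/sct/900000000000207008/version/20120731"
SCT_2013_01 = "http://snomed.info/sct/900000000000207008/version/20130131"  # hypothetical release id
APPENDICITIS = "http://snomed.info/id/74400008"  # concept URI carries no version

# (code system version URI, concept URI) -> what that version asserts.
ASSERTIONS: Dict[Tuple[str, str], str] = {
    (SCT_2012_07, APPENDICITIS): "placeholder description from the July 2012 release",
    (SCT_2013_01, APPENDICITIS): "placeholder description from a later release",
}


def what_does_version_say(version_uri: str, concept_uri: str) -> Optional[str]:
    """Answer: what does version V of code system S say about concept C?"""
    return ASSERTIONS.get((version_uri, concept_uri))


print(what_does_version_say(SCT_2012_07, APPENDICITIS))
print(what_does_version_say(SCT_2013_01, APPENDICITIS))
```

The concept URI is identical in both lookups; only the version half of the key changes.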

SNOMED CT

| Pattern | Example | Resource |
|---|---|---|
| http://snomed.info/id/{sctid} | http://snomed.info/id/74400008 | Appendicitis |
| | http://snomed.info/id/447565001 | Virtual therapeutic moiety simple reference set |
| | http://snomed.info/id/900000000000207008 | SNOMED CT core module |
| | http://snomed.info/id/127489000 | Has active ingredient attribute |

ICD and other WHO classifications

| Pattern | Example | Resource |
|---|---|---|
| http://id.who.int/icd/release/{release}/code/{code} | http://id.who.int/icd/release/9/code/291.0 | Delirium Tremens in ICD-9 |
| | http://id.who.int/icd/release/9-CM/code/297.3 | Shared Psychotic Disorder |
| http://id.who.int/icd/entity/{code} | http://id.who.int/icd/entity/1634725920 | ICD Linearization Layer definition of Acute Myocardial Infarction |
| http://id.who.int/icd/11/{module}/{code} | http://id.who.int/icd/11/morbidity/A00.0 | ICD-11 Morbidity Linearization Code A00.0 |

Code Systems in the UMLS but NOT defined by the source agency

| Pattern | Example | Resource |
|---|---|---|
| http://umls.nlm.nih.gov/sab/{RSAB}/code/{code} | http://umls.nlm.nih.gov/sab/LNC/code/15369-2 | LOINC Code for Lithium/Saliva |
| | http://umls.nlm.nih.gov/sab/MTH/code/C0264716 | UMLS Metathesaurus Code for Chronic Heart Failure |

HL7 Code Systems

The HL7 code systems, while present in the UMLS, are represented there as a monolithic system - there is no separate code system for ActCode or AdministrativeGender. HL7 has proposed the following:

| Pattern | Example | Resource |
|---|---|---|
| http://hl7.org/vocab/v3/{CodeSystemName}/code/{code} | http://hl7.org/vocab/v3/AdministrativeGender/code/M | HL7 V3 Administrative Gender code for "Male" |

Other Code Systems

| URI | Description |
|---|---|
| http://www.w3.org/2004/02/skos/core# | SKOS |
| http://www.w3.org/1999/02/22-rdf-syntax-ns# | RDF |
| http://www.w3.org/2000/01/rdf-schema# | RDF Schema |
| http://www.w3.org/2002/07/owl# | OWL 1 |
| http://purl.org/dc/terms/ | Dublin Core Terms |
| http://www.loc.gov/standards/iso639-2/ | ISO 639-2 language codes (proposed - not official via Library of Congress) |
| http://www.w3.org/2001/XMLSchema | XML Schema Data Types |
| http://www.iana.org/assignments/media-types/ | IANA Media Types (proposed - not official with IANA) |

Value Sets

Maps

Concept Domains

References

  1. Information technology - Metadata registries (MDR) - Part 3: Registry metamodel and basic attributes. ISO/IEC FDIS 11179-3:2012(E), 2012-07-31.
  2. ITU-T Recommendation X.667 (09/2004): http://www.itu.int/rec/dologin.asp?lang=e&id=T-REC-X.667-200409-S!!PDF-E
  3. Time-based (version 1) UUIDs are based on the MAC address - a scoping namespace established by the manufacturers of network interface cards.
