"Addison's Disease" is a member of
"Diseases of the Adrenal Glands" is a member of
"Diseases of the endocrine system" is a member of
"Primary adrenocortical insufficiency" is a member of
"Other disorders of adrenal gland" is a member of
"Disorders of other endocrine gland"
"Addison's Disease" is a member of
"Adrenal Gland Hypofunction" is a member of
"Adrenal Gland Diseases" is a member of
"Endocrine Diseases" is a member of
This example on Addison's disease is borrowed from the presentation "The Unified Medical Language System (UMLS): What Is It and How to Use It?" by Olivier Bodenreider, Jan Willis, and William Hole.
To cope with the chaos these competing classification schemes bring into play, the UMLS forgoes any attempt to organize information in a logically rigorous fashion, focusing instead on being comprehensive and practical, à la Guys in the Garage. By design, it is basically a hodgepodge of source vocabularies without any true description logic or precise ontology at its core. It is still an incredible accomplishment, however: it ties together over a hundred distinct source vocabularies and contains many millions of terms, which it is able to map between the disparate sources. Even more remarkably, it is a true superset of many of its sources. A complete UMLS database (many gigabytes in size) can be algorithmically manipulated to generate a byte-for-byte exact duplicate of many of its source databases.
The UMLS consists of two main parts: the Metathesaurus and the Semantic Network.
The UMLS Metathesaurus
This is basically the granddaddy of medical thesauruses, and it is the core of the UMLS. It has been generated by processing all of the source vocabularies of the UMLS, using a combination of automated processing software and extensive hand-editing by human editors. All the vocabulary is organized into Concepts, Terms, Strings, and Atoms.
If the processing software determines that a vocabulary name has not yet been seen in another vocabulary, a human editor will try to determine whether it is just a different name for something already in the database or something completely new. If it is new, then a new Concept is created in the UMLS and given a unique concept id (called a CUI). So, for instance, somewhere in the UMLS there would be a concept of "Headache".
If it is a completely different name for a known concept, it is linked to that concept's CUI and declared a new Term for that concept. So, for instance, "Cephalgia" is a different Term linked to headache. Each Term is given a unique identifier (called an LUI, for Lexical Unique Identifier).
If it is just a minor textual variant of a known term, then it is just declared a new String and given a unique string id (called an SUI). So, for instance, capitalized "Headache" would be a different string from lowercase "headache".
Finally, since the thesaurus carefully tracks where each vocabulary item originates, each appearance of a vocabulary item is assigned a unique Atom (referenced by its AUI), a record in the database that stores detailed information about where this appearance of the word was found.
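The four layers described above can be made concrete with a small toy model. This is only a sketch under stated assumptions, not the real UMLS schema or tooling: the class name `ToyMetathesaurus`, the `same_concept_as` argument (a stand-in for the human editor's judgment), the lowercase "normalization", the id formats, and the source names `SRC_A`, `SRC_B`, and `SRC_C` are all made up for illustration.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ToyMetathesaurus:
    """Toy Concept > Term > String > Atom layering (not the real UMLS schema)."""
    cui_of_lui: dict = field(default_factory=dict)   # term id -> concept id
    lui_of_norm: dict = field(default_factory=dict)  # normalized text -> term id
    lui_of_sui: dict = field(default_factory=dict)   # string id -> term id
    sui_of_text: dict = field(default_factory=dict)  # exact text -> string id
    atoms: dict = field(default_factory=dict)        # atom id -> (string id, source)
    _ids: dict = field(default_factory=lambda: {p: count(1) for p in "CLSA"})

    def _new_id(self, prefix):
        # Hypothetical id format: one letter plus a zero-padded counter.
        return f"{prefix}{next(self._ids[prefix]):07d}"

    def add(self, text, source, same_concept_as=None):
        """Register one appearance of `text` in `source`; return (CUI, LUI, SUI, AUI)."""
        norm = text.lower()  # crude stand-in for real lexical normalization
        lui = self.lui_of_norm.get(norm)
        if lui is None:
            # Unseen term: the "editor" either links it to a known concept
            # or lets a brand-new concept be created.
            if same_concept_as is not None:
                known_lui = self.lui_of_norm[same_concept_as.lower()]
                cui = self.cui_of_lui[known_lui]
            else:
                cui = self._new_id("C")
            lui = self._new_id("L")
            self.lui_of_norm[norm] = lui
            self.cui_of_lui[lui] = cui
        sui = self.sui_of_text.get(text)
        if sui is None:
            # Same term, new exact spelling: a new String.
            sui = self._new_id("S")
            self.sui_of_text[text] = sui
            self.lui_of_sui[sui] = lui
        # Every appearance in a source gets its own Atom.
        aui = self._new_id("A")
        self.atoms[aui] = (sui, source)
        return self.cui_of_lui[lui], lui, sui, aui


m = ToyMetathesaurus()
first = m.add("Headache", "SRC_A")                                # new concept
variant = m.add("headache", "SRC_B")                              # new string only
synonym = m.add("Cephalgia", "SRC_B", same_concept_as="Headache") # new term
repeat = m.add("Headache", "SRC_C")                               # new atom only
```

In this toy run all four calls end up under one concept id, "Cephalgia" gets its own term id, the lowercase spelling gets its own string id, and the repeated "Headache" from a third source differs from the first only in its atom id.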
The Semantic Network
Although no rigorous logical framework exists inside the UMLS, there is a simplistic "semantic network" that allows one to determine the basic relationship between two related concepts. Again, this is just a pragmatic add-on to the main database that is not guaranteed to give any kind of scientifically rigorous answers. It consists of a web of a little over a hundred broad categories (semantic types) connected by a few dozen kinds of relationships. For instance, it contains the idea that a disease may be found in an organ:
"Body Part, Organ, or Organ Component" --location-of--> "Disease or Syndrome"
Using data in the UMLS, one can find linkages to this idea from Addison's disease, for instance. This makes it possible to learn that the Adrenal Cortex is a part of an organ and that Addison's disease is a disease that is located in that organ, but, again, only in a somewhat haphazard fashion.
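As a rough illustration of that kind of lookup, here is a minimal sketch. The two dictionaries are hand-entered toy data built from the example above, not an extract of the UMLS, and the function name `possible_relations` is made up for this sketch.

```python
# Toy data, entered by hand for illustration only.
# Concept name -> semantic type assigned to it.
SEMANTIC_TYPE = {
    "Addison's disease": "Disease or Syndrome",
    "Adrenal Cortex": "Body Part, Organ, or Organ Component",
}

# Semantic network edges: (subject type, relationship, object type).
NETWORK = {
    ("Body Part, Organ, or Organ Component", "location-of", "Disease or Syndrome"),
}


def possible_relations(subject, obj):
    """List relationships the network allows between the types of two concepts."""
    s_type = SEMANTIC_TYPE[subject]
    o_type = SEMANTIC_TYPE[obj]
    return sorted(rel for (a, rel, b) in NETWORK if a == s_type and b == o_type)


forward = possible_relations("Adrenal Cortex", "Addison's disease")
backward = possible_relations("Addison's disease", "Adrenal Cortex")
```

Note what the lookup does and does not say: it reports that a body part can in principle be the location of a disease, because the relationship is defined between the two broad types. It does not by itself establish that this particular organ is the location of this particular disease, which is part of why the answers are only loosely reliable.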
The UMLS and the Future of Medical Software
Although it is hard to know what knowledge representation in medical software of the future will look like, it is clear that it will most likely involve some kind of system in which doctors and other domain experts (the Writers) can enter scientifically rigorous information directly into a software application. Such a software system would have an extensive understanding of differing medical vocabularies in order to cope with all entered data, even if it is entered in unexpected formats. Having a super medical thesaurus, like the UMLS, will be a key step in making such a system possible.