- 1 Introduction
- 2 Download Semantator
- 3 Functionalities
- 4 Future Exploration
Querying and browsing data embedded in biomedical text is an important and challenging task. The emerging semantic web techniques envision a web of data which allows users to browse and query information of interest within documents directly. A prerequisite of this goal is to develop methods to produce decent structured data, i.e., converting data originally in free text into structured formats. Fully automatic approaches for data extraction are preferred but they do not always give satisfying results, and relying on manual annotation may not be realistic due to the large volume of text that needs to be processed. Therefore, semi-automatic data curation, where information from biomedical text is automatically extracted and then manual efforts are used to refine the annotations, is an attractive alternative. In addition, the results from semi-automatic processes could potentially serve as training sets for automatic systems to further improve their performance.
To support semi-automatic curation we developed Semantator, a user-friendly, semantic-web-oriented environment for browsing and querying the annotated data, as well as interactively refining annotation results if needed. Semantator is implemented as a Protege plug-in that allows users to view the annotation in its original context, the ontology used for annotation, and the annotation results in the same environment. Semantator provides two modes: (1) manual annotation; and (2) semi-automatic annotation. In the manual annotation mode, a human expert curator can choose a document to be annotated and a domain ontology, highlight different pieces of information in the original text, and then mark which ontology concepts the information belongs to. The system generates class instances according to the curator’s annotation and displays instances of different classes in different colors. Curators can also link the instances together using the properties defined in the domain ontology. In the semi-automatic annotation mode, users can choose among automatic annotation tools such as the National Center for Biomedical Ontology (NCBO) Annotator and Mayo Clinic’s Clinical Text Analysis and Knowledge Extraction System (cTAKES), which are well-established tools for annotating biomedical and clinical text. Curators can review the annotation results in the Semantator environment and modify them as needed. The annotation results are saved in RDF so that they can be used by tools developed by the semantic web community for querying and reasoning. In addition, Semantator provides an interface where users can compare annotations made by different curators or annotation tools, determine inter-annotator agreement, and resolve conflicts among different annotations.
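How Semantator computes inter-annotator agreement is not described here. As a rough illustration of what comparing two curators' annotations can involve, the sketch below scores exact-match agreement on (start, end, class) triples and lists the annotations on which the curators differ. The `Annotation` type, the function names, and the sample data are hypothetical and are not part of Semantator.

```python
from typing import List, NamedTuple, Set


class Annotation(NamedTuple):
    start: int  # character offset where the annotated text begins
    end: int    # character offset just past the annotated text
    cls: str    # ontology class chosen by the curator (hypothetical name)


def exact_match_agreement(a: Set[Annotation], b: Set[Annotation]) -> float:
    """F1-style agreement: 2 * |a & b| / (|a| + |b|)."""
    if not a and not b:
        return 1.0  # two empty annotation sets agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


def conflicts(a: Set[Annotation], b: Set[Annotation]) -> List[Annotation]:
    """Annotations produced by only one of the two curators."""
    return sorted(a ^ b)


curator_1 = {Annotation(0, 9, "Event"), Annotation(15, 25, "Duration")}
curator_2 = {Annotation(0, 9, "Event"), Annotation(15, 25, "Event")}

score = exact_match_agreement(curator_1, curator_2)
to_resolve = conflicts(curator_1, curator_2)
```

Exact matching is the strictest possible criterion; a more lenient comparison might count overlapping spans as agreement, which would be a different design choice.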
Download Semantator (Version 1.0) - Released February 16th 2012.
- Download the zip file.
- Unzip the zip file into the 'plugins' directory of Protege 4.1 (or newer).
- Move semantator.properties up one directory (i.e., to Protege's installation directory).
CNTRO Timeline API Library Download Instructions.
Similar to Knowtator, the popular tool for manual annotation, Semantator also supports instance creation/deletion, relationship creation/deletion and generating equivalent instances. In this section, we will introduce each of these functionalities with relevant screenshots. Our tool is currently under testing and will be available soon.
Before a user can do any annotation, he/she needs to load a plain text document into Semantator. Click File -> Open and select a plain text file from the local disk. Please make sure to use the Semantator File button instead of the Protege File button.
We provide two approaches for a user to create instances: One-at-a-Time and Batch Creation.
A user can create a single instance at a time with the following steps:
1. Select a piece of text from the loaded document.
2. Right click -> Create Instance.
3. Choose a class from the class list in the popup window.
4. Click “Done”.
5. The first time a user creates an instance of a particular class, he/she will be asked to choose a datatype property to store the selected text; the system will also ask the user to pick a color to highlight all instances of the selected class.
The first method might not work well when the loaded document is long and there are many instances of the same class that a user wants to annotate. To alleviate this problem, we provide a second option for creating instances, Batch Creation, with the following steps:
1. Select one piece of text from the loaded document.
2. Right click -> Add to Instance Creation List.
3. The selected text will be added to the “Instances” panel, which displays all the selected document pieces for creating new instances.
4. In the menu, click Create -> Instances.
5. Steps 3-5 of the One-at-a-Time procedure apply here.
Please note that, for Batch Creation, we assume that all selected document pieces will be used to create instances of the same class.
A user also has the option to delete any of the created instances with the following steps:
1. Move the mouse cursor to a place between the start and end positions of an instance.
2. Right click -> Delete -> Delete [instance].
Because multiple instances could overlap on some text in the document, the system detects all instances that cover the position of the mouse cursor, and users can then choose which of these instances to delete. Figures 8 and 9 demonstrate this deletion process.
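The lookup of all instances under the cursor can be pictured as a simple interval test. The following is a minimal sketch, assuming each instance is stored with character offsets into the loaded document; the `Instance` type and `instances_at` function are hypothetical names, not Semantator's actual implementation.

```python
from typing import List, NamedTuple


class Instance(NamedTuple):
    name: str
    start: int  # offset of the first annotated character (inclusive)
    end: int    # offset just past the last annotated character (exclusive)


def instances_at(position: int, instances: List[Instance]) -> List[Instance]:
    """Return every instance whose text span covers the given offset."""
    return [inst for inst in instances if inst.start <= position < inst.end]


doc_instances = [
    Instance("event_1", 0, 20),
    Instance("duration_1", 10, 18),  # overlaps event_1
    Instance("event_2", 30, 42),
]

# A cursor at offset 12 falls inside both overlapping instances,
# so both would be offered as deletion candidates.
candidates = instances_at(12, doc_instances)
```

A linear scan is adequate for a single document's annotations; an interval tree would only be worth considering for very large annotation sets.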
Generate Instance Equivalences
In a real clinical document, it is possible that the same event is mentioned multiple times; therefore, users need a way to generate equivalences between two or more instances, as described below:
1. Move the mouse cursor to a place between the start and end positions of an instance.
2. Right click -> Add SameAs [instance].
3. Choose another instance by repeating step 2.
4. Menu -> Create -> SameAs.
Figures 10 – 12 show this process. Although we only show how to generate equivalences between two instances, users can add more instances to the SameAs pool and generate the links.
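To make the idea of a SameAs pool concrete, the sketch below generates one equivalence link for every pair of instances in a pool. It assumes the links are expressed with the standard `owl:sameAs` property, which this page does not state explicitly; the instance URIs and the function name are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

# Standard OWL property for instance equivalence (assumed, not confirmed,
# to be what Semantator emits).
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(pool: List[str]) -> List[Tuple[str, str, str]]:
    """Build one sameAs triple per unordered pair of instances in the pool."""
    return [(s, OWL_SAME_AS, o) for s, o in combinations(pool, 2)]


# Hypothetical instance URIs for three mentions of the same event.
pool = [
    "http://example.org/doc1#event_1",
    "http://example.org/doc1#event_4",
    "http://example.org/doc1#event_9",
]
links = same_as_triples(pool)
```

Because `owl:sameAs` is symmetric and transitive, a reasoner could infer the same equivalences from a chain of n - 1 links; emitting every pair simply makes the links explicit for tools that do no reasoning.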
Another important type of annotation is the relationship between ontology instances. Within Semantator, users can create a single relationship between two instances at a time with the following procedure:
1. Move the mouse cursor to a place between the start and end positions of an instance.
2. Right click -> Add to Relate -> Relate [instance].
3. Repeat steps 1 and 2 to choose the second instance.
4. Menu -> Create -> Relationships.
5. Choose an object property to be used to link the two selected instances.
6. At the bottom of the popup window, select one instance to be the subject of this new relationship.
7. Click Done.
Figures 13 – 17 demonstrate this process.
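The role of the subject selection in the procedure above can be sketched as follows: the chosen instance becomes the subject of the resulting triple and the other instance becomes its object. The function name, instance names, and the `before` property are illustrative assumptions only.

```python
from typing import Tuple


def make_relationship(first: str, second: str, prop: str,
                      subject: str) -> Tuple[str, str, str]:
    """Return a (subject, property, object) triple for two selected instances.

    `subject` must be one of the two selected instances; the other one
    becomes the object of the relationship.
    """
    if subject not in (first, second):
        raise ValueError("subject must be one of the two selected instances")
    obj = second if subject == first else first
    return (subject, prop, obj)


# Selecting event_2 as the subject yields "event_2 before event_1".
triple = make_relationship("event_1", "event_2", "before", subject="event_2")
```

The direction matters for asymmetric properties such as a temporal "before": swapping subject and object reverses the meaning of the annotation.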
As with instances, users can also delete any of the existing relationships by following the procedure below:
1. Move the mouse cursor to a place between the start and end positions of an instance.
2. Right click -> Delete Relationships -> Delete [instance].
3. Choose one relationship to delete from the popup window -> Delete.
Figures 18 – 19 show this deletion process.
Semantator also allows users to annotate the relationships between instances. For example, a user might want to add customized comments or explanations to the instance relationships he/she created. This annotation process is as follows:
1. Follow Steps 1-3 from Section 2.5.
2. Select a piece of text from the document.
3. Right click -> Annotate with Selected Text.
4. Choose one of the relationships between the two selected instances -> Done.
Figure 20 shows this annotation process.
Furthermore, a user can annotate a relationship with another instance with the following steps:
1. Two instances (e.g., two events) and their relationships (e.g., before) need to be created before more annotations can be added to them.
2. Create another instance (e.g., an instance of the duration class).
3. As when relating two instances, add the two instances to the relationship candidate list.
4. Left click and then right click on the text of the duration instance created in step 2, and choose "Annotate with [Event2]".
5. A new window will pop up, listing all current relationships between the two instances added to the relationship candidate list.
6. Choose any of these relationships to be annotated with the duration instance.
After users finish with their annotation on a loaded document, Semantator allows them to store the annotation results on their local hard disk in the RDF format. This can be done by clicking File -> Save -> Choose the format -> Provide the file path -> OK as shown in Figure 21.
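To give a sense of what saved annotation results might look like, the sketch below writes two triples in N-Triples, one of the standard RDF serializations. Whether Semantator offers this particular format, and what vocabulary it uses, is not stated here: the `http://example.org/annotations#` namespace, the `hasText` property, and the helper names are all hypothetical.

```python
from typing import Iterable, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/annotations#"  # hypothetical namespace


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping characters N-Triples reserves."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize already-formatted terms, one triple per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


triples = [
    # The instance and the ontology class it belongs to.
    (iri(BASE + "event_1"), iri(RDF_TYPE), iri(BASE + "Event")),
    # The annotated text, stored through a (hypothetical) datatype property.
    (iri(BASE + "event_1"), iri(BASE + "hasText"), literal("chest pain")),
]
output = to_ntriples(triples)
```

In practice an RDF library would handle serialization and escaping; the hand-written version is shown only to keep the example free of dependencies.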
In the future, we would like to add more functionalities to Semantator. First, since Knowtator has been extensively used by researchers, we are interested in providing a module that can automatically translate Knowtator output to RDF with the support of a mapping ontology between the Knowtator schema and a domain ontology. Furthermore, we are interested in connecting Semantator to Natural Language Processing (NLP) tools (e.g., cTAKES, OpenNLP) so that the loaded document can be processed automatically and then corrected and augmented through manual annotation. We hope that such automatic tools will make the annotation process less time-consuming. Finally, we would like to enhance Semantator with query capability so that users can issue queries (e.g., in SPARQL) over the annotation results.