THE ANNOTATION GUIDELINE MANUAL:
EXTRACTING ADVERSE DRUG EVENT INFORMATION FROM
DISCHARGE SUMMARIES AND PROGRESS NOTES IN ELECTRONIC MEDICAL RECORDS
Version 2.0 May 18, 2015
Steven Belknap, Elaine Freund, Nadya Frid, Zuofeng Li, Rashmi Prasad, Balaji Ramesh, Hong Yu
Contents
INTRODUCTION
  General Background
  Guidelines Background
NAMED ENTITY OR ANNOTATION FIELDS*
  PHI and PII Annotation
  Medication Annotation: Drug and Drug Attributes
  Medication Annotation: Medication Related Entities and Attributes
  Assertion Category
  MedDRA Annotation
ANNOTATION OF RELATIONS
ANNOTATION PRACTICE
  General Considerations
  Choosing a span
  Severity
  Anaphoric pronouns
  Articles
  Titles
  Prepositions
  Frequency
  Drugs
  Test results
  Longitudinal Information
REFERENCES
APPENDIX 1: Additional Annotation Practice Examples and Rules for Interannotator Agreement
  Choosing a span
  S/S/LIF
  Titles
  Assertion
  Anaphoric Pronouns vs. Co-referent Items
  Drug
  Adverse Event
  MedDRA
  Indication
  Severity
  Test Results
  PHI
  General Terms
APPENDIX 2: Entity and Attribute Tables
APPENDIX 3: “Protected Health Information (PHI)” and “Personally Identifiable Information (PII)”
APPENDIX 4: Routes of Drug Administration and Abbreviations
APPENDIX 5: Frequency of Drug Administration and Abbreviations
APPENDIX 6: Reference Tables
APPENDIX 7: Annotation Tool Notes
  NLP Objectives
  Tool Use Notes
  MedDRA
  Deviations from i2b2 Guidelines
  Export the Annotation to XML
  Semi-Automated Annotation with the BioNLP named entity tagger (Lancet)
  Summary of Annotation Processes and Tooling Changes for the ADE Pharmacovigilance Project
INTRODUCTION
General Background
An adverse event (AE) is an injury to a patient, and an adverse drug event (ADE) is "an injury
resulting from a medical intervention related to a drug" (1). ADEs are common and occur at a
rate of 2.4–5.2 per 100 hospitalized adult patients (1–4). Each ADE is estimated to increase the
length of hospital stay by 2.2 days and to increase the hospital cost by $3,244 (3,5). Severe
ADEs are between the fourth and sixth leading causes of death in the United States (6).
Significant healthcare savings could be realized through prevention of ADEs and through early
detection and mitigation of ADEs (5,7,8). When a clinician recognizes an ADE, a hospital
system typically prompts an appropriate response, such as discontinuation of the drug,
adjustment of dose, administration of an antidote (e.g., blood transfusion, antihistamines,
antiarrhythmics, or intravenous fluid resuscitation), or other action. While particular instances of
ADEs may be recognized and appropriately ameliorated, these events are often not coded in
diagnostic or billing fields of the medical record and are therefore “lost” to
pharmacoepidemiologists, regulatory agencies, and clinicians. One result of this loss is a
paucity of high-quality information that can lead to errors in assessment of toxicity from cancer
drugs (9). The lack of timely and accurate ADE information has led to confusion for patients and
prescribers, especially when the FDA takes regulatory action (10) that appears to be
inconsistent with the available data, as recently happened with clopidogrel (11).
Studies have shown that the occurrence of an ADE is often buried in the EMR narrative
(e.g., (12)). The ADE is not separately recorded as a diagnosis code or as other data
accessible in the structured fields and is therefore difficult to detect and assess. However,
manual abstraction of data from discharge notes and from other unstructured text remains a
significant impediment to progress in pharmacovigilance research. Rapid, accurate, and
automated detection of ADEs in any patient population would provide significant cost and
logistical advantages over manual ADE detection (e.g., chart review or voluntary reporting) (13).
Consequently, robust biomedical natural language processing (BioNLP) approaches that
accurately detect ADEs in EMR narratives would be of great interest to other pharmacovigilance
researchers and also would have potential application in clinical settings.
Current projects utilizing EMR annotation involve clinical narratives from cancer, cardiovascular
and diabetes patients.
Guidelines Background
These guidelines are being used to annotate patient Electronic Medical Records (EMRs), which
will be made publicly available as a corpus with high-quality annotation of ADEs. This corpus will
also be used to train an innovative NLP system that is part of a pharmacovigilance toolkit. The
toolkit will be integrated into the open source translational research platform i2b2 (14), so these
annotation guidelines generally align with the i2b2 guidelines (14). The annotation objectives are
to identify relevant named entities (diseases, medications, and ADEs); the discourse
relations between them (e.g., causal, temporal, and contrastive relations); severity; and the
Naranjo elements used for assessing causality.
The annotation tools use Protégé with the Knowtator plugin (15) and incorporate HHS PHI and
PII terms, the Naranjo scoring system (16), and MedDRA (17) terms in the user interface. The
guidelines have been iteratively developed during usage and with experts across many
domains. The guidelines and tooling will continue to develop and be refined throughout the
annotation process and as research progresses.
Short videos demonstrating use of the annotation tooling are available (you may want to use
another browser if the links do not open in IE). Alternatively you can go to the UMass BioNLP
Annotation Resource Page:
1 Getting Started - Annotation
2 Annotation Tool Orientation
3 First Annotation PHI
4 Spans and Corrections
5 Relations Annotation
6 Adverse Events and MedDRA
7 More on Attributes
In brief, you will open a record in the annotation tool and it will look similar to the picture below.
The first panel lists the classes [1], the second panel is the medical record window [2] and the
third panel is an attribute annotation window [3]. To annotate most classes, click the class in the
left panel or in the fast annotate bar [4] and highlight it in the middle panel. Some additional
attributes and associations [5] are made from the class panel and the annotation window. A few,
such as Period, are made from the annotation window alone.
There is a website with the annotation guidelines, videos on how to use the tool, and other
resources. http://ummsres12.umassmed.edu/jt/index.php/annotation
NAMED ENTITY OR ANNOTATION FIELDS*
PHI and PII Annotation
To enact the Health Insurance Portability and Accountability Act (HIPAA) (18), the Dept. of
Health and Human Services published a national standard for the electronic exchange, privacy
and security of health information. The “Privacy Rule” protects all individually identifiable health
information transmitted in any form and calls this information “Protected Health Information
(PHI)” and “Personally Identifiable Information (PII).” There are 18 common identifiers
associated with PHI and PII that must be removed to de-identify data for use or release.
These include name, address, date, and Social Security Number; the complete list of PHI
is in Appendix 3. PHI is annotated both to build named entity recognition in
NLP and for removal during de-identification. How the PHI classes are to be used is
described below.
Date: This class covers all aspects of date (except year) directly related to an individual,
including birth date, admission date, discharge date, date of death.
Age over 89: Another date identifier applies to all ages over 89 and all elements of dates
(including year) indicative of such age, except that such ages and elements may be aggregated
into a single category of age 90 or older.
Medical Record Number: Use this class to include medical record numbers, health beneficiary
plan numbers and account numbers of any type.
Social Security Number: self-explanatory
Location: It will be valuable for machine learning to annotate address with some granularity.
Most of these location identifiers are self-explanatory but Named Sites would include things
such as Universities, Organizations, named buildings, Landmarks, etc.
Name: All aspects of any name are to be annotated, first name, last name, initials, names
following titles and indicators, nicknames, logins, handles.
Identifiers: This class covers certificate/license numbers; vehicle identifiers and serial numbers
including license plates; device identifiers and serial numbers; and biometric identifiers (which
would be mostly images and we almost surely will not see that type of data).
Electronic Identifiers: e-mail, web sites, IP addresses, username and password
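The aggregation rule in the "Age over 89" class above can be illustrated with a minimal Python sketch. The function name and the output string "90 or older" are illustrative assumptions; they are not part of the annotation tool.

```python
def deidentify_age(age_years: int) -> str:
    """Return an age as text, aggregating any age over 89 into one category.

    Hypothetical helper: the name and the exact output wording are assumptions.
    """
    if age_years < 0:
        raise ValueError("age must be non-negative")
    # Ages over 89 are reported only as the single aggregated category.
    return "90 or older" if age_years > 89 else str(age_years)
```

With this sketch, an age of 89 is returned unchanged, while 90 and 97 both map to the same aggregated category.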
* Classes are underlined in the color used as a highlighter in annotation.
Medication Annotation: Drug and Drug Attributes (see footnote 2)
When an adverse event is recognized, a physician will discontinue the drug, adjust the
dose, or administer an antidote. Drug and drug-specific attributes are therefore important
elements to annotate. This information will be used to assess causal relations between an
adverse event and drug administration.
Field: Drug name [Entity]
Definition: Substances the patient has experienced or will experience, including drug class
names and medications referred to with pronouns. The drug name must be mentioned either in
the USP published drug list or in the Orange Book.
Examples:
Eg 1: Lotensin 20 mg p.o. daily.
Eg 2: He was started on azithromycin and ceftriaxone.

Field: Dosage [Attribute]
- Type (Discrete/Continuous)
- Strength (Concentration/Amount)
- Form (solid, tablet, liquid, injectable, cream)
Definition: The amount of a single medication used in each administration. Quantified
description of the drug administered in each administration.
Examples:
Eg 1: In the ER, the patient received heparin 4000 units bolus, then 1000 units per hour.
Eg 2: Digoxin 0.125 mg every other day.

Field: Route [Attribute]
- PO, IV, Topical, Epidural, Sublingual, Intramuscular, etc.
Definition: Method for administering the medication. For a list with abbreviations, see
Appendix 4.
Examples:
Eg 1: She continues to receive antibiotics intravenously.
Eg 2: Glyburide 5 mg orally twice a day.

Field: Frequency [Attribute]
- Times a day, etc.
- Specified time of day or hours
Definition: How often each dose of the medication should be taken, including both discrete
and continuous values. For a table with abbreviations, see Appendix 5.
Examples:
Eg 1: A patient was prescribed Melphalan 5mg (1 tablet) daily.
Eg 2: Labetalol 300 mg by mouth three times a day.

Field: Duration [Attribute]
- Days, weeks, months, etc.
Definition: How long the medication is to be administered.
Examples:
Eg 1: The patient received Taxol for one month.
Eg 2: Continue home medications and Flagyl 500 mg 1 tablet p.o. q.i.d. for 10 days.
Footnote 2: Yellow highlighted terms and spans in this document indicate that these are
annotatable, but are not an indication of class type.
Attributes shared with other entities are described in subsequent sections. They are: Adverse
effect, Assertion, Outcome and Reason (Indication).
Medication Annotation: Medication Related Entities and Attributes
Elements beyond the drug administration to annotate include: why a drug is being given, the
injury resulting from a medical intervention related to a drug, and differentiating the ADE from
other signs and symptoms.
Field: Indication
[annotated in the class navigation bar and appears as the Drug attribute “Reason” when a
relation is created]
Definition: Medical conditions for which the medication is given in the past or the present.
Examples:
Present: The patient was diagnosed with hypertension and was treated with Accupril.
Past: He did have some hypokalemia which was treated with p.o. K-Dur

Field: Adverse Event (AE)
Definition: Drug related injury to a patient.
Examples:
Present: She experienced a hypersensitivity reaction while receiving intravenous Taxol
(paclitaxel) therapy.
Past: Patient had anaphylaxis after getting penicillin 10 years ago.

Field: Signs, Symptoms, Abnormal Test Findings, and Diseases (S/S/LIF)
Definition: Medical signs, symptoms and diseases that are neither adverse effects nor
reasons for administering a medication.
Example: The patient has a history of COPD.

Field: Severity
[an attribute of Indication, AE and S/S/LIF]
(annotated in class navigation bar, but must be added in the right annotation window for
Indication and S/S/LIF)
Definition: Intensity of an adverse effect.
Examples:
Eg 1: Severe headache, moderate chest pain.
Eg 2: The PLB has 50% stenosis just proximal to a widely patent stent.
Outcome (default notMentioned)
Annotate the Outcome field for adverse events where possible. It has four values: recovered,
not completely recovered, died, not mentioned (most common). The default value for this field is
notMentioned.
If an adverse event's Assertion is Absent, the Outcome field is not annotated.
Period (default current)
Annotate temporal information for several entities: Adverse effect, Indication and S/S/LIF. This
is done using the Period attribute. Period values are: current and history. The default value for
this field is current.
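The Outcome and Period defaults described above can be sketched as a small Python record. This is only an illustration under stated assumptions: the class name, the field names, and the spellings "notCompletelyRecovered" and "absent" are hypothetical; only notMentioned, current and history are spelled as in these guidelines.

```python
from dataclasses import dataclass
from typing import Optional

# Value spellings other than "notMentioned", "current" and "history" are assumptions.
OUTCOME_VALUES = {"recovered", "notCompletelyRecovered", "died", "notMentioned"}
PERIOD_VALUES = {"current", "history"}


@dataclass
class AdverseEventAttributes:
    """Hypothetical record holding the attributes of one adverse event span."""
    span: str
    assertion: str = "present"               # Present is the default assertion
    period: str = "current"                  # Period defaults to current
    outcome: Optional[str] = "notMentioned"  # Outcome defaults to notMentioned

    def __post_init__(self) -> None:
        if self.period not in PERIOD_VALUES:
            raise ValueError(f"unknown period: {self.period}")
        if self.assertion == "absent":
            # The Outcome field is not annotated for absent adverse events.
            self.outcome = None
        elif self.outcome not in OUTCOME_VALUES:
            raise ValueError(f"unknown outcome: {self.outcome}")
```

For example, AdverseEventAttributes("anaphylactic shock") carries the default outcome and period, while an event created with assertion="absent" carries no outcome at all.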
Assertion Category
Assertion (modality) expresses a speaker’s degree of commitment to the expressed
proposition’s believability, obligatoriness, desirability, or reality. Ascribe assertion values to
medications and diseases, namely, to “drug”, “adverse effects”, “indication”, and “other signs,
symptoms and diseases” entities.
Present (default)
“Present” asserts that a problem associated with the patient is present. The drugs the
patient receives are also annotated as Present.
Examples:
• a female patient died while receiving Taxol (Paclitaxel) therapy for the treatment of
endometrial cancer
• The patient had a history of hypertension
• She is on oxycodone 10mg for pain
In our annotation, the positive value ‘present’ is the default value, i.e. if an entity does not have
any assertion value ascribed it means that the value is positive/present.
In the examples below, bold is used for entities with positive assertion value where some other
value can be suspected:
• At this point in time, he does not require any more antibiotics.
• she has since been discontinued on digoxin
• His enalapril was changed to lisinopril
• His aspirin was held
Comment: replaced, held or discontinued drugs are annotated as “positive” and not as “absent”,
since they used to be taken.
• The anaphylactic shock was possibly related to Taxol (relation between Taxol and
anaphylactic shock is Adverse)
• The anaphylactic shock was most likely related to Taxol (relation between Taxol and
anaphylactic shock is Adverse)
• The anaphylactic shock was not related to Taxol (no relation annotated between Taxol
and anaphylactic shock)
Comment: anaphylactic shock is ascribed positive value, since it did occur. It is only its relation
to Taxol that is questioned or negated and we do not ascribe assertion to relations in the current
schema.
• The second episode of malaise, loss of consciousness, undetectable pulse, and tension
were identified as being part of shock. Since these were considered manifestations of
the shock and anaphylactoid reaction, the previously reported separate events of
dyspnea, malaise, abdominal pain, and erythema have been deleted from the file
• Supplemental information received from the reporter via BMS Japan on January 15,
2002 indicated that the events dyspnea, blood pressure decreased and facial hot
flushes were changed to anaphylactic shock
Comment: Even if certain symptoms were identified as being part of another symptom and
deleted from the file or renamed, they still did exist and need to be annotated as positive.
Absent
“Absent” asserts that the problem does not exist in the patient. Also annotate drugs the patient
did not receive as Absent.
Examples:
• no known drug allergy
• the patient denied any dizziness, shortness of breath…
• Without syncopal episodes
• The patient currently is pain free
• There were no clinical signs of congestive heart failure
• CVA has been ruled out (cf a consult was placed to rule out CAD where CAD receives
possible value)
• She is not a candidate for anticoagulation
• Rule out congestive heart failure but doubt (cf CVA has been ruled out where CVA
receives negative value)
• The patient had no fever
• No antibiotics were given
Comment 1: Do not annotate the outcome of “absent” adverse events.
Comment 2: Link “absent” adverse events to drugs the same way we link “present” ones.
Possible
“Possible” asserts that the patient may have a problem, but there is uncertainty expressed in the
note.
Examples:
• Questionable DVT
• Question of DVT
• Their differential is gliomatosis versus radiation effect.
• Possible anterolateral ischemia
• a consult was placed to rule out CAD
• Rule out congestive heart failure but doubt
• The differential diagnosis for his fever included possible inadequately pneumonia versus
bacteremia versus UTI versus CSF infection
Conditional
“Conditional” is used when the mention of the medical problem asserts that the patient
experiences the problem only under certain conditions.
Hypothetical
“Hypothetical” is used for medical problems the patient may develop.
Examples:
• Should her symptoms return or headache develop, please discontinue to taper and
notify Dr. **NAME[ZZZ]'s office.
• Call Dr. X if increased swelling or redness of the left lower extremity or starts to have
difficulty breathing
Not associated with Patient
The mention of the medical problem is associated with someone who is not the patient.
Examples:
• Family history of prostate cancer
• Brother had asthma
If needed the classification can be further detailed. For example, a drug can be “absent”
because the doctor did not recommend it or because the patient refused it; or the probability of
a disease can vary from very low to very high.
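The assertion categories above, together with the rule that an entity without an explicit value counts as present, can be summarized in a short Python sketch. The enum, the function name and the string values are illustrative assumptions; the example spans and their labels are taken from this section.

```python
from enum import Enum
from typing import Optional


class Assertion(Enum):
    """The six assertion categories; the string values are assumed spellings."""
    PRESENT = "present"
    ABSENT = "absent"
    POSSIBLE = "possible"
    CONDITIONAL = "conditional"
    HYPOTHETICAL = "hypothetical"
    NOT_PATIENT = "not associated with patient"


def effective_assertion(value: Optional[Assertion]) -> Assertion:
    """An entity with no ascribed assertion value is treated as Present."""
    return Assertion.PRESENT if value is None else value


# Example spans from this section with the category they receive.
EXAMPLES = {
    "His aspirin was held": Assertion.PRESENT,        # held drugs stay positive
    "no known drug allergy": Assertion.ABSENT,
    "Questionable DVT": Assertion.POSSIBLE,
    "Brother had asthma": Assertion.NOT_PATIENT,
}
```

Keeping the default in one helper means that annotations exported without an assertion value are read the same way everywhere.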
MedDRA Annotation
The adverse effects are mapped to the concepts from MedDRA.
Comment: Drug allergy is considered an adverse event in the past if a specific drug is
mentioned (e.g. ALLERGIES: IMDUR). The allergy is assigned "Drug hypersensitivity"
MedDRA:10013700 and "History" value to the "Period" field. Note that in spans like (no) drug
allergies, drug allergy is annotated as S/S/LIF and not as Adverse event since no specific drug
is mentioned.
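The drug allergy rule in the comment above can be written as a small decision function. This is a sketch only: the function name and the dictionary keys are hypothetical, and the "current" value for the S/S/LIF case is simply the Period default.

```python
from typing import Optional

# MedDRA code given in the comment above for "Drug hypersensitivity".
DRUG_HYPERSENSITIVITY = "10013700"


def annotate_allergy(drug: Optional[str]) -> dict:
    """Classify an allergy mention depending on whether a specific drug is named."""
    if drug:
        # A specific drug is named: Adverse event in the past, mapped to MedDRA.
        return {
            "class": "Adverse Event",
            "meddra": DRUG_HYPERSENSITIVITY,
            "period": "history",
            "drug": drug,
        }
    # No specific drug (e.g. "drug allergies"): annotate as S/S/LIF instead.
    return {"class": "S/S/LIF", "meddra": None, "period": "current", "drug": None}
```

For "ALLERGIES: IMDUR" the sketch returns an Adverse Event with the MedDRA code and a history Period; for a bare "drug allergies" span it returns S/S/LIF.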
See the UMass BioNLP Annotation Resource Page for videos demonstrating MedDRA
annotation.
ANNOTATION OF RELATIONS
Annotate relations (connections) between entities and their attributes. See Appendix 2 for a full
table of possible relations.
Dosage, route, frequency, duration, indication and adverse event are drug attributes and are
related to their drugs. For example, in Albuterol 2 puffs p.o. “2 puffs” is linked to “Albuterol” by
“Dosage” relation and “p.o.” is linked to “Albuterol” by Route relation. Severity can be linked to
Indication, Adverse Event or S/S/LIF.
Examples:
Drug’s attributes
Context: She receives Albuterol 2 puffs p.o. q4-6h.
Relation: Dosage (Albuterol, 2 puffs); Route (Albuterol, p.o.); Frequency (Albuterol, q4-6h)
Context: The patient was treated with ampicillin for two weeks.
Relation: Duration (Ampicillin, two weeks)
Context: He later received chemotherapy for his lung cancer.
Relation: Reason (lung cancer, chemotherapy)
Context: Patient's death was due to anaphylactic shock caused by the intravenously
administered penicillin.
Relation: Adverse (penicillin, anaphylactic shock)
Disease’s attributes
Context: He has severe diarrhea.
Relation: Severity (diarrhea, severe)
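The relation examples above can be held in a simple data structure. The following Python sketch is illustrative only; the Relation type, its field names and the helper function are assumptions and do not reflect the Knowtator export format.

```python
from typing import Dict, List, NamedTuple


class Relation(NamedTuple):
    """One annotated relation, written as Kind (first, second) as in the examples."""
    kind: str
    first: str
    second: str


# Relations for "She receives Albuterol 2 puffs p.o. q4-6h." and "He has severe diarrhea."
RELATIONS: List[Relation] = [
    Relation("Dosage", "Albuterol", "2 puffs"),
    Relation("Route", "Albuterol", "p.o."),
    Relation("Frequency", "Albuterol", "q4-6h"),
    Relation("Severity", "diarrhea", "severe"),
]


def attributes_of(entity: str, relations: List[Relation]) -> Dict[str, str]:
    """Collect the attribute values linked to one entity, keyed by relation kind."""
    return {r.kind: r.second for r in relations if r.first == entity}
```

Calling attributes_of("Albuterol", RELATIONS) gathers the dosage, route and frequency of that drug in one dictionary, which is the grouping a causality assessment would need.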
ANNOTATION PRACTICE
General Considerations
• Do not make assumptions
• Do not consider longitudinal information in this workflow; annotate information in the current
record
• Do not diagnose
• Do not annotate a patient’s mistaken beliefs when medical professional commentary is
contradictory
• Do not annotate general terms such as “problem” and “disease”. They are not
informative. See Appendix 1 for more examples.
• Do not annotate parts of words
• Annotate negations and negated words with the Assertion “Absent” so they will be detected
by the negation algorithm. Negated word examples are: nontender, anicteric,
asymptomatic.
• Make relevant relations, regardless of distance between terms
Choosing a span
We include most complements of a disease in its name.
For example, in the annotation of adverse effects:
• decreased blood pressure (71/53 mmHg) and not just decreased blood pressure
• shock to the liver and breast and not just shock
Severity
Create a relation between a severity assertion and the S/S/LIF.
Severity is often indicated by modifying words such as mild, minimal, markedly, severe,
endstage, stage IB, small, extremely, substantial. Severity terms can also be phrases such as:
borderline to slightly high, moderate-to-severe. Severity terms may also include numbers: 3+
edema, 8 out of 10 pain, 2/6 murmur, 75% to 85% stenosis, score of 5
Anaphoric pronouns
Anaphoric pronouns are the pronouns that refer back to another word or phrase. We do not
annotate anaphoric pronouns like it or this in examples below even though these refer to entities
we do annotate:
The patient had diplopia but it was resolved completely.
The patient had anaphylactic shock. This was caused by antihistamines.
Articles
The indefinite article "a" is not included in annotated entities: in the noun phrase a malignant
tumor of the breast, the span annotated is malignant tumor of the breast, not a malignant tumor
of the breast.
The definite article "the" is not included in disease names, either. In the example below the
adverse effect is “anaphylactic shock”, not “the anaphylactic shock”:
• The anaphylactic shock was characterized by nausea.
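The article rule above can be sketched as a small span-cleaning helper in Python. The function name is hypothetical, and handling "an" alongside "a" and "the" is an assumption, since the guidelines name only "a" and "the".

```python
import re

# Leading article followed by whitespace; "an" is included as an assumption.
_LEADING_ARTICLE = re.compile(r"^(?:a|an|the)\s+", re.IGNORECASE)


def strip_leading_article(span: str) -> str:
    """Drop one leading article so the annotated span starts at the entity itself."""
    return _LEADING_ARTICLE.sub("", span.strip(), count=1)
```

Because the pattern requires whitespace after the article, a word such as "anemia" is left untouched, and articles inside the span ("of the breast") are kept.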
Titles
Certain adverse effect reports include a clinical trial title, for example:
• Protocol title: (NON-BMS/RETRO TAXOL) RETROSPECTIVE DATA COLLECTION
TAXOL IN PATIENTS WITH SOLID TUMORS. Investigator causality assessment was
not provided.
Do not annotate the drug name (Taxol) and its related information in the title, since the name of
the clinical trial may include drugs that an individual patient in that clinical trial does not receive.
For example a clinical trial might have this name:
• "A randomized, controlled, blinded clinical trial comparing miraclecillin to wondersporin."
In this trial, some of the patients got miraclecillin and other patients got wondersporin, but no
patients got both.
Entities like Suspect Drug/Causality in the example Suspect Drug/Causality: paclitaxel
are treated like titles and not annotated.
AE in the example below is not annotated either:
• AE outcome: The patient experienced death on [words marked]
The EMR section title ALLERGIES can be annotated.
• “ALLERGIES: amoxicillin and vancomycin” Allergy is an adverse event and is linked to
each drug separately.
• If an allergic reaction occurred in the past, the allergy remains and is annotated as
“present.”
• “ALLERGIES: none” Annotate allergies where no drug is named as S/S/LIF with
the assertion “absent.”
Prepositions
Avoid prepositions except where they provide meaning or create a contiguous span, for example
to include locations and coordinated locations relative to a S/S/LIF.
Do NOT include prepositions in the following cases:
• When annotating duration spans, e.g. for three weeks or for an unknown period of
time, we do not include for in the duration span.
• In the noun phrase via intravenous drip we annotate only the intravenous drip span.
DO include prepositions when the prepositional phrases:
• Add meaning, such as in frequency spans
– "before/after meals" and "with meals"
– Around three weeks
– Between 1 and three weeks ago
• Complete a span to include a location
– itching of her skin
– Scaling patches on her legs bilaterally
• Create a contiguous span about the S/S/LIF
– pain on all movements/ pain with abduction
– range of motion is limited on internal and external rotation
– dyspnea on exertion
– Significant abnormalities of retroperitoneal lymphadenopathy
– liver is diffusely increased in echogenicity and coarsened
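The two "do NOT include" cases above (for in duration spans, via in route spans) can be illustrated with a minimal Python sketch. The function name, the field labels and the lookup table are assumptions made for illustration; prepositions that add meaning are deliberately left alone.

```python
import re

# Prepositions dropped per field, taken from the two examples above.
_DROPPED_PREPOSITION = {"duration": "for", "route": "via"}


def trim_span(span: str, field: str) -> str:
    """Remove the leading preposition that is excluded for this field, if any."""
    preposition = _DROPPED_PREPOSITION.get(field)
    if preposition is None:
        return span  # e.g. frequency spans keep "with meals", "before meals"
    return re.sub(rf"^{preposition}\s+", "", span, flags=re.IGNORECASE)
```

For instance, trim_span("for three weeks", "duration") yields "three weeks", while trim_span("with meals", "frequency") is returned unchanged.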
Frequency
The adjective weekly is annotated as frequency, e.g., weekly Taxol.
We annotate x2 tabs a day as the span “2 tabs a day” (not “x 2 tabs a day”).
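The "x2 tabs a day" rule can be expressed as a one-line normalization. This Python sketch uses a hypothetical function name and assumes the multiplication sign is written as the letter x directly before a digit, with or without a space.

```python
import re


def normalize_frequency_span(span: str) -> str:
    """Drop a leading 'x' that directly precedes the number in a frequency span."""
    return re.sub(r"^[xX]\s*(?=\d)", "", span.strip())
```

Spans that do not start with "x" followed by a digit, such as "weekly", pass through unchanged.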
Drugs
Non-drug examples
Non-drug treatment options like blood transfusions, fluids, normal saline, oxygen and red
packed cells in the examples below are not annotated as drugs:
• Given multiple blood transfusions
• Pressors continued with fluids.
• He was admitted to the hospital and hydrated with normal saline.
• The event was treated with steroids and oxygen.
• Pancytopenia, treated with G-CSF, erythropoetin and red packed cells
Not annotating drug
Do not annotate drug in the phrase drug relationship and in similar contexts, since it does not
refer to a specific drug:
• According to the pharmacovigilance center reporter and to French methodology of
causality assessment, the drug relationship is unable to determine
• According to the pharmacovigilance center reporter and to the French methodology of
causality assessment, drug relationship is probable
We do not annotate the term drug when it does not denote a specific drug; however, we
annotate more specific terms like chemotherapy or pain medication.
Do not annotate the relation of a drug and its indication if they are separated by more than a
sentence.
Patient comes in for evaluation of psoriasis and …….4 SENTENCES….patient is using
Motrin 200 mg tablets up to 3 tablets twice a day
Psoriasis is the indication for Motrin, but it is not useful for NLP. The indication and the drug
should be in the same sentence or one sentence in either direction to be useful for NLP.
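The sentence-distance rule for drug and indication relations can be checked mechanically. The sketch below assumes each mention carries a zero-based sentence index; the function name and that indexing scheme are illustrative assumptions.

```python
def indication_relation_allowed(drug_sentence: int, indication_sentence: int) -> bool:
    """True when drug and indication share a sentence or sit in adjacent sentences."""
    return abs(drug_sentence - indication_sentence) <= 1
```

In the psoriasis example above, with psoriasis in sentence 0 and Motrin five sentences later, the function returns False, so no Reason relation is annotated.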
Test results
Both normal and abnormal test result lists in the form of uncommented numbers are not
annotated as diagnoses:
Laboratory data showed sodium of 143, potassium 4.1, chloride 105, CO2 of 26,
BUN 4, creatinine 0.7, glucose 90, calcium 9.4. White count 5.4, hemoglobin
12.7, hematocrit 27.7, platelet count 247.
We assume that if a certain test or measurement result is significantly abnormal, the diagnosis
is mentioned in the text separately. For example, if a patient’s blood pressure was 180/100, the
report will most likely mention “high blood pressure.”
Culture and culture pending are lab tests and are not annotated. When the result is positive,
annotate it as part of a diagnosis.
• ‘….took a surface culture and nothing…..’ and ‘Culture pending.’
Commented lab results are annotated.
• “3 ova and parasites being negative, Giardia being negative in a stool culture that was
negative.” Annotate these items and assert they are absent.
Longitudinal Information
Annotate what is in the specific record you are viewing. Do not make assumptions or consider
longitudinal information you may know (temporal aspects of adverse events are annotated in a
separate workflow with Naranjo scoring).
• “HISTORY OF PRESENT ILLNESS: ……..history of Burkett’s lymphoma….” The patient is
within a few months of treatment, which is too soon to consider the lymphoma cured. The
patient in fact goes on to have a recurrence of the lymphoma, but here it is annotated
as S/S/LIF and as history.
REFERENCES
1. Bates DW, Cullen DJ, Laird N, Petersen LA, Small SD, Servi D, et al. Incidence of adverse
drug events and potential adverse drug events. Implications for prevention. ADE
Prevention Study Group. JAMA. 1995;274(1):29.
2. Classen DC, Pestotnik SL, Scott Evans R, Lloyd JF, Burke JP. Adverse drug events in
hospitalized patients: excess length of stay, extra costs, and attributable mortality.
Obstetrical & Gynecological Survey. 1997;52(5):291.
3. Bates DW, Spell N, Cullen DJ, Burdick E, Laird N, Petersen LA, et al. The costs of adverse
drug events in hospitalized patients. Adverse Drug Events Prevention Study Group. JAMA.
1997;277(4):307.
4. Nebeker JR, Hoffman JM, Weir CR, Bennett CL, Hurdle JF. High rates of adverse drug
events in a highly computerized hospital. Arch. Intern. Med. 2005 May 23;165(10):1111–6.
5. Handler SM, Altman RL, Perera S, Hanlon JT, Studenski SA, Bost JE, et al. A systematic
review of the performance characteristics of clinical event monitor signals used to detect
adverse drug events in the hospital setting. J Am Med Inform Assoc. 2007 Aug;14(4):451–8.
6. Lazarou J, Pomeranz BH, Corey PN. Incidence of adverse drug reactions in hospitalized
patients: a meta-analysis of prospective studies. JAMA. 1998 Apr 15;279(15):1200–5.
7. Classen DC, Pestotnik SL, Evans RS, Burke JP. Description of a computerized adverse
drug event monitor using a hospital information system. Hosp Pharm. 1992 Sep;27(9):774,
776–9, 783.
8. Kaushal R, Jha AK, Franz C, Glaser J, Shetty KD, Jaggi T, et al. Return on investment for
a computerized physician order entry system. J Am Med Inform Assoc. 2006
Jun;13(3):261–6.
9. Belknap S. Review: beta-blockers for hypertension increase risk for new-onset diabetes
compared with nondiuretic antihypertensive agents. ACP J. Club. 2008 Apr;148(2):38.
10. FDA Announces New Boxed Warning on Plavix Alerts patients, health care professionals
to potential for reduced effectiveness.
http://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/ucm204253.htm.
11. Paré G, MD SR., Yusuf S, Anand SS, Connolly SJ, Hirsh J, et al. Effects of CYP2C19
Genotype on Outcomes of Clopidogrel Treatment. New England Journal of Medicine.
2066–78.
12. Jha AK, Kuperman GJ, Teich JM, Leape L, Shea B, Rittenberg E, et al. Identifying adverse
drug events: development of a computer-based monitor and comparison with chart review
and stimulated voluntary report. Journal of the American Medical Informatics Association.
1998;5(3):305.
13. Classen DC, Pestotnik SL, Evans RS, Burke JP. Computerized surveillance of adverse
drug events in hospital patients. Quality and Safety in Health Care. 2005 Jun 1;14(3):221–6.
14. Uzuner O. Second i2b2 Workshop on Natural Language Processing Challenges for Clinical
Records. In: AMIA... Annual Symposium proceedings/AMIA Symposium. AMIA
Symposium. 2008. p. 1252.
15. Ogren, P.V. Knowtator: A Protégé plug-in for annotated corpus construction. In:
Proceedings of the 2006 Conference of the North American Chapter of the Association for
Computational Linguistics on Human Language Technology: companion volume:
demonstrations.(2006) 273–275.
16. Naranjo, Cláudio A., et al. A method for estimating the probability of adverse drug reactions.
Clinical Pharmacology & Therapeutics 30.2 (1981): 239-245.
17. Mozzicato P. MedDRA: An Overview of the Medical Dictionary for Regulatory Activities.
Pharmaceutical Medicine. 2009 Apr 2;23:65–75.
18. National Institutes of Health, HIPAA Privacy Rule: Information for Researchers. Web.
http://privacyruleandresearch.nih.gov/default.asp Accessed 2 April 2015.
19. U.S. Food and Drug Administration, FDA Data Standards Manual: Route of Administration.
Web.
http://www.fda.gov/Drugs/DevelopmentApprovalProcess/FormsSubmissionRequirements/
ElectronicSubmissions/DataStandardsManualmonographs/ucm071667.htm Accessed 2
April 2015
20. Trotti A, Colevas AD, Setser A, Rusch V, Jaques D, Budach V, et al. CTCAE v3.0:
development of a comprehensive grading system for the adverse effects of cancer
treatment. Semin Radiat Oncol. 2003 Jul;13(3):176–81.
APPENDIX 1: Additional Annotation Practice Examples and Rules for
Interannotator Agreement
Choosing a span
Annotate the accurate span even if it is long, and include coordinated locations. For example:
• “fibromyalgia causing pain in the neck and paracervical region and down the arm” This
is a long span, but the pain is in both the neck and the arm.
• “adenopathy in the supraclavicular or axillary regions”
Annotate the location as part of a span, especially when it is in the same sentence. Otherwise, do
not annotate location terms that are distant from the S/S.
• pain of the lesion on the right shoulder
• swelling on the right shoulder. It is in the anterior aspect of the shoulder….
• pustular collection underneath the end of the nail about 5 days ago. It is her right middle
finger.
S/S/LIF
General Examples
• “Sit hunched over…” This is a sign of the back pain being described.
• “Cannot really straighten out” This is a sign of the back pain being described.
• “Hepatitis C, genotype 1b” Annotate genotypes, subtypes, variants, etc. when given.
This may give information related to treatment decisions.
• “hepatitis B vaccination” Hepatitis B will be pre-annotated, but the context is the vaccine
rather than the disease, so unmark it.
A “formed stool” is a normal function and is generally not annotated. However, it may be relevant
to annotate in context, for example “started to have some very formed stools from diarrheal
stools”.
When annotating a S/S/LIF, include the location where it occurs if one is provided and if it can
be part of a contiguous span. Locations do not need to be highly specific.
• “headache in the back of the head”
• “headache more pronounced occipitally”
Titles
Titles are study names or references to consumer or biomedical literature. Section headers
(usually in CAPS) are also titles, but there are cases where you may want to annotate them.
ALLERGIES and PAST CHEMOTHERAPY may be appropriate to annotate in a relation.
If an indication is the title in a list of symptoms, you can use it to associate with a drug. If the
phrasing of the sentence with the drug also makes the association clear, use it as well.
• Joint pain. He is willing to try glucosamine to help with joint pain but……
Assertion
Assertion values are not mutually exclusive. A span can be assigned two assertion values:
• no family history of diabetes: Absent+NotAssociatedWithThePatient
• highly probable: Present+Possible
• unlikely: Absent+Possible
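One way to picture a span carrying more than one assertion value is as a record holding a set of values. This is a minimal sketch under that assumption; the `Span` class and its field names are hypothetical and do not reflect how the annotation tool stores data.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical container for an annotated span and its assertion values."""
    text: str
    assertions: set = field(default_factory=set)


# The three combinations listed above, stored as sets so values can co-occur.
examples = [
    Span("no family history of diabetes", {"Absent", "NotAssociatedWithThePatient"}),
    Span("highly probable", {"Present", "Possible"}),
    Span("unlikely", {"Absent", "Possible"}),
]

for span in examples:
    print(span.text, "->", "+".join(sorted(span.assertions)))
```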
Hypothetical vs Conditional
If/then words are cues to use the assertion ‘conditional’.
• “If normal, treat with oral anti-inflammatory medications and” Here the anti-inflammatory
medication is annotated as ‘conditional’.
A phrase such as “should symptoms return” indicates the assertion ‘hypothetical’.
Present and Absent
It is possible to have sentences containing both present and absent information.
• Patient with chronic Hepatitis C “denies any sequela of hepatitis” is annotated as 2
spans: hepatitis is present and sequela of hepatitis is absent
Current and History
Annotate what is in the record you are viewing. Do not make assumptions or consider
longitudinal information you may know (the purpose is machine learning).
• “he no longer has the abdominal pain that he originally presented” is annotated as
S/S/LIF for abdominal pain with the assertion of absent (vs. history, absent).
• Section headers such as PAST MEDICAL HISTORY and ASSESSMENT/PLAN provide
context for “history” or “current”, although current is the default value and specific
annotation for Period is not required.
• History is heterogeneous and can mean both what happened in the past but stopped
and what happened in the past and is still happening. You may see in PAST MEDICAL
HISTORY things like hyperlipidemia, hypertension or a medication. In this case,
annotate those S/S/LIF as “history”. In the ASSESSMENT/PLAN, the patient is still
actively being treated with the medication for hypertension or hyperlipidemia. Annotate
the S/S/LIF, and the default is current. This is common for chronic diseases such as
cancer, cardiovascular disease or diabetes.
In cases such as "No family history of breast or ovarian cancer", annotate as “Absent” vs.
“NotAssociatedWithThePatient”. “NotAssociatedWithThePatient” implies someone may have
the S/S/LIF. Here no one has so “Absent” is the better choice.
Anaphoric Pronouns vs. Co-referent Items
Coreference is when two or more expressions in a text refer to the same thing. We do not
annotate anaphoric pronouns, e.g. “it” below. However, we do annotate other coreferent items.
• He has left-sided abdominal pain but it is not hurt with pressure.
• “The patient was diagnosed with lung cancer in 2010. Now the disease progressed.”
In the following example, “it was down” is meaningless on its own and “it” is an anaphoric
pronoun, so annotate just the first half.
• Blood pressure was high 180/110, rechecked it was down to 137/100.
Drug
Chemotherapy can be tricky because regimen names often contain information about drug,
dosage, and duration.
• He received CHOP chemotherapy
• CHOP chemotherapy for 6 cycles
CHOP chemotherapy is a span because it is a drug regimen and CHOP names the four drugs
used in combination. It also provides information about duration. Cycles are the number of times
the treatment is repeated at a specified time interval. See http://www.lymphomation.org/chemoCHOP.htm
Do not annotate social self-medication with alcohol, tobacco, IV drugs, street drugs, etc.
Only annotate what is medically provided for an indication, or self-medication with legally
obtained OTC drugs. [Substance abuse can be a factor affecting outcomes and treatment, but
we do not annotate it here, where the focus is on adverse events of prescribed
medication.]
Do NOT annotate the word supplement unless it is required to add meaning. For example do
NOT annotate in “vitamin D supplement" or "iron supplement”, but do for “herbal supplement”.
Difficult Example of Indication to Drug Link Followed by Logic in How to Annotate:
/ID The patient developed febrile neutropenia 1/27 and blood cultures from that day revealed
pseudomonas and Strep viridans. Urine cultures showed 35000 colonies of E. faecalis.
Cefepime (1/27-2/27) was initiated (synercid was not used due to a history of adverse effect:
myalgias) in addition to the acyclovir and diflucan (stopped 2/5) which were started earlier for
prophylaxis. Daptomycin was started on 1/29 for VRE. Caspofungin was started 2/7, 2 days
after diflucan was stopped as fevers persisted. The patient remained afebrile for several weeks
until 2/27 at which point Cipro and meropenem were initiated, though no microbes were initiated
on culture. The patient remained on acyclovir, caspofungin, ciprofloxacin, daptomycin, and
meropenem (d/c'd 3/10), for the rest of this hospitalization, and remained afebrile since 3/1. He
will be discharged on p.o. ciprofloxacin (to d/c when ANC >1000), voriconazole, and acyclovir-the latter two medications to take indefinitely.
Suggested Indication to Drug Links:
Pseudomonas (Cefepime)
Strep Viridans (Cefepime)
E. Faecalis (Cefepime)
VRE (Daptomycin)
Logic:
The paragraph indicates that a blood culture revealed Pseudomonas
and Strep viridans. A urine culture was also done, which showed E. faecalis. These are
all bacteria. The paragraph continues by stating that Cefepime was initiated. If you research
this drug you will see it is an antibiotic (cephalosporin class) that is used for
all three of the bacteria mentioned previously. That is the basis for the first three links.
Secondly, the paragraph clearly provides the following indication to drug relation:
“Daptomycin was started on 1/29 for VRE”. That is the basis for the fourth
link.
The rest of the drugs in the paragraph are mentioned, but there is no clear indication as to
why they were provided.
Route
“Inhaler” is a route. See Appendix 4: RESPIRATORY (INHALATION), administration within the
respiratory tract by inhaling orally or nasally for local or systemic effect (short name RESPIR).
Adverse Event
“….renal azotemia, possibly caused by Vancomycin.”
• The patient has renal azotemia, and it is also a possible adverse event. Annotate the
sign/symptom renal azotemia. Annotate the renal azotemia again as an adverse event,
assert it as possible and create the relation to the drug Vancomycin.
MedDRA
Annotate adverse events with the MedDRA term that best applies to the span.
• “pain in the left back and the left upper abdomen” is annotated as one span. If pain is an
adverse effect, there are actually two different MedDRA matches: back pain
(MedDRA:10003988) and abdominal pain (MedDRA:10000081). Use the more generic
MedDRA term for pain to relate to the full span (MedDRA:10033371).
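The fallback to a more generic term can be sketched as a small lookup. The three codes are copied from the example above; the dictionaries and the function name are hypothetical stand-ins for a real MedDRA lookup, which this sketch does not attempt to reproduce.

```python
# Codes copied from the example above; these tables are hypothetical
# stand-ins for a real MedDRA lookup.
SPECIFIC_TERMS = {"back pain": "10003988", "abdominal pain": "10000081"}
GENERIC_TERMS = {"pain": "10033371"}


def choose_meddra(matched_specific: list[str], generic: str) -> str:
    """Pick the code for a span: the specific term when exactly one matches,
    otherwise the generic term that covers the whole span."""
    if len(matched_specific) == 1:
        return SPECIFIC_TERMS[matched_specific[0]]
    return GENERIC_TERMS[generic]


# "pain in the left back and the left upper abdomen" matches two specific terms.
print(choose_meddra(["back pain", "abdominal pain"], "pain"))
# A span that matches only one specific term keeps that term.
print(choose_meddra(["back pain"], "pain"))
```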
Indication
Do not double annotate Indications as S/S/LIF. If a medication is provided for a S/S/LIF,
annotate that S/S/LIF as an Indication.
Severity
• “He is rather diffusely tender to palpation” Here ‘rather’ can be a severity meaning ‘to
some degree’.
• Note that some severity terms are part of the disease name and are not annotated. For
example, “large” is part of the name “large B cell lymphoma”.
• Medical use of the following terms is temporal, NOT severity: acute, chronic,
acute-on-chronic, flare.
Test Results
• “LABORATORY DATA: Alpha-fetoprotein tumor marker is 4.3” Do not annotate in this
workflow; it is an uncommented lab result. Do not diagnose or interpret here.
PHI
Pre name indicators such as Dr., Mr., Mrs., Ms. do not need to be removed. They are used as
tags for name PHI filters. Similarly, you do not need to annotate post name tags (Sr., Jr., III,
M.D., Ph.D.).
Year is not considered PHI. “2011” does not need to be annotated in the example “She was
diagnosed in 2011.” However, if a standalone year is pre-annotated, there is no need to remove
the annotation.
Abbreviations for buildings are Named Sites (e.g. AS for Albert Sherman Center). Annotate
wings, rooms, corridors, hallways as location. If unsure, use the general Location. The key thing
is to annotate any PHI in any PHI category so it is removed in de-identification.
General Terms
General terms are ones that are not informative. Judgement is required, but descriptions
with greater specificity often follow the use of a general term, and annotating those is preferred.
Examples of general terms are:
concern
illness
complaint
issue
complication
problem
diagnosis
drug
difficulty
medication
diffuse
therapy
disease
treatment
APPENDIX 2: Entity and Attribute Tables
This table indicates the attribute fields shown in the Annotation Window for each Entity type.
Note that some fields may be present but not applicable. For example, Adverse is both an entity
and an attribute, so the Adverse attribute is duplicative for the Adverse Effect entity.
Attribute / Entity | Adverse effect | Drug | Indication | S/S/LIF
Adverse | NA* | Yes | No | Yes?
Assertion | Yes | Yes | Yes | Yes
MedDRA | Yes | No | No | No
Outcome | Yes | No | No | No
Dose | No | Yes | No | No
Duration | No | Yes | No | No
Frequency | No | Yes | No | No
Route | No | Yes | No | No
Period | Yes | No | Yes | Yes
Reason (Indication) | NA | Yes | NA | No
Severity | Yes | No | Yes | Yes
*Field is present but not applicable
Notes:
• Indication is a S/S/LIF that is treated with a drug.
• Severity has a field for assertion.
Key:
• Green: can make the association in the record window and the annotation window
• Yellow: can only make the association in the annotation window
• Red: cannot make the association
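The entity and attribute table in this appendix can also be read as a lookup from entity to applicable attributes. The sketch below transcribes only the cells marked "Yes"; cells marked "NA", "No" or "Yes?" are left out. The dictionary and function names are illustrative assumptions, not part of the annotation tool.

```python
# Transcription of the "Yes" cells of the Appendix 2 table.
# The S/S/LIF "Adverse" cell is marked "Yes?" in the table, so it is omitted here.
APPLICABLE = {
    "Adverse effect": {"Assertion", "MedDRA", "Outcome", "Period", "Severity"},
    "Drug": {"Adverse", "Assertion", "Dose", "Duration", "Frequency", "Route",
             "Reason (Indication)"},
    "Indication": {"Assertion", "Period", "Severity"},
    "S/S/LIF": {"Assertion", "Period", "Severity"},
}


def attribute_applies(entity: str, attribute: str) -> bool:
    """Return True when the table marks the attribute as applicable to the entity."""
    return attribute in APPLICABLE.get(entity, set())


print(attribute_applies("Drug", "Route"))
print(attribute_applies("Indication", "Dose"))
```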
APPENDIX 3: Protected Health Information (PHI) and Personally
Identifiable Information (PII)
The Privacy Rule’s Safe Harbor Method for De-identification (18).
“Under the safe harbor method, covered entities must remove all of a list of 18 enumerated
identifiers and have no actual knowledge that the information remaining could be used, alone or
in combination, to identify a subject of the information. The safe harbor is intended to provide
covered entities with a simple, definitive method that does not require much judgment by the
covered entity to determine if the information is adequately de-identified.”
1. Names; first name, last name, initials, names following titles and other indicators, login name,
screen name, nickname, or handle
2. All geographical subdivisions smaller than a State, including street address, city, county,
precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code,
if according to the current publicly available data from the Bureau of the Census: (1) The
geographic unit formed by combining all zip codes with the same three initial digits contains
more than 20,000 people; and (2) The initial three digits of a zip code for all such geographic
units containing 20,000 or fewer people is changed to 000.
3. All elements of dates (except year) for dates directly related to an individual, including birth
date, admission date, discharge date, date of death; and all ages over 89 and all elements of
dates (including year) indicative of such age, except that such ages and elements may be
aggregated into a single category of age 90 or older;
4. Phone numbers;
5. Fax numbers;
6. Electronic mail addresses;
7. Social Security numbers;
8. Medical record numbers;
9. Health plan beneficiary numbers;
10. Account numbers;
11. Certificate/license numbers;
12. Vehicle identifiers and serial numbers, including license plate numbers;
13. Device identifiers and serial numbers;
14. Web Universal Resource Locators (URLs);
15. Internet Protocol (IP) address numbers;
16. Biometric identifiers, including finger and voice prints;
17. Full face photographic images and any comparable images; and
18. Any other unique identifying number, characteristic, or code (note this does not mean the
unique code assigned by the investigator to code the data)
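Two of the identifiers above carry explicit numeric rules: zip codes keep only their first three digits (or become 000 for small areas), and ages over 89 are aggregated. The sketch below illustrates just those two rules. The function names are hypothetical, and the set of low-population prefixes must come from current Census data; the value used in the example is a placeholder, not real data.

```python
def generalize_age(age: int) -> str:
    """Ages over 89 are aggregated into a single '90 or older' category."""
    return "90 or older" if age > 89 else str(age)


def truncate_zip(zip_code: str, small_prefixes: frozenset = frozenset()) -> str:
    """Keep the initial three digits of a zip code.

    small_prefixes is the set of three-digit prefixes whose combined area
    contains 20,000 or fewer people; those prefixes are changed to 000.
    The caller must supply this set from Census data (assumption).
    """
    prefix = zip_code[:3]
    return "000" if prefix in small_prefixes else prefix


print(generalize_age(93))
print(truncate_zip("01655"))
# "999" is a placeholder prefix used only to show the 000 rule.
print(truncate_zip("99912", frozenset({"999"})))
```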
APPENDIX 4: Routes of Drug Administration and Abbreviations
FDA Data Standards Manual list of routes of drug administration. For the full list, which includes
FDA codes and NCI concept codes, see (19):
http://www.fda.gov/Drugs/DevelopmentApprovalProcess/FormsSubmissionRequirements/ElectronicSubmissions/DataStandardsManualmonographs/ucm071667.htm
NAME | DEFINITION | SHORT NAME
AURICULAR (OTIC) | Administration to or by way of the ear. | OTIC
BUCCAL | Administration directed toward the cheek, generally from within the mouth. | BUCCAL
CONJUNCTIVAL | Administration to the conjunctiva, the delicate membrane that lines the eyelids and covers the exposed surface of the eyeball. | CONJUNC
CUTANEOUS | Administration to the skin. | CUTAN
DENTAL | Administration to a tooth or teeth. | DENTAL
ELECTRO-OSMOSIS | Administration through the diffusion of a substance through a membrane in an electric field. | EL-OSMOS
ENDOCERVICAL | Administration within the canal of the cervix uteri. Synonymous with the term intracervical. | E-CERVIC
ENDOSINUSIAL | Administration within the nasal sinuses of the head. | E-SINUS
ENDOTRACHEAL | Administration directly into the trachea. | E-TRACHE
ENTERAL | Administration directly into the intestines. | ENTER
EPIDURAL | Administration upon or over the dura mater. | EPIDUR
EXTRA-AMNIOTIC | Administration to the outside of the membrane enveloping the fetus. | X-AMNI
EXTRACORPOREAL | Administration outside of the body. | X-CORPOR
HEMODIALYSIS | Administration through hemodialysate fluid. | HEMO
INFILTRATION | Administration that results in substances passing into tissue spaces or into cells. | INFIL
INTERSTITIAL | Administration to or in the interstices of a tissue. | INTERSTIT
INTRA-ABDOMINAL | Administration within the abdomen. | I-ABDOM
INTRA-AMNIOTIC | Administration within the amnion. | I-AMNI
INTRA-ARTERIAL | Administration within an artery or arteries. | I-ARTER
INTRA-ARTICULAR | Administration within a joint. | I-ARTIC
INTRABILIARY | Administration within the bile, bile ducts or gallbladder. | I-BILI
INTRABRONCHIAL | Administration within a bronchus. | I-BRONCHI
INTRABURSAL | Administration within a bursa. | I-BURSAL
INTRACARDIAC | Administration within the heart. | I-CARDI
INTRACARTILAGINOUS | Administration within a cartilage; endochondral. | I-CARTIL
INTRACAUDAL | Administration within the cauda equina. | I-CAUDAL
INTRACAVERNOUS | Administration within a pathologic cavity, such as occurs in the lung in tuberculosis. | I-CAVERN
INTRACAVITARY | Administration within a non-pathologic cavity, such as that of the cervix, uterus, or penis, or such as that which is formed as the result of a wound. | I-CAVIT
INTRACEREBRAL | Administration within the cerebrum. | I-CERE
INTRACISTERNAL | Administration within the cisterna magna cerebellomedullaris. | I-CISTERN
INTRACORNEAL | Administration within the cornea (the transparent structure forming the anterior part of the fibrous tunic of the eye). | I-CORNE
INTRACORONAL, DENTAL | Administration of a drug within a portion of a tooth which is covered by enamel and which is separated from the roots by a slightly constricted region known as the neck. | I-CORONAL
INTRACORONARY | Administration within the coronary arteries. | I-CORONARY
INTRACORPORUS CAVERNOSUM | Administration within the dilatable spaces of the corpora cavernosa of the penis. | I-CORPOR
INTRADERMAL | Administration within the dermis. | I-DERMAL
INTRADISCAL | Administration within a disc. | I-DISCAL
INTRADUCTAL | Administration within the duct of a gland. | I-DUCTAL
INTRADUODENAL | Administration within the duodenum. | I-DUOD
INTRADURAL | Administration within or beneath the dura. | I-DURAL
INTRAEPIDERMAL | Administration within the epidermis. | I-EPIDERM
INTRAESOPHAGEAL | Administration within the esophagus. | I-ESO
INTRAGASTRIC | Administration within the stomach. | I-GASTRIC
INTRAGINGIVAL | Administration within the gingivae. | I-GINGIV
INTRAILEAL | Administration within the distal portion of the small intestine, from the jejunum to the cecum. | I-ILE
INTRALESIONAL | Administration within or introduced directly into a localized lesion. | I-LESION
INTRALUMINAL | Administration within the lumen of a tube. | I-LUMIN
INTRALYMPHATIC | Administration within the lymph. | I-LYMPHAT
INTRAMEDULLARY | Administration within the marrow cavity of a bone. | I-MEDUL
INTRAMENINGEAL | Administration within the meninges (the three membranes that envelop the brain and spinal cord). | I-MENIN
INTRAMUSCULAR | Administration within a muscle. | IM
INTRAOCULAR | Administration within the eye. | I-OCUL
INTRAOVARIAN | Administration within the ovary. | I-OVAR
INTRAPERICARDIAL | Administration within the pericardium. | I-PERICARD
INTRAPERITONEAL | Administration within the peritoneal cavity. | I-PERITON
INTRAPLEURAL | Administration within the pleura. | I-PLEURAL
INTRAPROSTATIC | Administration within the prostate gland. | I-PROSTAT
INTRAPULMONARY | Administration within the lungs or its bronchi. | I-PULMON
INTRASINAL | Administration within the nasal or periorbital sinuses. | I-SINAL
INTRASPINAL | Administration within the vertebral column. | I-SPINAL
INTRASYNOVIAL | Administration within the synovial cavity of a joint. | I-SYNOV
INTRATENDINOUS | Administration within a tendon. | I-TENDIN
INTRATESTICULAR | Administration within the testicle. | I-TESTIC
INTRATHECAL | Administration within the cerebrospinal fluid at any level of the cerebrospinal axis, including injection into the cerebral ventricles. | IT
INTRATHORACIC | Administration within the thorax (internal to the ribs); synonymous with the term endothoracic. | I-THORAC
INTRATUBULAR | Administration within the tubules of an organ. | I-TUBUL
INTRATUMOR | Administration within a tumor. | I-TUMOR
INTRATYMPANIC | Administration within the auris media. | I-TYMPAN
INTRAUTERINE | Administration within the uterus. | I-UTER
INTRAVASCULAR | Administration within a vessel or vessels. | I-VASC
INTRAVENOUS | Administration within or into a vein or veins. | IV
INTRAVENOUS BOLUS | Administration within or into a vein or veins all at once. | IV BOLUS
INTRAVENOUS DRIP | Administration within or into a vein or veins over a sustained period of time. | IV DRIP
INTRAVENTRICULAR | Administration within a ventricle. | I-VENTRIC
INTRAVESICAL | Administration within the bladder. | I-VESIC
INTRAVITREAL | Administration within the vitreous body of the eye. | I-VITRE
IONTOPHORESIS | Administration by means of an electric current where ions of soluble salts migrate into the tissues of the body. | ION
IRRIGATION | Administration to bathe or flush open wounds or body cavities. | IRRIG
LARYNGEAL | Administration directly upon the larynx. | LARYN
NASAL | Administration to the nose; administered by way of the nose. | NASAL
NASOGASTRIC | Administration through the nose and into the stomach, usually by means of a tube. | NG
NOT APPLICABLE | Routes of administration are not applicable. | NA
OCCLUSIVE DRESSING TECHNIQUE | Administration by the topical route which is then covered by a dressing which occludes the area. | OCCLUS
OPHTHALMIC | Administration to the external eye. | OPHTHALM
ORAL | Administration to or by way of the mouth. | ORAL
OROPHARYNGEAL | Administration directly to the mouth and pharynx. | ORO
OTHER | Administration is different from others on this list. | OTHER
PARENTERAL | Administration by injection, infusion, or implantation. | PAREN
PERCUTANEOUS | Administration through the skin. | PERCUT
PERIARTICULAR | Administration around a joint. | P-ARTIC
PERIDURAL | Administration to the outside of the dura mater of the spinal cord. | P-DURAL
PERINEURAL | Administration surrounding a nerve or nerves. | P-NEURAL
PERIODONTAL | Administration around a tooth. | P-ODONT
RECTAL | Administration to the rectum. | RECTAL
RESPIRATORY (INHALATION) | Administration within the respiratory tract by inhaling orally or nasally for local or systemic effect. | RESPIR
RETROBULBAR | Administration behind the pons or behind the eyeball. | RETRO
SOFT TISSUE | Administration into any soft tissue. | SOFT TIS
SUBARACHNOID | Administration beneath the arachnoid. | S-ARACH
SUBCONJUNCTIVAL | Administration beneath the conjunctiva. | S-CONJUNC
SUBCUTANEOUS | Administration beneath the skin; hypodermic. Synonymous with the term SUBDERMAL. | SC
SUBLINGUAL | Administration beneath the tongue. | SL
SUBMUCOSAL | Administration beneath the mucous membrane. | S-MUCOS
TOPICAL | Administration to a particular spot on the outer surface of the body. The E2B term TRANSMAMMARY is a subset of the term TOPICAL. | TOPIC
TRANSDERMAL | Administration through the dermal layer of the skin to the systemic circulation by diffusion. | T-DERMAL
TRANSMUCOSAL | Administration across the mucosa. | T-MUCOS
TRANSPLACENTAL | Administration through or across the placenta. | T-PLACENT
TRANSTRACHEAL | Administration through the wall of the trachea. | T-TRACHE
TRANSTYMPANIC | Administration across or through the tympanic cavity. | T-TYMPAN
UNASSIGNED | Route of administration has not yet been assigned. | UNAS
UNKNOWN | Route of administration is unknown. | UNKNOWN
URETERAL | Administration into the ureter. | URETER
URETHRAL | Administration into the urethra. | URETH
VAGINAL | Administration into the vagina. | VAGIN
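For downstream processing, a route mention can be normalized to its short name with a small lookup. The sketch below covers only a handful of rows from the list above. The surface-form mapping ("inhaler", "p.o.", "iv") and the function name are assumptions added for illustration, not part of the FDA list.

```python
# A few rows from the route list above (name -> short name).
ROUTE_SHORT_NAMES = {
    "ORAL": "ORAL",
    "INTRAVENOUS": "IV",
    "INTRAMUSCULAR": "IM",
    "SUBCUTANEOUS": "SC",
    "SUBLINGUAL": "SL",
    "TOPICAL": "TOPIC",
    "RESPIRATORY (INHALATION)": "RESPIR",
}

# Hypothetical mapping from wording seen in notes to a route name.
SURFACE_FORMS = {
    "inhaler": "RESPIRATORY (INHALATION)",
    "p.o.": "ORAL",
    "iv": "INTRAVENOUS",
}


def route_short_name(mention: str):
    """Return the short name for a route mention, or None when it is not covered."""
    key = mention.strip().lower()
    name = SURFACE_FORMS.get(key, mention.strip().upper())
    return ROUTE_SHORT_NAMES.get(name)


print(route_short_name("Inhaler"))
print(route_short_name("p.o."))
print(route_short_name("sublingual"))
```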
APPENDIX 5: Frequency of Drug Administration and Abbreviations
Common frequency of drug administration abbreviations
Abbreviation | Definition
q.d. | once a day
b.i.d. | twice a day
t.i.d. | three times a day
q.i.d. | four times a day
q.h.s. | before bed
5X a day | five times a day
q.4h | every four hours
q.6h | every six hours
q.o.d. | every other day
prn. | as needed
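Clinical notes write these abbreviations with inconsistent punctuation and case (for example "BID", "b.i.d." or "bid"), so a lookup is easier after stripping periods and spaces. The sketch below is one possible normalization; the function names are hypothetical. Note that q.o.d. is expanded here as "every other day", its standard meaning.

```python
FREQUENCY = {
    "q.d.": "once a day",
    "b.i.d.": "twice a day",
    "t.i.d.": "three times a day",
    "q.i.d.": "four times a day",
    "q.h.s.": "before bed",
    "5X a day": "five times a day",
    "q.4h": "every four hours",
    "q.6h": "every six hours",
    "q.o.d.": "every other day",
    "prn.": "as needed",
}


def _key(abbreviation: str) -> str:
    """Lowercase and drop periods and spaces so spelling variants compare equal."""
    return abbreviation.lower().replace(".", "").replace(" ", "")


NORMALIZED = {_key(k): v for k, v in FREQUENCY.items()}


def expand_frequency(abbreviation: str):
    """Return the definition for a frequency abbreviation, or None if unknown."""
    return NORMALIZED.get(_key(abbreviation))


print(expand_frequency("BID"))
print(expand_frequency("q4h"))
print(expand_frequency("PRN"))
```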
Appendix 6: Reference Tables
INTENSITY OF PULSE SCALE (0 – 4):
• 0 indicating no palpable pulse
• 1+ indicating a faint, but detectable pulse
• 2+ suggesting a slightly more diminished pulse than normal
• 3+ is a normal pulse
• 4+ indicating a bounding pulse
Non-Hodgkin Lymphoma (NHL)
1) Low-Grade Non-Hodgkin Lymphoma
• Follicular Lymphoma
• Chronic Lymphocytic Leukemia (CLL)
• Small Lymphocytic Lymphoma
• Lymphoplasmacytoid Lymphoma/Waldenstrom Macroglobulinemia
• Marginal Zone Lymphoma
2) High-Grade Non-Hodgkin Lymphoma
• Diffuse Large B-Cell Lymphoma
• Primary CNS Lymphoma
• Burkitt Lymphoma
• Mantle Cell Lymphoma
3) T-Cell/Natural Killer Cell Non-Hodgkin Lymphoma
• Peripheral T-Cell Lymphoma, Not Otherwise Specified (PTCL-NOS)
• Angioimmunoblastic T-Cell Lymphoma
• Anaplastic Large Cell Lymphoma
• T-Cell Lymphoma/Natural Killer (NK) Cell Lymphoma
• Hepatosplenic T-Cell Lymphoma
• Enteropathy-associated T-Cell Lymphoma
• Cutaneous T-Cell Lymphoma/Mycosis Fungoides
Hodgkin/Non-Hodgkin Lymphoma Classification
To avoid confusion, note that often people who are not oncologists just use the most generic
names like Hodgkin versus non-Hodgkin’s lymphoma when describing this cancer.
Hodgkin Lymphoma
1) Classical Hodgkin Lymphoma
• Nodular Sclerosis
• Mixed Cellularity
• Lymphocyte-rich
• Lymphocyte-depleted
2) Nodular Lymphocyte Predominant
APPENDIX 7: Annotation Tool Notes
NLP Objectives:
Current annotation informs ADE pharmacovigilance. This includes use of current classes,
MedDRA annotation, Naranjo scoring, etc. To maintain annotation focus, any addition must be
essential to the objectives in this list:
Disease
Drugs
Adverse Drug Events
Discourse relations
  · Temporal relations
  · Causal relations
  · Contrastive relations
Severity
Tool Use Notes
Navigating Protégé
There are three panels: The leftmost panel is the Class Navigation Bar with the annotation
schema. The middle panel is the record window with the clinical note. You can view the
complete note by using the scroll bar to the right in the panel. The rightmost panel is for attribute
annotation and comments. Note this panel has two sections and section sizes can be adjusted
by dragging the middle bar. The upper section is for attributes. For some entities, there will be
many possible attributes and a scroll bar will appear on the right. Scroll up and down to see
what is available. The comments section in the Protégé tool is free text and, as such, is not
computable. [Note that the purpose of annotating the clinical notes is to make them computable.]
Therefore the best use of this field is for communication between editors and annotators.
Figure: The first panel lists the classes [1], the second panel is the medical record window [2]
and the third panel is an attribute annotation window [3]. To annotate most classes, click the
class in the left panel or in the fast annotate bar [4] and highlight it in the middle panel. Some
additional attributes and associations [5] are made from the class panel and the annotation
window. A few are made from just the annotation window, i.e. Period.
There is a website with the annotation guidelines, videos on how to use the tool, and other
resources. http://ummsres12.umassmed.edu/jt/index.php/annotation
Open Protégé and you can select from recent projects or you can navigate to the folder with the
patient file you will be annotating or editing, and open it. At the very top is a menu bar. If you
select Window, you have the option to increase font size, which most people need to do. On
the third menu bar there are several tabs. At the end is a Knowtator tab. Click this to open the
file if it doesn’t open automatically. A machine annotated record will appear with various colors
marking it up.
Above the Knowtator tab there is a white tool bar which contains the ‘Save’ icon. It is very
important to save your work when you end a session. There is also a section for “Text source
collection:” Arrows allow you to scroll through names of patient records. The name appears at
the top of the record window as “text source: date.txt” in blue. The rightmost arrows allow you to
scroll between annotations of the same category within the note.
The mark-ups are visible as text highlighted in colors corresponding to classes of interest.
These are described in the Annotation Guidelines and the classes appear in the leftmost panel
– the Class Navigation Bar. Most of the leftmost classes open up to subclasses and arrows
indicate which ones have further lower level classes.
A best practice is to start by scanning the text to familiarize yourself with the content. Next, read
it carefully line by line and review it for (1) missing and (2) incorrect annotation mark-ups. It is
particularly important to mark all PHI (Protected Health Information). These annotations will also
be used to remove the personal data from the files in the de-identification process, prior to any
data sharing.
To mark text, select (left click) the class type from the left Class Navigation Bar. On first use, a
drop down appears to the right of the class asking if you want to “create an annotation” or to
“fast annotate.” Select “fast annotate.” This class will now appear in the “fast annotate”
toolbar. On subsequent uses of this class, you can select it with one click directly from this bar
(without the drop down). After selecting a class for fast annotation, selecting it again in the
left Class Navigation Bar will give you an additional choice to “remove” that class from fast
annotate. Once a class is selected, move your cursor to the Record Window and, using a
highlight motion, highlight the term (or part of it) you would like to mark as this class. It will now
be marked with the class color. To unmark an entity, select it in the Record Window by left
clicking it. Near the top of the annotation window is span edit, where you can choose clear or
delete. Span edit can also shrink or enlarge an annotation.
There are relations or associations that are annotated manually. Classes can be entities or
attributes. A common relation is between the entity Drug and its attributes of Dosage, Route,
Frequency, Duration, Indication and Adverse Event (see Appendix 2). To annotate a relation
between an entity and its attributes, left-click on the entity or entity span, then right-click on the
attribute or attribute span. The attribute is highlighted in a dotted box once the relation has been
created. For a drug, for example, left-click on the drug span, then right-click on the attribute span,
and continue this process for each attribute associated with the drug. If an entity has several
attributes from the same category, you must create each association separately. For example, if a
Drug caused multiple adverse events, you must create each association of the Drug to an Adverse
Event individually. In other words, you must create several identical drug spans and link each one to
an Adverse Event. Confirm your work in the Annotation Window. Adjust your Annotation Panels
or scroll to see all of the filled-in fields; the fields shown change depending on the entity.
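The rule of one association per attribute can be pictured as a flat list of records, one for each Drug and Adverse Event pair. The following is a minimal sketch only: the names Span, Relation and link_each, the sample sentence, and the character offsets are all hypothetical and are not Knowtator's internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A marked stretch of text: class name plus character offsets (hypothetical model)."""
    cls: str
    start: int
    end: int
    text: str


@dataclass(frozen=True)
class Relation:
    """One association from an entity span to a single attribute span."""
    entity: Span
    attribute: Span


def link_each(entity: Span, attributes: list[Span]) -> list[Relation]:
    """Create one separate association per attribute, never a shared one."""
    return [Relation(entity, attr) for attr in attributes]


# Invented example sentence, not taken from the corpus.
note = "Started vancomycin; developed rash and nausea."
drug = Span("Drug", 8, 18, "vancomycin")
events = [
    Span("AdverseEvent", 30, 34, "rash"),
    Span("AdverseEvent", 39, 45, "nausea"),
]
relations = link_each(drug, events)
```

Each record in relations pairs the drug with exactly one adverse event, which mirrors creating each association individually in the tool.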
MedDRA
MedDRA is a five-level hierarchy of medical terms:
- System organ class (SOC): most general
- High level group term (HLGT)
- High level term (HLT)
- Preferred term (PT)
- Lowest level term (LLT): most specific
All the adverse events should only be assigned Preferred terms (PT). Lowest Level terms are
more specific but the term list contains synonyms so there is a lot of redundancy.
If the search returns no PT result but only an LLT result (e.g. Itching), double left click the
LLT term and choose its corresponding PT (Pruritus).
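The step from an LLT to its PT amounts to a lookup. The sketch below uses a tiny hand-made table built from the Itching/Pruritus example above; the table, the function name preferred_term, and the lower-casing of search terms are assumptions made for illustration. Real LLT-to-PT links come from the MedDRA release files, which are not reproduced here.

```python
from typing import Optional

# Hypothetical, hand-made excerpt; not real MedDRA data and no MedDRA codes.
LLT_TO_PT = {
    "itching": "Pruritus",   # example taken from the guideline text
    "pruritus": "Pruritus",  # a PT is treated as its own LLT in this sketch
}


def preferred_term(term: str) -> Optional[str]:
    """Return the PT for a searched term, or None when the table has no entry."""
    return LLT_TO_PT.get(term.strip().lower())
```

A result of None corresponds to the case where the search finds nothing and a synonym should be tried instead.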
When annotating an Adverse Event, after making the selection, you will see a pop up window
with MedDRA PT terms. Select the best option from the list. It will insert the term you
highlighted into the MedDRA field. Do not choose the class MedDRA; if you do, it will insert the
term you highlight into the MedDRA code field regardless of possible term matches, and it is not
a code. If there is no meaningful match, you can search for a synonym. For example, “expire”
does not produce a result but searching on “death” will. Sometimes a fairly generic term is the
best choice.
To manually enter a MedDRA code, left click on the Adverse Event. In the annotation window,
click the gray box in the MedDRA code field. Click the square with a superscript + in the
ConceptCode field and in the Term to manually add information.

Follow this link to browse the most current version of the MedDRA ontology from
anywhere except the Annotation Server (i.e. outside the secure environment).
http://ummsres14.umassmed.edu/OntoSolr/browse

On the Annotation Server, please use this URL to search and browse MedDRA terms:
http://ummsqhslxweb01.umassmed.edu/OntoSolr/browse
To review, change or just see the possible MedDRA matches for an AE again, left click on the
AE. In the MedDRA Code field, pick the diamond in a star and the pop up should reappear by
the term.
To check on a MedDRA annotation, double click the grey square in the MedDRA Code field.
The annotation window will refresh with the MedDRA “instance” at the top, and the
annotated MedDRA values will appear: the MedDRA code is in the ConceptCode field, the MedDRA
Code field is empty, and the term you selected is in the Term field.
Attributes
Assertion, Period, Outcome
Default Values
There is no need to annotate default values for attributes.
Assertion = Present
Period = Current
Outcome = notMentioned
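Because defaults are not annotated, any downstream reader of the annotations has to fill them in. A minimal sketch of that fill-in step follows, assuming each annotation's attributes are held in a plain dictionary; the dictionary layout and the function name with_defaults are hypothetical, and only the three default values come from this guideline.

```python
# Default values as listed in this guideline.
DEFAULTS = {"Assertion": "Present", "Period": "Current", "Outcome": "notMentioned"}


def with_defaults(attributes: dict) -> dict:
    """Return a copy in which every unannotated attribute takes its default."""
    filled = dict(DEFAULTS)
    filled.update(attributes)  # explicitly annotated values override the defaults
    return filled
```

For example, an annotation that carries only Assertion = Absent would come back with Period = Current and Outcome = notMentioned added.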
When annotating a span, or returning to an annotation, select the term or span to mark with an
assertion and the fields will appear in the Annotation Window. Click the “Add Instance” icon (the
square with the superscript +) and select the appropriate option.
Assertion: Present, Absent, Conditional, Hypothetical,
Period: Current, History
Outcome:
When you are done, be sure to SAVE your work by clicking on the Save icon.
Deviations from i2b2 Guidelines
Conditional:
”Conditional” is used when the mention of the medical problem asserts that the patient
experiences the problem only under certain conditions.
He got 1 day of voriconazole for possible presumed aspergillosis, but given that he was improving on the
other antibiotics and his CT was not consistent with aspergillosis and he was no longer on
immunosuppression, it seemed like a less likely diagnosis. His urine and blood cultures were all negative.
Given these findings that presumed diagnosis is community-acquired pneumonia, he will complete a 10-day course of azithromycin and Omnicef. The patient has been instructed to return if his fevers or cough
worsen and he gets worsening shortness of breath as this may indicate that the patient has a recurrent
aspergillosis
We have not come across examples of conditional value in our corpus so far. We believe that
i2b2 examples of Conditional value fall into "Present" category. For example, “dyspnea on
exertion” is a medical term and should be annotated as “dyspnea on exertion” with Present
assertion value (not as just “dyspnea” with Conditional assertion value).
Prepositions:
We do not include prepositions when annotating duration spans. For example, in “for three weeks” or “for an
unknown period of time” we do not include “for” in the duration span. Here we differ from i2b2, where
the preposition “for” is included in duration spans.
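The difference from i2b2 comes down to where the duration span starts. The sketch below shows one way a span that still includes the preposition (for instance, one imported from i2b2-style data) could be shifted to match this guideline. The function name trim_duration_span and the use of character offsets are assumptions, and the pattern covers only “for”, the one preposition named above.

```python
import re

# Only "for" is handled because it is the only preposition the guideline names;
# extending this pattern to other prepositions would be an assumption.
LEADING_PREPOSITION = re.compile(r"^(for)\s+", re.IGNORECASE)


def trim_duration_span(text: str, start: int, end: int) -> tuple[int, int]:
    """Shift the span start past a leading preposition, if one is present."""
    match = LEADING_PREPOSITION.match(text[start:end])
    if match:
        start += match.end()
    return start, end
```

A span that does not begin with the preposition is returned unchanged.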
Export the Annotation to XML³
To get annotations out of Knowtator, use the XML export. Select the menu option Knowtator
-> Export annotations to XML and then follow the directions. This will generate one XML file per
text source in your collection. The XML format used directly parallels the data model that
Knowtator uses for storing annotations in Protégé. Looking at the XML files may actually be
helpful to understand how Knowtator represents annotations in Protégé.
http://knowtator.sourceforge.net/faq.shtml
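Once exported, the files can be read with any XML parser. The sketch below is illustrative only: the sample document is invented, and the element and attribute names it uses (annotation, mention, span, spannedText, classMention, mentionClass) are assumptions about the export layout that should be checked against an actual exported file before relying on them. The function name read_spans and the file name note1.txt are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample; element and attribute names are assumed, not verified
# against any particular Knowtator release.
SAMPLE = """<annotations textSource="note1.txt">
  <annotation>
    <mention id="m1"/>
    <span start="8" end="18"/>
    <spannedText>vancomycin</spannedText>
  </annotation>
  <classMention id="m1">
    <mentionClass id="Drug">vancomycin</mentionClass>
  </classMention>
</annotations>"""


def read_spans(xml_text: str) -> list[dict]:
    """Join each annotation's span with the class recorded for its mention id."""
    root = ET.fromstring(xml_text)
    # Map mention id -> class name from the classMention elements.
    classes = {
        cm.get("id"): cm.find("mentionClass").get("id")
        for cm in root.iter("classMention")
    }
    rows = []
    for ann in root.iter("annotation"):
        mention_id = ann.find("mention").get("id")
        span = ann.find("span")
        rows.append({
            "class": classes.get(mention_id),
            "start": int(span.get("start")),
            "end": int(span.get("end")),
            "text": ann.findtext("spannedText"),
        })
    return rows
```

Calling read_spans on the sample yields one row per marked span, with its class name and character offsets.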
Table: Annotation Tools and other Resources
Software | URL | Document
Protégé | http://protege.cim3.net/download/oldreleases/3.3.1/basic/ | -
Knowtator | http://knowtator.sourceforge.net/ | http://knowtator.sourceforge.net/install.shtml
MedDRA Browser | http://www.meddramsso.com/subscriber_download_tools_browser.asp | Need MedDRA131E, import the folder named MedAscii.
I2b2 Medication Annotation Guideline | http://lancet.googlecode.com/files/Preliminary.Annotation.Guidelines.7.9.pdf | -
Semi-Automated Annotation with the BioNLP named entity tagger−Lancet
[This section was written some time ago and it is not clear how much of this is still applicable]
To increase the annotation speed, we apply the BioNLP named entity tagger Lancet, which is
trained on the annotated data, to automatically identify the named entities. An annotator then
corrects the automatically labeled corpus. The annotated corpus will be fed into the learner and
used to train a new model. Such interactive steps are repeated until a satisfactory performance
is met. This section is to guide the oracle on how to correct the automatically labeled corpus.
³ This is old text and refers to outdated versions of the Knowtator Plugin, MedDRA, and i2b2, but we wanted to
retain this historical information.
After importing the NLP tools annotation, the annotator attribute of each annotation is assigned
an NLP tool name, such as Lancet UWM.
First, a default annotator is assigned. You can configure this by clicking Knowtator -> Configure ->
default Annotator. This configuration does not change the attributes of any existing annotation
and applies only to new annotations.
In the event of partially correct annotations, the annotator needs to delete the annotation first
and then re-annotate. Otherwise, the correction work will not be recorded by Knowtator.
If an entity is annotated more than once, keep the correct annotation
and delete the others. If all of them are correct, delete duplicates until only one is left.
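For the case where all copies are correct, the clean-up is a plain de-duplication. The sketch below assumes each annotation is reduced to a (class, start, end) tuple; that representation and the function name drop_duplicates are hypothetical. It handles exact duplicates only. Deciding which of several conflicting annotations is the correct one still requires the annotator's judgment.

```python
def drop_duplicates(annotations: list[tuple[str, int, int]]) -> list[tuple[str, int, int]]:
    """Keep the first copy of each (class, start, end) annotation, in order."""
    seen = set()
    kept = []
    for ann in annotations:
        if ann not in seen:
            seen.add(ann)
            kept.append(ann)
    return kept
```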
Please look through the whole text and insert any missing annotations.
There is a trade-off between precision and recall of the NLP tools. Here, we prefer a high
precision annotation.
Summary of Annotation Processes and Tooling Changes for the ADE Pharmacovigilance
Project
1. Reviewed PHI and PII requirements. Significantly streamlined PHI annotation and made
all PHI markings the same color for a simplified view.
2. We now have access to MedDRA releases and the terms used in annotation are now
updated with each release. MedDRA annotation has been brought into the annotation
tool; a major time saver. Upon selecting an adverse event, a MedDRA pop-up window
shows the top 10 matches to the selected term or span. Selection of a MedDRA term
from the pop-up window populates the MedDRA fields with term and concept codes.
Manual annotation of MedDRA terms is still possible and an updated browser on the
virtual machine makes searching an easier process.
3. Inter-annotator agreement is a priority as the team expands. Established a regular
meeting to compare and discuss annotations. Incorporation of conclusions, rules and
examples in the guidelines is now a routine process.
4. Updated guidelines with more examples, filled in gaps, and created a section with examples
to specifically aid inter-annotator agreement. There is also a history section, and another for
tracking major changes to processes and tooling.
5. Created videos demonstrating use of the Annotation Tool.
6. Added a webpage for Annotation related resources.
7. Established a workflow process and file system on the virtual machine to enable
annotation, editing and other separated workflows.
8. Post annotation processing will include:
a. assigning CTCAE (20) categories to severity annotations
b. assigning default values for
i. Assertion = present
ii. Period = current
iii. Outcome = notMentioned.
9. Negated words are marked with the Assertion “Absent”. They will be detected in the
negation algorithm. Negated word examples are: nontender, anicteric, asymptomatic.
10. Do not annotate general terms such as “problem” and “disease”. See Appendix 1 for
more examples.
11. Make all relevant relations, regardless of distance between terms