/**
* <h1>Linguistic Annotation Pipeline</h1>
* The point of this package is to enable people to quickly and
* painlessly get complete linguistic annotations of their text. It
* is designed to be highly flexible and extensible. I will first discuss
* the organization and functions of the classes, and then I will give some
* sample code and a run-down of the implemented Annotators.
* <p>
* <h2>Annotation</h2>
* An Annotation is the data structure which holds the results of Annotators.
* An Annotation is basically a map from keys to bits of annotation, such
* as the parse, the part-of-speech tags, or the named entity tags. Annotations
* are designed to operate at the sentence level; however, depending on the
* Annotators you use, this may not be how you choose to use the package.
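* For example, here is a minimal sketch of reading and writing annotation
* bits through typed keys (the keys shown are existing classes in
* <code>CoreAnnotations</code>; the document text is invented):
* <pre>
* // an Annotation is constructed from the raw text it will describe
* Annotation annotation = new Annotation("Stanford is in California.");
* // the raw text is stored under TextAnnotation.class ...
* String text = annotation.get(CoreAnnotations.TextAnnotation.class);
* // ... and any other bit of annotation is stored under its own key
* annotation.set(CoreAnnotations.DocIDAnnotation.class, "doc-1");
* </pre>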
* <h2>Annotators</h2>
* The backbone of this package is the Annotator. Annotators are a lot like
* functions, except that they operate over Annotations instead of Objects.
* They do things like tokenize, parse, or NER tag sentences. In the
* javadoc of your Annotator you should specify what the Annotator
* assumes already exists (for instance, the NERAnnotator assumes that the
* sentence has been tokenized) and where to find these annotations (in
* the preceding example, under <code>TextAnnotation.class</code>). You
* should also specify what the Annotator adds to the annotation, and where.
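* As an illustration, here is a sketch of a toy Annotator that follows this
* contract. The class is hypothetical and assumes the single-method
* <code>Annotator</code> interface; <code>ShapeAnnotation</code> is an
* existing key in <code>CoreAnnotations</code>.
* <pre>
* // Requires: TokensAnnotation (i.e., a TokenizerAnnotator has run).
* // Adds: ShapeAnnotation on every token.
* public class ShapeAnnotator implements Annotator {
*   public void annotate(Annotation annotation) {
*     for (CoreLabel token : annotation.get(CoreAnnotations.TokensAnnotation.class)) {
*       String text = token.get(CoreAnnotations.TextAnnotation.class);
*       // a toy word shape: X for upper case, x for lower case, d for digits
*       String shape = text.replaceAll("[A-Z]", "X")
*                          .replaceAll("[a-z]", "x")
*                          .replaceAll("[0-9]", "d");
*       token.set(CoreAnnotations.ShapeAnnotation.class, shape);
*     }
*   }
* }
* </pre>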
* <h2>AnnotationPipeline</h2>
* An AnnotationPipeline is where many Annotators are strung together
* to form a linguistic annotation pipeline. It is, itself, an
* Annotator. AnnotationPipelines usually also keep track of how much time
* they spend annotating and loading, to help users find where the
* time sinks are.
* However, the class AnnotationPipeline is not meant to be used as is;
* it serves as an example of how to build your own pipeline.
* If you just want a typical NLP pipeline, take a look at StanfordCoreNLP
* (described later in this document).
* <h2>Sample Usage</h2>
* Here is some sample code which illustrates the intended usage
* of the package:
* <pre>
* public void testPipeline(String text) throws Exception {
*   // create pipeline
*   AnnotationPipeline pipeline = new AnnotationPipeline();
*   pipeline.addAnnotator(new TokenizerAnnotator(false, "en"));
*   pipeline.addAnnotator(new WordsToSentencesAnnotator(false));
*   pipeline.addAnnotator(new POSTaggerAnnotator(false));
*   pipeline.addAnnotator(new MorphaAnnotator(false));
*   pipeline.addAnnotator(new NERCombinerAnnotator(false));
*   pipeline.addAnnotator(new ParserAnnotator(false, -1));
*   // create annotation with text
*   Annotation document = new Annotation(text);
*   // annotate text with pipeline
*   pipeline.annotate(document);
*   // demonstrate typical usage
*   for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
*     // get the parse tree for the sentence
*     Tree tree = sentence.get(TreeAnnotation.class);
*     // get the tokens for the sentence and iterate over them
*     for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
*       // get token attributes
*       String tokenText = token.get(TextAnnotation.class);
*       String tokenPOS = token.get(PartOfSpeechAnnotation.class);
*       String tokenLemma = token.get(LemmaAnnotation.class);
*       String tokenNE = token.get(NamedEntityTagAnnotation.class);
*     }
*   }
* }
* </pre>
* <h2>Existing Annotators</h2>
* There already exist Annotators for many common tasks, all of which include
* default model locations, so they can just be used off the shelf. They are:
* <ul>
* <li>TokenizerAnnotator - tokenizes the text based on language or Tokenizer class specifications </li>
* <li>WordsToSentencesAnnotator - splits a sequence of words into a sequence of sentences</li>
* <li>POSTaggerAnnotator - annotates the text with part-of-speech tags </li>
* <li>MorphaAnnotator - morphological normalizer (generates lemmas)</li>
* <li>NERClassifierCombiner - combines several NER models </li>
* <li>TrueCaseAnnotator - detects the true case of words in free text (useful for all upper or lower case text)</li>
* <li>ParserAnnotator - generates constituent and dependency trees</li>
* <li>NumberAnnotator - recognizes numerical entities such as numbers, money, times, and dates</li>
* <li>TimeWordAnnotator - recognizes common temporal expressions, such as "teatime"</li>
* <li>QuantifiableEntityNormalizingAnnotator - normalizes the content of all numerical entities</li>
* <li>DeterministicCorefAnnotator - implements anaphora resolution using a deterministic model </li>
* <li>NFLAnnotator - implements entity and relation mention extraction for the NFL domain</li>
* </ul>
* <h2>How Do I Use This?</h2>
* You do not have to construct your pipeline from scratch! For typical NLP processing, use
* StanfordCoreNLP. This pipeline implements the most common functionality needed: tokenization,
* lemmatization, POS tagging, NER, parsing, and coreference resolution. Read below for how to use
* this pipeline from the command line, or directly in your Java code.
* <h3>Using StanfordCoreNLP from the Command Line</h3>
* The command line for StanfordCoreNLP is:
* <pre>
* ./bin/stanfordcorenlp.sh
* </pre>
* or
* <pre>
* java -cp stanford-corenlp-YYYY-MM-DD.jar:stanford-corenlp-YYYY-MM-DD-models.jar:xom.jar:joda-time.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP [ -props YOUR_CONFIGURATION_FILE ] -file YOUR_INPUT_FILE
* </pre>
* where the following properties are defined:
* (if <code>-props</code> or <code>annotators</code> is not defined, default properties will be loaded via the classpath)
* <pre>
* "annotators" - comma separated list of annotators
* The following annotators are supported: tokenize, ssplit, pos, lemma, ner, truecase, parse, dcoref, nfl
* </pre>
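* For example, a minimal configuration file passed via <code>-props</code> could
* contain just the annotator list (all other properties then fall back to the
* defaults loaded from the classpath):
* <pre>
* annotators = tokenize, ssplit, pos, lemma, ner, parse, dcoref
* </pre>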
* More information is available here: <a href="http://nlp.stanford.edu/software/corenlp.shtml">Stanford CoreNLP</a>
* <!--
* where the following properties are defined:
* (if <code>-props</code> or <code>annotators</code> is not defined, default properties will be loaded via the classpath)
* <pre>
* "annotators" - comma separated list of annotators
* The following annotators are supported: tokenize, ssplit, pos, lemma, ner, truecase, parse, coref, dcoref, nfl
* If annotator "pos" is defined:
* "pos.model" - path towards the POS tagger model
* If annotator "ner" is defined:
* "ner.model.3class" - path towards the three-class NER model
* "ner.model.7class" - path towards the seven-class NER model
* "ner.model.MISCclass" - path towards the NER model with a MISC class
* If annotator "truecase" is defined:
* "truecase.model" - path towards the true-casing model; default: StanfordCoreNLPModels/truecase/noUN.ser.gz
* "truecase.bias" - class bias of the true case model; default: INIT_UPPER:-0.7,UPPER:-0.7,O:0
* "truecase.mixedcasefile" - path towards the mixed case file; default: StanfordCoreNLPModels/truecase/MixDisambiguation.list
* If annotator "nfl" is defined:
* "nfl.gazetteer" - path towards the gazetteer for the NFL domain
* "nfl.relation.model" - path towards the NFL relation extraction model
* If annotator "parse" is defined:
* "parser.model" - path towards the PCFG parser model
* Command line properties:
* "file" - run the pipeline on the contents of this file, or on the contents of the files in this directory
* XML output is generated for every input file "file" as file.xml
* "extension" - if -file used with a directory, process only the files with this extension
* "filelist" - run the pipeline on the list of files given in this file
* XML output is generated for every input file as file.outputExtension
* "outputDirectory" - where to put XML output (defaults to the current directory)
* "outputExtension" - extension to use for the output file (defaults to ".xml"). Don't forget the dot!
* "replaceExtension" - flag to chop off the last extension before adding outputExtension to file
* "noClobber" - don't automatically override (clobber) output files that already exist
* </pre>
* If none of the above are present, run the pipeline in an interactive shell (default properties will be loaded from the classpath).
* The shell accepts input from stdin and displays the output at stdout.
* To avoid clutter in the command line you can store some or all of these properties in a
* properties file and pass this file to <code>StanfordCoreNLP</code> using the <code>-props</code> option. For example,
* my <code>pipe.properties</code> file contains the following:
* <pre>
* annotators=tokenize,ssplit,pos,lemma,ner,parse,coref
* pos.model=models/left3words-wsj-0-18.tagger
* ner.model.3class=models/ner-en-3class.crf.gz
* ner.model.7class=models/all.7class.crf.gz
* ner.model.distsim=models/conll.distsim.crf.ser.gz
* #nfl.gazetteer = models/NFLgazetteer.txt
* #nfl.relation.model = models/nfl_relation_model.ser
* parser.model=models/englishPCFG.ser.gz
* coref.model=models/coref/corefClassifierAll.March2009.ser.gz
* coref.name.dir=models/coref
* wordnet.dir=models/wordnet-3.0-prolog
* </pre>
* Using this properties file, I run the pipeline's interactive shell as follows:
* <pre>
* java -cp classes/:lib/xom.jar -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -props pipe.properties
* </pre>
* In the above setup, the system displays a shell-like prompt and waits for stdin input.
* You can input any English text.
* Processing starts after each new line and the output is displayed at the standard output in a format (somewhat) interpretable by humans.
* For example, for the input "Reagan announced he had Alzheimer's disease, an incurable brain affliction." the shell displays the following output:
* <pre>
* [Text=Reagan PartOfSpeech=NNP Lemma=Reagan NamedEntityTag=PERSON] [Text=announced PartOfSpeech=VBD Lemma=announce NamedEntityTag=O] [Text=he PartOfSpeech=PRP Lemma=he NamedEntityTag=O] [Text=had PartOfSpeech=VBD Lemma=have NamedEntityTag=O] [Text=Alzheimer PartOfSpeech=NNP Lemma=Alzheimer NamedEntityTag=O] [Text='s PartOfSpeech=POS Lemma='s NamedEntityTag=O] [Text=disease PartOfSpeech=NN Lemma=disease NamedEntityTag=O] [Text=, PartOfSpeech=, Lemma=, NamedEntityTag=O] [Text=an PartOfSpeech=DT Lemma=a NamedEntityTag=O] [Text=incurable PartOfSpeech=JJ Lemma=incurable NamedEntityTag=O] [Text=brain PartOfSpeech=NN Lemma=brain NamedEntityTag=O] [Text=affliction PartOfSpeech=NN Lemma=affliction NamedEntityTag=O] [Text=. PartOfSpeech=. Lemma=. NamedEntityTag=O]
* (ROOT
*   (S
*     (NP (NNP Reagan))
*     (VP (VBD announced)
*       (SBAR
*         (S
*           (NP (PRP he))
*           (VP (VBD had)
*             (NP
*               (NP
*                 (NP (NNP Alzheimer) (POS 's))
*                 (NN disease))
*               (, ,)
*               (NP (DT an) (JJ incurable) (NN brain) (NN affliction)))))))
*     (. .)))
* nsubj(announced-2, Reagan-1)
* nsubj(had-4, he-3)
* ccomp(announced-2, had-4)
* poss(disease-7, Alzheimer-5)
* dobj(had-4, disease-7)
* det(affliction-12, an-9)
* amod(affliction-12, incurable-10)
* nn(affliction-12, brain-11)
* appos(disease-7, affliction-12)
* </pre>
* where the first part of the output shows the individual words and their attributes, e.g., POS and NE tags,
* the second block shows the constituent parse tree, and the last block shows the syntactic dependencies extracted from the parse tree.
* Note that the coreference chains are stored in the individual words.
* For example, the referent for the "he" pronoun is stored as "CorefDest=1 1", which means that the referent is the first token
* in the first sentence in this text, i.e., "Reagan".
* <p>
* Alternatively, if you want to process all the .txt files in the directory data/, use this command line:
* <pre>
* java -cp classes/:lib/xom.jar -Xmx6g edu.stanford.nlp.pipeline.StanfordCoreNLP -props pipe.properties -file data -extension .txt
* </pre>
* Or, you can store all the files that you want processed one per line in a separate file, and pass the latter file to
* StanfordCoreNLP with the following options:
* <pre>
* java -cp classes/:lib/xom.jar -Xmx6g edu.stanford.nlp.pipeline.StanfordCoreNLP -props pipe.properties -filelist list_of_files_to_process.txt
* </pre>
* In the latter two cases, the pipeline generates a file.txt.xml output file for every file.txt it processes.
* For example, if file.txt contains the following text:
* <pre>
* Federal Reserve Chairman Ben Bernanke declared Friday
* that the U.S. economy is on the verge of a long-awaited recovery.
* </pre>
* the pipeline generates the following XML output in file.txt.xml:
* <pre>
* <?xml version="1.0" encoding="UTF-8"?>
* <root xmlns="http://nlp.stanford.edu">
* <sentence>
* <wordTable>
* <wordInfo id="1">
* <word>Federal</word>
* <lemma>Federal</lemma>
* <POS>NNP</POS>
* <NER>ORGANIZATION</NER>
* </wordInfo>
* <wordInfo id="2">
* <word>Reserve</word>
* <lemma>Reserve</lemma>
* <POS>NNP</POS>
* <NER>ORGANIZATION</NER>
* </wordInfo>
* <wordInfo id="3">
* <word>Chairman</word>
* <lemma>Chairman</lemma>
* <POS>NNP</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="4">
* <word>Ben</word>
* <lemma>Ben</lemma>
* <POS>NNP</POS>
* <NER>PERSON</NER>
* </wordInfo>
* <wordInfo id="5">
* <word>Bernanke</word>
* <lemma>Bernanke</lemma>
* <POS>NNP</POS>
* <NER>PERSON</NER>
* </wordInfo>
* <wordInfo id="6">
* <word>declared</word>
* <lemma>declare</lemma>
* <POS>VBD</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="7">
* <word>Friday</word>
* <lemma>Friday</lemma>
* <POS>NNP</POS>
* <NER>DATE</NER>
* </wordInfo>
* <wordInfo id="8">
* <word>that</word>
* <lemma>that</lemma>
* <POS>IN</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="9">
* <word>the</word>
* <lemma>the</lemma>
* <POS>DT</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="10">
* <word>U.S.</word>
* <lemma>U.S.</lemma>
* <POS>NNP</POS>
* <NER>LOCATION</NER>
* </wordInfo>
* <wordInfo id="11">
* <word>economy</word>
* <lemma>economy</lemma>
* <POS>NN</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="12">
* <word>is</word>
* <lemma>be</lemma>
* <POS>VBZ</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="13">
* <word>on</word>
* <lemma>on</lemma>
* <POS>IN</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="14">
* <word>the</word>
* <lemma>the</lemma>
* <POS>DT</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="15">
* <word>verge</word>
* <lemma>verge</lemma>
* <POS>NN</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="16">
* <word>of</word>
* <lemma>of</lemma>
* <POS>IN</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="17">
* <word>a</word>
* <lemma>a</lemma>
* <POS>DT</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="18">
* <word>long-awaited</word>
* <lemma>long-awaited</lemma>
* <POS>JJ</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="19">
* <word>recovery</word>
* <lemma>recovery</lemma>
* <POS>NN</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="20">
* <word>.</word>
* <lemma>.</lemma>
* <POS>.</POS>
* <NER>O</NER>
* </wordInfo>
* </wordTable>
*     <parse>(ROOT
*       (S
*         (NP (NNP Federal) (NNP Reserve) (NNP Chairman) (NNP Ben) (NNP Bernanke))
*         (VP (VBD declared)
*           (NP-TMP (NNP Friday))
*           (SBAR (IN that)
*             (S
*               (NP (DT the) (NNP U.S.) (NN economy))
*               (VP (VBZ is)
*                 (PP (IN on)
*                   (NP
*                     (NP (DT the) (NN verge))
*                     (PP (IN of)
*                       (NP (DT a) (JJ long-awaited) (NN recovery)))))))))
*         (. .)))</parse>
*     <dependencies>
*       <dep type="nn">
*         <governor idx="5">Bernanke</governor>
*         <dependent idx="1">Federal</dependent>
*       </dep>
*       <dep type="nn">
*         <governor idx="5">Bernanke</governor>
*         <dependent idx="2">Reserve</dependent>
*       </dep>
*       <dep type="nn">
*         <governor idx="5">Bernanke</governor>
*         <dependent idx="3">Chairman</dependent>
*       </dep>
*       <dep type="nn">
*         <governor idx="5">Bernanke</governor>
*         <dependent idx="4">Ben</dependent>
*       </dep>
*       <dep type="nsubj">
*         <governor idx="7">Friday</governor>
*         <dependent idx="5">Bernanke</dependent>
*       </dep>
*       <dep type="dep">
*         <governor idx="7">Friday</governor>
*         <dependent idx="6">declared</dependent>
*       </dep>
*       <dep type="complm">
*         <governor idx="12">is</governor>
*         <dependent idx="8">that</dependent>
*       </dep>
*       <dep type="det">
*         <governor idx="11">economy</governor>
*         <dependent idx="9">the</dependent>
*       </dep>
*       <dep type="nn">
*         <governor idx="11">economy</governor>
*         <dependent idx="10">U.S.</dependent>
*       </dep>
*       <dep type="nsubj">
*         <governor idx="12">is</governor>
*         <dependent idx="11">economy</dependent>
*       </dep>
*       <dep type="ccomp">
*         <governor idx="7">Friday</governor>
*         <dependent idx="12">is</dependent>
*       </dep>
*       <dep type="prep">
*         <governor idx="12">is</governor>
*         <dependent idx="13">on</dependent>
*       </dep>
*       <dep type="det">
*         <governor idx="15">verge</governor>
*         <dependent idx="14">the</dependent>
*       </dep>
*       <dep type="pobj">
*         <governor idx="13">on</governor>
*         <dependent idx="15">verge</dependent>
*       </dep>
*       <dep type="prep">
*         <governor idx="15">verge</governor>
*         <dependent idx="16">of</dependent>
*       </dep>
*       <dep type="det">
*         <governor idx="19">recovery</governor>
*         <dependent idx="17">a</dependent>
*       </dep>
*       <dep type="amod">
*         <governor idx="19">recovery</governor>
*         <dependent idx="18">long-awaited</dependent>
*       </dep>
*       <dep type="pobj">
*         <governor idx="16">of</governor>
*         <dependent idx="19">recovery</dependent>
*       </dep>
*     </dependencies>
*   </sentence>
* </root>
* </pre>
* <p>
* If the NFL annotator is enabled, additional XML output is generated for the corresponding domain-specific entities and relations.
* For example, for the sentence "The 49ers beat Dallas 20-10 in the Sunday game." the NFL-specific output is:
* <pre>
* <MachineReading>
*   <entities>
*     <entity id="EntityMention1">
*       <type>NFLTeam</type>
*       <span start="1" end="2" />
*     </entity>
*     <entity id="EntityMention2">
*       <type>NFLTeam</type>
*       <span start="3" end="4" />
*     </entity>
*     <entity id="EntityMention3">
*       <type>FinalScore</type>
*       <span start="4" end="5" />
*     </entity>
*     <entity id="EntityMention4">
*       <type>FinalScore</type>
*       <span start="6" end="7" />
*     </entity>
*     <entity id="EntityMention5">
*       <type>Date</type>
*       <span start="9" end="10" />
*     </entity>
*     <entity id="EntityMention6">
*       <type>NFLGame</type>
*       <span start="10" end="11" />
*     </entity>
*   </entities>
*   <relations>
*     <relation id="RelationMention-11">
*       <type>teamScoringAll</type>
*       <arguments>
*         <entity id="EntityMention3">
*           <type>FinalScore</type>
*           <span start="4" end="5" />
*         </entity>
*         <entity id="EntityMention1">
*           <type>NFLTeam</type>
*           <span start="1" end="2" />
*         </entity>
*       </arguments>
*     </relation>
*     <relation id="RelationMention-17">
*       <type>teamScoringAll</type>
*       <arguments>
*         <entity id="EntityMention4">
*           <type>FinalScore</type>
*           <span start="6" end="7" />
*         </entity>
*         <entity id="EntityMention2">
*           <type>NFLTeam</type>
*           <span start="3" end="4" />
*         </entity>
*       </arguments>
*     </relation>
*     <relation id="RelationMention-20">
*       <type>teamFinalScore</type>
*       <arguments>
*         <entity id="EntityMention4">
*           <type>FinalScore</type>
*           <span start="6" end="7" />
*         </entity>
*         <entity id="EntityMention6">
*           <type>NFLGame</type>
*           <span start="10" end="11" />
*         </entity>
*       </arguments>
*     </relation>
*     <relation id="RelationMention-25">
*       <type>gameDate</type>
*       <arguments>
*         <entity id="EntityMention5">
*           <type>Date</type>
*           <span start="9" end="10" />
*         </entity>
*         <entity id="EntityMention6">
*           <type>NFLGame</type>
*           <span start="10" end="11" />
*         </entity>
*       </arguments>
*     </relation>
*   </relations>
* </MachineReading>
* </pre>
* -->
* <h3>The StanfordCoreNLP API</h3>
* More information is available here: <a href="http://nlp.stanford.edu/software/corenlp.shtml">Stanford CoreNLP</a>
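* As a quick orientation, a minimal use of the API could look like the
* following sketch (the property values are one reasonable choice; see the
* page linked above for the authoritative documentation):
* <pre>
* // build a StanfordCoreNLP pipeline from a set of properties
* Properties props = new Properties();
* props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
* StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
* // annotate a document; StanfordCoreNLP is itself an AnnotationPipeline
* Annotation document = new Annotation("Reagan announced he had Alzheimer's disease.");
* pipeline.annotate(document);
* // read the results back with the same typed keys as in the sample above
* for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
*   Tree tree = sentence.get(TreeAnnotation.class);
* }
* </pre>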
* <!--
* <p>
* To construct a pipeline object from a given set of properties, use StanfordCoreNLP(Properties props).
* This method creates the pipeline using the annotators given in the "annotators" property (see above for the complete list of properties).
* Currently, we support the following options for the "annotators" property:
* <ul>
* <li> tokenize - Tokenizes the text using TokenizerAnnotator. This annotator is required by all following annotators!</li>
* <li> ssplit - Splits the sequence of tokens into sentences using WordsToSentencesAnnotator. This annotator is required if the input text contains multiple sentences, e.g., it is an entire document.</li>
* <li> pos - Runs the POS tagger using POSTaggerAnnotator</li>
* <li> lemma - Generates the lemmas for all tokens using MorphaAnnotator</li>
* <li> ner - Runs a combination of NER models using NERCombinerAnnotator</li>
* <li> truecase - Detects the true case of words in free text</li>
* <li> parse - Runs the PCFG parser using ParserAnnotator</li>
* <li> coref - Implements pronominal anaphora resolution using a statistical model</li>
* <li> dcoref - Implements pronominal anaphora resolution using a deterministic model</li>
* <li> nfl - Implements entity and relation mention extraction for the NFL domain</li>
* </ul>
* <p>
* To run the pipeline over some text, use StanfordCoreNLP.process(Reader reader).
* This method returns an Annotation object, which stores all the annotations generated for the given text.
* To access these annotations use the following methods:
* <ul>
* <li>Annotation.get(CoreAnnotations.SentencesAnnotation.class) returns the list of all sentences in the given text
* as a List<CoreMap>. For each sentence annotation, sentence.get(CoreAnnotations.TokensAnnotation.class)
* returns the list of all tokens in that sentence as a List<CoreLabel>.
* Here you can access all the token-level information. For example:
* <ul>
* <li>token.get(TextAnnotation.class) returns the text of this word</li>
* <li>token.get(LemmaAnnotation.class) returns the lemma of this word</li>
* <li>token.get(PartOfSpeechAnnotation.class) returns the POS tag of this word</li>
* <li>token.get(NamedEntityTagAnnotation.class) returns the NE label of this word</li>
* <li>token.get(TrueCaseTextAnnotation.class) returns the true-cased text of this word</li>
* </ul>
* </li>
* <li>For each SentenceAnnotation, sentence.get(TreeAnnotation.class) returns the parse tree of the sentence.</li>
* <li>At the document level, Annotation.get(Annotation.CorefGraphAnnotation.class) returns the set of coreference links in this document if the DeterministicCorefAnnotator (dcoref) is enabled.
* Each link is stored as a Pair<IntTuple, IntTuple> where the first element points to the source, and the second to the destination.
* Each pointer is stored as a pair of integers, where the first integer is the offset of the sentence that contains the referent,
* and the second integer is the offset of the referent head word in this sentence. Note that both offsets start at 1 (not 0!).
* </li>
* <li>Additionally, two further annotations are available from the dcoref annotator. token.get(CorefClusterIdAnnotation.class) returns an arbitrary identifier for a
* group of coreferent words. In other words, two words are coreferent if they share the same coreference cluster ID: token1.get(CorefClusterIdAnnotation.class).equals(token2.get(CorefClusterIdAnnotation.class)).
* token.get(CorefClusterAnnotation.class) returns the set of CoreLabels of all the words that are coreferent with the token. Note that currently these CoreLabels are actually
* CyclicCoreLabels, which do not hash the same way as their CoreLabel counterparts.</li>
* <li>If the NFL annotator is enabled, then for each SentenceAnnotation:
* <ul>
* <li>sentence.get(MachineReadingAnnotations.EntityMentionsAnnotation.class) returns the list of EntityMention objects in this sentence.
* Relevant methods in the EntityMention class: (a) getType() returns the type of the mention, e.g., "NFLTeam";
* (b) getHeadTokenStart() returns the position of the first token of this mention;
* (c) getHeadTokenEnd() returns the position after the last token of this mention;
* (d) getObjectId() returns a unique String id corresponding to this mention.
* </li>
* <li>sentence.get(MachineReadingAnnotations.RelationMentionsAnnotation.class) returns the list of RelationMention objects in this sentence.
* The RelationMention class supports all of the above methods, plus getEntityMentionArgs(), which returns the list of arguments in this relation.
* Note that the order in which the arguments are returned is important. For example, in the NFL domain, relation arguments are always sorted in alphabetical order.
* Relations with the same arguments but stored in a different order are not considered equal by the RelationMention.equals() method.
* </li>
* </ul>
* </li>
* </ul>
* -->
* @author Jenny Finkel
* @author Mihai Surdeanu
* @author Steven Bethard
* @author David McClosky
* <!-- hhmts start --> Last modified: May 7, 2012 <!-- hhmts end -->
*/
package edu.stanford.nlp.pipeline;