/**
* <h1>Linguistic Annotation Pipeline</h1>
* The point of this package is to enable people to quickly and
* painlessly get complete linguistic annotations of their text. It
* is designed to be highly flexible and extensible. I will first discuss
* the organization and functions of the classes, and then I will give some
* sample code and a run-down of the implemented Annotators.
* <p>
* <h2>Annotation</h2>
* An Annotation is the data structure which holds the results of Annotators.
* An Annotation is basically a map from keys to bits of annotation, such
* as the parse, the part-of-speech tags, or the named entity tags. Annotations
* are designed to operate at the sentence level; however, depending on the
* Annotators you use, this may not be how you choose to use the package.
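* For example, here is a minimal sketch of reading and writing annotation
* bits through typed keys (the keys shown are existing classes in
* <code>CoreAnnotations</code>; the document text is invented):
* <pre>
* // an Annotation is constructed from the raw text it will describe
* Annotation annotation = new Annotation("Stanford is in California.");
* // the raw text is stored under TextAnnotation.class ...
* String text = annotation.get(CoreAnnotations.TextAnnotation.class);
* // ... and any other bit of annotation is stored under its own key
* annotation.set(CoreAnnotations.DocIDAnnotation.class, "doc-1");
* </pre>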
* <h2>Annotators</h2>
* The backbone of this package is the Annotator. Annotators are a lot like
* functions, except that they operate over Annotations instead of Objects.
* They do things like tokenize, parse, or NER tag sentences. In the
* javadoc of your Annotator you should specify what the Annotator
* assumes already exists (for instance, the NERAnnotator assumes that the
* sentence has been tokenized) and where to find these annotations (in
* the preceding example, under <code>TextAnnotation.class</code>). You
* should also specify what the Annotator adds to the annotation, and where.
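* As an illustration, here is a sketch of a toy Annotator that follows this
* contract. The class is hypothetical and assumes the single-method
* <code>Annotator</code> interface; <code>ShapeAnnotation</code> is an
* existing key in <code>CoreAnnotations</code>.
* <pre>
* // Requires: TokensAnnotation (i.e., a TokenizerAnnotator has run).
* // Adds: ShapeAnnotation on every token.
* public class ShapeAnnotator implements Annotator {
*   public void annotate(Annotation annotation) {
*     for (CoreLabel token : annotation.get(CoreAnnotations.TokensAnnotation.class)) {
*       String text = token.get(CoreAnnotations.TextAnnotation.class);
*       // a toy word shape: X for upper case, x for lower case, d for digits
*       String shape = text.replaceAll("[A-Z]", "X")
*                          .replaceAll("[a-z]", "x")
*                          .replaceAll("[0-9]", "d");
*       token.set(CoreAnnotations.ShapeAnnotation.class, shape);
*     }
*   }
* }
* </pre>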
* <h2>AnnotationPipeline</h2>
* An AnnotationPipeline is where many Annotators are strung together
* to form a linguistic annotation pipeline. It is, itself, an
* Annotator. AnnotationPipelines usually also keep track of how much time
* they spend annotating and loading, to help users find where the
* time sinks are.
* However, the class AnnotationPipeline is not meant to be used as is;
* it serves as an example of how to build your own pipeline.
* If you just want a typical NLP pipeline, take a look at StanfordCoreNLP
* (described later in this document).
* <h2>Sample Usage</h2>
* Here is some sample code which illustrates the intended usage
* of the package:
* <pre>
* public void testPipeline(String text) throws Exception {
*   // create pipeline
*   AnnotationPipeline pipeline = new AnnotationPipeline();
*   pipeline.addAnnotator(new TokenizerAnnotator(false, "en"));
*   pipeline.addAnnotator(new WordsToSentencesAnnotator(false));
*   pipeline.addAnnotator(new POSTaggerAnnotator(false));
*   pipeline.addAnnotator(new MorphaAnnotator(false));
*   pipeline.addAnnotator(new NERCombinerAnnotator(false));
*   pipeline.addAnnotator(new ParserAnnotator(false, -1));
*   // create annotation with text
*   Annotation document = new Annotation(text);
*   // annotate text with pipeline
*   pipeline.annotate(document);
*   // demonstrate typical usage
*   for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
*     // get the parse tree for the sentence
*     Tree tree = sentence.get(TreeAnnotation.class);
*     // get the tokens for the sentence and iterate over them
*     for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
*       // get token attributes
*       String tokenText = token.get(TextAnnotation.class);
*       String tokenPOS = token.get(PartOfSpeechAnnotation.class);
*       String tokenLemma = token.get(LemmaAnnotation.class);
*       String tokenNE = token.get(NamedEntityTagAnnotation.class);
*     }
*   }
* }
* </pre>
* <h2>Existing Annotators</h2>
* There already exist Annotators for many common tasks, all of which include
* default model locations, so they can just be used off the shelf. They are:
* <ul>
* <li>TokenizerAnnotator - tokenizes the text based on language or Tokenizer class specifications </li>
* <li>WordsToSentencesAnnotator - splits a sequence of words into a sequence of sentences</li>
* <li>POSTaggerAnnotator - annotates the text with part-of-speech tags </li>
* <li>MorphaAnnotator - morphological normalizer (generates lemmas)</li>
* <li>NERClassifierCombiner - combines several NER models </li>
* <li>TrueCaseAnnotator - detects the true case of words in free text (useful for all upper or lower case text)</li>
* <li>ParserAnnotator - generates constituent and dependency trees</li>
* <li>NumberAnnotator - recognizes numerical entities such as numbers, money, times, and dates</li>
* <li>TimeWordAnnotator - recognizes common temporal expressions, such as "teatime"</li>
* <li>QuantifiableEntityNormalizingAnnotator - normalizes the content of all numerical entities</li>
* <li>DeterministicCorefAnnotator - implements anaphora resolution using a deterministic model </li>
* <li>NFLAnnotator - implements entity and relation mention extraction for the NFL domain</li>
* </ul>
* <h2>How Do I Use This?</h2>
* You do not have to construct your pipeline from scratch! For typical NLP processing, use
* StanfordCoreNLP. This pipeline implements the most common functionality needed: tokenization,
* lemmatization, POS tagging, NER, parsing, and coreference resolution. Read below for how to use
* this pipeline from the command line, or directly in your Java code.
* <h3>Using StanfordCoreNLP from the Command Line</h3>
* The command line for StanfordCoreNLP is:
* <pre>
* ./bin/stanfordcorenlp.sh
* </pre>
* or
* <pre>
* java -cp stanford-corenlp-YYYY-MM-DD.jar:stanford-corenlp-YYYY-MM-DD-models.jar:xom.jar:joda-time.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP [ -props YOUR_CONFIGURATION_FILE ] -file YOUR_INPUT_FILE
* </pre>
* where the following properties are defined:
* (if <code>-props</code> or <code>annotators</code> is not defined, default properties will be loaded via the classpath)
* <pre>
* "annotators" - comma separated list of annotators
* The following annotators are supported: tokenize, ssplit, pos, lemma, ner, truecase, parse, dcoref, nfl
* </pre>
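* For example, a minimal configuration file passed via <code>-props</code> could
* contain just the annotator list (all other properties then fall back to the
* defaults loaded from the classpath):
* <pre>
* annotators = tokenize, ssplit, pos, lemma, ner, parse, dcoref
* </pre>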
* More information is available here: <a href="http://nlp.stanford.edu/software/corenlp.shtml">Stanford CoreNLP</a>
* <!--
* where the following properties are defined:
* (if <code>-props</code> or <code>annotators</code> is not defined, default properties will be loaded via the classpath)
* <pre>
* "annotators" - comma separated list of annotators
* The following annotators are supported: tokenize, ssplit, pos, lemma, ner, truecase, parse, coref, dcoref, nfl
* If annotator "pos" is defined:
* "pos.model" - path towards the POS tagger model
* If annotator "ner" is defined:
* "ner.model.3class" - path towards the three-class NER model
* "ner.model.7class" - path towards the seven-class NER model
* "ner.model.MISCclass" - path towards the NER model with a MISC class
* If annotator "truecase" is defined:
* "truecase.model" - path towards the true-casing model; default: StanfordCoreNLPModels/truecase/noUN.ser.gz
* "truecase.bias" - class bias of the true case model; default: INIT_UPPER:-0.7,UPPER:-0.7,O:0
* "truecase.mixedcasefile" - path towards the mixed case file; default: StanfordCoreNLPModels/truecase/MixDisambiguation.list
* If annotator "nfl" is defined:
* "nfl.gazetteer" - path towards the gazetteer for the NFL domain
* "nfl.relation.model" - path towards the NFL relation extraction model
* If annotator "parse" is defined:
* "parser.model" - path towards the PCFG parser model
* Command line properties:
* "file" - run the pipeline on the contents of this file, or on the contents of the files in this directory
* XML output is generated for every input file "file" as file.xml
* "extension" - if -file used with a directory, process only the files with this extension
* "filelist" - run the pipeline on the list of files given in this file
* XML output is generated for every input file as file.outputExtension
* "outputDirectory" - where to put XML output (defaults to the current directory)
* "outputExtension" - extension to use for the output file (defaults to ".xml"). Don't forget the dot!
* "replaceExtension" - flag to chop off the last extension before adding outputExtension to file
* "noClobber" - don't automatically override (clobber) output files that already exist
* </pre>
* If none of the above are present, run the pipeline in an interactive shell (default properties will be loaded from the classpath).
* The shell accepts input from stdin and displays the output at stdout.
* To avoid clutter in the command line you can store some or all of these properties in a
* properties file and pass this file to <code>StanfordCoreNLP</code> using the <code>-props</code> option. For example,
* my <code>pipe.properties</code> file contains the following:
* <pre>
* annotators=tokenize,ssplit,pos,lemma,ner,parse,coref
* pos.model=models/left3words-wsj-0-18.tagger
* ner.model.3class=models/ner-en-3class.crf.gz
* ner.model.7class=models/all.7class.crf.gz
* ner.model.distsim=models/conll.distsim.crf.ser.gz
* #nfl.gazetteer = models/NFLgazetteer.txt
* #nfl.relation.model = models/nfl_relation_model.ser
* parser.model=models/englishPCFG.ser.gz
* coref.model=models/coref/corefClassifierAll.March2009.ser.gz
* coref.name.dir=models/coref
* wordnet.dir=models/wordnet-3.0-prolog
* </pre>
* Using this properties file, I run the pipeline's interactive shell as follows:
* <pre>
* java -cp classes/:lib/xom.jar -Xmx2g edu.stanford.nlp.pipeline.StanfordCoreNLP -props pipe.properties
* </pre>
* In the above setup, the system displays a shell-like prompt and waits for stdin input.
* You can input any English text.
* Processing starts after each new line and the output is displayed at the standard output in a format (somewhat) interpretable by humans.
* For example, for the input "Reagan announced he had Alzheimer's disease, an incurable brain affliction." the shell displays the following output:
* <pre>
* [Text=Reagan PartOfSpeech=NNP Lemma=Reagan NamedEntityTag=PERSON] [Text=announced PartOfSpeech=VBD Lemma=announce NamedEntityTag=O] [Text=he PartOfSpeech=PRP Lemma=he NamedEntityTag=O] [Text=had PartOfSpeech=VBD Lemma=have NamedEntityTag=O] [Text=Alzheimer PartOfSpeech=NNP Lemma=Alzheimer NamedEntityTag=O] [Text='s PartOfSpeech=POS Lemma='s NamedEntityTag=O] [Text=disease PartOfSpeech=NN Lemma=disease NamedEntityTag=O] [Text=, PartOfSpeech=, Lemma=, NamedEntityTag=O] [Text=an PartOfSpeech=DT Lemma=a NamedEntityTag=O] [Text=incurable PartOfSpeech=JJ Lemma=incurable NamedEntityTag=O] [Text=brain PartOfSpeech=NN Lemma=brain NamedEntityTag=O] [Text=affliction PartOfSpeech=NN Lemma=affliction NamedEntityTag=O] [Text=. PartOfSpeech=. Lemma=. NamedEntityTag=O]
* (ROOT
*   (S
*     (NP (NNP Reagan))
*     (VP (VBD announced)
*       (SBAR
*         (S
*           (NP (PRP he))
*           (VP (VBD had)
*             (NP
*               (NP
*                 (NP (NNP Alzheimer) (POS 's))
*                 (NN disease))
*               (, ,)
*               (NP (DT an) (JJ incurable) (NN brain) (NN affliction)))))))
*     (. .)))
* nsubj(announced-2, Reagan-1)
* nsubj(had-4, he-3)
* ccomp(announced-2, had-4)
* poss(disease-7, Alzheimer-5)
* dobj(had-4, disease-7)
* det(affliction-12, an-9)
* amod(affliction-12, incurable-10)
* nn(affliction-12, brain-11)
* appos(disease-7, affliction-12)
* </pre>
* where the first part of the output shows the individual words and their attributes, e.g., POS and NE tags,
* the second block shows the constituent parse tree, and the last block shows the syntactic dependencies extracted from the parse tree.
* Note that the coreference chains are stored in the individual words.
* For example, the referent for the "he" pronoun is stored as "CorefDest=1 1", which means that the referent is the first token
* in the first sentence in this text, i.e., "Reagan".
* <p>
* Alternatively, if you want to process all the .txt files in the directory data/, use this command line:
* <pre>
* java -cp classes/:lib/xom.jar -Xmx6g edu.stanford.nlp.pipeline.StanfordCoreNLP -props pipe.properties -file data -extension .txt
* </pre>
* Or, you can store all the files that you want processed one per line in a separate file, and pass the latter file to
* StanfordCoreNLP with the following options:
* <pre>
* java -cp classes/:lib/xom.jar -Xmx6g edu.stanford.nlp.pipeline.StanfordCoreNLP -props pipe.properties -filelist list_of_files_to_process.txt
* </pre>
* In the latter two cases, the pipeline generates a file.txt.xml output file for every file.txt it processes.
* For example, if file.txt contains the following text:
* <pre>
* Federal Reserve Chairman Ben Bernanke declared Friday
* that the U.S. economy is on the verge of a long-awaited recovery.
* </pre>
* the pipeline generates the following XML output in file.txt.xml:
* <pre>
* <?xml version="1.0" encoding="UTF-8"?>
* <root xmlns="http://nlp.stanford.edu">
* <sentence>
* <wordTable>
* <wordInfo id="1">
* <word>Federal</word>
* <lemma>Federal</lemma>
* <POS>NNP</POS>
* <NER>ORGANIZATION</NER>
* </wordInfo>
* <wordInfo id="2">
* <word>Reserve</word>
* <lemma>Reserve</lemma>
* <POS>NNP</POS>
* <NER>ORGANIZATION</NER>
* </wordInfo>
* <wordInfo id="3">
* <word>Chairman</word>
* <lemma>Chairman</lemma>
* <POS>NNP</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="4">
* <word>Ben</word>
* <lemma>Ben</lemma>
* <POS>NNP</POS>
* <NER>PERSON</NER>
* </wordInfo>
* <wordInfo id="5">
* <word>Bernanke</word>
* <lemma>Bernanke</lemma>
* <POS>NNP</POS>
* <NER>PERSON</NER>
* </wordInfo>
* <wordInfo id="6">
* <word>declared</word>
* <lemma>declare</lemma>
* <POS>VBD</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="7">
* <word>Friday</word>
* <lemma>Friday</lemma>
* <POS>NNP</POS>
* <NER>DATE</NER>
* </wordInfo>
* <wordInfo id="8">
* <word>that</word>
* <lemma>that</lemma>
* <POS>IN</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="9">
* <word>the</word>
* <lemma>the</lemma>
* <POS>DT</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="10">
* <word>U.S.</word>
* <lemma>U.S.</lemma>
* <POS>NNP</POS>
* <NER>LOCATION</NER>
* </wordInfo>
* <wordInfo id="11">
* <word>economy</word>
* <lemma>economy</lemma>
* <POS>NN</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="12">
* <word>is</word>
* <lemma>be</lemma>
* <POS>VBZ</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="13">
* <word>on</word>
* <lemma>on</lemma>
* <POS>IN</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="14">
* <word>the</word>
* <lemma>the</lemma>
* <POS>DT</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="15">
* <word>verge</word>
* <lemma>verge</lemma>
* <POS>NN</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="16">
* <word>of</word>
* <lemma>of</lemma>
* <POS>IN</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="17">
* <word>a</word>
* <lemma>a</lemma>
* <POS>DT</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="18">
* <word>long-awaited</word>
* <lemma>long-awaited</lemma>
* <POS>JJ</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="19">
* <word>recovery</word>
* <lemma>recovery</lemma>
* <POS>NN</POS>
* <NER>O</NER>
* </wordInfo>
* <wordInfo id="20">
* <word>.</word>
* <lemma>.</lemma>
* <POS>.</POS>
* <NER>O</NER>
* </wordInfo>
* </wordTable>
*     <parse>(ROOT
*       (S
*         (NP (NNP Federal) (NNP Reserve) (NNP Chairman) (NNP Ben) (NNP Bernanke))
*         (VP (VBD declared)
*           (NP-TMP (NNP Friday))
*           (SBAR (IN that)
*             (S
*               (NP (DT the) (NNP U.S.) (NN economy))
*               (VP (VBZ is)
*                 (PP (IN on)
*                   (NP
*                     (NP (DT the) (NN verge))
*                     (PP (IN of)
*                       (NP (DT a) (JJ long-awaited) (NN recovery)))))))))
*         (. .)))</parse>
*     <dependencies>
*       <dep type="nn">
*         <governor idx="5">Bernanke</governor>
*         <dependent idx="1">Federal</dependent>
*       </dep>
*       <dep type="nn">
*         <governor idx="5">Bernanke</governor>
*         <dependent idx="2">Reserve</dependent>
*       </dep>
*       <dep type="nn">
*         <governor idx="5">Bernanke</governor>
*         <dependent idx="3">Chairman</dependent>
*       </dep>
*       <dep type="nn">
*         <governor idx="5">Bernanke</governor>
*         <dependent idx="4">Ben</dependent>
*       </dep>
*       <dep type="nsubj">
*         <governor idx="7">Friday</governor>
*         <dependent idx="5">Bernanke</dependent>
*       </dep>
*       <dep type="dep">
*         <governor idx="7">Friday</governor>
*         <dependent idx="6">declared</dependent>
*       </dep>
*       <dep type="complm">
*         <governor idx="12">is</governor>
*         <dependent idx="8">that</dependent>
*       </dep>
*       <dep type="det">
*         <governor idx="11">economy</governor>
*         <dependent idx="9">the</dependent>
*       </dep>
*       <dep type="nn">
*         <governor idx="11">economy</governor>
*         <dependent idx="10">U.S.</dependent>
*       </dep>
*       <dep type="nsubj">
*         <governor idx="12">is</governor>
*         <dependent idx="11">economy</dependent>
*       </dep>
*       <dep type="ccomp">
*         <governor idx="7">Friday</governor>
*         <dependent idx="12">is</dependent>
*       </dep>
*       <dep type="prep">
*         <governor idx="12">is</governor>
*         <dependent idx="13">on</dependent>
*       </dep>
*       <dep type="det">
*         <governor idx="15">verge</governor>
*         <dependent idx="14">the</dependent>
*       </dep>
*       <dep type="pobj">
*         <governor idx="13">on</governor>
*         <dependent idx="15">verge</dependent>
*       </dep>
*       <dep type="prep">
*         <governor idx="15">verge</governor>
*         <dependent idx="16">of</dependent>
*       </dep>
*       <dep type="det">
*         <governor idx="19">recovery</governor>
*         <dependent idx="17">a</dependent>
*       </dep>
*       <dep type="amod">
*         <governor idx="19">recovery</governor>
*         <dependent idx="18">long-awaited</dependent>
*       </dep>
*       <dep type="pobj">
*         <governor idx="16">of</governor>
*         <dependent idx="19">recovery</dependent>
*       </dep>
*     </dependencies>
*   </sentence>
* </root>
* </pre>
* <p>
* If the NFL annotator is enabled, additional XML output is generated for the corresponding domain-specific entities and relations.
* For example, for the sentence "The 49ers beat Dallas 20-10 in the Sunday game." the NFL-specific output is:
* <pre>
* <MachineReading>
*   <entities>
*     <entity id="EntityMention1">
*       <type>NFLTeam</type>
*       <span start="1" end="2" />
*     </entity>
*     <entity id="EntityMention2">
*       <type>NFLTeam</type>
*       <span start="3" end="4" />
*     </entity>
*     <entity id="EntityMention3">
*       <type>FinalScore</type>
*       <span start="4" end="5" />
*     </entity>
*     <entity id="EntityMention4">
*       <type>FinalScore</type>
*       <span start="6" end="7" />
*     </entity>
*     <entity id="EntityMention5">
*       <type>Date</type>
*       <span start="9" end="10" />
*     </entity>
*     <entity id="EntityMention6">
*       <type>NFLGame</type>
*       <span start="10" end="11" />
*     </entity>
*   </entities>
*   <relations>
*     <relation id="RelationMention-11">
*       <type>teamScoringAll</type>
*       <arguments>
*         <entity id="EntityMention3">
*           <type>FinalScore</type>
*           <span start="4" end="5" />
*         </entity>
*         <entity id="EntityMention1">
*           <type>NFLTeam</type>
*           <span start="1" end="2" />
*         </entity>
*       </arguments>
*     </relation>
*     <relation id="RelationMention-17">
*       <type>teamScoringAll</type>
*       <arguments>
*         <entity id="EntityMention4">
*           <type>FinalScore</type>
*           <span start="6" end="7" />
*         </entity>
*         <entity id="EntityMention2">
*           <type>NFLTeam</type>
*           <span start="3" end="4" />
*         </entity>
*       </arguments>
*     </relation>
*     <relation id="RelationMention-20">
*       <type>teamFinalScore</type>
*       <arguments>
*         <entity id="EntityMention4">
*           <type>FinalScore</type>
*           <span start="6" end="7" />
*         </entity>
*         <entity id="EntityMention6">
*           <type>NFLGame</type>
*           <span start="10" end="11" />
*         </entity>
*       </arguments>
*     </relation>
*     <relation id="RelationMention-25">
*       <type>gameDate</type>
*       <arguments>
*         <entity id="EntityMention5">
*           <type>Date</type>
*           <span start="9" end="10" />
*         </entity>
*         <entity id="EntityMention6">
*           <type>NFLGame</type>
*           <span start="10" end="11" />
*         </entity>
*       </arguments>
*     </relation>
*   </relations>
* </MachineReading>
* </pre>
* -->
* <h3>The StanfordCoreNLP API</h3>
* More information is available here: <a href="http://nlp.stanford.edu/software/corenlp.shtml">Stanford CoreNLP</a>
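* As a quick orientation, a minimal use of the API could look like the
* following sketch (the property values are one reasonable choice; see the
* page linked above for the authoritative documentation):
* <pre>
* // build a StanfordCoreNLP pipeline from a set of properties
* Properties props = new Properties();
* props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
* StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
* // annotate a document; StanfordCoreNLP is itself an AnnotationPipeline
* Annotation document = new Annotation("Reagan announced he had Alzheimer's disease.");
* pipeline.annotate(document);
* // read the results back with the same typed keys as in the sample above
* for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
*   Tree tree = sentence.get(TreeAnnotation.class);
* }
* </pre>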
* <!--
* <p>
* To construct a pipeline object from a given set of properties, use StanfordCoreNLP(Properties props).
* This method creates the pipeline using the annotators given in the "annotators" property (see above for the complete list of properties).
* Currently, we support the following options for the "annotators" property:
* <ul>
* <li> tokenize - Tokenizes the text using TokenizerAnnotator. This annotator is required by all following annotators!</li>
* <li> ssplit - Splits the sequence of tokens into sentences using WordsToSentencesAnnotator. This annotator is required if the input text contains multiple sentences, e.g., it is an entire document.</li>
* <li> pos - Runs the POS tagger using POSTaggerAnnotator</li>
* <li> lemma - Generates the lemmas for all tokens using MorphaAnnotator</li>
* <li> ner - Runs a combination of NER models using NERCombinerAnnotator</li>
* <li> truecase - Detects the true case of words in free text</li>
* <li> parse - Runs the PCFG parser using ParserAnnotator</li>
* <li> coref - Implements pronominal anaphora resolution using a statistical model</li>
* <li> dcoref - Implements pronominal anaphora resolution using a deterministic model</li>
* <li> nfl - Implements entity and relation mention extraction for the NFL domain</li>
* </ul>
* <p>
* To run the pipeline over some text, use StanfordCoreNLP.process(Reader reader).
* This method returns an Annotation object, which stores all the annotations generated for the given text.
* To access these annotations use the following methods:
* <ul>
* <li>Annotation.get(CoreAnnotations.SentencesAnnotation.class) returns the list of all sentences in the given text
* as a List<CoreMap>. For each sentence annotation, sentence.get(CoreAnnotations.TokensAnnotation.class)
* returns the list of all tokens in that sentence as a List<CoreLabel>.
* Here you can access all the token-level information. For example:
* <ul>
* <li>token.get(TextAnnotation.class) returns the text of this word</li>
* <li>token.get(LemmaAnnotation.class) returns the lemma of this word</li>
* <li>token.get(PartOfSpeechAnnotation.class) returns the POS tag of this word</li>
* <li>token.get(NamedEntityTagAnnotation.class) returns the NE label of this word</li>
* <li>token.get(TrueCaseTextAnnotation.class) returns the true-cased text of this word</li>
* </ul>
* </li>
* <li>For each SentenceAnnotation, sentence.get(TreeAnnotation.class) returns the parse tree of the sentence.</li>
* <li>At the document level, Annotation.get(Annotation.CorefGraphAnnotation.class) returns the set of coreference links in this document if the DeterministicCorefAnnotator (dcoref) is enabled.
* Each link is stored as a Pair<IntTuple, IntTuple> where the first element points to the source, and the second to the destination.
* Each pointer is stored as a pair of integers, where the first integer is the offset of the sentence that contains the referent,
* and the second integer is the offset of the referent head word in this sentence. Note that both offsets start at 1 (not 0!).
* </li>
* <li>Additionally, two further annotations are available from the dcoref annotator. token.get(CorefClusterIdAnnotation.class) returns an arbitrary identifier for a
* group of coreferent words. In other words, two words are coreferent if they share the same coreference cluster ID: token1.get(CorefClusterIdAnnotation.class).equals(token2.get(CorefClusterIdAnnotation.class)).
* token.get(CorefClusterAnnotation.class) returns the set of CoreLabels of all the words that are coreferent with the token. Note that currently these CoreLabels are actually
* CyclicCoreLabels, which do not hash the same way as their CoreLabel counterparts.</li>
* <li>If the NFL annotator is enabled, then for each SentenceAnnotation:
* <ul>
* <li>sentence.get(MachineReadingAnnotations.EntityMentionsAnnotation.class) returns the list of EntityMention objects in this sentence.
* Relevant methods in the EntityMention class: (a) getType() returns the type of the mention, e.g., "NFLTeam";
* (b) getHeadTokenStart() returns the position of the first token of this mention;
* (c) getHeadTokenEnd() returns the position after the last token of this mention;
* (d) getObjectId() returns a unique String id corresponding to this mention.
* </li>
* <li>sentence.get(MachineReadingAnnotations.RelationMentionsAnnotation.class) returns the list of RelationMention objects in this sentence.
* The RelationMention class supports all of the above methods, plus getEntityMentionArgs(), which returns the list of arguments in this relation.
* Note that the order in which the arguments are returned is important. For example, in the NFL domain, relation arguments are always sorted in alphabetical order.
* Relations with the same arguments but stored in a different order are not considered equal by the RelationMention.equals() method.
* </li>
* </ul>
* </li>
* </ul>
* -->
* @author Jenny Finkel
* @author Mihai Surdeanu
* @author Steven Bethard
* @author David McClosky
* <!-- hhmts start --> Last modified: May 7, 2012 <!-- hhmts end -->
*/
package edu.stanford.nlp.pipeline;