/**
* <h1>Multi-pass Sieve Coreference Resolution System</h1>
* <a href="#authors">[authors]</a>
* <a href="#current">[current results]</a>
* <a href="#changes">[changes]</a>
* <a href="#usage">[usage]</a>
* <p>
* This system implements the multi-pass sieve coreference resolution system of Raghunathan et al. at EMNLP 2010.
* <p>
* Note that the current code in this package does not implement mention detection. All results reported here use gold mentions (just as in the paper).
* However, the DeterministicCorefAnnotator in StanfordCoreNLP implements a simple mention detection component, so this code can be used to perform coreference resolution on raw text.
* <p>
* Note that this code is already different from the system reported in the paper.
* After the EMNLP paper, two additional sieves were included. The current code gives slightly better scores than those in the paper.
* <h2><a name="authors">Authors</a></h2>
* <ul>
* <li>Karthik Raghunathan
* <li>Heeyoung Lee
* <li>Sudarshan Rangarajan
* <li>Jenny Finkel
* <li>Nathanael Chambers
* <li>Mihai Surdeanu
* <li>Dan Jurafsky
* <li>Christopher Manning
* </ul>
* <h2><a name="current">Current Results</a></h2>
* <pre>
* ----------------------------------------------------------------------------
* MUC B cubed Pairwise
* P R F1 P R F1 P R F1
* ----------------------------------------------------------------------------
* ACE2004 dev | 84.5 75.7 79.8 | 88.0 75.8 81.4 | 78.6 53.8 63.9
* ACE2004 test | 80.4 72.9 76.4 | 85.1 76.4 80.5 | 68.7 48.9 57.1
* ACE2004 nwire | 83.8 74.3 78.8 | 86.9 73.7 79.7 | 78.1 51.7 62.2
* MUC6 test | 90.5 69.0 78.3 | 90.5 62.5 73.9 | 89.3 56.1 68.9
* ----------------------------------------------------------------------------
* </pre>
* <h2><a name="changes">Changes</a></h2>
* <h3>August 26, 2010</h3>
* <p>
* This release is generally similar to the code used for EMNLP 2010,
* with one additional sieve: relaxed exact string match.<br>
* The score may differ also due to the change in Parser or NER.
* <p>
* Results:
* <pre>
* ----------------------------------------------------------------------------
* MUC B cubed Pairwise
* P R F1 P R F1 P R F1
* ----------------------------------------------------------------------------
* ACE2004 dev | 84.1 73.9 78.7 | 88.3 74.2 80.7 | 80.0 51.0 62.3
* ACE2004 test | 80.5 72.3 76.2 | 85.4 75.9 80.4 | 68.7 47.8 56.4
* ACE2004 nwire | 83.8 72.8 77.9 | 87.5 72.1 79.0 | 79.3 47.6 59.5
* MUC6 test | 90.3 68.9 78.2 | 90.5 62.3 73.8 | 89.4 55.5 68.5
* ----------------------------------------------------------------------------
* </pre>
* <h2><a name="usage">Usage</a></h2>
* <p>
* <h3> Running coreference resolution on raw text </h3>
* This software is now fully incorporated in StanfordCoreNLP, so all you have to do is add the dcoref annotator to the "annotators" property in StanfordCoreNLP.
* For example:
* <pre>
* annotators = tokenize, ssplit, pos, lemma, ner, parse, dcoref
* </pre>
* The required properties for dcoref are the following:
* <pre>
* dcoref.demonym
* dcoref.animate
* dcoref.inanimate
* dcoref.male
* dcoref.neutral
* dcoref.female
* dcoref.plural
* dcoref.singular
* sievePasses // If omitted, default value will be used.
* </pre>
* <p>
* See StanfordCoreNLP for more details.
* </p>
* <p>
* <h3> How to replicate the results in our EMNLP2010 paper</h3>
* To replicate the results in the paper run:
* <pre>
* java -Xmx8g edu.stanford.nlp.dcoref.SieveCoreferenceSystem -props <properties file>
* </pre>
* A sample properties file (coref.properties) is included in dcoref package.
* The properties file includes the following:
* <pre>
* annotators = pos, lemma, ner // annotators needed for coreference resolution
* pos.model // For POS model
* ner.model.3class
* ner.model.7class // For NER
* ner.model.MISCclass
* parser.model // For parser
* parser.maxlen = 100
* dcoref.demonym // The path for a file that includes a list of demonyms
* dcoref.animate // The list of animate/inanimate mentions (Ji and Lin, 2009)
* dcoref.inanimate
* dcoref.male // The list of male/neutral/female mentions (Bergsma and Lin, 2006)
* dcoref.neutral // Neutral means a mention that is usually referred by 'it'
* dcoref.female
* dcoref.plural // The list of plural/singular mentions (Bergsma and Lin, 2006)
* dcoref.singular
* sievePasses // Sieve passes - each class is defined in dcoref/sievepasses/
* logFile // Path for log file for coref system evaluation
* ace2004 or mucfile // Use either ace2004 or mucfile (not both)
* // ace2004: path for the directory containing ACE2004 files
* // mucfile: path for the MUC file
* </pre>
* This system can process both ACE2004 and MUC6 corpora in their original formats.
* Examples of corpus are given below.
* MUC6:
* <pre>
* ...
* <s> By/IN proposing/VBG <COREF ID="13" TYPE="IDENT" REF="6" MIN="date"> a/DT meeting/NN date/NN</COREF> ,/, <COREF ID="14" TYPE="IDENT" REF="0">
* <ORGANIZATION> Eastern/NNP</ORGANIZATION></COREF> moved/VBD one/CD step/NN closer/JJR toward/IN reopening/VBG current/JJ high-cost/JJ contract/NN agreements/NNS with/IN <COREF ID="15" TYPE="IDENT" REF="8" MIN="unions"><COREF ID="16" TYPE="IDENT" REF="14"> its/PRP$</COREF> unions/NNS</COREF> ./. </s>
* ...
* </pre>
* ACE2004:
* <pre>
* ...
* <document DOCID="20001115_AFP_ARB.0212.eng">
* <entity ID="20001115_AFP_ARB.0212.eng-E1" TYPE="ORG" SUBTYPE="Educational" CLASS="SPC">
* <entity_mention ID="1-47" TYPE="NAM" LDCTYPE="NAM">
* <extent>
* <charseq START="475" END="506">the Globalization Studies Center</charseq>
* </extent>
* <head>
* <charseq START="479" END="506">Globalization Studies Center</charseq>
* </head>
* </entity_mention>
* ...
* </pre>
*/
package edu.stanford.nlp.dcoref;