/** * This package contains annotators which help process interaction to relations. * * A relationship is defined by the Baleen type system. A relationship as both a Baleen annotation * and concept is a linkage between two entities which appear in a document. Relationships have a * main type and a subtype, though Baleen is not perscriptive of the meaning of these allows for * domain / corpus interpretation of the level of granularity required. For example, the sentence * "John lives in London" has highlights a relationship between John and London, the type of * location should be "located" and the subtype "lives". * * <h2>Interactions</h2> * * A subset of the classes in this package are based on the idea of interactions and extracting relationships * that contain an 'interaction' word. For more information on this, see {@link uk.gov.dstl.baleen.annotators.interactions} * and {@link uk.gov.dstl.baleen.annotators.patterns}. * * Assuming that interaction words have been identified they must first be annotated, generally * using a MongoStemming gazetteer. * * Then we need to add one or more interaction based annotators. These may be derived from * {@link uk.gov.dstl.baleen.annotators.relations.helpers.AbstractInteractionBasedRelationshipAnnotator} * or * {@link uk.gov.dstl.baleen.annotators.relations.helpers.AbstractInteractionBasedSentenceRelationshipAnnotator} * . For this example we will use the * {@link uk.gov.dstl.baleen.annotators.relations.SimpleInteraction} which is a * toy example and should not be used in a production pipeline! * * Relationship extraction associates entities and as such should occur after Entity extraction * * <pre> * annotators: * - # Entity extraction and cleaners * - # Interaction markup * - relations.SimpleInteraction * - cleaners.NaiveMergeRelations * - cleaners.RelationTypeFilter * </pre> * * Interaction based relationship extraction may generate a lot of erroneous relations. The relation * may be valid, but not between the two particularly entry types. For this reason most pipelines * will wish to include the CleanRelations (to remove duplicate relations) and the * FilterRelationType (to remove based relations where they link entities of invalid types). * * Note that to use the RelationTypeFilter you require a Mongo resource. * * Finally you may want to output relations, see * {@link uk.gov.dstl.baleen.consumers.print.Relations} to print to console. Or the Mongo * consumer will also saves relation information. * * * <h3>Building your own extractor</h3> * * To build a relationship extractor based on interaction processes a helper class is available * through the AbstractInteractionRelationshipExtractor. The abstract class provides numerous helper * functions, common to the needs of relationships extractors. It also simplifies the doProcess * annotator by performing common processing and offering a per sentence processing instead. * * <h2>UBMRE Relation Extractors</h2> * * Relationship extraction based on the paper, * "An Unsupervised Text Mining Method for Relation Extraction from Biomedical Literature" available * from http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0102039. * * Whilst the paper considers a combined approach our implementation is divided into two separate * annotators, and can be used independently or togeher in the same pipeline. * * Both annotators are based on the the use of grammatical parsing constructions to inform * relationship extraction. One algorithm uses constituent parsing (available through then * OpenNlpParser) and the other uses dependency grammar parsing (available through the MaltParser or * ClearNlp annotator). The annotators use the location of entities and interaction words within the * grammatical structure of the sentence to infer relations against a set of rules. * * Prior to use, the document must have been marked with entities, standard information (sentences, * wordtokens), interactions and grammatical information. * * Thus a typical pipeline will look like: * * <pre> * annotators: * - # Standard entity extraction pipeline first * # Perform grammar parking * - language.WordNetLemmatizer * - language.OpenNLPParser * - language.MaltParser * # Mark up the interaction words * - class: gazetteer.MongoStemming * collection: interactions * type: Interaction * - interactions.RemoveInteractionInEntities * - interactions.AssignTypeToInteraction * # Clean up entities prior to relations * - cleaners.RemoveOverlappingEntities * # Perform relation extraction * - relations.UbmreDependencyRelationship * - relations.UbmreConstituentRelationship * # Clean up relations * - cleaners.NaiveMergeRelations * - cleaners.RelationTypeFilter * </pre> * */ //Dstl (c) Crown Copyright 2017 package uk.gov.dstl.baleen.annotators.relations;