/* $Id: IEventActivity.java 988245 2010-08-23 18:39:35Z kwright $ */
/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.manifoldcf.crawler.interfaces;
import org.apache.manifoldcf.core.interfaces.*;
import org.apache.manifoldcf.agents.interfaces.*;
/** This interface abstracts from the activities that use and govern events.
*
* The purpose of this model is to allow a connector to:
* (a) ensure that documents whose prerequisites have not been met do not get processed until those prerequisites are completed
* (b) guarantee that only one thread at a time deals with the sequencing of documents
*
* The way it works is as follows. We define the notion of an "event", which is described by a simple string (and thus can be global,
* local to a connection, or local to a job, whichever is appropriate). An event is managed solely by the connector that knows about it.
* Effectively it can be in either of two states: "completed", or "pending". The only time the framework ever changes an event state is when
* the crawler is restarted, at which point all pending events are marked "completed".
*
* Documents, when they are added to the processing queue, specify the set of events on which they will block. If an event is in the "pending" state,
* no documents that block on that event will be processed at that time. Of course, it is possible that a document could be handed to processing just before
* an event entered the "pending" state - in which case it is the responsibility of the connector itself to avoid any problems or conflicts. This can
* usually be handled by careful event signalling, as the processing flow described next illustrates.
*
* The presumed underlying model of flow inside the connector's processing method is as follows:
* (1) The connector examines the document in question, and decides whether it can be processed successfully or not, based on what it knows about sequencing.
* (2) If the connector determines that the document can properly be processed, it does so, and that's it.
* (3) If the connector finds a sequencing-related problem, it:
* (a) Begins an appropriate event sequence.
* (b) If the framework indicates that this event is already in the "pending" state, then some other thread is already handling the event, and the connector
* should abort processing of the current document.
* (c) If the framework successfully begins the event sequence, then the connector code knows unequivocally that it is the only thread processing the event.
* It should take whatever action it needs to - which might be requesting special documents, for instance. [Note well: At this time, there is no way
* to guarantee that special documents added to the queue are in fact properly synchronized by this mechanism, so I recommend avoiding this practice,
* and instead handling any special document sequences without involving the queue.]
* (d) If the connector CANNOT successfully take the action it needs to push the sequence along, it MUST set the event back to the "completed" state.
* Otherwise, the event will remain in the "pending" state until the next time the crawler is restarted.
* (e) If the current document cannot yet be processed, its processing should be aborted.
* (4) When the connector determines that the event's conditions have been met, or when it determines that an event sequence is no longer viable and has been
* aborted, it must set the event status to "completed".
*
* In summary, a connector may perform the following event-related actions:
* (a) Set an event into the "pending" state
* (b) Set an event into the "completed" state
* (c) Add a document to the queue with a specified set of prerequisite events attached
* (d) Request that the current document be requeued for later processing (i.e. abort processing of a document due to sequencing reasons)
*
*/
public interface IEventActivity extends INamingActivity
{
public static final String _rcsid = "@(#)$Id: IEventActivity.java 988245 2010-08-23 18:39:35Z kwright $";
/** Begin an event sequence.
* This method should be called by a connector when a sequencing event should enter the "pending" state. If the event is already in that state,
* this method will return false, otherwise true. The connector has the responsibility of appropriately managing sequencing given the response
* status.
*@param eventName is the event name.
*@return false if the event is already in the "pending" state, true otherwise.
*/
public boolean beginEventSequence(String eventName)
throws ManifoldCFException;
/** Complete an event sequence.
* This method should be called to signal that an event is no longer in the "pending" state. This can mean that the prerequisite processing is
* completed, but it can also mean that prerequisite processing was aborted or cannot be completed.
* Note well: This method should not be called unless the connector is CERTAIN that an event is in progress, and that the current thread has
* the sole right to complete it. Otherwise, race conditions can develop which would be difficult to diagnose.
*@param eventName is the event name.
*/
public void completeEventSequence(String eventName)
throws ManifoldCFException;
/** Abort processing a document (for sequencing reasons).
* This method should be called in order to cause the specified document to be requeued for later processing. While this is similar in some respects
* to the semantics of a ServiceInterruption, it is applicable to only one document at a time, and also does not specify any delay period, since the
* requeue is presumed to stem from sequencing issues synchronized around an underlying event.
*@param localIdentifier is the document identifier to requeue.
*/
public void retryDocumentProcessing(String localIdentifier)
throws ManifoldCFException;
}
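/*
 * Illustrative sketch, not part of ManifoldCF: the processing flow described in the interface comment above,
 * steps (1) through (4), might look roughly like the code below. Everything in it is an assumption made for
 * the example: EventSequenceSketch, its nested EventActivity interface (a stand-in that mirrors the three
 * IEventActivity methods but omits ManifoldCFException), the in-memory InMemoryEvents class, and the
 * processDocument helper with its prerequisiteMet flag and BooleanSupplier action are all hypothetical.
 * A real connector would call the IEventActivity instance that the framework supplies.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BooleanSupplier;

// Hypothetical example types; none of these are ManifoldCF classes.
final class EventSequenceSketch {

  // Stand-in mirroring the three IEventActivity methods (exceptions omitted).
  interface EventActivity {
    boolean beginEventSequence(String eventName);
    void completeEventSequence(String eventName);
    void retryDocumentProcessing(String localIdentifier);
  }

  // Toy in-memory "framework": tracks pending events and requeued documents.
  static final class InMemoryEvents implements EventActivity {
    final Set<String> pending = new HashSet<>();
    final List<String> requeued = new ArrayList<>();

    public synchronized boolean beginEventSequence(String eventName) {
      // Set.add returns false when the event was already pending.
      return pending.add(eventName);
    }

    public synchronized void completeEventSequence(String eventName) {
      pending.remove(eventName);
    }

    public synchronized void retryDocumentProcessing(String localIdentifier) {
      requeued.add(localIdentifier);
    }
  }

  // Returns true only if the document could be processed during this call.
  static boolean processDocument(EventActivity activities, String documentId,
      String eventName, boolean prerequisiteMet, BooleanSupplier prerequisiteAction) {
    if (prerequisiteMet) {
      return true; // step (2): no sequencing problem, process normally
    }
    if (!activities.beginEventSequence(eventName)) {
      // step (3b): another thread already owns the event, so requeue and abort
      activities.retryDocumentProcessing(documentId);
      return false;
    }
    // step (3c): this thread is now the only one handling the event
    boolean satisfied = false;
    try {
      satisfied = prerequisiteAction.getAsBoolean();
    } finally {
      // steps (3d) and (4): always return the event to "completed"
      activities.completeEventSequence(eventName);
    }
    if (!satisfied) {
      // step (3e): the document still cannot be processed, so requeue it
      activities.retryDocumentProcessing(documentId);
    }
    return satisfied;
  }

  public static void main(String[] args) {
    InMemoryEvents events = new InMemoryEvents();
    // Simulate another thread that already put the event into "pending".
    events.beginEventSequence("login");
    boolean processed = processDocument(events, "doc-1", "login", false, () -> true);
    System.out.println("processed=" + processed + " requeued=" + events.requeued);
    // Once the other thread completes the event, a retry can take ownership.
    events.completeEventSequence("login");
    processed = processDocument(events, "doc-1", "login", false, () -> true);
    System.out.println("processed=" + processed + " pending=" + events.pending);
  }
}
```

 * Completing the event in a finally block is one way to honor the rule that a failed action must not leave
 * the event "pending" until the next crawler restart.
 */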