/**
* Copyright 2011 The Buzz Media, LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.thebuzzmedia.sjxp.rule;
import com.thebuzzmedia.sjxp.XMLParser; /**
* Interface used to describe a "rule" in SJXP.
* <p/>
* The most important part of a rule is its <code>locationPath</code>, this
* literal {@link String} value is how the {@link XMLParser} matches up its
* current position inside of an XML doc with any {@link IRule}s that want
* information from that location.
* <p/>
* The <code>type</code> of the {@link IRule} indicates to the executing
* {@link XMLParser} when the rule should be queried for a match against its
* current position.
* <p/>
* All implementors must provide an implementation for the
* <code>handleParsedXXX</code> method matching the <code>type</code> of rule
* they have created. More specifically, if you are creating a
* {@link Type#ATTRIBUTE} rule, you need to implement the
* {@link #handleParsedAttribute(XMLParser, int, String, Object)} method; if you
* are implementing a {@link Type#CHARACTER} rule, you need to implement the
* {@link #handleParsedCharacters(XMLParser, String, Object)} method.
* <h3>Rule Matching</h3>
* Rules will execute every single time they match an element in an XML
* document. There is no XPath-like expression system to tell them to only get
* you the first, or 10th or every-other value from a document; you must
* implement that logic yourself inside of the <code>handleParsedXXX</code>
* handlers.
* <h3>Instance Reuse</h3>
* Instances of {@link IRule} are meant to be immutable and maintain no internal
* state which makes them safe for reuse among multiple instances of
* {@link XMLParser}.
* <h3>Rule Format</h3>
* The format of a location path is like a simple XPath rule with no
* expressions, for example:
*
* <pre>
* /library/book/title
* </pre>
*
* would point the "title" element inside of the "book" element which is inside
* the "library" element. If you are after a specific attribute of that element,
* simply provide its name as an attribute argument.
* <h3>Rule Format - Namespaces</h3>
* Referring to a namespace-qualified element in an XML doc is easy; whether it
* is part of the location path or an attribute name, all you have to do is
* prefix the local name of the element with brackets ([]) and the full
* namespace URI within the brackets, like:
*
* <pre>
* /library/[http://w3.org/texts]book/title
* </pre>
*
* In the example above, the "book" element is from a namespace defined by
* "http://w3.org/texts". Inside the actual XML markup, it is likely written
* with a friendly URI prefix that is defined at the top of the file, and would
* look more like this: <em>
* <txt:books>
* </em> but using the URI prefixes is not exact, as they can change from
* document to document, so SJXP requires that you reference the namespace using
* the URI itself, and not a prefix.
* <p/>
* In the case where the attribute itself is namespace-qualified, like
* <em><item rdf:about="blah" /></em>, you use the same notation for the
* attribute name, in this case (assuming the official RDF namespace) the
* attribute name argument you would actually return would look like this:
*
* <pre>
* [http://www.w3.org/1999/02/22-rdf-syntax-ns#]about
* </pre>
*
* It can look a little confusing, but it is exact and won't lead to
* impossible-to-debug scenarios.
* <h3>Rule Format - Default Namespaces</h3>
* Some XML files will define a default namespace using the <code>xmlns</code>
* argument, by itself, in the header. If your document does this, any tag in
* the document that isn't defined with a namespace prefix, will have to be
* referenced with the default namespace because that is how the XML file is
* technically defined.
* <p/>
* An example of this is Slashdot's RDF feed
* (http://rss.slashdot.org/Slashdot/slashdot); a default namespace of
* "http://purl.org/rss/1.0/" is defined, so all un-prefixed tags in the
* document (like <title>, <link> or <description>) all need
* to be qualified with that default URI, looking like this:
*
* <pre>
* [http://purl.org/rss/1.0/]title
* </pre>
*
* when you define the location path for those parse elements.
* <p/>
* It is important to be aware of this aspect of XML files otherwise you will
* run into scenarios where you can't understand why the parse value isn't being
* passed to you.
* <h3>Location Path & Attribute Name Strictness</h3>
* The implementation of SJXP is all based around strict name and namespace URI
* matching. If you do not specify a namespace URI for your element or attribute
* names, then only non-namespace-qualified elements will be looked for and
* matched; and visa-versa.
* <p/>
* If the XML content you are parsing is sloppy and you aren't sure if the
* values will be qualified correctly in every case, you will need to define 2
* {@link IRule}s; 1 for non-namespace-qualified values and 1 for
* namespace-qualified values.
* <p/>
* The SJXP library was purposefully designed to be pedantic to avoid "fuzzy"
* behavior that becomes maddening to debug in edge-case scenarios where you
* can't figure out why it is working one minute and breaking the next.
* <p/>
* Given the need of XML parsing in everything from video games to banking
* applications, SJXP had to take a very conservative approach and be as
* pedantic as possible so as not to hide any behavior from the caller.
*
* @param <T>
* The class type of any user-supplied object that the caller wishes
* to be passed through from one of the {@link XMLParser}'s
* <code>parse</code> methods directly to the handler when an
* {@link IRule} matches. This is typically a data storage mechanism
* like a DAO or cache used to store the parsed value in some
* valuable way, but it can ultimately be anything. If you do not
* need to make use of the user object, there is no need to
* parameterize the class.
*
* @author Riyad Kalla (software@thebuzzmedia.com)
*/
/**
* Used to describe the type of the parse rule.
*/
public interface IRule<T> {
/**
* Used to get the type of the rule.
* <p/>
* The {@link com.thebuzzmedia.sjxp.XMLParser} uses this value to decide when to call this rule to
* see if it matches the current position inside the doc and how to parse
* out the values the rule wants.
*
* @return the type of the rule.
*/
public ParsingMode getType();
/**
* Used to get the location path of the element inside the XML document that
* this rule is interested in.
* <p/>
* This value is compared literally against the internal path state of the
* {@link com.thebuzzmedia.sjxp.XMLParser} to see if they match before processing the rule. If you
* have a rule that isn't executing, chances are your location path is
* incorrect or mistyped or it is possible that your location path is
* correct but you have implemented the wrong <code>handleXXX</code> method
* so the default no-op one in {@link DefaultRule} is getting called.
* <h3>Namespaces</h3>
* Please refer to the class notes on the correct format used to define a
* path element that is namespace-qualified by using brackets.
* <p/>
* Namespace qualifiers can be specified for both element paths and
* attribute names.
*
* @return the location path of the element inside the XML document that
* this rule is interested in.
*/
public String getLocationPath();
/**
* Used to get a list of attribute names that are to be parsed from the
* element located at {@link #getLocationPath()}.
* <p/>
* If the rule type is {@link ParsingMode#CHARACTER}, the attribute name list
* should be ignored.
* <h3>Namespaces</h3>
* Please refer to the class notes on the correct format used to define a
* path element that is namespace-qualified by using brackets.
* <p/>
* Namespace qualifiers can be specified for both element paths and
* attribute names.
*
* @return a list of attribute names that are to be parsed from the element
* located at {@link #getLocationPath()}.
*/
public String[] getAttributeNames();
/**
* Handler method called by the {@link com.thebuzzmedia.sjxp.XMLParser} when an {@link IRule} of
* type {@link ParsingMode#TAG} matches the parser's current location in the
* document.
* <p/>
* This is a notification-style method, no data is parsed from the
* underlying document, the handler is merely called to give custom handling
* code a chance to respond to the matching open or close tag.
*
* @param parser
* The source {@link com.thebuzzmedia.sjxp.XMLParser} currently executing this rule.
* Providing access to the originating parser is handy if the
* rule wants to stop parsing by calling {@link com.thebuzzmedia.sjxp.XMLParser#stop()}
* .
* @param isStartTag
* Used to indicate if this notification is being made because
* the START_TAG (<code>true</code>) was encountered or the
* END_TAG (<code>false</code>) was encountered.
* @param userObject
* The user-supplied object passed through from the
* {@link com.thebuzzmedia.sjxp.XMLParser}'s <code>parse</code> method directly to this
* handler. This is typically a data storage mechanism like a DAO
* or cache used to hold parsed data or <code>null</code> if you
* do not need to make use of this pass-through mechanism and
* passed nothing to the {@link com.thebuzzmedia.sjxp.XMLParser} when you initiated the
* parse.
*/
public void handleTag(XMLParser<T> parser, boolean isStartTag, T userObject);
/**
* Handler method called by the {@link XMLParser} when an {@link IRule} of
* type {@link ParsingMode#ATTRIBUTE} matches the parser's current location in the
* document.
*
* @param parser
* The source {@link XMLParser} currently executing this rule.
* Providing access to the originating parser is handy if the
* rule wants to stop parsing by calling {@link XMLParser#stop()}
* .
* @param index
* The index of the attribute name (from
* {@link #getAttributeNames()}) that this value belongs to.
* @param value
* The value for the given attribute.
* @param userObject
* The user-supplied object passed through from the
* {@link XMLParser}'s <code>parse</code> method directly to this
* handler. This is typically a data storage mechanism like a DAO
* or cache used to hold parsed data or <code>null</code> if you
* do not need to make use of this pass-through mechanism and
* passed nothing to the {@link XMLParser} when you initiated the
* parse.
*
* @see #getLocationPath()
* @see #getAttributeNames()
*/
public void handleParsedAttribute(XMLParser<T> parser, int index,
String value, T userObject);
/**
* Handler method called by the {@link XMLParser} when an {@link IRule} of
* type {@link ParsingMode#CHARACTER} matches the parser's current location in the
* document.
* <p/>
* This method is not called by the {@link XMLParser} until all the
* character data has been coalesced together into a single {@link String}.
* You don't need to worry about re-combining chunked text elements.
*
* @param parser
* The source {@link XMLParser} currently executing this rule.
* Providing access to the originating parser is handy if the
* rule wants to stop parsing by calling {@link XMLParser#stop()}
* .
* @param text
* The character data contained between the open and close tags
* described by {@link #getLocationPath()}.
* @param userObject
* The user-supplied object passed through from the
* {@link XMLParser}'s <code>parse</code> method directly to this
* handler. This is typically a data storage mechanism like a DAO
* or cache used to hold parsed data or <code>null</code> if you
* do not need to make use of this pass-through mechanism and
* passed nothing to the {@link XMLParser} when you initiated the
* parse.
*
* @see #getLocationPath()
*/
public void handleParsedCharacters(XMLParser<T> parser, String text,
T userObject);
}