/**
* Copyright 2014 National University of Ireland, Galway.
*
* This file is part of the SIREn project. Project and contact information:
*
* https://github.com/rdelbru/SIREn
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* Programmatic API to search node-based inverted indexes.
*
* <h2>Introduction</h2>
*
* This package contains the API for building queries to search JSON data
* over node-based inverted indexes. For an introduction to Lucene's
* search API, see the {@link org.apache.lucene.search} package documentation.
*
* <h2>Search Basics</h2>
*
* In contrast to Lucene's {@link org.apache.lucene.search.Query} API,
* which provides complex querying capabilities to search for documents, SIREn
* provides a {@link org.sindice.siren.search.node.NodeQuery} API that offers
* complex querying capabilities to search for nodes and documents. The
* information retrieved consists not only of the matching documents, but also
* of the matching nodes within these documents.
*
* <p>
*
* SIREn offers a wide variety of
* {@link org.sindice.siren.search.node.NodeQuery} implementations. Most of them
* are similar to the ones provided by Lucene's
* {@link org.apache.lucene.search.Query} API. For example, while Lucene
* provides a {@link org.apache.lucene.search.TermQuery} implementation
* to search documents that contain a specific term, SIREn provides a {@link
* org.sindice.siren.search.node.NodeTermQuery} implementation to search nodes
* and documents that contain a specific term.
*
* <h3>Level and Range Constraints</h3>
*
* The {@link org.sindice.siren.search.node.NodeQuery} provides methods to set
* constraints on the nodes matched by the query. There are two types of
* constraints:
* <ul>
* <li> Level constraint: this constraint filters out all nodes that do
* not belong to the specified level of the tree.
* <li> Interval constraint: this constraint filters out all nodes for
* which the last integer of their Dewey code vector is not contained in the
* specified interval.
* </ul>
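*
* <p>
*
* As a minimal sketch, the two constraints could be set on a query as shown
* below. The setter names <tt>setLevelConstraint</tt> and
* <tt>setNodeConstraint</tt>, and the inclusive treatment of the interval
* bounds, are assumptions to check against the
* {@link org.sindice.siren.search.node.NodeQuery} class documentation.
* <pre>
* NodeTermQuery tq = new NodeTermQuery(new Term("json-field", "term"));
* // Assumed setter: keep only the nodes located at level 2 of the tree
* tq.setLevelConstraint(2);
* // Assumed setter: keep only the nodes whose last Dewey code integer is in [0, 3]
* tq.setNodeConstraint(0, 3);
* </pre>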
*
* <h2>Query Classes</h2>
*
* <h3>{@link org.sindice.siren.search.node.NodeTermQuery}</h3>
*
* A {@link org.sindice.siren.search.node.NodeTermQuery} matches all the
* nodes that contain the specified {@link org.apache.lucene.index.Term},
* which is a word that occurs in a certain
* {@link org.apache.lucene.document.Field} containing JSON data.
* <p>
* Constructing a {@link org.sindice.siren.search.node.NodeTermQuery} is as
* simple as:
* <pre>
* NodeTermQuery tq = new NodeTermQuery(new Term("json-field", "term"));
* </pre>
*
* In this example, the {@link org.sindice.siren.search.node.NodeQuery}
* identifies all {@link org.apache.lucene.document.Document}s that have the
* {@link org.apache.lucene.document.Field} named <tt>"json-field"</tt>
* where a node contains the word <tt>"term"</tt>.
*
* <h3>{@link org.sindice.siren.search.node.NodePhraseQuery}</h3>
*
* A {@link org.sindice.siren.search.node.NodePhraseQuery} matches all the nodes
* containing the specified phrase. A phrase is defined as a sequence of
* {@link org.apache.lucene.index.Term}s.
*
* <h3>{@link org.sindice.siren.search.node.NodeBooleanQuery}</h3>
*
* A {@link org.sindice.siren.search.node.NodeBooleanQuery} matches all the
* nodes containing the specified boolean combination of queries.
* A {@link org.sindice.siren.search.node.NodeBooleanQuery} contains multiple
* {@link org.sindice.siren.search.node.NodeBooleanClause}s, where each clause
* contains a sub-query
* ({@link org.sindice.siren.search.node.NodePrimitiveQuery} instance) and an
* operator (from {@link org.sindice.siren.search.node.NodeBooleanClause.Occur})
* describing how that sub-query is combined with the other clauses. The
* semantics of {@link org.sindice.siren.search.node.NodeBooleanClause.Occur} are
* identical to those of {@link org.apache.lucene.search.BooleanClause.Occur}.
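*
* <p>
*
* The following sketch combines two term queries. It assumes an
* <tt>add</tt> method that mirrors Lucene's
* <tt>BooleanQuery.add(Query, Occur)</tt>; the method name, the field name and
* the terms are illustrative and should be checked against the
* {@link org.sindice.siren.search.node.NodeBooleanQuery} class documentation.
* <pre>
* NodeBooleanQuery bq = new NodeBooleanQuery();
* // Assumed method: the node must contain "apple" ...
* bq.add(new NodeTermQuery(new Term("json-field", "apple")), NodeBooleanClause.Occur.MUST);
* // ... and must not contain "pie"
* bq.add(new NodeTermQuery(new Term("json-field", "pie")), NodeBooleanClause.Occur.MUST_NOT);
* </pre>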
*
* <h3>{@link org.sindice.siren.search.node.NodeTermRangeQuery}</h3>
*
* A {@link org.sindice.siren.search.node.NodeTermRangeQuery} matches all
* nodes containing a term that occurs in the inclusive or exclusive range of a
* lower {@link org.apache.lucene.index.Term Term} and an upper
* {@link org.apache.lucene.index.Term Term} according to
* {@link org.apache.lucene.index.TermsEnum#getComparator TermsEnum.getComparator()}.
* It is not intended for numerical ranges; use
* {@link org.sindice.siren.search.node.NodeNumericRangeQuery} instead.
*
* <h3>{@link org.sindice.siren.search.node.NodeNumericRangeQuery}</h3>
*
* A {@link org.sindice.siren.search.node.NodeNumericRangeQuery} matches all
* nodes containing a value that occurs in a numeric range. For
* NodeNumericRangeQuery to work, you must index the values using datatypes
* configured with the appropriate numeric analyzers
* ({@link org.sindice.siren.analysis.NumericAnalyzer}).
*
* <h3>{@link org.sindice.siren.search.node.NodePrefixQuery},
* {@link org.sindice.siren.search.node.NodeWildcardQuery},
* {@link org.sindice.siren.search.node.NodeRegexpQuery}</h3>
*
* A {@link org.sindice.siren.search.node.NodePrefixQuery} matches all nodes
* containing terms that begin with the specified string. A
* {@link org.sindice.siren.search.node.NodeWildcardQuery} generalizes this
* by allowing for the use of <tt>+</tt> (matches 1 or more characters),
* <tt>*</tt> (matches 0 or more characters) and
* <tt>?</tt> (matches exactly one character) wildcards. Note that the
* {@link org.sindice.siren.search.node.NodeWildcardQuery} can be quite slow. Also
* note that a {@link org.sindice.siren.search.node.NodeWildcardQuery} pattern should
* not start with <tt>+</tt>, <tt>*</tt> or <tt>?</tt>, as leading wildcards are extremely slow.
* Some QueryParsers may not allow this by default, but provide a
* <code>setAllowLeadingWildcard</code> method to remove that protection.
* The {@link org.sindice.siren.search.node.NodeRegexpQuery} is even more
* general than NodeWildcardQuery, matching all nodes with terms that match a
* regular expression pattern.
*
* <h3>{@link org.sindice.siren.search.node.NodeFuzzyQuery}</h3>
*
* A {@link org.sindice.siren.search.node.NodeFuzzyQuery} matches nodes that
* contain terms similar to the specified term. Similarity is determined using
* <a href="http://en.wikipedia.org/wiki/Levenshtein">Levenshtein (edit)
* distance</a>.
*
* <h3>{@link org.sindice.siren.search.node.TwigQuery}</h3>
*
* A {@link org.sindice.siren.search.node.TwigQuery} combines
* {@link org.sindice.siren.search.node.NodeQuery}s through a Parent-Child or
* Ancestor-Descendant relation. It is the basic building block for
* tree-shaped queries.
*
* <p>
*
* A {@link org.sindice.siren.search.node.TwigQuery} is composed of a root and
* of one or more children or descendants:
* <ul>
* <li> The root is a {@link org.sindice.siren.search.node.NodeQuery} instance.
* An empty root is considered as a wildcard node query and will match all
* nodes. We call "root nodes" the set of nodes that are retrieved by the
* root query.
* <li> A descendant is a {@link org.sindice.siren.search.node.NodeQuery}
* associated with an operator (from
* {@link org.sindice.siren.search.node.NodeBooleanClause.Occur}). A
* descendant query will match all the nodes for which there exists a path
* to a root node. A descendant is associated with a node level, which
* corresponds to the relative distance (in terms of levels) from the root.
* <li> A child is a descendant that is exactly one level deeper than the
* root, i.e., at the root level plus one.
* </ul>
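*
* <p>
*
* As a minimal sketch of this composition, the twig query below leaves the
* root empty and attaches two children with <tt>addChild</tt>. The field name
* and the terms are illustrative.
* <pre>
* // Empty root: considered as a wildcard node query that matches all nodes
* TwigQuery tw = new TwigQuery();
* // Each child query must match a node one level deeper than a root node
* tw.addChild(new NodeTermQuery(new Term("json-field", "name")), Occur.MUST);
* tw.addChild(new NodeTermQuery(new Term("json-field", "age")), Occur.MUST);
* </pre>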
*
* <p>
*
* A twig query is always associated with a level. If no level is specified, then
* by default the level is set to 1. When a twig query is used as a child or
* descendant of another twig query, its level is automatically updated
* according to the level of the parent twig query. For example, consider
* the following instructions:
* <pre>
* TwigQuery tw1 = new TwigQuery();
* TwigQuery tw2 = new TwigQuery();
* tw1.addChild(tw2, Occur.MUST);
* </pre>
*
* In this example, the first twig query <tt>tw1</tt> is defined at the default
* level 1. The second twig query <tt>tw2</tt>, after the call to
* {@link org.sindice.siren.search.node.TwigQuery#addChild(NodeQuery, org.sindice.siren.search.node.NodeBooleanClause.Occur)},
* will have its level updated to 2 since it is now a child of a twig query at
* level 1.
*
* <h2>The Scorer Class</h2>
*
* The {@link org.sindice.siren.search.node.NodeScorer} abstract class provides
* common scoring functionality for all the node scorer implementations, which
* are at the heart of the SIREn scoring process.
*
* <p>
*
* The implementation of the query processing framework follows a node-at-a-time
* approach, where the query operators (i.e., {@link org.sindice.siren.search.node.NodeScorer}s)
* process one node at a time. The query processing framework has been
* designed for efficient processing:
* <ol>
* <li> All the query operators leverage as much as possible the lazy-loading
* feature of the
* {@link org.sindice.siren.index.codecs.siren10.Siren10PostingsReader}. For
* example, there is no concept of next matching document (i.e.,
* {@link org.apache.lucene.search.Scorer#nextDoc()}) in the
* {@link NodeScorer} interface, but instead the concept of next candidate
* document (i.e.,
* {@link org.sindice.siren.search.node.NodeScorer#nextCandidateDocument()}).
* This enables {@link org.sindice.siren.search.node.NodeConjunctionScorer} to
* efficiently iterate over the document identifiers without having to
* decode the node labels until a potential candidate is found.
* <li> The node label array (i.e., {@link org.apache.lucene.util.IntsRef})
* being processed is the same in all the query operators, which means that
* the same array is reused across operators and no new arrays are created
* during query processing.
* <li> The node label array is itself a slice of the array of the
* uncompressed node block. The node label array is created by sliding a
* window (i.e., {@link org.apache.lucene.util.IntsRef}) over the array of the
* uncompressed node block.
* </ol>
*
*/
package org.sindice.siren.search.node;