/**
* Copyright 2014 National University of Ireland, Galway.
*
* This file is part of the SIREn project. Project and contact information:
*
* https://github.com/rdelbru/SIREn
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
/**
* A keyword query parser implemented with Lucene's Flexible Query Parser,
* with support for twig queries.
*
* <h2>Query Parser Syntax</h2>
*
* <p>
* A keyword query can be either:
* <ul>
* <li>a twig (<code>twig</code>) expression that represents a {@link org.sindice.siren.search.node.TwigQuery}; or
* <li>a boolean (<code>boolean</code>) expression that represents a {@link org.apache.lucene.search.BooleanQuery}.
* </ul>
*
* The boolean expression can be a mixture of primitive queries (e.g., a keyword
* search) and of twig queries.
* <p>
* The syntax allows a custom datatype to be applied to any part of the query.
*
* <h3>Twig</h3>
*
* <p>
* A twig expression is defined by the special character ':'. For example, the
* query
*
* <pre>
* a : b
* </pre>
*
* matches documents where the term "a" appears in one of the top-level nodes
* (i.e., one of the top-level field names) and the term "b" appears in one of
* its child nodes (i.e., one of the values of the field).
* <pre>
* +---+
* | a |
* +-+-+
* |
* +-+-+
* | b |
* +---+
* </pre>
* </p>
* <p>
* In a twig query, the child clause has a
* {@link org.sindice.siren.search.node.NodeBooleanClause.Occur#MUST}
* occurrence by default.
* </p>
*
* <p>
* Multiple children can be associated with the same top-level node (i.e., one of
* the top-level field names) by using a JSON-like array syntax:
*
* <pre>
* a : [ b , c ]
* </pre>
*
* where the term "a" appears in one of the top-level nodes, the term "b"
* appears in one of its child nodes (i.e., one of the values of the field),
* and the term "c" appears in a second child node.
* <pre>
*      +---+
*      | a |
*      +-+-+
*        |
*   +----+----+
* +-+-+     +-+-+
* | b |     | c |
* +---+     +---+
* </pre>
* </p>
*
* <p>
* A JSON-like object syntax is also supported. For example, the following query
*
* <pre>
* { a : b , c : d }
* </pre>
*
* queries for nested objects in a JSON document.
* This is syntactic sugar for
*
* <pre>* : [ a : b , c : d ]</pre>
*
* where the twigs <code>a : b</code> and <code>c : d</code> become children
* of a parent whose top-level node is the wildcard <code>*</code>, i.e., a
* node with no constraint.
* <pre>
*      +---+
*      | * |
*      +-+-+
*        |
*   +----+----+
* +-+-+     +-+-+
* | a |     | c |
* +-+-+     +-+-+
*   |         |
* +-+-+     +-+-+
* | b |     | d |
* +---+     +---+
* </pre>
* </p>
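* <p>
* The array and object forms above can be sketched as plain string
* construction. This is only an illustration of the syntax: the class
* <code>ObjectSugar</code> and its methods are hypothetical helpers that are
* not part of SIREn, and they do not call the parser.
* </p>
* <pre>{@code
```java
// Hypothetical helper, not part of SIREn: it only builds query strings.
final class ObjectSugar {

  // Joins already-formed child expressions into the array form "[ x , y ]".
  static String array(String... children) {
    return "[ " + String.join(" , ", children) + " ]";
  }

  // Builds the twig "root : child".
  static String twig(String root, String child) {
    return root + " : " + child;
  }

  // Writes an object in its desugared form: a wildcard root whose children
  // are the field twigs. Arguments alternate field name and field value.
  static String desugarObject(String... fieldsAndValues) {
    if (fieldsAndValues.length % 2 != 0) {
      throw new IllegalArgumentException("expected field/value pairs");
    }
    String[] twigs = new String[fieldsAndValues.length / 2];
    for (int i = 0; i < twigs.length; i++) {
      twigs[i] = twig(fieldsAndValues[2 * i], fieldsAndValues[2 * i + 1]);
    }
    return twig("*", array(twigs));
  }

  public static void main(String[] args) {
    // Array form: one root with two children.
    System.out.println(twig("a", array("b", "c")));
    // Object form, written with an explicit wildcard root.
    System.out.println(desugarObject("a", "b", "c", "d"));
  }
}
```
* }</pre>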
*
* <h4>Wildcard Node</h4>
*
* <p>
* The query syntax allows the wildcard <code>*</code> to be used as a node
* in a twig query. For example, a twig query with no constraint
* on the top-level node is written as
* <pre>* : b</pre>
* Similarly, a twig query with no constraint on the child is written as
* <pre>a : *</pre>
* A chain of wildcards can be used to set a node constraint on a specific
* descendant. The query
* <pre>a : * : * : b</pre>
* defines a twig with the term "a" occurring on the top level node,
* and the term "b" occurring on a node three levels below.
* </p>
* <p>
* Wildcards can be used with any of the previous syntaxes.
* </p>
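* <p>
* A chain of wildcards for a descendant at a given depth can be generated
* mechanically. The sketch below assumes a hypothetical helper class
* <code>WildcardChain</code> (not part of SIREn) that only builds the query
* string.
* </p>
* <pre>{@code
```java
// Hypothetical helper, not part of SIREn: it only builds query strings.
final class WildcardChain {

  // Builds "ancestor : * : ... : descendant", where the descendant sits
  // levelsBelow levels under the ancestor. One level means a direct child,
  // so no wildcard is emitted in that case.
  static String descendant(String ancestor, String descendant, int levelsBelow) {
    if (levelsBelow < 1) {
      throw new IllegalArgumentException("levelsBelow must be at least 1");
    }
    StringBuilder query = new StringBuilder(ancestor);
    // Each intermediate level is left unconstrained with a wildcard.
    for (int level = 1; level < levelsBelow; level++) {
      query.append(" : *");
    }
    return query.append(" : ").append(descendant).toString();
  }

  public static void main(String[] args) {
    // Term "b" three levels below term "a".
    System.out.println(descendant("a", "b", 3));
  }
}
```
* }</pre>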
*
* <h4>Nested Twigs</h4>
*
* <p>
* The query syntax allows you to create complex twig queries by nesting
* arrays, objects, and other twigs. For example:
* <pre>
* a : { * : b , a : [ { b : c : d } , { e : a }, g ] }
* </pre>
* corresponds to the following query tree:
* <pre>
*        +---+
*        | a |
*        +-+-+
*          |
*   +------+--------+
*   |               |
* +-+-+           +-+-+
* | * |           | a |
* +-+-+           +-+-+
*   |               |
* +-+-+      +------+------+
* | b |      |      |      |
* +---+    +-+-+  +-+-+  +-+-+
*          | * |  | * |  | g |
*          +-+-+  +-+-+  +---+
*            |      |
*          +-+-+  +-+-+
*          | b |  | e |
*          +-+-+  +-+-+
*            |      |
*          +-+-+  +-+-+
*          | c |  | a |
*          +-+-+  +---+
*            |
*          +-+-+
*          | d |
*          +---+
* </pre>
* </p>
*
* <h3>Boolean</h3>
*
* <p>
* A boolean expression follows the Lucene query syntax, except for the ':'
* character, which does not define a field query but is instead used to build
* a twig query. For example, the query
*
* <pre>
* a AND b
* </pre>
*
* matches documents where the terms "a" and "b" occur in any node
* of the JSON tree.
* </p>
*
* <h4>Boolean of Twigs</h4>
*
* <p>
* A boolean combination of twig queries is also possible. For example, the query
*
* <pre>
* (a : b) AND (c : d)
* </pre>
*
* matches JSON documents where both twigs occur.
* </p>
*
* <h4>Boolean in a Twig node</h4>
*
* <p>
* The complete Lucene query syntax, e.g., grouping, boolean operators or range
* queries, can be used to match a single node of a twig. For example, the query
*
* <pre>
* a : b AND c
* </pre>
*
* matches JSON documents where the term "a" occurs in the top-level node
* and both terms "b" and "c" occur in a single child node.
* <pre>
*    +---+
*    | a |
*    +-+-+
*      |
* +---------+
* | b AND c |
* +---------+
* </pre>
*
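* The twig operator ':' has priority over the boolean operators: in an
* unparenthesized expression, ':' is the outermost operator and the boolean
* operators apply within each node. Therefore, the query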
*
* <pre>
* a : b AND c : d
* </pre>
*
* matches documents as in the previous query, with the additional constraint
* that the node matching <code>b AND c</code> has a child node in which the
* term "d" occurs. It is the same as the query
*
* <pre>
* a : (b AND c) : d
* </pre>
*
* <pre>
*    +---+
*    | a |
*    +-+-+
*      |
* +---------+
* | b AND c |
* +---------+
*      |
*    +-+-+
*    | d |
*    +---+
* </pre>
* </p>
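* <p>
* The grouping described above can be made explicit with a small sketch. It
* is a naive illustration and not SIREn's parser: the hypothetical class
* <code>TwigGrouping</code> ignores quotes, brackets, braces, and existing
* parentheses, and simply splits the query on ':' before looking at the
* boolean operators.
* </p>
* <pre>{@code
```java
// Naive illustration only, not SIREn's parser. It ignores quotes, brackets,
// braces, and existing parentheses.
final class TwigGrouping {

  // Splits the query on ':' first, then parenthesizes every part that holds
  // more than one token, which makes the implicit grouping explicit.
  static String explicitGrouping(String query) {
    String[] parts = query.split(":");
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < parts.length; i++) {
      String part = parts[i].trim();
      if (i > 0) {
        out.append(" : ");
      }
      // A part with inner whitespace is a boolean expression of several terms.
      out.append(part.contains(" ") ? "(" + part + ")" : part);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(explicitGrouping("a : b AND c : d"));
  }
}
```
* }</pre>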
*
* <h3>Datatype</h3>
*
* <p>
* Some terms need to be analyzed in a specific way in order to be correctly
* indexed and searched, e.g., numbers. For those terms to be searchable, the
* keyword syntax provides a way to set how a query term should be analyzed.
* Using a function-like syntax:
* <pre>
* datatype( ... )
* </pre>
* all query elements inside the parentheses are processed using that datatype.
* </p>
* <p>
* A mapping from a datatype label to an {@link org.apache.lucene.analysis.Analyzer}
* is set through the configuration key
* {@link org.sindice.siren.qparser.keyword.config.KeywordQueryConfigHandler.KeywordConfigurationKeys#DATATYPES_ANALYZERS}.
* </p>
* <p>
* For example, to search for documents that contain a field <code>age</code>
* whose values are integers ranging from <code>5</code> to <code>50</code>,
* use the range query below:
* <pre>
* age : int( [ 5 TO 50 ] )
* </pre>
* The keyword parser in that example is configured to use
* {@link org.sindice.siren.analysis.IntNumericAnalyzer}
* for the datatype <code>int</code>.
* </p>
* <p>
* The top level node of a twig query is by default set to use the datatype
* {@link org.sindice.siren.util.JSONDatatype#JSON_FIELD}. Any query element
* that is not wrapped in a custom datatype uses the datatype
* {@link org.sindice.siren.util.XSDDatatype#XSD_STRING}.
* </p>
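* <p>
* The function-like datatype syntax composes with the twig syntax as plain
* text. The sketch below assumes a hypothetical helper class
* <code>DatatypeSyntax</code> (not part of SIREn) and the datatype label
* <code>int</code> from the example above; it only builds the query string
* and does not configure any analyzer.
* </p>
* <pre>{@code
```java
// Hypothetical helper, not part of SIREn: it only builds query strings.
final class DatatypeSyntax {

  // Builds an inclusive range expression in Lucene syntax.
  static String range(int lower, int upper) {
    return "[ " + lower + " TO " + upper + " ]";
  }

  // Wraps an expression so that it is processed with the given datatype label.
  static String datatype(String label, String expression) {
    return label + "( " + expression + " )";
  }

  // Builds the twig "field : value".
  static String twig(String field, String value) {
    return field + " : " + value;
  }

  public static void main(String[] args) {
    // Integer range on the values of the field "age".
    System.out.println(twig("age", datatype("int", range(5, 50))));
  }
}
```
* }</pre>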
*
* <h2>Query Examples</h2>
*
* <h3>Node query</h3>
*
* Match all the documents with one node containing the phrase
* "Marie Antoinette":
*
* <pre>
* "Marie Antoinette"
* </pre>
*
* <h3>Twig query</h3>
*
* Match all the documents with one node containing the term "genre" and with
* a child node containing the term "Drama".
*
* <pre>
* genre : Drama
* </pre>
*
* Such a twig query is the basic building block to query a particular
* field name of a JSON object. The field name is always the root of the twig
* query and the field value is defined as a child clause.
* <p>
* More complex twig queries can be constructed by using nested twig queries
* or using more than one child clause.
*
* <pre>
* director : { last_name : Eastwood , first_name : Clint }
* </pre>
*
* <h3>Boolean Query</h3>
*
* Node and twig queries can be combined freely.
*
* <pre>
* (genre : Drama) AND (year : 2010)
* </pre>
*
*/
package org.sindice.siren.qparser.keyword;