package-info.java example

Explorer
lucene-solr-master
- lucene
- solr
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/** 
 * Grouping.
 * <p>
 * This module enables search result grouping with Lucene, where hits
 * with the same value in the specified single-valued group field are
 * grouped together.  For example, if you group by the <code>author</code>
 * field, then all documents with the same value in the <code>author</code>
 * field fall into a single group.
 * </p>
 * 
 * <p>Grouping requires a number of inputs:</p>
 * 
 * <ul>
 *   <li><code>groupField</code>: this is the field used for grouping.
 *       For example, if you use the <code>author</code> field then each
 *       group has all books by the same author.  Documents that don't
 *       have this field are grouped under a single group with
 *       a <code>null</code> group value.
 * 
 *   <li><code>groupSort</code>: how the groups are sorted.  For sorting
 *       purposes, each group is "represented" by the highest-sorted
 *       document according to the <code>groupSort</code> within it.  For
 *       example, if you specify "price" (ascending) then the first group
 *       is the one with the lowest price book within it.  Or if you
 *       specify relevance group sort, then the first group is the one
 *       containing the highest scoring book.
 * 
 *   <li><code>topNGroups</code>: how many top groups to keep.  For
 *       example, 10 means the top 10 groups are computed.
 * 
 *   <li><code>groupOffset</code>: which "slice" of top groups you want to
 *       retrieve.  For example, 3 means you'll get 7 groups back
 *       (assuming <code>topNGroups</code> is 10).  This is useful for
 *       paging, where you might show 5 groups per page.
 * 
 *   <li><code>withinGroupSort</code>: how the documents within each group
 *       are sorted.  This can be different from the group sort.
 * 
 *   <li><code>maxDocsPerGroup</code>: how many top documents within each
 *       group to keep.
 * 
 *   <li><code>withinGroupOffset</code>: which "slice" of top
 *       documents you want to retrieve from each group.
 * 
 * </ul>
 * 
 * <p>The implementation is two-pass: the first pass ({@link
 *   org.apache.lucene.search.grouping.FirstPassGroupingCollector})
 *   gathers the top groups, and the second pass ({@link
 *   org.apache.lucene.search.grouping.SecondPassGroupingCollector})
 *   gathers documents within those groups.  If the search is costly to
 *   run you may want to use the {@link
 *   org.apache.lucene.search.CachingCollector} class, which
 *   caches hits and can (quickly) replay them for the second pass.  This
 *   way you only run the query once, but you pay a RAM cost to (briefly)
 *   hold all hits.  Results are returned as a {@link
 *   org.apache.lucene.search.grouping.TopGroups} instance.</p>
 * 
 * <p>Groups are defined by {@link org.apache.lucene.search.grouping.GroupSelector}
 *   implementations:</p>
 *   <ul>
 *     <li>{@link org.apache.lucene.search.grouping.TermGroupSelector} groups based on
 *     the value of a {@link org.apache.lucene.index.SortedDocValues} field</li>
 *     <li>{@link org.apache.lucene.search.grouping.ValueSourceGroupSelector} groups based on
 *     the value of a {@link org.apache.lucene.queries.function.ValueSource}</li>
 *   </ul>
 * 
 * <p>Known limitations:</p>
 * <ul>
 *   <li> Sharding is not directly supported, though is not too
 *     difficult, if you can merge the top groups and top documents per
 *     group yourself.
 * </ul>
 * 
 * <p>Typical usage for the generic two-pass grouping search looks like this using the grouping convenience utility
 *   (optionally using caching for the second pass search):</p>
 * 
 * <pre class="prettyprint">
 *   GroupingSearch groupingSearch = new GroupingSearch("author");
 *   groupingSearch.setGroupSort(groupSort);
 *   groupingSearch.setFillSortFields(fillFields);
 * 
 *   if (useCache) {
 *     // Sets cache in MB
 *     groupingSearch.setCachingInMB(4.0, true);
 *   }
 * 
 *   if (requiredTotalGroupCount) {
 *     groupingSearch.setAllGroups(true);
 *   }
 * 
 *   TermQuery query = new TermQuery(new Term("content", searchTerm));
 *   TopGroups<BytesRef> result = groupingSearch.search(indexSearcher, query, groupOffset, groupLimit);
 * 
 *   // Render groupsResult...
 *   if (requiredTotalGroupCount) {
 *     int totalGroupCount = result.totalGroupCount;
 *   }
 * </pre>
 * 
 * <p>To use the single-pass <code>BlockGroupingCollector</code>,
 *    first, at indexing time, you must ensure all docs in each group
 *    are added as a block, and you have some way to find the last
 *    document of each group.  One simple way to do this is to add a
 *    marker binary field:</p>
 * 
 * <pre class="prettyprint">
 *   // Create Documents from your source:
 *   List<Document> oneGroup = ...;
 *   
 *   Field groupEndField = new Field("groupEnd", "x", Field.Store.NO, Field.Index.NOT_ANALYZED);
 *   groupEndField.setIndexOptions(IndexOptions.DOCS_ONLY);
 *   groupEndField.setOmitNorms(true);
 *   oneGroup.get(oneGroup.size()-1).add(groupEndField);
 * 
 *   // You can also use writer.updateDocuments(); just be sure you
 *   // replace an entire previous doc block with this new one.  For
 *   // example, each group could have a "groupID" field, with the same
 *   // value for all docs in this group:
 *   writer.addDocuments(oneGroup);
 * </pre>
 * 
 * Then, at search time, do this up front:
 * 
 * <pre class="prettyprint">
 *   // Set this once in your app & save away for reusing across all queries:
 *   Filter groupEndDocs = new CachingWrapperFilter(new QueryWrapperFilter(new TermQuery(new Term("groupEnd", "x"))));
 * </pre>
 * 
 * Finally, do this per search:
 * 
 * <pre class="prettyprint">
 *   // Per search:
 *   BlockGroupingCollector c = new BlockGroupingCollector(groupSort, groupOffset+topNGroups, needsScores, groupEndDocs);
 *   s.search(new TermQuery(new Term("content", searchTerm)), c);
 *   TopGroups groupsResult = c.getTopGroups(withinGroupSort, groupOffset, docOffset, docOffset+docsPerGroup, fillFields);
 * 
 *   // Render groupsResult...
 * </pre>
 * 
 * Or alternatively use the <code>GroupingSearch</code> convenience utility:
 * 
 * <pre class="prettyprint">
 *   // Per search:
 *   GroupingSearch groupingSearch = new GroupingSearch(groupEndDocs);
 *   groupingSearch.setGroupSort(groupSort);
 *   groupingSearch.setIncludeScores(needsScores);
 *   TermQuery query = new TermQuery(new Term("content", searchTerm));
 *   TopGroups groupsResult = groupingSearch.search(indexSearcher, query, groupOffset, groupLimit);
 *
 *   // Render groupsResult...
 * </pre>
 * 
 * Note that the <code>groupValue</code> of each <code>GroupDocs</code>
 * will be <code>null</code>, so if you need to present this value you'll
 * have to separately retrieve it (for example using stored
 * fields, <code>FieldCache</code>, etc.).
 * 
 * <p>Another collector is the <code>AllGroupHeadsCollector</code> that can be used to retrieve all most relevant
 *    documents per group. Also known as group heads. This can be useful in situations when one wants to compute group
 *    based facets / statistics on the complete query result. The collector can be executed during the first or second
 *    phase. This collector can also be used with the <code>GroupingSearch</code> convenience utility, but when if one only
 *    wants to compute the most relevant documents per group it is better to just use the collector as done here below.</p>
 * 
 * <pre class="prettyprint">
 *   TermGroupSelector grouper = new TermGroupSelector(groupField);
 *   AllGroupHeadsCollector c = AllGroupHeadsCollector.newCollector(grouper, sortWithinGroup);
 *   s.search(new TermQuery(new Term("content", searchTerm)), c);
 *   // Return all group heads as int array
 *   int[] groupHeadsArray = c.retrieveGroupHeads()
 *   // Return all group heads as FixedBitSet.
 *   int maxDoc = s.maxDoc();
 *   FixedBitSet groupHeadsBitSet = c.retrieveGroupHeads(maxDoc)
 * </pre>
 *
 */
package org.apache.lucene.search.grouping;