package-info.java example

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
/**
 * Provides a light-weight, simplified set of column readers and writers that
 * can be plugged into a variety of row-level readers and writers. The classes
 * and interfaces here form a framework for accessing rows and columns, but do
 * not provide the code to build accessors for a given row batch. This code is
 * meant to be generic, but the first (and, thus far, only) use is with the test
 * framework for the java-exec project. That one implementation is specific to
 * unit tests, but the accessor framework could easily be used for other
 * purposes as well.
 * <p>
 * Drill provides a set of column readers and writers. Compared to those, this
 * set:
 * <ul>
 * <li>Works with all Drill data types. The other set works only with repeated
 * and nullable types.</li>
 * <li>Is a generic interface. The other set is bound tightly to the
 * {@link ScanBatch} class.</li>
 * <li>Uses generic methods such as <tt>getInt()</tt> for most numeric types.
 * The other set has accessors specific to each of the ~30 data types that Drill
 * supports.</li>
 * </ul>
 * The key difference is that this set is designed for developer ease-of-use, a
 * primary requirement for unit tests. The other set is designed to be used in
 * machine-generated or write-once code and so can be much more complex.
 * <p>
 * That is, the accessors here are optimized for test code: they trade a slight
 * decrease in speed for convenience. The performance hit comes from the extra
 * level of indirection that hides the complex, type-specific code otherwise
 * required.
 * <p>
 * {@link ColumnReader} and {@link ColumnWriter} are the core abstractions: they
 * provide access to the myriad Drill column types via a simplified, uniform
 * API. {@link TupleReader} and {@link TupleWriter} provide a similar API for
 * rows and maps (both of which are tuples in Drill).
 * {@link AccessorUtilities} provides a number of data conversion tools.
 * <p>
 * Overview of the code structure:
 * <dl>
 * <dt>TupleWriter, TupleReader</dt>
 * <dd>In relational terms, a tuple is an ordered collection of values, where
 * the meaning of the order is provided by a schema (usually a name/type pair
 * for each column). Drill rows and maps are both tuples. The tuple classes
 * provide the means to work with a tuple: get the schema, or get a column by
 * name or by position. Drill code normally references columns by name, but
 * doing so is slower than access by position (index). To provide efficient
 * code, the tuple classes assume that the implementation imposes a column
 * ordering which can be exposed via the indexes.</dd>
 * <dt>ColumnAccessor</dt>
 * <dd>A generic base class for column readers and writers that provides the
 * column data type.</dd>
 * <dt>ColumnWriter, ColumnReader</dt>
 * <dd>A uniform interface implemented for each column type ("major type" in
 * Drill terminology). Scalar fields, whether nullable (Drill optional) or
 * non-nullable (Drill required), use the same interface. Arrays (Drill
 * repeated) are special: array fields use the same interface as well, but the
 * <tt>getArray</tt> method returns another layer of accessor (writer or
 * reader) specific to arrays.
 * <p>
 * Both the column reader and writer use a reduced set of data types to access
 * values. Drill provides about 38 different types, but they can be mapped to a
 * smaller set for programmatic access. For example, the signed byte, short, and
 * int types and the unsigned 8-bit and 16-bit types all map to ints for
 * get/set. The result is a much simpler set of get/set methods compared to the
 * underlying set of vector types.</dd>
 * <dt>ArrayWriter, ArrayReader</dt>
 * <dd>The interface for the array accessors as described above. Of particular
 * note is the difference in the form of the methods. The writer has only a
 * <tt>setInt()</tt> method that takes no index. The methods assume write-only,
 * write-once semantics: each set appends a new value. The reader, by contrast,
 * has a <tt>getInt(int index)</tt> method: read access is random.</dd>
 * <dt>ScalarWriter</dt>
 * <dd>Because of the form of the array writer, both the array writer and
 * column writer have the same method signatures. To avoid repeating these
 * methods, they are factored out into the common <tt>ScalarWriter</tt>
 * interface.</dd>
 * <dt>ColumnAccessors (templates)</dt>
 * <dd>The Freemarker-based template used to generate the actual accessor
 * implementations.</dd>
 * <dt>ColumnAccessors (accessors)</dt>
 * <dd>The generated accessors: one for each combination of write/read, data
 * (minor) type, and cardinality (data mode).
 * </dd>
 * <dt>RowIndex</dt>
 * <dd>This nested class binds the accessor to the current row position for the
 * entire record batch. That is, you don't ask for the value of column a for row
 * 5, then the value of column b for row 5, etc. as with the "raw" vectors.
 * Instead, the implementation sets the row position (with, say, an iterator).
 * Then, all columns implicitly return values for the current row.
 * <p>
 * Different implementations of the row index handle the case of no selection
 * vector, a selection vector 2, or a selection vector 4.</dd>
 * <dt>VectorAccessor</dt>
 * <dd>The readers can work with single batches or "hyper"
 * batches. A hyper batch occurs in operators such as sort where an operator
 * references a collection of batches as if they were one huge batch. In this
 * case, each column consists of a "stack" of vectors. The vector accessor picks
 * out one vector from the stack for each row. Vector accessors are used only
 * for hyper batches; single batches work directly with the corresponding
 * vector.
 * <p>
 * You can think of the (row index + vector accessor, column index) as forming a
 * coordinate pair. The row index provides the y index (vertical position along
 * the rows). The vector accessor maps the row position to a vector when needed.
 * The column index picks out the x coordinate (horizontal position along the
 * columns).</dd>
 * </dl>
 */

package org.apache.drill.exec.vector.accessor;
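
The Javadoc above describes three ideas that may be easier to see in code: a shared row index that positions every column reader at once, a reduced set of getters (`getInt()` covering several storage widths), and an array writer that only appends while the array reader allows indexed access. The following is a minimal sketch of those ideas, not the Drill implementation. Every class and method name in it (`SketchRowIndex`, `SketchColumnReader`, `SketchIntArrayWriter`, `SketchIntArrayReader`, `AccessorSketch`, `sumRows`) is hypothetical and chosen only for illustration, and plain Java arrays and lists stand in for Drill's value vectors.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; none of these are Drill classes.

/** Shared row position: every reader bound to it reads the same row. */
class SketchRowIndex {
  private final int rowCount;
  private int row = -1; // positioned before the first row

  SketchRowIndex(int rowCount) { this.rowCount = rowCount; }

  /** Advances to the next row; returns false once past the last row. */
  boolean next() { return ++row < rowCount; }

  int index() { return row; }
}

/** Reduced type set: byte, short and int columns are all read via getInt(). */
interface SketchColumnReader {
  int getInt();
}

/** Random-access array reader: the caller supplies the element index. */
class SketchIntArrayReader {
  private final List<Integer> values;

  SketchIntArrayReader(List<Integer> values) { this.values = values; }

  int size() { return values.size(); }

  int getInt(int index) { return values.get(index); }
}

/** Append-only array writer: setInt() takes no index. */
class SketchIntArrayWriter {
  private final List<Integer> values = new ArrayList<>();

  void setInt(int value) { values.add(value); }

  SketchIntArrayReader reader() { return new SketchIntArrayReader(values); }
}

class AccessorSketch {
  /**
   * Sums columns a and b over all rows. No reader is ever handed a row
   * number; both follow the shared row index. Assumes a and b have the
   * same length.
   */
  static int sumRows(short[] a, int[] b) {
    SketchRowIndex rowIndex = new SketchRowIndex(a.length);
    SketchColumnReader colA = () -> a[rowIndex.index()]; // short widened to int
    SketchColumnReader colB = () -> b[rowIndex.index()];
    int total = 0;
    while (rowIndex.next()) {
      total += colA.getInt() + colB.getInt();
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println(sumRows(new short[] {1, 2, 3}, new int[] {10, 20, 30}));

    SketchIntArrayWriter writer = new SketchIntArrayWriter();
    writer.setInt(7); // each call appends; there is no index argument
    writer.setInt(8);
    SketchIntArrayReader reader = writer.reader();
    System.out.println(reader.getInt(1)); // reads may use any valid index
  }
}
```

The sketch keeps the row position in one object so that the calling loop reads like a cursor over rows, which is the ease-of-use goal the Javadoc names. The lambda indirection in `sumRows` is a small-scale analogue of the extra level of indirection the Javadoc cites as the source of the performance cost.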