/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
/**
 * Defines a mock data source which generates dummy test data for use
 * in testing. The data source operates in two modes:
 * <ul>
 * <li><b>Classic:</b> used in physical plans in many unit tests.
 * The plan specifies a set of columns; data is generated by the
 * vectors themselves based on two alternating values.</li>
 * <li><b>Enhanced:</b> available for use in newer unit tests.
 * Enhances the physical plan description to allow specifying a data
 * generator class (for various types, data formats, etc.) Also
 * provides a data storage engine framework to allow using mock
 * tables in SQL queries.</li>
 * </ul>
 * <h3>Classic Mode</h3>
 * Create a scan operator that looks like the following (from
 * <tt>/src/test/resources/functions/cast/two_way_implicit_cast.json</tt>,
 * used in {@link TestReverseImplicitCast}):
 * <pre><code>
 *    graph:[
 *        {
 *            {@literal @}id:1,
 *            pop:"mock-scan",
 *            url: "http://apache.org",
 *            entries:[
 *                {records: 1, types: [
 *                    {name: "col1", type: "FLOAT4", mode: "REQUIRED"},
 *                    {name: "col2", type: "FLOAT8", mode: "REQUIRED"}
 *                ]}
 *            ]
 *        }, ...
 * </code></pre>
 * Here:
 * <ul>
 * <li>The <tt>pop</tt> must be <tt>mock-scan</tt>.</li>
 * <li>The <tt>url</tt> is unused.</li>
 * <li>The <tt>entries</tt> section can have one or more entries. If
 * there is more than one entry, the storage engine will enable parallel
 * scans up to the number of entries, as though each entry were a
 * different file or group.</li>
 * <li>The entry <tt>name</tt> is arbitrary, though color names seem
 * to be the traditional names used in Drill tests.</li>
 * <li>The <tt>type</tt> is one of the supported Drill
 * {@link MinorType} names.</li>
 * <li>The <tt>mode</tt> is one of the supported Drill
 * {@link DataMode} names: usually <tt>OPTIONAL</tt> or <tt>REQUIRED</tt>.</li>
 * </ul>
 * <p>
 * Recent extensions include:
 * <ul>
 * <li><tt>repeat</tt> in either the "entry" or "record" elements allows
 * repeating entries (simulating multiple blocks or row groups) and
 * repeating fields (to easily create a dozen fields of some type).</li>
 * <li><tt>generator</tt> in a field definition lets you specify a
 * specific data generator (see below).</li>
 * <li><tt>properties</tt> in a field definition lets you pass
 * generator-specific values to the data generator (such as a minimum
 * and maximum value).</li>
 * </ul>
 *
 * <h3>Enhanced Mode</h3>
 * Enhanced mode builds on Classic mode to add further capabilities.
 * It can be used either in a physical plan or in SQL. Data
 * is randomly generated over a wide range of values and can be
 * controlled by custom generator classes. When used
 * in a physical plan, the <tt>records</tt> section has additional
 * attributes as described in {@link MockTableDef.MockColumn}:
 * <ul>
 * <li>The <tt>generator</tt> lets you specify a class to generate the
 * sample data. The class name can be either fully qualified (including
 * the package path) or a simple class name. A simple class name is
 * assumed to reside in this package. For example, to generate
 * an ISO date into a string, use <tt>DateGen</tt>. Additional generators
 * can (and should) be added as the need arises.</li>
 * <li>The <tt>repeat</tt> attribute lets you create a very wide row by
 * repeating a column the specified number of times. Actual column names
 * have a numeric suffix. For example, if the base name is "blue" and
 * is repeated twice, actual columns are "blue1" and "blue2".</li>
 * </ul>
 * When used in SQL, use the <tt>mock</tt> name space as follows:
 * <pre><code>
 * SELECT id_i, name_s50 FROM `mock`.`employee_500`;
 * </code></pre>
 * Both the column names and table names encode information that specifies
 * what data to generate.
 * <p>
 * Columns are of the form <tt><i>name</i>_<i>type</i><i>length</i>?</tt>.
 * <ul>
 * <li>The name is anything you want ("id" and "name" in the example.)</li>
 * <li>The underscore is required to separate the type from the name.</li>
 * <li>The type is one of "i" (integer), "d" (double) or "s" (string).
 * Other types can be added as needed: n (decimal number), l (long), etc.</li>
 * <li>The length is optional and is used only for string (<tt>VARCHAR</tt>)
 * columns. The default string length is 10.</li>
 * <li>Columns do not yet support nulls. When they do, the encoding will
 * be "_n<i>percent</i>" where the percent specifies the percentage of rows
 * that should contain null values in this column.</li>
 * <li>The column is known to SQL as its full name, that is "id_i" or
 * "name_s50".</li>
 * </ul>
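 * <p>
 * As a rough illustration of the column-name convention above, the sketch
 * below splits a name such as "name_s50" into its parts. The class
 * <tt>ColumnNameSketch</tt> and its regular expression are hypothetical and
 * are not part of this package; the actual parsing done by the mock storage
 * engine may differ.
 * <pre><code>
 * import java.util.regex.Matcher;
 * import java.util.regex.Pattern;
 *
 * // Hypothetical helper, for illustration only.
 * public class ColumnNameSketch {
 *   // name, underscore, one type letter (i, d or s), optional length.
 *   private static final Pattern COLUMN =
 *       Pattern.compile("(\\w+)_([ids])(\\d+)?");
 *
 *   public static void main(String[] args) {
 *     Matcher m = COLUMN.matcher("name_s50");
 *     if (m.matches()) {
 *       String name = m.group(1);
 *       String type = m.group(2);
 *       // The documented default string length is 10.
 *       int length = m.group(3) == null ? 10 : Integer.parseInt(m.group(3));
 *       System.out.println(name + " " + type + " " + length);
 *     }
 *   }
 * }
 * </code></pre>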
 * <p>
 * Tables are of the form <tt><i>name</i>_<i>rows</i><i>unit</i>?</tt> where:
 * <ul>
 * <li>The name is anything you want. ("employee" in the example.)</li>
 * <li>The underscore is required to separate the row count from the name.</li>
 * <li>The row count specifies the number of rows to return.</li>
 * <li>The count unit can be none, K (multiply count by 1000) or M
 * (multiply row count by one million), case insensitive.</li>
 * <li>Another field (not yet implemented) might specify the split count.</li>
 * </ul>
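 * <p>
 * The row-count rule for table names can be sketched as follows. The class
 * <tt>TableNameSketch</tt> is a hypothetical name used only for
 * illustration; it shows the documented convention, not the engine's
 * actual code.
 * <pre><code>
 * import java.util.regex.Matcher;
 * import java.util.regex.Pattern;
 *
 * // Hypothetical helper, for illustration only.
 * public class TableNameSketch {
 *   // name, underscore, row count, optional unit (K or M, any case).
 *   private static final Pattern TABLE =
 *       Pattern.compile("(\\w+)_(\\d+)([kKmM])?");
 *
 *   static long rowCount(String tableName) {
 *     Matcher m = TABLE.matcher(tableName);
 *     if (!m.matches()) {
 *       throw new IllegalArgumentException("Not a mock table name: " + tableName);
 *     }
 *     long count = Long.parseLong(m.group(2));
 *     String unit = m.group(3) == null ? "" : m.group(3).toUpperCase();
 *     switch (unit) {
 *       case "K": return count * 1000L;
 *       case "M": return count * 1000000L;
 *       default:  return count;
 *     }
 *   }
 *
 *   public static void main(String[] args) {
 *     System.out.println(rowCount("employee_500"));  // 500
 *     System.out.println(rowCount("employee_10K"));  // 10000
 *   }
 * }
 * </code></pre>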
 * <h3>Enhanced Mode with Definition File</h3>
 * You can reference a mock data definition file directly from SQL as follows:
 * <pre><code>SELECT * FROM `mock`.`your_defn_file.json`</code></pre>
 * <h3>Data Generators</h3>
 * The classic mode uses data generators built into each vector to generate
 * the sample data. These generators use a very simple black/white alternating
 * series of two values. Simple, but limited. The enhanced mode allows custom
 * data generators. Unfortunately, this requires a separate generator class for
 * each data type. As a result, we presently support just a few key data types.
 * On the other hand, the custom generators do allow tests to specify a custom
 * generator class to generate the kind of data needed for that test.
 * <p>
 * All data generators implement the {@link FieldGen} interface, and must have
 * a no-argument constructor to allow dynamic instantiation. The mock data
 * source either picks a default generator (if no <tt>generator</tt> is provided)
 * or uses the custom generator specified in <tt>generator</tt>. Generators
 * are independent (though one could, perhaps, write generators that correlate
 * field values).
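 * <p>
 * A minimal sketch of how a <tt>generator</tt> name might be resolved and
 * instantiated, assuming reflection over the no-argument constructor. The
 * class <tt>GeneratorLookupSketch</tt> is hypothetical, and
 * <tt>java.lang.StringBuilder</tt> merely stands in for a real generator
 * class so that the example is self-contained:
 * <pre><code>
 * // Hypothetical helper, for illustration only.
 * public class GeneratorLookupSketch {
 *   private static final String MOCK_PACKAGE = "org.apache.drill.exec.store.mock";
 *
 *   // A simple class name is assumed to live in this package.
 *   static String qualify(String generator) {
 *     return generator.contains(".") ? generator : MOCK_PACKAGE + "." + generator;
 *   }
 *
 *   public static void main(String[] args) throws Exception {
 *     System.out.println(qualify("DateGen"));
 *     // Dynamic instantiation requires a no-argument constructor.
 *     Object instance = Class.forName(qualify("java.lang.StringBuilder"))
 *         .getDeclaredConstructor().newInstance();
 *     System.out.println(instance.getClass().getName());
 *   }
 * }
 * </code></pre>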
 */
package org.apache.drill.exec.store.mock;