/**
 * Copyright (C) 2001-2017 by RapidMiner and the contributors
 *
 * Complete list of developers available at our web site:
 *
 * http://rapidminer.com
 *
 * This program is free software: you can redistribute it and/or modify it under the terms of the
 * GNU Affero General Public License as published by the Free Software Foundation, either version 3
 * of the License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without
 * even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * Affero General Public License for more details.
 *
 * You should have received a copy of the GNU Affero General Public License along with this program.
 * If not, see http://www.gnu.org/licenses/.
 */
/**
 * Contains the {@link ColumnarExampleTable} and the {@link Column}s used for it. Columns consist of
 * several chunks.
 *
 * <p>
 * A {@link DoubleAutoColumn} contains {@link DoubleAutoChunk}s, each of which is either a
 * {@link DoubleAutoDenseChunk} or a {@link DoubleAutoSparseChunk}. It comes in two modes:
 * {@link DataManagement#AUTO} and {@link DataManagement#MEMORY_OPTIMIZED}. If the mode is
 * {@link DataManagement#AUTO}, the {@link DoubleAutoSparseChunk} contains a
 * {@link DoubleHighSparsityChunk}; otherwise it contains a {@link DoubleMediumSparsityChunk}.
 *
 * <p>
 * In mode {@link DataManagement#AUTO}, every {@link DoubleAutoChunk} starts as a
 * {@link DoubleAutoDenseChunk} which contains an array of at most 2048 elements to store the added
 * values. When the 2048th element is added, the density of these elements is checked. If it is
 * below 1%, the chunk changes to a {@link DoubleAutoSparseChunk} with the calculated default value.
 * Otherwise, it stays a {@link DoubleAutoDenseChunk}, but with an internal array as big as the
 * expected size. Such {@link DoubleAutoDenseChunk}s will not check their density again.
 * {@link DoubleAutoSparseChunk}s, however, continue to check their density. If it grows above 2%,
 * they change back to {@link DoubleAutoDenseChunk}s with the full expected size.
 *
 * <p>
 * In mode {@link DataManagement#MEMORY_OPTIMIZED}, every {@link DoubleAutoChunk} starts as a
 * {@link DoubleAutoSparseChunk} containing a {@link DoubleMediumSparsityChunk} with default value
 * 0. If the density reaches over 50% with respect to the first 2048 values before the 2048th
 * element is inserted, the chunk changes to a {@link DoubleAutoDenseChunk}, and its sparsity and
 * default value are checked again when the 2048th element is inserted. If the density reaches over
 * 55% when inserting an element (after the 2048th element was inserted), the chunk changes to a
 * {@link DoubleAutoDenseChunk} and stays dense.
 *
 * <p>
 * If the {@link DoubleAutoColumn#complete()} method is called before the 2048th element is
 * inserted, the chunk grows to the full expected size.
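 *
 * <p>
 * For illustration, the following is a minimal, hypothetical sketch of the density check in mode
 * {@link DataManagement#AUTO} described above. The class, method, and constant names are
 * illustrative and not part of the internal API, and the assumption that density means the
 * fraction of values differing from the most frequent (default) value is an interpretation of the
 * description above rather than a statement about the actual implementation.
 *
 * <pre>{@code
 * // Hypothetical sketch only; names and details are assumptions, not the actual internal classes.
 * import java.util.HashMap;
 * import java.util.Map;
 *
 * final class AutoDensityCheckSketch {
 *
 *     // Fraction of non-default values below which an AUTO dense chunk switches to sparse.
 *     private static final double AUTO_TO_SPARSE = 0.01;
 *     // Fraction of non-default values above which an AUTO sparse chunk switches back to dense.
 *     private static final double AUTO_TO_DENSE = 0.02;
 *
 *     // Called once, when the 2048th value has been written into the initial dense buffer: the
 *     // most frequent value is taken as the candidate default value, and the density is the
 *     // fraction of values that differ from it.
 *     static boolean shouldBecomeSparse(double[] first2048Values) {
 *         Map<Double, Integer> counts = new HashMap<>();
 *         for (double value : first2048Values) {
 *             counts.merge(value, 1, Integer::sum);
 *         }
 *         int defaultCount = 0;
 *         for (int count : counts.values()) {
 *             defaultCount = Math.max(defaultCount, count);
 *         }
 *         double density = 1.0 - (double) defaultCount / first2048Values.length;
 *         return density < AUTO_TO_SPARSE;
 *     }
 *
 *     // Checked on insertions into a sparse chunk: once too many values differ from the default
 *     // value, the chunk changes back to a dense chunk of the full expected size.
 *     static boolean shouldBecomeDense(int nonDefaultCount, int expectedSize) {
 *         return (double) nonDefaultCount / expectedSize > AUTO_TO_DENSE;
 *     }
 * }
 * }</pre>
 *
 * <p>
 * In mode {@link DataManagement#MEMORY_OPTIMIZED}, the same kind of check is performed with the
 * 50% and 55% thresholds described above.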
 *
 * <p>
 * Here is an overview of the transitions:
 * <p>
 * <table summary="Transitions of Chunks">
 * <tr>
 * <th>Type A</th>
 * <th>Scenario</th>
 * <th>Type B</th>
 * </tr>
 *
 * <tr>
 * <td>
 *
 * {@link DoubleAutoDenseChunk} in mode {@link DataManagement#AUTO} with density {@code < 1%}
 *
 * </td>
 * <td>Insertion of the 2048th element</td>
 * <td>
 *
 * {@link DoubleAutoSparseChunk} containing a {@link DoubleHighSparsityChunk} in mode
 * {@link DataManagement#AUTO}
 *
 * </td>
 * </tr>
 *
 * <tr>
 * <td>
 *
 * {@link DoubleAutoDenseChunk} in mode {@link DataManagement#AUTO} with density {@code >= 1%}
 *
 * </td>
 * <td>Insertion of the 2048th element</td>
 * <td>
 *
 * {@link DoubleAutoDenseChunk} in mode {@link DataManagement#AUTO} with the full ensured size
 *
 * </td>
 * </tr>
 *
 * <tr>
 * <td>
 *
 * {@link DoubleAutoSparseChunk} containing a {@link DoubleHighSparsityChunk} in mode
 * {@link DataManagement#AUTO}
 *
 * </td>
 * <td>Insertion of a non-default element making the density {@code > 2%}</td>
 * <td>
 *
 * {@link DoubleAutoDenseChunk} in mode {@link DataManagement#AUTO} with the full ensured size
 *
 * </td>
 * </tr>
 *
 * <tr>
 * <td>
 *
 * {@link DoubleAutoSparseChunk} containing a {@link DoubleMediumSparsityChunk} in mode
 * {@link DataManagement#MEMORY_OPTIMIZED}
 *
 * </td>
 * <td>Insertion of an element before the 2048th that grows the density to over 50% with respect
 * to max(ensuredSize, 2048)</td>
 * <td>
 *
 * {@link DoubleAutoDenseChunk} in mode {@link DataManagement#MEMORY_OPTIMIZED} with space for at
 * most 2048 values
 *
 * </td>
 * </tr>
 *
 * <tr>
 * <td>
 *
 * {@link DoubleAutoDenseChunk} in mode {@link DataManagement#MEMORY_OPTIMIZED} with density
 * {@code < 50%}
 *
 * </td>
 * <td>Insertion of the 2048th element</td>
 * <td>
 *
 * {@link DoubleAutoSparseChunk} containing a {@link DoubleMediumSparsityChunk} in mode
 * {@link DataManagement#MEMORY_OPTIMIZED}
 *
 * </td>
 * </tr>
 *
 * <tr>
 * <td>
 *
 * {@link DoubleAutoDenseChunk} in mode {@link DataManagement#MEMORY_OPTIMIZED} with density
 * {@code >= 50%}
 *
 * </td>
 * <td>Insertion of the 2048th element</td>
 * <td>
 *
 * {@link DoubleAutoDenseChunk} in mode {@link DataManagement#MEMORY_OPTIMIZED} with the full
 * ensured size
 *
 * </td>
 * </tr>
 *
 * <tr>
 * <td>
 *
 * {@link DoubleAutoSparseChunk} containing a {@link DoubleMediumSparsityChunk} in mode
 * {@link DataManagement#MEMORY_OPTIMIZED}
 *
 * </td>
 * <td>Insertion of an element after the 2048th that grows the density to over 55%</td>
 * <td>
 *
 * {@link DoubleAutoDenseChunk} in mode {@link DataManagement#MEMORY_OPTIMIZED} with the full
 * ensured size
 *
 * </td>
 * </tr>
 *
 * <tr>
 * <td>
 *
 * {@link DoubleAutoDenseChunk} with fewer than 2048 elements inserted
 *
 * </td>
 * <td>Call of {@link DoubleAutoColumn#complete()}</td>
 * <td>
 *
 * {@link DoubleAutoDenseChunk} with the full ensured size
 *
 * </td>
 * </tr>
 *
 * </table>
 *
 * <p>
 * An {@link IntegerAutoColumn} with its {@link IntegerAutoChunk}s works exactly the same, except
 * that in mode {@link DataManagement#MEMORY_OPTIMIZED} the threshold for changing back to dense is
 * always 45% (instead of 50% before the 2048th element and 55% afterwards) and the threshold for
 * going to sparse is 40% instead of 50%.
 *
 * <p>
 * The columns and chunks with Incomplete instead of Auto in their names work analogously to the
 * Auto ones. The only difference is that their dense chunks always allocate the full expected size
 * instead of starting with only 2048 values before the sparsity check.
 *
 * @author Gisa Schaefer
 *
 */
package com.rapidminer.example.table.internal;