/*
Copyright (C) SYSTAP, LLC DBA Blazegraph 2006-2016.  All rights reserved.

Contact:
     SYSTAP, LLC DBA Blazegraph
     2501 Calvert ST NW #106
     Washington, DC 20008
     licenses@blazegraph.com

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
*/
/*
 * Created on Dec 23, 2007
 */

package com.bigdata.btree;

import java.nio.ByteBuffer;

import junit.framework.TestCase;

import com.bigdata.io.DataOutputBuffer;

/**
 * Test harness used to develop a compacting buffer for maintaining branch
 * nodes and leaves in a B+Tree that minimizes copying on mutation of the
 * node, helps to minimize heap churn and GC latency related to long lived
 * allocations, and maintains the data in a serializable format.
 * <p>
 * The basic data model is a managed byte[] on which we can write. Random
 * updates in the array are allowed and variable length data are simply
 * appended onto the end of the array. The array will grow if necessary, in
 * which case the data are copied onto a new byte[]. The copy is a compacting
 * operation, similar to GC, in which only the "live" bytes are copied
 * forward. Compacting restores the sort order of the keys. During mutations,
 * the sort order is maintained by an indirection vector having the offset and
 * length of the current location for each key. The order of the indirection
 * vector is maintained by an insertion sort. (A sketch of this data model
 * appears below.)
 * 
 * @todo The base class should be shared with {@link DataOutputBuffer}, should
 *       NOT throw IOException, should be extended to provide both random
 *       read/write, and should be extended for node/leaf data structures and
 *       their compaction semantics. The same class can support branch nodes
 *       and leaves.
 * 
 * @todo The representation is either directly serializable or fast to
 *       serialize and de-serialize. Note that serialization onto a
 *       {@link DataOutputBuffer} tends to be cheap while de-serialization
 *       tends to be much more expensive, in part because there are more
 *       allocations.
 * 
 * @todo Reuse buffers for a btree.
 *       <p>
 *       Note that the disk write cache causes copying (to prevent the data
 *       from being changed if the write cache is flushed). However, if the
 *       btree copies the data from the {@link ByteBuffer} then the byte[]s
 *       backing the {@link ByteBuffer} will be short lived allocations (in
 *       the nursery). The option is to let the caller pass in a buffer of
 *       sufficient size and to let the caller decode the record length and
 *       keep track of the #of valid bytes in the returned buffer.
 * 
 * @todo Minimize the cost of mutations (insert/update/remove).
 *       <p>
 *       Note that adaptive packed memory arrays also seek to minimize the
 *       cost of mutations. They leave "holes" in the data such that the cost
 *       of mutations on a large set of ordered items (in the millions) may
 *       be minimized. Periodically new holes are created to retain a balance
 *       of the distribution of holes vs the #of entries in the array.
 *       Compact serialization would naturally copy the data into a dense
 *       form. De-serialization could lazily restore the holes on the first
 *       mutation.
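 * <p>
 * The following is a minimal, illustrative sketch of the managed byte[] /
 * indirection vector data model described above. The names ({@code KeySlot},
 * {@code compact}) are hypothetical and are not part of this API:
 * 
 * <pre>
 * // Hypothetical: each key is addressed indirectly via (offset, length)
 * // into the managed byte[]. The slots are kept in key order by an
 * // insertion sort while the bytes themselves may be appended out of order.
 * static class KeySlot {
 *     int offset; // current offset of the key in the managed byte[]
 *     int length; // length of the key in bytes
 * }
 * 
 * // Compaction copies only the "live" bytes, in key order, onto a new
 * // byte[]. This also restores the physical sort order of the keys.
 * static byte[] compact(byte[] buf, KeySlot[] slots, int nkeys) {
 *     int size = 0;
 *     for (int i = 0; i &lt; nkeys; i++)
 *         size += slots[i].length;
 *     final byte[] dst = new byte[size];
 *     int off = 0;
 *     for (int i = 0; i &lt; nkeys; i++) {
 *         System.arraycopy(buf, slots[i].offset, dst, off, slots[i].length);
 *         slots[i].offset = off; // the slot now addresses the compacted copy
 *         off += slots[i].length;
 *     }
 *     return dst;
 * }
 * </pre>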
 * 
 * @todo Minimize the cost of mutations when the prefix length changes. It
 *       will be relatively expensive to re-factor the keys to isolate the
 *       longest common prefix. The prefix may be determined by comparing the
 *       1st key and either the last key or the separator key for the right
 *       sibling (if available). None of these options are very stable since
 *       mutations on the node or its right sibling could change the first
 *       key, the last key, and the separator for the right sibling. (A
 *       sketch of the prefix determination appears below.)
 *       <p>
 *       Explore cases where the prefix length changes and see if we can
 *       handle deferred growth of the prefix. This should be possible if we
 *       track both the prefix offset and length since we can just reduce the
 *       prefix length, in which case we will simply compare more bytes than
 *       are absolutely required in each of the remainder keys. Likewise,
 *       examine cases where the prefix length shrinks and see if we can
 *       minimize the cost of the mutation by only changing some of the keys
 *       and their offsets. Finally, consider cases where the prefix bytes
 *       change with or without a change in length, and how that gets
 *       handled.
 * 
 * @todo Support dictionary-based compression for keys and values.
 *       <p>
 *       Order preserving compression may be used for keys in their
 *       de-serialized state, reducing the #of bytes in the key. Special
 *       comparison logic is required since a probe key containing symbols
 *       not in the code dictionary cannot be mapped into a compressed key,
 *       resulting in an insertion point vs a retrieval index.
 *       <p>
 *       Hu-Tucker compression is not static, meaning that each node or leaf
 *       will have its own order preserving compression. In this case the key
 *       must be fully materialized when extracting separator keys for the
 *       parent. (If the order preserving compression is static then we do
 *       not need to decode before extracting a key and any key can be mapped
 *       in.)
 *       <p>
 *       The compression algorithms should be definable by the application.
 *       This makes sense even for order preserving compression of the keys,
 *       where we expect to use Hu-Tucker or a variant, since the choice of
 *       the alphabet can vary by application (byte vs int vs long) and since
 *       people may then experiment with other order-preserving compression
 *       techniques.
 *       <p>
 *       Application defined non-order preserving compression may be used for
 *       keys and values. For keys, we have to de-compress during
 *       de-serialization since we need to compare the uncompressed keys in
 *       order to have the order semantics. For values, we do not need to
 *       decompress until we deliver the value to the application. An example
 *       of the use of non-order preserving compression for keys would be the
 *       long[3] keys in an RDF statement index. When the buffer is used for
 *       serialization the keys can be compressed using a non-order
 *       preserving technique that assigns bit codes to values in inverse
 *       frequency (in fact, this is quite similar to Hu-Tucker). Note that
 *       order preserving compression is not meaningful for the on-the-wire
 *       format between a client and a data service since the latter will
 *       need to use the uncompressed keys, and the keys will then be recoded
 *       using a local order preserving technique if that is supported by the
 *       index partition.
 * 
 * @todo Support micro-indexing? This is where a branch structure is
 *       represented over the keys within the node or leaf to minimize the
 *       cost of the binary search. Whether this is efficient or not depends
 *       on the behavior of the code with respect to cache lines.
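 * <p>
 * A minimal sketch of the prefix determination described above, assuming the
 * keys are maintained in sorted order so that the longest common prefix over
 * all keys is the common prefix of the first and last key. The helper name
 * is illustrative only:
 * 
 * <pre>
 * // Hypothetical: for keys in sorted order, the longest common prefix of
 * // the node equals the common prefix of its first and last key.
 * static int getPrefixLength(byte[] first, byte[] last) {
 *     final int max = Math.min(first.length, last.length);
 *     int i = 0;
 *     while (i &lt; max &amp;&amp; first[i] == last[i])
 *         i++;
 *     return i;
 * }
 * </pre>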
 * 
 * @todo Reserve a few bits for a code indicating which of the supported
 *       alternative representations is in effect.
 * 
 * @todo Store a field which is the pre-compression size when the keys are
 *       compressed, or perhaps the target "mutable" size if the buffer is to
 *       undergo mutations. This will let us choose a suitable buffer size
 *       when "de-serializing".
 * 
 * @todo Support a btree-specific cache of buffers for nodes (and one for
 *       leaves). The target buffer size is estimated based on the maximum
 *       accepted mutable size before we force compaction and is never less
 *       than the largest compact node/leaf size that we have seen. There is
 *       slop in this estimate: the upper value for the capacity is bounded
 *       by how many mutations we permit before forcing compaction while the
 *       lower value for the capacity is bounded by the actual size of
 *       de-serialized records. In a scenario in which mutation does not
 *       occur the capacity should remain bounded by the actual record sizes.
 *       It is ok for us to scan the pool's N references looking for a
 *       suitable buffer, and then we start to discard buffers that are too
 *       small and update the capacity bound. (A sketch of such a pool
 *       appears at the end of this file.)
 * 
 * @author <a href="mailto:thompsonbry@users.sourceforge.net">Bryan Thompson</a>
 * @version $Id$
 */
public class TestCompactingByteArrayBuffer extends TestCase {

    /**
     * 
     */
    public TestCompactingByteArrayBuffer() {
    }

    /**
     * @param arg0
     */
    public TestCompactingByteArrayBuffer(String arg0) {

        super(arg0);

    }

//    /**
//     * Trial balloon for node/leaf data structure using a mutable buffer and
//     * a compacting GC.
//     * 
//     * @todo also implement {@link INodeData} or have two concrete classes of
//     *       the same {@link CompactingByteBuffer} base class that exposes
//     *       the node vs leaf data interfaces.
//     * 
//     * @todo reconcile with the {@link NodeSerializer}. remove the use of the
//     *       {@link IValueSerializer} and only support serialization of
//     *       byte[] values (and version counters when the index is
//     *       unisolated).
//     *       <p>
//     *       Should the btree api automatically (de-)serialize values using a
//     *       local extSer data structure or should strong typing of values be
//     *       required or should people use a utility object to wrap a btree
//     *       and provide key/val encoding and decoding?
//     * 
//     * @author <a href="mailto:thompsonbry@users.sourceforge.net">Bryan Thompson</a>
//     * @version $Id$
//     */
//    public static class CompactingByteBuffer extends ByteArrayBuffer implements ILeafData {
//
//        /**
//         * The serialization version.
//         */
//        private static transient int SIZEOF_VERSION = Bytes.SIZEOF_BYTE;
//
//        /**
//         * A set of bit flags.
//         * 
//         * @todo if more than 8 bits are required then look into generalized
//         *       bit stream support or just read the data as a short, int or
//         *       long and then do the bit stuff on that value.
//         */
//        private static transient int SIZEOF_FLAGS = Bytes.SIZEOF_BYTE;
//
//        /**
//         * The branching factor (m).
//         */
//        private static transient int SIZEOF_BRANCHING_FACTOR = Bytes.SIZEOF_SHORT;
//
//        /**
//         * The #of keys.
//         */
//        private static transient int SIZEOF_NKEYS = Bytes.SIZEOF_SHORT;
//
//        // Note: each field begins at the offset of the previous field plus
//        // the size of that previous field.
//        private static transient int OFFSET_VERSION = 0x0;
//        private static transient int OFFSET_FLAGS = OFFSET_VERSION + SIZEOF_VERSION;
//        private static transient int OFFSET_BRANCHING_FACTOR = OFFSET_FLAGS + SIZEOF_FLAGS;
//        private static transient int OFFSET_NKEYS = OFFSET_BRANCHING_FACTOR + SIZEOF_BRANCHING_FACTOR;
//
//        /**
//         * Mask for flags revealing the bit whose value is ONE (1) iff the
//         * record represents a leaf (otherwise it represents a node).
//         */
//        private static transient int MASK_IS_LEAF = 0x01;
//
//        /*
//         * @todo fields m (branchingFactor), isLeaf, nkeys (aka nvals),
//         * keys, vals.
//         */
//
//        public boolean isLeaf() {
//
//            return (buf[OFFSET_FLAGS] & MASK_IS_LEAF) == 1;
//
//        }
//
//        public int getBranchingFactor() {
//
//            return 0;
//
//        }
//
//        public int getEntryCount() {
//            // TODO Auto-generated method stub
//            return 0;
//        }
//
//        public int getKeyCount() {
//            // TODO Auto-generated method stub
//            return 0;
//        }
//
//        public IKeyBuffer getKeys() {
//            // TODO Auto-generated method stub
//            return null;
//        }
//
//        public int getValueCount() {
//            // TODO Auto-generated method stub
//            return 0;
//        }
//
//        public Object[] getValues() {
//            // TODO Auto-generated method stub
//            return null;
//        }
//
//    }

}
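
/**
 * A minimal, illustrative sketch of the per-btree buffer pool described in
 * the @todo notes above: a small fixed pool that is scanned for a buffer of
 * sufficient capacity, discarding buffers that are too small. This class and
 * its names are hypothetical and are not part of the btree API.
 */
class BufferPoolSketch {

    /** The pooled buffers (slots at index &gt;= {@link #size} are null). */
    private final byte[][] pool;

    /** The #of buffers currently in the pool. */
    private int size = 0;

    BufferPoolSketch(final int poolCapacity) {

        pool = new byte[poolCapacity][];

    }

    /**
     * Scan the pool's N references for a buffer having at least
     * <i>minCapacity</i> bytes, else allocate a new one.
     */
    byte[] take(final int minCapacity) {

        for (int i = 0; i < size; i++) {

            if (pool[i].length >= minCapacity) {

                final byte[] b = pool[i];

                // swap-remove the buffer from the pool.
                pool[i] = pool[--size];
                pool[size] = null;

                return b;

            }

        }

        return new byte[minCapacity];

    }

    /**
     * Return a buffer to the pool. Buffers smaller than the current capacity
     * bound are discarded (left for GC) so the pool adapts to the actual
     * record sizes.
     */
    void release(final byte[] b, final int capacityBound) {

        if (b.length >= capacityBound && size < pool.length) {

            pool[size++] = b;

        }

    }

}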