/*
 * package-info.java
 * Written by Gil Tene of Azul Systems, and released to the public domain,
 * as explained at http://creativecommons.org/publicdomain/zero/1.0/
 */

/**
 * <h3>A High Dynamic Range (HDR) Histogram Package</h3>
 * <p>
 * An HdrHistogram histogram supports the recording and analysis of sampled data value counts across a configurable
 * integer value range with configurable value precision within the range. Value precision is expressed as the number
 * of significant digits in the value recording, and provides control over value quantization behavior across the
 * value range and the subsequent value resolution at any given level.
 * </p>
 * <p>
 * In contrast to traditional histograms that use linear, logarithmic, or arbitrary sized bins or buckets,
 * HdrHistograms use a fixed storage internal data representation that simultaneously supports an arbitrarily high
 * dynamic range and arbitrary precision throughout that dynamic range. This capability makes HdrHistograms extremely
 * useful for tracking and reporting on the distribution of percentile values with high resolution and across a wide
 * dynamic range -- a common need in latency behavior characterization.
 * </p>
 * <p>
 * The HdrHistogram package was specifically designed with latency and performance sensitive applications in mind.
 * Experimental u-benchmark measurements show value recording times as low as 3-6 nanoseconds on modern
 * (circa 2012) Intel CPUs. All Histogram variants can maintain a fixed cost in both space and time. When not
 * configured to auto-resize, a Histogram's memory footprint is constant, with no allocation operations involved in
 * recording data values or in iterating through them. The memory footprint is fixed regardless of the number of data
 * value samples recorded, and depends solely on the dynamic range and precision chosen. The amount of work involved in
 * recording a sample is constant, and directly computes storage index locations such that no iteration or searching
 * is ever involved in recording data values.
 * </p>
 * <p>
 *     NOTE: Histograms can optionally be configured to auto-resize their dynamic range as a convenience feature.
 *     When configured to auto-resize, recording operations that need to expand a histogram will auto-resize its
 *     dynamic range to include recorded values as they are encountered. Note that recording calls that cause
 *     auto-resizing may take longer to execute, and that resizing incurs allocation and copying of internal data
 *     structures.
 * </p>
 * <p>
 * The combination of high dynamic range and precision is useful for collection and accurate post-recording
 * analysis of sampled value data distribution in various forms. Whether it's calculating or
 * plotting arbitrary percentiles, iterating through and summarizing values in various ways, or deriving mean and
 * standard deviation values, the fact that the recorded value count information is kept in high
 * resolution allows for accurate post-recording analysis with low (and ultimately configurable) loss in
 * accuracy when compared to performing the same analysis directly on the potentially infinite series of source
 * data value samples.
 * </p>
 * <p>
 * An HdrHistogram histogram is usually configured to maintain value count data with a resolution good enough
 * to support a desired precision in post-recording analysis and reporting on the collected data. Analysis can include
 * the computation and reporting of distribution by percentiles, linear or logarithmic arbitrary value buckets, mean
 * and standard deviation, as well as any other computations that can be supported using the various iteration
 * techniques available on the collected value count data. In practice, precision levels of 2 or 3 significant
 * digits are most commonly used, as they maintain a value accuracy of +/- ~1% or +/- ~0.1% respectively for
 * derived distribution statistics.
 * </p>
 * <p>
 * A good example of HdrHistogram use would be tracking of latencies across a wide dynamic range, e.g. from a
 * microsecond to an hour. A Histogram can be configured to track and later report on the counts of observed integer
 * usec-unit latency values between 0 and 3,600,000,000 while maintaining a value precision of 3 significant digits
 * across that range. Such an example Histogram would simply be created with a
 * <b><code>highestTrackableValue</code></b> of 3,600,000,000, and a
 * <b><code>numberOfSignificantValueDigits</code></b> of 3, and would occupy a fixed, unchanging memory footprint
 * of around 185KB (see "Footprint estimation" below).
 * <br>
 * Code for this use example would include these basic elements:
 * <br>
 * <pre>
 * <code>
 * {@link org.HdrHistogram_voltpatches.Histogram} histogram = new {@link org.HdrHistogram_voltpatches.Histogram}(3600000000L, 3);
 * .
 * .
 * .
 * // Repeatedly record measured latencies:
 * histogram.{@link org.HdrHistogram_voltpatches.AbstractHistogram#recordValue(long) recordValue}(latency);
 * .
 * .
 * .
 * // Report histogram percentiles, expressed in msec units:
 * histogram.{@link org.HdrHistogram_voltpatches.AbstractHistogram#outputPercentileDistribution(java.io.PrintStream, Double) outputPercentileDistribution}(histogramLog, 1000.0);
 * </code>
 * </pre>
 * Specifying 3 significant digits of precision in this example guarantees that value quantization within the value
 * range will be no larger than 1/1,000th (or 0.1%) of any recorded value. This example Histogram can therefore be
 * used to track, analyze and report the counts of observed latencies ranging between 1 microsecond and 1 hour in
 * magnitude, while maintaining a value resolution of 1 microsecond (or better) up to 1 millisecond, a resolution of
 * 1 millisecond (or better) up to one second, and a resolution of 1 second (or better) up to 1,000 seconds. At its
 * maximum tracked value (1 hour), it would still maintain a resolution of 3.6 seconds (or better).
 * <h3>Histogram variants and internal representation</h3>
 * The HdrHistogram package includes multiple implementations of the {@link org.HdrHistogram_voltpatches.AbstractHistogram} class:
 * <ul>
 *  <li> {@link org.HdrHistogram_voltpatches.Histogram}, which is the commonly used Histogram form and tracks value counts
 * in <b><code>long</code></b> fields. </li>
 *  <li>{@link org.HdrHistogram_voltpatches.IntCountsHistogram} and {@link org.HdrHistogram_voltpatches.ShortCountsHistogram}, which track value counts
 * in <b><code>int</code></b> and
 * <b><code>short</code></b> fields respectively, are provided for use cases where smaller count ranges are practical
 * and smaller overall storage is beneficial (e.g. systems where tens of thousands of in-memory histograms are
 * being tracked).</li>
 *  <li>{@link org.HdrHistogram_voltpatches.AtomicHistogram}, {@link org.HdrHistogram_voltpatches.ConcurrentHistogram}
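 *  and {@link org.HdrHistogram_voltpatches.SynchronizedHistogram}, which are provided for use cases that involve
 *  concurrent, multi-threaded access (see "Synchronization and concurrent access" below).</li>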
 * </ul>
 * <p>
 * Internally, data in HdrHistogram variants is maintained using a concept somewhat similar to that of floating
 * point number representation: using an exponent and a (non-normalized) mantissa to
 * support a wide dynamic range at a high but varying (by exponent value) resolution.
 * AbstractHistogram uses exponentially increasing bucket value ranges (the parallel of
 * the exponent portion of a floating point number) with each bucket containing
 * a fixed number (per bucket) of linear sub-buckets (the parallel of a non-normalized mantissa portion
 * of a floating point number).
 * Both dynamic range and resolution are configurable, with <b><code>highestTrackableValue</code></b>
 * controlling dynamic range, and <b><code>numberOfSignificantValueDigits</code></b> controlling
 * resolution.
 * </p>
 * <h3>Synchronization and concurrent access</h3>
 * In the interest of keeping value recording cost to a minimum, the commonly used {@link org.HdrHistogram_voltpatches.Histogram}
 * class and its {@link org.HdrHistogram_voltpatches.IntCountsHistogram} and {@link org.HdrHistogram_voltpatches.ShortCountsHistogram}
 * variants are NOT internally synchronized, and do NOT use atomic variables. Callers wishing to make potentially
 * concurrent, multi-threaded updates or queries against Histogram objects should either take care to externally
 * synchronize and/or order their access, or use the {@link org.HdrHistogram_voltpatches.ConcurrentHistogram},
 * {@link org.HdrHistogram_voltpatches.AtomicHistogram}, or {@link org.HdrHistogram_voltpatches.SynchronizedHistogram} variants.
 * <p>
 * A common pattern seen in histogram value recording involves recording values in a critical path (multi-threaded
 * or not), coupled with a non-critical path reading the recorded data for summary/reporting purposes. When such
 * continuous non-blocking recording operation (concurrent or not) is desired even when sampling, analyzing, or
 * reporting operations are needed, consider using the {@link org.HdrHistogram_voltpatches.Recorder} and
 * {@link org.HdrHistogram_voltpatches.SingleWriterRecorder} variants that were specifically designed for that purpose.
 * Recorders provide a recording API similar to Histogram, and internally maintain and coordinate active/inactive
 * histograms such that recording remains wait-free in the presence of accurate and stable interval sampling.
 * </p>
 * <p>
 * It is worth mentioning that since Histogram objects are additive, it is common practice to use per-thread
 * non-synchronized histograms or {@link org.HdrHistogram_voltpatches.SingleWriterRecorder}s, and to have a
 * summary/reporting thread perform histogram aggregation math across time and/or threads.
 * </p>
 * <h3>Iteration</h3>
 * Histograms support multiple convenient forms of iterating through the histogram data set, including linear,
 * logarithmic, and percentile iteration mechanisms, as well as means for iterating through each recorded value or
 * each possible value level. The iteration mechanisms all provide {@link org.HdrHistogram_voltpatches.HistogramIterationValue}
 * data points along the histogram's iterated data set, and are available via the following methods:
 * <ul>
 *     <li>{@link org.HdrHistogram_voltpatches.AbstractHistogram#percentiles percentiles} :
 *     An {@link java.lang.Iterable}{@literal <}{@link org.HdrHistogram_voltpatches.HistogramIterationValue}{@literal >} through the
 *     histogram using a {@link org.HdrHistogram_voltpatches.PercentileIterator} </li>
 *     <li>{@link org.HdrHistogram_voltpatches.AbstractHistogram#linearBucketValues linearBucketValues} :
 *     An {@link java.lang.Iterable}{@literal <}{@link org.HdrHistogram_voltpatches.HistogramIterationValue}{@literal >} through
 *     the histogram using a {@link org.HdrHistogram_voltpatches.LinearIterator} </li>
 *     <li>{@link org.HdrHistogram_voltpatches.AbstractHistogram#logarithmicBucketValues logarithmicBucketValues} :
 *     An {@link java.lang.Iterable}{@literal <}{@link org.HdrHistogram_voltpatches.HistogramIterationValue}{@literal >}
 *     through the histogram using a {@link org.HdrHistogram_voltpatches.LogarithmicIterator} </li>
 *     <li>{@link org.HdrHistogram_voltpatches.AbstractHistogram#recordedValues recordedValues} :
 *     An {@link java.lang.Iterable}{@literal <}{@link org.HdrHistogram_voltpatches.HistogramIterationValue}{@literal >} through
 *     the histogram using a {@link org.HdrHistogram_voltpatches.RecordedValuesIterator} </li>
 *     <li>{@link org.HdrHistogram_voltpatches.AbstractHistogram#allValues allValues} :
 *     An {@link java.lang.Iterable}{@literal <}{@link org.HdrHistogram_voltpatches.HistogramIterationValue}{@literal >} through
 *     the histogram using a {@link org.HdrHistogram_voltpatches.AllValuesIterator} </li>
 * </ul>
 * <p>
 * Iteration is typically done with a for-each loop statement. E.g.:
 * <br><pre><code>
 * for (HistogramIterationValue v : histogram.percentiles(<i>percentileTicksPerHalfDistance</i>)) {
 *     ...
 * }
 * </code></pre>
 * or
 * <br><pre><code>
 * for (HistogramIterationValue v : histogram.linearBucketValues(<i>valueUnitsPerBucket</i>)) {
 *     ...
 * }
 * </code>
 * </pre>
 * The iterators associated with each iteration method are resettable, such that a caller that would like to avoid
 * allocating a new iterator object for each iteration loop can re-use an iterator to repeatedly iterate through the
 * histogram. This iterator re-use usually takes the form of a traditional loop using the Iterator's
 * <b><code>hasNext()</code></b> and <b><code>next()</code></b> methods:
 * <br>
 * <pre>
 * <code>
 * // Allocate the iterator once, outside of the repeated iteration:
 * PercentileIterator iter = new PercentileIterator(histogram, <i>percentileTicksPerHalfDistance</i>);
 * ...
 * // Reset the same iterator before each pass instead of allocating a new one:
 * iter.reset(<i>percentileTicksPerHalfDistance</i>);
 * while (iter.hasNext()) {
 *     HistogramIterationValue v = iter.next();
 *     ...
 * }
 * </code>
 * </pre>
 * <h3>Equivalent Values and value ranges</h3>
 * <p>
 * Due to the finite (and configurable) resolution of the histogram, multiple adjacent integer data values can
 * be "equivalent". Two values are considered "equivalent" if samples recorded for both are always counted in a
 * common total count due to the histogram's resolution level. Histogram provides methods for determining the
 * lowest and highest equivalent values for any given value, as well as for determining whether two values are
 * equivalent, and for finding the next non-equivalent value for a given value (useful when looping through values,
 * in order to avoid double-counting).
 * </p>
 * <h3>Raw vs. corrected recording</h3>
 * <p>
 * Regular, raw value data recording into an HdrHistogram is achieved with the
 * {@link org.HdrHistogram_voltpatches.AbstractHistogram#recordValue(long) recordValue()} method.
 * </p>
 * <p>
 * Histogram variants also provide an auto-correcting
 * {@link org.HdrHistogram_voltpatches.AbstractHistogram#recordValueWithExpectedInterval(long, long) recordValueWithExpectedInterval()}
 * form in support of a common use case found when histogram values are used to track response time
 * distribution in the presence of Coordinated Omission, an extremely common phenomenon found in latency recording
 * systems.
 * This correcting form is useful in scenarios (e.g. load generators) where measured response times may exceed the
 * expected interval between issuing requests, leading to the "omission" of response time measurements that would
 * typically correlate with "bad" results. This coordinated (non random) omission of source data, if left uncorrected,
 * will then dramatically skew any overall latency stats computed on the recorded information, as the recorded data set
 * itself will be significantly skewed towards good results.
 * </p>
 * <p>
 * When a value recorded in the histogram exceeds the
 * <b><code>expectedIntervalBetweenValueSamples</code></b> parameter, recorded histogram data will
 * reflect an appropriate number of additional values, linearly decreasing in steps of
 * <b><code>expectedIntervalBetweenValueSamples</code></b>, down to the last value
 * that would still be higher than <b><code>expectedIntervalBetweenValueSamples</code></b>.
 * </p>
 * <p>
 * To illustrate why this corrective behavior is critically needed in order to accurately represent value
 * distribution when large value measurements may lead to missed samples, imagine a system for which response
 * time samples are taken once every 10 msec to characterize response time distribution.
 * The hypothetical system behaves "perfectly" for 100 seconds (10,000 recorded samples), with each sample
 * showing a 1msec response time value. The hypothetical system then encounters a 100 sec pause during which
 * only a single sample is recorded (with a 100 second value).
 * A normally recorded (uncorrected) data histogram collected for such a hypothetical system (over the 200 second
 * scenario above) would show ~99.99% of results at 1msec or below, which is obviously "not right". In contrast, a
 * histogram that records the same data using the auto-correcting
 * {@link org.HdrHistogram_voltpatches.AbstractHistogram#recordValueWithExpectedInterval(long, long) recordValueWithExpectedInterval()}
 * method with the knowledge of an expectedIntervalBetweenValueSamples of 10msec will correctly represent the
 * real world response time distribution of this hypothetical system. Only ~50% of results will be at 1msec or below,
 * with the remaining 50% coming from the auto-generated value records covering the missing increments spread between
 * 10msec and 100 sec.
 * </p>
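 * <p>
 * The percentages in this scenario can be checked with simple counting. The sketch below assumes that the
 * auto-generated values step down from the 100 sec measurement in 10 msec decrements to 10 msec inclusive;
 * if the last generated value is instead 20 msec, the totals change by one and the percentages are unaffected:
 * </p>
 * <pre><code>
 *     samples at 1 msec (first 100 sec, one per 10 msec)    = 100,000 / 10        = 10,000
 *     measured sample for the pause (100 sec value)                               =      1
 *     auto-generated samples (99,990 msec down to 10 msec)  = (100,000 / 10) - 1  =  9,999
 *
 *     uncorrected: 10,000 / (10,000 + 1)          = ~99.99% of results at 1 msec or below
 *     corrected:   10,000 / (10,000 + 1 + 9,999)  =  50%    of results at 1 msec or below
 * </code></pre>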
 * <p>
 * Data sets recorded with
 * {@link org.HdrHistogram_voltpatches.AbstractHistogram#recordValue(long) recordValue()}
 * and with
 * {@link org.HdrHistogram_voltpatches.AbstractHistogram#recordValueWithExpectedInterval(long, long) recordValueWithExpectedInterval()}
 * will differ only if at least one value recorded was greater than its
 * associated <b><code>expectedIntervalBetweenValueSamples</code></b> parameter.
 * Data sets recorded with
 * {@link org.HdrHistogram_voltpatches.AbstractHistogram#recordValueWithExpectedInterval(long, long) recordValueWithExpectedInterval()}
 * will be identical to ones recorded with
 * {@link org.HdrHistogram_voltpatches.AbstractHistogram#recordValue(long) recordValue()}
 * if all values recorded were smaller
 * than their associated <b><code>expectedIntervalBetweenValueSamples</code></b> parameters.
 * </p>
 * <p>
 * In addition to the at-recording-time correction option, Histogram variants also provide the post-recording
 * correction methods
 * {@link org.HdrHistogram_voltpatches.AbstractHistogram#copyCorrectedForCoordinatedOmission(long) copyCorrectedForCoordinatedOmission()}
 * and
 * {@link org.HdrHistogram_voltpatches.AbstractHistogram#addWhileCorrectingForCoordinatedOmission(AbstractHistogram, long) addWhileCorrectingForCoordinatedOmission()}.
 * These methods can be used for post-recording correction, and are useful when the
 * <b><code>expectedIntervalBetweenValueSamples</code></b> parameter is estimated to be the same for all recorded
 * values. However, it is important to note that only one correction method (during or post recording) should be
 * used on a given histogram data set, since applying both would correct the same data twice.
 * </p>
 * <p>
 * When used for response time characterization, the recording with the optional
 * <code><b>expectedIntervalBetweenValueSamples</b></code> parameter will tend to produce data sets that would
 * much more accurately reflect the response time distribution that a random, uncoordinated request would have
 * experienced.
 * </p>
 * <h3>Floating point values and DoubleHistogram variants</h3>
 * The above discussion relates to integer value histograms (the various subclasses of
 * {@link org.HdrHistogram_voltpatches.AbstractHistogram} and their related supporting classes). HdrHistogram supports floating
 * point value recording and reporting with a similar set of classes, including the
 * {@link org.HdrHistogram_voltpatches.DoubleHistogram}, {@link org.HdrHistogram_voltpatches.ConcurrentDoubleHistogram} and
 * {@link org.HdrHistogram_voltpatches.SynchronizedDoubleHistogram} histogram classes. Support for floating point value
 * iteration is provided with {@link org.HdrHistogram_voltpatches.DoubleHistogramIterationValue} and related iterator classes
 * ({@link org.HdrHistogram_voltpatches.DoubleLinearIterator}, {@link org.HdrHistogram_voltpatches.DoubleLogarithmicIterator},
 * {@link org.HdrHistogram_voltpatches.DoublePercentileIterator}, {@link org.HdrHistogram_voltpatches.DoubleRecordedValuesIterator},
 * {@link org.HdrHistogram_voltpatches.DoubleAllValuesIterator}). Support for interval recording is provided with
 * {@link org.HdrHistogram_voltpatches.DoubleRecorder} and
 * {@link org.HdrHistogram_voltpatches.SingleWriterDoubleRecorder}.
 * <h4>Auto-ranging in floating point histograms</h4>
 * Unlike integer value based histograms, the specific value range tracked by a {@link
 * org.HdrHistogram_voltpatches.DoubleHistogram} (and variants) is not specified upfront. Only the dynamic range of values
 * that the histogram can cover is (optionally) specified. E.g., when a {@link org.HdrHistogram_voltpatches.DoubleHistogram}
 * is created to track a dynamic range of 3600000000000 (enough to track values from a nanosecond to an hour),
 * values could be recorded into it in any consistent unit of time as long as the ratio between the highest
 * and lowest non-zero values stays within the specified dynamic range, so recording in units of nanoseconds
 * (1.0 thru 3600000000000.0), milliseconds (0.000001 thru 3600000.0), seconds (0.000000001 thru 3600.0), or hours
 * (1/3.6E12 thru 1.0) will all work just as well.
 * <h3>Footprint estimation</h3>
 * Due to its dynamic range representation, Histogram is relatively efficient in memory space requirements given
 * the accuracy and dynamic range it covers. Still, it is useful to be able to estimate the memory footprint involved
 * for a given <b><code>highestTrackableValue</code></b> and <b><code>numberOfSignificantValueDigits</code></b>
 * combination. Beyond a relatively small fixed-size footprint used for internal fields and stats (which can be
 * estimated as "fixed at well less than 1KB"), the bulk of a Histogram's storage is taken up by its data value
 * recording counts array. The total footprint can be conservatively estimated by:
 * <pre><code>
 *     largestValueWithSingleUnitResolution = 2 * (10 ^ numberOfSignificantValueDigits);
 *     subBucketSize = roundedUpToNearestPowerOf2(largestValueWithSingleUnitResolution);
 *
 *     expectedHistogramFootprintInBytes = 512 +
 *          ({primitive type size} / 2) *
 *          (log2RoundedUp((highestTrackableValue) / subBucketSize) + 2) *
 *          subBucketSize
 *
 * </code></pre>
 * A conservative (high) estimate of a Histogram's footprint in bytes is available via the
 * {@link org.HdrHistogram_voltpatches.AbstractHistogram#getEstimatedFootprintInBytes() getEstimatedFootprintInBytes()} method.
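 * <p>
 * As a worked illustration of the estimate formula above, consider the latency example from earlier in this
 * description: a {@link org.HdrHistogram_voltpatches.Histogram} (8-byte <b><code>long</code></b> counts) with a
 * <b><code>highestTrackableValue</code></b> of 3,600,000,000 and a
 * <b><code>numberOfSignificantValueDigits</code></b> of 3. The arithmetic below simply plugs those values into
 * the formula; it is an estimate, not a measurement of actual heap use:
 * </p>
 * <pre><code>
 *     largestValueWithSingleUnitResolution = 2 * (10 ^ 3)                  = 2,000
 *     subBucketSize = roundedUpToNearestPowerOf2(2,000)                    = 2,048
 *     log2RoundedUp(3,600,000,000 / 2,048) = log2RoundedUp(1,757,812.5)    = 21
 *
 *     expectedHistogramFootprintInBytes = 512 + (8 / 2) * (21 + 2) * 2,048 = 188,928 (about 185KB)
 * </code></pre>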
 */

package org.HdrHistogram_voltpatches;