/* XXL: The eXtensible and fleXible Library for data processing
Copyright (C) 2000-2011 Prof. Dr. Bernhard Seeger
Head of the Database Research Group
Department of Mathematics and Computer Science
University of Marburg
Germany
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 3 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with this library; If not, see <http://www.gnu.org/licenses/>.
http://code.google.com/p/xxl/
*/
package xxl.core.cursors.groupers;
import java.util.Comparator;
import java.util.Iterator;
import xxl.core.collections.queues.Heap;
import xxl.core.cursors.AbstractCursor;
import xxl.core.cursors.Cursor;
import xxl.core.cursors.Cursors;
/**
* The replacement-selection operator takes an iteration as its input and
* produces sorted runs as its output. The technique is described in "Donald
* E. Knuth: <i>The Art of Computer Programming, Volume 3: Sorting and
* Searching</i>. Addison-Wesley, 1973." The replacement-selection algorithm
* is especially useful for external sorting. Recall that an external
* merge-sort first produces <code>n</code> sorted input runs and then merges
* them recursively into a single output run. For randomly ordered input, the
* runs produced by the replacement-selection operator are on average about
* twice as large as the available memory; for presorted input they can be
* even larger.
*
* <p><b>Implementation details:</b> When the replacement-selection operator
* is initialized, elements of the input iteration are stored in an array of
* length <code>size</code>, filled from the end towards the front, until the
* array is full or the input iteration has no more elements. The
* {@link xxl.core.collections.queues.Heap heap} that orders the elements is
* created from this array and the given comparator. If a default
* {@link xxl.core.comparators.ComparableComparator comparator}, which assumes
* that the elements implement the {@link java.lang.Comparable comparable}
* interface, is used, the elements are organized in a min-heap and returned
* in natural ascending order. If an
* {@link xxl.core.comparators.InverseComparator inverse} comparator is used
* instead, they are organized in a max-heap and returned in descending
* order.</p>
*
* <p>The integer field <code>n</code> marks the current position in the
* array: the elements collected for the next run reside at the positions
* <code>n</code> to <code>array.length - 1</code>. The method
* <code>check</code>, which is called at the beginning of the method
* <code>next</code>, tests whether the heap is empty. If so, it creates a
* new heap from the elements collected in the array and resets
* <code>n</code> to <code>array.length</code>. In this way the heap for the
* next run is built up while the current heap is being consumed.</p>
*
* <p>Consider, for example, a comparable comparator, so that a min-heap
* manages the elements of the current run. An input element that is lower
* than the <code>peek</code> element of the heap cannot be appended to the
* current run, because it is lower than the element returned by the current
* call. It is therefore written to the array and becomes part of the next
* run.</p>
*
* <p>The next element to be returned is computed as follows. A next element
* exists only if <code>n &lt; array.length</code> or the heap contains further
* elements. In that case the method <code>check</code> is performed first.
* If the input iteration contains more elements and the <code>peek</code>
* element of the heap is lower than or equal to the next element of the
* input iteration with respect to the given comparator, the next element of
* the input iteration replaces the top of the heap and the former top
* element is returned. If the comparator returns a value greater than 0, the
* next element of the heap is removed and returned, <code>n</code> is
* decremented, and the next element of the input iteration is stored in the
* array at the new position <code>n</code>. If the input iteration does not
* contain further elements, the next element of the heap is returned.</p>
*
* <p><b>Note:</b> If the input iteration is given by an object of the class
* {@link java.util.Iterator}, i.e., it does not support the <code>peek</code>
* operation, it is internally wrapped to a cursor.</p>
*
* <p><b>Example usage (1):</b>
* <code><pre>
* ReplacementSelection&lt;Integer&gt; cursor = new ReplacementSelection&lt;Integer&gt;(
*     new Enumerator(11),
*     3,
*     ComparableComparator.INTEGER_COMPARATOR
* );
*
* cursor.open();
*
* while (cursor.hasNext())
*     System.out.print(cursor.next() + "; ");
* System.out.flush();
* System.out.println();
*
* cursor.close();
* </pre></code>
* This instance of the replacement-selection operator sorts the enumerator's
* elements with range [0,11[ using a memory size of 3, i.e., the heap
* contains at most three elements. In this case a
* {@link xxl.core.comparators.ComparableComparator comparator} which assumes
* that the elements implement the {@link java.lang.Comparable comparable}
* interface is used, so the elements are sorted in their natural order. If
* the whole replacement-selection operator is consumed, only a single run
* containing all elements of the underlying enumerator in ascending order is
* created. The reason is that a comparable comparator organizes the heap as a
* min-heap, so the minimum of the heap is returned every time. Because the
* enumerator delivers its elements in ascending order, the <code>peek</code>
* element of the input iteration is always greater than the
* <code>peek</code> element of the heap. Hence the next element of the input
* iteration is always inserted into the current heap, which is then
* reorganized, and a heap has to be built only once. The generated output
* looks as follows:
* <pre>
* 0; 1; 2; 3; 4; 5; 6; 7; 8; 9; 10;
* </pre></p>
*
* <p><b>Example usage (2):</b>
* <code><pre>
* cursor = new ReplacementSelection&lt;Integer&gt;(
*     new Permutator(20),
*     3,
*     ComparableComparator.INTEGER_COMPARATOR
* );
*
* cursor.open();
*
* int last = 0;
* boolean first = true;
* while (cursor.hasNext())
*     if (last &gt; cursor.peek() || first) {
*         System.out.println();
*         System.out.print(" Run: ");
*         last = cursor.next();
*         System.out.print(last + "; ");
*         first = false;
*     }
*     else {
*         last = cursor.next();
*         System.out.print(last + "; ");
*     }
* System.out.flush();
*
* cursor.close();
* </pre></code>
* This instance of the replacement-selection operator uses a
* {@link xxl.core.cursors.sources.Permutator permutator} with range [0,20[ and
* a memory size of 3, which is also the maximum heap size. Again a comparable
* comparator is specified to compare two elements, i.e., each run is in
* natural ascending order and the heap is a min-heap. This example shows that
* more than one run can be created. An element of the input iteration
* (permutator) that is lower than the <code>peek</code> element of the heap,
* i.e., its minimal element, is held back for the next run; the next run
* starts as soon as the current heap has been emptied. The output shows the
* created runs, each containing elements of the permutator in ascending
* order.</p>
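*
* <p><b>Illustrative sketch:</b> The run formation described above can also
* be sketched independently of this class. The following is only a minimal
* sketch under stated assumptions, not the implementation used here: the
* class <code>ReplacementSelectionSketch</code> and its method
* <code>runs</code> are hypothetical names, a
* <code>java.util.PriorityQueue</code> stands in for the XXL heap, a separate
* list stands in for the free part of the array, and the runs are collected
* in lists instead of being returned one element at a time.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-alone sketch; not part of the XXL library.
final class ReplacementSelectionSketch {

    // Splits the input into ascending runs while holding at most
    // `memory` elements (current heap plus pending list) at any time.
    public static List<List<Integer>> runs(Iterator<Integer> input, int memory) {
        PriorityQueue<Integer> current = new PriorityQueue<>(); // min-heap of the current run
        List<Integer> pending = new ArrayList<>();               // elements held back for the next run
        while (current.size() < memory && input.hasNext())
            current.add(input.next());

        List<List<Integer>> result = new ArrayList<>();
        List<Integer> run = new ArrayList<>();
        while (!current.isEmpty()) {
            int smallest = current.poll();
            run.add(smallest);
            if (input.hasNext()) {
                int candidate = input.next();
                if (candidate >= smallest)
                    current.add(candidate);  // still fits into the current run
                else
                    pending.add(candidate);  // too small: belongs to the next run
            }
            if (current.isEmpty()) {
                // the current run is complete; start the next one from the pending elements
                result.add(run);
                run = new ArrayList<>();
                current.addAll(pending);
                pending.clear();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(runs(Arrays.asList(5, 1, 4, 2, 8, 3, 7).iterator(), 3));
    }
}
```

* <p>For the input 5, 1, 4, 2, 8, 3, 7 and a memory size of 3, this sketch
* yields the two runs [1, 2, 4, 5, 7, 8] and [3].</p>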
*
* @param <E> the type of the elements returned by this iteration.
* @see java.util.Iterator
* @see xxl.core.cursors.Cursor
* @see xxl.core.cursors.AbstractCursor
* @see xxl.core.cursors.sorters.MergeSorter
*/
public class ReplacementSelection<E> extends AbstractCursor<E> {
/**
* The input iteration from which the runs are created.
*/
protected Cursor<? extends E> input;
/**
* The array used to create a heap. The length of the array is specified by
* the parameter <code>size</code>.
*/
protected Object[] array;
/**
* The number of elements that can be kept in memory.
*/
protected int size;
/**
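* The current position in the array. The elements collected for the next
* run reside at the positions <code>n</code> to
* <code>array.length - 1</code>.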
*/
protected int n;
/**
* The comparator used to compare two elements of the input iteration.
*/
protected Comparator<? super E> comparator;
/**
* The heap used for the replacement-selection algorithm, ordering the
* elements.
*/
protected Heap<E> heap;
/**
* Creates a new instance of the replacement-selection operator.
*
* @param iterator the input iteration from which the sorted runs are
* created.
* @param size the number of elements that can be kept in memory.
* @param comparator the comparator used to compare two elements.
*/
public ReplacementSelection(Iterator<? extends E> iterator, int size, Comparator<? super E> comparator) {
    this.input = Cursors.wrap(iterator);
    this.comparator = comparator;
    this.size = size;
}
/**
* Opens the replacement-selection operator, i.e., signals the cursor to
* reserve resources, open the input iteration and initialize the
* internally used heap. Before a cursor has been opened, calls to methods
* like <code>next</code> or <code>peek</code> are not guaranteed to yield
* proper results. Therefore <code>open</code> must be called before a
* cursor's data can be processed. Multiple calls to <code>open</code> do
* not have any effect, i.e., if <code>open</code> was called the cursor
* remains in the state <i>opened</i> until its <code>close</code> method
* is called.
*
* <p>Note that a call to the <code>open</code> method of a closed cursor
* usually does not open it again, because its state generally cannot be
* restored once resources have been released or files have been closed.</p>
*/
public void open() {
    if (!isOpened)
        init();
    super.open();
}
/**
* Initializes the replacement-selection operator. The implementation of
* this method is as follows:
* <code><pre>
* array = new Object[n = size];
* input.open();
* while (input.hasNext() &amp;&amp; n &gt; 0)
*     array[--n] = input.next();
* (heap = new Heap(array, 0, comparator)).open();
* </pre></code>
* The input iteration's elements are stored in an array of length
* <code>size</code>, filled from the end towards the front, until the
* array is full or the input iteration has no more elements. The heap is
* created with this array, the given comparator and an initial size of 0,
* i.e., it is empty at first; the first call to <code>check</code> builds
* the heap that orders the buffered elements.
*/
@SuppressWarnings("unchecked") // internally stored as Object array by Heap
protected void init() {
    array = new Object[n = size];
    input.open();
    while (input.hasNext() && n > 0)
        array[--n] = input.next();
    (heap = new Heap<E>((E[])array, 0, comparator)).open();
}
/**
* Closes the replacement-selection operator, i.e., signals the cursor to
* clean up resources, close the input iteration and the internally used
* heap. When a cursor has been closed calls to methods like
* <code>next</code> or <code>peek</code> are not guaranteed to yield
* proper results. Multiple calls to <code>close</code> do not have any
* effect, i.e., if <code>close</code> was called the cursor remains in the
* state <i>closed</i>.
*
* <p>Note that a closed cursor usually cannot be opened again, because its
* state generally cannot be restored once resources have been released or
* files have been closed.</p>
*/
public void close() {
    if (isClosed)
        return;
    super.close();
    input.close();
    // the heap only exists once the cursor has been opened
    if (heap != null)
        heap.close();
}
/**
* Prepares the heap for the next element and is only called when there are
* more elements to process. If the heap is empty, it creates a new heap
* from the elements that reside in memory and sets <code>n</code> to
* <code>array.length</code>.
*/
@SuppressWarnings("unchecked") // internally stored as Object array by Heap
protected void check() {
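    if (heap.isEmpty()) {
        // the buffered elements reside in array[n..array.length-1]; move
        // them so that they occupy array[0..array.length-n-1]
        System.arraycopy(
            array,
            Math.max(n, array.length - n),
            array,
            0,
            Math.min(n, array.length - n)
        );
        (heap = new Heap<E>((E[])array, array.length - n, comparator)).open();
        n = array.length;
    }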
}
/**
* Returns <code>true</code> if the iteration has more elements. (In other
* words, returns <code>true</code> if <code>next</code> or
* <code>peek</code> would return an element rather than throwing an
* exception.)
*
* @return <code>true</code> if the cursor has more elements.
*/
protected boolean hasNextObject() {
    return n < array.length || !heap.isEmpty();
}
/**
* Returns the next element in the iteration. This element will be
* accessible by some of the cursor's methods, e.g., <code>update</code> or
* <code>remove</code>, until a call to <code>next</code> or
* <code>peek</code> occurs. That is, calling <code>next</code> or
* <code>peek</code> advances the iteration, so its previous element is no
* longer accessible.
*
* <p>Such an element exists if <code>n &lt; array.length</code> or the
* heap contains further elements. In that case the method
* <code>check</code> is performed first. If the input iteration contains
* more elements and the next element of the heap is lower than or equal to
* the next element of the input iteration with respect to the given
* comparator, the next element of the input iteration replaces the top of
* the heap and the former top element is returned. If the comparator
* returns a value greater than 0, the next element of the heap is removed
* and returned, <code>n</code> is decremented, and the next element of the
* input iteration is stored in the array at the new position
* <code>n</code>. In this way the array collects the elements of the next
* run while the current heap is being consumed. If the input iteration
* does not contain further elements, the next element of the heap is
* returned.</p>
*
* @return the next element in the iteration.
*/
protected E nextObject() {
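    check();
    if (input.hasNext()) {
        if (comparator.compare(heap.peek(), input.peek()) <= 0) {
            // the input element still fits into the current run
            return heap.replace(input.next());
        }
        else {
            // the input element is too small for the current run:
            // store it in the array for the next run
            E result = heap.dequeue();
            array[--n] = input.next();
            return result;
        }
    }
    else
        return heap.dequeue();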
}
/**
* Resets the replacement-selection operator to its initial state (optional
* operation), so that the caller is able to traverse the underlying
* iteration again.
*
* @throws UnsupportedOperationException if the <code>reset</code> method
* is not supported by the replacement-selection operator.
*/
public void reset() throws UnsupportedOperationException {
    super.reset();
    input.reset();
    heap.close();
    init();
}
}