Aggregate_Partial.java example

Explorer
sql-layer-master
/**
 * Copyright (C) 2009-2013 FoundationDB, LLC
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Affero General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU Affero General Public License for more details.
 *
 * You should have received a copy of the GNU Affero General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

package com.foundationdb.qp.operator;

import com.foundationdb.qp.row.Row;
import com.foundationdb.qp.row.ValuesHolderRow;
import com.foundationdb.qp.rowtype.AggregatedRowType;
import com.foundationdb.qp.rowtype.RowType;
import com.foundationdb.server.explain.*;
import com.foundationdb.server.types.TAggregator;
import com.foundationdb.server.types.TClass;
import com.foundationdb.server.types.TInstance;
import com.foundationdb.server.types.mcompat.aggr.MCount;
import com.foundationdb.server.types.value.*;
import com.foundationdb.util.ArgumentValidation;
import com.foundationdb.util.tap.InOutTap;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.*;

/**

 <h1>Overview</h1>

 Aggregation_Partial applies a partial aggregation to rows. By partial
 we mean that if the aggregation contains a GROUP BY, output rows will
 be streamed out as soon as a change is detected in the GROUP BY
 columns; no attempt is made to sort or hash values. This means that if
 the incoming rows are not sorted by their GROUP BY columns, this
 operator may output more than one aggregation per GROUP BY value. If
 this happens, each aggregation happens independent of the others.

 <h1>Arguments</h1>

 <ul>

 <li><b>input:</b> the input operator

 <li><b>inputsIndex:</b> the first index of the input rows that
 represents an input; indexes before this are GROUP BY
 fields. Required: <i>0 <= inputsIndex < input.rowType().nFields()</i>

 <li><b>aggregatorFactory:</b> a mapping of <i>String -> Aggregator</i>

 <li><b>aggregatorNames:</b> a list of aggregator function to be given
 to the factory. Required: <i>inputsIndex + aggregatorNames.size() ==
 input.rowType().nFields()</i>

 </ul>

 <h1>Behavior</h1>

 This operator takes as input a row which is interpreted as having two
 sections: a GROUP BY section and an inputs section. These are
 delimited by the <i>inputsIndex</i> argument, which specifies the
 index of the first input. The operator also has a list of aggregator
 functions (specified by the <i>aggregatorNames</i> list), one per
 input.

 For each input row of type <i>input.rowType()</i>, the
 Aggregation_Partial's cursor applies each input to its appropriate
 aggregator. When the cursor notices a change in any of the GROUP BY
 columns, or when the input cursor is finished, the aggregation cursor
 outputs a row with the GROUP BY columns and the result of each
 aggregation.

 <h2>Example</h2>

 Let's say we have a table describing various SKUs in warehouses. Each
 warehouse has a category for SKUs, but these categories are not unique
 among all warehouses. We want to get the sum, min and max price of
 each category in each warehouse.

 <i>SELECT warehouse, product_category, SUM(price), MIN(price), MAX(price) FROM products GROUP BY warehouse, product_category</i>

 The input rows to the Aggregation_Partial cursor would be something like:

 <table>
 <tr><td> warehouse </td><td> product_category </td><td> price </td><td> price </td><td> price </td><td> notes </td></tr>
 <tr><td> 0001 </td><td> AAA </td><td>  5.00 </td><td>  5.00 </td><td>  5.00 </td><td> sku 1 </td></tr>
 <tr><td> 0001 </td><td> AAA </td><td> 10.00 </td><td> 10.00 </td><td> 10.00 </td><td> sku 2 </td></tr>
 <tr><td> 0002 </td><td> AAA </td><td>  7.00 </td><td>  7.00 </td><td>  7.00 </td><td> sku 3 </td></tr>
 <tr><td> 0002 </td><td> AAA </td><td>  3.00 </td><td>  3.00 </td><td>  3.00 </td><td> sku 4 </td></tr>
 <tr><td> 0002 </td><td> BBB </td><td> 11.00 </td><td> 11.00 </td><td> 11.00 </td><td> sku 5 </td></tr>
 </table>

 The <i>inputsIndex</i> here is <i>2</i>. Note that the "price" column has been repeated three times,
 once for each aggregate function. In this case, the input rows are already ordered by their GROUP BY columns;
 if we knew this were the case (due to another operator's ordering), this partial aggregation would also be a
 full aggregation.

 The output rows would look like:

 <table>
 <tr><td> warehouse </td><td> product_category </td><td> SUM(price) </td><td> MIN(price) </td><td> MAX(price) </td></tr>
 <tr><td> 0001 </td><td> AAA </td><td> 15.00 </td><td>  5.00 </td><td> 10.00 </td></tr>
 <tr><td> 0002 </td><td> AAA </td><td> 10.00 </td><td>  3.00 </td><td>  7.00 </td></tr>
 <tr><td> 0002 </td><td> BBB </td><td> 11.00 </td><td> 11.00 </td><td> 11.00 </td></tr>
 </table>

 <h2>Notes</h2>

 If there are no input rows, behavior depends on
 the <i>inputIndex</i>. If it is 0 (no GROUP BY) columns, this
 Aggregation_Partial will not output any rows. If <i>inputIndex > 0</i>
 (there is a GROUP BY), a single row will be outputted with all NULL
 values. This is due to the SQL spec.

 This operator cannot do something like an average directly. Instead,
 the operator tree would use this operator to get a SUM and COUNT of
 rows, and another operator would be responsible for dividing them to
 get the average.

 <h1>Output</h1>

 All input rows are swallowed. Output rows are as described above. All
 rows from the incoming operator with a type other
 than <i>input.rowType()</i> are passed through unchanged.

 <h1>Assumptions</h1>

 None; but if you want a full aggregation, it is up to you to pre-order
 the rows by their GROUP BY columns. If an aggregator is not amenable
 to this piecemeal aggregation, you should not use it with a partial
 aggregation (none of the aggregators we plan on writing have this
 problem).

 <h1>Performance</h1>

 Partially dictated by aggregators, though expected to be
 minimal. Comparison of GROUP BY columns is O(N).

 <h1>Memory requirements</h1>

 One row and one set of grouping column values.

 */

final class Aggregate_Partial extends Operator
{

    // Operator interface

    @Override
    protected Cursor cursor(QueryContext context, QueryBindingsCursor bindingsCursor) {
        return new AggregateCursor(context, bindingsCursor);
    }

    @Override
    public void findDerivedTypes(Set<RowType> derivedTypes) {
        inputOperator.findDerivedTypes(derivedTypes);
        derivedTypes.add(outputType);
    }

    @Override
    public List<Operator> getInputOperators() {
        return Collections.singletonList(inputOperator);
    }

    @Override
    public RowType rowType() {
        return outputType;
    }

    // AggregationOperator interface

    public Aggregate_Partial(Operator inputOperator,
                             RowType inputRowType,
                             int inputsIndex,
                             List<? extends TAggregator> aggregatorFactories,
                             List<? extends TInstance> pAggrTypes,
                             List<Object> options) {
        this.inputOperator = inputOperator;
        this.inputRowType = inputRowType;
        this.inputsIndex = inputsIndex;
        this.outputType = inputRowType.schema().newAggregateType(inputRowType, inputsIndex, pAggrTypes);
        this.pAggrs = aggregatorFactories;
        this.pAggrTypes = pAggrTypes;
        this.options = options;
        validate();
        
    }

    // Object interface

    @Override
    public String toString() {
        if (inputsIndex == 0) {
            return String.format("Aggregation(without GROUP BY: %s)", aggrsToString());
        }
        if (inputsIndex == 1) {
            return String.format("Aggregation(GROUP BY 1 field, then: %s)", aggrsToString());
        }
        return String.format("Aggregation(GROUP BY %d fields, then: %s)", inputsIndex, aggrsToString());
    }

    private String aggrsToString() {
        int pAggersLen = pAggrs.size();
        StringBuilder sb = new StringBuilder(pAggersLen * 6); // guess at the size, doesn't matter much
        sb.append('[');
        for (int i = 0; i < pAggersLen; ++i) {
            TAggregator aggregator = pAggrs.get(i);
            sb.append(aggregator);
            if (! (aggregator instanceof MCount)) {
                sb.append(rowType().typeAt(i + inputsIndex).typeClass().name().unqualifiedName());
            }
            if ( (i+1) < pAggersLen)
                sb.append(", ");
        }
        sb.append(']');
        return sb.toString();
    }

    // private methods

    private void validate() {
        if (inputOperator == null || inputRowType == null || outputType == null)
            throw new NullPointerException();
        ArgumentValidation.isBetween("inputsIndex", 0, inputsIndex, inputRowType.nFields()+1);
        int aggersSize;
        aggersSize = pAggrs.size();
        if (pAggrTypes.size() != pAggrs.size())
            throw new IllegalArgumentException("aggregators and aggregator types mismatch in size");
        if (inputsIndex + aggersSize != inputRowType.nFields()) {
            throw new IllegalArgumentException(
                    String.format("inputsIndex(=%d) + aggregatorNames.size(=%d) != inputRowType.nFields(=%d)",
                            inputsIndex, aggersSize, inputRowType.nFields()
            ));
        }
    }
    
    // class state
    
    private static final InOutTap TAP_OPEN = OPERATOR_TAP.createSubsidiaryTap("operator: Aggregate_Partial open");
    private static final InOutTap TAP_NEXT = OPERATOR_TAP.createSubsidiaryTap("operator: Aggregate_Partial next");
    private static final Logger LOG = LoggerFactory.getLogger(Aggregate_Partial.class);

    // object state

    private final Operator inputOperator;
    private final RowType inputRowType;
    private final AggregatedRowType outputType;
    private final int inputsIndex;
    private final List<? extends TInstance> pAggrTypes;
    private final List<? extends TAggregator> pAggrs;
    private final List<Object> options; // currently only used by GROUP_CONCAT, meaning the optional SEPARATOR string

    @Override
    public CompoundExplainer getExplainer(ExplainContext context)
    {
        Attributes atts = new Attributes();
        atts.put(Label.NAME, PrimitiveExplainer.getInstance(getName()));
        for (TAggregator agg : pAggrs)
            atts.put(Label.AGGREGATORS, PrimitiveExplainer.getInstance(agg.displayName().toUpperCase()));
        atts.put(Label.GROUPING_OPTION, PrimitiveExplainer.getInstance(inputsIndex));
        atts.put(Label.INPUT_OPERATOR, inputOperator.getExplainer(context));
        atts.put(Label.INPUT_TYPE, inputRowType.getExplainer(context));
        atts.put(Label.OUTPUT_TYPE, outputType.getExplainer(context));
        return new CompoundExplainer(Type.AGGREGATE, atts);
    }

    // nested classes

    private class AggregateCursor extends ChainedCursor
    {

        // Cursor interface

        @Override
        public void open() {
            TAP_OPEN.in();
            try {
                super.open();
            } finally {
                TAP_OPEN.out();
            }
        }

        @Override
        public Row next() {
            if (TAP_NEXT_ENABLED) {
                TAP_NEXT.in();
            }
            try {
                // CursorLifecycle.checkIdleOrActive(this);
                checkQueryCancelation();
                if (CURSOR_LIFECYCLE_ENABLED) {
                    CursorLifecycle.checkIdleOrActive(this);
                }
                if (isIdle()) {
                    if (LOG_EXECUTION) {
                        LOG.debug("Aggregate_Partial: null");
                    }
                    return null;
                }
                
                while (true) {
                    Row input = nextInput();
                    Row output;
                    if (input == null) {
                        if (everSawInput) {
                            output = createOutput();
                        }
                        else if (noGroupBy()) {
                            output = createEmptyOutput();
                        }
                        else {
                            output = null;
                        }
                        if (LOG_EXECUTION) {
                            LOG.debug("Aggregate_Partial: yield {}", output);
                        }
                        setIdle();
                        return output;
                    }
                    if (!(input.rowType() == inputRowType)) {
                        if (LOG_EXECUTION) {
                            LOG.debug("Aggregate_Partial: yield {}", input);
                        }
                        return input; // pass through
                    }
                    everSawInput = true;
                    if (outputNeeded(input)) {
                        saveInput(input); // save this input for the next time this method is invoked
                        output = createOutput();
                        if (LOG_EXECUTION) {
                            LOG.debug("Aggregate_Partial: yield {}", output);
                        }
                        return output;
                    }
                    aggregate(input);
                }
            } finally {
                if (TAP_NEXT_ENABLED) {
                    TAP_NEXT.out();
                }
            }
        }

        // for use in this class

        private void aggregate(Row input) {
            for (int i=0; i < pAggrs.size(); ++i) {
                TAggregator aggregator = pAggrs.get(i);
                int inputIndex = i + inputsIndex;
                TInstance inputType = input.rowType().typeAt(inputIndex);
                ValueSource inputSource = input.value(inputIndex);
                aggregator.input(inputType, inputSource, pAggrTypes.get(i), pAggrsStates.get(i), options.get(i));
            }
        }

        private Row createOutput() {
            ValuesHolderRow outputRow = newOutputRow();
            for(int i = 0; i < inputsIndex; ++i) {
                Value value = outputRow.valueAt(i);
                Value key = keyValues.get(i);
                ValueTargets.copyFrom(key, value);
            }
            for (int i = inputsIndex; i < inputRowType.nFields(); ++i) {
                Value value = outputRow.valueAt(i);
                int aggregatorIndex = i - inputsIndex;
                Value aggregatorState = pAggrsStates.get(aggregatorIndex);
                if (aggregatorState.hasAnyValue())
                    ValueTargets.copyFrom(aggregatorState, value);
                else
                    pAggrs.get(aggregatorIndex).emptyValue(value);
                aggregatorState.unset();
            }
            return outputRow;
        }

        private Row createEmptyOutput() {
            assert noGroupBy() : "shouldn't be creating null output row when I have a grouping";
            ValuesHolderRow outputRow = newOutputRow();
            for (int i = 0; i < outputRow.rowType().nFields(); ++i) {
                pAggrs.get(i).emptyValue(outputRow.valueAt(i));
            }
            return outputRow;
        }

        private boolean noGroupBy() {
            return inputsIndex == 0;
        }

        private boolean outputNeeded(Row givenInput) {
            if (noGroupBy()) {
                this.gatheringRows = false;
                return false;   // no GROUP BYs, so aggregate until givenInput is null
            }

            // check for any changes to keys
            // Coming into this code, we're either RUNNING (within a GROUP BY run) or OPENING (about to start
            // a new run).
            if (gatheringRows) {
                for (int i = 0; i < keyValues.size(); ++i) {
                    ValueTargets.copyFrom(givenInput.value(i), keyValues.get(i));
                }
                gatheringRows = false;
                return false;
            }
            else {
                // If any keys are different, switch mode to OPENING and return true; else return false.
                for (int i = 0; i < keyValues.size(); ++i) {
                    Value key = keyValues.get(i);
                    ValueSource input = givenInput.value(i);
                    if (!TClass.areEqual(key, input)) {
                        gatheringRows = true;
                        return true;
                    }
                }
                return false;
            }
        }

        private Row nextInput() {
            final Row result;
            if (holder != null) {
                result = holder;
                holder = null;
            }
            else {
                result = input.next();
            }
            return result;
        }

        private void saveInput(Row input) {
            assert holder == null : holder;
            holder = input;
        }

        private ValuesHolderRow newOutputRow() {
            return new ValuesHolderRow(outputType);
        }

        // AggregateCursor interface

        private AggregateCursor(QueryContext context, QueryBindingsCursor bindingsCursor) {
            super(context, inputOperator.cursor(context, bindingsCursor));
            keyValues = new ArrayList<>(inputsIndex);
            for (int i = 0; i < inputsIndex; ++i) {
                keyValues.add(new Value(outputType.typeAt(i)));
            }
            int nAggrs = pAggrs.size();
            pAggrsStates = new ArrayList<>(nAggrs);
            for (int i = 0; i < nAggrs; i++) {
                Value state = new Value(pAggrTypes.get(i));
                pAggrsStates.add(state);
            }
        }


        // object state

        private final List<Value> keyValues;
        private final List<Value> pAggrsStates;
        private Row holder;
        private boolean everSawInput = false;
        private boolean gatheringRows = true;
    }

}