/* * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * "License"); you may not use this file except in compliance * with the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ package org.apache.pig; import org.apache.pig.classification.InterfaceAudience; import org.apache.pig.classification.InterfaceStability; /** * An interface to declare that an EvalFunc's * calculation can be decomposed into intitial, intermediate, and final steps. * More formally, suppose we have to compute an function f over a bag X. In general, we need to know the entire X * before we can make any progress on f. However, some functions are <i>algebraic</i> e.g. SUM. In * these cases, you can apply some initital function f_init on subsets of X to get partial results. * You can then combine partial results from different subsets of X using an intermediate function * f_intermed. To get the final answers, several partial results can be combined by invoking a final * f_final function. For the function SUM, f_init, f_intermed, and f_final are all SUM. * * See the code for builtin AVG to get a better idea of how algebraic works. * * When eval functions implement this interface, Pig will attempt to use MapReduce's combiner. * The initial funciton will be called in the map phase and be passed a single tuple. The * intermediate function will be called 0 or more times in the combiner phase. And the final * function will be called once in the reduce phase. It is important that the results be the same * whether the intermediate function is called 0, 1, or more times. Hadoop makes no guarantees * about how many times the combiner will be called in a job. * * */ @InterfaceAudience.Public @InterfaceStability.Stable public interface Algebraic{ /** * Get the initial function. * @return A function name of f_init. f_init should be an eval func. * The return type of f_init.exec() has to be Tuple */ public String getInitial(); /** * Get the intermediate function. * @return A function name of f_intermed. f_intermed should be an eval func. * The return type of f_intermed.exec() has to be Tuple */ public String getIntermed(); /** * Get the final function. * @return A function name of f_final. f_final should be an eval func parametrized by * the same datum as the eval func implementing this interface. */ public String getFinal(); }