/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with this
 * work for additional information regarding copyright ownership. The ASF
 * licenses this file to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
 * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
 * License for the specific language governing permissions and limitations under
 * the License.
 */

/**
 * Hadoop Table - tabular data storage for Hadoop MapReduce and PIG.
 * <p>
 * Hadoop Table provides tabular data storage for the <a
 * href="http://hadoop.apache.org/core/docs/current/mapred_tutorial.html">Hadoop
 * MapReduce Framework</a>. Close integration with <a
 * href="http://wiki.apache.org/pig/FrontPage">PIG</a> is also planned.
 * <p>
 * For this release, the basic construct of Hadoop Table is called
 * {@link org.apache.hadoop.zebra.io.BasicTable}. A BasicTable is a create-once,
 * read-only persistent data storage entity. A BasicTable contains zero
 * or more keyed rows.
 * <p>
 * The API uses Hadoop {@link org.apache.hadoop.io.BytesWritable} objects to
 * represent row keys, and PIG {@link org.apache.pig.data.Tuple} objects to
 * represent rows.
 * <p>
 * Each BasicTable maintains a {@link org.apache.hadoop.zebra.schema.Schema},
 * which, for this release, is simply a collection of column names. Given a
 * schema, we can deduce the integer index of a particular column and use it to
 * extract (get) the desired datum from a PIG Tuple object (which only allows
 * index-based access).
 * <p>
 * Typically, applications use
 * {@link org.apache.hadoop.zebra.mapreduce.BasicTableOutputFormat} (which implements
 * the Hadoop {@link org.apache.hadoop.mapred.OutputFormat} interface) to create
 * BasicTables through MapReduce, and
 * {@link org.apache.hadoop.zebra.mapreduce.TableInputFormat} (which implements the
 * Hadoop {@link org.apache.hadoop.mapred.InputFormat} interface) to feed the
 * table data to their MapReduce jobs as input.
 * <p>
 * The API is structured in three packages:
 * <UL>
 * <LI> {@link org.apache.hadoop.zebra.mapreduce} : The MapReduce layer. It contains
 * two classes: BasicTableOutputFormat for creating BasicTables, and
 * TableInputFormat for reading them.
 *
 * <LI> {@link org.apache.hadoop.zebra.types} : Miscellaneous facilities that handle
 * column types and tuple serialization. Currently, it is a placeholder that
 * redirects to PIG serialization. Table does not manage type information for
 * individual columns.
 *
 * <LI> {@link org.apache.hadoop.zebra.io} : The internal IO layer. It manages
 * the physical storage (files) of BasicTables. It also provides facilities that
 * help the MapReduce layer create splits, such as partitioning BasicTables for
 * reading and reporting data block placement distributions based on
 * range-partitions or key-partitions.
 * </UL>
 */
package org.apache.hadoop.zebra;