package-info.java example

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/**
 * <h1>Convert XML Documents to Avro, and Back, through XML Schema</h1>
 *
 * <p>
 * {@link org.apache.avro.xml.XmlDatumWriter} generates an Avro
 * {@link org.apache.avro.Schema} from one or more XML Schemas, and
 * will write XML Documents into Avro format using that Avro schema.
 * </p>
 *
 * <p>
 * {@link org.apache.avro.xml.XmlDatumReader} will read Avro data using an Avro
 * schema generated by <code>XmlDatumWriter</code>, and use it to reconstruct
 * the XML document.  The Avro schema generated by <code>XmlDatumWriter</code>
 * contains the locations of the XML Schemas used to generate it.  Conversion
 * from XML to Avro is lossy (more details below), so the reconstructed
 * document may differ from the original.
 * </p>
 *
 * <p>
 * {@link org.apache.avro.xml.XmlDatumConfig} is used to configure
 * <code>XmlDatumWriter</code>. The {@link java.net.URL}s and
 * {@link java.io.File}s containing XML Schemas are defined there,
 * as well as the root node in the XML Schema to use to generate
 * the corresponding Avro <code>Schema</code>.
 * </p>
 *
 * <h2>Avro Schema Generation</h2>
 *
 * <p>
 * The following describes how an Avro Schema will be generated from an XML
 * Schema.
 * </p>
 *
 * <h3>XML Elements Map to Avro Records</h3>
 *
 * <p>
 * XML elements are represented as Avro records.  Each of the element's
 * attributes is stored as a field in the record.  The element's content is
 * stored as a field named after the element.  If the element has simple
 * content, that content will be stored directly.  If the element has child
 * elements, they are stored as an array of a union of the child element types.
 * </p>
 *
 * <p>
 * The content of empty mixed elements will be stored as a string, while the
 * content of non-empty mixed elements will be an array of a union of the
 * child element types and string.
 * </p>
 *
 * <p>
 * <b>Note:</b> Unlike XML attributes, Avro fields do not have their own
 * namespace.  This means that two attributes with the same name but different
 * namespaces cannot co-exist in the same Avro record, and an error will be
 * thrown when the element's record is generated.
 * </p>
 * <p>
 * In addition, because the children of the element are stored in a field under
 * the element's name, no attribute in the element can have the same name as
 * the element itself.
 * </p>
 *
 * <h3>XML Simple Type Mapping to Avro Types</h3>
 *
 * <p>
 * The following table maps XML Schema simple types to their Avro
 * counterparts.  Any type derived from one of these XML Schema simple types
 * is represented using the same Avro type as its base type.
 * </p>
 * 
 * <table border="1">
 *   <thead>
 *     <tr>
 *       <th>XML Schema Type</th>
 *       <th>Avro Schema Type</th>
 *       <th>Logical Type / Record Structure</th>
 *     </tr>
 *   </thead>
 *   <tbody>
 *     <tr>
 *       <td><code>boolean</code></td>
 *       <td>{@link org.apache.avro.Schema.Type#BOOLEAN}</td>
 *       <td></td>
 *     </tr>
 *     <tr>
 *       <td><code>decimal</code></td>
 *       <td>{@link org.apache.avro.Schema.Type#BYTES}</td>
 *       <td>Logical Type <code>decimal</code></td>
 *     </tr>
 *     <tr>
 *       <td><code>double</code></td>
 *       <td>{@link org.apache.avro.Schema.Type#DOUBLE}</td>
 *       <td></td>
 *     </tr>
 *     <tr>
 *       <td><code>float</code></td>
 *       <td>{@link org.apache.avro.Schema.Type#FLOAT}</td>
 *       <td></td>
 *     </tr>
 *     <tr>
 *       <td><code>base64Binary</code></td>
 *       <td>{@link org.apache.avro.Schema.Type#BYTES}</td>
 *       <td></td>
 *     </tr>
 *     <tr>
 *       <td><code>hexBinary</code></td>
 *       <td>{@link org.apache.avro.Schema.Type#BYTES}</td>
 *       <td></td>
 *     </tr>
 *     <tr>
 *       <td><code>long</code></td>
 *       <td>{@link org.apache.avro.Schema.Type#LONG}</td>
 *       <td></td>
 *     </tr>
 *     <tr>
 *       <td><code>unsignedInt</code></td>
 *       <td>{@link org.apache.avro.Schema.Type#LONG}</td>
 *       <td></td>
 *     </tr>
 *     <tr>
 *       <td><code>int</code></td>
 *       <td>{@link org.apache.avro.Schema.Type#INT}</td>
 *       <td></td>
 *     </tr>
 *     <tr>
 *       <td><code>unsignedShort</code></td>
 *       <td>{@link org.apache.avro.Schema.Type#INT}</td>
 *       <td></td>
 *     </tr>
 *     <tr>
 *       <td><code>QName</code></td>
 *       <td>{@link org.apache.avro.Schema.Type#RECORD}</td>
 *       <td>
 *         <table border="1">
 *           <thead>
 *             <tr>
 *               <th>Field</th>
 *               <th>Type</th>
 *               <th>Value</th>
 *             </tr>
 *           </thead>
 *           <tbody>
 *             <tr>
 *               <td>namespace</td>
 *               <td><code>string</code></td>
 *               <td>The <code>QName</code>'s namespace.</td>
 *             </tr>
 *             <tr>
 *               <td>localPart</td>
 *               <td><code>string</code></td>
 *               <td>The <code>QName</code>'s local name.</td>
 *             </tr>
 *           </tbody>
 *         </table>
 *       </td>
 *     </tr>
 *     <tr>
 *       <td><code>list</code></td>
 *       <td>{@link org.apache.avro.Schema.Type#ARRAY}</td>
 *       <td></td>
 *     </tr>
 *     <tr>
 *       <td><code>union</code></td>
 *       <td>{@link org.apache.avro.Schema.Type#UNION}</td>
 *       <td></td>
 *     </tr>
 *   </tbody>
 * </table>
 *
 * <h4><code>decimal</code></h4>
 * 
 * The <code>totalDigits</code> and <code>fractionDigits</code> facets will be
 * used to define the <code>decimal</code>'s precision and scale, respectively.
 * If not defined, the default precision is 34 (following the IEEE 754R
 * Decimal128 format), and the default scale is 8.
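 *
 * <p>
 * As a minimal sketch of what these defaults imply, the snippet below uses
 * only {@link java.math.BigDecimal}; it is not part of this package's API.
 * The class <code>DecimalDefaultsSketch</code> and its members are
 * hypothetical names, and rejecting values with too many digits is an
 * assumption made for illustration, not documented behavior.
 * </p>
 *
 * <pre>{@code
 * import java.math.BigDecimal;
 * import java.math.RoundingMode;
 *
 * public class DecimalDefaultsSketch {
 *   // Defaults described above, used when the facets are absent.
 *   static final int DEFAULT_PRECISION = 34; // in place of totalDigits
 *   static final int DEFAULT_SCALE = 8;      // in place of fractionDigits
 *
 *   // Rescales a lexical decimal value and checks it fits the precision.
 *   static BigDecimal applyDefaults(String lexicalValue) {
 *     BigDecimal value = new BigDecimal(lexicalValue)
 *         .setScale(DEFAULT_SCALE, RoundingMode.HALF_EVEN);
 *     if (value.precision() > DEFAULT_PRECISION) {
 *       throw new ArithmeticException("Too many digits: " + lexicalValue);
 *     }
 *     return value;
 *   }
 *
 *   public static void main(String[] args) {
 *     BigDecimal value = applyDefaults("12.5");
 *     System.out.println(value); // prints 12.50000000
 *     // Avro's decimal logical type stores the unscaled value as bytes.
 *     byte[] encoded = value.unscaledValue().toByteArray();
 *     System.out.println(encoded.length + " byte(s)");
 *   }
 * }
 * }</pre>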
 *
 * <h4>Enums</h4>
 *
 * If all of the <code>enumeration</code> facet values are valid Avro enum
 * symbols (names matching <code>[A-Za-z_][A-Za-z0-9_]*</code>), an Avro
 * {@link org.apache.avro.Schema.Type#ENUM} will be used.  Otherwise, the
 * original type will be used instead.
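 *
 * <p>
 * The check can be sketched as follows.  The pattern comes from the naming
 * rules in the Avro specification; the class
 * <code>EnumSymbolSketch</code> and the method <code>canBeAvroEnum</code>
 * are hypothetical, and this package may apply additional conditions.
 * </p>
 *
 * <pre>{@code
 * import java.util.Arrays;
 * import java.util.List;
 * import java.util.regex.Pattern;
 *
 * public class EnumSymbolSketch {
 *   // Avro names start with a letter or underscore, followed by
 *   // letters, digits, or underscores.
 *   private static final Pattern AVRO_NAME =
 *       Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
 *
 *   // True only if every enumeration facet value is a valid Avro symbol.
 *   static boolean canBeAvroEnum(List<String> enumerationValues) {
 *     return enumerationValues.stream()
 *         .allMatch(value -> AVRO_NAME.matcher(value).matches());
 *   }
 *
 *   public static void main(String[] args) {
 *     // prints true: both values are valid symbols
 *     System.out.println(canBeAvroEnum(Arrays.asList("RED", "GREEN")));
 *     // prints false: "1st" starts with a digit
 *     System.out.println(canBeAvroEnum(Arrays.asList("1st", "second")));
 *   }
 * }
 * }</pre>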
 *
 * <h3>Avro Map Generation</h3>
 *
 * <p>
 * If an element has exactly one non-optional attribute of type
 * <code>ID</code>, an Avro {@link org.apache.avro.Schema.Type#MAP} will be
 * generated for that element and its direct siblings.
 * </p>
 *
 * <p>
 * If multiple differently-named children of the same element can be
 * represented as maps, an Avro map of union of those elements will be
 * generated instead.  However, only elements of the same name and type
 * will exist in the same map instance.
 * </p>
 *
 * <p>
 * XML Elements will not be re-ordered in the Avro document, so if elements of
 * the same name and type are not direct siblings, they will not co-exist in
 * the same map.  Separate maps will be generated instead.  Consider the
 * following:
 * </p>
 *
 * <pre>
 *   &lt;!-- In XML Schema --&gt;
 *   &lt;element name="map"&gt;
 *     &lt;complexType&gt;
 *       &lt;simpleContent&gt;
 *         &lt;extension base="string"&gt;
 *           &lt;attribute name="id" type="ID" use="required" /&gt;
 *         &lt;/extension&gt;
 *       &lt;/simpleContent&gt;
 *     &lt;/complexType&gt;
 *   &lt;/element&gt;
 *   &lt;element name="record" type="string" /&gt;
 *
 *   &lt;!-- In XML Document --&gt;
 *   &lt;map id="id1"&gt;This is the first record in a map.&lt;/map&gt;
 *   &lt;map id="id2"&gt;This is the second record in the same map.&lt;/map&gt;
 *   &lt;record&gt;This ends the previous map.&lt;/record&gt;
 *   &lt;map id="id3"&gt;This is the start of a new map.&lt;/map&gt;
 * </pre>
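 *
 * <p>
 * The grouping rule can be sketched in plain Java.  This is an illustration
 * under simplifying assumptions, not the package's implementation: each child
 * is modelled as a hypothetical <code>{name, id, content}</code> triple, a
 * <code>null</code> id marks a non-map element, and only one map element name
 * is considered.
 * </p>
 *
 * <pre>{@code
 * import java.util.ArrayList;
 * import java.util.LinkedHashMap;
 * import java.util.List;
 * import java.util.Map;
 *
 * public class SiblingMapSketch {
 *   // Groups runs of directly adjacent map elements into separate maps.
 *   static List<Map<String, String>> groupIntoMaps(String[][] children) {
 *     List<Map<String, String>> maps = new ArrayList<>();
 *     Map<String, String> current = null;
 *     for (String[] child : children) {
 *       if (child[1] == null) {
 *         current = null; // a non-map sibling ends the current map
 *         continue;
 *       }
 *       if (current == null) {
 *         current = new LinkedHashMap<>(); // start a new map
 *         maps.add(current);
 *       }
 *       current.put(child[1], child[2]); // key by the ID attribute
 *     }
 *     return maps;
 *   }
 *
 *   public static void main(String[] args) {
 *     String[][] children = {
 *       {"map", "id1", "first"},
 *       {"map", "id2", "second"},
 *       {"record", null, "ends the previous map"},
 *       {"map", "id3", "new map"}
 *     };
 *     // prints [{id1=first, id2=second}, {id3=new map}]
 *     System.out.println(groupIntoMaps(children));
 *   }
 * }
 * }</pre>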
 * 
 * <h3>Wildcard Elements and Attributes</h3>
 *
 * Wildcard elements (<code>&lt;any&gt;</code>) and attributes
 * (<code>&lt;anyAttribute&gt;</code>) do not have an equivalent concept in
 * Avro, and are therefore skipped.  Any elements and attributes matched by
 * wildcards in the XML document will not appear in the Avro document.
 *
 * <h3>Optional Attributes and Nillable Elements</h3>
 *
 * Optional attributes and nillable elements will be represented as a
 * union of null and the simple type, as per Avro's handling of optional
 * values.  If the element or attribute was already a union, the null
 * type will be added to that union.
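 *
 * <p>
 * A minimal sketch of this rule using the public
 * {@link org.apache.avro.Schema} API follows; it requires Avro on the
 * classpath.  The class <code>OptionalUnionSketch</code> and the method
 * <code>makeOptional</code> are hypothetical, and placing <code>null</code>
 * first in the union is an assumption, not something this package guarantees.
 * </p>
 *
 * <pre>{@code
 * import java.util.ArrayList;
 * import java.util.List;
 * import org.apache.avro.Schema;
 *
 * public class OptionalUnionSketch {
 *   // Wraps a schema in a union with null, or adds null to an existing union.
 *   static Schema makeOptional(Schema schema) {
 *     List<Schema> branches = new ArrayList<>();
 *     branches.add(Schema.create(Schema.Type.NULL));
 *     if (schema.getType() == Schema.Type.UNION) {
 *       for (Schema branch : schema.getTypes()) {
 *         if (branch.getType() != Schema.Type.NULL) {
 *           branches.add(branch); // keep existing branches, avoid duplicate null
 *         }
 *       }
 *     } else {
 *       branches.add(schema);
 *     }
 *     return Schema.createUnion(branches);
 *   }
 *
 *   public static void main(String[] args) {
 *     Schema optionalString = makeOptional(Schema.create(Schema.Type.STRING));
 *     System.out.println(optionalString); // a union of null and string
 *   }
 * }
 * }</pre>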
 *
 * <h2>Generating an Avro Document From XML</h2>
 *
 * <p>
 * {@link org.apache.avro.xml.XmlDatumWriter} will generate an Avro schema
 * from one or more XML Schemas using the above specification, and write
 * an XML {@link org.w3c.dom.Document} to an Avro
 * {@link org.apache.avro.io.Encoder} accordingly.  The generated Avro
 * <code>Schema</code> can be retrieved from
 * {@link org.apache.avro.xml.XmlDatumWriter#getSchema()} before encoding the
 * first XML <code>Document</code>.
 * </p>
 *
 * <p>
 * A {@link org.apache.avro.xml.XmlDatumConfig} is required to set up the
 * <code>XmlDatumWriter</code>.  This is used to indicate where to read the
 * XML Schemas from, and also to define the root element in the corresponding
 * XML Documents.  (XML Schemas do not have a way to indicate what their root
 * element is.)
 * </p>
 *
 * <p>
 * <code>XmlDatumWriter</code> will encode the
 * {@link org.apache.avro.xml.XmlDatumConfig} in the resulting Avro
 * <code>Schema</code>, allowing <code>XmlDatumReader</code> to reconstruct
 * the XML <code>Document</code> as closely as it can.  (Wildcard elements and
 * attributes are lost, and will not reappear in regenerated XML Documents.)
 * </p>
 *
 * <h2>Generating an XML Document From Avro</h2>
 *
 * <p>
 * {@link org.apache.avro.xml.XmlDatumReader} will construct an XML
 * {@link org.w3c.dom.Document} from an Avro schema generated by
 * <code>XmlDatumWriter</code> and a {@link org.apache.avro.io.Decoder}.
 * The <code>XmlDatumWriter</code>'s generated {@link org.apache.avro.Schema}
 * is required as it contains information on how to retrieve the corresponding
 * XML Schemas.
 * </p>
 *
 * <p>
 * However, the resulting document will not be precisely reconstructed.
 * Wildcard elements and attributes are not encoded in Avro, and therefore
 * cannot be reconstructed.  In addition, namespace prefixes will not match
 * those of the original document, as they are also not encoded in Avro.  The
 * new prefixes will still map namespaces and scopes correctly.
 * </p>
 */
package org.apache.avro.xml;