package-info.java example

/**
 * <h2>MapReduce (parallel) implementation of FP Growth Algorithm for frequent Itemset Mining</h2>
 *
 * <p>This package provides a top-K parallel FPGrowth implementation. Given a large transaction list, it finds
 * all unique features (field values) and eliminates those whose frequency in the whole dataset is less than
 * {@code minSupport}. For each of the N remaining features, it finds the top K closed patterns, generating
 * up to N*K patterns in total. The FPGrowth algorithm itself is generic, so any object type can denote a
 * feature, but the current implementation requires a String as the object type. You may implement a version
 * for any other object type by creating {@link java.util.Iterator}s, Convertors and a TopKPatternWritable
 * for that type. For more information, please refer to the package
 * {@code org.apache.mahout.fpm.pfpgrowth.convertors.string}.</p>
 *
 * <pre>{@code
 * FPGrowth<String> fp = new FPGrowth<String>();
 * Set<String> features = new HashSet<String>();
 * fp.generateTopKStringFrequentPatterns(
 *     new StringRecordIterator(
 *         new FileLineIterable(new File(input), encoding, false), pattern),
 *     fp.generateFList(
 *         new StringRecordIterator(
 *             new FileLineIterable(new File(input), encoding, false), pattern),
 *         minSupport),
 *     minSupport,
 *     maxHeapSize,
 *     features,
 *     new StringOutputConvertor(
 *         new SequenceFileOutputCollector<Text,TopKStringPatterns>(writer)));
 * }</pre>
 *
 * <ul>
 * <li>The first argument is an iterator over the transactions, in this case an {@code Iterator<List<String>>}</li>
 * <li>The second argument is the output of the generateFList function, which returns the frequent items and
 *  their frequencies from the given transaction iterator</li>
 * <li>The third argument is the minimum support of the patterns to be generated</li>
 * <li>The fourth argument is the maximum number of patterns to be mined for each feature</li>
 * <li>The fifth argument is the set of features for which the frequent patterns have to be mined</li>
 * <li>The last argument is an output collector which takes [key, value] of Feature and TopK Patterns of the format
 *  {@code [String, List<Pair<List<String>,Long>>]} and writes them to the appropriate writer class
 *  which takes care of storing the object, in this case in a
 *  {@link org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat}</li>
 * </ul>
 *
 * <p>The command line launcher for string transaction data, {@code org.apache.mahout.fpm.pfpgrowth.FPGrowthJob},
 * has other features, including specifying the regex pattern for splitting a string line of a transaction into
 * its constituent features.</p>
 *
 * <p>The {@code numGroups} parameter in FPGrowthJob specifies the number of groups into which transactions
 * have to be decomposed. The {@code numTreeCacheEntries} parameter specifies the number of generated
 * conditional FP-Trees to be kept in memory so that they do not have to be regenerated. Increasing this number
 * increases memory consumption but might improve speed up to a certain point. This depends entirely on
 * the dataset in question. A value of 5-10 is recommended for mining up to the top 100 patterns for each feature.</p>
 */
package org.apache.mahout.fpm.pfpgrowth;
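
The first step described in the package documentation above (split each transaction line into features, count them, and drop every feature whose frequency is below minSupport) can be illustrated with a small standalone sketch. This is not Mahout code: the class FListSketch, the method frequentFeatures, the sample transactions, and the split regex are all assumptions made for illustration, and the sketch simply counts every occurrence of a token, which may differ from how generateFList counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class FListSketch {

  /**
   * Hypothetical helper (not part of Mahout): splits each transaction line with the
   * given regex, counts feature occurrences, and keeps the features whose count is
   * at least minSupport, ordered from most to least frequent.
   */
  public static List<Map.Entry<String, Long>> frequentFeatures(
      List<String> lines, Pattern splitter, long minSupport) {
    Map<String, Long> counts = new HashMap<>();
    for (String line : lines) {
      for (String feature : splitter.split(line)) {
        if (!feature.isEmpty()) {
          counts.merge(feature, 1L, Long::sum);
        }
      }
    }

    // Eliminate features whose frequency is less than minSupport.
    List<Map.Entry<String, Long>> frequent = new ArrayList<>();
    for (Map.Entry<String, Long> entry : counts.entrySet()) {
      if (entry.getValue() >= minSupport) {
        frequent.add(entry);
      }
    }

    // Most frequent first; ties are broken alphabetically so the order is deterministic.
    frequent.sort((a, b) -> {
      int byCount = Long.compare(b.getValue(), a.getValue());
      return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
    });
    return frequent;
  }

  public static void main(String[] args) {
    // Made-up sample transactions, one per line.
    List<String> lines = Arrays.asList("milk,bread,butter", "milk,bread", "milk,eggs");
    // Assumed split pattern: one or more spaces, commas, or tabs.
    Pattern splitter = Pattern.compile("[ ,\t]+");
    System.out.println(frequentFeatures(lines, splitter, 2));
  }
}
```

With minSupport set to 2, only milk (3 occurrences) and bread (2 occurrences) survive in this sample; butter and eggs are eliminated before any pattern mining would start.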