/**
* <h2>MapReduce (parallel) implementation of FP Growth Algorithm for frequent Itemset Mining</h2>
*
* <p>This package provides a top-K parallel FPGrowth implementation. Given a large transaction list,
* it finds all unique features (field values) and eliminates those whose frequency in the whole dataset
* is less than {@code minSupport}. For each of the N remaining features, it finds the top K closed patterns,
* generating up to NK patterns. The FPGrowth algorithm is generic, so any object type can denote a feature,
* but the current implementation requires a String as the object type. You may
* implement a version for any other object type by creating {@link java.util.Iterator}s, Convertors
* and TopKPatternWritable for that particular object. For more information, refer to the package
* {@code org.apache.mahout.fpm.pfpgrowth.convertors.string}.</p>
*
* <pre>{@code
* FPGrowth<String> fp = new FPGrowth<String>();
* Set<String> features = new HashSet<String>();
* fp.generateTopKStringFrequentPatterns(
*     new StringRecordIterator(
*         new FileLineIterable(new File(input), encoding, false), pattern),
*     fp.generateFList(
*         new StringRecordIterator(new FileLineIterable(new File(input), encoding, false), pattern), minSupport),
*     minSupport,
*     maxHeapSize,
*     features,
*     new StringOutputConvertor(new SequenceFileOutputCollector<Text,TopKStringPatterns>(writer)));
* }</pre>
*
* <ul>
* <li>The first argument is the iterator over transactions; in this case it is an
* {@code Iterator<List<String>>}</li>
* <li>The second argument is the output of the {@code generateFList} function, which returns the frequent
* items and their frequencies from the given transaction database iterator</li>
* <li>The third argument is the minimum support of the patterns to be generated</li>
* <li>The fourth argument is the maximum number of patterns to be mined for each feature</li>
* <li>The fifth argument is the set of features for which the frequent patterns have to be mined</li>
* <li>The last argument is an output collector that takes [key, value] pairs of feature and top-K patterns
* in the format {@code [String, List<Pair<List<String>,Long>>]} and writes them to the appropriate writer
* class, which takes care of storing the object, in this case in a
* {@link org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat}</li>
* </ul>
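*
* <p>To make the second argument more concrete, here is a minimal sketch of what a frequency list holds.
* It is not Mahout's {@code generateFList}: the class {@code FListSketch} and its method are hypothetical
* names, transactions are assumed to be in-memory lists of strings, and a feature is assumed to count
* once per transaction.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout: illustrates the contents of a frequency list.
final class FListSketch {

  private FListSketch() { }

  // Counts the transactions containing each feature, drops features whose count is below
  // minSupport, and returns the rest ordered by descending count (ties broken by feature name).
  static List<Map.Entry<String, Long>> frequencyList(List<List<String>> transactions, long minSupport) {
    Map<String, Long> counts = new HashMap<>();
    for (List<String> transaction : transactions) {
      // Assumption of this sketch: a feature repeated within one transaction counts once.
      for (String feature : new HashSet<>(transaction)) {
        counts.merge(feature, 1L, Long::sum);
      }
    }
    List<Map.Entry<String, Long>> fList = new ArrayList<>();
    for (Map.Entry<String, Long> entry : counts.entrySet()) {
      if (entry.getValue() >= minSupport) {
        fList.add(Map.entry(entry.getKey(), entry.getValue()));
      }
    }
    fList.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
        .thenComparing(Map.Entry.<String, Long>comparingByKey()));
    return fList;
  }

  public static void main(String[] args) {
    List<List<String>> transactions =
        List.of(List.of("a", "b", "c"), List.of("a", "b"), List.of("a", "d"));
    System.out.println(frequencyList(transactions, 2));
  }
}
```

* <p>With the transactions [a, b, c], [a, b] and [a, d] and a {@code minSupport} of 2, the sketch keeps
* {@code a} (count 3) and {@code b} (count 2) and drops {@code c} and {@code d}, which each occur once.</p>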
*
* <p>The command line launcher for string transaction data, {@code org.apache.mahout.fpm.pfpgrowth.FPGrowthJob},
* offers further options, including a regex pattern for splitting a transaction line into
* its constituent features.</p>
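*
* <p>As a rough illustration of splitting a transaction line with a regex, the sketch below uses only
* {@code java.util.regex.Pattern}. The class {@code TransactionSplitSketch}, its method and the sample
* pattern are assumptions made for this example; they are not taken from FPGrowthJob and do not
* describe its default pattern.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Mahout: splits one transaction line into features.
final class TransactionSplitSketch {

  private TransactionSplitSketch() { }

  // Splits the line on every match of the splitter pattern, trims each token
  // and skips tokens that are empty after trimming.
  static List<String> splitTransaction(String line, Pattern splitter) {
    List<String> features = new ArrayList<>();
    for (String token : splitter.split(line)) {
      String feature = token.trim();
      if (!feature.isEmpty()) {
        features.add(feature);
      }
    }
    return features;
  }

  public static void main(String[] args) {
    // Sample pattern (an assumption): a comma or tab with optional surrounding whitespace.
    Pattern splitter = Pattern.compile("\\s*[,\\t]\\s*");
    System.out.println(splitTransaction("milk, bread,eggs", splitter));
  }
}
```

* <p>With the sample pattern above, the line {@code milk, bread,eggs} yields the three features
* {@code milk}, {@code bread} and {@code eggs}.</p>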
*
* <p>The {@code numGroups} parameter in FPGrowthJob specifies the number of groups into which transactions
* are decomposed. The {@code numTreeCacheEntries} parameter specifies the number of generated
* conditional FP-Trees to keep in memory so that they need not be regenerated. Increasing this number
* increases memory consumption but may improve speed up to a point; how far depends entirely on
* the dataset in question. A value of 5-10 is recommended for mining up to the top 100 patterns for each feature.</p>
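*
* <p>The memory/speed trade-off behind {@code numTreeCacheEntries} can be pictured with a small bounded
* cache. This is only a sketch, not the cache this package uses: the class {@code TreeCacheSketch}, the
* method {@code getOrCompute} and the least-recently-used eviction policy are all assumptions chosen
* for illustration.</p>

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.function.Function;

// Hypothetical bounded cache, not part of Mahout: keeps at most maxEntries values
// and evicts the least recently used one when that limit is exceeded.
final class TreeCacheSketch<K, V> {

  private final int maxEntries;
  // accessOrder = true makes iteration run from least to most recently used entry.
  private final LinkedHashMap<K, V> entries = new LinkedHashMap<>(16, 0.75f, true);

  TreeCacheSketch(int maxEntries) {
    if (maxEntries < 1) {
      throw new IllegalArgumentException("maxEntries must be at least 1");
    }
    this.maxEntries = maxEntries;
  }

  // Returns the cached value for key, or builds it with builder on a miss and caches it.
  V getOrCompute(K key, Function<? super K, ? extends V> builder) {
    V value = entries.get(key); // a hit also marks the entry as most recently used
    if (value == null) {
      value = builder.apply(key); // the expensive step that a larger cache avoids repeating
      entries.put(key, value);
      if (entries.size() > maxEntries) {
        // Drop the first entry in iteration order, which is the least recently used one.
        Iterator<K> keys = entries.keySet().iterator();
        keys.next();
        keys.remove();
      }
    }
    return value;
  }

  int size() {
    return entries.size();
  }
}
```

* <p>A larger limit means fewer calls to the builder at the cost of holding more values in memory, which
* mirrors the trade-off described for {@code numTreeCacheEntries}.</p>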
*/
package org.apache.mahout.fpm.pfpgrowth;