public class CubeDimensions extends EvalFunc<DataBag>
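Produces a DataBag with all combinations of the argument tuple members, as in a data cube. For example, (a, b, c) produces the following bag:
{ (a, b, c), (null, null, null), (a, b, null), (a, null, c),
  (a, null, null), (null, b, c), (null, null, c), (null, b, null) }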
The "all" marker is null by default, but can be set to an arbitrary string by invoking a constructor (via a DEFINE). The constructor takes a single argument, the string you want to represent "all".
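The combination rule described above can be sketched in plain Java. This is an illustration only, not the UDF's actual implementation: the class name CubeSketch and the method cube are hypothetical, plain lists stand in for Pig's Tuple and DataBag, and the order of the rows is arbitrary (a bag is unordered).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: mimics the documented output shape, not the Pig UDF itself.
final class CubeSketch {

    // Returns all 2^n combinations of dims, where each position either keeps
    // its value or is replaced by allMarker (null is the documented default).
    static List<List<String>> cube(List<String> dims, String allMarker) {
        int n = dims.size();
        List<List<String>> result = new ArrayList<>(1 << n);
        // Bit i of mask decides whether position i is rolled up to "all".
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> row = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                boolean rolledUp = (mask & (1 << i)) != 0;
                row.add(rolledUp ? allMarker : dims.get(i));
            }
            result.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints the eight rows for (a, b, c) with the default null marker.
        for (List<String> row : cube(Arrays.asList("a", "b", "c"), null)) {
            System.out.println(row);
        }
    }
}
```

Passing a string such as "ALL" as the second argument corresponds to using the one-argument constructor instead of the default.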
Usage goes something like this:
-- EventLoader is a placeholder loader; measure is the numeric field to aggregate.
events = load '/logs/events' using EventLoader() as (lang, event, app_id, measure);
cubed = foreach events generate
    FLATTEN(piggybank.CubeDimensions(lang, event, app_id))
        as (lang, event, app_id),
    measure;
cube = foreach (group cubed
                by (lang, event, app_id) parallel $P)
       generate
    flatten(group) as (lang, event, app_id),
    COUNT_STAR(cubed),
    SUM(cubed.measure);
store cube into 'event_cube';
Note: doing this with non-algebraic aggregations on large data can result in very slow reducers, since one of the groups (the one in which every dimension is "all") receives all the records in your relation.
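The skew behind this note can be seen in a small plain-Java sketch. It is an illustration under stated assumptions, not Pig code: the class CubeSkewSketch and the method groupSizes are hypothetical, records are modelled as lists of non-null strings, and null stands in for the default "all" marker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: counts how many cubed rows land in each group key.
final class CubeSkewSketch {

    static Map<List<String>, Integer> groupSizes(List<List<String>> records) {
        Map<List<String>, Integer> sizes = new HashMap<>();
        for (List<String> rec : records) {
            int n = rec.size();
            // Every record contributes one row to each of its 2^n roll-ups.
            for (int mask = 0; mask < (1 << n); mask++) {
                List<String> key = new ArrayList<>(n);
                for (int i = 0; i < n; i++) {
                    key.add((mask & (1 << i)) != 0 ? null : rec.get(i));
                }
                sizes.merge(key, 1, Integer::sum);
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Made-up sample records in (lang, event, app_id) order.
        List<List<String>> records = Arrays.asList(
                Arrays.asList("en", "click", "1"),
                Arrays.asList("en", "view", "2"),
                Arrays.asList("fr", "click", "1"));
        Map<List<String>, Integer> sizes = groupSizes(records);
        // The all-"all" key collects one row per input record.
        System.out.println(sizes.get(Arrays.asList((String) null, null, null)));
    }
}
```

Whatever the data looks like, the all-"all" key receives one row per input record, so the reducer handling that key sees the whole relation unless the aggregation can be partially computed before the shuffle.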
Nested classes inherited from class EvalFunc: EvalFunc.SchemaType

Fields inherited from class EvalFunc: log, pigLogger, reporter, returnType

| Constructor and Description |
|---|
| CubeDimensions() |
| CubeDimensions(String allMarker) |

| Modifier and Type | Method and Description |
|---|---|
| boolean | allowCompileTimeCalculation(): Whether the UDF should be evaluated at compile time if all inputs are constant. |
| static void | convertNullToUnknown(Tuple tuple) |
| DataBag | exec(Tuple tuple): This callback method must be implemented by all subclasses. |
| Schema | outputSchema(Schema input): Report the schema of the output of this UDF. |

Methods inherited from class EvalFunc: finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLoadCaster, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, getShipFiles, isAsynchronous, needEndOfAllInputProcessing, progress, setEndOfAllInput, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn

public CubeDimensions()

public CubeDimensions(String allMarker)

public DataBag exec(Tuple tuple) throws IOException
Description copied from class: EvalFunc
This callback method must be implemented by all subclasses.
Specified by: exec in class EvalFunc<DataBag>
Parameters: tuple - the Tuple to be processed.
Throws: IOException

public static void convertNullToUnknown(Tuple tuple) throws ExecException
Throws: ExecException

public Schema outputSchema(Schema input)
Description copied from class: EvalFunc
Report the schema of the output of this UDF. The default implementation interprets the OutputSchema annotation, if one is present. Otherwise, it returns null (no known output schema).
Overrides: outputSchema in class EvalFunc<DataBag>
Parameters: input - Schema of the input

public boolean allowCompileTimeCalculation()
Description copied from class: EvalFunc
Whether the UDF should be evaluated at compile time if all inputs are constant.
Overrides: allowCompileTimeCalculation in class EvalFunc<DataBag>

Copyright © 2007-2017 The Apache Software Foundation