Generates a method that returns true if the group-by keys exist at a given index in the associated org.apache.spark.sql.catalyst.expressions.RowBasedKeyValueBatch.
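A minimal, self-contained sketch of what such a generated equality check might do. This is an illustrative simplification, not Spark's actual generated code: it assumes two long group-by keys per row, stored consecutively in a flat array, whereas the real method compares against an UnsafeRow fetched from the RowBasedKeyValueBatch.

```java
// Illustrative sketch only; names and layout are assumptions.
public class EqualsSketch {
    // Two rows of keys: row 0 = (1, 2), row 1 = (3, 4).
    static long[] keyBatch = {1L, 2L, 3L, 4L};

    // Returns true if the group-by keys at row `idx` match the probe keys.
    static boolean keysEqual(int idx, long key0, long key1) {
        return keyBatch[2 * idx] == key0 && keyBatch[2 * idx + 1] == key1;
    }

    public static void main(String[] args) {
        System.out.println(keysEqual(0, 1L, 2L)); // true
        System.out.println(keysEqual(1, 1L, 2L)); // false
    }
}
```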
Generates a method that returns a org.apache.spark.sql.catalyst.expressions.UnsafeRow which keeps track of the aggregate value(s) for a given set of keys. If the corresponding row doesn't exist, the generated method adds the corresponding row in the associated org.apache.spark.sql.catalyst.expressions.RowBasedKeyValueBatch.
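A simplified, self-contained sketch of the find-or-insert pattern described above. This is an assumption-laden illustration, not the generated code: it uses parallel lists and a linear scan in place of the hashed probe into the RowBasedKeyValueBatch, and a plain long[] in place of the UnsafeRow aggregate buffer.

```java
import java.util.ArrayList;

// Illustrative sketch only; the real generated method probes a hashed
// RowBasedKeyValueBatch and returns an UnsafeRow.
public class FindOrInsertSketch {
    static ArrayList<Long> keys = new ArrayList<>();
    static ArrayList<long[]> values = new ArrayList<>();

    // Return the aggregate buffer for `key`, appending a fresh
    // zero-initialized buffer when the key is not yet present.
    static long[] findOrInsert(long key) {
        for (int i = 0; i < keys.size(); i++) {
            if (keys.get(i) == key) return values.get(i); // existing row
        }
        keys.add(key);                 // miss: append-only insert
        long[] buf = new long[] {0L};  // zeroed aggregate value(s)
        values.add(buf);
        return buf;
    }

    public static void main(String[] args) {
        findOrInsert(42L)[0] += 5;  // first call inserts the row
        findOrInsert(42L)[0] += 7;  // second call hits the same buffer
        System.out.println(findOrInsert(42L)[0]); // 12
    }
}
```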
Generates a method that computes a hash, currently by xor-ing all individual group-by keys. For instance, if we have 2 long group-by keys, the generated function would be of the form:
private long hash(long agg_key, long agg_key1) {
  return agg_key ^ agg_key1;
}
This is a helper class to generate an append-only row-based hash map that can act as a 'cache' for extremely fast key-value lookups while evaluating aggregates (falling back to the BytesToBytesMap if a given key isn't found). This is 'codegened' in HashAggregate to speed up aggregates with keys. We also have VectorizedHashMapGenerator, which generates an append-only vectorized hash map. We choose one of the two as the first-level, fast hash map during aggregation.
NOTE: This row-based hash map currently doesn't support nullable keys and falls back to the BytesToBytesMap to store them.