org.apache.spark.sql.execution
(Changed in version 2.8.0) collect has changed. The previous behavior can be reproduced with toSeq.
Holds a copy of an input row that is in the current group.
Holds null or the row that will be returned on the next call to next() in the inner iterator.
Return true if we already have the next iterator or fetching a new iterator is successful.
Note that if an iterator was obtained by calling next, it should be fully consumed before calling hasNext. Fetching a new iterator consumes input data to skip to the next group, which leaves the previous iterator empty.
Creates a row containing only the key for a given input row.
Compares two input rows and returns 0 if they are in the same group.
Iterates over a presorted set of rows, chunking it up by the grouping expression. Each call to next will return a pair containing the current group and an iterator that will return all the elements of that group. Iterators for each group are lazily constructed by extracting rows from the input iterator. As such, full groups are never materialized by this class.
Example input:
Result:
Note, the class does not handle the case of an empty input for simplicity of implementation. Use the factory to construct a new instance.
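The lazy grouping described above can be illustrated with a simplified, Spark-free sketch. Everything here is an assumption made for illustration: `SimpleGroupedIterator`, `keyOf`, `rows`, and `generation` are hypothetical names, a plain key function stands in for the grouping expression, and plain Scala values stand in for rows. Unlike the class documented here, the sketch also tolerates empty input. It is not the actual implementation.

```scala
// Hypothetical, simplified sketch of grouping a presorted iterator lazily.
final class SimpleGroupedIterator[K, V](input: Iterator[V], keyOf: V => K)
    extends Iterator[(K, Iterator[V])] {

  // Buffered so the next row can be inspected without consuming it.
  private val rows = input.buffered
  // Key of the group most recently returned by next(), if still active.
  private var currentKey: Option[K] = None
  // Incremented whenever a group is abandoned, to invalidate its iterator.
  private var generation = 0

  override def hasNext: Boolean = {
    // Skip rows of the current group that the caller left unconsumed.
    currentKey.foreach { k =>
      while (rows.hasNext && keyOf(rows.head) == k) rows.next()
    }
    if (currentKey.isDefined) {
      currentKey = None
      generation += 1
    }
    rows.hasNext
  }

  override def next(): (K, Iterator[V]) = {
    if (!hasNext) throw new NoSuchElementException("no more groups")
    val key = keyOf(rows.head)
    currentKey = Some(key)
    val myGeneration = generation
    // The group iterator pulls rows directly from the shared input,
    // so the group is never materialized as a whole.
    val group = new Iterator[V] {
      def hasNext: Boolean =
        generation == myGeneration && rows.hasNext && keyOf(rows.head) == key
      def next(): V = {
        if (!hasNext) throw new NoSuchElementException("group exhausted")
        rows.next()
      }
    }
    (key, group)
  }
}

object GroupedIteratorSketch {
  def main(args: Array[String]): Unit = {
    val input = Iterator(("a", 1), ("b", 2), ("b", 3))
    val grouped = new SimpleGroupedIterator[String, (String, Int)](input, _._1)
    // Prints "a -> List((a,1))" and then "b -> List((b,2), (b,3))".
    grouped.foreach { case (key, group) => println(s"$key -> ${group.toList}") }
  }
}
```

In this sketch, calling hasNext on the outer iterator skips any unread rows of the current group, so a group iterator that has not been fully consumed by then reports no further elements. This mirrors the caveat about consuming a group before calling hasNext.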