Generates the logical plan for generating window ranges on a timestamp column. Without
knowing what the timestamp value is, it's non-trivial to figure out deterministically how many
window ranges a timestamp will map to given all possible combinations of a window duration,
slide duration and start time (offset). Therefore, we over-estimate the number of windows
there may be and then filter out the invalid ones. A final Project operator groups the
window columns into a struct so they can be accessed as window.start and window.end.
The windows are calculated as below:

  maxNumOverlapping <- ceil(windowDuration / slideDuration)
  for (i <- 0 until maxNumOverlapping)
    windowId <- ceil((timestamp - startTime) / slideDuration)
    windowStart <- windowId * slideDuration + (i - maxNumOverlapping) * slideDuration + startTime
    windowEnd <- windowStart + windowDuration
    return windowStart, windowEnd
This behaves as follows for the given parameters for the time: 12:05. The valid windows are marked with a +, and invalid ones are marked with an x. The invalid ones are filtered using the Filter operator.

  window: 12m, slide: 5m, start: 0m :: window: 12m, slide: 5m, start: 2m
      11:55 - 12:07 +                      11:52 - 12:04 x
      12:00 - 12:12 +                      11:57 - 12:09 +
      12:05 - 12:17 +                      12:02 - 12:14 +
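The over-estimate-then-filter idea can be sketched in plain Scala, without any Spark dependency. This is a minimal illustration, not the rule's implementation: the names TimeWindowSketch, Window, candidateWindows and validWindows are hypothetical, and times are plain Long minutes since midnight for readability. Two assumptions are made so that the output matches the 12:05 example above: a timestamp that falls exactly on a slide boundary has its windowId bumped by one (taking the ceil in the pseudocode literally would yield 11:50, 11:55, 12:00 for start: 0m), and a window is treated as the half-open range [start, end).

```scala
object TimeWindowSketch {
  // Hypothetical stand-in for the struct exposed as window.start / window.end.
  final case class Window(start: Long, end: Long)

  // "Expand" step: over-estimate by emitting one candidate per i in [0, maxNumOverlapping).
  def candidateWindows(timestamp: Long, windowDuration: Long,
                       slideDuration: Long, startTime: Long): Seq[Window] = {
    // Integer form of ceil(windowDuration / slideDuration).
    val maxNumOverlapping = ((windowDuration + slideDuration - 1) / slideDuration).toInt
    val offset = timestamp - startTime
    // Integer form of ceil(offset / slideDuration); floorDiv keeps negative offsets correct.
    val ceilId = Math.floorDiv(offset + slideDuration - 1, slideDuration)
    // Assumption: a timestamp exactly on a slide boundary starts a new window,
    // so the id is bumped by one to reproduce the 12:05 table.
    val windowId = if (Math.floorMod(offset, slideDuration) == 0L) ceilId + 1 else ceilId
    (0 until maxNumOverlapping).map { i =>
      val windowStart =
        windowId * slideDuration + (i - maxNumOverlapping) * slideDuration + startTime
      Window(windowStart, windowStart + windowDuration)
    }
  }

  // "Filter" step: keep only candidates that contain the timestamp.
  // Assumption: windows are half-open, [start, end).
  def validWindows(timestamp: Long, windowDuration: Long,
                   slideDuration: Long, startTime: Long): Seq[Window] =
    candidateWindows(timestamp, windowDuration, slideDuration, startTime)
      .filter(w => timestamp >= w.start && timestamp < w.end)
}
```

With 12:05 expressed as 725 minutes, candidateWindows(725, 12, 5, 2) yields three candidates starting at 712, 717 and 722 (11:52, 11:57, 12:02), and validWindows drops the first one because 12:05 is not before its end at 12:04.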
The logical plan
Returns the logical plan that generates the time windows using the Expand operator, with the Filter operator for correctness and the Project operator for usability.
Name for this rule, automatically inferred based on class name.
Maps a time column to multiple time windows using the Expand operator. Since it's non-trivial to figure out how many windows a time column can map to, we over-estimate the number of windows and filter out the rows where the time column is not inside the time window.