Used to plan the aggregate operator for expressions based on the AggregateFunction2 interface.
Select the proper physical plan for join based on joining keys and size of logical plan.
First, uses the ExtractEquiJoinKeys pattern to find joins where at least some of the predicates can be evaluated by matching join keys. If found, join implementations are chosen with the following precedence:
- Broadcast: if one side of the join has an estimated physical size that is smaller than the user-configurable SQLConf.AUTO_BROADCASTJOIN_THRESHOLD threshold or if that side has an explicit broadcast hint (e.g. the user applied the org.apache.spark.sql.functions.broadcast() function to a DataFrame), then that side of the join will be broadcasted and the other side will be streamed, with no shuffling performed. If both sides of the join are eligible to be broadcasted then the
- Shuffle hash join: if the average size of a single partition is small enough to build a hash table.
- Sort merge: if the matching join keys are sortable.
If there are no joining keys, join implementations are chosen with the following precedence:
- BroadcastNestedLoopJoin: if one side of the join could be broadcasted
- CartesianProduct: for Inner join
- BroadcastNestedLoopJoin
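The precedence above can be sketched as a small decision function. This is only an illustrative model in plain Scala, not Spark's implementation: the types `Side`, `JoinInfo`, `JoinChoice` and the object `JoinSelectionSketch` are hypothetical names, the threshold is passed in as a plain number, and the fall-through from the keyed strategies to the keyless ones is an assumption that the text does not state.

```scala
// Hypothetical model of the precedence described above; not Spark's actual classes.
sealed trait JoinChoice
case object BroadcastJoin extends JoinChoice
case object ShuffleHashJoin extends JoinChoice
case object SortMergeJoin extends JoinChoice
case object BroadcastNestedLoopJoin extends JoinChoice
case object CartesianProduct extends JoinChoice

// One side of the join: estimated physical size and whether a broadcast hint was given.
final case class Side(sizeInBytes: Long, broadcastHint: Boolean = false)

final case class JoinInfo(
    left: Side,
    right: Side,
    hasEquiJoinKeys: Boolean,        // some predicates can be evaluated by matching keys
    keysSortable: Boolean,           // the matching join keys are sortable
    partitionFitsHashTable: Boolean, // average partition is small enough for a hash table
    isInner: Boolean)

object JoinSelectionSketch {
  // A side is eligible if it is hinted or estimated to be smaller than the threshold.
  def canBroadcast(side: Side, threshold: Long): Boolean =
    side.broadcastHint || side.sizeInBytes < threshold

  def choose(join: JoinInfo, threshold: Long): JoinChoice = {
    val broadcastable =
      canBroadcast(join.left, threshold) || canBroadcast(join.right, threshold)

    if (join.hasEquiJoinKeys && broadcastable) BroadcastJoin
    else if (join.hasEquiJoinKeys && join.partitionFitsHashTable) ShuffleHashJoin
    else if (join.hasEquiJoinKeys && join.keysSortable) SortMergeJoin
    // Assumption: keyed joins that match none of the above fall through to the keyless rules.
    else if (broadcastable) BroadcastNestedLoopJoin
    else if (join.isInner) CartesianProduct
    else BroadcastNestedLoopJoin
  }
}
```

For instance, `JoinSelectionSketch.choose(JoinInfo(Side(1L), Side(1000L), true, true, false, true), 100L)` picks the broadcast variant because the left side is below the threshold, regardless of whether the keys are sortable.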
Plans special cases of limit operators.
Used to plan aggregation queries that are computed incrementally as part of a StreamingQuery.
Currently this rule is injected into the planner on demand, only when planning in an org.apache.spark.sql.execution.streaming.StreamExecution.
This strategy is just for explaining Dataset/DataFrame created by spark.readStream.
It won't affect the execution, because StreamingRelation will be replaced with StreamingExecutionRelation in StreamingQueryManager, and StreamingExecutionRelation will be replaced with the real relation using the Source in StreamExecution.
Collects the placeholders marked as planLater by a strategy, together with their LogicalPlans.
Used to build table scan operators where complex projection and filtering are done using separate physical operators.
This function returns the given scan operator with Project and Filter nodes added only when needed. For example, a Project operator is only used when the final desired output requires complex expressions to be evaluated or when columns can be further eliminated after filtering has been done.
The prunePushedDownFilters parameter is used to remove those filters that can be optimized away by the filter pushdown optimization.
The required attributes for both filtering and expression evaluation are passed to the provided scanBuilder function so that it can avoid unnecessary column materialization.
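The "only when needed" behaviour described above can be illustrated with a toy model. This is a sketch under stated assumptions rather than Spark's code: `Plan`, `Scan`, `Filter`, `Project`, `Expr` and `PruneFilterProjectSketch` are hypothetical stand-ins, expressions are plain strings paired with the columns they read, and only the parameter names `prunePushedDownFilters` and `scanBuilder` come from the text.

```scala
// Hypothetical stand-ins for physical operators and expressions; not Spark's classes.
sealed trait Plan
final case class Scan(columns: Seq[String]) extends Plan
final case class Filter(condition: String, child: Plan) extends Plan
final case class Project(exprs: Seq[String], child: Plan) extends Plan

// An expression or predicate, plus the set of columns it reads.
final case class Expr(sql: String, references: Set[String]) {
  def isPlainColumn: Boolean = references == Set(sql)
}

object PruneFilterProjectSketch {
  def plan(
      projectList: Seq[Expr],
      filters: Seq[Expr],
      prunePushedDownFilters: Seq[Expr] => Seq[Expr],
      scanBuilder: Seq[String] => Plan): Plan = {
    val projectRefs = projectList.flatMap(_.references).toSet
    val filterRefs = filters.flatMap(_.references).toSet

    // Filters that the pushdown already covers are dropped before adding a Filter node.
    val remaining = prunePushedDownFilters(filters).map(_.sql)
    def withFilter(child: Plan): Plan =
      if (remaining.isEmpty) child else Filter(remaining.mkString(" AND "), child)

    val plainColumnsOnly = projectList.forall(_.isPlainColumn)
    if (plainColumnsOnly && filterRefs.subsetOf(projectRefs)) {
      // No complex expressions and no column to eliminate after filtering: no Project.
      withFilter(scanBuilder(projectList.map(_.sql)))
    } else {
      // Scan the columns needed for both filtering and evaluation, then project.
      val scan = scanBuilder((projectRefs ++ filterRefs).toSeq.sorted)
      Project(projectList.map(_.sql), withFilter(scan))
    }
  }
}
```

In this model, selecting column `a` while filtering on column `b` yields a Project over a Filter over a two-column scan, whereas selecting and filtering on `a` alone yields just a Filter over a one-column scan.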
Prunes bad plans to prevent combinatorial explosion.
A list of execution strategies that can be used by the planner