Returns true if expr can be evaluated using only the output of plan. This method can be used to determine when it is acceptable to move expression evaluation within a query plan.
For example, consider a join between two relations R(a, b) and S(c, d):
- canEvaluate(EqualTo(a,b), R) returns true
- canEvaluate(EqualTo(a,c), R) returns false
- canEvaluate(Literal(1), R) returns true, as literals can be evaluated on any plan
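One way to read the rule illustrated above is that an expression can be evaluated on a plan when every attribute it references appears in that plan's output. The following is a minimal sketch under that assumption. The types Attr, Relation, and the string-based attribute sets are hypothetical stand-ins for illustration, not Spark's Catalyst classes.

```scala
object CanEvaluateSketch {
  // Toy expression tree: each node reports the attribute names it references.
  sealed trait Expr { def references: Set[String] }

  final case class Attr(name: String) extends Expr {
    def references: Set[String] = Set(name)
  }

  // A literal references no attributes at all.
  final case class Literal(value: Any) extends Expr {
    def references: Set[String] = Set.empty
  }

  final case class EqualTo(left: Expr, right: Expr) extends Expr {
    def references: Set[String] = left.references ++ right.references
  }

  // Hypothetical stand-in for a logical plan: a name plus its output attributes.
  final case class Relation(name: String, output: Set[String])

  // Assumed rule: evaluable iff all referenced attributes are in the plan output.
  def canEvaluate(expr: Expr, plan: Relation): Boolean =
    expr.references.subsetOf(plan.output)

  val R: Relation = Relation("R", Set("a", "b"))
  val S: Relation = Relation("S", Set("c", "d"))
}
```

Under this reading, a literal is always evaluable because the empty set is a subset of any plan's output, which matches the third case in the list.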
Returns true iff expr could be evaluated as a condition within join.
Returns a placeholder for a physical plan that executes plan. This placeholder will be filled in automatically by the QueryPlanner using the other execution strategies that are available.
Select the proper physical plan for a join based on the joining keys and the size of the logical plan.
First, the ExtractEquiJoinKeys pattern is used to find joins where at least some of the predicates can be evaluated by matching join keys. If such a join is found, join implementations are chosen with the following precedence:
- Broadcast: if one side of the join has an estimated physical size that is smaller than the user-configurable SQLConf.AUTO_BROADCASTJOIN_THRESHOLD threshold, or if that side has an explicit broadcast hint (e.g. the user applied the org.apache.spark.sql.functions.broadcast() function to a DataFrame), then that side of the join will be broadcasted and the other side will be streamed, with no shuffling performed. If both sides of the join are eligible to be broadcasted then the
- Shuffle hash join: if the average size of a single partition is small enough to build a hash table.
- Sort merge: if the matching join keys are sortable.
If there are no joining keys, join implementations are chosen with the following precedence:
- BroadcastNestedLoopJoin: if one side of the join could be broadcasted
- CartesianProduct: for Inner join
- BroadcastNestedLoopJoin
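The precedence described above can be sketched as a simple decision function. This is only an illustration with hypothetical types (Side, JoinInfo) and does not reproduce Spark's planner code. It makes three assumptions that are not stated in the text: "average partition size is small enough" is modelled as size divided by the number of shuffle partitions being below the broadcast threshold; an equi-join that matches none of the three keyed rules falls through to the rules for joins without keys; and the sketch reports only the chosen implementation, not which side is broadcast.

```scala
object JoinSelectionSketch {
  sealed trait JoinImpl
  case object BroadcastHashJoin extends JoinImpl
  case object ShuffledHashJoin extends JoinImpl
  case object SortMergeJoin extends JoinImpl
  case object BroadcastNestedLoopJoin extends JoinImpl
  case object CartesianProduct extends JoinImpl

  // Hypothetical summary of one join input.
  final case class Side(sizeInBytes: Long, broadcastHint: Boolean = false)

  // Hypothetical summary of the join being planned.
  final case class JoinInfo(
      left: Side,
      right: Side,
      hasEquiKeys: Boolean,
      keysSortable: Boolean,
      isInner: Boolean,
      numShufflePartitions: Int)

  // A side is broadcastable if hinted, or if its estimated size is below the threshold.
  def canBroadcast(s: Side, threshold: Long): Boolean =
    s.broadcastHint || s.sizeInBytes < threshold

  // Assumption: "small enough to build a hash table" means average partition size < threshold.
  def smallPartitions(s: Side, partitions: Int, threshold: Long): Boolean =
    s.sizeInBytes / partitions < threshold

  def select(j: JoinInfo, threshold: Long): JoinImpl = {
    val anyBroadcastable =
      canBroadcast(j.left, threshold) || canBroadcast(j.right, threshold)
    val anySmallPartitions =
      smallPartitions(j.left, j.numShufflePartitions, threshold) ||
        smallPartitions(j.right, j.numShufflePartitions, threshold)

    if (j.hasEquiKeys && anyBroadcastable) BroadcastHashJoin
    else if (j.hasEquiKeys && anySmallPartitions) ShuffledHashJoin
    else if (j.hasEquiKeys && j.keysSortable) SortMergeJoin
    // Rules for joins without usable keys (also the assumed fall-through).
    else if (anyBroadcastable) BroadcastNestedLoopJoin
    else if (j.isInner) CartesianProduct
    else BroadcastNestedLoopJoin
  }
}
```

For example, with a threshold of 10 MB, a keyed join between a 1 MB input and a 100 GB input selects BroadcastHashJoin in this sketch, while an inner join without keys between two 100 GB inputs selects CartesianProduct.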