A batch of rules.
A strategy that runs until a fixed point is reached or maxIterations iterations have run, whichever comes first.
An execution strategy for rules that indicates the maximum number of executions. If the execution reaches a fixed point (i.e. converges) before maxIterations, it stops.
Analyzes CTE definitions and substitutes the child plan with the analyzed CTE definitions.
Extracts a Generator from the projectList of a Project operator and creates a Generate operator under the Project.
This rule throws an AnalysisException in the following cases:
1. A Generator is nested in an expression, e.g. SELECT explode(list) + 1 FROM tbl
2. More than one Generator is found in the projectList, e.g. SELECT explode(list), explode(list) FROM tbl
3. A Generator is found in an operator other than Project or Generate, e.g. SELECT * FROM tbl SORT BY explode(list)
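The first two error cases can be illustrated with a small sketch. This is a toy model, not Spark's implementation: expressions are nested tuples of the form (function_name, child, ...), the GENERATORS set and the helper names are hypothetical, and a plain ValueError stands in for AnalysisException. The third case (a Generator in another operator) is not modelled.

```python
GENERATORS = {"explode"}  # hypothetical set of generator function names


def is_generator(expr):
    # A generator call sits at the top of the expression.
    return isinstance(expr, tuple) and expr[0] in GENERATORS


def contains_generator(expr):
    # True if a generator appears anywhere inside the expression tree.
    if not isinstance(expr, tuple):
        return False
    return is_generator(expr) or any(contains_generator(c) for c in expr[1:])


def check_project_list(project_list):
    """Return the single top-level generator, or None if there is none."""
    for expr in project_list:
        if not is_generator(expr) and contains_generator(expr):
            raise ValueError("generator nested in an expression")  # case 1
    top_level = [e for e in project_list if is_generator(e)]
    if len(top_level) > 1:
        raise ValueError("more than one generator in projectList")  # case 2
    return top_level[0] if top_level else None
```

For instance, check_project_list([("explode", "list")]) returns the generator, while a project list containing ("+", ("explode", "list"), 1) is rejected.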
Extracts WindowExpressions from the projectList of a Project operator and aggregateExpressions of an Aggregate operator and creates individual Window operators for every distinct WindowSpecDefinition.
This rule handles three cases:
For every case, the transformation works as follows:
1. For a list of Expressions (a projectList or an aggregateExpressions), partitions it into two lists of Expressions, one for all WindowExpressions and another for all regular expressions.
2. For all WindowExpressions, groups them based on their WindowSpecDefinitions.
3. For every distinct WindowSpecDefinition, creates a Window operator and inserts it into the plan tree.
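The three steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the rule's actual code: WindowSpec and WindowExpr are hypothetical stand-ins for WindowSpecDefinition and WindowExpression, and a plan node is modelled as a plain tuple.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowSpec:  # stand-in for WindowSpecDefinition
    partition_by: tuple
    order_by: tuple


@dataclass(frozen=True)
class WindowExpr:  # stand-in for WindowExpression
    function: str
    spec: WindowSpec


def extract_windows(expressions, child):
    # Step 1: split into window expressions and regular expressions.
    window_exprs = [e for e in expressions if isinstance(e, WindowExpr)]
    regular = [e for e in expressions if not isinstance(e, WindowExpr)]

    # Step 2: group window expressions by their spec, in first-seen order.
    by_spec = OrderedDict()
    for e in window_exprs:
        by_spec.setdefault(e.spec, []).append(e)

    # Step 3: stack one Window node per distinct spec on top of the child.
    plan = child
    for spec, exprs in by_spec.items():
        plan = ("Window", spec, tuple(exprs), plan)
    return regular, plan


by_dept = WindowSpec(("dept",), ("salary",))
by_team = WindowSpec(("team",), ())
exprs = [
    "name",
    WindowExpr("rank()", by_dept),
    WindowExpr("sum(salary)", by_team),
    WindowExpr("row_number()", by_dept),
]
regular, plan = extract_windows(exprs, "Scan(employees)")
```

Here the two expressions that share by_dept end up in one Window node, and the by_team expression gets a second Window node stacked above it.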
Fixes nullability of Attributes in a resolved LogicalPlan by using the nullability of the corresponding Attributes in its children's output. This step is needed because users can use a resolved AttributeReference in the Dataset API and outer joins can change the nullability of an AttributeReference. Without the fix, a nullable column's nullable field can actually be set as non-nullable, which causes illegal optimizations (e.g., NULL propagation) and wrong answers. See SPARK-13484 and SPARK-13801 for the concrete queries of this case.
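A minimal sketch of the idea, assuming attributes are matched by an expression id and that an attribute counts as nullable if any child reports it as nullable. The Attr class and fix_nullability function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Attr:
    expr_id: int
    name: str
    nullable: bool


def fix_nullability(attrs, children_output):
    # Nullability the children actually produce, keyed by expression id.
    # An attribute is nullable if any child reports it as nullable.
    actual = {}
    for a in children_output:
        actual[a.expr_id] = actual.get(a.expr_id, False) or a.nullable
    # Attributes that no child produces are left unchanged.
    return [
        replace(a, nullable=actual[a.expr_id]) if a.expr_id in actual else a
        for a in attrs
    ]


# A reference captured before an outer join still claims non-nullable.
stale = [Attr(1, "id", nullable=False)]
after_outer_join = [Attr(1, "id", nullable=True)]
fixed = fix_nullability(stale, after_outer_join)
```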
Turns projections that contain aggregate expressions into aggregations.
Correctly handles null primitive inputs for a UDF by adding an extra If expression to do the null check. When a user defines a UDF with primitive parameters, there is no way to tell whether a primitive parameter is null, so the rule assumes the primitive input is null-propagatable and returns null if the input is null.
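The effect of the added null check can be sketched with a wrapper function. This is an analogy in plain Python rather than the rule itself: null_safe and primitive_positions are hypothetical names, and None plays the role of SQL NULL.

```python
def null_safe(udf, primitive_positions):
    """Wrap udf so it returns None when any primitive-typed input is None."""
    def wrapped(*args):
        if any(args[i] is None for i in primitive_positions):
            return None  # the added "If" branch: null in, null out
        return udf(*args)
    return wrapped


# Hypothetical UDF: the first argument models a primitive int,
# the second a nullable string that the UDF handles itself.
def repeat(n, s):
    return (s or "") * n


safe_repeat = null_safe(repeat, primitive_positions=[0])
```

Only the positions that model primitive parameters are guarded; a null in a non-primitive position is still passed through to the UDF.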
A strategy that only runs once.
Pulls out nondeterministic expressions from a LogicalPlan that is not a Project or Filter, puts them into an inner Project, and finally projects them away in the outer Project.
This rule finds aggregate expressions that are not in an aggregate operator, for example those in a HAVING clause or ORDER BY clause. These expressions are pushed down to the underlying aggregate operator and then projected away after the original operator.
Replaces UnresolvedAlias expressions with concrete aliases.
Replaces UnresolvedDeserializer with the deserialization expression that has been resolved to the given input attributes.
Replaces UnresolvedFunctions with concrete Expressions.
Rewrites table-generating expressions that need one or more of the following in order to be resolved:
Names for the output Attributes are extracted from Alias or MultiAlias expressions that wrap the Generator.
In many dialects of SQL it is valid to sort by attributes that are not present in the SELECT clause. This rule detects such queries and adds the required attributes to the original projection, so that they will be available during sorting. Another projection is added to remove these attributes after sorting.
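The widen, sort, and project-away sequence can be sketched on in-memory rows. This is an illustrative analogy, not the rule's implementation: rows are dictionaries, and select_order_by is a hypothetical helper.

```python
def select_order_by(rows, select_cols, sort_cols):
    # Columns needed for sorting that the SELECT list does not provide.
    missing = [c for c in sort_cols if c not in select_cols]

    # Inner projection: the original columns plus the missing sort columns.
    widened = [{c: r[c] for c in select_cols + missing} for r in rows]

    # Sort while the extra columns are still available.
    widened.sort(key=lambda r: tuple(r[c] for c in sort_cols))

    # Outer projection: drop the columns that were added only for sorting.
    return [{c: r[c] for c in select_cols} for r in widened]


rows = [
    {"name": "b", "age": 30},
    {"name": "a", "age": 41},
    {"name": "c", "age": 25},
]
# Models: SELECT name FROM rows ORDER BY age
result = select_order_by(rows, ["name"], ["age"])
```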
The HAVING clause can also use a grouping column that is not present in the SELECT clause.
Removes natural or using joins by calculating the output columns from the output of the two sides, then applies a Project on a normal Join to eliminate the natural or using join.
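As a rough sketch of the output-column calculation, the following assumes an inner join, where each join key appears once followed by the remaining left and right columns. Other join types may order or derive the key columns differently, and the function names are hypothetical.

```python
def using_join_output(left_cols, right_cols, using):
    # Join keys appear once (taken from the left side), followed by the
    # remaining left columns and then the remaining right columns.
    keys = [c for c in left_cols if c in using]
    left_rest = [c for c in left_cols if c not in using]
    right_rest = [c for c in right_cols if c not in using]
    return keys + left_rest + right_rest


def natural_join_output(left_cols, right_cols):
    # A natural join uses every column name the two sides share.
    common = [c for c in left_cols if c in right_cols]
    return using_join_output(left_cols, right_cols, common)
```

The resulting list is what the Project placed on top of the ordinary Join would expose.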
Resolves NewInstance by finding and adding the outer scope to it if the object being constructed is an inner class.
In many dialects of SQL it is valid to use ordinal positions in order/sort by and group by clauses. This rule converts ordinal positions to the corresponding expressions in the select list. This support was introduced in Spark 2.0.
- When the sort references or group by expressions are foldable expressions but not integers, they are ignored.
- When spark.sql.orderByOrdinal/spark.sql.groupByOrdinal is set to false, the position numbers are ignored too.
Before the release of Spark 2.0, literals in order/sort by and group by clauses had no effect on the results.
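The ordinal substitution can be sketched as below. This is a toy model: the ordinal_enabled parameter is a hypothetical stand-in for the spark.sql.orderByOrdinal/spark.sql.groupByOrdinal settings, expressions are plain Python values, and raising on an out-of-range position is an assumption made for the sketch.

```python
def resolve_ordinals(select_list, sort_exprs, ordinal_enabled=True):
    resolved = []
    for e in sort_exprs:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int_literal = isinstance(e, int) and not isinstance(e, bool)
        if ordinal_enabled and is_int_literal:
            if not 1 <= e <= len(select_list):
                raise ValueError(f"position {e} is not in the select list")
            resolved.append(select_list[e - 1])  # ordinals are 1-based
        else:
            resolved.append(e)  # non-integer or disabled: left untouched
    return resolved
```

For example, with a select list of ["name", "age"], the sort list [2, "name"] resolves to ["age", "name"], while the same call with ordinal_enabled=False leaves the literal 2 in place.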
Replaces UnresolvedAttributes with concrete AttributeReferences from a logical plan node's children.
Replaces UnresolvedRelations with concrete relations from the catalog.
This rule resolves and rewrites subqueries inside expressions.
Note: CTEs are handled in CTESubstitution.
Replaces the UpCast expression with Cast, and throws an exception if the cast may truncate.
Checks and adds proper window frames for all window functions.
Checks and adds order to AggregateWindowFunctions.
Substitutes the child plan with WindowSpecDefinitions.
Defines a sequence of rule batches, to be overridden by the implementation.
Returns true if expr can be evaluated using only the output of plan. This method can be used to determine when it is acceptable to move expression evaluation within a query plan.
For example, consider a join between two relations R(a, b) and S(c, d).
- canEvaluate(EqualTo(a,b), R) returns true
- canEvaluate(EqualTo(a,c), R) returns false
- canEvaluate(Literal(1), R) returns true, as literals can be evaluated on any plan
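The three examples can be reproduced with a small sketch in which an expression is evaluable when every attribute it references is part of the plan's output. The tuple encoding of expressions and the helper names are assumptions made for illustration.

```python
def references(expr):
    """Collect attribute names from a nested-tuple expression."""
    kind = expr[0]
    if kind == "attr":
        return {expr[1]}
    if kind == "lit":
        return set()
    # Any other node has the shape ("op", child, child, ...).
    refs = set()
    for child in expr[1:]:
        refs |= references(child)
    return refs


def can_evaluate(expr, plan_output):
    # Evaluable if every referenced attribute is produced by the plan.
    return references(expr) <= set(plan_output)


R = ["a", "b"]  # output of relation R(a, b)
a, b, c = ("attr", "a"), ("attr", "b"), ("attr", "c")
```

A literal references no attributes, so the subset check holds for any plan.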
Returns true iff expr could be evaluated as a condition within a join.
Executes the batches of rules defined by the subclass. The batches are executed serially using the defined execution strategy. Within each batch, rules are also executed serially.
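The execution model (batches run serially, rules within a batch run serially, and a batch stops at a fixed point or after its maximum number of iterations) can be sketched as follows. This is a simplified model, not the executor's actual code: the Batch class, the toy rules, and the tuple-of-tokens plan are hypothetical, and a run-once strategy is modelled as max_iterations=1.

```python
from dataclasses import dataclass
from typing import Callable, List

Rule = Callable[[tuple], tuple]  # a rule maps a plan to a (possibly) new plan


@dataclass
class Batch:
    name: str
    max_iterations: int  # 1 models a run-once strategy
    rules: List[Rule]


def execute(plan, batches):
    """Run batches serially; within a batch, apply rules serially."""
    for batch in batches:
        for _ in range(batch.max_iterations):
            before = plan
            for rule in batch.rules:
                plan = rule(plan)
            if plan == before:  # fixed point: a full pass changed nothing
                break
    return plan


def drop_one_noop(plan):
    # Removes at most one "noop" token per application.
    if "noop" in plan:
        i = plan.index("noop")
        return plan[:i] + plan[i + 1:]
    return plan


def upper_case(plan):
    return tuple(t.upper() for t in plan)


batches = [
    Batch("Cleanup", max_iterations=100, rules=[drop_one_noop]),
    Batch("Finish", max_iterations=1, rules=[upper_case]),
]
result = execute(("scan", "noop", "noop", "filter"), batches)
```

The Cleanup batch needs three passes here: two that each remove a token and a third that detects no change, after which the Finish batch runs exactly once.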
Override to provide additional checks for correct analysis. These rules will be evaluated after our built-in check rules.
Override to provide additional rules for the "Resolution" batch.
Provides a logical query plan analyzer, which translates UnresolvedAttributes and UnresolvedRelations into fully typed objects using information in a SessionCatalog and a FunctionRegistry.