Provides a logical query plan analyzer, which translates UnresolvedAttributes and UnresolvedRelations into fully typed objects using information in a SessionCatalog and a FunctionRegistry.
Throws user facing errors when passed invalid queries that fail to analyze.
Thrown by a catalog when an item already exists. The analyzer will rethrow the exception as an org.apache.spark.sql.AnalysisException with the correct position information.
A catalog for looking up user defined functions, used by an Analyzer.
Note: The implementation should be thread-safe to allow concurrent access.
Used to assign new names to Generator's output, such as hive udtf. For example the SQL expression "stack(2, key, value, key, value) as (a, b)" could be represented as follows: MultiAlias(stack_function, Seq(a, b))
the computation being performed
the names to be associated with each output of computing child.
A trait that should be mixed into query operators where a single instance might appear multiple times in a logical query plan. It is invalid to have multiple copies of the same attribute produced by distinct operators in a query tree as this breaks the guarantee that expression ids, which are used to differentiate attributes, are unique.
During analysis, an operator that includes this trait may be asked to produce a new version of itself with globally unique expression ids.
Thrown by a catalog when an item cannot be found. The analyzer will rethrow the exception as an org.apache.spark.sql.AnalysisException with the correct position information.
Represents all the resolved input attributes to a given relational operator. This is used in the data frame DSL.
Expressions to expand.
Resolver should return true if the first string refers to the same entity as the second string. For example, by using case insensitive equality.
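The resolver contract above can be illustrated with a minimal sketch. It assumes a resolver is modeled as a plain two-argument predicate on names; `ResolverSketch`, `caseInsensitive`, `caseSensitive` and `findColumn` are illustrative names, not part of the documented API.

```scala
object ResolverSketch {
  // Assumption: a resolver is a predicate deciding whether two names denote the same entity.
  type Resolver = (String, String) => Boolean

  // Treats names as equal regardless of letter case.
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Requires an exact match.
  val caseSensitive: Resolver = (a, b) => a == b

  // Hypothetical helper: the first column that `resolver` considers equal to `name`.
  def findColumn(columns: Seq[String], name: String, resolver: Resolver): Option[String] =
    columns.find(c => resolver(c, name))
}
```

Passing the resolver as a value lets the same lookup code serve both case-sensitive and case-insensitive sessions.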
Represents all of the input attributes to a given relational operator, for example in "SELECT * FROM ...". A Star gets automatically expanded during analysis.
Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal expression.
Represents the result of Expression.checkInputDataTypes. We will throw AnalysisException in CheckAnalysis if isFailure is true.
Holds the expression that has yet to be aliased.
The computation that needs to be resolved during analysis.
The function, if specified, that is called to generate an alias to associate with the result of computing child.
Holds the name of an attribute that has yet to be resolved.
Holds the deserializer expression and the attributes that are available during the resolution for it. A deserializer expression is a special kind of expression that is not always resolved by children output, but by given attributes, e.g. the keyDeserializer in MapGroups should be resolved by groupingAttributes instead of children output.
The unresolved deserializer expression
The input attributes used to resolve the deserializer expression. Can be empty if we want to resolve the deserializer by children output.
Thrown when an invalid attempt is made to access a property of a tree that has yet to be fully resolved.
Extracts a value or values from an Expression
The expression to extract a value from. Can be a Map, an Array, a Struct or an array of Structs.
The expression that describes the extraction. Can be a key of a Map, an index of an Array or a field name of a Struct.
Represents an unresolved generator, which will be created by the parser for the org.apache.spark.sql.catalyst.plans.logical.Generate operator. The analyzer will resolve this generator.
An inline table that has not been resolved yet. Once resolved, it is turned by the analyzer into a org.apache.spark.sql.catalyst.plans.logical.LocalRelation.
list of column names
expressions for the data
Represents unresolved ordinal used in order by or group by.
For example:
select a from table order by 1
select a from table group by 1
ordinal starts from 1, instead of 0
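As a rough illustration of the 1-based convention, the sketch below replaces ordinals with the select-list entries they point at. The types `SortKey`, `Ordinal` and `Column`, the `resolve` helper and the error message are all hypothetical and only model the idea on plain strings.

```scala
object OrdinalSketch {
  sealed trait SortKey
  final case class Ordinal(position: Int) extends SortKey // 1-based, as in SQL
  final case class Column(name: String) extends SortKey

  // Replaces each ordinal with the select-list entry it refers to.
  // Returns Left with a message for the first out-of-range position.
  def resolve(selectList: Seq[String], keys: Seq[SortKey]): Either[String, Seq[Column]] = {
    val resolved: Seq[Either[String, Column]] = keys.map {
      case Ordinal(p) if p >= 1 && p <= selectList.size =>
        Right(Column(selectList(p - 1))) // shift from 1-based to 0-based indexing
      case Ordinal(p) =>
        Left(s"position $p is not in the select list (valid range is 1 to ${selectList.size})")
      case c: Column =>
        Right(c)
    }
    resolved.collectFirst { case Left(msg) => msg } match {
      case Some(msg) => Left(msg)
      case None      => Right(resolved.collect { case Right(c) => c })
    }
  }
}
```

With a select list of `a, b`, `order by 1` resolves to column `a`, while `order by 0` is rejected because positions start at 1.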
Holds the name of a relation that has yet to be looked up in a catalog.
Represents all of the input attributes to a given relational operator, for example in "SELECT * FROM ...".
This is also used to expand structs. For example: "SELECT record.* from (SELECT struct(a,b,c) as record ...)"
an optional name that should be the target of the expansion. If omitted all targets' columns are produced. This can either be a table name or struct name. This is a list of identifiers that is the path of the expansion.
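The expansion behaviour described above might be sketched as follows. This is a toy model on strings, not the analyzer's implementation: `Attr`, `expand`, and the choice to try a table qualifier before a struct name are assumptions made for illustration, and only single-identifier targets are handled.

```scala
object StarSketch {
  // Toy attribute: optional table qualifier, a name, and nested field names if it is a struct.
  final case class Attr(qualifier: Option[String], name: String, fields: Seq[String] = Nil)

  def expand(
      input: Seq[Attr],
      target: Option[Seq[String]],
      resolver: (String, String) => Boolean): Seq[String] =
    target match {
      // No target: every input column is produced.
      case None => input.map(_.name)
      // Single identifier: first try it as a table name, then as a struct column.
      case Some(Seq(t)) =>
        val byTable = input.filter(_.qualifier.exists(q => resolver(q, t)))
        if (byTable.nonEmpty) byTable.map(_.name)
        else input.filter(a => resolver(a.name, t)).flatMap(a => a.fields.map(f => s"${a.name}.$f"))
      // Longer identifier paths are out of scope for this sketch.
      case Some(_) => Nil
    }
}
```

For an input with columns `id` and `record` (a struct with fields a, b, c) from table `t`, a target of `t` yields both columns and a target of `record` yields `record.a`, `record.b`, `record.c`.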
A table-valued function, e.g.
select * from range(10);
Cleans up unnecessary Aliases inside the plan. Basically we only need Alias as a top level expression in Project(project list) or Aggregate(aggregate expressions) or Window(window expressions). Notice that if an expression has other expression parameters which are not in its children, e.g. RuntimeReplaceable, the transformation for Aliases in this rule can't work for those parameters.
Calculates and propagates precision for fixed-precision decimals. Hive has a number of rules for this based on the SQL standard and MS SQL:
https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_Precision_Scale_Support.pdf
https://msdn.microsoft.com/en-us/library/ms190476.aspx
In particular, if we have expressions e1 and e2 with precision/scale p1/s1 and p2/s2 respectively, then the following operations have the following precision / scale:
  Operation    Result Precision                        Result Scale
  ------------------------------------------------------------------------
  e1 + e2      max(s1, s2) + max(p1-s1, p2-s2) + 1     max(s1, s2)
  e1 - e2      max(s1, s2) + max(p1-s1, p2-s2) + 1     max(s1, s2)
  e1 * e2      p1 + p2 + 1                             s1 + s2
  e1 / e2      p1 - s1 + s2 + max(6, s1 + p2 + 1)      max(6, s1 + p2 + 1)
  e1 % e2      min(p1-s1, p2-s2) + max(s1, s2)         max(s1, s2)
  e1 union e2  max(s1, s2) + max(p1-s1, p2-s2)         max(s1, s2)
  sum(e1)      p1 + 10                                 s1
  avg(e1)      p1 + 4                                  s1 + 4
To implement the rules for fixed-precision types, we introduce casts to turn them to unlimited precision, do the math on unlimited-precision numbers, then introduce casts back to the required fixed precision. This allows us to do all rounding and overflow handling in the cast-to-fixed-precision operator.
In addition, when mixing non-decimal types with decimals, we use the following rules:
- BYTE gets turned into DECIMAL(3, 0)
- SHORT gets turned into DECIMAL(5, 0)
- INT gets turned into DECIMAL(10, 0)
- LONG gets turned into DECIMAL(20, 0)
- FLOAT and DOUBLE cause fixed-length decimals to turn into DOUBLE
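The precision and scale rules listed above can be written out directly as arithmetic. The sketch below is only a transcription of those formulas: `Dec`, `DecimalPrecisionSketch` and its method names are illustrative, and it does not bound results to any maximum precision or model the casts themselves.

```scala
object DecimalPrecisionSketch {
  // Hypothetical stand-in for a fixed-precision decimal type.
  final case class Dec(precision: Int, scale: Int)

  // e1 + e2 and e1 - e2 share the same result type.
  def addOrSubtract(a: Dec, b: Dec): Dec = {
    val scale = math.max(a.scale, b.scale)
    Dec(scale + math.max(a.precision - a.scale, b.precision - b.scale) + 1, scale)
  }

  def multiply(a: Dec, b: Dec): Dec =
    Dec(a.precision + b.precision + 1, a.scale + b.scale)

  def divide(a: Dec, b: Dec): Dec = {
    val scale = math.max(6, a.scale + b.precision + 1)
    Dec(a.precision - a.scale + b.scale + scale, scale)
  }

  def remainder(a: Dec, b: Dec): Dec = {
    val scale = math.max(a.scale, b.scale)
    Dec(math.min(a.precision - a.scale, b.precision - b.scale) + scale, scale)
  }

  def union(a: Dec, b: Dec): Dec = {
    val scale = math.max(a.scale, b.scale)
    Dec(scale + math.max(a.precision - a.scale, b.precision - b.scale), scale)
  }

  // Decimal type used for an integral type when it is mixed with decimals.
  def integralAsDecimal(typeName: String): Option[Dec] = typeName match {
    case "BYTE"  => Some(Dec(3, 0))
    case "SHORT" => Some(Dec(5, 0))
    case "INT"   => Some(Dec(10, 0))
    case "LONG"  => Some(Dec(20, 0))
    case _       => None // FLOAT and DOUBLE turn the decimal side into DOUBLE instead
  }
}
```

For example, adding a DECIMAL(10, 2) to a DECIMAL(5, 3) gives scale max(2, 3) = 3 and precision 3 + max(8, 2) + 1 = 12.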
Removes SubqueryAlias operators from the plan. Subqueries are only required to provide scoping information for attributes and can be removed once analysis is complete.
Removes Union operators from the plan if it just has one child.
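A minimal sketch of this kind of rewrite on a toy plan tree is shown below. The `Plan`, `Relation`, `Union` and `Filter` types here are simplified stand-ins defined for the example, not the Catalyst classes.

```scala
object EliminateUnionSketch {
  sealed trait Plan
  final case class Relation(name: String) extends Plan
  final case class Union(children: Seq[Plan]) extends Plan
  final case class Filter(condition: String, child: Plan) extends Plan

  // Bottom-up rewrite: a Union with exactly one child is replaced by that child.
  def eliminate(plan: Plan): Plan = plan match {
    case Union(children) =>
      children.map(eliminate) match {
        case Seq(only) => only
        case rewritten => Union(rewritten)
      }
    case Filter(condition, child) => Filter(condition, eliminate(child))
    case r: Relation              => r
  }
}
```

Rewriting children first means nested single-child unions collapse in one pass.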
A trivial catalog that returns an error when a function is requested. Used for testing when all functions are already filled in and the analyzer needs only to resolve attribute references.
Resolve a CreateNamedStruct if it contains NamePlaceholders.
An analyzer rule that replaces UnresolvedInlineTable with LocalRelation.
Rule that resolves table-valued function references.
A trivial Analyzer with a dummy SessionCatalog and EmptyFunctionRegistry. Used for testing when all relations are already filled in and the analyzer needs only to resolve attribute references.
Maps a time column to multiple time windows using the Expand operator. Since it's non-trivial to figure out how many windows a time column can map to, we over-estimate the number of windows and filter out the rows where the time column is not inside the time window.
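The over-estimate-then-filter idea can be sketched on plain Long timestamps. This is one possible formulation under stated assumptions, not necessarily the rule's exact arithmetic: windows are half-open intervals [start, end), window starts are aligned to `startTime` plus multiples of `slideDuration`, and all names are illustrative.

```scala
object TimeWindowSketch {
  // Returns every window [start, end) that contains `ts`.
  def windows(
      ts: Long,
      windowDuration: Long,
      slideDuration: Long,
      startTime: Long = 0L): Seq[(Long, Long)] = {
    require(windowDuration > 0 && slideDuration > 0, "durations must be positive")

    // Upper bound on how many windows can overlap a single timestamp.
    val maxOverlapping = ((windowDuration + slideDuration - 1) / slideDuration).toInt

    // Index just past the latest window start that is <= ts.
    val windowId = Math.floorDiv(ts - startTime, slideDuration) + 1

    // Generate one candidate per possible overlap; this is the "expand" step.
    val candidates = (0 until maxOverlapping).map { i =>
      val start = (windowId + i - maxOverlapping) * slideDuration + startTime
      (start, start + windowDuration)
    }

    // Drop the candidates that do not actually contain the timestamp.
    candidates.filter { case (start, end) => ts >= start && ts < end }
  }
}
```

With a window of 10 and a slide of 3, four candidates are generated for timestamp 10, and the filter discards [0, 10) because its end is exclusive.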
A collection of Rule that can be used to coerce differing types that participate in operations into compatible ones.
Notes about type widening / tightest common types: Broadly, there are two cases when we need to widen data types (e.g. union, binary comparison). In case 1, we are looking for a common data type for two or more data types, and in this case no loss of precision is allowed. Examples include type inference in JSON (e.g. what's the column's data type if one row is an integer while the other row is a long?). In case 2, we are looking for a widened data type with some acceptable loss of precision (e.g. there is no common type for double and decimal because double's range is larger than decimal, and yet decimal is more precise than double, but in union we would cast the decimal into double).
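The two cases can be contrasted with a small sketch. It models only a handful of types with a hypothetical `DataType` hierarchy defined for the example; the function names and the exact set of supported pairs are assumptions, chosen to mirror the examples in the note above.

```scala
object WideningSketch {
  sealed trait DataType
  case object IntT extends DataType
  case object LongT extends DataType
  case object DoubleT extends DataType
  final case class DecimalT(precision: Int, scale: Int) extends DataType

  // Case 1: a common type is returned only when no precision is lost.
  def tightestCommonType(a: DataType, b: DataType): Option[DataType] = (a, b) match {
    case (x, y) if x == y              => Some(x)
    case (IntT, LongT) | (LongT, IntT) => Some(LongT)
    case _                             => None
  }

  // Case 2: fall back to a widening that tolerates some loss of precision.
  def widerType(a: DataType, b: DataType): Option[DataType] =
    tightestCommonType(a, b).orElse((a, b) match {
      case (DoubleT, _: DecimalT) | (_: DecimalT, DoubleT) => Some(DoubleT)
      case _                                               => None
    })
}
```

An integer and a long share the lossless common type long, while double and decimal only meet under the second, lossy rule.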
Analyzes the presence of unsupported operations in a logical plan.
Catches any AnalysisExceptions thrown by f and attaches t's position, if any.
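The catch-and-attach pattern might look like the following sketch. `Origin`, `AnalysisError` and this `withPosition` are simplified stand-ins written for the example; here the position is passed in directly rather than read from a tree node.

```scala
object PositionSketch {
  // Hypothetical position information carried by a tree node.
  final case class Origin(line: Option[Int], startPosition: Option[Int])

  // Hypothetical analysis error that can carry a position.
  final class AnalysisError(
      val message: String,
      val line: Option[Int] = None,
      val startPosition: Option[Int] = None)
    extends Exception(message) {
    def withPosition(origin: Origin): AnalysisError =
      new AnalysisError(message, origin.line, origin.startPosition)
  }

  // Runs `f`; any AnalysisError it throws is rethrown with the given position attached.
  def withPosition[A](origin: Origin)(f: => A): A =
    try f
    catch { case e: AnalysisError => throw e.withPosition(origin) }
}
```

Wrapping resolution code this way lets inner helpers throw position-free errors while users still see where in the query the failure occurred.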
Provides a logical query plan Analyzer and supporting classes for performing analysis. Analysis consists of translating UnresolvedAttributes and UnresolvedRelations into fully typed objects using information in a schema Catalog.