All the attributes that are used for this plan.
Returns the tree node at the specified number, used primarily for interactive debugging. Numbers for each node can be found in the numberedTreeString.
Note that this cannot return BaseType because a logical plan's node might return a physical plan for innerChildren; e.g., an in-memory relation logical plan node has a reference to the physical plan node it is referencing.
Returns a string representing the arguments to this node, minus any children
Returns a 'scala code' representation of this TreeNode and its children. Intended for use when debugging, where the prettier toString function is obfuscating the actual structure. In the case of 'pure' TreeNodes that only contain primitives and other TreeNodes, the result can be pasted into the REPL to build an equivalent tree.
Canonicalized copy of this query plan.
Returns a Seq of the children of this node. Children should not change; immutability is required for the containsChild optimization.
Args that have been cleaned such that differences in expression id should not affect equality.
Eagerly clear any broadcasts created by this plan execution.
Returns a Seq containing the result of applying a partial function to all elements in this tree on which the function is defined.
Finds and returns the first TreeNode of the tree for which the given partial function is defined (pre-order), and applies the partial function to it.
Returns a Seq containing the leaves in this tree.
An ExpressionSet that contains invariants about the rows output by this operator. For example, if this set contains the expression a = 2, then that expression is guaranteed to evaluate to true for all rows produced.
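As an illustrative sketch only (hypothetical Expr and FilterNode classes, not Spark's actual ExpressionSet or QueryPlan API), a Filter's invariants can be modeled as its child's constraints plus its own predicate:

```scala
// Simplified sketch, not Spark's real classes: every row a Filter emits
// satisfies both the child's invariants and the filter's own predicate.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Int) extends Expr
case class EqualTo(left: Expr, right: Expr) extends Expr

case class FilterNode(condition: Expr, childConstraints: Set[Expr]) {
  // The constraint set grows as predicates are applied down the plan.
  def constraints: Set[Expr] = childConstraints + condition
}

val f = FilterNode(EqualTo(Attr("a"), Lit(2)), childConstraints = Set.empty)
assert(f.constraints == Set(EqualTo(Attr("a"), Lit(2))))
```

Downstream optimizations (e.g. pruning a redundant filter) can then test membership in this set instead of re-deriving the invariant.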
Consume the generated columns or row from the current SparkPlan, and call its parent's doConsume().
Generate the Java source code to process the rows from the child SparkPlan. This should be overridden by subclasses to support codegen.

For example, Filter will generate code like this:

    # code to evaluate the predicate expression, result is isNull1 and value2
    if (isNull1 || !value2) continue;
    # call consume(), which will call parent.doConsume()

Note: a plan can either consume the rows as an UnsafeRow (row), or as a list of variables (input).
Overridden by concrete implementations of SparkPlan. Produces the result of the query as an RDD[InternalRow].
Overridden by concrete implementations of SparkPlan. Produces the result of the query as a broadcast variable.
Overridden by concrete implementations of SparkPlan. It is guaranteed to run before any execute of SparkPlan. This is helpful if we want to set up some state before executing the query, e.g., BroadcastHashJoin uses it to broadcast asynchronously. Note: the prepare method has already walked down the tree, so the implementation doesn't need to call children's prepare methods. This will only be called once, protected by this.
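A minimal sketch of that once-only, monitor-protected contract (hypothetical Plan class, not Spark's actual implementation):

```scala
// Sketch: prepare() may be called many times from many threads, but
// doPrepare() runs exactly once, guarded by the object's own monitor.
class Plan {
  private var prepared = false
  var doPrepareCalls = 0 // exposed only so the sketch can be verified

  final def prepare(): Unit = this.synchronized {
    if (!prepared) {
      doPrepare() // e.g. kick off an asynchronous broadcast here
      prepared = true
    }
  }

  protected def doPrepare(): Unit = { doPrepareCalls += 1 }
}

val p = new Plan
p.prepare(); p.prepare()
assert(p.doPrepareCalls == 1)
```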
Generate the Java source code to process; this should be overridden by subclasses to support codegen. doProduce() usually generates the framework; for example, aggregation could generate this:

    if (!initialized) {
      # create a hash map, then build the aggregation hash map
      # call child.produce()
      initialized = true;
    }
    while (hashmap.hasNext()) {
      row = hashmap.next();
      # build the aggregation results
      # create variables for results
      # call consume(), which will call parent.doConsume()
      if (shouldStop()) return;
    }
Returns source code to evaluate the variables for required attributes, and clears the code of evaluated variables to prevent them from being evaluated twice.
Returns source code to evaluate all the variables, and clears their code to prevent them from being evaluated twice.
Returns the result of this query as an RDD[InternalRow] by delegating to doExecute after preparations. Concrete implementations of SparkPlan should override doExecute.
Returns the result of this query as a broadcast variable by delegating to doExecuteBroadcast after preparations. Concrete implementations of SparkPlan should override doExecuteBroadcast.
Runs this query returning the result as an array.
Runs this query returning the result as an array, using external Row format.
Execute a query after preparing the query and adding query plan information to created RDDs for visualization.
Runs this query returning the first n rows as an array. This is modeled after RDD.take but never runs any job locally on the driver.
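A hedged sketch of such a take strategy, using plain Scala collections in place of real Spark jobs (the function name and the scale-up factor of 4 are assumptions for illustration, not Spark's exact code):

```scala
// Hypothetical sketch: scan an exponentially growing number of partitions
// per round until n rows are collected, instead of launching one job over
// all partitions (and never evaluating anything locally on the driver).
def takeFromPartitions(partitions: Seq[Seq[Int]], n: Int): Seq[Int] = {
  val buf = scala.collection.mutable.ArrayBuffer.empty[Int]
  var scanned = 0
  var numToScan = 1
  while (buf.size < n && scanned < partitions.size) {
    val batch = partitions.slice(scanned, scanned + numToScan)
    buf ++= batch.flatten.take(n - buf.size) // pretend this is a Spark job
    scanned += numToScan
    numToScan *= 4 // grow the batch each round (factor is illustrative)
  }
  buf.toSeq
}

assert(takeFromPartitions(Seq(Seq(1), Seq(2, 3), Seq(4, 5, 6)), 4) == Seq(1, 2, 3, 4))
```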
Runs this query returning the result as an iterator of InternalRow. Note: this will trigger multiple jobs (one for each partition).
Returns all of the expressions present in this query plan operator.
Faster version of equality which short-circuits when two treeNodes are the same instance. We don't just override Object.equals, as doing so prevents the Scala compiler from generating case class equals methods.
Find the first TreeNode that satisfies the condition specified by f.
Returns a Seq by applying a function to all nodes in this tree and using the elements of the resulting collections.
Runs the given function on this node and then recursively on children.
Runs the given function recursively on children then on this node.
Appends the string representation of this node and its children to the given StringBuilder. The i-th element in lastChildren indicates whether the ancestor of the current node at depth i + 1 is the last child of its own parent node. The depth of the root node is 0, and lastChildren for the root node should be empty. Note that this traversal (numbering) order must be the same as getNodeNumbered.
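A simplified sketch of the lastChildren convention (toy Node class, not Catalyst's TreeNode): each Boolean decides whether an ancestor's column in the indentation prefix is drawn as blank space (that ancestor was a last child) or as a vertical connector:

```scala
// Each recursive call appends one Boolean: whether this child is the last
// child of its parent. dropRight(1) renders the ancestor columns; the last
// element picks the branch glyph for the node itself.
case class Node(name: String, children: Seq[Node] = Nil) {
  def treeString(lastChildren: Seq[Boolean] = Nil,
                 sb: StringBuilder = new StringBuilder): StringBuilder = {
    lastChildren.dropRight(1).foreach(last => sb.append(if (last) "   " else ":  "))
    if (lastChildren.nonEmpty) sb.append(if (lastChildren.last) "+- " else ":- ")
    sb.append(name).append('\n')
    children.zipWithIndex.foreach { case (c, i) =>
      c.treeString(lastChildren :+ (i == children.size - 1), sb)
    }
    sb
  }
}

val t = Node("Project", Seq(Node("Filter", Seq(Node("Scan")))))
assert(t.treeString().toString == "Project\n+- Filter\n   +- Scan\n")
```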
Extracts the relevant constraints from a given set of constraints based on the attributes that appear in the outputSet.
All the nodes that should be shown as an inner nested tree of this node.
Returns all the RDDs of InternalRow which generate the input rows. Note: right now we support up to two RDDs.
The set of all attributes that are input to this operator by its children.
Return a LongSQLMetric according to the name.
Overridden makeCopy also propagates sqlContext to the copied plan.
Returns a Seq containing the result of applying the given function to each node in this tree in a preorder traversal.
the function to be applied.
Returns a copy of this node where f has been applied to all the node's children.
Apply a map function to each expression present in this query operator, and return a new query operator based on the mapped expressions.
Efficient alternative to productIterator.map(f).toArray.
Return all metadata that describes more details of this SparkPlan.
Creates a metric using the specified name.
name of the variable representing the metric
Return a map containing all metrics of this SparkPlan.
Attributes that are referenced by expressions but not provided by this node's children. Subclasses should override this method if they produce attributes internally, as it is used by assertions designed to prevent the construction of invalid plans.
Creates a row ordering for the given schema, in natural ascending order.
Returns the name of this type of TreeNode. Defaults to the class name. Note that we remove the "Exec" suffix for physical operators here.
Returns a string representation of the nodes in this tree, where each operator is numbered. The numbers can be used with TreeNode.apply to easily access specific subtrees. The numbers are based on a depth-first traversal of the tree (with innerChildren traversed first before children).
Args to the constructor that should be copied, but not transformed. These are appended to the transformed args automatically by makeCopy.
Specifies how data is ordered in each partition.
Specifies how data is partitioned for the table.
Returns the set of attributes that are output by this node.
Returns the tree node at the specified number, used primarily for interactive debugging. Numbers for each node can be found in the numberedTreeString. This is a variant of apply that returns the node as BaseType (if the type matches).
Which SparkPlan is calling produce() of this one. It's itself for the first SparkPlan.
Prepare a SparkPlan for execution. It's idempotent.
Finds scalar subquery expressions in this plan node and starts evaluating them.
Prints out the schema in the tree format.
Returns Java source code to process the rows from input RDD.
Returns Java source code to process the rows from input RDD that will work on executors too (assuming no sub-query processing required).
The set of all attributes that are produced by this node.
All Attributes that appear in expressions from this operator. Note that this set does not include attributes that are implicitly referenced by being passed through to the output tuple.
Specifies the partition requirements on the child.
Specifies the sort order requirements on the input data for each partition of this operator.
Reset all the metrics.
Returns true when the given query plan will return the same results as this query plan.
Since it is likely undecidable to generally determine whether two given plans will produce the same results, it is okay for this function to return false, even if the results are actually the same. Such behavior will not affect correctness, only the application of performance enhancements like caching. However, it is not acceptable to return true if the results could possibly be different.
By default this function performs a modified version of equality that is tolerant of cosmetic differences like attribute naming and/or expression id differences. Operators that can do better should override this function.
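As a toy sketch of the idea (hypothetical Col and Scan classes, not Spark's actual canonicalization), normalizing generated expression ids before comparing makes two plans that differ only cosmetically compare equal:

```scala
// Hypothetical sketch: two scans that differ only in generated expression
// ids compare equal once both are canonicalized to position-based ids.
case class Col(name: String, exprId: Int)
case class Scan(cols: Seq[Col])

def canonicalize(p: Scan): Scan =
  Scan(p.cols.zipWithIndex.map { case (c, i) => c.copy(exprId = i) })

def sameResult(a: Scan, b: Scan): Boolean = canonicalize(a) == canonicalize(b)

assert(sameResult(Scan(Seq(Col("a", 7))), Scan(Seq(Col("a", 42)))))  // cosmetic diff
assert(!sameResult(Scan(Seq(Col("a", 7))), Scan(Seq(Col("b", 7)))))  // real diff
```

Note how the false-negative direction stays safe: a comparison that misses a match only forgoes caching, while a false positive would be a correctness bug.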
Returns the output schema in the tree format.
ONE line description of this node.
A handle to the SQL Context that was used to create this plan. Since many operators need access to the sqlContext for RDD operations or configuration, this field is automatically populated by the query planning infrastructure.
A prefix string used when printing the plan.
We use "!" to indicate an invalid plan, and "'" to indicate an unresolved plan.
The arguments that should be included in the arg string. Defaults to the productIterator.
All the subqueries of the current plan.
Whether this SparkPlan supports whole-stage codegen or not.
Returns a copy of this node where rule has been recursively applied to the tree. When rule does not apply to a given node it is left unchanged. Users should not expect a specific directionality; if a specific directionality is needed, transformDown or transformUp should be used.
the function used to transform this node's children
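A toy sketch of the transform contract (simplified two-node tree, not Catalyst's TreeNode): the rule is a partial function, and nodes it does not cover pass through unchanged. It also shows why directionality matters, since pre-order and post-order application can give different results:

```scala
sealed trait Tree {
  // Pre-order: apply the rule to this node first, then recurse.
  def transformDown(rule: PartialFunction[Tree, Tree]): Tree =
    rule.applyOrElse(this, identity[Tree]) match {
      case Add(l, r) => Add(l.transformDown(rule), r.transformDown(rule))
      case leaf      => leaf
    }
  // Post-order: transform the children first, then apply the rule.
  def transformUp(rule: PartialFunction[Tree, Tree]): Tree = {
    val withNewChildren = this match {
      case Add(l, r) => Add(l.transformUp(rule), r.transformUp(rule))
      case leaf      => leaf
    }
    rule.applyOrElse(withNewChildren, identity[Tree])
  }
}
case class Lit(v: Int) extends Tree
case class Add(l: Tree, r: Tree) extends Tree

// Constant-fold additions of two literals; everything else is unchanged.
val fold: PartialFunction[Tree, Tree] = { case Add(Lit(a), Lit(b)) => Lit(a + b) }

val t = Add(Add(Lit(1), Lit(2)), Lit(3))
assert(t.transformUp(fold) == Lit(6))                 // children folded first, so the top folds too
assert(t.transformDown(fold) == Add(Lit(3), Lit(3)))  // top was visited before its children folded
```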
Returns the result of running transformExpressions on this node and all its children.
Returns a copy of this node where rule has been recursively applied to it and all of its children (pre-order). When rule does not apply to a given node it is left unchanged.
the function used to transform this node's children
Runs transform with rule on all expressions present in this query operator. Users should not expect a specific directionality; if a specific directionality is needed, transformExpressionsDown or transformExpressionsUp should be used.
the rule to be applied to every expression in this operator.
Runs transformDown with rule on all expressions present in this query operator.
the rule to be applied to every expression in this operator.
Runs transformUp with rule on all expressions present in this query operator.
the rule to be applied to every expression in this operator.
Returns a copy of this node where rule has been recursively applied first to all of its children and then itself (post-order). When rule does not apply to a given node, it is left unchanged.
the function used to transform this node's children
Returns a string representation of the nodes in this tree.
The subset of inputSet that should be evaluated before this plan. We will use this to insert some code to access those columns that are actually used by the current plan before calling doConsume().
This method can be overridden by any child class of QueryPlan to specify a set of constraints based on the given operator's constraint propagation logic. These constraints are then canonicalized and filtered automatically to contain only those attributes that appear in the outputSet.
See Canonicalize for more details.
ONE line description of this node with more information
Blocks the thread until all subqueries finish evaluation and update the results.
Returns a copy of this node with the children replaced. TODO: Validate somewhere (in debug mode?) that children are ordered correctly.
Generated code plan for updates into a column table. This extends RowExec to generate the combined code for row buffer updates.