A BroadcastExchangeExec collects, transforms and finally broadcasts the result of a transformed SparkPlan.
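For illustration, here is a minimal, self-contained sketch (the object name and local-mode session are assumptions, not part of the original doc) that triggers a broadcast via the broadcast join hint; the physical plan printed by explain() should contain a BroadcastExchange node:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object BroadcastExchangeDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BroadcastExchangeDemo")
      .master("local[*]")
      .getOrCreate()

    val large = spark.range(0, 1000000).toDF("id")
    val small = spark.range(0, 100).toDF("id")

    // The broadcast hint asks the planner to collect the result of `small`
    // on the driver and broadcast it to all executors; the physical plan
    // should show a BroadcastExchange feeding a BroadcastHashJoin.
    large.join(broadcast(small), "id").explain()

    spark.stop()
  }
}
```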
Ensures that the Partitioning of input data meets the Distribution requirements for each operator by inserting ShuffleExchange operators where required. Also ensures that the input partition ordering requirements are met.
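As a concrete illustration (a sketch assuming a SparkSession bound to `spark`), an aggregation requires its input to be clustered by the grouping key; when the child's output partitioning does not satisfy that distribution, this rule inserts a shuffle that appears as an Exchange node in the explain output:

```scala
// The output of spark.range is not hash-partitioned by `key`, so the rule
// inserts an Exchange (hashpartitioning) between the partial and final
// aggregation steps.
val df = spark.range(0, 1000).selectExpr("id % 10 AS key", "id AS value")
df.groupBy("key").count().explain()
// Plan sketch: HashAggregate <- Exchange hashpartitioning(key#..., 200) <- HashAggregate <- Range
```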
Base class for operators that exchange data among multiple threads or processes.
Exchanges are the key class of operators that enable parallelism. Although the implementation differs significantly, the concept is similar to the exchange operator described in "Volcano -- An Extensible and Parallel Query Evaluation System" by Goetz Graefe.
A coordinator used to determine how we shuffle data between stages generated by Spark SQL. Right now, the work of this coordinator is to determine the number of post-shuffle partitions for a stage that needs to fetch shuffle data from one or multiple stages.
A coordinator is constructed with three parameters: numExchanges, targetPostShuffleInputSize, and minNumPostShufflePartitions.
- numExchanges indicates how many ShuffleExchanges will be registered to this coordinator, so that when the coordinator starts to do any actual work, it can verify that the expected number of ShuffleExchanges have registered.
- targetPostShuffleInputSize is the targeted size of a post-shuffle partition's input data. With this parameter, we can estimate the number of post-shuffle partitions. It is configured through spark.sql.adaptive.shuffle.targetPostShuffleInputSize.
- minNumPostShufflePartitions is an optional parameter. If it is defined, the coordinator will try to make sure that there are at least minNumPostShufflePartitions post-shuffle partitions.

The workflow of this coordinator is as follows:
- Before the execution of a SparkPlan, a ShuffleExchange operator that has an ExchangeCoordinator assigned registers itself with the coordinator. This happens in the doPrepare method.
- Once execution of the physical plan starts, a registered ShuffleExchange calls postShuffleRDD to get its corresponding post-shuffle ShuffledRowRDD. If the coordinator has already decided how to shuffle the data, the ShuffleExchange immediately receives its post-shuffle ShuffledRowRDD.
- If the coordinator has not yet made that decision, it asks the registered ShuffleExchanges to submit their pre-shuffle stages. Based on the size statistics of the pre-shuffle partitions, it then determines the number of post-shuffle partitions, packing multiple pre-shuffle partitions with continuous indices into a single post-shuffle partition when necessary.
- Finally, the coordinator creates post-shuffle ShuffledRowRDDs for all registered ShuffleExchanges, so that when a ShuffleExchange calls postShuffleRDD, the coordinator can look up the corresponding RDD.

The strategy used to determine the number of post-shuffle partitions is as follows. We have a target input size for a post-shuffle partition. Once we have size statistics for the pre-shuffle partitions from the stages corresponding to the registered ShuffleExchanges, we make a pass over those statistics and pack pre-shuffle partitions with continuous indices into a single post-shuffle partition until the size of that post-shuffle partition is equal to or greater than the target size. For example, suppose two stages have the following pre-shuffle partition size statistics:

stage 1: [100 MB, 20 MB, 100 MB, 10 MB, 30 MB]
stage 2: [10 MB, 10 MB, 70 MB, 5 MB, 5 MB]

Assuming the target input size is 128 MB, we will have three post-shuffle partitions:
- post-shuffle partition 0: pre-shuffle partitions 0 and 1 (size 140 MB)
- post-shuffle partition 1: pre-shuffle partition 2 (size 170 MB)
- post-shuffle partition 2: pre-shuffle partitions 3 and 4 (size 50 MB)
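The packing strategy above can be sketched as standalone Scala. This is an illustrative re-implementation, not Spark's internal code, and all names in it are hypothetical:

```scala
object CoalescePartitionsSketch {
  /** Returns the pre-shuffle partition index ranges [start, end) that make
    * up each post-shuffle partition. Pre-shuffle sizes are summed across
    * stages index by index, and partitions with continuous indices are
    * packed together until the running size reaches the target. */
  def pack(stageSizes: Seq[Array[Long]], targetSize: Long): Seq[(Int, Int)] = {
    val numPartitions = stageSizes.head.length
    // Combined input size of each pre-shuffle partition index across stages.
    val combined = (0 until numPartitions).map(i => stageSizes.map(_(i)).sum)

    val ranges = scala.collection.mutable.ArrayBuffer.empty[(Int, Int)]
    var start = 0
    var current = 0L
    for (i <- 0 until numPartitions) {
      current += combined(i)
      // Close the current post-shuffle partition once its size is equal to
      // or greater than the target.
      if (current >= targetSize) {
        ranges += ((start, i + 1))
        start = i + 1
        current = 0L
      }
    }
    // The trailing partitions form the last post-shuffle partition even if
    // they stay below the target.
    if (start < numPartitions) ranges += ((start, numPartitions))
    ranges.toSeq
  }

  def main(args: Array[String]): Unit = {
    val mb = 1024L * 1024L
    val stage1 = Array(100 * mb, 20 * mb, 100 * mb, 10 * mb, 30 * mb)
    val stage2 = Array(10 * mb, 10 * mb, 70 * mb, 5 * mb, 5 * mb)
    // Expected ranges: (0,2), (2,3), (3,5), i.e. the three post-shuffle
    // partitions in the example above.
    println(pack(Seq(stage1, stage2), 128 * mb))
  }
}
```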
Finds duplicated exchanges in the Spark plan and uses the same exchange for all the references.
A wrapper that gives a reused exchange its own output: two exchanges that produce logically identical output still have distinct sets of output attribute ids, and the original ids must be preserved because they are what downstream operators expect.
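As a hedged sketch (assuming a SparkSession `spark` and the default setting spark.sql.exchange.reuse=true), a self-join over the same aggregate produces two identical exchanges; the plan should keep one Exchange and show a ReusedExchange, with remapped attribute ids, for the other side:

```scala
val df = spark.range(0, 1000000).selectExpr("id % 100 AS key", "id AS value")
val agg = df.groupBy("key").count()

// Both join sides need the same shuffle of the same child plan; the rule
// keeps one Exchange and rewires the other side to a ReusedExchange whose
// output attribute ids match what that side's downstream operators expect.
agg.join(agg.withColumnRenamed("count", "count2"), "key").explain()
```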
Performs a shuffle that will result in the desired newPartitioning.
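For example (a sketch assuming a SparkSession `spark`), Dataset.repartition compiles to exactly such a shuffle, with HashPartitioning over the given expressions as the desired newPartitioning:

```scala
val df = spark.range(0, 1000).selectExpr("id % 10 AS key", "id AS value")
// Requests hash partitioning on `key` into 8 partitions; the physical plan
// shows a node like: Exchange hashpartitioning(key#..., 8)
df.repartition(8, df("key")).explain()
```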