Estimates partition start indices for post-shuffle partitions based on mapOutputStatistics provided by all pre-shuffle stages.
Registers a ShuffleExchange operator with this coordinator. This method may only be called from the doPrepare method of a ShuffleExchange operator.
A coordinator used to determine how data is shuffled between stages generated by Spark SQL. Right now, the work of this coordinator is to determine the number of post-shuffle partitions for a stage that needs to fetch shuffle data from one or multiple stages.
A coordinator is constructed with three parameters: numExchanges, targetPostShuffleInputSize, and minNumPostShufflePartitions.
- numExchanges indicates how many ShuffleExchanges will be registered to this coordinator, so that when we start to do any actual work we can check that we have the expected number of ShuffleExchanges.
- targetPostShuffleInputSize is the targeted input data size of a post-shuffle partition. With this parameter, we can estimate the number of post-shuffle partitions. It is configured through spark.sql.adaptive.shuffle.targetPostShuffleInputSize (see the configuration sketch after this list).
- minNumPostShufflePartitions is an optional parameter. If it is defined, this coordinator will try to make sure that there are at least minNumPostShufflePartitions post-shuffle partitions.
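For illustration only, the following sketch shows one way a session might be configured so that shuffles are planned through this coordinator. It is not taken from this documentation: spark.sql.adaptive.enabled is assumed to be the switch that activates this machinery, and the application name, master, and query are arbitrary; only spark.sql.adaptive.shuffle.targetPostShuffleInputSize is named above.

  import org.apache.spark.sql.SparkSession

  object CoordinatorConfigSketch {
    def main(args: Array[String]): Unit = {
      // Assumed setup: enable adaptive execution and set the target post-shuffle
      // input size described above (here 64 MB).
      val spark = SparkSession.builder()
        .appName("exchange-coordinator-config-sketch")
        .master("local[*]")
        .config("spark.sql.adaptive.enabled", "true")
        .config("spark.sql.adaptive.shuffle.targetPostShuffleInputSize", 64L * 1024 * 1024)
        .getOrCreate()

      // A shuffle-introducing query: the aggregation below inserts a ShuffleExchange,
      // which is the kind of operator that registers itself with the coordinator.
      spark.range(0, 1000000)
        .selectExpr("id % 100 AS bucket")
        .groupBy("bucket")
        .count()
        .show(5)

      spark.stop()
    }
  }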
The workflow of this coordinator is described as follows:
- Before the execution of a SparkPlan, a ShuffleExchange operator that has this coordinator assigned registers itself with the coordinator. This happens in the doPrepare method.
- Once we start to execute the physical plan, a registered ShuffleExchange will call postShuffleRDD to get its corresponding post-shuffle ShuffledRowRDD. If this coordinator has already made the decision on how to shuffle data, this ShuffleExchange will immediately get its corresponding post-shuffle ShuffledRowRDD.
- If the decision has not been made yet, this coordinator estimates the post-shuffle partitioning from the map output statistics of all registered pre-shuffle stages and creates the post-shuffle ShuffledRowRDDs for all registered ShuffleExchanges. So, when a ShuffleExchange calls postShuffleRDD, this coordinator can look up the corresponding RDD (see the simplified sketch after this list).
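The following is a heavily simplified, hypothetical model of that workflow, written only to illustrate the register-then-ask protocol and the fact that the decision is made once and then cached; SimpleCoordinator, Stats, and the string ids are stand-ins and do not mirror Spark's internal ExchangeCoordinator, ShuffleExchange, or MapOutputStatistics types.

  import scala.collection.mutable

  // Hypothetical stand-in for the size statistics reported by a pre-shuffle stage.
  final case class Stats(bytesByPartition: Array[Long])

  final class SimpleCoordinator(numExchanges: Int, targetSize: Long) {
    private val exchanges = mutable.ArrayBuffer.empty[String]   // registered exchange ids
    private var decided: Option[Array[Int]] = None              // cached partition start indices

    // Mirrors the registration step that happens in doPrepare: it must run before
    // any exchange asks for its post-shuffle partitioning.
    def registerExchange(id: String): Unit = exchanges += id

    // Mirrors the postShuffleRDD step: the first caller triggers the decision,
    // later callers simply reuse the cached result.
    def postShufflePartitioning(statsOfAllStages: => Seq[Stats]): Array[Int] = {
      if (decided.isEmpty) {
        require(exchanges.size == numExchanges, "not all exchanges are registered yet")
        decided = Some(pack(statsOfAllStages, targetSize))
      }
      decided.get
    }

    // Packing of pre-shuffle partitions with continuous indices; see the standalone
    // packing sketch at the end of this section for the full strategy. Here everything
    // is packed into a single post-shuffle partition to keep the model minimal.
    private def pack(stats: Seq[Stats], target: Long): Array[Int] = Array(0)
  }

  // Usage: two exchanges register, then both ask for the post-shuffle partitioning.
  object WorkflowSketch extends App {
    val coordinator = new SimpleCoordinator(numExchanges = 2, targetSize = 128L * 1024 * 1024)
    coordinator.registerExchange("exchange-1")
    coordinator.registerExchange("exchange-2")
    val stats = Seq(Stats(Array(110L, 30L, 170L, 15L, 35L)))
    println(coordinator.postShufflePartitioning(stats).mkString(", "))  // decision made once, then cached
  }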
The strategy used to determine the number of post-shuffle partitions is described as follows. To determine the number of post-shuffle partitions, we have a target input size for a post-shuffle partition. Once we have the size statistics of pre-shuffle partitions from the stages corresponding to the registered ShuffleExchanges, we do a pass over those statistics and pack pre-shuffle partitions with continuous indices into a single post-shuffle partition until the size of a post-shuffle partition is equal to or greater than the target size (the size of a pre-shuffle partition here is the sum of that partition's sizes across all pre-shuffle stages).
For example, say we have two stages with the following pre-shuffle partition size statistics:
stage 1: [100 MB, 20 MB, 100 MB, 10 MB, 30 MB]
stage 2: [10 MB, 10 MB, 70 MB, 5 MB, 5 MB]
Assuming the target input size is 128 MB, we will have three post-shuffle partitions, which are:
- post-shuffle partition 0: pre-shuffle partitions 0 and 1 (110 MB + 30 MB = 140 MB)
- post-shuffle partition 1: pre-shuffle partition 2 (170 MB)
- post-shuffle partition 2: pre-shuffle partitions 3 and 4 (15 MB + 35 MB = 50 MB)
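To make the packing concrete, here is a minimal standalone sketch of that strategy under the assumptions stated above; it is not Spark's estimatePartitionStartIndices implementation, just the "pack continuous indices until the target size is reached" rule applied to the example statistics.

  object PackingSketch {
    // Sketch of the strategy described above: sum each pre-shuffle partition's size
    // across all stages, then pack continuous indices into one post-shuffle partition
    // until its size is equal to or greater than the target.
    def estimateStartIndices(bytesByPartitionPerStage: Seq[Array[Long]], target: Long): Array[Int] = {
      val numPartitions = bytesByPartitionPerStage.head.length
      val totals = Array.tabulate(numPartitions)(i => bytesByPartitionPerStage.map(_(i)).sum)
      val starts = scala.collection.mutable.ArrayBuffer(0)
      var currentSize = 0L
      for (i <- 0 until numPartitions) {
        if (currentSize >= target) {  // current post-shuffle partition is already big enough
          starts += i                 // start a new post-shuffle partition at index i
          currentSize = 0L
        }
        currentSize += totals(i)
      }
      starts.toArray
    }

    def main(args: Array[String]): Unit = {
      val mb = 1024L * 1024L
      val stage1 = Array(100L, 20L, 100L, 10L, 30L).map(_ * mb)
      val stage2 = Array(10L, 10L, 70L, 5L, 5L).map(_ * mb)
      val starts = estimateStartIndices(Seq(stage1, stage2), 128L * mb)
      // Prints "0, 2, 3": post-shuffle partitions [0, 1], [2], and [3, 4], matching the
      // three partitions in the example above.
      println(starts.mkString(", "))
    }
  }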