Object

org.apache.spark.sql.execution.SparkStrategies

JoinSelection

Related Doc: package SparkStrategies


object JoinSelection extends Strategy with PredicateHelper

Selects the proper physical plan for a join based on the joining keys and the size of the logical plan.

First, uses the ExtractEquiJoinKeys pattern to find joins where at least some of the predicates can be evaluated by matching join keys. If such a join is found, the join implementation is chosen with the following precedence:

- Broadcast: if one side of the join has an estimated physical size that is smaller than the user-configurable SQLConf.AUTO_BROADCASTJOIN_THRESHOLD threshold or if that side has an explicit broadcast hint (e.g. the user applied the org.apache.spark.sql.functions.broadcast() function to a DataFrame), then that side of the join will be broadcasted and the other side will be streamed, with no shuffling performed. If both sides of the join are eligible to be broadcasted then the
- Shuffle hash join: if the average size of a single partition is small enough to build a hash table.
- Sort merge: if the matching join keys are sortable.

If there are no joining keys, the join implementation is chosen with the following precedence:

- BroadcastNestedLoopJoin: if one side of the join could be broadcasted
- CartesianProduct: for Inner join
- BroadcastNestedLoopJoin
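The two precedence lists above can be read as a small decision function. The following is a minimal sketch of that reading in plain Scala, not Spark's implementation: the types `Side` and `JoinInfo`, the object `JoinPrecedenceSketch`, and all field names are hypothetical, the join kinds are named after the bullets above, and details the description does not mention (such as which side is built or restrictions by join type) are ignored. Falling back to the no-key rules when no keyed option applies is an assumption of this sketch.

```scala
// Hypothetical model of the precedence described above; not Spark's code.
sealed trait JoinImpl
case object Broadcast extends JoinImpl
case object ShuffleHashJoin extends JoinImpl
case object SortMerge extends JoinImpl
case object BroadcastNestedLoopJoin extends JoinImpl
case object CartesianProduct extends JoinImpl

// Hypothetical summary of one side of a join.
final case class Side(sizeInBytes: Long, broadcastHint: Boolean = false)

// Hypothetical summary of the facts the description relies on.
final case class JoinInfo(
    left: Side,
    right: Side,
    hasJoinKeys: Boolean,     // equi-join keys were extracted
    keysSortable: Boolean,    // the matching join keys are sortable
    isInner: Boolean,         // the join type is Inner
    smallPartitions: Boolean) // a single partition is small enough for a hash table

object JoinPrecedenceSketch {
  // A side qualifies if it carries a hint or is smaller than the threshold.
  def canBroadcast(side: Side, threshold: Long): Boolean =
    side.broadcastHint || side.sizeInBytes < threshold

  // Precedence when there are no joining keys.
  private def withoutKeys(j: JoinInfo, anyBroadcast: Boolean): JoinImpl =
    if (anyBroadcast) BroadcastNestedLoopJoin
    else if (j.isInner) CartesianProduct
    else BroadcastNestedLoopJoin

  def choose(j: JoinInfo, threshold: Long): JoinImpl = {
    val anyBroadcast =
      canBroadcast(j.left, threshold) || canBroadcast(j.right, threshold)
    if (!j.hasJoinKeys) withoutKeys(j, anyBroadcast)
    else if (anyBroadcast) Broadcast
    else if (j.smallPartitions) ShuffleHashJoin
    else if (j.keysSortable) SortMerge
    else withoutKeys(j, anyBroadcast) // assumption: fall back to the no-key rules
  }
}
```

For instance, under these assumptions, `JoinPrecedenceSketch.choose(JoinInfo(Side(100L), Side(10000000L), hasJoinKeys = true, keysSortable = true, isInner = true, smallPartitions = false), 1024L)` returns `Broadcast`, because the left side is below the threshold.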

Linear Supertypes
Inherited
  1. JoinSelection
  2. PredicateHelper
  3. SparkStrategy
  4. GenericStrategy
  5. Logging
  6. AnyRef
  7. Any

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def apply(plan: LogicalPlan): Seq[SparkPlan]

    Definition Classes
    JoinSelection → GenericStrategy
  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. def canEvaluate(expr: Expression, plan: LogicalPlan): Boolean

    Returns true if expr can be evaluated using only the output of plan. This method can be used to determine when it is acceptable to move expression evaluation within a query plan.

    For example, consider a join between two relations R(a, b) and S(c, d).

    - canEvaluate(EqualTo(a,b), R) returns true
    - canEvaluate(EqualTo(a,c), R) returns false
    - canEvaluate(Literal(1), R) returns true as literals CAN be evaluated on any plan
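    The three cases above can be reproduced with a small stand-alone model. This is a sketch under the assumption that "can be evaluated using only the output of plan" means every attribute referenced by the expression appears in the plan's output; the types `Expr`, `Attr`, `Lit`, `Relation` and the object `CanEvaluateSketch` are hypothetical stand-ins, not Catalyst classes.

    ```scala
    // Hypothetical mini expression tree; each node reports the attributes it references.
    sealed trait Expr { def references: Set[String] }
    final case class Attr(name: String) extends Expr {
      def references: Set[String] = Set(name)
    }
    final case class Lit(value: Any) extends Expr {
      def references: Set[String] = Set.empty // a literal references no attributes
    }
    final case class EqualTo(left: Expr, right: Expr) extends Expr {
      def references: Set[String] = left.references ++ right.references
    }

    // Hypothetical relation: a name plus the set of attributes it outputs.
    final case class Relation(name: String, output: Set[String])

    object CanEvaluateSketch {
      // True when every referenced attribute is produced by the plan.
      def canEvaluate(expr: Expr, plan: Relation): Boolean =
        expr.references.subsetOf(plan.output)
    }
    ```

    With `val r = Relation("R", Set("a", "b"))`, the call `CanEvaluateSketch.canEvaluate(EqualTo(Attr("a"), Attr("c")), r)` is false because `c` is not part of R's output, while a literal passes because its reference set is empty.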

    Attributes
    protected
    Definition Classes
    PredicateHelper
  7. def canEvaluateWithinJoin(expr: Expression): Boolean

    Returns true iff expr could be evaluated as a condition within join.

    Attributes
    protected
    Definition Classes
    PredicateHelper
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  14. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean = false): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  15. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  16. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  17. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  18. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  19. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  20. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  21. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  22. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  23. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  24. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  25. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  26. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  27. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  28. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  29. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  30. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  31. final def notify(): Unit

    Definition Classes
    AnyRef
  32. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  33. def planLater(plan: LogicalPlan): SparkPlan

    Returns a placeholder for a physical plan that executes plan. This placeholder will be filled in automatically by the QueryPlanner using the other execution strategies that are available.

    Attributes
    protected
    Definition Classes
    SparkStrategy → GenericStrategy
  34. def replaceAlias(condition: Expression, aliases: AttributeMap[Expression]): Expression

    Attributes
    protected
    Definition Classes
    PredicateHelper
  35. def splitConjunctivePredicates(condition: Expression): Seq[Expression]

    Attributes
    protected
    Definition Classes
    PredicateHelper
  36. def splitDisjunctivePredicates(condition: Expression): Seq[Expression]

    Attributes
    protected
    Definition Classes
    PredicateHelper
  37. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  38. def toString(): String

    Definition Classes
    AnyRef → Any
  39. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  40. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  41. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Inherited from PredicateHelper

Inherited from SparkStrategy

Inherited from GenericStrategy[SparkPlan]

Inherited from internal.Logging

Inherited from AnyRef

Inherited from Any
