org.apache.spark.sql.catalyst.plans.physical
Returns true iff we can say that the partitioning scheme of this Partitioning guarantees the same partitioning scheme described by other.
Compatibility of partitionings is only checked for operators that have multiple children and that require a specific child output Distribution, such as joins.
Intuitively, partitionings are compatible if they route the same partitioning key to the same partition. For instance, two hash partitionings are only compatible if they produce the same number of output partitions and hash records according to the same hash function and the same partitioning key schema.
Put another way, two partitionings are compatible with each other if they satisfy all of the same distribution guarantees.
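The compatibility rule for hash partitionings can be sketched as follows. This is a simplified, hypothetical model and not Spark's actual classes: the name ToyHashPartitioning, the keyTypes field, and the partitionFor helper are assumptions made for illustration only.

```scala
// Hypothetical toy model (not Spark's HashPartitioning).
// keyTypes stands in for the "partitioning key schema".
final case class ToyHashPartitioning(keyTypes: Seq[String], numPartitions: Int) {

  // Route a tuple of key values to a partition index in [0, numPartitions).
  // Every instance uses the same hash function (Seq.hashCode) by construction.
  def partitionFor(keyValues: Seq[Any]): Int =
    Math.floorMod(keyValues.hashCode(), numPartitions)

  // Compatible only when the key schema and the partition count both match,
  // so that equal keys are routed to the same partition index on both sides.
  def compatibleWith(other: ToyHashPartitioning): Boolean =
    keyTypes == other.keyTypes && numPartitions == other.numPartitions
}
```

In this sketch, two children of a join that are both hash-partitioned on an int key into 4 partitions are compatible, while a 4-partition child and an 8-partition child are not, because the same key may land in different partition indices.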
Returns the number of partitions that the data is split across.
Returns true iff the guarantees made by this Partitioning are sufficient to satisfy the partitioning scheme mandated by the required Distribution, i.e. the current dataset does not need to be re-partitioned for the required Distribution (it is possible that tuples within a partition need to be reorganized).
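To make the satisfies relation concrete, here is a minimal sketch under stated assumptions. The Toy* types below are hypothetical stand-ins for distributions and partitionings; the specific rules (for example, a single-partition hash partitioning satisfying AllTuples) are chosen for illustration and may differ from what Spark implements.

```scala
// Hypothetical toy model of distributions and partitionings (not Spark's classes).
sealed trait ToyDistribution
case object ToyUnspecified extends ToyDistribution                      // no requirement
case object ToyAllTuples extends ToyDistribution                        // everything in one partition
final case class ToyClustered(keys: Set[String]) extends ToyDistribution // equal keys co-located

sealed trait ToyPartitioning {
  def numPartitions: Int
  def satisfies(required: ToyDistribution): Boolean
}

case object ToySingle extends ToyPartitioning {
  val numPartitions = 1
  // With a single partition, all rows are co-located, so every requirement holds.
  def satisfies(required: ToyDistribution): Boolean = true
}

final case class ToyHash(keys: Set[String], numPartitions: Int) extends ToyPartitioning {
  def satisfies(required: ToyDistribution): Boolean = required match {
    case ToyUnspecified => true
    case ToyAllTuples   => numPartitions == 1
    // Rows that agree on all required keys also agree on any subset of them,
    // so hashing on a subset still places those rows in the same partition.
    case ToyClustered(requiredKeys) => keys.subsetOf(requiredKeys)
  }
}
```

For example, in this model ToyHash(Set("a"), 8) satisfies ToyClustered(Set("a", "b")) without a shuffle, whereas ToyHash(Set("a", "c"), 8) does not, because rows with equal a and b may differ on c.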
Returns true iff we can say that the partitioning scheme of this Partitioning guarantees
the same partitioning scheme described by other. If A.guarantees(B), then repartitioning
the child's output according to B will be unnecessary. guarantees is used as a performance
optimization to allow the exchange planner to avoid redundant repartitionings. By default,
a partitioning only guarantees partitionings that are equal to itself (i.e. the same number
of partitions, same strategy (range or hash), etc).
In order to enable more aggressive optimization, this strict equality check can be relaxed.
For example, say that the planner needs to repartition all of an operator's children so that
they satisfy the AllTuples distribution. One way to do this is to repartition all children
to have the SinglePartition partitioning. If one of the operator's children already happens
to be hash-partitioned with a single partition then we do not need to re-shuffle this child;
this repartitioning can be avoided if a single-partition HashPartitioning guarantees
SinglePartition.
The SinglePartition example given above is not particularly interesting; guarantees' real
value occurs for more advanced partitioning strategies. SPARK-7871 will introduce a notion
of null-safe partitionings, under which partitionings can specify whether rows whose
partitioning keys contain null values will be grouped into the same partition or whether they
will have an unknown / random distribution. If a partitioning does not require nulls to be
clustered, then a partitioning which _does_ cluster nulls will guarantee it. The converse is
not true, however: a partitioning which clusters nulls cannot be guaranteed by one which does
not cluster them. Thus, in general, guarantees is not a symmetric relation.
Another way to think about guarantees: if A.guarantees(B), then any partitioning of rows
produced by A could have also been produced by B.
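The default equality check and its two relaxations discussed above (single-partition hash versus SinglePartition, and null clustering) might be modeled as below. This is only a sketch: the Sketch* names and the clustersNulls flag are hypothetical, and the null-safe behavior is an assumption about what SPARK-7871 describes rather than an existing API.

```scala
// Hypothetical model of the guarantees relation (names are illustrative only).
sealed trait SketchPartitioning {
  def numPartitions: Int
  // Default: a partitioning only guarantees partitionings equal to itself.
  def guarantees(other: SketchPartitioning): Boolean = this == other
}

case object SketchSingle extends SketchPartitioning {
  val numPartitions = 1
}

// clustersNulls is an assumed flag: true means rows with null keys share a partition.
final case class SketchHash(keys: Seq[String], numPartitions: Int, clustersNulls: Boolean)
    extends SketchPartitioning {
  override def guarantees(other: SketchPartitioning): Boolean = other match {
    // A single-partition hash partitioning already holds all rows in one partition.
    case SketchSingle => numPartitions == 1
    // Same keys and partition count; clustering nulls is the stronger promise,
    // so it guarantees the weaker (non-clustering) variant but not vice versa.
    case SketchHash(k, n, otherClustersNulls) =>
      keys == k && numPartitions == n && (clustersNulls || !otherClustersNulls)
  }
}
```

With these definitions, a null-clustering hash partitioning guarantees its non-clustering counterpart, while the reverse check returns false, which illustrates why the relation is not symmetric.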
Describes how an operator's output is split across partitions. The compatibleWith, guarantees, and satisfies methods describe relationships between child partitionings, target partitionings, and Distributions. These relations are described more precisely in their individual method docs, but at a high level:

 - satisfies is a relationship between partitionings and distributions.
 - compatibleWith is a relationship between an operator's child output partitionings.
 - guarantees is a relationship between a child's existing output partitioning and a target output partitioning.

Diagrammatically:

           +--------------+
           | Distribution |
           +--------------+
                   ^
                   |
              satisfies
                   |
           +--------------+                  +--------------+
           |    Child     |                  |    Target    |
      +----| Partitioning |----guarantees--->| Partitioning |
      |    +--------------+                  +--------------+
      |            ^
      |            |
      |     compatibleWith
      |            |
      +------------+