org.apache.spark.sql.catalyst.plans.logical
number of distinct values
minimum value
maximum value
number of nulls
average length of the values. For fixed-length types, this should be a constant.
maximum length of the values. For fixed-length types, this should be a constant.
average length of the values. For fixed-length types, this should be a constant.
number of distinct values
maximum value
maximum length of the values. For fixed-length types, this should be a constant.
minimum value
number of nulls
Returns a map from string to string that can be used to serialize the column stats. The key is the name of the field (e.g. "distinctCount" or "min"), and the value is the string representation for the value. The deserialization side is defined in ColumnStat.fromMap.
As part of the protocol, the returned map always contains a key called "version". If the min/max values are null (None), they do not appear in the map.
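The serialization contract described above can be sketched without a Spark dependency. The following is a minimal illustration, not Spark's implementation: `SimpleColumnStat` is a hypothetical stand-in class, the key names other than "distinctCount", "min" and "version" are assumed, the version value "1" is assumed, and min/max are assumed to be longs so that they can be parsed back.

```scala
// Hypothetical stand-in for illustration only; not Spark's ColumnStat.
final case class SimpleColumnStat(
    distinctCount: BigInt,
    min: Option[Any],
    max: Option[Any],
    nullCount: BigInt,
    avgLen: Long,
    maxLen: Long) {

  // Serialize to a string-to-string map. "version" is always present;
  // "min" and "max" are emitted only when the value is defined.
  def toMap: Map[String, String] = {
    val required = Map(
      "version" -> "1", // assumed version value
      "distinctCount" -> distinctCount.toString,
      "nullCount" -> nullCount.toString,
      "avgLen" -> avgLen.toString,
      "maxLen" -> maxLen.toString)
    val optional =
      min.map(v => "min" -> v.toString).toList ++
        max.map(v => "max" -> v.toString).toList
    required ++ optional
  }
}

object SimpleColumnStat {
  // Deserialize a map produced by toMap. Returns None if a required key
  // is missing or a value cannot be parsed. Assumes min/max are longs.
  def fromMap(map: Map[String, String]): Option[SimpleColumnStat] =
    scala.util.Try {
      SimpleColumnStat(
        distinctCount = BigInt(map("distinctCount")),
        min = map.get("min").map(_.toLong),
        max = map.get("max").map(_.toLong),
        nullCount = BigInt(map("nullCount")),
        avgLen = map("avgLen").toLong,
        maxLen = map("maxLen").toLong)
    }.toOption
}
```

In this sketch, a value with `min = None` produces a map with no "min" entry, and `fromMap` restores it as `None`. The real deserializer needs the column's data type to parse min/max, which this sketch sidesteps by assuming longs.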
Statistics collected for a column.
1. Supported data types are defined in ColumnStat.supportsType.
2. The JVM data type stored in min/max is the external data type (used in Row) for the corresponding Catalyst data type. For example, for DateType we store java.sql.Date, and for TimestampType we store java.sql.Timestamp.
3. Integral types are all upcast to longs, i.e. shorts are stored as longs.
4. There is no guarantee that the statistics collected are accurate. Approximation algorithms (sketches) might have been used, and the data collected can also be stale.
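The notes on how min/max values are stored can be illustrated with a small sketch. `normalizeStatValue` is a hypothetical helper, not part of Spark: it widens integral values to Long and passes other external values, such as java.sql.Date, through unchanged.

```scala
// Hypothetical helper for illustration only; not a Spark API.
// Integral values are widened to Long; everything else is kept as-is.
def normalizeStatValue(value: Any): Any = value match {
  case b: Byte  => b.toLong
  case s: Short => s.toLong
  case i: Int   => i.toLong
  case l: Long  => l
  case other    => other // e.g. java.sql.Date or java.sql.Timestamp
}

// A short is stored as a long.
val minAsLong = normalizeStatValue(3.toShort)

// A date stays in its external (Row) representation.
val minAsDate = normalizeStatValue(java.sql.Date.valueOf("2017-01-01"))
```

Widening to a single integral type means a consumer of min/max only has to handle Long for every integral column, at the cost of losing the original width.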