encoding

Type Members

final class BigDictionaryDecoder extends BigDictionaryDecoderBase with NotNullDecoder
abstract class BigDictionaryDecoderBase extends DictionaryDecoderBase
final class BigDictionaryDecoderNullable extends BigDictionaryDecoderBase with NullableDecoder
final class BooleanBitSetDecoder extends BooleanBitSetDecoderBase with NotNullDecoder
abstract class BooleanBitSetDecoderBase extends ColumnDecoder with BooleanBitSetEncoding
final class BooleanBitSetDecoderNullable extends BooleanBitSetDecoderBase with NullableDecoder
final class BooleanBitSetEncoder extends NotNullEncoder with BooleanBitSetEncoderBase
trait BooleanBitSetEncoderBase extends ColumnEncoder with BooleanBitSetEncoding
final class BooleanBitSetEncoderNullable extends NullableEncoder with BooleanBitSetEncoderBase
trait BooleanBitSetEncoding extends ColumnEncoding
trait BooleanRunLengthDecoder extends RunLengthDecoding

Run length encoding optimized for booleans.
Run length encoding optimized for booleans. Each short run-length value represents the run-length with the value. Even numbered run-lengths indicate a run of false values having half the length, while odd numbered run-lengths are for true values (having length = run / 2 + 1).
trait BooleanRunLengthEncoder extends AnyRef
abstract class ColumnDecoder extends ColumnEncoding
final class ColumnDeleteDecoder extends AnyRef

Decodes the deleted positions of a batch that has seen some deletes.
final class ColumnDeleteDelta extends ColumnFormatValue with Delta

Simple delta that merges the deleted positions
final class ColumnDeleteEncoder extends ColumnEncoder

Currently just stores the deleted positions assuming sorted by position at plan level.
Currently just stores the deleted positions assuming sorted by position at plan level. This can be optimized to use a more efficient storage when number of positions is large large like a boolean bitset, or use a more comprehensive compression scheme like PFOR (https://github.com/lemire/JavaFastPFOR).
final class ColumnDeltaDecoder extends AnyRef

Internal class to decode values from a single delta as obtained from ColumnDeltaEncoder.
Internal class to decode values from a single delta as obtained from ColumnDeltaEncoder. Should not be used directly rather the combined decoder UpdatedColumnDecoder should be the one used.
final class ColumnDeltaEncoder extends ColumnEncoder

Encodes a delta value for a ColumnFormatValue obtained after an update operation that can change one or more values.
Encodes a delta value for a ColumnFormatValue obtained after an update operation that can change one or more values. This applies the update in an optimized batch manner as far as possible.
The format of delta encoding is straightforward and adds the positions in the full column in addition to the normal column encoding. So the layout looks like below:
```
 .----------------------- Base encoding scheme (4 bytes)
|    .------------------- Null bitset size as number of longs N (4 bytes)
|   |
|   |   .---------------- Null bitset longs (8 x N bytes,
|   |   |                                    empty if null count is zero)
|   |   |  .------------- Number of base column rows
|   |   |  |  .---------- Number of positions of delta updates
|   |   |  |  |  .------- Positions in full column value
|   |   |  |  |  |   .--- Encoded non-null elements
V   V   V  V  V  V   V
+---+---+--+--+--+---+--------------+
|   |   |  |  |  |   |   ...  ...   |
+---+---+--+----+-------------------+
 \-----/ \--------------------------/
  header           body
```
Whether the value type is a delta or not is determined by the "deltaHierarchy" field in ColumnFormatValue and the negative columnIndex in ColumnFormatKey. Encoding typeId itself does not store anything for it separately.
An alternative could be storing the position before each encoded element but it will not work properly for schemes like run-length encoding that will not write anything if elements are in that current run-length.
A set of new updated column values results in the merge of those values with the existing encoded values held in the current delta with smallest hierarchy depth (i.e. one that has a maximum size of 100). Each delta can grow to a limit after which it is subsumed in a larger delta of bigger size thus creating a hierarchy of deltas. So base delta will go till 100 entries or so, then the next higher level one will go till say 1000 entries and so on till the full ColumnFormatValue size is attained. This design attempts to minimize disk writes at the cost of some scan overhead for columns that see a large number of updates. The hierarchy is expected to be small not more than 3-4 levels to get a good balance between write overhead and scan overhead.
trait ColumnEncoder extends ColumnEncoding

trait ColumnEncoding extends AnyRef

Base class for encoding and decoding in columnar form.

Base class for encoding and decoding in columnar form. Memory layout of the bytes for a set of column values is:

 .----------------------- Encoding scheme (4 bytes)
|    .------------------- Null bitset size as number of longs N (4 bytes)
|   |
|   |   .---------------- Null bitset longs (8 x N bytes,
|   |   |                                    empty if null count is zero)
|   |   |     .---------- Encoded non-null elements
V   V   V     V
+---+---+-----+---------+
|   |   | ... | ... ... |
+---+---+-----+---------+
 \-----/ \-------------/
  header      body

case class ColumnStatsSchema(fieldName: String, dataType: DataType) extends Product with Serializable

Full stats row has "nullCount" as non-nullable while delta stats row has it as nullable.
abstract class DeltaWriter extends AnyRef

Trait to read column values from delta encoded column and write to target delta column.
Trait to read column values from delta encoded column and write to target delta column. The reads may not be sequential and could be random-access reads while writes will be sequential, sorted by position in the full column value. Implementations should not do any null value handling.
This uses a separate base class rather than a closure to avoid the overhead of boxing/unboxing with multi-argument closures (>2 arguments).
trait DeltaWriterFactory extends AnyRef

Factory for DeltaWriter used by code generation to enable using the simpler Janino "createFastEvaluator" API.
final class DictionaryDecoder extends DictionaryDecoderBase with NotNullDecoder
abstract class DictionaryDecoderBase extends ColumnDecoder with DictionaryEncoding
final class DictionaryDecoderNullable extends DictionaryDecoderBase with NullableDecoder
final class DictionaryEncoder extends NotNullEncoder with DictionaryEncoderBase
trait DictionaryEncoderBase extends ColumnEncoder with DictionaryEncoding
final class DictionaryEncoderNullable extends NullableEncoder with DictionaryEncoderBase
trait DictionaryEncoding extends ColumnEncoding
trait NotNullDecoder extends ColumnDecoder
trait NotNullEncoder extends ColumnEncoder
trait NullableDecoder extends ColumnDecoder

Nulls are stored either as a bitset or as a sequence of positions (or inverse sequence of missing positions) if the number of nulls is small (or non-nulls is small).
Nulls are stored either as a bitset or as a sequence of positions (or inverse sequence of missing positions) if the number of nulls is small (or non-nulls is small). This is indicated by the count field which is negative for the latter case. The decoder for latter keeps track of next null position and compares against that.
trait NullableEncoder extends NotNullEncoder
final class RunLengthDecoder extends RunLengthDecoderBase with NotNullDecoder
abstract class RunLengthDecoderBase extends ColumnDecoder with RunLengthEncoding
final class RunLengthDecoderNullable extends RunLengthDecoderBase with NullableDecoder
trait RunLengthDecoding extends AnyRef
trait RunLengthEncoding extends ColumnEncoding
final class StringDictionary extends AnyRef
trait Uncompressed extends ColumnEncoding
final class UncompressedDecoder extends UncompressedDecoderBase with NotNullDecoder
abstract class UncompressedDecoderBase extends ColumnDecoder with Uncompressed
final class UncompressedDecoderNullable extends UncompressedDecoderBase with NullableDecoder
final class UncompressedEncoder extends NotNullEncoder with UncompressedEncoderBase
trait UncompressedEncoderBase extends ColumnEncoder with Uncompressed
final class UncompressedEncoderNullable extends NullableEncoder with UncompressedEncoderBase
final class UpdatedColumnDecoder extends UpdatedColumnDecoderBase

Decodes a column of a batch that has seen some updates by combining all delta values, and full column value obtained from ColumnDeltaEncoders and column encoders.
Decodes a column of a batch that has seen some updates by combining all delta values, and full column value obtained from ColumnDeltaEncoders and column encoders. Callers should provide this with the set of all deltas for the column apart from the full column value.
To create an instance, use the companion class apply method which will create a nullable or non-nullable version as appropriate.
abstract class UpdatedColumnDecoderBase extends AnyRef
final class UpdatedColumnDecoderNullable extends UpdatedColumnDecoderBase

Nullable version of UpdatedColumnDecoder.

Value Members

object BitSet

Static methods for working with fixed-size bitsets stored elsewhere in a byte array or long array.
Static methods for working with fixed-size bitsets stored elsewhere in a byte array or long array. Similar to Spark's BitSetMethods but respects platform endian-ness so is suitable for storage.
object ColumnEncoding
object ColumnStatsSchema extends Serializable
object DeltaWriter extends Logging
object UpdatedColumnDecoder

package encoding

Type Members

final class BigDictionaryDecoder extends BigDictionaryDecoderBase with NotNullDecoder

abstract class BigDictionaryDecoderBase extends DictionaryDecoderBase

final class BigDictionaryDecoderNullable extends BigDictionaryDecoderBase with NullableDecoder

final class BooleanBitSetDecoder extends BooleanBitSetDecoderBase with NotNullDecoder

abstract class BooleanBitSetDecoderBase extends ColumnDecoder with BooleanBitSetEncoding

final class BooleanBitSetDecoderNullable extends BooleanBitSetDecoderBase with NullableDecoder

final class BooleanBitSetEncoder extends NotNullEncoder with BooleanBitSetEncoderBase

trait BooleanBitSetEncoderBase extends ColumnEncoder with BooleanBitSetEncoding

final class BooleanBitSetEncoderNullable extends NullableEncoder with BooleanBitSetEncoderBase

trait BooleanBitSetEncoding extends ColumnEncoding

trait BooleanRunLengthDecoder extends RunLengthDecoding

trait BooleanRunLengthEncoder extends AnyRef

abstract class ColumnDecoder extends ColumnEncoding

final class ColumnDeleteDecoder extends AnyRef

final class ColumnDeleteDelta extends ColumnFormatValue with Delta

final class ColumnDeleteEncoder extends ColumnEncoder

final class ColumnDeltaDecoder extends AnyRef

final class ColumnDeltaEncoder extends ColumnEncoder

trait ColumnEncoder extends ColumnEncoding

trait ColumnEncoding extends AnyRef

case class ColumnStatsSchema(fieldName: String, dataType: DataType) extends Product with Serializable

abstract class DeltaWriter extends AnyRef

trait DeltaWriterFactory extends AnyRef

final class DictionaryDecoder extends DictionaryDecoderBase with NotNullDecoder

abstract class DictionaryDecoderBase extends ColumnDecoder with DictionaryEncoding

final class DictionaryDecoderNullable extends DictionaryDecoderBase with NullableDecoder

final class DictionaryEncoder extends NotNullEncoder with DictionaryEncoderBase

trait DictionaryEncoderBase extends ColumnEncoder with DictionaryEncoding

final class DictionaryEncoderNullable extends NullableEncoder with DictionaryEncoderBase

trait DictionaryEncoding extends ColumnEncoding

trait NotNullDecoder extends ColumnDecoder

trait NotNullEncoder extends ColumnEncoder

trait NullableDecoder extends ColumnDecoder

trait NullableEncoder extends NotNullEncoder

final class RunLengthDecoder extends RunLengthDecoderBase with NotNullDecoder

abstract class RunLengthDecoderBase extends ColumnDecoder with RunLengthEncoding

final class RunLengthDecoderNullable extends RunLengthDecoderBase with NullableDecoder

trait RunLengthDecoding extends AnyRef

trait RunLengthEncoding extends ColumnEncoding

final class StringDictionary extends AnyRef

trait Uncompressed extends ColumnEncoding

final class UncompressedDecoder extends UncompressedDecoderBase with NotNullDecoder

abstract class UncompressedDecoderBase extends ColumnDecoder with Uncompressed

final class UncompressedDecoderNullable extends UncompressedDecoderBase with NullableDecoder

final class UncompressedEncoder extends NotNullEncoder with UncompressedEncoderBase

trait UncompressedEncoderBase extends ColumnEncoder with Uncompressed

final class UncompressedEncoderNullable extends NullableEncoder with UncompressedEncoderBase

final class UpdatedColumnDecoder extends UpdatedColumnDecoderBase

abstract class UpdatedColumnDecoderBase extends AnyRef

final class UpdatedColumnDecoderNullable extends UpdatedColumnDecoderBase

Value Members

object BitSet

object ColumnEncoding

object ColumnStatsSchema extends Serializable

object DeltaWriter extends Logging

object UpdatedColumnDecoder

Ungrouped