Class

org.apache.spark.sql.execution

StratifiedSamplerErrorLimit

Related Doc: package execution

Permalink

final class StratifiedSamplerErrorLimit extends StratifiedSampler with CastLongTime

A stratified sampling implementation that uses an error limit with confidence on a numerical column to sample as much as required to maintaining the expected error within the limit. An optional initial cache size can be specified that is used as the initial reservoir size per stratum for reservoir sampling. The error limit is attempted to be honoured for each of the stratum independently and the sampling rate increased or decreased accordingly. It uses standard closed form estimation of the sampling error increasing or decreasing the sampling as required (and expanding the cache size for bigger reservoir if required in next rounds).

Linear Supertypes
CastLongTime, StratifiedSampler, Logging, Cloneable, Cloneable, Serializable, Serializable, AnyRef, Any
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. StratifiedSamplerErrorLimit
  2. CastLongTime
  3. StratifiedSampler
  4. Logging
  5. Cloneable
  6. Cloneable
  7. Serializable
  8. Serializable
  9. AnyRef
  10. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. All

Instance Constructors

  1. new StratifiedSamplerErrorLimit(_options: SampleOptions)

    Permalink

Type Members

  1. type ReservoirSegment = SegmentMap[Row, StratumReservoir]

    Permalink
    Definition Classes
    StratifiedSampler
  2. final class RowWithWeight extends Row

    Permalink
    Attributes
    protected
    Definition Classes
    StratifiedSampler

Value Members

  1. final def !=(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Permalink
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  4. def append[U](rows: Iterator[Row], init: U, processFlush: (U, InternalRow) ⇒ U, startBatch: (U, Int) ⇒ U, endBatch: (U) ⇒ U, rowEncoder: ExpressionEncoder[Row], partIndex: Int): Long

    Permalink
  5. final def asInstanceOf[T0]: T0

    Permalink
    Definition Classes
    Any
  6. final val castType: Int

    Permalink

    Store type of column once to avoid checking for every row at runtime

    Store type of column once to avoid checking for every row at runtime

    Attributes
    protected
    Definition Classes
    CastLongTime
  7. def checkCacheFlush(force: Boolean): Boolean

    Permalink

    Check whether the cache needs to be flushed.

    Check whether the cache needs to be flushed. This should be invoked whenever there is a potential significant increase in memory consumption

    returns

    java.lang.Boolean.TRUE if cache needs to be flushed and fully reset, java.lang.Boolean.FALSE if cache needs to be flushed but no full reset, and null if cache does not need to be flushed

  8. def clone(): StratifiedSamplerErrorLimit

    Permalink
    Definition Classes
    StratifiedSamplerErrorLimit → AnyRef
  9. def columnBatchSize: Int

    Permalink
  10. final def concurrency: Int

    Permalink
    Definition Classes
    StratifiedSampler
  11. final def eq(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  12. def equals(arg0: Any): Boolean

    Permalink
    Definition Classes
    AnyRef → Any
  13. def errorLimitColumn: Int

    Permalink
  14. def errorLimitPercent: Double

    Permalink
  15. def finalize(): Unit

    Permalink
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  16. def flushReservoir[U](init: U, process: (U, InternalRow) ⇒ U, startBatch: (U, Int) ⇒ U, endBatch: (U) ⇒ U): U

    Permalink
  17. final def foldDrainSegment[U](prevReservoirSize: Int, fullReset: Boolean, process: (U, InternalRow) ⇒ U)(init: U, seg: ReservoirSegment): U

    Permalink
    Attributes
    protected
    Definition Classes
    StratifiedSampler
  18. final def foldReservoir[U](prevReservoirSize: Int, doReset: Boolean, fullReset: Boolean, process: (U, InternalRow) ⇒ U)(bid: Int, sr: StratumReservoir, init: U): U

    Permalink
    Attributes
    protected
    Definition Classes
    StratifiedSampler
  19. def getBucketId(partIndex: Int, primaryBucketIds: IntArrayList = null)(hashValue: Int): Int

    Permalink
    Attributes
    protected
    Definition Classes
    StratifiedSampler
  20. final def getClass(): Class[_]

    Permalink
    Definition Classes
    AnyRef → Any
  21. def getNullMillis(getDefaultForNull: Boolean): Long

    Permalink
    Attributes
    protected
    Definition Classes
    CastLongTime
  22. def getReservoirSegment(newQcs: Array[Int], types: Array[DataType], numColumns: Int, initialCapacity: Int, loadFactor: Double, qcsColHandler: Option[ColumnHandler], segi: Int, nsegs: Int): ReservoirSegment

    Permalink
    Attributes
    protected
    Definition Classes
    StratifiedSampler
  23. def hashCode(): Int

    Permalink
    Definition Classes
    AnyRef → Any
  24. def initCacheSize: Int

    Permalink
  25. def initializeLogIfNecessary(): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  26. def isBucketLocal(partIndex: Int): Boolean

    Permalink
    Attributes
    protected
    Definition Classes
    StratifiedSampler
  27. final def isDebugEnabled: Boolean

    Permalink
    Definition Classes
    Logging
  28. final def isInfoEnabled: Boolean

    Permalink
    Definition Classes
    Logging
  29. final def isInstanceOf[T0]: Boolean

    Permalink
    Definition Classes
    Any
  30. final def isTraceEnabled: Boolean

    Permalink
    Definition Classes
    Logging
  31. def iterator(segmentStart: Int, segmentEnd: Int): Iterator[InternalRow]

    Permalink
    Definition Classes
    StratifiedSampler
  32. def iteratorOnRegion(buckets: Set[Integer]): Iterator[InternalRow]

    Permalink
    Definition Classes
    StratifiedSampler
  33. final var levelFlags: Int

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  34. def log: Logger

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  35. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Permalink
    Definition Classes
    Logging
  36. def logDebug(msg: ⇒ String): Unit

    Permalink
    Definition Classes
    Logging
  37. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Permalink
    Definition Classes
    Logging
  38. def logError(msg: ⇒ String): Unit

    Permalink
    Definition Classes
    Logging
  39. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Permalink
    Definition Classes
    Logging
  40. def logInfo(msg: ⇒ String): Unit

    Permalink
    Definition Classes
    Logging
  41. def logName: String

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  42. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Permalink
    Definition Classes
    Logging
  43. def logTrace(msg: ⇒ String): Unit

    Permalink
    Definition Classes
    Logging
  44. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Permalink
    Definition Classes
    Logging
  45. def logWarning(msg: ⇒ String): Unit

    Permalink
    Definition Classes
    Logging
  46. final var log_: Logger

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  47. def module: String

    Permalink
    Definition Classes
    StratifiedSampler
  48. final def name: String

    Permalink
    Definition Classes
    StratifiedSampler
  49. final def ne(arg0: AnyRef): Boolean

    Permalink
    Definition Classes
    AnyRef
  50. final def newMutableRow(row: Row, rowEncoder: ExpressionEncoder[Row]): UnsafeRow

    Permalink
    Attributes
    protected
    Definition Classes
    StratifiedSampler
  51. final def notify(): Unit

    Permalink
    Definition Classes
    AnyRef
  52. final def notifyAll(): Unit

    Permalink
    Definition Classes
    AnyRef
  53. def onTruncate(): Unit

    Permalink
  54. final val options: SampleOptions

    Permalink
    Definition Classes
    StratifiedSampler
  55. final def parseMillis(row: Row, timeCol: Int, getDefaultForNull: Boolean = false): Long

    Permalink
    Definition Classes
    CastLongTime
  56. final def parseMillisFromAny(ts: Any): Long

    Permalink
    Attributes
    protected
    Definition Classes
    CastLongTime
  57. final val pendingBatch: AtomicReference[ArrayBuffer[InternalRow]]

    Permalink

    Store pending values to be flushed in a separate buffer so that we do not end up creating too small ColumnBatches.

    Store pending values to be flushed in a separate buffer so that we do not end up creating too small ColumnBatches.

    Note that this mini-cache is copy-on-write (to avoid copy-on-read for readers) so the buffer inside should never be changed rather the whole buffer replaced if required. This should happen only inside flushCache.

    Attributes
    protected
    Definition Classes
    StratifiedSampler
  58. final def qcs: Array[Int]

    Permalink
    Definition Classes
    StratifiedSampler
  59. final def qcsSparkPlan: Option[(CodeAndComment, ArrayBuffer[Any], Int, Array[DataType])]

    Permalink
    Definition Classes
    StratifiedSampler
  60. def reservoirInRegion: Boolean

    Permalink
    Definition Classes
    StratifiedSampler
  61. def resetLogger(): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    Logging
  62. final val rng: Random

    Permalink

    Random number generator for sampling.

    Random number generator for sampling.

    Attributes
    protected
    Definition Classes
    StratifiedSampler
  63. def sample(items: Iterator[InternalRow], rowEncoder: ExpressionEncoder[Row], flush: Boolean): Iterator[InternalRow]

    Permalink
  64. final def schema: StructType

    Permalink
    Definition Classes
    StratifiedSampler
  65. def setFlushStatus(doFlush: Boolean): Unit

    Permalink
    Definition Classes
    StratifiedSampler
  66. final val strata: ConcurrentSegmentedHashMap[Row, StratumReservoir, ReservoirSegment]

    Permalink

    Map of each stratum key (i.e.

    Map of each stratum key (i.e. a unique combination of values of columns in qcs) to related metadata and reservoir

    Attributes
    protected
    Definition Classes
    StratifiedSampler
  67. def strataReservoirSize: Int

    Permalink

    not used for this implementation so return init size

    not used for this implementation so return init size

    Attributes
    protected
    Definition Classes
    StratifiedSamplerErrorLimitStratifiedSampler
  68. final def synchronized[T0](arg0: ⇒ T0): T0

    Permalink
    Definition Classes
    AnyRef
  69. def timeColumnType: Option[DataType]

    Permalink
  70. def timeInterval: Long

    Permalink
  71. def timeSeriesColumn: Int

    Permalink
  72. def toString(): String

    Permalink
    Definition Classes
    AnyRef → Any
  73. final def wait(): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  74. final def wait(arg0: Long, arg1: Int): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  75. final def wait(arg0: Long): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  76. def waitForSamplers(waitUntil: Int, maxMillis: Long): Unit

    Permalink
    Attributes
    protected
    Definition Classes
    StratifiedSampler

Inherited from CastLongTime

Inherited from StratifiedSampler

Inherited from Logging

Inherited from Cloneable

Inherited from Cloneable

Inherited from Serializable

Inherited from Serializable

Inherited from AnyRef

Inherited from Any

Ungrouped