Class

org.apache.spark.sql.execution

StratifiedSamplerCached


class StratifiedSamplerCached extends StratifiedSampler with CastLongTime

A stratified sampling implementation that uses a fraction and an initial cache size. The latter is used as the initial reservoir size per stratum for reservoir sampling. The sampler primarily tries to satisfy the fraction of the total data, repeatedly filling up the cache as required (and expanding the cache size to allow a bigger reservoir in subsequent rounds if required). While satisfying the fraction, it tries to divide the selected rows equally among the current strata (that is, those strata that have received any rows).
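
The per-stratum reservoir idea described above can be sketched as follows. This is a minimal illustration only, not the code of StratifiedSamplerCached: the object name StratifiedReservoirSketch, the sample method, and the stratumOf function are hypothetical, and the sketch uses a fixed capacity per stratum (classic reservoir sampling, Algorithm R) without the fraction-driven cache expansion that the class description mentions.

```scala
import scala.collection.mutable
import scala.util.Random

// Hypothetical sketch: NOT the StratifiedSamplerCached implementation.
object StratifiedReservoirSketch {

  /** Keeps at most `capacity` items per stratum using reservoir sampling. */
  def sample[K, V](items: Iterator[V], capacity: Int, rng: Random)(
      stratumOf: V => K): Map[K, Vector[V]] = {
    require(capacity > 0, "capacity must be positive")
    // One reservoir and one "rows seen" counter per stratum key.
    val reservoirs = mutable.Map.empty[K, mutable.ArrayBuffer[V]]
    val seen = mutable.Map.empty[K, Long].withDefaultValue(0L)

    items.foreach { item =>
      val key = stratumOf(item)
      val n = seen(key) + 1
      seen(key) = n
      val reservoir =
        reservoirs.getOrElseUpdate(key, mutable.ArrayBuffer.empty[V])
      if (reservoir.size < capacity) {
        // Reservoir not yet full: always keep the row.
        reservoir += item
      } else {
        // Reservoir full: pick a slot index uniformly in [0, n) and
        // overwrite only if it falls inside the reservoir, which
        // happens with probability capacity / n.
        val j = (rng.nextDouble() * n).toLong
        if (j < capacity) reservoir(j.toInt) = item
      }
    }
    reservoirs.map { case (k, buf) => k -> buf.toVector }.toMap
  }
}
```

Because every stratum gets its own reservoir of the same capacity, a rare stratum ends up with as many sampled rows as a common one (as long as it received at least that many rows), which is the "equally divided" property described above.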

Linear Supertypes
CastLongTime, StratifiedSampler, Logging, Cloneable, Cloneable, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new StratifiedSamplerCached(_options: SampleOptions)


Type Members

  1. final class ProcessRows[U] extends ChangeValue[Row, StratumReservoir]

    Attributes
    protected
  2. type ReservoirSegment = SegmentMap[Row, StratumReservoir]

    Definition Classes
    StratifiedSampler
  3. final class RowWithWeight extends Row

    Attributes
    protected
    Definition Classes
    StratifiedSampler

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def append[U](rows: Iterator[Row], init: U, processFlush: (U, InternalRow) ⇒ U, startBatch: (U, Int) ⇒ U, endBatch: (U) ⇒ U, rowEncoder: ExpressionEncoder[Row], partIndex: Int): Long

  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. final val castType: Int


    Stores the type of the column once to avoid checking it for every row at runtime.

    Attributes
    protected
    Definition Classes
    CastLongTime
  7. def clone(): StratifiedSamplerCached

    Definition Classes
    StratifiedSamplerCached → AnyRef
  8. def columnBatchSize: Int

  9. final def concurrency: Int

    Definition Classes
    StratifiedSampler
  10. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  12. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. val flushInProgress: AtomicBoolean

    Attributes
    protected
  14. def flushReservoir[U](init: U, process: (U, InternalRow) ⇒ U, startBatch: (U, Int) ⇒ U, endBatch: (U) ⇒ U): U

  15. final def foldDrainSegment[U](prevReservoirSize: Int, fullReset: Boolean, process: (U, InternalRow) ⇒ U)(init: U, seg: ReservoirSegment): U

    Attributes
    protected
    Definition Classes
    StratifiedSampler
  16. final def foldReservoir[U](prevReservoirSize: Int, doReset: Boolean, fullReset: Boolean, process: (U, InternalRow) ⇒ U)(bid: Int, sr: StratumReservoir, init: U): U

    Attributes
    protected
    Definition Classes
    StratifiedSampler
  17. val fraction: Double

  18. def getBucketId(partIndex: Int, primaryBucketIds: IntArrayList = null)(hashValue: Int): Int

    Attributes
    protected
    Definition Classes
    StratifiedSampler
  19. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  20. def getNullMillis(getDefaultForNull: Boolean): Long

    Attributes
    protected
    Definition Classes
    CastLongTime
  21. def getPrimaryBucketArray(partIndex: Int): IntArrayList

    Attributes
    protected
  22. def getReservoirSegment(newQcs: Array[Int], types: Array[DataType], numColumns: Int, initialCapacity: Int, loadFactor: Double, qcsColHandler: Option[ColumnHandler], segi: Int, nsegs: Int): ReservoirSegment

    Attributes
    protected
    Definition Classes
    StratifiedSampler
  23. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  24. def initializeLogIfNecessary(): Unit

    Attributes
    protected
    Definition Classes
    Logging
  25. def isBucketLocal(partIndex: Int): Boolean

    Attributes
    protected
    Definition Classes
    StratifiedSampler
  26. final def isDebugEnabled: Boolean

    Definition Classes
    Logging
  27. final def isInfoEnabled: Boolean

    Definition Classes
    Logging
  28. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  29. final def isTraceEnabled: Boolean

    Definition Classes
    Logging
  30. def iterator(segmentStart: Int, segmentEnd: Int): Iterator[InternalRow]

    Definition Classes
    StratifiedSampler
  31. def iteratorOnRegion(buckets: Set[Integer]): Iterator[InternalRow]

    Definition Classes
    StratifiedSampler
  32. final var levelFlags: Int

    Attributes
    protected
    Definition Classes
    Logging
  33. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  34. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Definition Classes
    Logging
  35. def logDebug(msg: ⇒ String): Unit

    Definition Classes
    Logging
  36. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Definition Classes
    Logging
  37. def logError(msg: ⇒ String): Unit

    Definition Classes
    Logging
  38. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Definition Classes
    Logging
  39. def logInfo(msg: ⇒ String): Unit

    Definition Classes
    Logging
  40. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  41. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Definition Classes
    Logging
  42. def logTrace(msg: ⇒ String): Unit

    Definition Classes
    Logging
  43. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Definition Classes
    Logging
  44. def logWarning(msg: ⇒ String): Unit

    Definition Classes
    Logging
  45. final var log_: Logger

    Attributes
    protected
    Definition Classes
    Logging
  46. def module: String

    Definition Classes
    StratifiedSampler
  47. final def name: String

    Definition Classes
    StratifiedSampler
  48. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  49. final def newMutableRow(row: Row, rowEncoder: ExpressionEncoder[Row]): UnsafeRow

    Attributes
    protected
    Definition Classes
    StratifiedSampler
  50. final def notify(): Unit

    Definition Classes
    AnyRef
  51. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  52. def onTruncate(): Unit

  53. final val options: SampleOptions

    Definition Classes
    StratifiedSampler
  54. final def parseMillis(row: Row, timeCol: Int, getDefaultForNull: Boolean = false): Long

    Definition Classes
    CastLongTime
  55. final def parseMillisFromAny(ts: Any): Long

    Attributes
    protected
    Definition Classes
    CastLongTime
  56. final val pendingBatch: AtomicReference[ArrayBuffer[InternalRow]]


    Stores pending values to be flushed in a separate buffer so that we do not end up creating ColumnBatches that are too small.

    Note that this mini-cache is copy-on-write (to avoid copy-on-read for readers), so the buffer inside should never be modified; instead, the whole buffer should be replaced if required. This should happen only inside flushCache.

    Attributes
    protected
    Definition Classes
    StratifiedSampler
  57. final def qcs: Array[Int]

    Definition Classes
    StratifiedSampler
  58. final def qcsSparkPlan: Option[(CodeAndComment, ArrayBuffer[Any], Int, Array[DataType])]

    Definition Classes
    StratifiedSampler
  59. def reservoirInRegion: Boolean

    Definition Classes
    StratifiedSampler
  60. def resetLogger(): Unit

    Attributes
    protected
    Definition Classes
    Logging
  61. final val rng: Random


    Random number generator for sampling.

    Attributes
    protected
    Definition Classes
    StratifiedSampler
  62. def sample(items: Iterator[InternalRow], rowEncoder: ExpressionEncoder[Row], flush: Boolean): Iterator[InternalRow]

  63. final def schema: StructType

    Definition Classes
    StratifiedSampler
  64. def setFlushStatus(doFlush: Boolean): Unit

    Definition Classes
    StratifiedSampler
  65. final val strata: ConcurrentSegmentedHashMap[Row, StratumReservoir, ReservoirSegment]


    Map of each stratum key (i.e. a unique combination of values of the columns in qcs) to its related metadata and reservoir.

    Attributes
    protected
    Definition Classes
    StratifiedSampler
  66. def strataReservoirSize: Int

    Attributes
    protected
    Definition Classes
    StratifiedSamplerCached → StratifiedSampler
  67. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  68. def timeColumnType: Option[DataType]

    Definition Classes
    StratifiedSamplerCached → CastLongTime
  69. def timeInterval: Long

  70. def timeSeriesColumn: Int

  71. def toString(): String

    Definition Classes
    AnyRef → Any
  72. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  73. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  74. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  75. def waitForSamplers(waitUntil: Int, maxMillis: Long): Unit

    Attributes
    protected
    Definition Classes
    StratifiedSampler
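
The copy-on-write rule noted for pendingBatch above (never change the buffer in place, replace the whole buffer instead) can be illustrated with a small sketch. The class CopyOnWriteBuffer and its methods are hypothetical names chosen for illustration and are not part of this API. The sketch also holds an immutable Vector inside the AtomicReference rather than the ArrayBuffer that pendingBatch declares, so the "never modified" rule is enforced by the type instead of by convention.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

// Hypothetical sketch of a copy-on-write buffer; not part of this API.
final class CopyOnWriteBuffer[A] {
  private val ref = new AtomicReference[Vector[A]](Vector.empty)

  /** Readers get the current snapshot directly, with no copy on read. */
  def snapshot: Vector[A] = ref.get()

  /** Writers build a new Vector and swap it in atomically,
    * retrying if another writer replaced the buffer in between. */
  @tailrec
  def append(item: A): Unit = {
    val current = ref.get()
    if (!ref.compareAndSet(current, current :+ item)) append(item)
  }

  /** Atomically takes all pending items, leaving an empty buffer. */
  def drain(): Vector[A] = ref.getAndSet(Vector.empty)
}
```

A snapshot taken by a reader is unaffected by later appends, because each append installs a new buffer rather than changing the one the reader already holds.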
