org.apache.spark.sql.execution.streaming

CompactibleFileStreamLog

abstract class CompactibleFileStreamLog[T <: AnyRef] extends HDFSMetadataLog[Array[T]]

An abstract class for compactible metadata logs. It will write one log file for each batch. The first line of the log file is the version number, and there are multiple serialized metadata lines following.
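The file layout described above (a version line followed by serialized entries) can be sketched roughly as follows. This is only an illustration: the object name, the `v1` marker, and the use of plain strings as entries are assumptions, not the class's actual serialization code.

```scala
import java.io.{InputStream, OutputStream}
import java.nio.charset.StandardCharsets.UTF_8
import scala.io.Source

// Hypothetical stand-in for the described layout: one version line,
// then one serialized entry per line. Entries must not contain newlines.
object LogFileLayoutSketch {
  val Version = "v1" // assumed version marker, for illustration only

  // Write the version line first, then every entry on its own line.
  def serialize(entries: Array[String], out: OutputStream): Unit = {
    val text = (Version +: entries.toSeq).mkString("\n")
    out.write(text.getBytes(UTF_8))
  }

  // Read the version line, validate it, and return the remaining lines.
  def deserialize(in: InputStream): Array[String] = {
    val lines = Source.fromInputStream(in, UTF_8.name()).getLines()
    if (!lines.hasNext) {
      throw new IllegalStateException("Incomplete log file: missing version line")
    }
    val version = lines.next()
    require(version == Version, s"Unknown log version: $version")
    lines.toArray
  }
}
```

Checking the version before reading any entry lets a reader reject a file written in an unknown format instead of misparsing it.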

Reading from many small files is usually slow, and too many small files in one directory put strain on the file system. CompactibleFileStreamLog therefore compacts the log files into one larger file every 10 batches by default. When doing a compaction, it reads all the old log files and merges them with the new batch.
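To make the compaction schedule concrete, here is a minimal sketch of how an interval-based rule might pick compaction batches and the batches to merge. The helper names and the exact numbering rule are assumptions for illustration; the sketch also assumes that an earlier compacted file already contains everything before it, so only one interval's worth of files needs to be read.

```scala
// Hypothetical helpers; names and the numbering rule are illustrative
// assumptions, not taken from the documentation above.
object CompactionScheduleSketch {
  // Treat every `interval`-th batch (ids interval-1, 2*interval-1, ...)
  // as a compaction batch.
  def isCompactionBatch(batchId: Long, interval: Int): Boolean =
    (batchId + 1) % interval == 0

  // Batch ids whose log files are read to build the compacted file for
  // `compactionBatchId`: the previous compaction batch (if there is one)
  // plus every plain batch written after it.
  def batchesToMerge(compactionBatchId: Long, interval: Int): Seq[Long] = {
    require(isCompactionBatch(compactionBatchId, interval), "not a compaction batch")
    val start = math.max(0L, compactionBatchId - interval)
    start until compactionBatchId
  }
}
```

With an interval of 10, batch 9 would merge batches 0 through 8, and batch 19 would merge the compacted batch 9 together with batches 10 through 18.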

Linear Supertypes
HDFSMetadataLog[Array[T]], internal.Logging, MetadataLog[Array[T]], AnyRef, Any

Instance Constructors

  1. new CompactibleFileStreamLog(metadataLogVersion: Int, sparkSession: SparkSession, path: String)(implicit arg0: ClassTag[T])

Abstract Value Members

  1. abstract def compactLogs(logs: Seq[T]): Seq[T]

    Filter out the obsolete logs.

  2. abstract def defaultCompactInterval: Int

    Attributes
    protected
  3. abstract def fileCleanupDelayMs: Long

    If the old files are deleted immediately after compaction, there is a race condition in S3: other processes may see that the old files are gone but still cannot see the compaction file using "list". allFiles handles this by looking for the next compaction file directly. However, a livelock may happen if compaction happens too frequently: one process keeps deleting old files while another keeps retrying. Setting a reasonable cleanup delay can avoid this.

    Attributes
    protected
  4. abstract def isDeletingExpiredLog: Boolean

    Attributes
    protected
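
As an illustration of the abstract members above, the sketch below shows one way a subclass might filter obsolete logs and apply a cleanup delay. The `Entry` type, the `add`/`delete` action strings, and the helper names are hypothetical; real subclasses define their own metadata class and rules.

```scala
// Hypothetical entry type; real subclasses define their own metadata class.
final case class Entry(path: String, action: String)

object CompactLogsSketch {
  val Add = "add"
  val Delete = "delete"

  // One possible compactLogs: drop every entry whose path is marked as
  // deleted anywhere in the input, including the delete markers themselves,
  // so the compacted file keeps only live entries.
  def compactLogs(logs: Seq[Entry]): Seq[Entry] = {
    val deleted = logs.filter(_.action == Delete).map(_.path).toSet
    if (deleted.isEmpty) logs else logs.filterNot(e => deleted.contains(e.path))
  }

  // One possible use of a cleanup delay: an old log file is eligible for
  // deletion only once it is older than the delay, which gives other
  // readers time to see the compacted file first.
  def isEligibleForCleanup(
      modifiedAtMs: Long,
      nowMs: Long,
      fileCleanupDelayMs: Long): Boolean =
    nowMs - modifiedAtMs > fileCleanupDelayMs
}
```

Keeping the delay check separate from the filtering makes it easy to leave deletion switched off entirely, which is the role a flag such as isDeletingExpiredLog would play.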

Concrete Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def add(batchId: Long, logs: Array[T]): Boolean

    Store the metadata for the specified batchId and return true if successful. If the batchId's metadata has already been stored, this method will return false.

    Definition Classes
    CompactibleFileStreamLog → HDFSMetadataLog → MetadataLog
  5. def allFiles(): Array[T]

    Returns all files except the deleted ones.

  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. val batchFilesFilter: PathFilter

    A PathFilter that accepts only batch files.

    Attributes
    protected
    Definition Classes
    HDFSMetadataLog
  8. def batchIdToPath(batchId: Long): Path

  9. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  10. final lazy val compactInterval: Int

    Attributes
    protected
  11. def deserialize(in: InputStream): Array[T]

  12. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  13. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  14. val fileManager: FileManager

    Attributes
    protected
    Definition Classes
    HDFSMetadataLog
  15. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  16. def get(startId: Option[Long], endId: Option[Long]): Array[(Long, Array[T])]

    Return metadata for batches between startId (inclusive) and endId (inclusive). If startId is None, just return all batches before endId (inclusive).

    Definition Classes
    HDFSMetadataLog → MetadataLog
  17. def get(batchId: Long): Option[Array[T]]

    Return the metadata for the specified batchId if it's stored. Otherwise, return None.

    Definition Classes
    HDFSMetadataLog → MetadataLog
  18. def get(batchFile: Path): Option[Array[T]]

    returns

    the deserialized metadata in a batch file, or None if the file does not exist.

    Definition Classes
    HDFSMetadataLog
    Exceptions thrown

    IllegalArgumentException when path does not point to a batch file.

  19. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  20. def getLatest(): Option[(Long, Array[T])]

    Return the latest batch id and its metadata, if any exist.

    Definition Classes
    HDFSMetadataLog → MetadataLog
  21. def getOrderedBatchFiles(): Array[FileStatus]

    Get an array of FileStatus objects referencing batch files. The array is sorted from the most recent batch file to the oldest.

    Definition Classes
    HDFSMetadataLog
  22. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  23. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean = false): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  24. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  25. def isBatchFile(path: Path): Boolean

  26. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  27. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  28. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  29. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  30. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  31. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  32. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  33. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  34. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  35. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  36. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  37. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  38. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  39. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  40. val metadataPath: Path

    Definition Classes
    HDFSMetadataLog
  41. val minBatchesToRetain: Int

    Attributes
    protected
  42. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  43. final def notify(): Unit

    Definition Classes
    AnyRef
  44. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  45. def pathToBatchId(path: Path): Long

  46. def purge(thresholdBatchId: Long): Unit

    Removes all log entries earlier than thresholdBatchId (exclusive).

    Definition Classes
    HDFSMetadataLog → MetadataLog
  47. def serialize(logData: Array[T], out: OutputStream): Unit

  48. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  49. def toString(): String

    Definition Classes
    AnyRef → Any
  50. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  51. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  52. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
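
The documented contracts of add, get, getLatest and purge can be summarized with a small in-memory model. This is a hypothetical stand-in, not the file-backed implementation: the class name is invented, it uses Seq instead of Array so results compare by value, and treating a missing endId as "no upper bound" is an assumption, since the text above only describes the missing startId case.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in that mimics the documented contracts.
// It performs no file I/O and no compaction.
final class InMemoryLogSketch[T] {
  private val batches = mutable.TreeMap.empty[Long, Seq[T]]

  // Returns false, and leaves the stored value untouched, if batchId exists.
  def add(batchId: Long, logs: Seq[T]): Boolean =
    if (batches.contains(batchId)) false
    else { batches.put(batchId, logs); true }

  // Metadata for a single batch, or None if it was never stored.
  def get(batchId: Long): Option[Seq[T]] = batches.get(batchId)

  // Both bounds are inclusive. A missing startId means "from the beginning";
  // a missing endId is assumed here to mean "up to the latest batch".
  def get(startId: Option[Long], endId: Option[Long]): Seq[(Long, Seq[T])] =
    batches.iterator
      .filter { case (id, _) => startId.forall(id >= _) && endId.forall(id <= _) }
      .toList

  // Highest batch id and its metadata, if any batch is stored.
  def getLatest(): Option[(Long, Seq[T])] = batches.lastOption

  // The threshold is exclusive: the batch with id == thresholdBatchId survives.
  def purge(thresholdBatchId: Long): Unit = {
    val stale = batches.keysIterator.takeWhile(_ < thresholdBatchId).toList
    stale.foreach(id => batches.remove(id))
  }
}
```

The refusal to overwrite in add is what makes a retried batch safe: a second writer for the same batchId gets false back and can tell that the batch was already recorded.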
