org.apache.spark.sql.execution.streaming
Store the metadata for the specified batchId and return true if successful. If the batchId's
metadata has already been stored, this method will return false.
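The add-once contract described above can be illustrated with a minimal in-memory sketch. The class `InMemoryLog` and its methods are hypothetical stand-ins for illustration only; they are not Spark's implementation and do not touch a file system.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for a metadata log (not Spark's implementation).
final class InMemoryLog[T] {
  private val batches = mutable.TreeMap.empty[Long, T]

  // Stores metadata for batchId; returns false if that batch was already stored.
  def add(batchId: Long, metadata: T): Boolean = synchronized {
    if (batches.contains(batchId)) {
      false
    } else {
      batches.put(batchId, metadata)
      true
    }
  }

  // Returns the stored metadata, or None if the batch is unknown.
  def get(batchId: Long): Option[T] = synchronized { batches.get(batchId) }
}
```

A second `add` for the same batch id leaves the first value in place, which is what lets a caller detect that another writer got there first.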
A PathFilter that accepts only batch files.
Return metadata for batches between startId (inclusive) and endId (inclusive). If startId is
None, return all batches up to and including endId.
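The inclusive range semantics can be sketched as follows. The helper `batchesBetween` and its use of a `TreeMap` are assumptions made for illustration; the actual method reads batch files from storage.

```scala
import scala.collection.immutable.TreeMap

// Hypothetical helper: select batches in [startId, endId], both ends inclusive.
// A missing startId means "everything up to and including endId".
def batchesBetween[T](
    log: TreeMap[Long, T],
    startId: Option[Long],
    endId: Long): List[(Long, T)] = {
  log.iterator
    .filter { case (id, _) => startId.forall(id >= _) && id <= endId }
    .toList
}
```

Because `Option.forall` is true for `None`, the lower bound simply drops out when no startId is given.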
Return the metadata for the specified batchId if it's stored. Otherwise, return None.
Returns the deserialized metadata in a batch file, or None if the file does not exist.
Throws IllegalArgumentException when the path does not point to a batch file.
Return the latest batch id and its metadata, if they exist.
Get an array of FileStatus objects referencing batch files. The array is sorted from the most recent batch file to the oldest.
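The filtering and ordering can be sketched on plain file names. This sketch assumes that a batch file is named by its batch id alone (for example `42`); the helpers `isBatchFile` and `batchFilesNewestFirst` are hypothetical and operate on strings rather than on FileStatus objects.

```scala
import scala.util.Try

// Assumption for this sketch: a batch file's name is just its batch id, e.g. "42".
def isBatchFile(name: String): Boolean = Try(name.toLong).isSuccess

// Keep only batch files and order them newest (largest id) first.
def batchFilesNewestFirst(names: Seq[String]): Seq[String] =
  names.filter(isBatchFile).sortBy(_.toLong)(Ordering[Long].reverse)
```

Sorting numerically rather than lexicographically matters here: as strings, `"10"` would sort before `"2"`.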
Removes all log entries earlier than thresholdBatchId (exclusive).
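The exclusive threshold can be illustrated with a small in-memory sketch. The function `purge` below and its `TreeMap` argument are hypothetical; the real method deletes batch files rather than map entries.

```scala
import scala.collection.mutable

// Hypothetical in-memory log keyed by batch id.
def purge[T](log: mutable.TreeMap[Long, T], thresholdBatchId: Long): Unit = {
  // Collect the stale keys first so the map is not mutated while it is iterated.
  val stale = log.keysIterator.takeWhile(_ < thresholdBatchId).toList
  stale.foreach(id => log.remove(id))
}
```

The entry whose id equals thresholdBatchId survives, since only strictly smaller ids are removed.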
A MetadataLog implementation based on HDFS. HDFSMetadataLog uses the specified
path as the metadata storage. When writing a new batch, HDFSMetadataLog first writes to a temp file and then renames it to the final batch file. If the rename step fails, there must be multiple writers; only one of them will succeed and the others will fail.
Note: HDFSMetadataLog doesn't support S3-like file systems because they don't guarantee that listing a directory always shows the latest files.
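The write-then-rename protocol can be sketched on the local file system with `java.nio.file`. This is an illustration under assumptions, not the HDFS-backed implementation: the function `writeBatch`, the temp-file naming, and the use of `Files.move` as a stand-in for an HDFS rename are all hypothetical, and the local move only approximates the rename guarantees described above.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{FileAlreadyExistsException, Files, Path}

// Sketch on the local file system; HDFS rename semantics are only approximated.
def writeBatch(dir: Path, batchId: Long, payload: String): Boolean = {
  val target = dir.resolve(batchId.toString)
  // Write the complete payload to a temp file in the same directory first.
  val temp = Files.createTempFile(dir, s".$batchId.", ".tmp")
  Files.write(temp, payload.getBytes(StandardCharsets.UTF_8))
  try {
    // Without REPLACE_EXISTING, the move fails if the target already exists.
    Files.move(temp, target)
    true
  } catch {
    case _: FileAlreadyExistsException =>
      // Another writer already produced this batch; discard our temp file.
      Files.deleteIfExists(temp)
      false
  }
}
```

Readers never observe a half-written batch file in this scheme, because the final name only appears once the temp file is complete.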