org.apache.spark.sql.execution.streaming
Store the metadata for the specified batchId and return true if successful. If the batchId's metadata has already been stored, this method will return false.
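The add contract above can be illustrated with a minimal in-memory sketch. `InMemoryLogSketch` is a hypothetical stand-in written for illustration only; it is not Spark's implementation and does not touch any file system.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for a metadata log (illustration only).
final class InMemoryLogSketch[T] {
  // Batches keyed by batch ID, kept in ascending ID order.
  private val batches = mutable.TreeMap.empty[Long, T]

  /** Stores metadata for batchId; returns false if that batch is already stored. */
  def add(batchId: Long, metadata: T): Boolean =
    if (batches.contains(batchId)) {
      false // the first write wins; existing metadata is never overwritten
    } else {
      batches.put(batchId, metadata)
      true
    }

  /** Returns the metadata for batchId if it is stored, otherwise None. */
  def get(batchId: Long): Option[T] = batches.get(batchId)
}
```

In this sketch a second `add` for the same batchId leaves the first value in place and reports false, which mirrors the behavior described for the real method.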
A PathFilter that accepts only batch files.
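As a rough illustration of what such a filter might check, the sketch below tests file names only. It assumes that a batch file is named after its batch ID written as a decimal number; that naming rule and the helper `isBatchFileName` are assumptions made for this example, and the sketch uses plain strings rather than Hadoop's `Path` or `PathFilter` types.

```scala
import scala.util.Try

object BatchFileNameSketch {
  // Assumption: a batch file's name is its batch ID as a decimal Long.
  // Anything that does not parse as a Long (temp files, markers) is rejected.
  def isBatchFileName(name: String): Boolean = Try(name.toLong).isSuccess
}
```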
Return metadata for batches between startId (inclusive) and endId (inclusive). If startId is None, return all batches up to endId (inclusive).
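The inclusive bounds and the None case can be sketched over an in-memory map. `batchesInRange` is a hypothetical helper, and returning the result in ascending batch ID order is an assumption of this example rather than something the description states.

```scala
object RangeSketch {
  /** Selects batches with startId <= id <= endId; a None startId means no lower bound. */
  def batchesInRange[T](
      stored: Map[Long, T],
      startId: Option[Long],
      endId: Long): Seq[(Long, T)] =
    stored.toSeq
      .filter { case (id, _) => id <= endId && startId.forall(id >= _) }
      .sortBy(_._1) // assumption: ascending batch ID order
}
```

For example, with batches 0 through 3 stored, `batchesInRange(stored, Some(1L), 2L)` selects batches 1 and 2, while `batchesInRange(stored, None, 1L)` selects batches 0 and 1.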
Return the metadata for the specified batchId if it's stored. Otherwise, return None.
Returns the deserialized metadata in a batch file, or None if the file does not exist.
Throws IllegalArgumentException when the path does not point to a batch file.
Return the latest batch ID and its metadata, if any exist.
Get an array of [FileStatus] referencing batch files. The array is sorted from the most recent batch file to the oldest.
Remove all log entries earlier than thresholdBatchId (exclusive).
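The exclusive threshold means the batch whose ID equals thresholdBatchId is kept. A minimal sketch over an in-memory map, using a hypothetical `purge` helper that returns the surviving entries instead of deleting files:

```scala
object PurgeSketch {
  /** Keeps entries with id >= thresholdBatchId; everything earlier is dropped. */
  def purge[T](stored: Map[Long, T], thresholdBatchId: Long): Map[Long, T] =
    stored.filter { case (id, _) => id >= thresholdBatchId }
}
```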
A MetadataLog implementation based on HDFS. HDFSMetadataLog uses the specified path as the metadata storage.
When writing a new batch, HDFSMetadataLog first writes to a temp file and then renames it to the final batch file. If the rename step fails, there must be multiple writers; only one of them will succeed and the others will fail.
Note: HDFSMetadataLog doesn't support S3-like file systems as they don't guarantee listing files in a directory always shows the latest files.
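The write-to-temp-then-rename pattern described for new batches can be sketched on a local file system with `java.nio.file`. This is an analogy under stated assumptions, not the HDFS code path: `writeBatch`, the temp file naming, and the use of a plain string as metadata are all hypothetical, and the Hadoop `FileSystem` API is not used.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{FileAlreadyExistsException, Files, Path}

object TempThenRenameSketch {
  /** Writes metadata to a temp file in dir, then moves it to dir/<batchId>.
    * Returns false if the batch file already exists or another writer got there first. */
  def writeBatch(dir: Path, batchId: Long, metadata: String): Boolean = {
    val target = dir.resolve(batchId.toString)
    if (Files.exists(target)) {
      false
    } else {
      // Hypothetical temp naming; the real temp file name is not given in the text.
      val temp = Files.createTempFile(dir, s".$batchId.", ".tmp")
      try {
        Files.write(temp, metadata.getBytes(StandardCharsets.UTF_8))
        // Without REPLACE_EXISTING, Files.move fails if the target exists,
        // so a writer that loses the race sees an exception here.
        Files.move(temp, target)
        true
      } catch {
        case _: FileAlreadyExistsException => false
      } finally {
        Files.deleteIfExists(temp) // clean up the temp file if the move did not happen
      }
    }
  }
}
```

The point of the pattern is that readers never observe a half-written batch file: the final name only appears once the content is complete, and a failed move tells a writer that another writer has already committed that batch.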