whether to delete the checkpoint if the query is stopped without errors
Tracks the offsets that are available to be processed, but have not yet been committed to the sink. Only the scheduler thread should modify this field, and only in atomic steps. Other threads should make a shallow copy if they are going to access this field more than once, since the field's value may change at any time.
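The "modify atomically, read via shallow copy" rule above can be sketched in plain Scala. This is an illustrative model, not Spark's internals; the object and method names here are hypothetical.

```scala
// Illustrative sketch of the pattern described above: the scheduler thread
// swaps in a whole new immutable map in one atomic reference assignment, so
// readers never observe a partially updated value.
object OffsetTrackingSketch {
  @volatile private var availableOffsets: Map[String, Long] = Map.empty

  // Scheduler thread only: a single atomic assignment per update.
  def advance(source: String, offset: Long): Unit =
    availableOffsets = availableOffsets + (source -> offset)

  // Any other thread: take one shallow copy and do every read against it,
  // since the field may be replaced at any time.
  def snapshot(): Map[String, Long] = availableOffsets
}
```

Because the map is immutable, the snapshot a reader holds can never change underneath it, even while the scheduler keeps advancing the field.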
Awaits until all fields of the query have been initialized.
Waits for the termination of this query, either by query.stop() or by an exception. If the query has terminated with an exception, then the exception will be thrown. Otherwise, it returns whether the query has terminated or not within the timeoutMs milliseconds.
If the query has terminated, then all subsequent calls to this method will either return true immediately (if the query was terminated by stop()), or throw the exception immediately (if the query has terminated with an exception).
2.0.0
StreamingQueryException if the query has terminated with an exception
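The contract above (time out with false, return true after stop(), rethrow a stored failure on every later call) can be modeled with a latch. A hypothetical sketch, not Spark's implementation:

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Minimal model of awaitTermination(timeoutMs): a latch records termination,
// and a stored exception is rethrown by every subsequent call.
class TerminationSketch {
  private val terminated = new CountDownLatch(1)
  @volatile private var failure: Option[Exception] = None

  // Called once, when the query stops cleanly or fails.
  def markTerminated(error: Option[Exception]): Unit = {
    failure = error
    terminated.countDown()
  }

  // Returns true iff the query terminated within timeoutMs; if the query
  // terminated with an exception, that exception is thrown instead.
  def awaitTermination(timeoutMs: Long): Boolean = {
    val done = terminated.await(timeoutMs, TimeUnit.MILLISECONDS)
    if (done) failure.foreach(e => throw e)
    done
  }
}
```

Once the latch has counted down, every later call observes the terminal state immediately, which is exactly the "all subsequent calls" behavior described above.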
Waits for the termination of this query, either by query.stop() or by an exception. If the query has terminated with an exception, then the exception will be thrown.
If the query has terminated, then all subsequent calls to this method will either return immediately (if the query was terminated by stop()), or throw the exception immediately (if the query has terminated with an exception).
2.0.0
StreamingQueryException if the query has terminated with an exception.
Tracks how much data we have processed and committed to the sink or state store from each input source. Only the scheduler thread should modify this field, and only in atomic steps. Other threads should make a shallow copy if they are going to access this field more than once, since the field's value may change at any time.
The current batchId or -1 if execution has not yet been initialized.
Returns the StreamingQueryException if the query was terminated by an exception.
Prints the physical plan to the console for debugging purposes.
2.0.0
Prints the physical plan to the console for debugging purposes.
whether to do an extended explain or not
2.0.0
Exposed for tests.
Finalizes the query progress and adds it to the list of recent status updates.
Returns the unique id of this query that persists across restarts from checkpoint data. That is, this id is generated when a query is started for the first time, and will be the same every time it is restarted from checkpoint data. Also see runId.
2.1.0
Whether the query is currently active or not
Returns the most recent query progress update or null if there were no progress updates.
The thread that runs the micro-batches of this stream. Note that this thread must be an org.apache.spark.util.UninterruptibleThread to work around KAFKA-1894 (interrupting a running KafkaConsumer may cause an endless loop) and HADOOP-10622 (interrupting Shell.runCommand causes a deadlock). (SPARK-14131)
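The idea behind deferring interrupts can be sketched in plain Scala. This is a simplified, hypothetical model of the technique, not Spark's actual org.apache.spark.util.UninterruptibleThread:

```scala
// Sketch of interrupt deferral: while a sensitive call (e.g. a KafkaConsumer
// poll or Shell.runCommand) is in flight, interrupt() is held back and
// delivered only after the protected block finishes.
abstract class DeferInterruptThread extends Thread {
  private val lock = new Object
  private var inCritical = false
  private var pending = false

  // Run `f` with interrupts deferred; an interrupt that arrived meanwhile
  // is delivered right after `f` returns.
  def runUninterruptibly[T](f: => T): T = {
    lock.synchronized { inCritical = true }
    try f
    finally lock.synchronized {
      inCritical = false
      if (pending) { pending = false; super.interrupt() }
    }
  }

  override def interrupt(): Unit = lock.synchronized {
    if (inCritical) pending = true // remember, deliver later
    else super.interrupt()
  }
}
```

A caller that interrupts the thread during a protected block sees the interrupt take effect only once the block completes, so the sensitive call is never interrupted mid-flight.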
Returns the user-specified name of the query, or null if not specified. This name can be specified in the org.apache.spark.sql.streaming.DataStreamWriter as dataframe.writeStream.queryName("query").start(). This name, if set, must be unique across all active queries.
2.0.0
Holds the most recent input data for each source.
A write-ahead-log that records the offsets that are present in each batch. In order to ensure that a given batch will always consist of the same data, we write to this log *before* any processing is done. Thus, the Nth record in this log indicates data that is currently being processed and the N-1th entry indicates which offsets have been durably committed to the sink.
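The "log before process" ordering above can be sketched with an in-memory buffer. A real log would persist entries durably; the class and method names here are illustrative:

```scala
import scala.collection.mutable.ArrayBuffer

// Sketch of an offset write-ahead log: offsets for a batch are recorded
// BEFORE the batch is processed, so a restart can replay exactly the same
// data. The latest (Nth) entry is the batch in flight; the N-1th entry is
// already durably committed to the sink.
class OffsetLogSketch {
  private val entries = ArrayBuffer.empty[(Long, Map[String, Long])]

  // Write planned offsets for `batchId` before any processing happens.
  def add(batchId: Long, offsets: Map[String, Long]): Unit =
    entries += ((batchId, offsets))

  // Nth (latest) entry: the batch currently being processed.
  def latest: Option[(Long, Map[String, Long])] = entries.lastOption

  // N-1th entry: offsets that have been durably committed to the sink.
  def committedThrough: Option[(Long, Map[String, Long])] =
    if (entries.size >= 2) Some(entries(entries.size - 2)) else None
}
```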
Metadata associated with the offset seq of a batch in the query.
Blocks until all available data in the source has been processed and committed to the sink. This method is intended for testing. Note that in the case of continually arriving data, this method may block forever. Additionally, this method is only guaranteed to block until data that has been synchronously appended to a org.apache.spark.sql.execution.streaming.Source prior to invocation (i.e., getOffset must immediately reflect the addition).
2.0.0
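The blocking behavior above — wait until whatever was available at call time is committed, without chasing newly arriving data — can be sketched with a monitor. A hypothetical model, not Spark's implementation:

```scala
// Sketch of processAllAvailable-style blocking: the target offset is captured
// at invocation time, so data arriving afterwards does not extend the wait.
class ProgressGate {
  private val lock = new Object
  private var available = 0L
  private var committed = 0L

  // Record that data up to `offset` is available to process.
  def addData(offset: Long): Unit = lock.synchronized { available = offset }

  // Record that data up to `offset` has been committed to the sink.
  def commitUpTo(offset: Long): Unit = lock.synchronized {
    committed = offset
    lock.notifyAll()
  }

  // Blocks until everything available *at invocation time* is committed.
  def processAllAvailable(): Unit = lock.synchronized {
    val target = available // snapshot: later arrivals are not waited for
    while (committed < target) lock.wait()
  }
}
```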
Returns an array containing the most recent query progress updates.
Records the duration of running body for the next query progress update.
Returns the unique id of this run of the query. That is, every start/restart of a query will generate a unique runId. Therefore, every time a query is restarted from checkpoint, it will have the same id but different runIds.
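The id/runId distinction above can be sketched in a few lines: the id is read from (or first written to) the checkpoint, while the runId is freshly generated on every start. Names here are illustrative, not Spark's internals:

```scala
import java.util.UUID
import scala.collection.mutable

// Sketch: `id` survives restarts via the checkpoint; `runId` is new per run.
object QueryIdSketch {
  def startQuery(checkpoint: mutable.Map[String, UUID]): (UUID, UUID) = {
    // Stable across restarts: created on first start, then reloaded.
    val id = checkpoint.getOrElseUpdate("id", UUID.randomUUID())
    // Fresh on every start/restart.
    val runId = UUID.randomUUID()
    (id, runId)
  }
}
```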
All stream sources present in the query plan. This will be set when generating the logical plan.
Returns the SparkSession associated with this query.
2.0.0
Starts the execution. This returns only after the thread has started and QueryStartedEvent has been posted to all the listeners.
Begins recording statistics about query progress for a given trigger.
Returns the current status of the query.
Signals to the thread executing micro-batches that it should stop running after the next batch. This method blocks until the thread stops running.
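The signal-then-block handshake above can be sketched with a volatile flag and a join. This is a simplified model (Spark's stop also handles interruption), with illustrative names:

```scala
// Sketch of the stop() handshake: set a flag the batch loop checks between
// batches, then block until the thread has actually exited.
class StoppableLoop extends Thread {
  @volatile private var shouldStop = false
  @volatile var batchesRun = 0 // written only by the loop thread

  override def run(): Unit =
    while (!shouldStop) {   // checked between batches, never mid-batch
      batchesRun += 1       // stand-in for running one micro-batch
      Thread.sleep(5)
    }

  def stopAndWait(): Unit = {
    shouldStop = true       // signal: stop after the current batch
    join()                  // block until the thread stops running
  }
}
```

Because the flag is only checked between iterations, a batch in progress always completes before the thread exits, matching the "after the next batch" wording above.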
Metadata associated with the whole query
Used to report metrics to coda-hale. This uses id for easier tracking across restarts.
Updates the message returned in status.
Manages the execution of a streaming Spark SQL query that is occurring in a separate thread. Unlike a standard query, a streaming query executes repeatedly each time new data arrives at any Source present in the query plan. Whenever new data arrives, a QueryExecution is created and the results are committed transactionally to the given Sink.