Case class for converting RDD to DataFrame
Custom Receiver that receives data over a socket.
Consumes messages from one or more topics in Kafka and does wordcount.
Consumes messages from one or more topics in Kafka and does wordcount. Usage: DirectKafkaWordCount <brokers> <topics> <brokers> is a list of one or more Kafka brokers <topics> is a list of one or more kafka topics to consume from
Example: $ bin/run-example streaming.DirectKafkaWordCount broker1-host:port,broker2-host:port \ topic1,topic2
Use this singleton to get or register an Accumulator.
Counts words in new text files created in the given directory Usage: HdfsWordCount <directory> <directory> is the directory that Spark Streaming will use to find and read new text files.
Counts words in new text files created in the given directory Usage: HdfsWordCount <directory> <directory> is the directory that Spark Streaming will use to find and read new text files.
To run this on your local machine on directory localdir
, run this example
$ bin/run-example \
org.apache.spark.examples.streaming.HdfsWordCount localdir
Then create a text file in localdir
and the words in the file will get counted.
Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
Usage: NetworkWordCount <hostname> <port> <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
To run this on your local machine, you need to first run a Netcat server
$ nc -lk 9999
and then run the example
$ bin/run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999
Receives text from multiple rawNetworkStreams and counts how many '\n' delimited lines have the word 'the' in them.
Receives text from multiple rawNetworkStreams and counts how many '\n' delimited lines have the word 'the' in them. This is useful for benchmarking purposes. This will only work with spark.streaming.util.RawTextSender running on all worker nodes and with Spark using Kryo serialization (set Java property "spark.serializer" to "org.apache.spark.serializer.KryoSerializer"). Usage: RawNetworkGrep <numStreams> <host> <port> <batchMillis> <numStream> is the number rawNetworkStreams, which should be same as number of work nodes in the cluster <host> is "localhost". <port> is the port on which RawTextSender is running in the worker nodes. <batchMillise> is the Spark Streaming batch duration in milliseconds.
Counts words in text encoded with UTF8 received from the network every second.
Counts words in text encoded with UTF8 received from the network every second. This example also shows how to use lazily instantiated singleton instances for Accumulator and Broadcast so that they can be registered on driver failures.
Usage: RecoverableNetworkWordCount <hostname> <port> <checkpoint-directory> <output-file> <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data. <checkpoint-directory> directory to HDFS-compatible file system which checkpoint data <output-file> file to which the word counts will be appended
<checkpoint-directory> and <output-file> must be absolute paths
To run this on your local machine, you need to first run a Netcat server
$ nc -lk 9999
and run the example as
$ ./bin/run-example org.apache.spark.examples.streaming.RecoverableNetworkWordCount \
localhost 9999 ~/checkpoint/ ~/out
If the directory ~/checkpoint/ does not exist (e.g. running for the first time), it will create a new StreamingContext (will print "Creating new context" to the console). Otherwise, if checkpoint data exists in ~/checkpoint/, then it will create StreamingContext from the checkpoint data.
Refer to the online documentation for more details.
Lazily instantiated singleton instance of SparkSession
Use DataFrames and SQL to count words in UTF8 encoded, '\n' delimited text received from the network every second.
Use DataFrames and SQL to count words in UTF8 encoded, '\n' delimited text received from the network every second.
Usage: SqlNetworkWordCount <hostname> <port> <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
To run this on your local machine, you need to first run a Netcat server
$ nc -lk 9999
and then run the example
$ bin/run-example org.apache.spark.examples.streaming.SqlNetworkWordCount localhost 9999
Counts words cumulatively in UTF8 encoded, '\n' delimited text received from the network every second starting with initial value of word count.
Counts words cumulatively in UTF8 encoded, '\n' delimited text received from the network every second starting with initial value of word count. Usage: StatefulNetworkWordCount <hostname> <port> <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
To run this on your local machine, you need to first run a Netcat server
$ nc -lk 9999
and then run the example
$ bin/run-example
org.apache.spark.examples.streaming.StatefulNetworkWordCount localhost 9999
Utility functions for Spark Streaming examples.
Use this singleton to get or register a Broadcast variable.
Custom Receiver that receives data over a socket. Received bytes are interpreted as text and \n delimited lines are considered as records. They are then counted and printed.
To run this on your local machine, you need to first run a Netcat server
$ nc -lk 9999
and then run the example$ bin/run-example org.apache.spark.examples.streaming.CustomReceiver localhost 9999