Package

org.apache.spark.examples

streaming

package streaming

Type Members

  1. class CustomReceiver extends Receiver[String] with internal.Logging

  2. class JavaCustomReceiver extends Receiver[String]

  3. final class JavaDirectKafkaWordCount extends AnyRef

  4. final class JavaNetworkWordCount extends AnyRef

  5. final class JavaQueueStream extends AnyRef

  6. class JavaRecord extends Serializable

  7. final class JavaRecoverableNetworkWordCount extends AnyRef

  8. final class JavaSqlNetworkWordCount extends AnyRef

  9. class JavaStatefulNetworkWordCount extends AnyRef

  10. case class Record(word: String) extends Product with Serializable

    Case class for converting RDD to DataFrame

Value Members

  1. object CustomReceiver extends Serializable

    Custom Receiver that receives data over a socket. Received bytes are interpreted as text and \n delimited lines are considered as records. They are then counted and printed.

    To run this on your local machine, you need to first run a Netcat server

    $ nc -lk 9999

    and then run the example

    $ bin/run-example org.apache.spark.examples.streaming.CustomReceiver localhost 9999
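
    The record framing described above (bytes decoded as text, one record per \n delimited line) can be sketched in plain Scala without Spark. This is only an illustration, not the example's source: `LineRecords` and `readRecords` are hypothetical names, and UTF-8 is an assumed encoding.

    ```scala
    import java.io.{BufferedReader, InputStream, InputStreamReader}
    import java.nio.charset.StandardCharsets

    // Hypothetical helper, not part of Spark: splits a byte stream into
    // '\n' delimited text records, the way a socket receiver would before
    // handing them on.
    object LineRecords {
      def readRecords(in: InputStream): List[String] = {
        // UTF-8 is an assumption made for this sketch.
        val reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))
        // readLine returns null at end of stream, which ends the iteration.
        Iterator.continually(reader.readLine()).takeWhile(_ != null).toList
      }
    }
    ```

    In an actual receiver the stream would presumably come from the connected socket, and each record would be handed to the `store` method of `Receiver` rather than collected into a list.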

  2. object DirectKafkaWordCount

    Consumes messages from one or more topics in Kafka and counts the words in them.

    Usage: DirectKafkaWordCount <brokers> <topics>
      <brokers> is a list of one or more Kafka brokers
      <topics> is a list of one or more Kafka topics to consume from

    Example:

    $ bin/run-example streaming.DirectKafkaWordCount broker1-host:port,broker2-host:port \
        topic1,topic2

  3. object DroppedWordsCounter

    Use this singleton to get or register an Accumulator.

  4. object HdfsWordCount

    Counts words in new text files created in the given directory.

    Usage: HdfsWordCount <directory>
      <directory> is the directory that Spark Streaming will use to find and read new text files.

    To run this on your local machine on directory localdir, run this example

    $ bin/run-example \
        org.apache.spark.examples.streaming.HdfsWordCount localdir

    Then create a text file in localdir and the words in the file will get counted.

  5. object NetworkWordCount

    Counts words in UTF8 encoded, '\n' delimited text received from the network every second.

    Usage: NetworkWordCount <hostname> <port>
      <hostname> and <port> describe the TCP server that Spark Streaming connects to in order to receive data.

    To run this on your local machine, you need to first run a Netcat server

    $ nc -lk 9999

    and then run the example

    $ bin/run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999
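
    The per-batch counting can be illustrated on an ordinary collection. The sketch below is plain Scala, not the example's source; `BatchWordCount` and `countWords` are hypothetical names, and splitting on single spaces is an assumption. The Spark example applies comparable steps to a DStream.

    ```scala
    // Hypothetical helper, not part of Spark: counts the words in one batch
    // of '\n' delimited lines.
    object BatchWordCount {
      def countWords(lines: Seq[String]): Map[String, Int] =
        lines
          .flatMap(_.split(" "))   // split each line into words on single spaces
          .filter(_.nonEmpty)      // drop empty tokens left by repeated spaces
          .groupBy(identity)       // group identical words together
          .map { case (word, occurrences) => word -> occurrences.size }
    }
    ```

    For example, `BatchWordCount.countWords(Seq("a b a", "b c"))` yields a map with a count of 2 for "a", 2 for "b", and 1 for "c".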

  6. object QueueStream

  7. object RawNetworkGrep

    Receives text from multiple rawNetworkStreams and counts how many '\n' delimited lines have the word 'the' in them. This is useful for benchmarking purposes. This will only work with org.apache.spark.streaming.util.RawTextSender running on all worker nodes and with Spark using Kryo serialization (set Java property "spark.serializer" to "org.apache.spark.serializer.KryoSerializer").

    Usage: RawNetworkGrep <numStreams> <host> <port> <batchMillis>
      <numStreams> is the number of rawNetworkStreams, which should be the same as the number of worker nodes in the cluster
      <host> is "localhost".
      <port> is the port on which RawTextSender is running on the worker nodes.
      <batchMillis> is the Spark Streaming batch duration in milliseconds.
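
    The grep step itself is small. The sketch below shows it on an in-memory batch in plain Scala; `GrepCount` and `countMatching` are hypothetical names, and it assumes that "have the word" means substring containment, so a line containing "other" would also match.

    ```scala
    // Hypothetical helper, not part of Spark: counts the lines in a batch
    // that contain the given text anywhere (substring match, an assumption).
    object GrepCount {
      def countMatching(lines: Seq[String], text: String): Int =
        lines.count(_.contains(text))
    }
    ```

    If whole-word matching is wanted instead, the predicate could split each line into tokens and compare them to the word.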

  8. object RecoverableNetworkWordCount

    Counts words in text encoded with UTF8 received from the network every second. This example also shows how to use lazily instantiated singleton instances for Accumulator and Broadcast so that they can be re-registered after driver failures.

    Usage: RecoverableNetworkWordCount <hostname> <port> <checkpoint-directory> <output-file>
      <hostname> and <port> describe the TCP server that Spark Streaming connects to in order to receive data.
      <checkpoint-directory> is a directory in an HDFS-compatible file system in which checkpoint data is stored
      <output-file> is the file to which the word counts will be appended

    <checkpoint-directory> and <output-file> must be absolute paths

    To run this on your local machine, you need to first run a Netcat server

    $ nc -lk 9999

    and run the example as

    $ ./bin/run-example org.apache.spark.examples.streaming.RecoverableNetworkWordCount \
        localhost 9999 ~/checkpoint/ ~/out

    If the directory ~/checkpoint/ does not exist (e.g. when running for the first time), the example creates a new StreamingContext and prints "Creating new context" to the console. Otherwise, if checkpoint data exists in ~/checkpoint/, it recreates the StreamingContext from the checkpoint data.

    Refer to the online documentation for more details.
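
    The create-or-recover decision described above can be sketched without Spark. Everything here is hypothetical: `ContextRecovery`, `Context`, and `getOrCreate` are illustrative names, and the check is simplified to "does the directory exist", whereas real recovery depends on usable checkpoint data being present. In Spark itself this role is played by `StreamingContext.getOrCreate`.

    ```scala
    import java.nio.file.{Files, Path}

    object ContextRecovery {
      // Hypothetical stand-in for a streaming context; it only records how
      // it was obtained.
      final case class Context(origin: String)

      // Restores from the checkpoint directory when it exists, otherwise
      // builds a fresh context with the supplied factory.
      def getOrCreate(checkpointDir: Path, createNew: () => Context): Context =
        if (Files.isDirectory(checkpointDir)) {
          Context(s"restored from $checkpointDir")
        } else {
          println("Creating new context")
          createNew()
        }
    }
    ```

    Passing the factory as a function means the setup code runs only when no checkpoint is found, so a restarted driver does not rebuild state it can recover.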

  9. object SparkSessionSingleton

    Lazily instantiated singleton instance of SparkSession
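
    A lazily instantiated singleton creates its value on first request and returns the same instance afterwards. The generic holder below is a plain-Scala sketch of that pattern, not the example's source; `LazySingleton` and `getInstance` are hypothetical names chosen for illustration.

    ```scala
    // Hypothetical generic holder, not part of Spark: builds its value on
    // first use and hands back the same instance on every later call.
    class LazySingleton[T](create: () => T) {
      @volatile private var instance: Option[T] = None

      def getInstance(): T = instance match {
        case Some(value) => value
        case None =>
          synchronized {
            // Re-check inside the lock so that only one thread runs `create`.
            if (instance.isEmpty) instance = Some(create())
            instance.get
          }
      }
    }
    ```

    Because the value is built on demand rather than captured at startup, it can be rebuilt in a fresh JVM, for instance after a driver restart. The same pattern applies to the accumulator and broadcast singletons in this package.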

  10. object SqlNetworkWordCount

    Use DataFrames and SQL to count words in UTF8 encoded, '\n' delimited text received from the network every second.

    Usage: SqlNetworkWordCount <hostname> <port>
      <hostname> and <port> describe the TCP server that Spark Streaming connects to in order to receive data.

    To run this on your local machine, you need to first run a Netcat server

    $ nc -lk 9999

    and then run the example

    $ bin/run-example org.apache.spark.examples.streaming.SqlNetworkWordCount localhost 9999

  11. object StatefulNetworkWordCount

    Counts words cumulatively in UTF8 encoded, '\n' delimited text received from the network every second, starting with an initial value of the word count.

    Usage: StatefulNetworkWordCount <hostname> <port>
      <hostname> and <port> describe the TCP server that Spark Streaming connects to in order to receive data.

    To run this on your local machine, you need to first run a Netcat server

    $ nc -lk 9999

    and then run the example

    $ bin/run-example org.apache.spark.examples.streaming.StatefulNetworkWordCount localhost 9999
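
    Cumulative counting means each batch is folded into totals carried over from earlier batches, beginning with a seeded state. The plain-Scala sketch below illustrates the idea; `CumulativeWordCount` and `update` are hypothetical names, and it does not use Spark's state APIs.

    ```scala
    // Hypothetical helper, not part of Spark: folds one batch of words into
    // the running totals and returns the new totals.
    object CumulativeWordCount {
      def update(state: Map[String, Int], batch: Seq[String]): Map[String, Int] =
        batch.foldLeft(state) { (totals, word) =>
          // Words not seen before start from zero.
          totals.updated(word, totals.getOrElse(word, 0) + 1)
        }
    }
    ```

    Starting from an assumed initial state of `Map("hello" -> 1, "world" -> 1)`, applying the batches `Seq("hello", "spark")` and then `Seq("spark")` leaves "hello" at 2, "world" at 1, and "spark" at 2.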

  12. object StreamingExamples extends internal.Logging

    Utility functions for Spark Streaming examples.

  13. object WordBlacklist

    Use this singleton to get or register a Broadcast variable.

  14. package clickstream
