



package streaming

  1. object StructuredKafkaWordCount


    Consumes messages from one or more topics in Kafka and does wordcount.

    Usage: StructuredKafkaWordCount <bootstrap-servers> <subscribe-type> <topics>

      <bootstrap-servers>  The Kafka "bootstrap.servers" configuration: a comma-separated list of host:port.
      <subscribe-type>     One of 'assign', 'subscribe', or 'subscribePattern'. Only one of these options can be specified for the Kafka source.
        |- assign            Specific TopicPartitions to consume, as a JSON string, e.g. {"topicA":[0,1],"topicB":[2,4]}.
        |- subscribe         The topic list to subscribe to: a comma-separated list of topics.
        |- subscribePattern  The pattern used to subscribe to topic(s): a Java regex string.
      <topics>             The value for the chosen 'subscribe-type'; its format depends on that type.

    Example:
        $ bin/run-example \
            sql.streaming.StructuredKafkaWordCount host1:port1,host2:port2 \
            subscribe topic1,topic2
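
    The three command-line arguments map naturally onto Kafka source options. The following is a minimal sketch of that mapping, not the example's actual source: the object and method names are hypothetical, and the option key "kafka.bootstrap.servers" is taken from Spark's Kafka source rather than from this page.

```scala
object KafkaSourceOptionsSketch {
  // The three subscribe types named in the usage text.
  val SubscribeTypes: Set[String] = Set("assign", "subscribe", "subscribePattern")

  /** Turns <bootstrap-servers> <subscribe-type> <topics> into source options,
    * or returns an error message when the arguments are not usable.
    */
  def sourceOptions(args: Array[String]): Either[String, Map[String, String]] =
    args match {
      case Array(servers, subscribeType, topics) if SubscribeTypes(subscribeType) =>
        Right(Map(
          // Assumed option key of Spark's Kafka source.
          "kafka.bootstrap.servers" -> servers,
          // Exactly one of assign / subscribe / subscribePattern is set.
          subscribeType -> topics
        ))
      case Array(_, other, _) =>
        Left(s"Unknown subscribe-type '$other'; expected one of ${SubscribeTypes.mkString(", ")}")
      case _ =>
        Left("Usage: StructuredKafkaWordCount <bootstrap-servers> <subscribe-type> <topics>")
    }
}
```

    With Spark on the classpath, a map like this could presumably be handed to spark.readStream.format("kafka").options(...) before calling load(). Using the subscribe type itself as the option key is what keeps the three modes mutually exclusive.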

  2. object StructuredNetworkWordCount


    Counts words in UTF8 encoded, '\n' delimited text received from the network.

    Usage: StructuredNetworkWordCount <hostname> <port>

      <hostname> and <port> describe the TCP server that Structured Streaming connects to in order to receive data.

    To run this on your local machine, first start a Netcat server:
        $ nc -lk 9999
    and then run the example:
        $ bin/run-example sql.streaming.StructuredNetworkWordCount localhost 9999
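
    To make the aggregation concrete, here is a plain-collections analogue of the word count. It is only an illustration using ordinary Scala collections, not Structured Streaming code; the object name is hypothetical, and splitting words on single spaces is an assumption.

```scala
object WordCountSketch {
  /** Counts space-separated words across '\n'-delimited lines. */
  def countWords(text: String): Map[String, Long] =
    text.split("\n").toSeq
      .flatMap(_.split(" "))   // assumed word separator: a single space
      .filter(_.nonEmpty)      // ignore empty tokens from blank lines
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size.toLong }
}
```

    The streaming example maintains the same kind of per-word running count, but updates it as new lines arrive from the socket instead of computing it once over a fixed string.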

  3. object StructuredNetworkWordCountWindowed


    Counts words in UTF8 encoded, '\n' delimited text received from the network over a sliding window of configurable duration.

    Each line from the network is tagged with a timestamp that is used to determine the windows into which it falls.

    Usage: StructuredNetworkWordCountWindowed <hostname> <port> <window duration> [<slide duration>]

      <hostname> and <port> describe the TCP server that Structured Streaming connects to in order to receive data.
      <window duration> gives the size of the window, specified as an integer number of seconds.
      <slide duration> gives the amount of time by which successive windows are offset from one another, in the same units. It should be less than or equal to <window duration>; if the two are equal, successive windows do not overlap. If <slide duration> is not provided, it defaults to <window duration>.

    To run this on your local machine, first start a Netcat server:
        $ nc -lk 9999
    and then run the example:
        $ bin/run-example sql.streaming.StructuredNetworkWordCountWindowed localhost 9999 <window duration in seconds> [<slide duration in seconds>]

    One recommended <window duration>, <slide duration> pair is 10, 5 (10-second windows that start every 5 seconds).
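
    The relationship between the window and slide durations can be sketched with a small calculation of which windows a timestamp falls into. This is a simplified model, not Spark's implementation: it assumes windows are half-open intervals whose start times are multiples of the slide duration counted from zero, and the object and method names are hypothetical.

```scala
object WindowAssignmentSketch {
  /** Half-open window [start, end), in seconds. */
  final case class Window(start: Long, end: Long)

  /** Lists the windows containing `eventSec`, earliest first. */
  def windowsFor(eventSec: Long, windowSec: Long, slideSec: Long): Seq[Window] = {
    require(windowSec > 0 && slideSec > 0, "durations must be positive")
    require(slideSec <= windowSec, "<slide duration> must be <= <window duration>")
    // Latest window start that is not after the event (assumed alignment to zero).
    val lastStart = Math.floorDiv(eventSec, slideSec) * slideSec
    Iterator.iterate(lastStart)(_ - slideSec)
      .takeWhile(start => start + windowSec > eventSec) // window still covers the event
      .map(start => Window(start, start + windowSec))
      .toList
      .reverse
  }
}
```

    Under these assumptions, with the 10, 5 pair an event at second 12 falls into two overlapping windows, [5, 15) and [10, 20). When the slide equals the window duration, it falls into exactly one.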
