Estimate clusters on one stream of data and make predictions
on another stream, where the data streams arrive as text files
into two different directories.
The rows of the training text files must be vector data in the form
[x1,x2,x3,...,xn]
where n is the number of dimensions.
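As a rough illustration of the training format, the plain-Python sketch below parses one row into a list of floats, so n is simply the length of the result. The function name parse_vector is hypothetical. In the Spark example this parsing is done by MLlib's Vectors.parse, so treat the sketch only as a description of the expected text shape.

```python
from typing import List


def parse_vector(line: str) -> List[float]:
    """Parse a training row such as "[1.0,2.5]" into [1.0, 2.5]."""
    text = line.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"expected [x1,...,xn], got {line!r}")
    body = text[1:-1].strip()
    if not body:
        raise ValueError("a training row needs at least one component")
    # float() tolerates surrounding spaces, so "[1, 2]" parses as well.
    return [float(part) for part in body.split(",")]
```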
The rows of the test text files must be labeled data in the form
(y,[x1,x2,x3,...,xn])
where y is some identifier. The dimension n must be the same for the training and test data.
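The test format can be sketched the same way: the parser below splits a row into its identifier y and its vector. It is a hypothetical helper that assumes y contains no comma and keeps y as a string. Note that if you rely on Spark's own LabeledPoint.parse, as the Spark example does, y is read as a numeric label (a Double), so non-numeric identifiers would not parse there.

```python
from typing import List, Tuple


def parse_labeled(line: str) -> Tuple[str, List[float]]:
    """Parse a test row such as "(7,[1.0,2.0])" into ("7", [1.0, 2.0])."""
    text = line.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError(f"expected (y,[x1,...,xn]), got {line!r}")
    # Assumption: y itself contains no comma, so it ends at the first comma.
    label, sep, rest = text[1:-1].partition(",")
    rest = rest.strip()
    if not sep or not (rest.startswith("[") and rest.endswith("]")):
        raise ValueError(f"expected (y,[x1,...,xn]), got {line!r}")
    vector = [float(part) for part in rest[1:-1].split(",")]
    return label.strip(), vector
```

A test row whose vector length differs from the training dimension n should be rejected before prediction.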
To run on your local machine using the two directories trainingDir and testDir,
with updates every 5 seconds, 3 clusters, and 2 dimensions per data point, call:
$ bin/run-example mllib.StreamingKMeansExample trainingDir testDir 5 3 2
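To try this locally you need files to drop into the two directories. The sketch below is one hypothetical way to produce them in Python: it formats rows in the two shapes above and writes each file to a temporary location first, then renames it into the watched directory, because Spark Streaming's file-based input expects new files to appear atomically (moved or renamed in) rather than be written in place. The helper names and the choice of staging temp files in the parent directory are assumptions.

```python
import os
import tempfile
from typing import Iterable, Sequence


def training_row(vector: Sequence[float]) -> str:
    """Format a vector as a training row: [x1,x2,...,xn]."""
    return "[" + ",".join(str(float(x)) for x in vector) + "]"


def labeled_row(label, vector: Sequence[float]) -> str:
    """Format a test row: (y,[x1,x2,...,xn])."""
    return "(" + str(label) + "," + training_row(vector) + ")"


def write_rows_atomically(target_dir: str, filename: str, rows: Iterable[str]) -> str:
    """Write rows to a temp file outside target_dir, then rename it into place.

    Staging in the parent directory (an assumption) usually keeps the temp file
    on the same filesystem, so os.replace is an atomic rename on POSIX, and it
    keeps partially written files out of the directory Spark is watching.
    """
    target_dir = os.path.abspath(target_dir)
    os.makedirs(target_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(target_dir), suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        for row in rows:
            f.write(row + "\n")
    final_path = os.path.join(target_dir, filename)
    os.replace(tmp_path, final_path)
    return final_path
```

For example, write_rows_atomically("trainingDir", "batch-001.txt", [training_row(p) for p in points]) adds one training file; use a new file name for each batch so every file is picked up as new.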
As you add text files to trainingDir, the clusters will continuously update.
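The continuous update can be pictured as a mini-batch k-means step per batch. As described in the MLlib guide, streaming k-means blends each old center c with the points newly assigned to it, discounting the old weight n by a decay factor a: c' = (c * n * a + sum of new points) / (n * a + m), where m is the number of new points. The plain-Python sketch below implements that rule for one batch. It is not Spark's code: the names are hypothetical, and details such as center initialization and Spark's handling of clusters that receive no points are not reproduced (here such a cluster simply keeps its center while its weight decays).

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_center(centers: Sequence[Sequence[float]], point: Sequence[float]) -> int:
    # Index of the nearest center by squared Euclidean distance (first one wins ties).
    return min(range(len(centers)), key=lambda i: squared_distance(centers[i], point))


def update_batch(
    centers: Sequence[Sequence[float]],
    weights: Sequence[float],
    batch: Sequence[Sequence[float]],
    decay: float = 1.0,
) -> Tuple[List[Vector], List[float]]:
    """One streaming k-means step: c' = (c*n*decay + sum_x) / (n*decay + m)."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    # Assign every point in the batch to its nearest current center.
    for point in batch:
        i = closest_center(centers, point)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += point[d]
    new_centers: List[Vector] = []
    new_weights: List[float] = []
    for i in range(k):
        old_weight = weights[i] * decay  # discount the history of cluster i
        total = old_weight + counts[i]
        if counts[i] == 0:
            # No new points: keep the center; only its weight changes (decays).
            new_centers.append(list(centers[i]))
        else:
            new_centers.append(
                [(centers[i][d] * old_weight + sums[i][d]) / total for d in range(dim)]
            )
        new_weights.append(total)
    return new_centers, new_weights
```

With decay=1.0 all past batches count equally; values below 1.0 let the model forget older data. In Spark the corresponding knobs are StreamingKMeans.setDecayFactor and setHalfLife.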
Any time you add text files to testDir, you'll see predictions from the current model: each row's identifier y paired with the index of its predicted cluster.
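In Spark MLlib, StreamingKMeans.predictOnValues takes (key, vector) pairs and returns (key, cluster index) pairs. The plain-Python sketch below mimics that for one batch of parsed test rows; the function is a hypothetical stand-in that borrows the name for readability, not Spark's implementation.

```python
from typing import Hashable, List, Sequence, Tuple


def predict_on_values(
    centers: Sequence[Sequence[float]],
    labeled: Sequence[Tuple[Hashable, Sequence[float]]],
) -> List[Tuple[Hashable, int]]:
    """Pair each identifier y with the index of the nearest center."""
    results = []
    for label, point in labeled:
        # Squared Euclidean distance is enough to rank centers.
        best = min(
            range(len(centers)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(centers[i], point)),
        )
        results.append((label, best))
    return results
```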
Usage: StreamingKMeansExample <trainingDir> <testDir> <batchDuration> <numClusters> <numDimensions>