Perform streaming testing using Welch's 2-sample t-test on a stream of data, where the data
stream arrives as text files in a directory. Stops when the difference between the two groups is
statistically significant (p-value < 0.05) or after a user-specified number of batches has elapsed.
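As a rough illustration of the statistic behind the test, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom for two samples in plain Python. The function name welch_t is hypothetical and this is not the example's own implementation; turning the statistic into a p-value additionally requires the Student t distribution, which is not shown here.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Hypothetical helper for illustration; each sample needs at least
    two values so that the sample variance is defined.
    """
    na, nb = len(a), len(b)
    # Squared standard error of each group mean: sample variance / group size.
    se_a = variance(a) / na
    se_b = variance(b) / nb
    # Welch's t does not pool the variances, so unequal variances are allowed.
    t = (mean(a) - mean(b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (na - 1) + se_b ** 2 / (nb - 1))
    return t, df
```

The p-value compared against 0.05 would then come from a two-sided Student t test with df degrees of freedom.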
The rows of the text files must be in the form Boolean, Double. For example:
false, -3.92
true, 99.32
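A minimal sketch of how one such row could be parsed, written in plain Python rather than with Spark. The helper name parse_row is hypothetical, and accepting surrounding whitespace and any letter case for the Boolean is an assumption made for this illustration.

```python
def parse_row(line):
    """Parse one 'Boolean, Double' row into a (bool, float) pair.

    Hypothetical helper; tolerating whitespace and mixed case is an
    assumption, not something the row format above specifies.
    """
    label, value = line.split(",")
    label = label.strip().lower()
    if label not in ("true", "false"):
        raise ValueError(f"expected true or false, got {label!r}")
    # float() ignores surrounding whitespace, so " -3.92" parses cleanly.
    return label == "true", float(value)
```

For the two rows above, parse_row("false, -3.92") yields (False, -3.92) and parse_row("true, 99.32") yields (True, 99.32).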
To run on your local machine using the directory dataDir with 5 seconds between each batch and
a timeout after 100 insignificant batches, call:
$ bin/run-example mllib.StreamingTestExample dataDir 5 100
As you add text files to dataDir, the significance test will continually update every
batchDuration seconds until the test becomes significant (p-value < 0.05) or the number of
batches processed exceeds numBatchesTimeout.
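The stopping rule described above can be sketched as a small loop over per-batch p-values. This is an illustration in plain Python, not the example's own code: the function name run_until_significant is hypothetical, and treating the timeout as reached once num_batches_timeout batches have been processed (rather than one batch later) is an assumption about the exact boundary.

```python
def run_until_significant(p_values, num_batches_timeout, alpha=0.05):
    """Consume per-batch p-values; return (stop reason, batches processed).

    Hypothetical helper. The >= boundary below is an assumption, read
    from "a timeout after 100 insignificant batches".
    """
    processed = 0
    for p in p_values:
        processed += 1
        if p < alpha:
            # The test became significant on this batch.
            return "significant", processed
        if processed >= num_batches_timeout:
            # Too many insignificant batches in a row: give up.
            return "timeout", processed
    # The input ran out before either stopping condition was met.
    return "stream ended", processed
```

With num_batches_timeout set to 100, a stream whose third batch yields a p-value of 0.01 stops after three batches, while a stream that never drops below 0.05 stops at batch 100.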
Usage: StreamingTestExample <dataDir> <batchDuration> <numBatchesTimeout>