An example app for binary classification. Run with
bin/run-example org.apache.spark.examples.mllib.BinaryClassification
A synthetic dataset is located at data/mllib/sample_binary_classification_data.txt.
If you use it as a template to create your own app, please use spark-submit
to submit your app.
An example demonstrating a bisecting k-means clustering in spark.mllib.
Run with
bin/run-example mllib.BisectingKMeansExample
An example app for summarizing multivariate data from a file. Run with
bin/run-example org.apache.spark.examples.mllib.Correlations
By default, this loads a synthetic dataset from data/mllib/sample_linear_regression_data.txt.
If you use it as a template to create your own app, please use spark-submit
to submit your app.
Compute the similar columns of a matrix, using cosine similarity.
The input matrix must be stored in row-oriented dense format, one line per row with its entries separated by space. For example,
0.5 1.0
2.0 3.0
4.0 5.0
represents a 3-by-2 matrix, whose first row is (0.5, 1.0).
Example invocation:
bin/run-example mllib.CosineSimilarity \
  --threshold 0.1 data/mllib/sample_svm_data.txt
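The column-similarity computation is easy to check outside Spark. Below is a minimal Python sketch (the example itself is Scala) that parses the row-oriented dense format and computes brute-force cosine similarity between the two columns of the sample matrix; this is the quantity the example estimates at scale.

```python
import math

# Parse a row-oriented dense matrix: one line per row, space-separated entries.
text = """0.5 1.0
2.0 3.0
4.0 5.0"""
rows = [[float(v) for v in line.split()] for line in text.splitlines()]

# Transpose to get column vectors, then compute all-pairs cosine similarity.
cols = list(zip(*rows))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sims = {(i, j): cosine(cols[i], cols[j])
        for i in range(len(cols)) for j in range(i + 1, len(cols))}
print(sims[(0, 1)])
```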
An example runner for decision trees and random forests. Run with
./bin/run-example org.apache.spark.examples.mllib.DecisionTreeRunner [options]
If you use it as a template to create your own app, please use spark-submit
to submit your app.
Note: This script treats all features as real-valued (not categorical). To include categorical features, modify categoricalFeaturesInfo.
An example k-means app. Run with
./bin/run-example org.apache.spark.examples.mllib.DenseKMeans [options] <input>
If you use it as a template to create your own app, please use spark-submit
to submit your app.
Example for mining frequent itemsets using FP-growth. Example usage:
./bin/run-example mllib.FPGrowthExample \
  --minSupport 0.8 --numPartition 2 ./data/mllib/sample_fpgrowth.txt
An example runner for Gradient Boosting using decision trees as weak learners. Run with
./bin/run-example mllib.GradientBoostedTreesRunner [options]
If you use it as a template to create your own app, please use spark-submit
to submit your app.
Note: This script treats all features as real-valued (not categorical). To include categorical features, modify categoricalFeaturesInfo.
An example Latent Dirichlet Allocation (LDA) app. Run with
./bin/run-example mllib.LDAExample [options] <input>
If you use it as a template to create your own app, please use spark-submit
to submit your app.
An example app for ALS on MovieLens data (http://grouplens.org/datasets/movielens/). Run with
bin/run-example org.apache.spark.examples.mllib.MovieLensALS
A synthetic dataset in MovieLens format can be found at data/mllib/sample_movielens_data.txt.
If you use it as a template to create your own app, please use spark-submit
to submit your app.
An example app for summarizing multivariate data from a file. Run with
bin/run-example org.apache.spark.examples.mllib.MultivariateSummarizer
By default, this loads a synthetic dataset from data/mllib/sample_linear_regression_data.txt.
If you use it as a template to create your own app, please use spark-submit
to submit your app.
An example Power Iteration Clustering http://www.icml2010.org/papers/387.pdf app. Takes an input of K concentric circles and the number of points in the innermost circle. The output should be K clusters - each cluster containing precisely the points associated with each of the input circles.
Run with
./bin/run-example mllib.PowerIterationClusteringExample [options]
Where options include:
k: Number of circles/clusters
n: Number of sampled points on the innermost circle. There are proportionally more points within the outer/larger circles.
maxIterations: Number of Power Iterations
Here is a sample run and output:
./bin/run-example mllib.PowerIterationClusteringExample -k 2 --n 10 --maxIterations 15
Cluster assignments: 1 -> [0,1,2,3,4,5,6,7,8,9], 0 -> [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29]
If you use it as a template to create your own app, please use spark-submit
to submit your app.
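The concentric-circle input described above can be generated without Spark. A hypothetical Python sketch, assuming the i-th circle has radius i and carries proportionally more (n * i) evenly spaced points, which reproduces the 10 + 20 split seen in the sample run:

```python
import math

def concentric_circles(k, n):
    """Generate k concentric circles; circle i has radius i and n * i points."""
    points = []
    for circle in range(1, k + 1):
        radius = float(circle)
        count = n * circle  # proportionally more points on outer circles
        for j in range(count):
            theta = 2.0 * math.pi * j / count
            points.append((radius * math.cos(theta), radius * math.sin(theta)))
    return points

pts = concentric_circles(2, 10)
print(len(pts))  # 10 points on the inner circle + 20 on the outer
```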
An example app for randomly generated RDDs. Run with
bin/run-example org.apache.spark.examples.mllib.RandomRDDGeneration
If you use it as a template to create your own app, please use spark-submit
to submit your app.
An example app for randomly generated and sampled RDDs. Run with
bin/run-example org.apache.spark.examples.mllib.SampledRDDs
If you use it as a template to create your own app, please use spark-submit
to submit your app.
An example naive Bayes app. Run with
./bin/run-example org.apache.spark.examples.mllib.SparseNaiveBayes [options] <input>
If you use it as a template to create your own app, please use spark-submit
to submit your app.
Estimate clusters on one stream of data and make predictions on another stream, where the data streams arrive as text files into two different directories.
The rows of the training text files must be vector data in the form
[x1,x2,x3,...,xn]
where n is the number of dimensions.
The rows of the test text files must be labeled data in the form
(y,[x1,x2,x3,...,xn])
where y is some identifier. n must be the same for train and test.
Usage: StreamingKMeansExample <trainingDir> <testDir> <batchDuration> <numClusters> <numDimensions>
To run on your local machine using the two directories trainingDir and testDir, with updates every 5 seconds, 2 dimensions per data point, and 3 clusters, call:
$ bin/run-example mllib.StreamingKMeansExample trainingDir testDir 5 3 2
As you add text files to trainingDir the clusters will continuously update. Anytime you add text files to testDir, you'll see predicted labels using the current model.
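The two row formats above are simple enough to parse by hand. A minimal Python sketch of the parsing (hypothetical helpers; the Scala example delegates this to Spark's Vectors.parse and LabeledPoint.parse):

```python
def parse_vector(line):
    # "[x1,x2,...,xn]" -> list of floats
    return [float(v) for v in line.strip()[1:-1].split(",")]

def parse_labeled(line):
    # "(y,[x1,x2,...,xn])" -> (label, vector)
    label, rest = line.strip()[1:-1].split(",", 1)
    return float(label), parse_vector(rest)

print(parse_vector("[1.0,2.0,3.0]"))
print(parse_labeled("(1,[0.5,1.5])"))
```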
Train a linear regression model on one stream of data and make predictions on another stream, where the data streams arrive as text files into two different directories.
The rows of the text files must be labeled data points in the form
(y,[x1,x2,x3,...,xn])
where n is the number of features. n must be the same for train and test.
Usage: StreamingLinearRegressionExample <trainingDir> <testDir>
To run on your local machine using the two directories trainingDir and testDir, with updates every 5 seconds, and 2 features per data point, call:
$ bin/run-example mllib.StreamingLinearRegressionExample trainingDir testDir
As you add text files to trainingDir the model will continuously update. Anytime you add text files to testDir, you'll see predictions from the current model.
Train a logistic regression model on one stream of data and make predictions on another stream, where the data streams arrive as text files into two different directories.
The rows of the text files must be labeled data points in the form
(y,[x1,x2,x3,...,xn])
where n is the number of features, y is a binary label, and n must be the same for train and test.
Usage: StreamingLogisticRegression <trainingDir> <testDir> <batchDuration> <numFeatures>
To run on your local machine using the two directories trainingDir and testDir, with updates every 5 seconds, and 2 features per data point, call:
$ bin/run-example mllib.StreamingLogisticRegression trainingDir testDir 5 2
As you add text files to trainingDir the model will continuously update. Anytime you add text files to testDir, you'll see predictions from the current model.
Perform streaming significance testing using Welch's 2-sample t-test on a stream of data, where the data stream arrives as text files in a directory. Stops when the difference between the two groups becomes statistically significant (p-value < 0.05) or after a user-specified timeout (in number of batches) is exceeded.
The rows of the text files must be in the form Boolean, Double. For example:
false, -3.92
true, 99.32
Usage: StreamingTestExample <dataDir> <batchDuration> <numBatchesTimeout>
To run on your local machine using the directory dataDir, with 5 seconds between each batch and a timeout after 100 insignificant batches, call:
$ bin/run-example mllib.StreamingTestExample dataDir 5 100
As you add text files to dataDir the significance test will continually update every batchDuration seconds until the test becomes significant (p-value < 0.05) or the number of batches processed exceeds numBatchesTimeout.
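The test statistic itself is straightforward. Below is a plain-Python sketch of Welch's 2-sample t-statistic over rows in the Boolean, Double format above (the Scala example uses Spark's StreamingTest with the "welch" test method, which also derives the p-value; this sketch omits that step since it needs the t-distribution CDF):

```python
import math

# Split "Boolean, Double" rows into the two groups being compared.
rows = ["false, -3.92", "true, 99.32", "false, -1.1", "true, 98.0"]
groups = {True: [], False: []}
for row in rows:
    flag, value = row.split(",")
    groups[flag.strip() == "true"].append(float(value))

def welch_t(a, b):
    """Welch's t-statistic: (m1 - m2) / sqrt(s1^2/n1 + s2^2/n2)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)  # sample variance
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

t = welch_t(groups[True], groups[False])
print(t)
```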
Compute the principal components of a tall-and-skinny matrix, whose rows are observations.
The input matrix must be stored in row-oriented dense format, one line per row with its entries separated by space. For example,
0.5 1.0
2.0 3.0
4.0 5.0
represents a 3-by-2 matrix, whose first row is (0.5, 1.0).
Compute the singular value decomposition (SVD) of a tall-and-skinny matrix.
The input matrix must be stored in row-oriented dense format, one line per row with its entries separated by space. For example,
0.5 1.0
2.0 3.0
4.0 5.0
represents a 3-by-2 matrix, whose first row is (0.5, 1.0).
An example app for linear regression. Run with
bin/run-example org.apache.spark.examples.mllib.LinearRegression
A synthetic dataset can be found at data/mllib/sample_linear_regression_data.txt.
If you use it as a template to create your own app, please use spark-submit
to submit your app.
(Since version 2.0.0) Use ml.regression.LinearRegression or LBFGS
(Since version 2.0.0) Deprecated since LinearRegressionWithSGD is deprecated. Use ml.feature.PCA
(Since version 2.0.0) Use ml.regression.LinearRegression and the resulting model summary for metrics
Abstract class for parameter case classes. This overrides the toString method to print all case class fields by name and value.
Concrete parameter class.