Spark.ML Module
A Spark.ML Module is an Apache Spark module that provides a DataFrame-based machine learning (ML) training framework for building ML pipelines.
- Example(s):
- …
- Counter-Example(s):
- See: PySpark.ML, org.apache.spark.ml.classification.LogisticRegression.
References
2017
- https://www.quora.com/Why-are-there-two-ML-implementations-in-Spark-ML-and-MLlib-and-what-are-their-different-features
- QUOTE: Using spark.ml is recommended because with DataFrames the API is more versatile and flexible. But we will keep supporting spark.mllib along with the development of spark.ml. Users should be comfortable using spark.mllib features and expect more features coming. Developers should contribute new algorithms to spark.ml if they fit the ML pipeline concept well, e.g., feature extractors and transformers.
- spark.mllib contains the original API built on top of RDDs.
- spark.ml provides higher-level API built on top of DataFrames for constructing ML pipelines.
2017
- http://spark.apache.org/docs/latest/api/python/pyspark.ml.html
- QUOTE: DataFrame-based machine learning APIs to let users quickly assemble and configure practical machine learning pipelines.
- ML Pipeline APIs: Transformer, Estimator, Model, Pipeline, PipelineModel
- pyspark.ml.param module: Param, Params, TypeConverters
- pyspark.ml.feature module: Binarizer, BucketedRandomProjectionLSH, BucketedRandomProjectionLSHModel, Bucketizer, ChiSqSelector, ChiSqSelectorModel, CountVectorizer, CountVectorizerModel, DCT, ElementwiseProduct, HashingTF, IDF, IDFModel, Imputer, ImputerModel, IndexToString, MaxAbsScaler, MaxAbsScalerModel, MinHashLSH, MinHashLSHModel, MinMaxScaler, MinMaxScalerModel, NGram, Normalizer, OneHotEncoder, PCA, PCAModel, PolynomialExpansion, QuantileDiscretizer, RegexTokenizer, RFormula, RFormulaModel, SQLTransformer, StandardScaler, StandardScalerModel, StopWordsRemover, StringIndexer, StringIndexerModel, Tokenizer, VectorAssembler, VectorIndexer, VectorIndexerModel, VectorSlicer, Word2Vec, Word2VecModel
- pyspark.ml.classification module: LinearSVC, LinearSVCModel, LogisticRegression, LogisticRegressionModel, LogisticRegressionSummary, LogisticRegressionTrainingSummary, BinaryLogisticRegressionSummary, BinaryLogisticRegressionTrainingSummary, DecisionTreeClassifier, DecisionTreeClassificationModel, GBTClassifier, GBTClassificationModel, RandomForestClassifier, RandomForestClassificationModel, NaiveBayes, NaiveBayesModel, MultilayerPerceptronClassifier, MultilayerPerceptronClassificationModel, OneVsRest, OneVsRestModel
- pyspark.ml.clustering module: BisectingKMeans, BisectingKMeansModel, BisectingKMeansSummary, KMeans, KMeansModel, GaussianMixture, GaussianMixtureModel, GaussianMixtureSummary, LDA, LDAModel, LocalLDAModel, DistributedLDAModel
- pyspark.ml.linalg module: Vector, DenseVector, SparseVector, Vectors, Matrix, DenseMatrix, SparseMatrix, Matrices
- pyspark.ml.recommendation module: ALS, ALSModel
- pyspark.ml.regression module: AFTSurvivalRegression, AFTSurvivalRegressionModel, DecisionTreeRegressor, DecisionTreeRegressionModel, GBTRegressor, GBTRegressionModel, GeneralizedLinearRegression, GeneralizedLinearRegressionModel, GeneralizedLinearRegressionSummary, GeneralizedLinearRegressionTrainingSummary, IsotonicRegression, IsotonicRegressionModel, LinearRegression, LinearRegressionModel, LinearRegressionSummary, LinearRegressionTrainingSummary, RandomForestRegressor, RandomForestRegressionModel
- pyspark.ml.stat module: ChiSquareTest, Correlation
- pyspark.ml.tuning module: ParamGridBuilder, CrossValidator, CrossValidatorModel, TrainValidationSplit, TrainValidationSplitModel
- pyspark.ml.evaluation module: Evaluator, BinaryClassificationEvaluator, RegressionEvaluator, MulticlassClassificationEvaluator
- pyspark.ml.fpm module: FPGrowth, FPGrowthModel
- pyspark.ml.util module: Identifiable, JavaMLReadable, JavaMLReader, JavaMLWritable, JavaMLWriter, JavaPredictionModel, MLReadable, MLReader, MLWritable, MLWriter