SparkR Package
A SparkR Package is an R package that ...
- See: Spark MLlib.
References
2017
- http://spark.apache.org/docs/latest/sparkr.html
- SparkR is an R package that provides a light-weight frontend to use Apache Spark from R. In Spark 2.1.0, SparkR provides a distributed data frame implementation that supports operations like selection, filtering, aggregation etc. (similar to R data frames, dplyr) but on large datasets. SparkR also supports distributed machine learning using MLlib.
2016
- https://spark-summit.org/east-2016/events/generalized-linear-models-in-spark-mllib-and-sparkr/
- QUOTE: Generalized linear models (GLMs) unify various statistical models such as linear regression and logistic regression through the specification of a model family and link function. They are widely used in modeling, inference, and prediction with applications in numerous fields. In this talk, we will summarize recent community efforts in supporting GLMs in Spark MLlib and SparkR. We will review supported model families, link functions, and regularization types, as well as their use cases, e.g., logistic regression for classification and log-linear model for survival analysis. Then we discuss the choices of solvers and their pros and cons given training datasets of different sizes, and implementation details in order to match R’s model output and summary statistics. We will also demonstrate the APIs in MLlib and SparkR, including R model formula support, which make building linear models a simple task in Spark.
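The idea in the quote above, that a single fitting routine covers several regression models once a family and link function are chosen, can be illustrated with a small sketch. This is plain Python, not the Spark MLlib or SparkR API. The names `fit_glm` and `FAMILIES` are hypothetical, the model is restricted to one predictor plus an intercept, and the solver is textbook iteratively reweighted least squares (IRLS), which may differ from the solvers discussed in the talk.

```python
import math

# Each family is described by three functions (all names are illustrative):
#   inv_link(eta) -> mean mu, dmu_deta(eta) -> derivative of the mean,
#   variance(mu)  -> variance function of the family.
FAMILIES = {
    # Gaussian family, identity link: ordinary linear regression.
    "gaussian": (lambda eta: eta, lambda eta: 1.0, lambda mu: 1.0),
    # Poisson family, log link: log-linear model.
    "poisson": (math.exp, math.exp, lambda mu: mu),
    # Binomial family, logit link: logistic regression.
    "binomial": (
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
        lambda eta: math.exp(-eta) / (1.0 + math.exp(-eta)) ** 2,
        lambda mu: mu * (1.0 - mu),
    ),
}


def fit_glm(xs, ys, family, iters=25):
    """Fit mean(y) = inv_link(b0 + b1 * x) by IRLS; returns (b0, b1)."""
    inv_link, dmu_deta, variance = FAMILIES[family]
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Sums for the 2x2 weighted least-squares problem on the
        # working response z with working weights w.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for x, y in zip(xs, ys):
            eta = b0 + b1 * x
            mu = inv_link(eta)
            g = dmu_deta(eta)
            w = g * g / variance(mu)
            z = eta + (y - mu) / g
            s_w += w
            s_wx += w * x
            s_wxx += w * x * x
            s_wz += w * z
            s_wxz += w * x * z
        # Solve the weighted normal equations in closed form.
        det = s_w * s_wxx - s_wx * s_wx
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
    return b0, b1


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    # Same routine, two different models selected only by the family name.
    print(fit_glm(xs, [1.0 + 2.0 * x for x in xs], "gaussian"))
    print(fit_glm(xs, [math.exp(0.5 + 0.3 * x) for x in xs], "poisson"))
```

Only the three family functions change between the linear, log-linear, and logistic cases; the fitting loop stays the same. A fixed iteration count is used here for brevity, whereas a practical solver would also check for convergence and handle ill-conditioned or perfectly separable data.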