Distributed Machine Learning System
A Distributed Machine Learning System is a Distributed Computing System for developing and implementing machine learning algorithms.
- Context:
- It can range from being a standard Distributed ML System, to being a Distributed Reinforcement Learning System, to being a Distributed Deep Learning System.
- Example(s):
- Counter-Example(s):
- See: Distributed Algorithms, Resilient Distributed Datasets (RDDS), Distributed Graph System, Iterative Machine Learning System.
References
2018
- (Nishihara & Moritz, 2018) ⇒ Robert Nishihara, and Philipp Moritz (Jan 9, 2018). "Ray: A Distributed System for AI." Retrieved on 2019-04-14.
- QUOTE: One of Ray’s goals is to enable practitioners to turn a prototype algorithm that runs on a laptop into a high-performance distributed application that runs efficiently on a cluster (or on a single multi-core machine) with relatively few additional lines of code. Such a framework should include the performance benefits of a hand-optimized system without requiring the user to reason about scheduling, data transfers, and machine failures.
(...) There are two main ways of using Ray: through its lower-level APIs and higher-level libraries. The higher-level libraries are built on top of the lower-level APIs. Currently these include Ray RLlib, a scalable reinforcement learning library and Ray.tune, an efficient distributed hyperparameter search library.
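The lower-level API mentioned in the quote centers on turning ordinary Python functions into tasks that Ray schedules across workers. A minimal sketch of that pattern follows; the function name square and the toy workload are illustrative choices, not taken from the quoted post.
```python
import ray

ray.init()  # start Ray locally (or connect to an existing cluster)

@ray.remote
def square(x):
    # an ordinary Python function turned into a remotely executable task
    return x * x

# launch tasks in parallel; each call returns a future immediately
futures = [square.remote(i) for i in range(4)]

# block until all tasks finish and collect their results
print(ray.get(futures))  # [0, 1, 4, 9]
```
The same decorator-and-future pattern scales from a single multi-core machine to a cluster without changing the application code, which is the portability the quote emphasizes.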
2016
- (Meng et al., 2016) ⇒ Xiangrui Meng, Joseph Bradley, Burak Yavuz, Evan Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen, Doris Xin, Reynold Xin, Michael J. Franklin, Reza Zadeh, Matei Zaharia, and Ameet Talwalkar. (2016). “MLlib: Machine Learning in Apache Spark.” In: The Journal of Machine Learning Research, 17. ISSN:1938-7228 arXiv:1505.06807
- QUOTE: In this work we present MLlib, Spark’s distributed machine learning library, and the largest such library. The library targets large-scale learning settings that benefit from data-parallelism or model-parallelism to store and operate on data or models. MLlib consists of fast and scalable implementations of standard learning algorithms for common learning settings including classification, regression, collaborative filtering, clustering, and dimensionality reduction. It also provides a variety of underlying statistics, linear algebra, and optimization primitives. Written in Scala and using native (C++ based) linear algebra libraries on each node, MLlib includes Java, Scala, and Python APIs, and is released as part of the Spark project under the Apache 2.0 license.
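As a concrete illustration of the quoted description, the following minimal sketch uses MLlib's Python API (the DataFrame-based spark.ml interface) to fit a logistic regression classifier; the tiny in-memory dataset and parameter values are invented for illustration, and a real workload would read a distributed DataFrame instead.
```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# toy labeled dataset; in practice this would be a large distributed DataFrame
data = spark.createDataFrame(
    [(0.0, Vectors.dense([0.0, 1.1])),
     (1.0, Vectors.dense([2.0, 1.0])),
     (0.0, Vectors.dense([0.1, 1.2])),
     (1.0, Vectors.dense([1.9, 0.8]))],
    ["label", "features"])

lr = LogisticRegression(maxIter=10, regParam=0.01)
model = lr.fit(data)          # training is executed by Spark across the cluster
print(model.coefficients)

spark.stop()
```
Because the model is expressed against Spark DataFrames, the same script exploits data-parallelism automatically when the input is partitioned across many nodes.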
2014
- (Agarwal et al., 2014) ⇒ Alekh Agarwal, Olivier Chapelle, Miroslav Dudík, and John Langford. (2014). “A Reliable Effective Terascale Linear Learning System.” In: The Journal of Machine Learning Research, 15(1).
- QUOTE: Perhaps the simplest strategy when the number of examples n is too large for a given learning algorithm is to reduce the data set size by subsampling. However, this strategy only works if the problem is simple enough or the number of parameters is very small. The setting of interest here is when a large number of examples is really needed to learn a good model. Distributed algorithms are a natural choice for such scenarios.
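To illustrate why distributed algorithms are the natural choice when every example is needed, the following NumPy sketch simulates data-parallel linear learning: each simulated node computes a gradient on its own data shard, and the per-node gradients are averaged, mimicking an AllReduce-style aggregation step. The shard count, step size, and synthetic data are assumptions made for illustration, not the actual system described in the paper.
```python
import numpy as np

rng = np.random.default_rng(0)
n, d, num_nodes = 10_000, 5, 4
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=n)

w = np.zeros(d)
shards = np.array_split(np.arange(n), num_nodes)  # each "node" holds one shard

for _ in range(200):
    # each node computes the squared-loss gradient on its own shard
    grads = [(X[idx].T @ (X[idx] @ w - y[idx])) / len(idx) for idx in shards]
    # averaging the per-node gradients stands in for the AllReduce aggregation
    w -= 0.1 * np.mean(grads, axis=0)

print(w)  # approaches the true coefficients [1.0, -2.0, 0.5, 0.0, 3.0]
```
The point of the sketch is that no single node ever needs to hold all n examples; only the d-dimensional gradients are exchanged, which is what makes learning on terascale datasets feasible.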