Machine Learning (ML) Pipeline
(Redirected from ML pipeline)
Jump to navigation
Jump to search
A Machine Learning (ML) Pipeline is a data processing pipeline that supports an ML-based application.
- Context:
- It can (typically) include ML data preparation jobs, ML training jobs, ML inference jobs, and/or online ML model evaluation jobs.
- It can (typically) be the result of Machine Learning Development Task.
- It can (often) follow a Machine Learning-based System Development Process Model.
- It can (often) include ML Pipeline Monitoring and ML Pipeline Alerting.
- It can range from being a Batch-based ML Pipeline to being a Real-Time ML Pipeline.
- It can range from being a Non-Scalable ML Pipeline to being a Scalable ML Pipeline.
- It can be produced within an ML Platform.
- …
- Example(s):
- the one used at T-Mobile to Predict Customer Churn.
- the one used at PlayStation to Predict Personalized Game-Play Relevance.
- …
- Counter-Example(s):
- See: ETL Pipeline, ML Development Process Model, ML Feature Creation System, Training Data Creation, ML Model Training, ML Model Deployment.
References
- https://google.com/search?q="a machine+learning+pipeline
2019c
- Semi Koen. (2019c). “Architecting a Machine Learning Pipeline." In: Medium
- QUOTE: Architecting a ML Pipeline: Traditionally, pipelines involve overnight batch processing, i.e. collecting data, sending it through an enterprise message bus and processing it to provide pre-calculated results and guidance for next day’s operations. Whilst this works in some industries, it is really insufficient in others, and especially when it comes to ML applications.
The following diagram shows a ML pipeline applied to a real-time business problem where features and predictions are time sensitive (e.g. Netflix’s recommendation engines, Uber’s arrival time estimation, LinkedIn’s connections suggestions, Airbnb’s search engines etc).
- QUOTE: Architecting a ML Pipeline: Traditionally, pipelines involve overnight batch processing, i.e. collecting data, sending it through an enterprise message bus and processing it to provide pre-calculated results and guidance for next day’s operations. Whilst this works in some industries, it is really insufficient in others, and especially when it comes to ML applications.
2019b
- https://towardsdatascience.com/being-a-data-scientist-does-not-make-you-a-software-engineer-c64081526372?sk=fd1e5ace8c5bfdaa6e1b1ace201dbff1
- QUOTE: The main objectives are to build a system that:
- Reduces latency;
- Is integrated but loosely coupled with the other parts of the system, e.g. data stores, reporting, graphical user interface;
- Can scale both horizontally and vertically;
- Is message driven i.e. the system communicates via asynchronous, non-blocking message passing;
- Provides efficient computation with regards to workload management;
- Is fault-tolerant and self healing i.e. breakdown management;
- Supports batch and real-time processing.
- QUOTE: The main objectives are to build a system that:
2018
- "Building a Reproducible Machine Learning Pipeline." In: arXiv
- QUOTE: ... Many open-source tools exist for the individual tasks in a machine learning pipeline. For example, Git or Subversion for version control of software code, Scikit-Learn or MLlib for building models, and Docker for containerization. ...
2017
- https://conferences.oreilly.com/strata/strata-eu-2017/public/schedule/detail/60680
- QUOTE: ... explore use cases from the BMW Group where novel machine-learning pipelines (such as those based on XGBoost and convolutional neural nets, for example) support a broad variety of business stakeholders. ...
2016
- https://www.researchgate.net/publication/314940802_Automating_Biomedical_Data_Science_Through_Tree-Based_Pipeline_Optimization
- QUOTE: ... AutoML methods typically automate one or more steps in the creation of useful machine learning pipelines, such as the selection of preprocessing or learning ...