TensorFlow Serving Framework
A TensorFlow Serving Framework is an ML model serving framework designed to deploy trained TensorFlow models for production inference.
- AKA: tensorflow_serving.
- Context:
- It can make use of a TensorFlow SavedModelBuilder module to export a trained model (by first saving a “snapshot” of the trained model to reliable storage so that it can be loaded later for inference), as sketched in the code example below (just before the References section).
- Example(s):
- TensorFlow Serving v0.6 (2017-06-20) for TensorFlow v1.2 [1]
- a TensorFlow Serving instance deployed via Docker [2]
- Counter-Example(s):
- a Clipper Prediction Serving System (Crankshaw et al., 2017).
- See: Docker.
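The following is a minimal export sketch using the TensorFlow 1.x SavedModelBuilder API referenced in the Context section above. The tiny softmax graph, the export path, and the signature names are illustrative placeholders, not the tutorial's exact code; only the tf.saved_model calls themselves follow the standard 1.x API.

    import tensorflow as tf

    # Illustrative export path; the trailing "1" is a version subdirectory
    # that the model server expects under --model_base_path.
    export_path = "/tmp/mnist_model/1"

    # Stand-in graph for a trained model: a single softmax layer over 784 inputs.
    x = tf.placeholder(tf.float32, shape=[None, 784], name="x")
    w = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    y = tf.nn.softmax(tf.matmul(x, w) + b, name="y")

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # ... training would happen here ...

        # Build a prediction signature mapping named inputs/outputs to tensors.
        builder = tf.saved_model.builder.SavedModelBuilder(export_path)
        prediction_signature = tf.saved_model.signature_def_utils.build_signature_def(
            inputs={"images": tf.saved_model.utils.build_tensor_info(x)},
            outputs={"scores": tf.saved_model.utils.build_tensor_info(y)},
            method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)

        # Save a "snapshot" of the graph and variables under the "serve" tag
        # so that TensorFlow Serving can load it later for inference.
        builder.add_meta_graph_and_variables(
            sess,
            [tf.saved_model.tag_constants.SERVING],
            signature_def_map={"predict_images": prediction_signature})
        builder.save()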
References
2017
- https://www.tensorflow.org/deploy/tfserve
- QUOTE: TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. TensorFlow Serving makes it easy to deploy new algorithms and experiments, while keeping the same server architecture and APIs.
2017b
- https://tensorflow.github.io/serving/serving_basic
- QUOTE: ... your [mnist_model] TensorFlow model is exported and ready to be loaded!
Load Exported Model With Standard TensorFlow Model Server
$>bazel build //tensorflow_serving/model_servers:tensorflow_model_server
$>bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9000 --model_name=mnist --model_base_path=/tmp/mnist_model/
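Once the model server is running, it can be queried over gRPC. The sketch below follows the style of the 2017-era tutorial client, assuming the server above is listening on localhost:9000 and the model was exported with the "predict_images" signature; the all-zeros input vector is a placeholder for a real MNIST image.

    import numpy
    import tensorflow as tf
    from grpc.beta import implementations
    from tensorflow_serving.apis import predict_pb2
    from tensorflow_serving.apis import prediction_service_pb2

    # Connect to the model server started above (host and port are assumptions).
    channel = implementations.insecure_channel("localhost", 9000)
    stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

    # Build a PredictRequest against the "mnist" model; the signature name and
    # the dummy all-zeros input are illustrative placeholders.
    request = predict_pb2.PredictRequest()
    request.model_spec.name = "mnist"
    request.model_spec.signature_name = "predict_images"
    dummy_image = numpy.zeros((1, 784), dtype=numpy.float32)
    request.inputs["images"].CopyFrom(
        tf.contrib.util.make_tensor_proto(dummy_image, shape=[1, 784]))

    result = stub.Predict(request, 10.0)  # 10-second RPC timeout
    print(result.outputs["scores"])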
2017c
- https://github.com/tensorflow/serving
- QUOTE: TensorFlow Serving is an open-source software library for serving machine learning models. It deals with the inference aspect of machine learning, taking models after training and managing their lifetimes, providing clients with versioned access via a high-performance, reference-counted lookup table.
Multiple models, or indeed multiple versions of the same model, can be served simultaneously. This flexibility facilitates canarying new versions, non-atomically migrating clients to new models or versions, and A/B testing experimental models.
The primary use-case is high-performance production serving, but the same serving infrastructure can also be used in bulk-processing (e.g. map-reduce) jobs to pre-compute inference results or analyze model performance. In both scenarios, GPUs can substantially increase inference throughput. TensorFlow Serving comes with a scheduler that groups individual inference requests into batches for joint execution on a GPU, with configurable latency controls.
2017d
- (Crankshaw et al., 2017) ⇒ Daniel Crankshaw, Xin Wang, Giulio Zhou, Michael J. Franklin, Joseph E. Gonzalez, and Ion Stoica. (2017). “Clipper: A Low-latency Online Prediction Serving System.” In: Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation. ISBN:978-1-931971-37-9
- QUOTE: ... we compare Clipper to the Tensorflow Serving system and demonstrate that we are able to achieve comparable throughput and latency while enabling model composition and online learning to improve accuracy and render more robust predictions. ...