Data Version Control (DVC) Framework
A Data Version Control (DVC) Framework is an open-source ML model management framework that versions datasets, models, and experiments alongside Git.
References
2020b
- https://dvc.org/doc
- QUOTE: Data Version Control, or DVC, is a data and ML experiments management tool that takes advantage of the existing engineering toolset that you're already familiar with (Git, CI/CD, etc.)
2020b
- https://github.com/iterative/dvc
- QUOTE: Data Version Control or DVC is an open-source tool for data science and machine learning projects. Key features:
- Simple command line Git-like experience. Does not require installing and maintaining any databases. Does not depend on any proprietary online services.
- Management and versioning of datasets and machine learning models. Data is saved in S3, Google cloud, Azure, Alibaba cloud, SSH server, HDFS, or even local HDD RAID.
- Makes projects reproducible and shareable; helping to answer questions about how a model was built.
- Helps manage experiments with Git tags/branches and metrics tracking.
- DVC aims to replace spreadsheet and document sharing tools (such as Excel or Google Docs) which are being used frequently as both knowledge repositories and team ledgers. DVC also replaces both ad-hoc scripts to track, move, and deploy different model versions; as well as ad-hoc data file suffixes and prefixes.
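The "management and versioning of datasets" feature quoted above can be illustrated with a toy sketch. It assumes a content-addressed cache plus a small pointer file that would be committed to Git in place of the data. The function names (`track`, `checkout`) and the `.ptr.json` pointer format are hypothetical and chosen for illustration only; this is not DVC's actual implementation or file format.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a pointer file.

    The pointer file (hypothetical ".ptr.json" format) is small enough to
    commit to Git, while the data itself lives in the cache directory.
    """
    md5 = file_md5(data_file)
    cached = cache_dir / md5[:2] / md5[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():  # identical content is stored only once
        shutil.copy2(data_file, cached)
    pointer = data_file.parent / (data_file.name + ".ptr.json")
    pointer.write_text(json.dumps({"md5": md5, "path": data_file.name}))
    return pointer


def checkout(pointer: Path, cache_dir: Path) -> Path:
    """Restore the data file version that a pointer file refers to."""
    meta = json.loads(pointer.read_text())
    md5 = meta["md5"]
    target = pointer.parent / meta["path"]
    shutil.copy2(cache_dir / md5[:2] / md5[2:], target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "data.csv"
        data.write_bytes(b"a,b\n1,2\n")
        ptr = track(data, root / "cache")
        print(ptr.read_text())
```

In this sketch, reverting the pointer file to an earlier revision (for example through Git) and calling `checkout` restores the matching data version, which is the general idea behind keeping large files out of the Git repository itself.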