Data Version Control (DVC) Framework

From GM-RKB

A Data Version Control (DVC) Framework is an ML model management framework.



References

2020b

  • https://github.com/iterative/dvc
    • QUOTE: Data Version Control or DVC is an open-source tool for data science and machine learning projects. Key features:
      • Simple command line Git-like experience. Does not require installing and maintaining any databases. Does not depend on any proprietary online services.
      • Management and versioning of datasets and machine learning models. Data is saved in S3, Google cloud, Azure, Alibaba cloud, SSH server, HDFS, or even local HDD RAID.
      • Makes projects reproducible and shareable; helping to answer questions about how a model was built.
      • Helps manage experiments with Git tags/branches and metrics tracking.
    • DVC aims to replace spreadsheet and document sharing tools (such as Excel or Google Docs) which are being used frequently as both knowledge repositories and team ledgers. DVC also replaces both ad-hoc scripts to track, move, and deploy different model versions; as well as ad-hoc data file suffixes and prefixes.
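
The dataset versioning idea in the quote above can be illustrated with a minimal Python sketch. It is a conceptual illustration only, not DVC's implementation or API: the function names `track` and `checkout`, the `.ptr` pointer-file suffix, the use of SHA-256, and the local cache layout are all assumptions made for this example. The large file goes into a content-addressed cache, while a small pointer file records its checksum and can be committed to Git.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into the cache and write a small pointer file next to it.

    The '.ptr' suffix and JSON layout are hypothetical, chosen for this sketch.
    """
    checksum = file_sha256(data_file)
    cached = cache_dir / checksum[:2] / checksum[2:]
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():
        shutil.copy2(data_file, cached)
    pointer = data_file.with_name(data_file.name + ".ptr")
    pointer.write_text(json.dumps({"sha256": checksum, "path": data_file.name}))
    return pointer


def checkout(pointer: Path, cache_dir: Path) -> Path:
    """Restore the data file version that the pointer file refers to."""
    meta = json.loads(pointer.read_text())
    checksum = meta["sha256"]
    cached = cache_dir / checksum[:2] / checksum[2:]
    target = pointer.with_name(meta["path"])
    shutil.copy2(cached, target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "train.csv"
        data.write_text("a,b\n1,2\n")
        pointer = track(data, root / "cache")
        print(pointer.read_text())
```

In this sketch, switching between dataset versions amounts to restoring an older pointer file (for example from a Git tag or branch) and calling `checkout`. A remote such as S3 or an SSH server would take the place of the local cache directory.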