MLOps 04 - Model Versioning and Registry
Git Alone Isn't Enough
In software development, version control is a solved problem. Git tracks every change to code, allows parallel development through branches, and lets you revert to any point in history. So why not just use Git for ML models?
The problem is that models are fundamentally different from code. Code is text: you can inspect changes through diffs, and file sizes are small. Trained model files, on the other hand, are binary files ranging from hundreds of megabytes to several gigabytes, and diffs are meaningless. More importantly, code alone does not determine a model's behavior. The same code trained on different data, with different hyperparameters, or in a different environment produces an entirely different model.
Versioning a model therefore means tracking not just code, but also data, parameters, environment, and performance metrics together. If Git manages the lineage of code, model version control manages the lineage of models.
The Concept of a Model Registry
A model registry is a centralized system for storing and managing trained models. It differs from a simple file store in that it manages each model's metadata and lifecycle alongside the model itself.
The information a model registry manages includes the model itself (trained weight files), the model's metadata (training dataset, hyperparameters, performance metrics), the model's lineage (which experiment it was derived from, its relationship to previous versions), and the model's current state (whether it's in development, deployed to staging, or serving in production).
When this information is managed centrally, the entire team gains transparent visibility into which model is currently serving in production, what the next deployment candidate is, and which models were used in the past.
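As a mental model, a registry entry can be pictured as a record bundling those four kinds of information, with a central store the whole team queries. The following is a minimal in-memory sketch of that idea; all class and field names here are illustrative, not any specific registry's API:

```python
from dataclasses import dataclass, field

@dataclass
class ModelVersion:
    """One versioned entry in a hypothetical model registry."""
    name: str                    # model name, e.g. "churn-classifier"
    version: int                 # monotonically increasing per name
    artifact_uri: str            # where the trained weights live
    metadata: dict = field(default_factory=dict)  # dataset, hyperparameters, metrics
    source_run: str = ""         # lineage: the experiment run that produced it
    stage: str = "None"          # None / Staging / Production / Archived

class ModelRegistry:
    """Central store: every team member sees the same state."""
    def __init__(self):
        self._versions = {}      # (name, version) -> ModelVersion

    def register(self, mv: ModelVersion):
        self._versions[(mv.name, mv.version)] = mv

    def latest(self, name: str, stage: str):
        """Which version of `name` is currently in `stage`?"""
        candidates = [v for (n, _), v in self._versions.items()
                      if n == name and v.stage == stage]
        return max(candidates, key=lambda v: v.version, default=None)

registry = ModelRegistry()
registry.register(ModelVersion("churn-classifier", 1, "s3://models/v1",
                               metadata={"auc": 0.91}, stage="Production"))
registry.register(ModelVersion("churn-classifier", 2, "s3://models/v2",
                               metadata={"auc": 0.93}, stage="Staging"))
print(registry.latest("churn-classifier", "Production").version)  # → 1
```

With this structure, "what is serving in production right now?" becomes a single query rather than a question answered by asking around.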
The Model Lifecycle
After being created, a model passes through defined stages before reaching production. Without systematic management of this process, unvalidated models can end up in production, or conversely, fully validated models can be left undeployed.
A typical model lifecycle consists of three stages beyond the initial, unassigned state.
```
None → Staging → Production → Archived
                     ↑            │
                     └─ Rollback ─┘
```
The first stage is staging. When a model that showed promising results in experimentation is registered in the registry, it enters the staging state. Additional validation is performed at this stage: performance evaluation against diverse test data, A/B testing, bias checks, and similar assessments.
The second stage is production. A model that has passed validation is deployed to the live service and handles inference requests. It is standard practice to maintain only one production version for a given model at any time.
The third stage is archival. When a new model is promoted to production, the previous model transitions to the archived state. It is no longer used for serving, but is preserved rather than deleted in case a rollback becomes necessary.
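The allowed transitions described above, including the rollback path from archived back to production, can be encoded as a small state machine. This is an illustrative sketch, not any particular registry's implementation:

```python
# Allowed lifecycle transitions, including rollback of an archived model.
ALLOWED = {
    "None": {"Staging"},
    "Staging": {"Production", "Archived"},
    "Production": {"Archived"},
    "Archived": {"Production"},   # rollback path
}

def transition(current: str, target: str) -> str:
    """Move a model version to `target`, rejecting illegal jumps."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target

stage = transition("None", "Staging")      # registered -> staging
stage = transition(stage, "Production")    # validated -> production
stage = transition(stage, "Archived")      # superseded by a newer version
stage = transition(stage, "Production")    # rollback
```

Rejecting a jump such as None straight to Production is exactly what prevents an unvalidated model from reaching the live service.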
Approval Workflows
Automatic state transitions can be risky. Deploying a model to production simply because its performance metrics look good is not always appropriate. Factors like fairness, regulatory compliance, and business impact cannot be assessed through quantitative metrics alone.
This is where approval workflows come in. The transition from staging to production requires explicit approval from a designated reviewer. By combining automated validation (checking whether performance thresholds are met, data quality verification) with manual review (business logic confirmation, ethical review), the process ensures that production deployment follows a controlled path.
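One way to picture this gate: promotion first runs the automated checks, and even when those pass it still requires a recorded human sign-off. A hedged sketch follows; the threshold, metric name, and approver role are made up for illustration:

```python
def can_promote(metrics: dict, approvals: set,
                min_accuracy: float = 0.90,
                required_approvers: frozenset = frozenset({"ml-lead"})) -> bool:
    """Staging -> Production gate: automated checks AND manual sign-off."""
    # Automated validation: performance threshold must be met.
    if metrics.get("accuracy", 0.0) < min_accuracy:
        return False
    # Manual review: every required approver must have signed off.
    return required_approvers <= approvals

# Good metrics but no sign-off: still blocked.
print(can_promote({"accuracy": 0.95}, approvals=set()))         # False
# Metrics pass and the reviewer approved: promotion allowed.
print(can_promote({"accuracy": 0.95}, approvals={"ml-lead"}))   # True
```

The key property is that neither side of the gate is sufficient on its own: strong metrics cannot bypass review, and approval cannot override a failed check.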
Data Versioning with DVC
Managing model versions requires managing training data versions as well. DVC (Data Version Control) is a tool that provides a Git-like interface for versioning large files and datasets.
The core idea behind DVC is to store large files themselves in remote storage (S3, GCS, etc.) while tracking only their hash values and metadata in Git. This allows code and data versions to stay synchronized. When you check out a specific Git commit, the corresponding data is restored as well, enabling complete reproduction of past experiments.
```shell
# Track a data file with DVC
dvc add data/training_dataset.csv

# Configure remote storage
dvc remote add -d storage s3://my-bucket/dvc-store

# Push data
dvc push

# Restore data at a specific version
git checkout v1.0
dvc checkout
```
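The mechanism behind `dvc add` can be approximated in a few lines: hash the file, copy it to a content-addressed store, and keep only a small pointer file under version control. The sketch below is a simplified illustration of that idea, not DVC's actual file format or implementation:

```python
import hashlib
import json
import pathlib
import shutil
import tempfile

def track(path: pathlib.Path, store: pathlib.Path) -> pathlib.Path:
    """Store `path` by content hash; return a small pointer file for Git."""
    md5 = hashlib.md5(path.read_bytes()).hexdigest()
    store.mkdir(parents=True, exist_ok=True)
    shutil.copy(path, store / md5)              # large file -> remote store
    pointer = path.parent / (path.name + ".dvc")
    pointer.write_text(json.dumps({"md5": md5, "path": path.name}))
    return pointer                              # tiny pointer -> Git

def checkout(pointer: pathlib.Path, store: pathlib.Path) -> pathlib.Path:
    """Restore the data file the pointer refers to."""
    meta = json.loads(pointer.read_text())
    dest = pointer.parent / meta["path"]
    shutil.copy(store / meta["md5"], dest)
    return dest

# Round-trip demo in a temporary directory.
with tempfile.TemporaryDirectory() as d:
    root = pathlib.Path(d)
    data = root / "training_dataset.csv"
    data.write_text("x,y\n1,2\n")
    ptr = track(data, root / "store")
    data.unlink()                               # simulate a fresh clone
    restored = checkout(ptr, root / "store")
    roundtrip_ok = restored.read_text() == "x,y\n1,2\n"
print(roundtrip_ok)  # True
```

Because the pointer file is tiny and text-based, it diffs and merges cleanly in Git, while the multi-gigabyte payload lives in object storage.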
MLflow Model Registry
In addition to experiment tracking, MLflow includes a built-in model registry. Models trained in experiments can be registered, assigned version numbers, and have their lifecycle states managed.
| Feature | Description |
|---|---|
| Model registration | Register models from experiment artifacts |
| Version management | Manage multiple versions under the same model name |
| State transitions | None → Staging → Production → Archived |
| Descriptions and tags | Add descriptions, tags, and annotations to each version |
| API access | Query and load models via REST API or Python API |
The strength of MLflow Model Registry lies in its integration with experiment tracking. You can trace which experiment, with which parameters, produced the model currently serving in production, all within a single system. When loading a model, you can specify a version number or a state (e.g., "Production"), making integration with deployment pipelines straightforward.
Without Versioning, Models Are Unmanageable
In the early stages, when the number of models is small and deployment cycles are long, the need for versioning is hard to appreciate. But as models multiply and deployment frequency increases, keeping track of which model is in which state becomes increasingly difficult. A model registry is the mechanism that keeps this complexity at a manageable level. Just as Git is essential for code, a model registry is essential for ML systems.
In the next post, we'll look at model serving and deployment strategies: how to bring trained models into production services.