Same Feature, Different Results?

Can you be confident that the features used during model training are identical to those used during serving? Most teams cannot answer this question with certainty. It's common for training code to compute features in pandas while serving code reimplements the same features in Java. When the two implementations diverge in subtle ways, a model that showed 95% accuracy during training exhibits unexpected performance degradation in production.

This problem is called training-serving skew. The root cause is implementing features twice, and the feature store is the infrastructure component that emerged to solve it.

The Core Idea Behind Feature Stores

The central concept of a feature store is managing feature definitions in a single place and ensuring that both training and serving retrieve features from the same source. A feature is defined once and delivered in batch for training or in real time for serving, but the computation logic itself remains identical.

To achieve this, a feature store operates two storage layers. The offline store holds large volumes of historical data for training, typically backed by a data warehouse or file system. The online store provides features with low latency for serving, using key-value stores like Redis or DynamoDB. The feature definition is singular, but the access paths are separated to suit training and serving needs.

Feature Definition (single source)
    │
    ├── Offline Store ──→ Training Pipeline (batch)
    │   (Parquet, BigQuery)
    │
    └── Online Store ──→ Serving API (real-time)
        (Redis, DynamoDB)
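The single-definition principle can be illustrated with a minimal, dependency-free sketch (the function and field names here are invented for the example, not any tool's API): one feature function feeds both the batch materialization path and the online lookup path, so the computation cannot diverge.

```python
from datetime import datetime, timezone

# Single feature definition: the one computation both paths share.
def days_since_signup(signup_ts: datetime, as_of: datetime) -> int:
    return (as_of - signup_ts).days

# Offline path: materialize the feature over historical rows for training.
def build_training_rows(users: list[dict], as_of: datetime) -> list[dict]:
    return [
        {"user_id": u["user_id"],
         "days_since_signup": days_since_signup(u["signup_ts"], as_of)}
        for u in users
    ]

# Online path: compute the same feature for a single serving request.
def serve_feature(user: dict, as_of: datetime) -> dict:
    return {"days_since_signup": days_since_signup(user["signup_ts"], as_of)}

users = [{"user_id": 1, "signup_ts": datetime(2024, 1, 1, tzinfo=timezone.utc)}]
now = datetime(2024, 1, 31, tzinfo=timezone.utc)

batch = build_training_rows(users, now)
online = serve_feature(users[0], now)
assert batch[0]["days_since_signup"] == online["days_since_signup"]  # no skew
```

In a real feature store the two paths read from different storage backends, but the point is the same: both resolve to one registered definition rather than two hand-maintained implementations.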

Point-in-Time Correctness

What distinguishes a feature store from a simple cache is its guarantee of point-in-time correctness. Why does this matter so much?

Consider training a loan approval model. When a loan application came in during March 2024, the model should use the customer's credit score as it was at that point in time. Using a credit score updated in June 2024 would introduce future information into training, a phenomenon known as data leakage. A model trained this way looks great in experiments but fails in production.

A feature store manages timestamps for each feature, ensuring that when generating training data, only features that were actually available at that point in time are joined. Without this capability, constructing a correct training dataset becomes a highly complex engineering task on its own.
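The core of a point-in-time join is simple to state: for each training label, take the latest feature value whose timestamp is at or before the label's timestamp. A minimal sketch (with hypothetical credit-score data) looks like this:

```python
from bisect import bisect_right
from datetime import datetime

# Historical credit scores with effective timestamps (hypothetical data).
score_history = [
    (datetime(2024, 1, 10), 640),
    (datetime(2024, 3, 5), 660),
    (datetime(2024, 6, 20), 720),  # later update: must NOT leak into March labels
]

def feature_as_of(history: list[tuple], as_of: datetime):
    """Return the latest feature value whose timestamp is <= as_of."""
    timestamps = [ts for ts, _ in history]
    i = bisect_right(timestamps, as_of)
    return history[i - 1][1] if i > 0 else None

# A loan application labeled in March 2024 joins the March-era score...
march_score = feature_as_of(score_history, datetime(2024, 3, 15))
assert march_score == 660
# ...not the June update, which would be data leakage.
assert march_score != 720
```

Doing this correctly across hundreds of features, each updated on its own schedule, is exactly the bookkeeping a feature store automates.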

Feature Reuse Across Teams

Another significant value of feature stores is enabling feature reuse within an organization. If the recommendation team's user activity features can be directly consumed by the fraud detection team, duplicate work is eliminated and feature quality improves. When each team independently implements the same feature, subtle differences inevitably creep in, and verifying which implementation is correct becomes difficult.

A feature store serves as a central registry, making it possible to track which features exist, who created them, and which models consume them. This goes beyond simply reducing code duplication: it enables organizational governance over features.
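To make the governance benefit concrete, here is a toy registry sketch (the field names and feature names are illustrative, not any tool's actual schema). The practical payoff is queries like "which models break if this feature changes?":

```python
# Toy feature registry: metadata only, no storage (illustrative schema).
registry = {
    "user_purchases_7d": {
        "owner": "recommendations-team",
        "version": 2,
        "consumers": ["ranking_model_v4", "fraud_model_v1"],
    },
    "account_age_days": {
        "owner": "platform-team",
        "version": 1,
        "consumers": ["fraud_model_v1"],
    },
}

def impact_of_change(registry: dict, feature_name: str) -> list[str]:
    """List the models that must be re-validated if a feature changes."""
    return registry.get(feature_name, {}).get("consumers", [])

print(impact_of_change(registry, "user_purchases_7d"))
```

Without this metadata in one place, answering that question means grepping every team's codebase.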

Key Tools

The widely used feature store tools each bring different strengths to the table.

| Tool | Strengths | Best For |
| --- | --- | --- |
| Feast | Open-source, lightweight, multiple backend support | Teams running their own infrastructure |
| Tecton | Managed service, strong real-time feature transforms | Organizations where real-time ML is critical |
| Hopsworks | Open-source, integrated ML platform | Teams needing an end-to-end platform |
| Vertex AI Feature Store | GCP-native, BigQuery integration | GCP-based infrastructure |
| SageMaker Feature Store | AWS-native, S3/Glue integration | AWS-based infrastructure |

Feast, an open-source project, is the most widely adopted of these tools. It lets you write feature definitions in Python code and abstracts offline and online stores behind a unified interface. For teams that need a managed service, Tecton or the cloud-native solutions reduce operational overhead.
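For a sense of what those Python feature definitions look like, here is a hedged sketch in the style of Feast's SDK. The entity, feature names, and file path are invented for the example, and the exact API surface varies between Feast versions, so treat this as a shape rather than copy-paste code:

```python
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# The entity key that both training joins and online lookups use.
user = Entity(name="user", join_keys=["user_id"])

# One declarative definition; Feast materializes it to the offline
# and online stores on your behalf.
user_activity = FeatureView(
    name="user_activity",
    entities=[user],
    schema=[
        Field(name="purchases_7d", dtype=Int64),
        Field(name="avg_order_value", dtype=Float32),
    ],
    source=FileSource(
        path="data/user_activity.parquet",   # hypothetical path
        timestamp_field="event_timestamp",   # enables point-in-time joins
    ),
)
```

Because the definition is declarative, swapping the backing stores (say, Parquet files for BigQuery, or SQLite for Redis) is a configuration change rather than a rewrite of feature logic.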

Where Feature Stores Fit in the MLOps Pipeline

A feature store sits between the data pipeline and model training/serving. Raw data is processed through the data pipeline, and the results are registered in the feature store. The training pipeline queries historical features from the offline store, while the serving infrastructure queries real-time features from the online store.

Raw Data → Data Pipeline → Feature Store → Training/Serving
                                │
                          Feature Registry
                       (metadata, versions, owners)

In the early stages without a feature store, feature computation logic is scattered across training and serving code. This is manageable when the number of models is small, but once dozens of models share hundreds of features, maintaining consistency without centralized feature management becomes impractical. The right time to introduce a feature store is when production issues caused by training-serving skew begin to surface, or when feature duplication across teams becomes apparent.

In the next post, we'll look at GPU infrastructure and scaling.