
Real-Time Feature Engineering

Real-time feature engineering is the practice of computing ML model inputs (features) on fresh data within milliseconds of an event happening — so a fraud model can use 'transactions in the last 60 seconds' rather than 'transactions yesterday.' It requires a feature store that serves features for both training (offline, historical) and inference (online, low-latency) from the same definitions. Tecton, Feast, and Hopsworks are the dominant feature platforms. The hard problem is training-serving skew: if your batch pipeline computes a feature one way and your streaming pipeline computes it differently, your model's online predictions degrade silently. Real-time features only matter for use cases where freshness drives a measurable business outcome — fraud detection, ad targeting, dynamic pricing, recommendations on session data.
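To make 'define once, serve both ways' concrete, here is a minimal sketch using Feast's Python SDK — the entity, file path, and feature names are illustrative, and the exact API varies by Feast version:

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Int64

# One definition drives both stores: Feast materializes this view to the
# offline store for training and to the online store for serving.
user = Entity(name="user", join_keys=["user_id"])

transactions = FileSource(
    path="data/transactions.parquet",  # illustrative source
    timestamp_field="event_timestamp",
)

txn_velocity = FeatureView(
    name="txn_velocity",
    entities=[user],
    ttl=timedelta(seconds=60),  # value considered stale after 60 seconds
    schema=[Field(name="txn_count_60s", dtype=Int64)],
    source=transactions,
)
```

Because training joins and online lookups both read from this one definition, there is no second, hand-rolled implementation to drift out of sync.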

Also known as: Online Features · Streaming Features · Real-Time ML Features · Low-Latency Features

The Trap

The trap is real-time-everything syndrome: teams build streaming feature pipelines for use cases where a 24-hour-old feature would be 99% as accurate. A churn model retrained nightly does not need real-time features. A demand forecast updated hourly does not need Kafka. Real-time infrastructure costs 5-10x more to build and operate than batch — separate online and offline stores, point-in-time correctness, tighter latency SLOs, more on-call burden. Most teams building 'real-time ML' platforms could ship the same business value with batch features computed every hour. The other trap: training on real-time features but serving stale ones in production, causing silent model degradation.

What to Do

Before building real-time features, do a freshness audit: for each candidate feature, measure the lift from 1-minute freshness vs 1-hour vs 1-day. If the lift below 1 hour is small, use batch. For genuine real-time use cases, adopt a feature store (Tecton, Feast) so the same feature definition produces identical values online and offline. Enforce point-in-time correctness in training data — never use a feature value that wouldn't have existed at prediction time. Monitor training-serving skew on every production feature with a daily distribution comparison.
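The audit itself can be a small loop: retrain the same model on features rebuilt at each staleness level and compare AUC. A minimal sketch with scikit-learn, where build_features is a hypothetical helper that recomputes every feature as if it were that stale at prediction time:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def audit_freshness(build_features, staleness_levels=("1d", "1h", "1m")):
    """Return AUC per staleness level so the lift can be compared."""
    results = {}
    for staleness in staleness_levels:
        # Hypothetical helper: rebuilds train/test features as of
        # (prediction_time - staleness) to avoid leaking fresher data.
        X_train, y_train, X_test, y_test = build_features(staleness)
        model = GradientBoostingClassifier().fit(X_train, y_train)
        results[staleness] = roc_auc_score(
            y_test, model.predict_proba(X_test)[:, 1])
    return results

# If results["1m"] - results["1h"] is within noise, batch features win.
```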

Formula

Feature Freshness = Time(feature available for prediction) − Time(event occurred); the Freshness SLO caps this gap per feature.

Online-Offline Skew = |online_feature_value − offline_feature_value| / offline_feature_value
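Both quantities are trivial to compute once event timestamps and paired online/offline values are logged. A sketch in Python, with illustrative thresholds:

```python
from datetime import datetime

def freshness_seconds(event_time: datetime, available_time: datetime) -> float:
    """Event-to-availability gap; compare against the feature's SLO."""
    return (available_time - event_time).total_seconds()

def online_offline_skew(online: float, offline: float) -> float:
    """Relative disagreement between serving-time and training-time values.

    Assumes offline != 0; use an absolute threshold for near-zero features.
    """
    return abs(online - offline) / abs(offline)

# Illustrative thresholds: a 30-second SLO and a 0.1% skew budget.
assert freshness_seconds(datetime(2024, 1, 1, 12, 0, 0),
                         datetime(2024, 1, 1, 12, 0, 12)) <= 30.0
assert online_offline_skew(online=42.01, offline=42.0) < 0.001
```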

In Practice

Tecton was founded in 2019 by Uber's Michelangelo team to commercialize the feature store pattern Uber pioneered. Uber's original problem: their fraud and ETA models needed features like 'driver acceptance rate in last 5 minutes' computed from streaming events, but data scientists were rebuilding the same features in three different ways for batch training, online serving, and backfills. Tecton's pitch: define a feature once, get consistent online + offline values automatically. By 2023, customers like Cash App, HelloFresh, and Atlassian were running 1000+ production features with millisecond serving latency.

Pro Tips

01. Default to batch features. Only escalate to real-time when you can measure the AUC or revenue lift from freshness. Feast's creator Willem Pienaar publicly notes that most users never need streaming features.

02. Use the same DSL for batch and stream. Tecton's transformation language compiles to Spark for batch and Flink/Spark Streaming for online — one definition, two execution engines, zero skew.

03. Set hard freshness SLOs per feature, not per system. 'Driver location' might need 30-second freshness; 'driver lifetime trips' is fine at 24 hours. Bundle high-freshness features into the streaming tier and leave the rest in batch — see the routing sketch after this list.
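One way to apply tip 03 is to express SLOs as data so each feature routes itself to the right tier. A minimal sketch; the feature names, SLOs, and five-minute cutoff are all illustrative:

```python
from datetime import timedelta

# Hypothetical per-feature freshness SLOs.
FRESHNESS_SLOS = {
    "driver_location":       timedelta(seconds=30),
    "txn_count_60s":         timedelta(seconds=60),
    "driver_lifetime_trips": timedelta(hours=24),
    "avg_basket_value_30d":  timedelta(hours=24),
}

STREAMING_CUTOFF = timedelta(minutes=5)

def tier(feature: str) -> str:
    """Route a feature to 'streaming' or 'batch' based on its SLO."""
    return "streaming" if FRESHNESS_SLOS[feature] <= STREAMING_CUTOFF else "batch"

assert tier("driver_location") == "streaming"
assert tier("driver_lifetime_trips") == "batch"
```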

Myth vs Reality

Myth: Real-time features always improve model accuracy.

Reality: For most models, accuracy plateaus once features are no more than a few hours stale. A Lyft analysis of pricing models found that moving from 1-hour to 1-minute features improved AUC by less than 0.5% on most use cases — not worth the 8x infrastructure cost. Test the lift before assuming it's there.

Myth: A feature store is just a low-latency database.

Reality: The hard part isn't serving features fast — Redis can do that. The hard part is point-in-time correctness for training data, deduplication of streaming events, and matching online/offline transformations exactly. A naive 'feature cache' will produce models that look great in training and degrade silently in production.
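The point-in-time piece has a standard remedy: an as-of join that gives each training label the latest feature value available at (not after) its prediction timestamp. A minimal sketch with pandas; the column names and data are illustrative:

```python
import pandas as pd

# Training labels: one row per prediction event.
labels = pd.DataFrame({
    "user_id": [1, 1],
    "event_timestamp": pd.to_datetime(["2024-01-01 12:00", "2024-01-02 12:00"]),
    "churned": [0, 1],
}).sort_values("event_timestamp")

# Feature snapshots as they became available over time.
features = pd.DataFrame({
    "user_id": [1, 1, 1],
    "event_timestamp": pd.to_datetime(
        ["2024-01-01 09:00", "2024-01-01 18:00", "2024-01-02 09:00"]),
    "logins_7d": [3, 4, 9],
}).sort_values("event_timestamp")

# As-of join: each label gets the most recent feature value that existed
# at or before prediction time -- never a future value.
train = pd.merge_asof(labels, features, on="event_timestamp",
                      by="user_id", direction="backward")
print(train[["event_timestamp", "logins_7d", "churned"]])
```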

Try it

Run the numbers.

Pressure-test the concept against your own knowledge — answer the challenge or try the live scenario.


Knowledge Check

Your team wants to build a real-time feature pipeline for a customer churn model that retrains weekly. The product team is excited about 'real-time AI.' What should you do?

Industry benchmarks

Is your number good?

Calibrate against real-world tiers. Use these ranges as targets — not absolutes.

Online Feature Serving Latency (p99) — real-time inference (fraud, ads, search ranking)

Elite: < 10ms
Good: 10-50ms
Acceptable: 50-200ms
Slow: > 200ms

Source: Tecton 2023 Feature Store Benchmarks

Real-world cases

Companies that lived this.

Narratives — one verified, one hypothetical — with the numbers that prove (or break) the concept.

Tecton (with Cash App) · 2021-2023 · Success

Cash App moved fraud detection from batch features to a Tecton-powered real-time feature pipeline. Previously, features like 'unique merchants in last 24 hours' were computed nightly and served stale by morning. With Tecton, the same feature was computed in streaming windows and served at p99 sub-50ms. Fraud catch rates improved meaningfully on velocity-based attacks where the attacker rapidly issues transactions before nightly detection runs.

Feature Serving Latency (p99): < 50ms
Production Features: 1000+
Training-Serving Skew: < 0.1%

For fraud and other adversarial use cases, attackers exploit the gap between detection cycles. Real-time features close that gap and produce real ROI. The discipline of one-definition-many-runtimes prevented the silent skew that historically killed Cash App's previous attempts.
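A feature like 'unique merchants in last 24 hours' is a sliding-window aggregate over the event stream. Production engines (Flink, Spark Structured Streaming) manage the state, watermarks, and checkpointing; this toy sketch shows only the core windowing logic:

```python
from collections import deque
from datetime import datetime, timedelta

class UniqueMerchants24h:
    """Toy per-user sliding-window count of distinct merchants."""

    def __init__(self, window: timedelta = timedelta(hours=24)):
        self.window = window
        self.events: deque = deque()  # (timestamp, merchant) pairs

    def update(self, ts: datetime, merchant: str) -> int:
        """Ingest one transaction and return the fresh feature value."""
        self.events.append((ts, merchant))
        # Evict transactions that have aged out of the window.
        while self.events and ts - self.events[0][0] > self.window:
            self.events.popleft()
        return len({m for _, m in self.events})

feature = UniqueMerchants24h()
feature.update(datetime(2024, 1, 1, 8), "coffee_shop")  # -> 1
feature.update(datetime(2024, 1, 1, 9), "grocery")      # -> 2
feature.update(datetime(2024, 1, 2, 10), "grocery")     # -> 1 (older events evicted)
```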


Hypothetical: Mid-Market SaaS Co · 2024 · Failure

Hypothetical: A 200-person SaaS company invested 9 months building a Kafka + Flink + online store stack to power 'real-time' features for their churn model. After deployment, they measured the lift: AUC improved from 0.81 to 0.815. The model retrained weekly and predictions fed a marketing email queue that ran daily. The streaming infrastructure cost $14K/month vs $1.5K for the batch alternative. Three quarters in, leadership shut down the streaming stack and reverted to nightly batch features.

Engineering Time Spent: 9 months × 4 engineers
AUC Lift Measured: 0.005
Infra Cost Burned: ~$140K

Hypothetical, but archetypal. Always measure the lift before committing to streaming infrastructure. A model whose predictions are consumed in a daily batch cannot benefit from real-time features.

Decision scenario

The Real-Time Feature Investment

You lead the ML platform team. Three product teams are asking for real-time features. Each claims they need sub-second freshness. You have budget for one streaming pipeline this quarter ($60K + 2 engineers).

Quarterly Budget: $60K
Engineers Available: 2
Requesting Teams: 3 (Fraud, Recs, Churn)

Decision 1

Fraud team: catches $3M/month of attempted fraud at an 88% rate and wants 'transactions in last 60 seconds' to push that to 92%. Recs team: a same-session recommender currently using 1-hour-stale features; conversion is 4.2% and they hope real-time would push it to 4.5%. Churn team: a weekly batch model that fires retention emails on Mondays; they want 'login activity in last hour' as a feature.

Option A: Greenlight all three — democratize real-time features and let the teams figure out adoption.

Outcome: You spread two engineers across three pipelines. None ship cleanly. Six months in, fraud is 70% built, recs is paused, and churn delivered but provides zero lift because the weekly retraining cycle can't use it. Leadership questions the entire ML platform investment. (Pipelines Shipped: 0 of 3 · Platform Credibility: Damaged)

Option B: Fund only fraud — a measurable $120K/month upside (4 pts × $3M); push back on churn (a weekly model can't use freshness); pilot recs with batch features at varying staleness to measure lift before committing.

Outcome: The fraud team ships in 4 months and immediately captures $120K/month. The recs lift study reveals that 1-minute features lift conversion by only 0.05% over 1-hour features — not worth the build. The churn team adds an hourly batch feature instead and gets the same outcome at 5% of the cost. The platform's reputation for rigor grows. (Monthly Value Captured: $0 → $120K · Wasted Streaming Builds: 0)


Beyond the concept

Turn Real-Time Feature Engineering into a live operating decision.

Use this concept as the framing layer, then move into a diagnostic if it maps directly to a current bottleneck.

Typical response time: 24h · No retainer required
