
Real-Time Analytics

Real-Time Analytics is the discipline of computing and serving analytical answers within seconds (or sub-second) of the event happening — as opposed to batch analytics where data lands in a warehouse hourly or daily. The defining stack is event streams (Kafka, Kinesis, Pulsar) feeding low-latency OLAP engines (Apache Pinot, Druid, ClickHouse, StarRocks) that answer queries in milliseconds across billions of events. The use cases that justify the cost are narrow: in-app personalization, real-time fraud detection, operational dashboards for live ops (rideshare, delivery), trading, and user-facing analytics (LinkedIn 'who viewed your profile', Uber rider ETAs). Real-time costs 5-50x more than batch and adds significant complexity. The honest question is: does the business decision actually change based on a 5-second-old number vs a 5-hour-old number?

Also known as: Streaming Analytics · Real-Time Data · Sub-Second Analytics · Live Dashboards · Operational Analytics

The Trap

The trap is 'real-time' becoming the default architectural choice because it sounds modern. Most 'real-time dashboards' are watched by humans who refresh them once an hour at most, meaning a 1-hour batch refresh would deliver the same business value at 5% of the cost. A related trap is mistaking 'fresh data' for 'real-time analytics': a dashboard that shows yesterday's numbers updated every minute is fast-but-stale, not real-time. The most expensive failure of all is committing to real-time architecture for a use case that genuinely doesn't need it, then spending three years and $5M maintaining a Kafka + Pinot + Flink stack to power dashboards that get viewed twice a week.

What to Do

Apply the 'decision latency test' before building real-time: what is the maximum staleness at which the consuming decision changes? If a fraud system needs to block a transaction in 200ms, real-time is mandatory. If an executive looks at the dashboard once a morning, batch is fine. Categorize use cases into three buckets: (1) Operational / sub-second — real-time required (fraud, personalization, live ops). (2) Near-real-time / minutes — micro-batch or CDC pipelines work. (3) Strategic / hours-to-days — batch is correct. Settle the bucket before any architecture conversation starts. Then sequence: never build a general-purpose real-time platform first. Solve one operational use case end-to-end on the simplest stack possible (often just CDC + ClickHouse), and only generalize once you have 3+ proven use cases.
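
The decision latency test reduces to a few lines of code. A minimal sketch of the bucketing logic; the 10-second and 1-hour cut lines are illustrative assumptions, not industry standards:

```python
from enum import Enum

class Tier(Enum):
    REAL_TIME = "operational / sub-second: streaming + low-latency OLAP required"
    NEAR_REAL_TIME = "near-real-time / minutes: micro-batch or CDC pipelines work"
    BATCH = "strategic / hours-to-days: scheduled batch is correct"

def classify(max_staleness_seconds: float) -> Tier:
    """Bucket a use case by the maximum staleness at which its decision changes."""
    if max_staleness_seconds <= 10:      # fraud blocking, personalization, live ops
        return Tier.REAL_TIME
    if max_staleness_seconds <= 3600:    # ops alerting, near-live monitoring
        return Tier.NEAR_REAL_TIME
    return Tier.BATCH                    # exec dashboards, reporting

print(classify(0.2).value)    # fraud blocking a transaction in 200ms -> real-time
print(classify(300).value)    # ops alert that tolerates 5 minutes -> near-real-time
print(classify(86400).value)  # executive dashboard read once a morning -> batch
```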

Formula

Real-Time Justification ≈ (Decision Frequency per Day × Value per Decision × Latency Sensitivity) ÷ Annual Platform Cost. Latency Sensitivity > 5 (sub-minute decisions) is required; otherwise batch wins on ROI.
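
The formula translates directly into a back-of-envelope calculation. A minimal sketch that mirrors the formula above, including the > 5 sensitivity gate; the example inputs are hypothetical, not benchmarks:

```python
def realtime_justification(decisions_per_day: float, value_per_decision: float,
                           latency_sensitivity: float, annual_platform_cost: float) -> float:
    """Real-Time Justification ≈ (freq/day × value × sensitivity) ÷ annual platform cost."""
    if latency_sensitivity <= 5:  # not a sub-minute decision: batch wins on ROI
        return 0.0
    return (decisions_per_day * value_per_decision * latency_sensitivity) / annual_platform_cost

# Hypothetical inputs:
print(realtime_justification(50_000, 8.0, 10, 1_200_000))  # fraud blocking -> ~3.33
print(realtime_justification(1, 500.0, 1, 1_200_000))      # exec dashboard -> 0.0 (gated out)
```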

In Practice

LinkedIn built and open-sourced Apache Pinot specifically because their batch-warehouse architecture could not power user-facing analytics like 'Who viewed your profile' for hundreds of millions of users with sub-second latency. Pinot ingests from Kafka in real time and serves OLAP queries on billions of events in <100ms p95. Today, LinkedIn runs ~80+ user-facing analytics products on Pinot (notifications, dashboards, feed analytics). Uber adopted Pinot for similar reasons — surge pricing, ETA dashboards, and rider/driver analytics all need real-time. The decisive insight: both companies built real-time only when the analytics were embedded in the product UX (millions of consumers), not for internal dashboards. Internal use cases stayed on batch.

Pro Tips

1. User-facing real-time (analytics in the product UX, consumed by millions) almost always justifies the cost. Internal-facing real-time (dashboards consumed by 50 employees) almost never does. Categorize ruthlessly.

2. CDC (Change Data Capture) + a fast OLAP engine like ClickHouse is the 80% solution at 10% of the cost of a full Kafka + Flink + Pinot stack. Don't build streaming infrastructure unless you genuinely have streaming data sources and millisecond requirements. (A concrete CDC sketch appears under Myth vs Reality below.)

3. Real-time exposes data quality problems instantly. In batch, a bad pipeline runs overnight and the data team fixes it before anyone sees the dashboard. In real time, every consumer sees every glitch as it happens. Invest in data observability and SLAs BEFORE you ship real-time, or the first week of go-live will destroy trust. One concrete form this takes is a freshness SLA check, sketched below.
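
A minimal freshness-check sketch in Python. The table name, the 30-second SLO, and the `run_query` / `page_on_call` helpers are all stand-ins for whatever OLAP client and paging tool you actually run:

```python
import time

FRESHNESS_SLO_SECONDS = 30  # illustrative SLO for a table feeding 'real-time' consumers

def run_query(sql: str) -> float:
    """Stand-in for your OLAP client (ClickHouse, Pinot, ...).

    Fakes a newest-event timestamp 45s in the past so the example runs;
    in reality this would execute `sql` and return the result.
    """
    return time.time() - 45

def page_on_call(message: str) -> None:
    print(f"PAGE: {message}")  # stand-in for PagerDuty / Opsgenie integration

def check_freshness(table: str) -> bool:
    """Return False (and page) if the newest event in `table` breaches the SLO."""
    latest_event_ts = run_query(f"SELECT max(toUnixTimestamp(event_time)) FROM {table}")
    staleness = time.time() - latest_event_ts
    if staleness > FRESHNESS_SLO_SECONDS:
        page_on_call(f"{table} is {staleness:.0f}s stale (SLO: {FRESHNESS_SLO_SECONDS}s)")
        return False
    return True

check_freshness("orders_live")  # hypothetical table name
```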

Myth vs Reality

Myth: Real-time analytics replaces batch analytics.

Reality: Real-time and batch are complementary, not substitutes. Real-time handles the ~5-15% of use cases with genuine latency requirements; batch handles the other 85-95% at a fraction of the cost. The 'lambda architecture' of running both is the norm in mature data orgs, not the exception. Companies that try to do everything in real-time spend 3-5x more for value batch could deliver.

Myth: Streaming infrastructure is required for real-time analytics.

Reality: Many 'real-time' needs are satisfied by CDC + a sub-second OLAP database (ClickHouse, StarRocks). True streaming (Flink, Kafka Streams, materialized views over event streams) is required only when you need stateful computations on event streams — fraud detection, anomaly detection, sessionization. For 'I just need fresh data in a dashboard', CDC is dramatically simpler.
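
To make the simpler path concrete, here is what CDC + ClickHouse can look like: change events land in a table that deduplicates on primary key, and dashboard queries read it directly, with no stream processor in between. A minimal sketch using the clickhouse-driver Python client; the table name, the Debezium-style event shape, and the connection details are assumptions, and deletes are omitted for brevity:

```python
from clickhouse_driver import Client  # pip install clickhouse-driver

client = Client(host="localhost")  # assumed local ClickHouse instance

# ReplacingMergeTree keeps the newest row version per order_id, which is the
# upsert semantics CDC needs -- no stream processor involved.
client.execute("""
    CREATE TABLE IF NOT EXISTS orders_live (
        order_id   UInt64,
        status     String,
        amount     Float64,
        updated_at DateTime
    ) ENGINE = ReplacingMergeTree(updated_at)
    ORDER BY order_id
""")

def apply_cdc_event(event: dict) -> None:
    """Apply one Debezium-style change event (assumed shape) as an upsert."""
    row = event["after"]  # post-change row image
    client.execute(
        "INSERT INTO orders_live (order_id, status, amount, updated_at) VALUES",
        [(row["order_id"], row["status"], row["amount"], row["updated_at"])],
    )

# A 'fresh data in a dashboard' query then needs no streaming infrastructure:
#   SELECT status, count() FROM orders_live FINAL GROUP BY status
```

FINAL forces deduplication at read time; heavier dashboards would typically pre-aggregate instead. Either way, freshness here comes from CDC replication lag, not from streaming compute.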

Knowledge Check

Pressure-test the concept against your own knowledge:

An e-commerce CMO wants a 'real-time revenue dashboard' so executives can 'feel the pulse of the business'. The exec team logs in once a day. Engineering quotes $2M to build a Kafka + Pinot stack. What's the right answer?

Industry benchmarks

Is your number good? Calibrate against real-world tiers; use these ranges as targets, not absolutes.

Real-Time Analytics Latency by Use Case (industry latency targets across LinkedIn, Uber, and DoorDash published architectures):

  • Fraud / Transaction Blocking: <200ms p99
  • User-Facing Analytics in Product: <1s p95
  • Operational Dashboards (Live Ops): <10s p95
  • Executive Dashboards (a misuse of real-time): daily refresh is sufficient

Source: https://engineering.linkedin.com/blog/2019/auto-tuning-pinot

Real-world cases

Companies that lived this: verified narratives with the numbers that prove (or break) the concept.

LinkedIn (2014-present) · Success

LinkedIn built and open-sourced Apache Pinot to power user-facing analytics products at sub-second latency for hundreds of millions of users. Use cases include 'Who viewed your profile', notifications, feed analytics, and member dashboards. Pinot ingests from Kafka in real time and serves OLAP queries on billions of events in <100ms p95. Today LinkedIn runs ~80+ user-facing analytics products on Pinot, with the platform handling >100K queries/second at peak. The decisive design choice was building real-time only for product UX — internal analytics stayed on batch.

  • User-Facing Products on Pinot: ~80+
  • Query Latency p95: <100ms
  • Peak QPS: >100,000
  • Events Ingested per Day: trillions

Real-time pays off when the analytics ARE the product, served to millions. Building real-time for internal dashboards is almost always the wrong call.


Uber (2018-present) · Success

Uber adopted Apache Pinot for real-time analytics powering surge pricing, restaurant manager dashboards, rider ETAs, and fraud detection. With hundreds of millions of trips and billions of events monthly, Uber needs sub-second analytics on real-time event streams. Pinot serves both customer-facing products (rider apps showing live driver positions) and internal operational dashboards for live ops teams. Uber also uses Apache Hudi for incremental data lake updates that bridge real-time and batch. The architecture demonstrates the lambda pattern done correctly — real-time only where decisions need it.

  • Trips per Day: ~25M
  • Real-Time Use Cases: surge, ETA, fraud, ops dashboards
  • Architecture: Pinot (real-time) + Hudi (incremental)
  • Decision Latency: sub-second to seconds

Real-time analytics is justified by the decision-latency requirement of the consuming use case, not by the volume of data. Uber runs real-time where customer experience or live ops depends on it; everything else is batch.


Hypothetical: Mid-Market B2B SaaS (2021-2023) · Failure

A 600-person B2B SaaS company committed $2.5M to a Kafka + Flink + ClickHouse real-time analytics platform after the CTO returned from a streaming conference. The intended use cases were 'real-time exec dashboards' and 'operational alerts'. After 18 months, the platform was live but the exec dashboards were viewed an average of 4 times/week. Operational alerts could have been satisfied by hourly batch jobs. Annual operating cost ($800K including engineering) exceeded any measurable business value. The platform was decommissioned and replaced with a daily warehouse refresh + a few PagerDuty rules. Total loss: $2.5M build + $800K operating + 18 months of opportunity cost.

  • Build Cost: $2.5M
  • Annual Operating Cost: $800K
  • Real Decision-Latency Use Cases: 0
  • Outcome: decommissioned after 18 months

Real-time is the most over-bought capability in modern data architecture. If you can't name the sub-minute decision that depends on it, you don't need it.
