Data Product Discovery
Data Product Discovery is the structured process of finding, validating, and prioritizing the data assets your organization (or the market) will pay for or rely on as products. It treats datasets, dashboards, models, and APIs the way PMs treat software: who is the user, what job are they hiring it for, what willingness-to-pay (or willingness-to-rely) exists, and what's the smallest version that proves it. Discovery starts before pipelines are built — interviews, log mining of existing reports, and shadowing analysts uncover the 5-10 'evergreen questions' that get re-asked weekly. Those questions become candidate data products. Without discovery, data teams build 200 dashboards that nobody opens; with it, they build 12 that drive decisions.
The Trap
The trap is letting engineers pick what to build because 'the data is already there.' Convenience-driven roadmaps produce technically clean datasets that solve nobody's problem. The other trap is over-indexing on executive requests — execs ask for what they think they want (a unified KPI dashboard) when the actual blocker is a clean customer ID. Discovery requires saying no to 80% of requests, which feels political. Teams without an explicit prioritization rubric default to loudest-voice-wins, which is how you end up rebuilding the same revenue dashboard four times for four VPs.
What to Do
Run a 4-week discovery sprint before any new data platform investment: (1) Interview 15-20 downstream users — analysts, ops managers, sales — about decisions they make weekly and where data fails them. (2) Log-mine your BI tool: which dashboards have >50 weekly views and which have zero? (3) Score candidates on a 2x2: business value (revenue impact, decision frequency) vs feasibility (data exists, quality acceptable). (4) Pick 3-5 to build as v1 data products with named owners, SLAs, and explicit consumers. Kill everything else from the backlog publicly.
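Step 3's 2x2 scoring can be sketched in a few lines. The candidate names, 1-5 scales, and cut-off thresholds below are illustrative assumptions, not a standard rubric:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    business_value: int   # 1-5: revenue impact, decision frequency
    feasibility: int      # 1-5: data exists, quality acceptable

def prioritize(candidates, build_slots=5):
    """Rank candidates on the 2x2 and keep only the few worth building.

    Quadrant logic: high value AND workable feasibility -> build;
    everything else gets killed from the backlog (publicly).
    Thresholds (4 and 3) are illustrative, not canonical.
    """
    ranked = sorted(candidates,
                    key=lambda c: (c.business_value, c.feasibility),
                    reverse=True)
    build = [c for c in ranked
             if c.business_value >= 4 and c.feasibility >= 3][:build_slots]
    kill = [c for c in ranked if c not in build]
    return build, kill

# Toy backlog echoing the exec-request trap: the flashy dashboard
# scores low on value, the unglamorous customer ID scores high.
backlog = [
    Candidate("unified KPI dashboard", business_value=2, feasibility=4),
    Candidate("clean customer ID", business_value=5, feasibility=3),
    Candidate("churn-risk dataset", business_value=4, feasibility=2),
]
build, kill = prioritize(backlog, build_slots=3)
```

With this toy data, only "clean customer ID" clears both bars; the other two land on the public kill list.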
In Practice
Airbnb's data team in 2017 ran a discovery exercise that revealed 80% of their 500+ internal dashboards had under 5 weekly users, while 10 dashboards drove 70% of decisions. They retired 400+ dashboards, formalized the top 10 as 'certified data products' with on-call ownership, and built the Dataportal tool to make discovery searchable. Decision velocity measurably improved and on-call data incidents dropped 50%, because the team stopped supporting noise.
Pro Tips
- 01
The 'evergreen question' test: if the same question gets asked in Slack 3+ times by different people in 30 days, it's a candidate data product. Search your Slack for 'can someone pull' to find them.
- 02
Always interview the analysts who manually answer recurring questions: they know exactly which question patterns recur and which datasets are unreliable, and they are a far richer discovery source than executives.
- 03
Discovery is never 'done.' Run a lightweight discovery cycle every quarter. Business priorities change faster than data platforms, and last quarter's evergreen question may now be stale.
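The evergreen-question test from tip 01 can be roughed out against a chat export. The export format, the "can someone pull" trigger phrase from the tip, and the crude topic heuristic are all assumptions for illustration; a real pass would cluster messages properly or hand them to an analyst:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical Slack export rows: (timestamp, author, message text).
messages = [
    (datetime(2024, 5, 1), "ana", "can someone pull last week's churn by region?"),
    (datetime(2024, 5, 9), "ben", "can someone pull churn by region for Q2?"),
    (datetime(2024, 5, 20), "carla", "can someone pull the churn by region numbers?"),
    (datetime(2024, 5, 21), "ana", "can someone pull the onboarding funnel?"),
]

def evergreen_candidates(messages, window_days=30, min_askers=3):
    """Flag topics asked about by 3+ distinct people within 30 days.

    Topic extraction here is a deliberately crude substring check
    (an assumption for the demo), not a real clustering step.
    """
    askers = defaultdict(set)   # topic -> distinct people who asked
    first_seen = {}             # topic -> first time it appeared
    for ts, author, text in messages:
        if "can someone pull" not in text.lower():
            continue
        topic = "churn by region" if "churn by region" in text.lower() else "other"
        first = first_seen.setdefault(topic, ts)
        if ts - first <= timedelta(days=window_days):
            askers[topic].add(author)
    return [t for t, people in askers.items() if len(people) >= min_askers]

print(evergreen_candidates(messages))  # ['churn by region'] for this toy data
```

Three different people asking for churn by region inside 19 days clears the bar; the one-off onboarding-funnel request does not.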
Myth vs Reality
Myth
“Discovery is just gathering requirements from stakeholders”
Reality
Requirements gathering captures stated needs. Discovery captures revealed needs — what people actually do, what they Slack each other for at 11pm, what manual workarounds they've built. Stated and revealed needs diverge wildly. Discovery without observing actual workflows produces feature lists, not products.
Myth
“If we build it, they will come”
Reality
False, expensively. Studies of internal data platforms show 60-70% of built datasets have under 10 monthly active users. Adoption requires distribution: embedding in workflows, training, evangelism, deprecation of competing sources. Build-only strategies waste 40%+ of data engineering capacity.
Knowledge Check
Your data team has capacity to build 5 new data products this quarter. You have 47 stakeholder requests. The CFO has personally requested 8 of them. What's the best prioritization approach?
Industry benchmarks
Is your number good?
Calibrate against real-world tiers. Use these ranges as targets — not absolutes.
Data Product Adoption (60 days post-launch)
Internal data products at mid-to-large enterprises
Elite: > 70% of target users active weekly
Good: 40-70%
Average: 20-40%
Poor: 5-20%
Failed Discovery: < 5%
Source: ThoughtSpot State of Analytics 2024 / DataKitchen Data Ops Benchmarks
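To check your own number against the table, the tier thresholds can be encoded directly. Boundary handling at 70/40/20/5 is our choice; the table's ranges leave the exact edges ambiguous:

```python
def adoption_tier(weekly_active_pct: float) -> str:
    """Map 60-day adoption (% of target users active weekly)
    to the benchmark tiers above.

    Boundaries are treated as inclusive at the lower end of each
    range (our assumption; the source ranges overlap at the edges).
    """
    if weekly_active_pct > 70:
        return "Elite"
    if weekly_active_pct >= 40:
        return "Good"
    if weekly_active_pct >= 20:
        return "Average"
    if weekly_active_pct >= 5:
        return "Poor"
    return "Failed Discovery"

# The decision scenario's 31% average lands squarely in "Average".
print(adoption_tier(31))
```
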
Real-world cases
Companies that lived this.
Verified narratives with the numbers that prove (or break) the concept.
Airbnb (Dataportal)
2017-2019
Airbnb's data platform team discovered that out of 500+ active dashboards, only 10 drove 70% of decisions. They built Dataportal — an internal data discovery tool — to surface high-trust assets and deprecate low-value ones. The exercise was as much about killing data products as launching them. Top dashboards were certified, given on-call owners, and indexed for search. Adoption of the certified set jumped, while ad-hoc data requests fell.
Dashboards Audited: 500+
Dashboards Driving 70% of Decisions: 10
Dashboards Retired: 400+
On-call Data Incidents: Down ~50%
Discovery is as much about deprecation as creation. The most underrated data product roadmap action is killing things nobody uses but everyone is afraid to delete.
Hypothetical: Mid-Market Insurance Carrier
2024
A regional insurance carrier hired a 12-person data team and built 80 dashboards in year one. A discovery audit revealed 62 had fewer than 5 weekly users. The root cause: requirements were gathered from VP-level executives in roadmap meetings, but the actual users — claims adjusters and underwriters — were never interviewed. After a discovery reset focused on adjuster workflows, the team built 6 high-impact tools (adjuster workload balancer, fraud signal alerts) that hit 80%+ adoption. The other 62 were retired.
Dashboards Built (Year 1): 80
With < 5 Weekly Users: 62 (78%)
Post-Discovery Tools Built: 6
Adoption Rate (New Tools): 80%+
Executives request ideas; users live the workflows. Discovery that skips frontline users produces dashboards optimized for steering committees, not for the people whose decisions actually create value.
Decision scenario
The Quarterly Roadmap Pitch
You're the Head of Data at a 600-person fintech. Backlog has 53 dashboard/dataset requests. The CRO wants 'a unified pipeline view' (3 weeks of work). The CFO wants 'real-time cash position' (6 weeks). Frontline ops are quietly asking for a 'why did this transaction fail' tool. Your team can ship 4 things this quarter.
Backlog Size: 53 requests
Quarterly Capacity: 4 data products
Active Stakeholders: 12 VPs + 200 ops users
Last Quarter's Adoption: 31% avg (poor)
Decision 1
You can either build what the C-suite asked for (politically safe) or run a 2-week discovery sprint before committing. Discovery costs 25% of the quarter's capacity but reframes the roadmap.
Skip discovery, ship the CRO + CFO requests + 2 more from the backlog. Move fast.
✓ Optimal: Run a 2-week discovery sprint: shadow the ops team, log-mine the BI tool, interview the people who'd actually use the CRO/CFO requests. Re-prioritize publicly.
Beyond the concept
Turn Data Product Discovery into a live operating decision.
Use this concept as the framing layer, then move into a diagnostic if it maps directly to a current bottleneck.