Data Strategy · Advanced · 8 min read

Data Marketplace Strategy

A Data Marketplace is a platform — internal or external — where data products are listed, discovered, evaluated, and provisioned with minimal friction. External examples: Snowflake Data Cloud Marketplace (3,200+ live datasets), AWS Data Exchange, Databricks Marketplace with Delta Sharing. Internal examples: Uber's Databook, Lyft's Amundsen, Netflix's Metacat. Marketplace strategy answers four questions: (1) Are we a buyer, seller, or platform operator? (2) What's our curation model — open, curated, or certified-only? (3) How do we handle data contracts and SLAs across the catalog? (4) What's the discovery and provisioning UX? Marketplaces succeed only when curation discipline exceeds product breadth: 50 high-trust datasets with named owners always outperform 5,000 unmaintained ones.

Also known as: Data Exchange Strategy, Data Catalog Marketplace, External Data Marketplace, Internal Data Marketplace

The Trap

The trap most enterprises fall into is treating the marketplace as a catalog problem instead of a curation problem. They buy a tool (Collibra, Alation, Atlan), index every dataset in the warehouse, and ship a 12,000-asset catalog that nobody trusts. Without aggressive curation — certified tier, deprecation policy, named stewards — the marketplace becomes a graveyard of stale views. The other trap is open-publishing: letting any team list a dataset without quality gates produces a long tail of broken pipelines that erode marketplace trust faster than any single bad dataset would. Snowflake Marketplace's success comes from listing review, not from being permissive.

What to Do

Stand up a marketplace in three waves:

(1) Curated Core (months 1-3): hand-pick the 20-50 highest-value datasets, establish a certified tier with on-call owners, document SLAs (freshness, quality, schema).

(2) Federated Listings (months 4-9): allow domain teams to list under a 'community' tier with lower trust signals; require basic metadata and ownership.

(3) Self-Service Provisioning (months 10+): integrate with access controls so users can request and consume data without ticketing.

Measure: percentage of marketplace traffic going to certified vs. community assets, weekly active users, and time-to-first-query.
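The three rollout metrics can be computed from a plain query log. A minimal sketch — the event fields, tier labels, and sample values below are illustrative assumptions, not any specific catalog tool's schema:

```python
from datetime import date
from statistics import median

# Hypothetical query-log events:
# (user, dataset_tier, query_date, days_from_signup_to_first_query)
events = [
    ("ana",  "certified", date(2024, 6, 3), 1),
    ("ben",  "certified", date(2024, 6, 3), 4),
    ("ana",  "community", date(2024, 6, 4), 1),
    ("carl", "certified", date(2024, 6, 5), 2),
]

# Metric 1: share of traffic going to certified (vs. community) assets
certified_share = sum(1 for e in events if e[1] == "certified") / len(events)

# Metric 2: weekly active users (distinct users querying in the week)
weekly_active = len({user for user, _, d, _ in events
                     if date(2024, 6, 3) <= d <= date(2024, 6, 9)})

# Metric 3: median time-to-first-query across users, in days
ttfq = median(latency for *_, latency in events)

print(certified_share, weekly_active, ttfq)  # 0.75 3 1.5
```

In practice the events would stream from warehouse query history rather than a hardcoded list; the aggregation logic is the same.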

Formula

Marketplace Health = (Certified Asset Trust Score × Asset Adoption Rate) ÷ (Total Listed Assets × Mean Time to Trust Decision)
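As a sketch, the formula maps directly to code. The sample values are illustrative assumptions, but they show the intended dynamic: trust and adoption push the score up, while raw breadth and slow trust decisions drag it down:

```python
def marketplace_health(trust_score: float, adoption_rate: float,
                       total_assets: int, mean_time_to_trust_min: float) -> float:
    """Marketplace Health = (Certified Asset Trust Score x Asset Adoption Rate)
    / (Total Listed Assets x Mean Time to Trust Decision)."""
    return (trust_score * adoption_rate) / (total_assets * mean_time_to_trust_min)

# A curated 300-asset catalog with ~1-minute trust decisions...
curated = marketplace_health(trust_score=0.9, adoption_rate=0.40,
                             total_assets=300, mean_time_to_trust_min=1.0)

# ...scores far higher than a sprawling 10,000-asset catalog with
# slow trust decisions, even at the same per-asset trust score.
sprawl = marketplace_health(trust_score=0.9, adoption_rate=0.12,
                            total_assets=10_000, mean_time_to_trust_min=6.0)

assert curated > sprawl
```

The units are arbitrary, so the score is only useful for comparing the same marketplace over time or two configurations of it, not as an absolute benchmark.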

In Practice

Snowflake Data Cloud Marketplace launched in 2019 with strict provider vetting and grew to 3,200+ live datasets and 2,000+ providers by 2024 (Weather Source, FactSet, Foursquare, Knoema). Their core innovation: 'data sharing' eliminates copy-and-paste data exchange — consumers query provider data in place via Snowflake's compute, no ETL. This made the marketplace the default integration point for many enterprises. Snowflake refuses to list providers who can't meet freshness/quality bars — the curation discipline is what makes the catalog usable instead of a junk drawer.

Pro Tips

01. The single most predictive marketplace metric is 'time to trust decision': how long a new user spends evaluating whether a dataset is reliable before using it. Under 60 seconds = great UX; over 5 minutes = the catalog is failing its purpose.

02. Always have a deprecation pipeline. Datasets without recent queries (90+ days of zero usage) should be flagged, their owners notified, and the assets archived. Marketplaces without deprecation become cluttered within 18 months.

03. Tier your trust signals visibly: 'Certified' (SLA + on-call), 'Community' (owned but uncertified), 'Experimental' (no guarantees). Color-code aggressively. Users should never have to read documentation to know which tier they're using.
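The deprecation pipeline from tip 02 can be sketched as a scheduled sweep. The 90-day threshold comes from the tip; the record fields and the notify/archive hooks are hypothetical stand-ins for whatever your catalog exposes:

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=90)  # zero-usage window from tip 02
TODAY = date(2024, 7, 1)

# Hypothetical catalog records: name, owner, date of last query
datasets = [
    {"name": "orders_daily",     "owner": "sales-eng", "last_queried": date(2024, 6, 20)},
    {"name": "legacy_clicks_v2", "owner": "growth",    "last_queried": date(2024, 1, 5)},
]

def sweep(catalog, today=TODAY):
    """Flag stale assets, notify their owners, and return the archive queue."""
    to_archive = []
    for ds in catalog:
        if today - ds["last_queried"] > STALE_AFTER:
            ds["flagged"] = True  # surfaced in the catalog UI
            print(f"notify {ds['owner']}: {ds['name']} has had no queries in 90+ days")
            to_archive.append(ds["name"])
    return to_archive

print(sweep(datasets))  # only the stale asset is queued for archiving
```

Running this weekly as a cron or orchestrator job, with an owner grace period before actual archiving, is usually enough to keep the catalog from cluttering.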

Myth vs Reality

Myth: More datasets in the marketplace is better.

Reality: False at scale. Empirical data from large internal marketplaces (Uber, Airbnb, LinkedIn) shows that as catalog size grows past ~1,000 assets, time-to-discovery degrades sharply and trust collapses. The optimal mid-to-large enterprise catalog is 200-500 curated assets, not 10,000 indexed ones. Breadth without curation destroys utility.

Myth: External and internal marketplaces are different beasts.

Reality: Largely false. The economic and trust mechanics are identical: producers list, consumers discover, the platform curates, and trust signals drive adoption. The internal version skips money but still runs on reputation and political 'currency.' Treating internal data marketplaces with marketplace-economics rigor, not as IT catalogs, improves outcomes substantially.

Knowledge Check

Your enterprise marketplace has 4,800 listed datasets after 18 months. Adoption is flat at 12% weekly active users. What's the highest-leverage intervention?

Industry benchmarks

Internal Marketplace Adoption (% of data users active monthly), for internal data marketplaces (Atlan, Alation, Collibra, in-house). Calibrate against real-world tiers and use these ranges as targets, not absolutes:

Elite (Airbnb-level): > 70%
Strong: 40-70%
Average: 20-40%
Underutilized: 10-20%
Failed Catalog: < 10%

Source: Atlan State of Data Discovery 2024 / DataKitchen Benchmarks

Real-world cases

Companies that lived this.

Verified narratives with the numbers that prove (or break) the concept.


Snowflake Data Cloud Marketplace (2019-Present · Success)

Snowflake launched Marketplace in 2019 with a deliberate curation-first approach: providers must meet listing standards (freshness, schema documentation, contact SLA). Combined with Snowflake's data sharing technology — consumers query provider data in place without copy/paste ETL — the marketplace became the default external data integration for many Snowflake enterprises. Grew to 3,200+ datasets, 2,000+ providers by 2024. Crucially, Snowflake declined to compete on listing breadth with AWS Data Exchange; they competed on consumption UX (zero-ETL) and curation, which won the high-value enterprise segment.

Live Datasets (2024): 3,200+
Active Providers: 2,000+
Year Launched: 2019
Differentiator: Zero-ETL data sharing

Marketplace differentiation comes from consumption UX and curation, not from listing volume. Snowflake bet that quality and frictionless access would beat AWS Data Exchange on breadth — and won the enterprise segment.


Databricks Marketplace + Delta Sharing (2022-Present · Success)

Databricks launched its Marketplace in 2022 built on Delta Sharing — an open protocol for sharing data across platforms (not just Databricks-to-Databricks). This addressed the lock-in concern that limited Snowflake Marketplace adoption among multi-cloud enterprises. Combined with marketplace listings for AI models and notebooks (not just datasets), Databricks differentiated by widening the SKU. By 2024, Marketplace listings included models from Hugging Face partners, dbt models, and analytics templates, making it a general-purpose AI/data exchange instead of a pure data catalog.

Year Launched: 2022
Underlying Protocol: Delta Sharing (open)
Listing Types: Datasets, AI models, notebooks
Cross-Platform: Yes (multi-cloud)

Open protocols (Delta Sharing) addressed the lock-in objection that constrained Snowflake's reach. Widening listing types beyond raw data positioned Databricks as the AI exchange, not just a data exchange.


Decision scenario

The Internal Marketplace Re-Launch

You inherited an internal data catalog with 6,200 indexed datasets, 14% monthly active users, and a Slack channel full of 'is this dataset reliable?' questions. Your CTO wants you to 're-launch' the marketplace in 6 months with a clear adoption goal.

Indexed Datasets: 6,200
Monthly Active Users: 14%
Trust Complaints (Slack/quarter): ~120
Certified Datasets: 0 (no tier exists)

Decision 1

You can either expand the catalog (add 2,000 more datasets from new sources) or contract aggressively (deprecate 4,000+ stale assets, certify 100, rebuild the trust UX).

Option A (Expand): more datasets means more users will find what they need. Add the 2,000 new datasets and improve search.

After 6 months: 8,200 datasets indexed, MAU drops to 11% as discovery degrades further. Slack complaints rise to ~180/quarter. Users go back to asking analysts directly. The CTO concludes 'the marketplace tool doesn't work' and you're asked to evaluate replacement vendors. The real problem (trust, not breadth) was never addressed.

Indexed Datasets: 6,200 → 8,200 · MAU: 14% → 11% · Trust Complaints: 120 → 180

Option B (Contract): deprecate 4,000 stale assets, certify the top 100 most-used with named owners and SLAs, rebuild the UX so default search shows certified-only.

After 6 months: 2,200 visible datasets (100 certified + 2,100 community), MAU jumps to 38%. Slack complaints drop to ~30/quarter, and most are about wanting MORE certifications, not about reliability. Time-to-trust-decision drops from ~6 minutes to ~45 seconds. The CTO uses the marketplace as a board-presentation example. You earn budget for a Phase 2 to certify another 200 assets.

Visible Datasets: 6,200 → 2,200 · Certified Datasets: 0 → 100 · MAU: 14% → 38% · Trust Complaints: 120 → 30


Beyond the concept

Turn Data Marketplace Strategy into a live operating decision.

Use this concept as the framing layer, then move into a diagnostic if it maps directly to a current bottleneck.

Typical response time: 24h · No retainer required
