KnowMBA Advisory
Data Strategy · Intermediate

Data Team Staffing Model

Data team staffing is the discipline of deciding how many of which roles you need (data engineers, analytics engineers, data scientists, ML engineers, analysts, platform engineers) and how to distribute them across a centralized, decentralized (embedded), or hub-and-spoke model. The wrong staffing mix is the most common reason data teams fail to deliver: 4 data scientists with no analytics engineer produce models nobody can put into production, and 6 data engineers with no analyst produce pristine warehouses nobody queries. Modern healthy ratios per 1000 employees are 2-4 data engineers, 3-6 analytics engineers, 4-8 analysts, 1-3 ML engineers, 0-2 data scientists, and 1 data platform engineer.

Also known as: Data Org Staffing, Data Team Sizing, Data Headcount Planning, Data Roles Mix

The Trap

The trap is hiring data scientists first because they sound impressive. A 200-person company hires three PhD data scientists, then discovers that the data is too messy to model, there is no production ML platform, and no one can translate model outputs into product changes. Two years and $1M later, the data scientists have produced two notebooks and left. The other staffing trap is centralizing everything: a central data team of 20 cannot serve 30 product teams responsively, yet a fully decentralized model creates 30 inconsistent stacks. A hub-and-spoke model, with a strong central platform and analytics engineers embedded in product teams, is the most reliable pattern.

What to Do

Audit your current team: list every role, what they actually spend their week on, and the queue of unmet demand. Compare against the maturity-appropriate ratio: at <500 employees, prioritize analytics engineers and analysts (move data); at 500-2000, add data engineers and platform engineers (industrialize the stack); at >2000, add ML engineers and specialists (productize and scale). Don't hire data scientists until you have a production ML platform; don't hire ML engineers until you have a model worth productionizing. Sequence matters more than total headcount.
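The stage guidance and the two sequencing rules can be written down as a small lookup, which makes the thresholds explicit when you compare against your audit. This is a minimal sketch: the 500 and 2000 thresholds and the gating rules come from the paragraph above, while the function names and role labels are hypothetical.

```python
# Minimal sketch of the stage guidance above. The thresholds (500, 2000) and the
# two gating rules come from the text; the function names and role labels are
# hypothetical, chosen only for illustration.

def priority_roles(employees: int) -> list[str]:
    """Roles to prioritize for a company of the given headcount."""
    if employees < 500:
        return ["analytics engineer", "analyst"]  # move data
    if employees <= 2000:
        return ["data engineer", "platform engineer"]  # industrialize the stack
    return ["ML engineer", "specialist"]  # productize and scale


def may_hire(role: str, has_ml_platform: bool, has_model_worth_shipping: bool) -> bool:
    """Apply the sequencing rules: no DS before an ML platform, no MLE before a model."""
    if role == "data scientist":
        return has_ml_platform
    if role == "ML engineer":
        return has_model_worth_shipping
    return True  # the text sets no prerequisite for the other roles


print(priority_roles(250))
print(may_hire("data scientist", has_ml_platform=False, has_model_worth_shipping=False))
```

For a 250-person company this prints `['analytics engineer', 'analyst']` and then `False`. The sketch returns an ordering cue rather than a headcount, in keeping with the point that sequence matters more than total size.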

Formula

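Healthy mix per 1000 employees (mid-stage SaaS): ~2-4 DE (data engineers) + 3-6 AE (analytics engineers) + 4-8 analysts + 1-3 ML engineers + 0-2 DS (data scientists) + 1 platform engineer. Total: ~12-25 data professionals per 1000 employees (the individual role ranges sum to 11-24).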
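To turn the per-1000 formula into a target for a specific company, multiply each range by headcount / 1000 and add the ends. This is a sketch under one loudly stated assumption that the text does not make: that the ratios scale linearly with company size. The dictionary and function names are hypothetical.

```python
# Sketch: scale the per-1000 ranges above to a given company size.
# Assumption (not from the text): ratios scale linearly with headcount.

HEALTHY_MIX_PER_1000 = {
    "data engineer": (2, 4),
    "analytics engineer": (3, 6),
    "analyst": (4, 8),
    "ML engineer": (1, 3),
    "data scientist": (0, 2),
    "platform engineer": (1, 1),
}


def scaled_mix(employees: int) -> dict[str, tuple[float, float]]:
    """Return (low, high) headcount per role for the given number of employees."""
    factor = employees / 1000
    return {role: (lo * factor, hi * factor) for role, (lo, hi) in HEALTHY_MIX_PER_1000.items()}


def total_range(mix: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Sum the low and high ends across all roles."""
    return (sum(lo for lo, _ in mix.values()), sum(hi for _, hi in mix.values()))


low, high = total_range(scaled_mix(400))
print(f"400 employees: roughly {low:.1f} to {high:.1f} data professionals")
```

At 1000 employees the role ranges sum to 11-24, which the formula rounds to ~12-25; a 400-person company gets roughly 4.4 to 9.6. Fractional single-role results (0.4 platform engineers at 400 employees) show where linear scaling breaks down: at small sizes a role is either staffed or not, so treat the output as a budget range, not a hiring plan.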

In Practice

Hypothetical: Across hundreds of data org reviews, the dominant failure mode is the same: companies hire "data scientists" before they have the data engineering or analytics engineering capacity to support them. Tristan Handy (dbt Labs founder) and Erik Bernhardsson (formerly at Spotify) have both written publicly about this pattern. The healthy modern-stack ratios, popularized by analytics-engineering thought leaders, put analytics engineers, not data scientists, at the center of the team at most companies under 1000 employees.

Pro Tips

  • 01

    Hire analytics engineers before data scientists. AEs (dbt + SQL + business sense) move the needle on every reporting and ML readiness initiative; DS without AE produces stranded models.

  • 02

    Erik Bernhardsson's post 'Building a data team at a mid-stage startup' is the single best public reference for staging hires. Read it before any data org expansion plan.

  • 03

    If you can't articulate what an additional data scientist would ship in their first 6 months, don't hire one. A vague "we need ML capacity" is not a hiring justification; it's a strategy gap.

Myth vs Reality

Myth

"Data scientists are the most valuable data hire"

Reality

At most companies under 1000 employees, the highest-leverage data hire is an analytics engineer who can build trusted models in dbt and bridge to business consumers. Data scientists are valuable when you have ML use cases, an ML platform, and AE/DE support โ€” without those, they don't ship.

Myth

"Centralized data teams are more efficient"

Reality

Centralized teams are more efficient for platform investments and standards. They are LESS efficient at serving fast-moving product teams. The hub-and-spoke pattern (central platform + embedded analytics engineers in product squads) consistently outperforms either pure model in companies above 200 employees.


Knowledge Check

You're advising a 250-person Series B SaaS company. They have $400K to invest in their first three data hires. What's the right sequence?

Industry benchmarks

Calibrate your team against these tiers, and use the ranges as rough targets, not absolutes.

Data Team Size (data professionals per 1000 employees, mid-stage SaaS)

Excludes embedded BI analysts in business units. Mid-stage SaaS, B2B-focused.

Lean: 8-12
Healthy: 12-25
Heavy: 25-40
Bloated: > 40

Source: Hypothetical synthesis from public data org posts (Erik Bernhardsson, Tristan Handy, Locally Optimistic community)
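Checking a team against these tiers only requires normalizing headcount to a per-1000 rate. The sketch below does that, assuming you count data professionals the same way the benchmark does (excluding embedded BI analysts). Because the published tiers overlap at 12 and 25, the boundary handling is an assumption, and "below Lean" is a hypothetical label for rates under the lowest published tier.

```python
# Sketch: place a data team in the benchmark tiers above.
# Boundary handling is an assumption: the published tiers overlap at 12 and 25,
# so 12 and 25 are counted as Healthy and 40 as Heavy. "below Lean" is a
# hypothetical label for values under the lowest published tier.

def size_tier(data_headcount: int, employees: int) -> str:
    """Classify data professionals per 1000 employees (exclude embedded BI analysts)."""
    per_1000 = data_headcount * 1000 / employees
    if per_1000 < 8:
        return "below Lean"
    if per_1000 < 12:
        return "Lean"
    if per_1000 <= 25:
        return "Healthy"
    if per_1000 <= 40:
        return "Heavy"
    return "Bloated"


print(size_tier(6, 400))  # 15 per 1000
print(size_tier(3, 400))  # 7.5 per 1000
```

Using the headcounts from the hypothetical 400-person company in the case study that follows, the restructured 6-person team prints `Healthy` and the original 3-person team prints `below Lean`. Size alone does not capture the mix problem, which is why the tier check belongs next to a role-by-role audit.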

Case study

Hypothetical: Mid-Stage B2B SaaS (2024, pivot)

Hypothetical: A 400-person B2B SaaS company hired 3 data scientists in 2022 before establishing data engineering or analytics engineering capacity. By 2024, all three had left for FAANG roles, having produced 5 notebooks and zero production models. The new VP of Data restructured the team, hiring 1 senior data engineer, 3 analytics engineers, 1 ML platform engineer, and 1 data scientist with a focused mandate. Within 12 months the team shipped its first production model, dashboard incidents dropped 60%, and business-side trust in the data was restored.

Original mix: 3 DS, 0 AE, 0 ML Platform
Production models, year 1: 0
Restructured mix: 1 DE, 3 AE, 1 MLE, 1 DS
Production models, year 2: 3

Hiring sequence determines outcomes. Data scientists without AE/DE/MLE support produce stranded artifacts. Build the foundation first, then hire specialists.
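The takeaway can be turned into a quick audit check that flags specialists hired without their supporting roles. This is a minimal sketch: the role codes mirror the case study, and `REQUIRED_SUPPORT` is a hypothetical encoding of "data scientists need AE/DE/MLE support", not a standard rule set.

```python
# Sketch: flag specialist hires that lack the support the takeaway above calls for.
# The role codes mirror the case study; REQUIRED_SUPPORT is a hypothetical
# encoding of "data scientists need AE/DE/MLE support".
from collections import Counter

REQUIRED_SUPPORT = {"DS": ("AE", "DE", "MLE")}


def support_gaps(mix: Counter) -> list[str]:
    """List each specialist role that is staffed while a supporting role is empty."""
    gaps = []
    for specialist, supports in REQUIRED_SUPPORT.items():
        if mix[specialist] > 0:  # Counter returns 0 for roles not in the mix
            gaps.extend(f"{specialist} without {s}" for s in supports if mix[s] == 0)
    return gaps


original = Counter({"DS": 3})
restructured = Counter({"DE": 1, "AE": 3, "MLE": 1, "DS": 1})
print(support_gaps(original))
print(support_gaps(restructured))
```

The original mix prints `['DS without AE', 'DS without DE', 'DS without MLE']` and the restructured mix prints `[]`. Keeping the prerequisites in a dictionary makes it easy to add further rules from your own audit without changing the function.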
