Automation · Advanced · 9 min read

Workflow Design Patterns

Workflow Design Patterns are the reusable architectural blueprints for how automated work flows through systems and humans. The canonical patterns include: (1) Sequential: step A, then B, then C; (2) Parallel/Fan-Out: split into N parallel branches and aggregate (Fan-In); (3) Saga: long-running transaction with compensating undo steps; (4) State Machine: explicit states with allowed transitions; (5) Event-Driven: react to events rather than poll; (6) Human-in-the-Loop: pause for human decision and resume; (7) Retry-with-Backoff: handle transient failure deterministically; (8) Circuit Breaker: stop calling a failing dependency. Senior automation engineers think in patterns the way senior software engineers think in design patterns; naming the pattern is half the design conversation.

Also known as: Automation Patterns, Process Patterns, Orchestration Patterns, Workflow Archetypes
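As an illustration of the Parallel/Fan-Out pattern with its Fan-In aggregation, here is a minimal sketch using Python's asyncio. The branch names (credit, fraud, address) and the `check_branch` helper are hypothetical stand-ins for independent steps, not part of any real system.

```python
import asyncio


async def check_branch(name: str, delay: float) -> str:
    # Stand-in for one independent unit of work (hypothetical branch).
    await asyncio.sleep(delay)
    return f"{name}:ok"


async def fan_out_fan_in(branches: dict[str, float]) -> list[str]:
    # Fan-out: create one coroutine per branch so they run concurrently.
    tasks = [check_branch(name, delay) for name, delay in branches.items()]
    # Fan-in: gather waits for every branch and returns results in input order.
    return await asyncio.gather(*tasks)


def run_demo() -> list[str]:
    return asyncio.run(
        fan_out_fan_in({"credit": 0.02, "fraud": 0.01, "address": 0.0})
    )
```

Total wall-clock time is roughly that of the slowest branch rather than the sum of all branches, which is the point of choosing this pattern over a sequential script.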

The Trap

The trap is treating every workflow as a sequential script. Junior automation builders default to 'do step 1, then step 2, then step 3' and end up with brittle linear flows that break the moment any step takes longer than expected, fails transiently, or needs human input. Real enterprise processes have parallelism, compensation, exceptions, and waiting, and a sequential script can't model those without becoming an unreadable mess of nested if-blocks. The other trap: over-engineering simple flows into Sagas and State Machines when a 5-step sequence would do. Pattern selection is itself a discipline.

What to Do

Make pattern selection an explicit step in design. Before building, answer: (1) Are any steps independent and safely parallel? (2) Can any step fail in a way that requires undoing prior steps? (3) Does the flow have a small set of named states or is it free-form? (4) Are we polling or reacting to events? (5) Where are the human decision points? Write the chosen pattern at the top of the workflow definition. In code review, reject 'sequential script' as the default answer; it should be a deliberate choice. Maintain a small internal pattern catalog with reference implementations.
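The five design questions can be captured as a small checklist that returns candidate patterns. This is only a sketch: the `DesignAnswers` fields and the one-to-one mapping from answers to patterns are illustrative assumptions, and a real design review would weigh the answers rather than apply them mechanically.

```python
from dataclasses import dataclass


@dataclass
class DesignAnswers:
    has_independent_steps: bool   # question 1
    needs_undo_on_failure: bool   # question 2
    has_named_states: bool        # question 3
    reacts_to_events: bool        # question 4
    has_human_decisions: bool     # question 5


def suggest_patterns(a: DesignAnswers) -> list[str]:
    """Return candidate patterns for review (illustrative mapping only)."""
    patterns = []
    if a.has_independent_steps:
        patterns.append("Parallel/Fan-Out")
    if a.needs_undo_on_failure:
        patterns.append("Saga")
    if a.has_named_states:
        patterns.append("State Machine")
    if a.reacts_to_events:
        patterns.append("Event-Driven")
    if a.has_human_decisions:
        patterns.append("Human-in-the-Loop")
    # Sequential is returned only when no other question applies,
    # which makes it a deliberate choice rather than a default.
    return patterns or ["Sequential"]
```

The returned list is what would be written at the top of the workflow definition.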

Formula

Pattern Fit Score = (% of workflows mapped to a named pattern) × (1 − % of workflows with > 3 special-case branches)
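A worked example of the formula, with both percentages expressed as fractions of the total workflow count. The portfolio numbers (50 workflows, 40 mapped, 10 with more than 3 special-case branches) are invented purely for illustration.

```python
def pattern_fit_score(total: int, mapped: int, over_branched: int) -> float:
    """Pattern Fit Score as a value between 0 and 1.

    total         -- number of workflows in the portfolio
    mapped        -- workflows mapped to a named pattern
    over_branched -- workflows with more than 3 special-case branches
    """
    if total <= 0:
        raise ValueError("total must be positive")
    mapped_share = mapped / total
    over_branched_share = over_branched / total
    return mapped_share * (1 - over_branched_share)


# Hypothetical portfolio: 80% mapped, 20% over-branched.
score = pattern_fit_score(total=50, mapped=40, over_branched=10)
```

Here the score works out to about 0.64 (0.8 × 0.8): high pattern coverage is discounted by the share of workflows that still carry many special cases.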

In Practice

Temporal (workflow orchestration platform used by Uber, Snap, Stripe, and Netflix) built its product around making patterns first-class: their SDK has explicit primitives for Sagas, parallel execution, signals (for human-in-the-loop), and deterministic retries. The bet, validated by adoption, is that production workflows fail because engineers reimplement patterns badly using ad-hoc cron jobs and queue plumbing. Customers report 60-80% reduction in workflow incident volume after migrating from hand-rolled orchestration to pattern-based platforms.

Pro Tips

  • 01

    The Saga pattern is the most underused in enterprise automation. Any process that touches 3+ systems with side effects (e.g., reserve inventory, charge card, ship order) needs explicit compensation logic. Without it, partial failures leave the business in a corrupted state.

  • 02

    Every human-in-the-loop step needs an SLA and a timeout escalation. The most common production failure is a workflow paused on a human approver who left the company. Without escalation, work disappears into a void.

  • 03

    When you find yourself nesting 4+ if-statements inside a sequential workflow, stop. You've outgrown the sequential pattern: convert to a state machine or split into sub-workflows.
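The state-machine conversion in tip 03 can be sketched as a transition table that replaces nested if-statements. The order states and allowed transitions below are hypothetical; the point is that every legal move is declared in one place and anything else is rejected.

```python
from enum import Enum


class OrderState(Enum):
    NEW = "new"
    PAID = "paid"
    SHIPPED = "shipped"
    CANCELLED = "cancelled"


# Explicit transition table: each state maps to the states it may move to.
ALLOWED = {
    OrderState.NEW: {OrderState.PAID, OrderState.CANCELLED},
    OrderState.PAID: {OrderState.SHIPPED, OrderState.CANCELLED},
    OrderState.SHIPPED: set(),      # terminal state
    OrderState.CANCELLED: set(),    # terminal state
}


def transition(current: OrderState, target: OrderState) -> OrderState:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

Adding a new state means adding one row to the table instead of another layer of branching.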

Myth vs Reality

Myth

“Workflow patterns are an academic concern”

Reality

Patterns are operational concerns. The most expensive production incidents in automation programs come from missing patterns: no compensation (data corruption), no circuit breaker (cascading failure), no idempotency (duplicate side effects). Naming the pattern is the cheapest insurance.
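To make the circuit-breaker point concrete, here is a minimal sketch. The class name, thresholds, and the injectable `clock` parameter are assumptions for illustration; production breakers usually add per-dependency state, metrics, and thread safety.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Still cooling down: fail fast without touching the dependency.
                raise CircuitOpenError("dependency skipped: circuit is open")
            # Cool-down elapsed: allow a trial call (half-open). One more
            # failure re-opens the circuit immediately.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result
```

Failing fast while the circuit is open is what keeps one broken dependency from dragging down every workflow that calls it.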

Myth

“Modern platforms make patterns unnecessary”

Reality

Modern platforms make patterns easier to implement, not less necessary. Make.com and Zapier let you build parallel branches and retry logic, but you still have to know to use them. The platform doesn't choose the pattern for you.
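Whatever the platform, retry logic has the same shape underneath. The sketch below shows Retry-with-Backoff in plain Python; the function name, default delays, and the choice of which exceptions count as transient are all assumptions, and this is not how Make.com or Zapier implement it.

```python
import time


def retry_with_backoff(func, max_attempts=4, base_delay=0.5, max_delay=8.0,
                       retry_on=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Call func, retrying transient errors on a fixed exponential schedule."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delays double each time (0.5s, 1s, 2s, ...) up to max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
```

Only the exception types listed in `retry_on` are retried; anything else propagates immediately, since retrying a non-transient failure just delays the error. Many implementations also add random jitter to the delay, which is omitted here to keep the schedule deterministic.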

Try it

Run the numbers.

Pressure-test the concept against your own knowledge: answer the challenge or try the live scenario.

Knowledge Check

Your order-processing workflow does: (1) reserve inventory, (2) charge customer card, (3) generate shipping label. Step 3 fails 5% of the time due to carrier API outages. Currently, when step 3 fails, the customer is charged and inventory is reserved but no shipment exists. What pattern do you need?

Industry benchmarks

Is your number good?

Calibrate against real-world tiers. Use these ranges as targets, not absolutes.

Workflow Pattern Adoption (% of flows mapped to a named pattern)

Enterprise automation teams operating 50+ workflows in production

Mature Engineering: > 80%
Maturing: 50-80%
Ad-Hoc: 20-50%
Pattern-Less Sprawl: < 20%

Source: KnowMBA aggregate from Temporal/Camunda customer maturity reports

Real-world cases

Companies that lived this.

Verified narratives with the numbers that prove (or break) the concept.

โฑ๏ธ

Temporal (Uber, Snap, Stripe)

2019-present

success

Temporal originated from Uber's Cadence project, designed because Uber's engineering teams kept reimplementing the same patterns badly: retries, sagas, signals for human input. Temporal made patterns first-class primitives. Stripe, Snap, Datadog, and Netflix have all publicly described migrating from hand-rolled cron+queue orchestration to Temporal. Reported outcomes include 60-80% reduction in workflow incident volume and substantial reduction in 'workflow plumbing' code per engineer.

Origin: Uber Cadence (2017)
Pattern Primitives: Workflows, Activities, Signals, Sagas
Reported Incident Reduction: 60-80% post-migration
Strategic Shift: Patterns as platform features, not dev discipline

When patterns are platform primitives, engineers use them by default. When patterns are dev discipline, they get skipped under deadline pressure. Choose platforms that make the right thing the easy thing.


Hypothetical: Insurance Claims Workflow Refactor

2023-2024

success

A regional insurer's claims-processing automation had grown to 40 sequential workflows over 3 years. Production incidents averaged 8/month with mean time to resolve of 4 hours. A 2-quarter refactor mapped each workflow to a named pattern (mostly state machines and sagas), added idempotency keys to all external calls, and introduced circuit breakers on three flaky third-party APIs. Post-refactor: incidents dropped to 2/month, MTTR fell to 45 minutes, and engineering velocity increased because new claims types could reuse the pattern templates.

Workflows Refactored: 40
Incident Volume: 8/mo → 2/mo
MTTR: 4 hr → 45 min
Refactor Effort: 2 quarters, 4 engineers

Pattern adoption isn't just about new development: refactoring legacy workflows to named patterns delivers compounding reliability and velocity gains.
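The idempotency keys mentioned in this case can be sketched as follows. `PaymentGatewayStub` is a hypothetical in-memory stand-in for an external API that honors idempotency keys, and deriving the key from a workflow ID plus step name is one common convention, assumed here for illustration.

```python
import hashlib


class PaymentGatewayStub:
    """In-memory stand-in for an external API that honors idempotency keys."""

    def __init__(self):
        self._seen = {}
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A repeated key returns the stored result without a second side effect.
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]
        self.charges_made += 1
        receipt = {"charge_id": f"ch_{self.charges_made}",
                   "amount_cents": amount_cents}
        self._seen[idempotency_key] = receipt
        return receipt


def idempotency_key(workflow_id: str, step: str) -> str:
    # Same workflow and step always yield the same key, so a retried
    # step cannot be mistaken for a new request.
    return hashlib.sha256(f"{workflow_id}:{step}".encode()).hexdigest()
```

With a stable key, a retry after a timeout is safe: the second call returns the first receipt instead of producing a duplicate side effect.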

Decision scenario

Choosing a Workflow Pattern Under Pressure

Your team is building an order-fulfillment workflow that touches 5 systems: inventory, payments, shipping, ERP, and email notifications. The deadline is in 2 weeks. The PM says 'just build it sequentially, we'll add fancy patterns later.'

Systems Touched: 5
Daily Volume: 12,000 orders
Deadline: 2 weeks
Side Effects per Step: Inventory, $, Shipping label, ERP record

01

Decision 1

You estimate that 1-2% of orders will hit a partial failure (one system succeeds, the next fails). At 12K orders/day, that is 120-240 corrupted orders per day if compensation is missing. Three options.

Option 1: Build sequentially as PM requested and add patterns in v2
Ship in 2 weeks. Within 3 days, support tickets explode: customers charged but no shipment, customers shipped but not charged, inventory phantoms. Engineering pulled into 24/7 firefighting. Manual reconciliation team spun up costing $40K/month. v2 'add patterns later' is delayed indefinitely because the team is too busy with incidents to do design work. Estimated cost of skipping the pattern: 8x the cost of doing it right initially.
Time to Ship: 2 weeks (on time) · Operating Cost: +$40K/mo manual reconciliation, ongoing firefighting

Option 2: Push back on deadline. Build a Saga with explicit compensation for each side-effect-producing step. Add idempotency keys. Take 3 weeks instead of 2.
Ship 1 week late. The PM is unhappy initially but the workflow runs cleanly from day one. Partial failures self-heal: when shipping fails, payment auto-refunds and inventory auto-releases. Customer-facing incidents in the first 90 days: 4 (all genuinely novel failure modes). The 1-week delay is forgotten within a month. The Saga template gets reused for 3 subsequent workflows in the next quarter.
Time to Ship: 3 weeks · Production Incidents (90d): 4 (vs estimated 200+ if shipped sequentially)

Option 3: Build sequentially but add a nightly reconciliation script to detect and fix partial failures
Ships on time. Reconciliation script catches ~80% of partial failures within 24 hours. The 20% it misses (edge cases) become customer escalations. Customers wait up to 24 hours to see refunds for failed orders. Customer satisfaction takes a hit; competitors with cleaner UX win review comparisons. The reconciliation script becomes its own maintenance burden (0.5 FTE forever) and is itself an unmaintained citizen-built script within 18 months.
Time to Ship: 2 weeks (on time) · Tech Debt: +1 reconciliation script (forever)
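The Saga choice in this decision can be sketched in a few lines. This is a simplified in-memory illustration, not a production implementation: the `run_saga` helper, the step names, and the simulated carrier outage are assumptions, and a real Saga would also persist its progress and retry failed compensations.

```python
class SagaFailed(RuntimeError):
    pass


def run_saga(steps, context):
    """Run (name, action, compensation) steps in order.

    If a step raises, run the compensations of all completed steps
    in reverse order, then raise SagaFailed.
    """
    completed = []
    for name, action, compensate in steps:
        try:
            action(context)
        except Exception as exc:
            for _, undo in reversed(completed):
                undo(context)
            raise SagaFailed(f"step '{name}' failed; prior steps undone") from exc
        completed.append((name, compensate))
    return context


def demo():
    log = []

    def fail_shipping(ctx):
        # Simulated carrier API outage (hypothetical failure).
        raise ConnectionError("carrier API outage")

    steps = [
        ("reserve_inventory", lambda c: log.append("reserved"),
         lambda c: log.append("released")),
        ("charge_card", lambda c: log.append("charged"),
         lambda c: log.append("refunded")),
        ("create_label", fail_shipping, lambda c: None),
    ]
    try:
        run_saga(steps, {})
    except SagaFailed:
        pass
    return log  # ["reserved", "charged", "refunded", "released"]
```

When the label step fails, the card is refunded and then the inventory is released, which is the self-healing behavior described for the Saga option: the failed step has nothing to undo, and earlier steps are compensated newest first.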
