Total Productive Maintenance
Total Productive Maintenance (TPM) is a maintenance system that grew out of Toyota's supplier network in Japan, in which operators, not a separate maintenance department, own the day-to-day care of their equipment. The goal is zero breakdowns, zero defects, and zero accidents. The headline metric is OEE (Overall Equipment Effectiveness) = Availability × Performance × Quality. World-class OEE is 85% or higher; many plants that measure honestly for the first time find they are running at 40-60% without realizing it. TPM has eight pillars, but the operational core is two: Autonomous Maintenance (operators do cleaning, lubrication, tightening, and inspection) and Planned Maintenance (scheduled interventions before failure). The KnowMBA take: TPM applies brutally well to knowledge work. Your CI pipeline, your Kubernetes cluster, and your data warehouse are 'machines' that need scheduled care. SaaS teams that run infrastructure to failure (fixing it only when it breaks) can burn as much as 40% of engineering hours on incidents that planned maintenance would have prevented.
The Trap
Companies install a 'TPM program' as a separate initiative run by the maintenance manager, then wonder why nothing changes. The whole point is that OPERATORS own equipment care — if you carve out a 'TPM team,' you've recreated the bureaucracy TPM was designed to dissolve. The other trap: chasing OEE as a vanity number. A station with 95% OEE that's not the bottleneck is irrelevant; a bottleneck running at 60% OEE is the only thing that matters. And measuring availability without measuring quality leads to producing fast garbage — high availability + low quality = lots of rework dressed as throughput.
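To make the bottleneck point concrete, here is a minimal sketch that assumes a simple serial line whose output is capped by its slowest station. The station names, ideal rates, and OEE values are hypothetical and chosen only for illustration.

```python
# Hypothetical serial line: (ideal units/hour, OEE) per station.
stations = {
    "filler": (1000, 0.60),   # assumed bottleneck
    "capper": (1500, 0.95),
    "labeler": (1400, 0.80),
}


def effective_rate(ideal_rate, oee):
    """Good units per hour a station actually delivers."""
    return ideal_rate * oee


def line_throughput(line):
    """A serial line can go no faster than its slowest station."""
    return min(effective_rate(rate, oee) for rate, oee in line.values())


def bottleneck(line):
    """Name of the station with the lowest effective rate."""
    return min(line, key=lambda name: effective_rate(*line[name]))


baseline = line_throughput(stations)

# Raising OEE on a non-bottleneck station leaves line output unchanged.
better_labeler = {**stations, "labeler": (1400, 0.95)}
# Raising OEE on the bottleneck moves line output directly.
better_filler = {**stations, "filler": (1000, 0.75)}

print(bottleneck(stations), baseline)
print(line_throughput(better_labeler), line_throughput(better_filler))
```

Under these assumed numbers, the labeler improvement changes nothing at the line level, while the same effort spent on the filler raises output.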
What to Do
Pick one critical piece of equipment (your bottleneck, per Theory of Constraints). Measure baseline OEE for two weeks: clock every minute of downtime by category (breakdown, changeover, minor stop, idle). Then run a 5-day kaizen blitz: deep-clean the equipment with operators (cleaning surfaces problems — leaks, loose bolts, wear — that you'd never notice running), build a one-page Autonomous Maintenance checklist (daily 5-min checks, weekly 20-min checks), and move predictable failures from 'breakdown' to 'planned.' Re-measure OEE after 60 days. Expect 10-20 point improvement in the first cycle without buying anything.
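The baseline measurement step can be sketched as a simple tally. This is an illustrative sketch only: the log entries and the 480-minute shift are hypothetical, and the categories are the four named above.

```python
from collections import defaultdict

# Hypothetical downtime log for one shift: (category, minutes).
downtime_log = [
    ("breakdown", 35),
    ("changeover", 40),
    ("minor stop", 3),
    ("minor stop", 4),
    ("idle", 12),
    ("minor stop", 2),
]


def minutes_by_category(log):
    """Total downtime minutes per category, largest first."""
    totals = defaultdict(int)
    for category, minutes in log:
        totals[category] += minutes
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def availability(planned_minutes, log):
    """Run time divided by planned production time."""
    down = sum(minutes for _, minutes in log)
    return (planned_minutes - down) / planned_minutes


ranked = minutes_by_category(downtime_log)
print(ranked)
print(availability(480, downtime_log))
```

Ranking the categories by minutes lost is what tells you which failures to move from 'breakdown' to 'planned' first.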
Formula
OEE = Availability × Performance × Quality
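A minimal sketch of the OEE calculation, using the common textbook definitions of the three factors (run time over planned time, ideal cycle time times output over run time, good units over total units). The `ShiftData` fields and the example numbers are hypothetical, not taken from any plant described here.

```python
from dataclasses import dataclass


@dataclass
class ShiftData:
    planned_minutes: float      # scheduled production time
    downtime_minutes: float     # all logged stops
    ideal_cycle_seconds: float  # fastest sustainable time per unit
    total_units: int            # everything produced, good or bad
    good_units: int             # units that pass without rework


def oee(shift: ShiftData) -> dict:
    """Return the three OEE factors and their product."""
    run_minutes = shift.planned_minutes - shift.downtime_minutes
    availability = run_minutes / shift.planned_minutes
    # Actual output versus what the ideal cycle time allows in the run time.
    performance = (shift.ideal_cycle_seconds * shift.total_units) / (run_minutes * 60)
    quality = shift.good_units / shift.total_units
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


# Hypothetical shift: 480 planned minutes, 96 minutes down, 2-second ideal cycle.
example = oee(ShiftData(480, 96, 2.0, 9000, 8550))
print({name: round(value, 3) for name, value in example.items()})
```

Keeping the three factors separate matters: the product alone hides whether the loss is stops, slow running, or defects.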
In Practice
Toyota's Tahara plant, long regarded as one of the most advanced auto plants in the world, runs OEE above 85% on equipment that competitors average 50% on. The difference isn't better machines; it's that every operator at Tahara starts their shift with a 10-minute machine inspection (oil levels, belt tension, sensor cleaning) and stops the line at the first abnormal sound. The underlying method dates to the early 1970s, when Seiichi Nakajima of the Japan Institute of Plant Maintenance formalized TPM around the program at Nippondenso, a Toyota supplier. Its central claim is that operator-owned care eliminates 70%+ of unplanned downtime, because operators feel the machine every day and notice changes that a maintenance tech on a quarterly visit never would.
Pro Tips
01. Seiichi Nakajima's six big losses, to attack in order: (1) breakdowns, (2) setup/adjustments, (3) minor stops, (4) reduced speed, (5) startup defects, (6) production defects. Most plants only see #1 because it's loud. The hidden killer is #3: minor stops under 5 minutes that no one logs but that consume 15-20% of capacity.
02. World-class OEE = 85% (Availability 90% × Performance 95% × Quality 99.9%). Below 65% means major losses you don't see. The first time you measure honestly, you'll be shocked how low your number is. That's normal, and it's where the gold is.
03. For SaaS: your TPM equivalent is incident postmortems + scheduled chaos engineering + planned dependency upgrades. Teams that 'don't have time' for these run at ~40% engineering OEE: most hours go to incidents and rework, not feature throughput.
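The six big losses in the tips above are conventionally grouped in pairs under the three OEE factors, which is why logging only breakdowns leaves most of the number unexplained. A small sketch of that grouping, plus a check of the world-class component targets quoted above (the dictionary and function names are illustrative):

```python
# Conventional grouping of the six big losses under the three OEE factors.
SIX_BIG_LOSSES = {
    "breakdowns": "availability",
    "setup/adjustments": "availability",
    "minor stops": "performance",
    "reduced speed": "performance",
    "startup defects": "quality",
    "production defects": "quality",
}


def losses_for(factor):
    """List the losses that drag down a given OEE factor."""
    return [loss for loss, f in SIX_BIG_LOSSES.items() if f == factor]


# World-class component targets from the tips: 90% x 95% x 99.9%.
world_class = 0.90 * 0.95 * 0.999

print(losses_for("performance"))
print(round(world_class, 4))
```

The product comes out just above 85%, which is where the commonly cited world-class figure comes from.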
Myth vs Reality
Myth: “TPM is just preventive maintenance with a fancy name”
Reality: Preventive maintenance is scheduled by a maintenance team. TPM transfers ownership to operators and embeds quality, safety, and continuous improvement into daily work. The cultural shift (operators as machine owners, not button-pushers) is the actual point. Calendar-based maintenance without the cultural shift fails.
Myth: “We can't do TPM until we have spare time / spare people”
Reality: TPM CREATES capacity by eliminating the unplanned downtime that's stealing it now. The 5-10 minutes per shift spent on autonomous checks pays back 5-10x in avoided breakdowns. Companies that wait for 'spare time' never start; companies that start always find the time was already there, hidden in firefighting.
Industry benchmarks
Is your number good?
Calibrate against real-world tiers. Use these ranges as targets — not absolutes.
Overall Equipment Effectiveness (OEE)
Scope: discrete manufacturing across industries
World-Class: ≥ 85%
Strong: 75-85%
Typical: 60-75%
Weak: 40-60%
Crisis: < 40%
Source: Seiichi Nakajima / JIPM (Japan Institute of Plant Maintenance)
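The tiers above can be turned into a small lookup. This is a sketch with one labeled assumption: the published ranges share their endpoints (75%, 60%, 40%), so the code assigns each shared boundary to the higher tier.

```python
def oee_tier(oee_percent):
    """Map an OEE percentage to the benchmark tiers listed above.

    Assumption: a value exactly on a shared boundary (e.g. 75)
    is placed in the higher tier.
    """
    if oee_percent >= 85:
        return "World-Class"
    if oee_percent >= 75:
        return "Strong"
    if oee_percent >= 60:
        return "Typical"
    if oee_percent >= 40:
        return "Weak"
    return "Crisis"


for value in (47, 52, 71, 85):
    print(value, oee_tier(value))
```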
Real-world cases
Companies that lived this.
Verified narratives with the numbers that prove (or break) the concept.
Toyota (Tahara Plant)
1970s-Present
Toyota's Tahara plant institutionalized TPM under Seiichi Nakajima's framework: every operator does a 10-minute equipment check at shift start, owns cleaning and lubrication of their station, and is empowered to stop the line on any abnormal sound or vibration. Maintenance technicians shifted from break-fix to coaching operators and tackling complex root causes. The result: OEE consistently above 85% on equipment competitors average 50-55% on, with breakdown rates 1/10th of the US Big Three plants.
OEE: 85%+ sustained
Unplanned Breakdowns: ~10% of US peer plants
Operator Maintenance Time: ~30 min/shift
Breakeven vs. Peers: Lower cost per unit despite higher labor cost
OEE gains don't come from better machines or more techs — they come from giving daily ownership to the people who touch the equipment every shift.
Hypothetical: Mid-Market CPG Co-Packer
Recent
A 200-employee co-packer was missing 20% of customer ship dates, blamed 'old equipment' and proposed $4M in line replacements. Honest OEE measurement revealed the bottleneck filling line ran at 47% OEE — losing ~30% to minor stops nobody was logging (jams under 3 min). A 90-day TPM rollout (operator deep-clean kaizen, daily inspection card, planned changeover practice) lifted OEE to 71% with $35K in spend. Throughput rose 50%; the $4M capex was canceled.
OEE Before: 47%
OEE After 90 Days: 71%
Spend: $35K (vs. $4M proposed)
On-Time Ship Rate: 80% → 96%
Before you justify capex on the grounds that 'the equipment is old,' measure honest OEE. Most plants discover 30+ points of throughput hiding in their existing assets.
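The co-packer figures can be sanity-checked with a few lines of arithmetic. The sketch assumes that output at the bottleneck scales in proportion to OEE, which is a simplification; the function name is illustrative.

```python
def throughput_gain(oee_before, oee_after):
    """Relative output gain if bottleneck output scales with OEE."""
    return oee_after / oee_before - 1


gain = throughput_gain(0.47, 0.71)
points_gained = round((0.71 - 0.47) * 100)
spend_per_point = 35_000 / points_gained

print(f"{gain:.0%} more throughput")
print(points_gained, "OEE points at about", round(spend_per_point), "dollars each")
```

Under that assumption, 47% to 71% works out to roughly a 51% gain, consistent with the 50% throughput increase reported in the case.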
Decision scenario
The Capex vs. TPM Investment Decision
You're VP of Operations at a packaging plant. The CEO has a $1.2M capex slot for next quarter. The plant manager wants a new high-speed filler ($1.2M, claims +25% throughput). The shop floor lead says current OEE on the existing filler is 52% and a TPM program could lift it to 75%+ for a fraction of the cost. The bottleneck IS the filler. You have to decide before Friday.
Current OEE (filler): 52%
Throughput: 1,800 cases/shift
Capex Available: $1.2M
On-Time Ship Rate: 78%
Bottleneck: Filler (confirmed)
Decision 1
You walk the floor. The filler stops 6-8 times per shift for 2-4 min each; operators clear jams and restart, and no one logs it. Daily clean-down takes 90 min because grime has built up for years. The maintenance team only touches the filler when it fails outright. The plant manager's pitch is real: a new machine WOULD be faster. But the existing one has at least 23 points of OEE (from 52% to the floor lead's 75% target) hiding in plain sight.
Buy the new $1.2M filler: more reliable, more capacity, plant manager has experience justifying capex
Run a 90-day TPM kaizen on the existing filler first ($40K). Hold capex pending results. Reassess in 3 months. (✓ Optimal)
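A rough way to compare the two options is cost per added case per shift. The sketch below takes the scenario's numbers at face value and rests on three assumptions: the new filler delivers the claimed +25%, the TPM program reaches the 75% target, and output scales in proportion to OEE. None of these is guaranteed in practice.

```python
BASELINE_CASES = 1800  # cases per shift at 52% OEE

# Option 1: new filler, taking the claimed +25% throughput at face value.
new_filler_cases = BASELINE_CASES * 1.25
# Option 2: TPM on the existing filler, assuming output scales with OEE.
tpm_cases = BASELINE_CASES * 0.75 / 0.52


def cost_per_added_case(cost, new_cases, base_cases=BASELINE_CASES):
    """Dollars spent per additional case per shift of capacity."""
    return cost / (new_cases - base_cases)


capex_cost = cost_per_added_case(1_200_000, new_filler_cases)
tpm_cost = cost_per_added_case(40_000, tpm_cases)

print(round(new_filler_cases), round(tpm_cases))
print(round(capex_cost), round(tpm_cost))
```

On these assumptions the TPM route adds more capacity than the new machine at a small fraction of the cost per added case, which is why holding the capex for 90 days costs little.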