AI Document Analysis
AI document analysis turns unstructured documents (contracts, invoices, claims, lab reports, applications) into structured data and answers. Modern systems chain three layers: (1) ingest and parse, converting PDF/scan/image into text + layout (Adobe Extract, Azure Document Intelligence, Unstructured.io, AWS Textract); (2) extract, identifying entities, line items, and relationships against a schema (LLM, fine-tuned vision-language model, or rules); (3) reason and verify, answering questions, flagging exceptions, and routing low-confidence cases to humans. The market has consolidated: contract analysis (Ironclad, Evisort, Spellbook), invoice processing (Rossum, Hypatos), claims (Tractable, EvolutionIQ), legal discovery (Relativity aiR, Everlaw). The KnowMBA POV: 'AI document analysis' is rarely an AI problem; it's a document QA, schema design, and exception-routing problem with AI in the middle.
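A minimal, hypothetical skeleton of that three-layer chain in Python. The parser and model calls are stubbed out; they stand in for whichever vendor parser and LLM/VLM you actually plug in, and every value below is a placeholder.

    def parse_document(pdf_bytes):
        # Layer 1: ingest and parse. In production this calls a layout-aware parser/OCR
        # service; here it just returns dummy text, tables, and layout.
        return {"text": "INVOICE #1042 ... TOTAL 4,200.00 EUR", "tables": [], "layout": {}}

    def extract_fields(doc, schema):
        # Layer 2: extract. In production this prompts an LLM or vision-language model
        # with the schema; here we return placeholder values with per-field confidences.
        return {field: {"value": None, "confidence": 0.5} for field in schema}

    def reason_and_verify(doc, fields, threshold=0.8):
        # Layer 3: reason and verify. Flag low-confidence fields as exceptions so they
        # are routed to a human instead of silently published.
        exceptions = [name for name, f in fields.items() if f["confidence"] < threshold]
        return {"fields": fields, "exceptions": exceptions}

    schema = ["supplier", "invoice_number", "total_amount", "due_date"]
    doc = parse_document(b"%PDF-1.7 dummy bytes")
    print(reason_and_verify(doc, extract_fields(doc, schema)))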
The Trap
The trap is benchmarking on accuracy without measuring confidence calibration and human-handoff cost. A model that's 95% accurate looks great until you discover you can't tell which 5% are wrong without re-reading every document, at which point the AI provided no leverage. The real metric is straight-through processing rate: what % of documents go from intake to a confident structured output without a human touching them. STP rate of 70% saves real money. STP rate of 30% with the rest needing manual review may save nothing, because review time often exceeds original processing time when reviewers must re-read AI output skeptically.
What to Do
Design for straight-through processing from day one. (1) Define the schema you need to extract (don't extract everything; only fields with downstream consumers). (2) Build a confidence scorer per field, not per document. (3) Set thresholds: high confidence → auto-publish; medium → human approve; low → human enter from scratch. (4) Measure STP rate weekly and the cost of human handoff. (5) Improve by tightening the schema, adding examples to the prompt, or fine-tuning on the cases that fall into 'medium': that's where the ROI lives. Always audit a random sample for silent errors that bypass the confidence filter.
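A minimal sketch of the threshold routing in steps (2) and (3), in Python. The field names, confidence values, and thresholds are illustrative placeholders, not recommendations; the assumption is that your extractor already returns a confidence per field.

    def route_field(name, value, confidence, high=0.92, low=0.60):
        # High confidence: auto-publish the value to downstream systems.
        if confidence >= high:
            return ("auto_publish", value)
        # Medium confidence: show the AI value and ask a human to approve or correct it.
        if confidence >= low:
            return ("human_approve", value)
        # Low confidence: don't show the guess at all; a human enters the field from scratch.
        return ("human_enter", None)

    def route_document(fields):
        # Per-field routing: a document with 11 confident fields and 1 uncertain one
        # sends only that one field to review instead of the whole document.
        return {name: route_field(name, f["value"], f["confidence"]) for name, f in fields.items()}

    invoice = {
        "supplier":     {"value": "Acme GmbH",        "confidence": 0.97},
        "total_amount": {"value": "4200.00",          "confidence": 0.95},
        "due_date":     {"value": "2025-07-31",       "confidence": 0.71},  # medium: human approves
        "iban":         {"value": "DE89 3704 0044",   "confidence": 0.41},  # low: human re-enters
    }
    print(route_document(invoice))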
Formula
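One way to write the core metric, following the definition in The Trap, plus a rough way to tie it to money. The cost terms are placeholders to fill with your own figures.

STP rate = documents published with zero human touches / total documents processed

Net savings ≈ (manual cost per doc × total docs) − (review cost per doc × (1 − STP rate) × total docs) − platform cost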
In Practice
Rossum (invoice processing) reports 80%+ straight-through processing on enterprise invoice volumes after tuning, vs sub-50% on out-of-the-box deployment. Ironclad's contract AI extracts metadata (parties, dates, renewal terms, indemnification clauses) at scale for thousands of legal teams. Tractable does AI-based vehicle damage assessment for insurance claims, processing millions of claims with measurable cycle-time reduction. Adobe's Acrobat AI Assistant brought document QA to mass-market PDF users in 2024. Across all of these, the production winners shipped strong confidence calibration and exception workflows, not the highest accuracy in isolation.
Pro Tips
- 01
Per-field confidence scoring beats per-document confidence. A document might have 12 fields where 11 are high-confidence and 1 is medium. Routing the whole document to manual review wastes effort. Routing only that one field to a human keeps STP high.
- 02
Layout matters more than people think. PDFs that lose table structure on extraction lose 30-50% of downstream extraction accuracy. Invest in a good parser (Unstructured.io, Azure Document Intelligence, Adobe Extract) before tuning your LLM extraction prompts.
- 03
Multimodal models that read PDF pages directly (Claude with vision, GPT-4 with vision, Gemini) often outperform text-extraction-then-LLM pipelines on documents with complex layouts (forms, invoices, tables). The trade-off is cost: vision tokens are 10-20× the cost of text. Use vision for the hard cases, text for the easy ones.
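A sketch of that 'hard cases to vision, easy cases to text' routing, assuming you have some cheap signal of layout complexity. The heuristic and thresholds below are made up for illustration; real signals might be OCR confidence, table density, or a previous failed text-only extraction.

    def looks_layout_heavy(parsed_doc, min_text_chars=200, max_tables=2):
        # Crude stand-in signals: a very thin text layer or many detected tables
        # suggests the text-only parse lost structure.
        text = parsed_doc["text"].strip()
        return len(text) < min_text_chars or len(parsed_doc["tables"]) > max_tables

    def choose_extractor(parsed_doc):
        # Route layout-heavy documents to the (pricier) vision model; keep clean,
        # text-friendly documents on the cheaper text pipeline.
        return "vision_model" if looks_layout_heavy(parsed_doc) else "text_model"

    print(choose_extractor({"text": "faint scan", "tables": [1, 2, 3]}))  # vision_model
    print(choose_extractor({"text": "A" * 5000, "tables": []}))          # text_model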
Myth vs Reality
Myth
"Higher model accuracy is the only thing that matters"
Reality
STP rate is the metric that ties to ROI. A 92% accurate model with strong calibration that surfaces only the uncertain cases for review can deliver more business value than a 96% accurate model with no calibration that requires every output to be re-checked. (A rough worked comparison follows below.)
Myth
"You can extract everything from any document with one prompt"
Reality
Realistic IDP systems use multiple specialized prompts, sometimes multiple models, and almost always document-type-specific routing. Trying to handle invoices, contracts, and claims with one generic 'extract structured data' prompt produces mediocre results across all three.
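To put rough numbers on the first Reality above: the document volume, per-document review cost, and review rates below are illustrative assumptions, not benchmarks.

    # Hypothetical comparison of a calibrated 92%-accurate model vs an uncalibrated
    # 96%-accurate model on 1,000 documents. All unit costs and review rates are made up.
    docs = 1000
    review_cost_per_doc = 4.0   # assumed cost of one human review, in dollars

    # Calibrated model: routes only the uncertain ~15% of documents to human review.
    calibrated_review_cost = 0.15 * docs * review_cost_per_doc      # $600

    # Uncalibrated model: you can't tell which outputs are wrong, so reviewers re-check everything.
    uncalibrated_review_cost = 1.00 * docs * review_cost_per_doc    # $4,000

    print(calibrated_review_cost, uncalibrated_review_cost)

Under these assumptions the less accurate but calibrated model is far cheaper to operate, which is the whole argument for STP over raw accuracy.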
Try it
Run the numbers.
Pressure-test the concept against your own knowledge: answer the challenge or try the live scenario.
Knowledge Check
Your AI invoice processing pipeline reports 94% extraction accuracy. STP rate is 35%. Operations team says cycle time hasn't improved meaningfully because reviewers still examine every invoice 'just to be sure.' What's the highest-leverage fix?
Industry benchmarks
Is your number good?
Calibrate against real-world tiers. Use these ranges as targets, not absolutes.
Straight-Through Processing Rate (Document Intelligence)
Production document processing systems, post-tuning
Best-in-Class
> 80%
Strong
60-80%
Acceptable
40-60%
Subscale
< 40%
Source: hypothetical benchmark, synthesized from Rossum, Ironclad, and Hyperscience customer reports
Real-world cases
Companies that lived this.
Verified narratives with the numbers that prove (or break) the concept.
Rossum
2017-2026
Rossum focuses specifically on invoice and document data extraction with a strong emphasis on confidence scoring and exception handling. Customer wins (Veolia, Bosch, PepsiCo) consistently report 80%+ STP after tuning, vs sub-50% on out-of-the-box deployments. The company's product moat is the human-in-the-loop interface for the cases that don't auto-process, making review fast enough that the residual non-STP rate doesn't kill the ROI. The lesson: shipping a great review UX is as important as the extraction model.
Reported Steady-State STP
80%+ on tuned deployments
Out-of-the-Box STP
~40-50%
Notable Customers
Veolia, Bosch, PepsiCo
STP rate after tuning is the metric that matters. Out-of-the-box performance always disappoints; the real ROI comes from the tuning + review workflow.
Ironclad / Evisort
2018-2026
Ironclad and Evisort built large contract intelligence businesses on the same playbook: extract structured metadata from contracts (parties, dates, renewals, indemnification, governing law) and feed downstream workflows (renewal alerts, risk scoring, search). Both companies have hundreds of enterprise customers (Mastercard, Salesforce, ASOS, McKesson). The technology is a layered pipeline; the product moat is the integration with how legal teams actually work (repository, redlining, approvals), not the extraction accuracy in isolation.
Notable Ironclad Customers
Mastercard, ASOS, Asana
Reported Time Savings
30-60% on contract review
Architecture
Extraction + workflow + repository
Document extraction without workflow integration is a science project. Embed the AI in the actual end-to-end legal/finance workflow and the productivity gains compound.
Decision scenario
Build vs Buy Document Intelligence
You're the CTO of a mid-size insurance carrier processing 200,000 claims documents per month. Current manual processing costs $4.8M/year. A vendor offers IDP at $1.4M/year (estimated 60% STP). Your AI team can build internally for $3.2M upfront + $500K/year (estimated 80% STP after tuning).
Monthly Volume
200,000 docs
Manual Annual Cost
$4.8M
Vendor Cost (Annual)
$1.4M
Build Cost (Year 1)
$3.7M
Build Cost (Year 2+)
$500K
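A back-of-the-envelope 18-month comparison of the three options, using the figures above and one big simplifying assumption: residual manual cost scales with the non-STP share of volume, and each option runs at its steady-state STP from day one (in reality the internal build ramps over months, which is exactly what Decision 1 turns on).

    # Rough 18-month cost comparison. Assumes remaining manual cost is proportional
    # to the share of documents humans still handle (1 - STP), and that each option
    # hits its steady-state STP immediately -- both simplifications.
    MANUAL_ANNUAL = 4.8e6   # current fully manual processing cost per year

    def cost_over_18_months(upfront, annual_fee, stp):
        residual_manual = MANUAL_ANNUAL * (1 - stp)   # exceptions still handled by people
        return upfront + 1.5 * (annual_fee + residual_manual)

    options = {
        "stay manual": 1.5 * MANUAL_ANNUAL,                                             # ~$7.2M
        "vendor":      cost_over_18_months(upfront=0.0,   annual_fee=1.4e6, stp=0.60),  # ~$5.0M
        "build":       cost_over_18_months(upfront=3.2e6, annual_fee=0.5e6, stp=0.80),  # ~$5.4M
    }
    for name, cost in options.items():
        print(f"{name}: ${cost / 1e6:.1f}M over 18 months")

Under this simplification both options beat the status quo within 18 months and the vendor is slightly cheaper on that horizon; the case for building rests on what happens afterwards, as the upfront cost amortizes and STP keeps improving on your own document mix.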
Decision 1
The board wants ROI within 18 months. The vendor is faster to deploy but caps your STP. The internal build is slower but compounds: STP can keep improving as you tune on your unique document mix.
Deploy the vendor: predictable cost, fast time-to-value, simpler vendor management
Build internally on top of a strong vendor parser. Tune extraction on your document mix. Aim for 80% STP within 9 months. ✓ Optimal
Related concepts
Keep connecting.
The concepts that orbit this one; each one sharpens the others.
Beyond the concept
Turn AI Document Analysis into a live operating decision.
Use this concept as the framing layer, then move into a diagnostic if it maps directly to a current bottleneck.
Typical response time: 24h · No retainer required