🧪
Knowledge Challenge
A friend thinks you can answer this question about AI Infrastructure Cost Control
Your monthly inference bill is $50,000. An audit reveals three facts: the average prompt is 4,200 tokens but could be compressed to 1,800 with no quality loss; 80% of queries are simple FAQ-style yet all traffic is routed to GPT-4o; and there is no caching even though 35% of queries are repeats. Rank the optimization levers by expected impact.
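One way to rank the levers is a back-of-envelope savings model. The sketch below treats each lever independently and relies on illustrative assumptions not stated in the question: input tokens account for roughly 70% of spend, the cheaper routing target costs about 1/10 of GPT-4o per token, and cache hits are near-free. With different assumptions the ranking can change, which is part of the exercise.

```python
# Back-of-envelope estimate of monthly savings per lever.
# All ratios marked "assumption" are illustrative, not measured data.

MONTHLY_BILL = 50_000.0

# Lever 1: prompt compression (4,200 -> 1,800 tokens).
# Assumption: input tokens account for ~70% of total spend.
input_share = 0.70
compression_savings = MONTHLY_BILL * input_share * (1 - 1800 / 4200)

# Lever 2: route the 80% FAQ-style queries to a smaller model.
# Assumption: that model costs ~1/10 of GPT-4o per token.
routable_share = 0.80
cheap_model_cost_ratio = 0.10
routing_savings = MONTHLY_BILL * routable_share * (1 - cheap_model_cost_ratio)

# Lever 3: cache the 35% repeated queries.
# Assumption: a cache hit costs approximately nothing.
cache_hit_rate = 0.35
caching_savings = MONTHLY_BILL * cache_hit_rate

# Rank levers by estimated monthly savings, largest first.
ranked = sorted(
    [("routing", routing_savings),
     ("compression", compression_savings),
     ("caching", caching_savings)],
    key=lambda kv: -kv[1],
)
for name, s in ranked:
    print(f"{name:12s} ~${s:,.0f}/month")
```

Under these assumptions, routing dominates (~$36k), with compression (~$20k) and caching (~$17.5k) roughly comparable; note the estimates overlap in practice, since a cached or compressed query routed to a cheaper model saves less than the sum of the three numbers.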