AI inference pricing is more volatile than most people realize — data from 1,752 SKUs across 44 vendors

We’ve been building a weekly inference pricing index for the past year. It tracks 1,752 SKUs across 44 vendors and 6 modalities using a chained matched-model methodology. A few findings from this week’s data that we think are worth discussing:

- Output tokens cost 3.74x more than input tokens on average. Most people know output is more expensive, but the gap is wider and more consistent than we expected.
- Prompt caching saves 66% on average where it exists, but only about 1 in 5 SKUs actually offers it. The savings are real, but access is uneven.
- Open-source models show an 81% pricing advantage over closed models on equivalent inference platforms. That gap has stayed relatively stable even as frontier closed-model prices drop.
- Context length is a stealth cost driver. Long-context jobs cost 3.1x more than short ones on average, and most cost calculators don’t surface this well (a rough cost sketch is below, after the chaining example).

The hardest methodological challenge was composition bias: vendors add and drop models constantly, price in incompatible units, and bundle differently. A naïve average gives you a completely different number than a chained matched-model index, so we borrowed from how commodity price indexes handle this problem. A minimal sketch of the chaining step follows.
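To make the chaining idea concrete, here’s a minimal sketch of a matched-model index with a geometric-mean (Jevons-style) link between adjacent weeks. This is an illustration under simplifying assumptions, not the production AIPI code: the `snapshots` structure and SKU names are placeholders, the link is unweighted, and the real index has to normalize incompatible pricing units first.

```python
from math import exp, log

def chained_matched_index(snapshots):
    """Chained matched-model price index (first period = 100).

    snapshots: one dict per period, oldest first, mapping SKU -> price
    in a common unit (e.g. USD per 1M output tokens).
    """
    index = [100.0]
    for prev, curr in zip(snapshots, snapshots[1:]):
        # Only SKUs priced in both adjacent periods enter the link, so
        # models entering or exiting the catalog can't move the level.
        matched = [s for s in prev if s in curr and prev[s] > 0 and curr[s] > 0]
        if not matched:
            index.append(index[-1])  # no overlap: carry the level forward
            continue
        # Geometric mean of the matched price relatives.
        link = exp(sum(log(curr[s] / prev[s]) for s in matched) / len(matched))
        index.append(index[-1] * link)
    return index

weeks = [
    {"model-a": 10.0, "model-b": 30.0},
    {"model-a": 8.0, "model-b": 30.0, "model-c": 5.0},  # model-c enters
    {"model-a": 8.0, "model-c": 4.0},                   # model-b exits
]
print(chained_matched_index(weeks))  # [100.0, ~89.4, ~80.0]
```

A naïve average of those same three weeks would swing as model-c enters and model-b exits; the chained version only moves when a SKU’s own price moves, which is the composition-bias fix.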
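On the context-length and caching points, here’s a rough per-request cost model showing why long prompts dominate spend even though output tokens are pricier per token. The prices are hypothetical, chosen only to reflect the 3.74x output/input ratio and the 66% average cache discount above.

```python
def request_cost(input_tokens, output_tokens, in_price, out_price,
                 cached_fraction=0.0, cache_discount=0.66):
    """USD cost of one request; prices are USD per 1M tokens.

    cached_fraction: share of input tokens served from the prompt cache.
    cache_discount: price cut on cached input tokens (~66% average,
    where caching is offered at all).
    """
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    input_cost = (fresh + cached * (1 - cache_discount)) * in_price / 1e6
    return input_cost + output_tokens * out_price / 1e6

IN, OUT = 1.00, 3.74  # hypothetical prices embodying the 3.74x gap

# A long-context job: 120k input tokens, 1k output tokens.
print(request_cost(120_000, 1_000, IN, OUT))                       # ~$0.124
print(request_cost(120_000, 1_000, IN, OUT, cached_fraction=0.9))  # ~$0.052
```

At this shape the prompt is roughly 97% of the bill, and a high cache hit rate cuts the total by more than half; calculators that quote a single per-token price miss both effects.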

We publish the AIPI weekly at a7om.com. Happy to discuss methodology or share more granular data.