How Much Will Serving This LLM Actually Cost?

The question I get asked most often when someone wants to ship an LLM-powered feature is, basically, “okay, but what’s this going to cost?” And the honest answer is that it depends on a lot of things you haven’t decided yet: which model, what precision, how many tokens per request, how many requests per second at peak, whether you self-host or pay an API provider per token, and whether you can tolerate the cold start of a serverless GPU. Most of those have order-of-magnitude effects, so a back-of-envelope number can be off by 10x in either direction. ...
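Even so, it helps to see how the variables combine. Here's a minimal sketch of the API-priced case, assuming uniform traffic (no peaks, no caching) and entirely made-up per-token prices; every number in it is a hypothetical placeholder, not a quote from any provider:

```python
# Back-of-envelope cost estimate for API-priced LLM serving.
# All prices and traffic numbers are hypothetical placeholders.

def monthly_api_cost(
    requests_per_second: float,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
    usd_per_1m_input_tokens: float,
    usd_per_1m_output_tokens: float,
) -> float:
    """Rough monthly cost, assuming traffic is uniform around the clock."""
    seconds_per_month = 30 * 24 * 3600
    requests = requests_per_second * seconds_per_month
    input_cost = requests * input_tokens_per_request / 1e6 * usd_per_1m_input_tokens
    output_cost = requests * output_tokens_per_request / 1e6 * usd_per_1m_output_tokens
    return input_cost + output_cost

# Example: 5 RPS, 1,000 input + 300 output tokens per request,
# $0.50 / $1.50 per million input/output tokens (invented numbers).
cost = monthly_api_cost(5, 1000, 300, 0.50, 1.50)
print(f"${cost:,.0f}/month")  # → $12,312/month
```

Notice that doubling the input context, doubling peak traffic, or switching to a model priced 10x higher each moves the total by the same factor, which is exactly why the estimate swings by an order of magnitude before those decisions are made.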

April 19, 2026 · 2 min