<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>LLM on Widgita</title><link>https://widgita.xyz/tags/llm/</link><description>Recent content in LLM on Widgita</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 19 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://widgita.xyz/tags/llm/index.xml" rel="self" type="application/rss+xml"/><item><title>How Much Will Serving This LLM Actually Cost?</title><link>https://widgita.xyz/posts/2026/04/how-much-will-serving-this-llm-actually-cost/</link><pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate><guid>https://widgita.xyz/posts/2026/04/how-much-will-serving-this-llm-actually-cost/</guid><description>&lt;p&gt;The question I get asked most often when someone wants to ship an LLM-powered feature is, basically, &amp;ldquo;okay but what&amp;rsquo;s this going to cost?&amp;rdquo; And the honest answer is &lt;em&gt;it depends on a lot of things you haven&amp;rsquo;t decided yet&lt;/em&gt;: which model, what precision, how many tokens per request, how many requests per second at peak, whether you self-host or pay an API provider per token, and whether you can tolerate the cold-start of a serverless GPU. Most of those have order-of-magnitude effects, so a back-of-envelope number can be off by 10x in either direction.&lt;/p&gt;</description></item></channel></rss>