<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Gpu on Widgita</title><link>https://widgita.xyz/tags/gpu/</link><description>Recent content in Gpu on Widgita</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 19 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://widgita.xyz/tags/gpu/index.xml" rel="self" type="application/rss+xml"/><item><title>A CUDA Occupancy Calculator You Can Just Pull Up</title><link>https://widgita.xyz/posts/2026/04/a-cuda-occupancy-calculator-you-can-just-pull-up/</link><pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate><guid>https://widgita.xyz/posts/2026/04/a-cuda-occupancy-calculator-you-can-just-pull-up/</guid><description>&lt;p&gt;If you&amp;rsquo;ve ever tuned a CUDA kernel, you know the dance: pick a block size, count registers per thread (or let &lt;code&gt;nvcc&lt;/code&gt; tell you with &lt;code&gt;--ptxas-options=-v&lt;/code&gt;), figure out how much shared memory you&amp;rsquo;re using, and then work out how many of those blocks can actually live on an SM at once. NVIDIA used to ship a spreadsheet for this; it was great, but a spreadsheet is exactly the friction I don&amp;rsquo;t want when I&amp;rsquo;m halfway through optimising a kernel and just want a quick &amp;ldquo;is the limiting factor &lt;em&gt;registers&lt;/em&gt; or &lt;em&gt;shared memory&lt;/em&gt; here?&amp;rdquo; answer.&lt;/p&gt;</description></item></channel></rss>