A CUDA Occupancy Calculator You Can Just Pull Up
If you’ve ever tuned a CUDA kernel, you know the dance: pick a block size, count registers per thread (or let nvcc tell you with --ptxas-options=-v), figure out how much shared memory you’re using, and then work out how many of those blocks can actually live on an SM at once. NVIDIA used to ship a spreadsheet for this. It was great, but a spreadsheet is exactly the friction I don’t want when I’m halfway through optimising a kernel and just want a quick answer to “is the limiting factor registers or shared memory here?” ...
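To make that dance concrete, here is a minimal Python sketch of the arithmetic: each resource puts its own cap on resident blocks per SM, and the smallest cap is the limiting factor. The per-SM limits in `SMLimits` are illustrative assumptions, not values for any particular GPU, and the names `SMLimits`, `blocks_per_sm` and `occupancy` are hypothetical helpers made up for this example. It also ignores register and shared-memory allocation granularity, which a real calculator has to account for, so treat the result as an estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-SM resource limits. ASSUMED illustrative values; look up your GPU's real ones."""
    max_threads: int = 1536
    max_blocks: int = 16
    registers: int = 65536
    shared_mem_bytes: int = 100 * 1024
    warp_size: int = 32


def blocks_per_sm(block_size, regs_per_thread, smem_per_block, sm=SMLimits()):
    """Return (resident blocks per SM, name of the limiting resource, all caps)."""
    # Threads are scheduled in whole warps, so round the block up to a warp multiple.
    warps_per_block = -(-block_size // sm.warp_size)
    threads_alloc = warps_per_block * sm.warp_size

    caps = {
        "threads": sm.max_threads // threads_alloc,
        "blocks": sm.max_blocks,
        # A resource the kernel does not use cannot be the limiter.
        "registers": (sm.registers // (regs_per_thread * threads_alloc)
                      if regs_per_thread else sm.max_blocks),
        "shared memory": (sm.shared_mem_bytes // smem_per_block
                          if smem_per_block else sm.max_blocks),
    }
    limiter = min(caps, key=caps.get)  # the smallest cap wins
    return caps[limiter], limiter, caps


def occupancy(block_size, regs_per_thread, smem_per_block, sm=SMLimits()):
    """Fraction of the SM's warp slots occupied by resident blocks."""
    blocks, _, _ = blocks_per_sm(block_size, regs_per_thread, smem_per_block, sm)
    warps_per_block = -(-block_size // sm.warp_size)
    max_warps = sm.max_threads // sm.warp_size
    return blocks * warps_per_block / max_warps


if __name__ == "__main__":
    # 256 threads per block, 64 registers per thread, 16 KiB shared memory per block.
    blocks, limiter, caps = blocks_per_sm(256, 64, 16 * 1024)
    print(f"{blocks} blocks/SM, limited by {limiter}")
    print(f"occupancy: {occupancy(256, 64, 16 * 1024):.0%}")
```

Under these assumed limits, the example configuration is capped by registers at 4 blocks per SM (threads and shared memory would each allow 6), which works out to about 67% occupancy. Halving the register count per thread moves the limiter elsewhere, which is exactly the kind of question the calculation is meant to answer quickly.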