Toolbelt

A CUDA Occupancy Calculator You Can Just Pull Up

If you’ve ever tuned a CUDA kernel, you know the dance: pick a block size, count registers per thread (or let nvcc tell you with --ptxas-options=-v), figure out how much shared memory you’re using, and then work out how many of those blocks can actually live on an SM at once. NVIDIA used to ship a spreadsheet for this - it was great, but a spreadsheet is exactly the friction I don’t want when I’m halfway through optimising a kernel and just want a quick “is the limiting factor registers or shared memory here?” answer. ...
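The bookkeeping described above can be sketched in a few lines. This is a simplified estimate, not the tool's actual logic. The `SMLimits` defaults are placeholder assumptions to replace with your GPU's real per-SM limits, and `blocks_per_sm` is a hypothetical helper name. It also ignores register and shared-memory allocation granularity, so treat the result as approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    # ASSUMED per-SM limits; look up the real values for your compute capability.
    max_threads: int = 2048
    max_blocks: int = 32
    registers: int = 65536
    shared_mem_bytes: int = 65536
    warp_size: int = 32


def blocks_per_sm(threads_per_block, regs_per_thread, smem_per_block, sm=SMLimits()):
    """Return (resident_blocks, limiting_resource, occupancy) for one SM."""
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")

    # Threads are scheduled in whole warps, so round the block up to a warp multiple.
    warps_per_block = -(-threads_per_block // sm.warp_size)
    threads_rounded = warps_per_block * sm.warp_size

    # How many blocks each resource would allow on its own.
    limits = {
        "threads": sm.max_threads // threads_rounded,
        "blocks": sm.max_blocks,
        "registers": (sm.registers // (regs_per_thread * threads_rounded)
                      if regs_per_thread else sm.max_blocks),
        "shared_memory": (sm.shared_mem_bytes // smem_per_block
                          if smem_per_block else sm.max_blocks),
    }

    # The tightest resource decides how many blocks are resident.
    limiter = min(limits, key=limits.get)
    blocks = limits[limiter]

    max_warps = sm.max_threads // sm.warp_size
    occupancy = blocks * warps_per_block / max_warps
    return blocks, limiter, occupancy
```

Under these assumed limits, `blocks_per_sm(256, 32, 32768)` reports shared memory as the limiter, which is the quick "registers or shared memory?" answer the paragraph is after.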

A JWT Decoder That Doesn’t Phone Home

Every now and then I need to peek inside a JWT - debugging an auth flow, sanity-checking what scopes a CI service account actually has, or figuring out why a token is being rejected at 23:00 the night before a release. And every time, I’d catch myself reaching for whatever JWT decoder Google surfaced first, pasting in a token, and then immediately feeling slightly icky about it. That token might be a service credential. It might still be valid for another six hours. And I just handed it to some random subdomain. ...
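For a quick local peek, the header and payload of a signed JWT are just base64url-encoded JSON, so the standard library is enough. The sketch below is illustrative only: `decode_jwt_unverified` is a hypothetical helper name, it assumes the usual three-segment compact form, and it does not verify the signature, so it is for inspection and never for trust decisions.

```python
import base64
import json


def decode_jwt_unverified(token):
    """Return (header, payload) dicts from a compact JWT WITHOUT verifying it."""
    segments = token.strip().split(".")
    if len(segments) != 3:
        raise ValueError("expected header.payload.signature")

    def b64url_json(segment):
        # JWTs strip base64 padding; restore it before decoding.
        padded = segment + "=" * (-len(segment) % 4)
        return json.loads(base64.urlsafe_b64decode(padded))

    header_b64, payload_b64, _signature = segments
    return b64url_json(header_b64), b64url_json(payload_b64)
```

Nothing here touches the network, so the token never leaves the machine it was pasted on.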

Converting Between JSON, YAML, and TOML Without the Awkwardness

Half my life as an engineer is moving things between configuration formats. A Helm chart wants YAML, the Rust crate wants TOML, the GitHub Action wants YAML again (but a slightly different dialect, naturally), and the thing I’m shipping it all into wants JSON. I always end up doing one of two things: opening some random web converter and pasting in a config that probably contains internal hostnames, or writing a five-line Python snippet that I’ll write again next week because I never bother to save it. ...

How Much Will Serving This LLM Actually Cost?

The question I get asked most often when someone wants to ship an LLM-powered feature is, basically, “okay but what’s this going to cost?” And the honest answer is that it depends on a lot of things you haven’t decided yet: which model, what precision, how many tokens per request, how many requests per second at peak, whether you self-host or pay an API provider per token, and whether you can tolerate the cold-start of a serverless GPU. Most of those have order-of-magnitude effects, so a back-of-envelope number can be off by 10x in either direction. ...

Stop Second-Guessing Cron Expressions

Cron is one of those things I’ve been using for twenty-odd years and still occasionally stare at for thirty seconds before committing. Is */15 9-17 * * 1-5 what I think it is? Does 0 0 * * 0 fire at midnight Sunday or Monday? And does that depend on the box’s timezone or UTC? Most of the time you can squint and reason your way through it, but “most of the time” is exactly the kind of confidence level you don’t want when the cron in question is the nightly DB backup. ...
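One way to stop squinting is to expand each field into the concrete values it matches. The sketch below is a minimal illustration, not a full cron parser. `expand_cron` and `expand_field` are hypothetical helper names, it handles only the standard five numeric fields with `*`, ranges, lists and `/step`, and it assumes day-of-week runs 0-6 with 0 as Sunday. It says nothing about timezone, which is up to the cron daemon's configuration.

```python
# (name, lowest value, highest value) for the five standard cron fields.
FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 6),  # assumed: 0 = Sunday ... 6 = Saturday
]


def expand_field(spec, lo, hi):
    """Expand one cron field such as '*/15' or '1-5,7' into a sorted list."""
    values = set()
    for part in spec.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = end = int(rng)
        if not lo <= start <= end <= hi:
            raise ValueError(f"{part!r} is outside {lo}-{hi}")
        values.update(range(start, end + 1, step))
    return sorted(values)


def expand_cron(expr):
    """Map each field name to the list of values the expression matches."""
    parts = expr.split()
    if len(parts) != len(FIELDS):
        raise ValueError("expected five fields")
    return {name: expand_field(spec, lo, hi)
            for spec, (name, lo, hi) in zip(parts, FIELDS)}
```

With this, `expand_cron("*/15 9-17 * * 1-5")` gives minutes 0, 15, 30 and 45, hours 9 through 17, and days 1 through 5 (Monday to Friday), while `0 0 * * 0` resolves to day 0, meaning 00:00 at the start of Sunday.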