Blog

April 27, 2026

What broke when I tried LLMs on LunarLander

A CartPole follow-up where API latency, raw local models, and LoRA all broke before full SFT plus DAgger finally worked.

April 23, 2026

The Don't Stop Benchmark

A narrow test for long-running coding agents: can they keep doing one boring thing without getting cute or giving up?

April 21, 2026

Turning Hermes Agent into a Jarvis home assistant

How I replaced my Google Home with a desktop Hermes Agent voice assistant.

April 9, 2026

CartPole experiments with LLMs

Even Groq isn't fast enough for realtime, an off-the-shelf local LLM got close, and SFT finished the job.

March 22, 2026

The empirical veil

People argue about policy facts when they really disagree about values. Empirical uncertainty is what makes this possible. AI is changing the equation: right now it makes it easier to find evidence for whatever you already believe, but better policy simulations could eventually make some factual disputes harder to hide behind.