On the perils of AI-first debugging -- or, why Stack Overflow still matters in 2025

Posted on 19 February 2025 in AI, Musings

"My AI hype/terror level is directly proportional to my ratio of reading news about it to actually trying to get things done with it."

-- Ryan Moulton on X

This post may not age well, as AI-assisted coding is progressing at an absurd rate. But I think that this is an important thing to remember right now: current LLMs can not only hallucinate, but can also misweight the evidence available to them, and make mistakes when debugging that human developers would not. If you don't allow for this you can waste quite a lot of time!

Recently, I was debugging an issue with the rendering of MathML on this blog. I'm a little under the weather, so I decided to see what I could do from my sofa, by asking Claude and ChatGPT about it on my phone. I'd discovered that the issue showed up on Chromium-based browsers but not on Firefox or Safari, so I used this prompt:

"Is there a way to make Chrome-like browsers render MathML more like Firefox or Safari? My page looks great in Firefox and on an iPhone, but there are a bunch of problems in Chromium and Brave, eg font sizes too small, multi-line brackets not stretched. I can't use a JS-based system like MathJax."

Claude told me that yes, Chromium-based browsers had crappy MathML handling, but then it generated a bunch of CSS rules that it said would fix the worst of the problems. They looked pretty plausible! So that looked like one option.

However, I knew that MathML support had only been added to Chromium at the start of 2023. Claude has, as of this writing, a knowledge cutoff of April 2024. Perhaps "MathML is broken in Chromium" was over-represented in its training set: it's a feature that has been in the works for decades. According to Wikipedia, it was in Mozilla 1.0, which means it's been in Firefox from the start. But Chromium apparently supported it at one point, then it was pulled in 2013, only to be re-introduced a decade later.

If you need up-to-date information about something, then it makes sense to use an AI with web browsing enabled. Claude doesn't have that right now, so I switched to ChatGPT 4o with search enabled. I gave it the same prompt, and it said essentially the same as Claude, but the CSS it gave me was much less in-depth. I asked follow-up questions, and it gave me some hints and tips, including a link to a Chromium issue where it suggested I post a repro and agitate for a fix. Unfortunately the link in the chat session didn't work, and when I looked up the issue by ID on the Chromium bug-tracker, it turned out to not exist.

So I tried o3-mini-high with search. It was even more gloomy:

"The best approaches are either to wait for improved native MathML support in Chromium (ongoing work by groups like Igalia is aimed at exactly this) or to have your visitors switch to browsers that already offer better MathML support. Unfortunately, without resorting to a JS polyfill (which you’ve ruled out), there isn’t currently a practical workaround."

Well, that sounds sad. Except -- I recognised the name "Igalia", because I remembered that when I was originally considering using MathML, I came across a celebratory 2023 blog post from them about the integration of their work into Chromium. Had ChatGPT missed that?

I'd spent an hour or two on this, and decided that it was time to go old-school. I googled for "mathml chrome rendering", and here's what I got:

[Screenshot: search results for "mathml chrome rendering"]

The first result was from 2015, but looking down through the others, there was one that caught my eye: the one from 2 March 2023, just underneath, which mentioned "matrix like formulas". That sounded very similar to what I was seeing, and was from after the release of MathML on Chromium, unlike all of the others.

When I clicked through to it, I saw that it was exactly the problem I was having, and the answers there helped me work out the solution. Elapsed time: half an hour, most of which was spent trying to decide on a math font.

So, Google/Stack Overflow 1, AI chatbots nil. What was going on?

My guess is that 10 years of poor MathML support in Chrome, against only two years of a working system, means that most of the chatbots' training data says that the problem I had was unsolvable. Perhaps with a less obscure topic, they might have had a world model saying that this kind of situation can change, and understood that more recent information should be weighted higher. That would be hard for Claude, with its fixed knowledge from its training set (which would typically not have dates associated), but should be possible for web-browsing models like ChatGPT. But perhaps their search tools don't provide publication dates like Google does?
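To make "more recent information should be weighted higher" a bit more concrete, here is a minimal Python sketch of the kind of date-aware ranking a search tool could do if it had publication dates. Everything in it is an assumption for illustration: the `SearchResult` type, the function name, the result titles, and the exact dates (the post only gives "2015", "2 March 2023", and "the start of 2023"). It is not how any chatbot's search actually works.

```python
from dataclasses import dataclass
from datetime import date

# Assumption: the post only says "the start of 2023"; 1 January is a stand-in.
FEATURE_CHANGED_ON = date(2023, 1, 1)


@dataclass(frozen=True)
class SearchResult:
    title: str       # hypothetical titles, for illustration only
    published: date  # publication date, if the search tool exposes one


def rank_by_recency_of_change(results, changed_on=FEATURE_CHANGED_ON):
    """Put results published on or after `changed_on` first, newest first.

    Older results are kept (they may still be useful) but pushed to the end.
    """
    return sorted(
        results,
        # False sorts before True, so post-change results come first;
        # the negated ordinal puts newer dates ahead of older ones.
        key=lambda r: (r.published < changed_on, -r.published.toordinal()),
    )


results = [
    # Day and month are placeholders; the post only says "from 2015".
    SearchResult("Older answer about MathML in Chrome", date(2015, 6, 1)),
    SearchResult("Question mentioning matrix-like formulas", date(2023, 3, 2)),
    # Entirely hypothetical extra result.
    SearchResult("Older blog post on MathML support", date(2019, 9, 15)),
]

ranked = rank_by_recency_of_change(results)
for r in ranked:
    print(r.published.isoformat(), r.title)
```

With these made-up inputs, the one result from after the change lands at the top, which is roughly what scanning the Google results page by eye achieved.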

I asked ChatGPT if it could work out what the problem was, but its response (which unfortunately I can't find now) was almost Sydney Bing-like in its defensiveness, so that wasn't much help.

I'm not exactly sure how to update my practices based on this experience. AIs have been fantastically useful with many tricky problems I've been working on; at PythonAnywhere, thanks to Claude, I recently spent an afternoon spiking out a solution to an issue that has been bugging me for more than five years -- and my colleague Glenn likewise managed to solve a nasty kernel resource leak that we'd had for at least a year; with Claude's help it took him an hour or so. A lot of the CSS, HTML, HTMX, and Python code that makes up this blog was generated by one AI or another originally, or has been fixed by one.

The closest thing I have to a heuristic right now is that if something doesn't change much, so that the training data will on average be correct, then you can trust the AIs to do your work for you. With recent technologies that aren't in the training set, you can rely on web search (or dump docs into the context). But if you're working on something that is fast-moving -- or something that has drastically changed recently -- then Google and the Stack Overflow questions, blog posts, and other results it pulls up are your friends. And this applies even if the AI is using search itself, because -- at least right now -- AIs are not all that good at it.