Adding mathematical typesetting to the blog

Posted on 9 February 2025 in Blogkeeping, Website design, MathML, LaTeX

I've spent a little time over the weekend adding the ability to post stuff in mathematical notation on this blog. For example:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

It should render OK in any browser released after early 2023; I suspect that many RSS readers won't be able to handle it right now, but that will hopefully change over time. [Update: my own favourite, NewsBlur, handles it perfectly!]

Here's why I wanted to do that, and how I did it.

In my last post on the LLM from scratch book I wanted to post some simple matrix maths. I wound up using preformatted text for that -- here's an example (copied because I'll fix the old post in due course [Update: it is now fixed]):

a b        a m x
m n  times b n y
x y

This was ugly, and I think that as I continue posting about AI and related stuff, I'm going to need to post similar things more frequently -- so it needs to be better.

It seems like there is one and only one popular standard for expressing mathematical equations in textual form: LaTeX. So I figured I'd need to use that. I did come across MathML, but that is much lower-level; for example, the quadratic formula above looks like this in LaTeX:

x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}

...but like this in MathML:

<math xmlns="http://www.w3.org/1998/Math/MathML" display="block">
    <mrow>
        <mi>x</mi>
        <mo>=</mo>
        <mfrac>
            <mrow>
                <mo>−</mo>
                <mi>b</mi>
                <mi>±</mi>
                <msqrt>
                    <mrow>
                        <msup>
                            <mi>b</mi>
                            <mn>2</mn>
                        </msup>
                        <mo>−</mo>
                        <mn>4</mn>
                        <mi>a</mi>
                        <mi>c</mi>
                    </mrow>
                </msqrt>
            </mrow>
            <mrow>
                <mn>2</mn>
                <mi>a</mi>
            </mrow>
        </mfrac>
    </mrow>
</math>

To me, MathML looks like a useful intermediate format to be output by other tools rather than something you'd generally craft yourself -- kind of like SVG.

So, how to add LaTeX to this blog? The posts here are generated from Markdown sources using the markdown2 library to generate HTML, which is then injected into some Jinja2 templates and rendered as static files. So the first thing was to find out how to render LaTeX into HTML pages.

Working with Claude, I found two popular options, MathJax and KaTeX. Both seem to work in a similar way: they're JavaScript libraries that process HTML, either at display-time in the browser or as a server-side preprocessing step if you are using JS there (e.g. with Node). They look for LaTeX blocks inside specific delimiters ($$ is a popular option) and then extract that LaTeX and replace it with appropriate HTML, CSS and SVG so that it renders correctly. MathJax seems more complete and better-established, but KaTeX is much smaller (100KiB rather than 200KiB) and claims to be 10x faster.

Now, I didn't want to add a server-side JavaScript processing layer to my static site generation, so if I were to use either of them, it would have to be done in the browser. I wasn't entirely happy with that idea, as it would add another asset to load when loading the blog -- and I've been working pretty hard to keep it lightweight. However, Claude and I came up with a reasonably elegant solution that would mark certain posts as needing mathematical markup and would only include the JS for that. So I put something together on the test version of this blog and got it working.
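The per-post opt-in described above could look something like the following. It's a minimal sketch in Python with Jinja2, assuming a boolean `needs_math` value supplied for each post; that name, the template text and the script path `/js/math-renderer.js` are all hypothetical, not taken from this blog's real code.

```python
# Sketch only: needs_math, PAGE_TEMPLATE and the script path are hypothetical
# names for illustration, not this blog's real template or assets.
PAGE_TEMPLATE = """\
<html>
<head>
<title>{{ title }}</title>
{% if needs_math %}<script defer src="{{ math_script_url }}"></script>{% endif %}
</head>
<body>
{{ body_html }}
</body>
</html>
"""


def render_page(title, body_html, needs_math=False,
                math_script_url="/js/math-renderer.js"):
    """Render a post, adding the maths script tag only when the post asks for it."""
    # Jinja2 is a third-party package; it is imported here so that this
    # module can be loaded even where Jinja2 is not installed.
    from jinja2 import Template

    return Template(PAGE_TEMPLATE).render(
        title=title,
        body_html=body_html,
        needs_math=needs_math,
        math_script_url=math_script_url,
    )
```

Posts that don't set the flag get no extra `<script>` tag at all, so they stay as lightweight as before.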

However, there was a problem. Consider this rotation matrix:

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

The LaTeX for it looks like this:

\begin{pmatrix}
\cos \theta & -\sin\theta \\
\sin\theta & \cos\theta
\end{pmatrix}

That double-backslash at the end of the second line is how you specify that you're moving on to a new row in the matrix. However, Markdown treats a backslash at the end of a line as meaningful -- it's basically a way of telling it to insert a line break within a paragraph, i.e. a <br> tag in HTML. So when it came across the LaTeX above, it treated the backslash that way and broke it.

Claude and I spent a while trying to work out how we might work around that, and had got as far as starting to write a custom processor for markdown2 (an "extra" in markdown2 terminology), when I spotted something I'd completely missed earlier: there is already an extra called latex that comes as part of markdown2, which "converts inline and block equations wrapped using $...$ or $$...$$ to MathML". That sounded like it was exactly what I needed -- a way of getting the LaTeX compiled to something the browser could render, all on the server side at the time I generated the static site, with no need for any front-end JavaScript at all.

I added it to my list of extras in my Markdown-rendering code, and installed the latex2mathml package that it needs. And it worked! What you're looking at right now -- at least as of this writing -- is the result, and I think it looks OK.
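In code, that change amounts to roughly the following. This is a minimal sketch rather than the blog's actual rendering code: the function name `render_markdown` and the `EXTRAS` list are assumptions for illustration, and a real list would presumably contain other extras too. It needs both the markdown2 and latex2mathml packages installed.

```python
# Sketch only: EXTRAS and render_markdown are illustrative names.
# "latex" is the markdown2 extra that turns $...$ and $$...$$ into MathML.
EXTRAS = ["latex"]


def render_markdown(source: str) -> str:
    """Convert Markdown source to HTML, with LaTeX compiled to MathML."""
    # markdown2 is third-party (pip install markdown2 latex2mathml); it is
    # imported here so this module can be loaded without it.
    import markdown2

    return markdown2.markdown(source, extras=EXTRAS)
```

Calling `render_markdown` on a string containing `$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$` should then give back HTML with a `<math>` element in place of the LaTeX, with no JavaScript involved.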

But there are a couple of problems.

Firstly, MathML is only handled by relatively recent browsers -- but Caniuse tells me that it's common as of 2023, so I think that's OK.

Secondly, I'm not 100% happy with the styling of the rendered notation -- for example, the fonts seem a bit small, and things look a bit jammed together. That, however, sounds like something I should be able to fix with CSS acting on the MathML elements that are generated from the LaTeX -- you can specify rules for things like <math> and <mfrac> tags in your stylesheets, just like you can for <h3> and so on. So this, I think, is actually an advantage of doing it in a browser-native way -- rather than with a JavaScript renderer that produced a blend of HTML/SVG/whatever, which would need styling to be configured in some other way.

Thirdly, I now need to think about the use of dollar signs in my Markdown, because $$ now introduces a multi-line LaTeX expression like the ones above, while $ wraps inline LaTeX like this: $\cos\theta$. The former isn't a huge deal, because I rarely write two dollar signs together, but I do use $ quite a lot, both in code and for prices. However, that's not too bad -- the special treatment of the dollar signs doesn't apply inside single backticks or triple-backtick code fences (which covers all of the code cases), and inline LaTeX must be on a single source line in the Markdown. So I can write:

It cost $12, which was cheap

...without having to worry; it's just things like

It would cost between $12 and $15, which is cheap

...that are the problem, because the text between the two dollars gets interpreted as LaTeX, and is rendered like this:

[Image: misrendering due to two dollar signs on one line]

That can be fixed by introducing a line break in between the dollars:

It would cost between $12 and
$15, which is cheap

...which doesn't trigger LaTeX parsing, so it's just rendered as normal text. So while it's not a problem for future posts, I did have to spend half an hour going through every dollar sign in every post I've made to check it and fix it if necessary, and that was very dull.
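A first pass of that kind of audit could be automated. Here's a minimal sketch in Python, not what was actually used for this blog: the function name `suspicious_dollar_lines` is made up, and it assumes code only ever appears in single-backtick spans or fenced blocks (it doesn't handle indented code blocks). It flags any line that has two or more single dollar signs outside code, which is the case that gets misread as inline LaTeX.

```python
import re

# Opening or closing line of a fenced code block (``` or ~~~).
FENCE = re.compile(r"^\s*(```|~~~)")
# An inline code span between single backticks.
INLINE_CODE = re.compile(r"`[^`]*`")


def suspicious_dollar_lines(markdown_source):
    """Return (line_number, line) pairs with 2+ single '$' outside code."""
    hits = []
    in_fence = False
    for number, line in enumerate(markdown_source.splitlines(), start=1):
        if FENCE.match(line):
            # Toggle on every fence line; the fence line itself is skipped.
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        # Drop inline code spans and deliberate $$ delimiters before counting.
        visible = INLINE_CODE.sub("", line).replace("$$", "")
        if visible.count("$") >= 2:
            hits.append((number, line))
    return hits
```

Intentional inline LaTeX gets flagged too, so the output is a list of lines to look at rather than a list of definite mistakes.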

Fourthly and finally, I think there might be a bug somewhere in the toolchain. I may have fixed it by the time you read this, in which case you won't have seen it in the rotation matrix above, but here's a screenshot showing how it renders as I'm writing this:

[Image: badly-rendered parentheses on the blog]

By comparison, here's the same LaTeX rendered by KaTeX's demo site:

[Image: nicely-rendered parentheses from KaTeX]

You can see that the parentheses are much better in the second case, being the full height of the matrix. I'll need to dig in to work out what the problem is there. It could be an error on my side -- perhaps some kind of config thing or some missing CSS that I didn't realise I needed -- or a bug in latex2mathml or even in the browser. Either way, I'm sure it's fixable in some way.

[Update: looks like this is specifically a Chromium thing -- it's messed up on Chrome/Chromium/Brave, but looks great in Firefox and Safari mobile.]

[Further update: now fixed!]

Anyway, hopefully this means that in the future, any maths I want to put into my posts will be properly formatted rather than ugly ASCII-art type things.