It's still worth blogging in the age of AI
My post about blogging as writing the tutorial that you wished you'd found really took off on Hacker News. There were a lot of excellent comments, but one thing kept coming up: what's the point in blogging if people are using ChatGPT, Claude and DeepSeek to spoon-feed them answers? Who, apart from the AIs, will read what you write?
I was asking myself the same question when I started blogging semi-regularly again last year, and this post is an attempt to summarise why I decided that it was worthwhile. The TL;DR: blogging isn't just about being read -- it's about learning and thinking, and having a durable proof that you can do both.
Going through the archives
I've been looking through my archives over the last few days while laid up with a minor medical problem. Along the way, I stumbled across a draft from 2021 -- written just after I moved my blog from WordPress to a static site generator -- that never saw the light of day.
I really don't know why I didn't finish and publish it at the time -- it's got some good stuff. And in the light of my decision to write more TIL deep dive posts, it's particularly relevant now. So here it is, dusted off and lightly updated.
One of the main changes in moving this blog over to my new static site generator has been the format of the posts. I wrote the generator so that it could still render (reasonably well) the semi-HTML that WordPress uses to save posts, but its main source language is Markdown, which gives me code fences for syntax-highlighted code blocks and a number of other nice typographical tricks -- and posts look much better with a few changes to take advantage of them.
So over the last few days, I’ve spent more hours than I probably should have going through all of my old posts and hand-converting them to Markdown. There was absolutely no need to do this, but it felt like the right thing to do. A touch of OCD? Well, possibly, but there have been other benefits.
On the benefits of learning in public
While laid up with a minor but annoying medical issue over the last week, I've blogged more than usual. I've also spent some time reading through the archives here, and come to the conclusion that the best posts I've made -- at least from my perspective -- follow a similar pattern. They're posts where I've been learning how to do something, or how something worked, and presented what I've found as a summary, often as a tutorial.
I think of these as writing the post that I wished I'd found when I started learning whatever it was.
Basic matrix maths for neural networks: in practice
This is the second post in my short series of tutorials on matrix operations for neural networks, targeted at beginners, and at people who have some practical experience, but who haven't yet dug into the underlying theory. Again, if you're an experienced ML practitioner, you should skip this post -- though if you want to read it anyway, any comments or suggestions for improvements would be much appreciated!
In my last post in the series, I showed how to derive the formulae to run a neural network from the basic principles of matrix maths. I gave two formulae that are generally used in mathematical treatments of NNs -- one with a separate bias matrix:

\[ Z = W X + B \]

...and one with the bias terms baked into the weights matrix, and the inputs matrix extended with a row of 1s at the bottom:

\[ Z = W' X' \]

However, I finished off by saying that in real production implementations, people normally use this instead:

\[ Z = X W^T + B \]
...which you might have seen in production PyTorch code looking like this:
```python
Z = X @ W.T + B
```
This post explores why that form of the equation works better in practice.
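As a concrete preview, here's a minimal PyTorch sketch -- my own illustration with made-up shapes and names, not code from either post -- showing that the "maths" form with a separate bias matrix and the production form compute exactly the same numbers, just with the examples laid out along different axes:

```python
import torch

torch.manual_seed(0)

batch_size, n_inputs, n_neurons = 4, 3, 5

# Production-style layout: one example per ROW of X, one neuron per ROW of W
# (the same convention torch.nn.Linear uses for its weight matrix).
X = torch.randn(batch_size, n_inputs)   # (4, 3)
W = torch.randn(n_neurons, n_inputs)    # (5, 3)
B = torch.randn(n_neurons)              # (5,), broadcast across the batch

# Production form: Z = X @ W.T + B -- one row of outputs per example.
Z_production = X @ W.T + B              # (4, 5)

# "Maths" form with a separate bias: Z = W X + B, where each example is a
# COLUMN of X and the bias is repeated for every column.
Z_maths = W @ X.T + B.unsqueeze(1)      # (5, 4)

# Same numbers, just transposed.
assert torch.allclose(Z_production, Z_maths.T)

# And nn.Linear does the production-form calculation under the hood.
linear = torch.nn.Linear(n_inputs, n_neurons)
with torch.no_grad():
    linear.weight.copy_(W)
    linear.bias.copy_(B)
assert torch.allclose(linear(X), Z_production)
```

The production layout keeps one example per row, which is how batches usually arrive in memory, and it matches what `torch.nn.Linear` actually does.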
Basic matrix maths for neural networks: the theory
I thought it would be worth writing a post on how matrix multiplication is used to calculate the output of neural networks. We use matrices because they make the maths easier, and because GPUs can work with them efficiently, allowing us to do a whole bunch of calculations with a single step -- so it's really worth having a solid grounding in what the underlying operations are.
If you're an experienced ML practitioner, you should skip this post. But you might find it useful if you're a beginner -- or if, like me until I started working through this, you've coded neural networks and used matrix operations for them, but apart from working through an example or two by hand, you've never thought through the details.
In terms of maths, I'll assume that you know what a vector is, what a matrix is, and have some vague memories of matrix multiplication from your schooldays, but that's it -- everything else I will define.
In terms of neural networks, I'll assume that you are aware of their basic layout and how they work in a general sense -- but there will be diagrams for clarity and I'll define specific terms.
So, with expectations set, let's go!
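And as a tiny taste of the "whole bunch of calculations with a single step" point, here's a sketch -- my own illustration, with made-up shapes -- comparing an explicit loop over examples and neurons with the single matrix multiplication that replaces it:

```python
import torch

torch.manual_seed(0)

n_examples, n_inputs, n_neurons = 3, 4, 2
X = torch.randn(n_examples, n_inputs)   # one example per row
W = torch.randn(n_neurons, n_inputs)    # one neuron's weights per row
B = torch.randn(n_neurons)              # one bias per neuron

# The "obvious" way: a weighted sum for every example/neuron pair.
Z_loops = torch.zeros(n_examples, n_neurons)
for i in range(n_examples):
    for j in range(n_neurons):
        Z_loops[i, j] = (X[i] * W[j]).sum() + B[j]

# The matrix way: all of those weighted sums in one operation.
Z_matmul = X @ W.T + B

assert torch.allclose(Z_loops, Z_matmul)
```

On a GPU, that one matrix multiplication is what lets a whole layer's worth of calculations happen in parallel.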
On the perils of AI-first debugging -- or, why Stack Overflow still matters in 2025
"My AI hype/terror level is directly proportional to my ratio of reading news about it to actually trying to get things done with it."
This post may not age well, as AI-assisted coding is progressing at an absurd rate. But I think it's an important thing to remember right now: current LLMs don't just hallucinate -- they can also misweight the evidence available to them and make debugging mistakes that human developers would not. If you don't allow for this, you can waste quite a lot of time!
Getting MathML to render properly in Chrome, Chromium and Brave
The other day I posted about adding mathematical typesetting to this blog using markdown2, LaTeX and MathML. One problem that remained at the end of that was that it looked a bit rubbish; in particular, the brackets surrounding matrices were just one line high, albeit centred, like this:
...rather than stretched to the height of the matrix, like this example from KaTeX:
After posting that, I discovered that the problem only existed in Chromium-based browsers. I saw it in Chromium, Chrome and Brave on Android and Linux, but in Firefox on Linux and in Safari on an iPhone it rendered perfectly well.
Guided by the answers to this inexplicably-quiet Stack Overflow question, I discovered that the problem is the maths fonts available in Chromium-based browsers. Mathematical notation, understandably, needs specialised fonts. Firefox and Safari either have these pre-installed, or do something clever to adapt the fonts you are using (I suspect the former, but Firefox's developer tools told me that it was using my default body text font for `<math>` elements). Chromium-based browsers do not, so you need to provide one in your CSS.
Using Frédéric Wang's MathML font test page, I decided I wanted to use the STIX font. It was a bit tricky to find a downloadable OTF file (you specifically need the "math" variant of the font -- in the same way as you might find `-italic` and `-bold` files to download, you can find `-math` ones), but I eventually found a link on this MDN page.
I put the `.otf` file in my font assets directory, then added the appropriate stuff to my CSS -- a font face definition:
```css
@font-face {
    font-family: 'STIX-Two-Math';
    src: url('/fonts/STIXTwoMath-Regular.otf') format('opentype');
}
```
...and a clause saying it should be used for `<math>` tags:
```css
math {
    font-family: STIX-Two-Math;
    font-size: larger;
}
```
The `larger` font size is because by default the maths was rendering at about one third of the height of my body text. I'm not completely happy about that, as it feels like an ad-hoc hack, but it will do for now.
Anyway, mathematical stuff now renders pretty well! Here's the matrix from above, using my new styling:
I hope that's useful for anyone else hitting the same problem.
[Update: because RSS readers don't load the CSS, the bad rendering still shows up in NewsBlur's Android app, which I imagine must be using Chrome under the hood for its rendering. Other RSS readers are probably the same :-(]
Adding mathematical typesetting to the blog
I've spent a little time over the weekend adding the ability to post stuff in mathematical notation on this blog. For example:
It should render OK in any browser released after early 2023; I suspect that many RSS readers won't be able to handle it right now, but that will hopefully change over time. [Update: my own favourite, NewsBlur, handles it perfectly!]
Here's why I wanted to do that, and how I did it.
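As a rough sketch of the general approach -- the post goes on to explain how this is wired into markdown2, and the latex2mathml package here is just a stand-in for whichever converter you prefer -- the LaTeX-to-MathML step can be as small as:

```python
# Illustrative only: convert a snippet of LaTeX into a MathML string that can
# be dropped straight into the page's HTML.
import latex2mathml.converter

latex_source = r"x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}"
mathml = latex2mathml.converter.convert(latex_source)
print(mathml)  # prints a <math>...</math> element
```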
Blog design update
I was recently reading some discussions on Twitter (I've managed to lose the links, sadly) where people were debating why sites have dark mode. One story that I liked went like this:
Back in the late 80s and 90s, computer monitors were CRTs. These were pretty bright, so people would avoid white backgrounds. For example, consider the light-blue-on-dark-blue colour scheme of the Commodore 64. The only exception I can remember is the classic Mac, which used black text on a white background -- and I think I remember having to turn the brightness of our family SE/30 down to make it less glaring.
When the Web came along in the early 90s, non-white backgrounds were still the norm -- check out the screenshot of the original Mosaic browser on this page.
But then, starting around 2000 or so, we all started switching to flat-panel displays. These had huge advantages -- no longer did your monitor have to be deeper and use up more desk space just to have a larger viewable size. And they used less power and were more portable. They had one problem, though -- they were a bit dim compared to CRTs. But that was fine; designers adapted, and black-on-white became common, because it worked, wasn't too bright, and mirrored the ink-on-paper aesthetic that made sense as more and more people came online.
Since then, it's all changed. Modern LCDs and OLEDs are super-bright again. But, or so the story goes, design hasn't caught up yet. Instead, people are used to black on white -- and those who find it rather like having a light shone straight in their face ask for dark mode to make it all better again.
As I said, this is just a story that someone told on Twitter -- but the sequence of events matches what I remember in terms of tech and design. And it certainly made me think that my own site's black-on-white colour scheme was indeed pretty glaring.
So all of this is a rather meandering introduction to the fact that I've changed the design here. The black-on-parchment colour scheme for the content is actually a bit of a throwback to the first website I wrote back in 1994 (running on httpd on my PC in my college bedroom). In fact, the rest of the design probably echoes that too, but it's all in modern HTML with responsive CSS, and the few JavaScript bits have been ported from raw JS to htmx.
Feedback welcome! In particular, I'd love to hear about accessibility issues or stuff that's just plain broken on particular systems -- I've checked on my phone, in various widths on Chrome (with and without the developer console "mobile emulation" mode enabled) and on Sara's iPhone, but I would not be surprised if there are some configurations where it just doesn't work.
Writing an LLM from scratch, part 7 -- wrapping up non-trainable self-attention
This is the seventh post in my series of notes on Sebastian Raschka's book "Build a Large Language Model (from Scratch)". Each time I read part of it, I'm posting about what I found interesting or needed to think hard about, as a way to help get things straight in my own head -- and perhaps to help anyone else that is working through it too.
This post is a quick one, covering just section 3.3.2, "Computing attention weights for all input tokens". I'm covering it in a post on its own because it gets things in place for what feels like the hardest part to grasp at an intuitive level -- how we actually design a system that can learn how to generate attention weights, which is the subject of the next section, 3.4. My linear algebra is super-rusty, and while going through this one, I needed to relearn some stuff that I think I must have forgotten sometime late last century...
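For anyone who wants the punchline up front: the whole of section 3.3.2 boils down to a couple of matrix operations. Here's a minimal PyTorch sketch -- my own paraphrase with made-up numbers, not code from the book:

```python
import torch

torch.manual_seed(123)

# Six token embeddings of dimension three -- made-up values, just for shape.
inputs = torch.randn(6, 3)

# Attention scores for ALL tokens at once: row i holds the dot products of
# token i's embedding with every token's embedding (including its own).
attn_scores = inputs @ inputs.T          # (6, 6)

# Normalise each row into a probability distribution: the attention weights.
attn_weights = torch.softmax(attn_scores, dim=-1)

# Each context vector is a weighted sum of all the input embeddings.
context_vectors = attn_weights @ inputs  # (6, 3)

print(attn_weights.sum(dim=-1))  # each row sums to 1
```

There are no trainable parameters anywhere in there yet -- which is exactly the gap that section 3.4 starts to fill.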