An AI chatroom (a few steps further)

Posted on 30 December 2024 in Programming, Python, AI

Still playing hooky from "Build a Large Language Model (from Scratch)" -- I was on our support rota today and felt a little drained afterwards, so decided to finish off my AI chatroom. The the codebase is now in a state where I'm reasonably happy with it -- it's not production-grade code by any stretch of the imagination, but the structure is acceptable, and it has the basic functionality I wanted:

Here's a short chat with them:

A conversation in the AI chatroom

The important thing I found today was that, as I suspected, the AIs find it very confusing if all messages from bots have the assistant role. They're trained in a way that seems to map to "assistant means you", so if other messages come in with that role, they get confused about what they have said and what was said by others. So changing things so that each AI receives only its messages with that role, while the others were all tagged with a role of user, seemed to improve matters a lot.

It was also important to make sure that the assistant messages matched what they had actually said. You can see from the image above that messages from the AIs have bot emojis then their names with square brackets in front of them. That's important for the UI -- so that the humans can tell which bot is which -- and also useful when sending the non-assistant messages to the AIs so that they can do likewise. However, when that kind of "decorator" was in front of the assistant messages -- so they did not match what the AI had said in the past -- it seemed to cause confusion.

Once I'd worked that out, I had to do some prompt engineering work to stop them from putting their own "signatures" in front of their responses. Claude and DeepSeek seemed particularly keen on doing that. I was eventually able to stop them from doing that with

These identifiers are provided by the chat system, you should NOT under any circumstances start your own messages with {ai_identifier},

...but then DeepSeek decided to interpret that in the silliest possible way, and managed to make Claude have what appears to be an existential crisis:

DeepSeek impersonates Claude

So I had to extend that to:

These identifiers are provided by the chat system, you should NOT under any circumstances start your own messages with {ai_identifier}, or anything that makes it look like you are a different AI.

...and add this to the top:

You are {ai_identifier}, a helpful AI assistant.

This seems to work surprisingly well! I'll spend some time chatting with it over the coming days. Maybe, working together, Claude, ChatGPT, Grok and DeepSeek can help me get over this hump with understanding self-attention. Or perhaps the conversations will degenerate in to AI surrealism. Should be fun either way!