Giving up on the AI chatbot tutorial (for now)
I'm a big fan of learning in public, and early last year I started trying to do that by writing an AI chatbot tutorial as I learned the technology myself. But somehow it just wasn't working -- perhaps because my understanding was evolving so quickly that each time I sat down to write, I spotted dozens of errors in the previous posts, and felt I should fix those first. So I've decided to give up on that one, at least for now.
So, back to something a bit more achievable! Some lab notes will be coming on things I've been working on, including -- later on this evening -- a post about an oddity I found the other day.
In the meantime, here's a blog post I did for PythonAnywhere late last year: Five steps to create your own PythonAnywhere AI guru, on PythonAnywhere.
Building an AI chatbot for beginners: part 2
[Note that this series kind of dried up; when I started the series, I knew that I knew very little about the subject, but I was hoping to learn better by learning in public. However, as time went by it turned out that this wasn't working. There are a lot of better tutorials out there!]
Welcome to the second part of my tutorial on how to build a chatbot using OpenAI's interface to their Large Language Models (LLMs)! You can read the introduction here, and the first part here. As a reminder, I'm writing this not because I'm an expert, but because I'm learning how to do it myself, and writing about it helps me learn faster. Caveat lector :-)
In this post, we'll give the bot some memory of the conversation so far.
At the end of the first part, we had a program that would accept input from a user and combine it with some static text to make a prompt that an LLM would complete in the character of a chatbot (stopping at the point where the chatbot should stop, rather than trying to carry on the conversation). It then sent that prompt to OpenAI's API, specifying which model to use, and printed out the result.
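As a rough illustration of the flow described above, here is a minimal sketch, with a simple history list added as one possible way of giving the bot the memory this post is about. It is not the code from the series: the prompt wording, the "User:"/"Bot:" labels, the stop sequence and the names `build_prompt`, `chat_turn` and `fake_complete` are all assumptions made for this example, and the `complete` argument stands in for whatever function actually calls OpenAI's Completion API.

```python
from typing import Callable, List, Tuple

# Hypothetical static text that frames the completion as a chatbot reply.
PREAMBLE = "The following is a conversation between a helpful bot and a user.\n\n"
# Hypothetical stop sequence: generation should halt before the model
# starts writing the user's next line.
STOP = "\nUser:"


def build_prompt(history: List[Tuple[str, str]], user_message: str) -> str:
    """Combine the static text, earlier turns and the new message into one prompt."""
    parts = [PREAMBLE]
    for user_text, bot_text in history:
        parts.append(f"User: {user_text}\nBot: {bot_text}\n")
    parts.append(f"User: {user_message}\nBot:")
    return "".join(parts)


def chat_turn(
    history: List[Tuple[str, str]],
    user_message: str,
    complete: Callable[[str, str], str],
) -> str:
    """Run one turn; complete(prompt, stop) is a stand-in for the real API call."""
    reply = complete(build_prompt(history, user_message), STOP).strip()
    history.append((user_message, reply))  # the history list is the bot's memory
    return reply


def fake_complete(prompt: str, stop: str) -> str:
    """Canned completion so the sketch runs without network access."""
    return " Hello there!"


history: List[Tuple[str, str]] = []
print(chat_turn(history, "Hi", fake_complete))
print(build_prompt(history, "How are you?"))
```

Passing the completion function in as an argument keeps the prompt-building logic testable on its own; swapping `fake_complete` for a real API call should not require changes elsewhere.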
Building an AI chatbot for beginners: part 1
[Note that this series kind of dried up; when I started the series, I knew that I knew very little about the subject, but I was hoping to learn better by learning in public. However, as time went by it turned out that this wasn't working. There are a lot of better tutorials out there!]
Welcome to the first part of my tutorial on how to build a chatbot using OpenAI's interface to their Large Language Models (LLMs)! You can read the introduction here.
If you're reading this and want to get the best out of it, I strongly recommend that you run the code on your own machine as you go along: trust me, it will stick in your mind much better if you do that.
The goal in this post is to write a basic bot script that accepts user input, and just bounces it off an OpenAI LLM to generate a response.
Building an AI chatbot for beginners: part 0
[Note that this series kind of dried up; when I started the series, I knew that I knew very little about the subject, but I was hoping to learn better by learning in public. However, as time went by it turned out that this wasn't working. There are a lot of better tutorials out there!]
Like a lot of people, I've been blown away by the capabilities of Large Language Model (LLM) based systems over the last few months. I'm using ChatGPT regularly for all kinds of things, from generating basic code to debugging errors to writing emails.
I wanted to understand more about how these tools worked, and feel strongly that there's no better way to learn something than by doing it. Building an LLM is, at least right now, super-expensive -- in the millions of dollars (although maybe that will be coming down fast?). It also requires a lot of deep knowledge to get to something interesting. Perhaps something to try in the future, but not right now.
However, using LLMs to create something interesting -- that's much easier, especially because OpenAI have a powerful API, which provides ways to do all kinds of stuff. Most relevantly, they provide access to a Completion API. That, as I understand it, is the lowest-level way of interacting with an LLM, so building something out of it is probably the best bang for the buck for learning.
Over the last few weeks I've put together a bunch of things I found interesting, and have learned a lot. But it occurred to me that an even better way to learn stuff than by building it is to build it, and then explain it to someone else, even if that person is an abstract persona for "someone out there on the Internet". So: time for an LLM chatbot tutorial!
Python code to generate Let's Encrypt certificates
I spent today writing some Python code to request certificates from Let's Encrypt. I couldn't find much in the way of simple sample code out there, so I thought it would be worth sharing some. It uses the acme Python package, which is part of the certbot client script.
It's worth noting that none of this is useful stuff if you just want to get a Let's Encrypt certificate for your website; scripts like certbot and dehydrated are what you need for that. This code and the explanation below are for people who are building their own systems to manage Let's Encrypt certs (perhaps for a number of websites) or who want a reasonably simple example showing a little more of what happens under the hood.
Creating a time series from existing data in pandas
pandas is a high-performance library for data analysis in Python. It's generally excellent, but if you're a beginner or you use it rarely, it can be tricky to find out how to do quite simple things -- the code to do what you want is likely to be very clear once you work it out, but working it out can be relatively hard.
A case in point, which I'm posting here largely so that I can find it again next time I need to do the same thing... I had a list of dictionaries, start_times, each of which had (amongst other properties) a timestamp and a value. I wanted to create a pandas time series object to represent those values.
The code to do that is this:
import pandas as pd

series = pd.Series(
    [cs["value"] for cs in start_times],
    index=pd.DatetimeIndex([cs["timestamp"] for cs in start_times])
)
Perfectly clear once you see it, but it did take upwards of 40 Google searches and help from two colleagues with a reasonable amount of pandas experience to work out what it should be.
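To make that easier to try out, here is a small self-contained sketch. The sample data (including the extra "host" key) is invented purely for illustration, and the hourly resampling at the end is just one example of what a DatetimeIndex makes possible, not something from the original problem.

```python
import pandas as pd

# Hypothetical sample data in the shape described above: each dictionary
# has a timestamp, a value and some other property we don't care about.
start_times = [
    {"timestamp": "2017-03-01 09:00:00", "value": 3, "host": "a"},
    {"timestamp": "2017-03-01 09:30:00", "value": 5, "host": "b"},
    {"timestamp": "2017-03-01 10:15:00", "value": 2, "host": "a"},
]

series = pd.Series(
    [cs["value"] for cs in start_times],
    index=pd.DatetimeIndex([cs["timestamp"] for cs in start_times]),
)

# With a DatetimeIndex in place, time-based operations become available,
# for example summing the values in 60-minute bins.
hourly = series.resample("60min").sum()
print(series)
print(hourly)
```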
Parsing website SSL certificates in Python
A kindly PythonAnywhere user dropped us a line today to point out that StartCom and WoSign's SSL certificates are no longer going to be supported in Chrome, Firefox and Safari. I wanted to email all of our customers who were using certificates provided by those organisations.
We have all of the domains we host stored in a database, and it was surprisingly hard to find out how I could take a PEM-formatted certificate (the normal base-64 encoded stuff surrounded by "BEGIN CERTIFICATE" and "END CERTIFICATE") in a string and find out who issued it.
After much googling, I finally found the right search terms to get to this Stack Overflow post by mhawke, so here's my adaptation of the code:
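from OpenSSL import crypto

# `domains` is assumed to be the list of domain records loaded from the
# database, each with its PEM-formatted certificate in `domain.cert`.
for domain in domains:
    cert = crypto.load_certificate(crypto.FILETYPE_PEM, domain.cert)
    issuer = cert.get_issuer().CN
    if issuer is None:
        # This happened with a Cloudflare-issued cert
        continue
    if "startcom" in issuer.lower() or "wosign" in issuer.lower():
        # send the user an email
        pass  # placeholder for the email-sending code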
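The issuer check in that loop can be pulled out into a small function, which makes it easy to test without any certificates to hand. This is only a sketch: the function name `is_distrusted_issuer` and the sample issuer strings are made up for illustration.

```python
from typing import Optional

# Substrings for the two certificate authorities named above.
DISTRUSTED = ("startcom", "wosign")


def is_distrusted_issuer(issuer_cn: Optional[str]) -> bool:
    """Return True if the issuer common name mentions one of the distrusted CAs."""
    if issuer_cn is None:
        # Some certificates have no CN in the issuer field.
        return False
    lowered = issuer_cn.lower()
    return any(name in lowered for name in DISTRUSTED)


# Illustrative issuer names only; real values would come from
# cert.get_issuer().CN in the loop above.
for cn in ["StartCom Example CA", "WoSign Example CA", "Some Other CA", None]:
    print(cn, is_distrusted_issuer(cn))
```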
An HTTP request's journey through a platform-as-a-service
I'm definitely getting better as a public speaker :-) At EuroPython in Berlin last month, I gave a high-level introduction to PythonAnywhere's load-balancing system. There's a video up on PyVideo: An HTTP request's journey through a platform-as-a-service. And here are the slides [PDF].
How many Python programmers are there in the world?
We've been talking to some people recently who really wanted to know what the potential market size was for PythonAnywhere, our Python Platform-as-a-Service and cloud-based IDE.
There are a bunch of different ways to look at that, but the most obvious starting point is, "how many people are coding Python?" This blog post is an attempt to get some kind of order-of-magnitude number for that.
First things first: Wikipedia has an estimate of 10 million Java developers (though I couldn't find the numbers to back that up on the cited pages) but nothing for Python -- or, indeed, any of the other languages I checked. So nothing there.
A bit of Googling around gets one interesting hit; in this Stack Overflow answer, "Tall Jeff" says that the 2007 version of Learning Python estimated that there were 1 million Python programmers in the world. Using Amazon's "Look inside" feature on the current edition, I can see that it still gives the same number, now presented as a present-day figure, but let's assume that they were right originally and the number has grown since then. Now, according to the Python wiki, there were 586 people at the 2007 PyCon. According to the front page at PyCon.org, there were 2,500 people at PyCon 2013. So if we take attendance growth as a proxy for the growth of the language, we get one guess of the number of Python developers: 4.3 million.
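For anyone who wants to check that figure, the scaling can be reproduced in a few lines. The variable names are just for this sketch; the numbers are the ones quoted above.

```python
estimate_2007 = 1_000_000  # Learning Python's 2007 estimate of Python programmers
pycon_2007 = 586           # PyCon 2007 attendance
pycon_2013 = 2_500         # PyCon 2013 attendance

# Treat growth in PyCon attendance as a proxy for growth in the language.
growth = pycon_2013 / pycon_2007
estimate_2013 = estimate_2007 * growth

print(f"growth factor: {growth:.2f}")
print(f"estimated developers: {estimate_2013 / 1e6:.1f} million")
```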
Let's try another metric. Python.org's web statistics are public. Looking at the first five months of this year, and adding up the total downloads, we get:
| Month | Downloads |
|-------|-----------|
| Jan   | 2,584,754 |
| Feb   | 2,539,177 |
| Mar   | 3,182,946 |
| Apr   | 3,199,012 |
| May   | 2,855,033 |
Taking the monthly average and scaling it up to a full year gives us 34,466,213 downloads per year. It's worth noting that these are overwhelmingly Windows downloads -- most Linux users are going to be using the versions packaged as part of their distro, and (I think, but correct me if I'm wrong) the same is largely going to be the case on the Mac.
So, 34.5 million downloads. There were ten versions of Python released over the last year, so let's assume that each developer downloaded each version once and once only; that gives us 3.5 million Python programmers on Windows.
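The download arithmetic from the last couple of paragraphs can be reproduced like this. It is only a sketch of the calculation as described, with variable names chosen for the example; the monthly figures and the count of ten releases are the ones given above.

```python
monthly_downloads = {
    "Jan": 2_584_754,
    "Feb": 2_539_177,
    "Mar": 3_182_946,
    "Apr": 3_199_012,
    "May": 2_855_033,
}

# Average the five months we have, then scale up to twelve.
monthly_average = sum(monthly_downloads.values()) / len(monthly_downloads)
downloads_per_year = monthly_average * 12

# Assume each developer downloaded each of the ten releases exactly once.
releases_per_year = 10
developers = downloads_per_year / releases_per_year

print(f"downloads per year: {round(downloads_per_year):,}")
print(f"developers on Windows: {developers / 1e6:.2f} million")
```

The unrounded result is a shade under 3.45 million, which is where the 3.5 million figure comes from once the yearly total has been rounded to 34.5 million.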
What other data points are there? This job site aggregator's blog post suggests using searches for resumes/CVs as a way of getting numbers. Their suggested search for Python would be
(intitle:resume OR inurl:resume) Python -intitle:jobs -resumes -apply
Being in the UK, where we use "CV" more than we use "resume", I tried this:
(intitle:resume OR inurl:resume OR intitle:cv OR inurl:cv) Python -intitle:jobs -resumes -apply
The results were unfortunately completely useless. 338,000 hits but the only actual CV/resume on the first page was Guido van Rossum's -- everything else was about the OpenCV computer vision library, or about resuming things.
So let's scrap that. What else can we do? Well, taking inspiration (and some raw data) from this excellent blog post about estimating the number of Java programmers in the world, we can do this calculation:
- Programmers in the world: 43,000,000 (see the link above for the calculation)
- Python developers as per the latest TIOBE ranking: 4.183%, which gives 1,798,690
- Python developers as per the latest LangPop.com ranking: roughly 7% (taken as an approximate ratio of the Python score to the sum of the scores of all languages), which gives 2,841,410 (that figure corresponds to a share of about 6.6%, so the 7% is rounded up)
OK, so there I'm multiplying one very approximate number of programmers by a "percentage" rating that doesn't claim to be a percentage of programmers using a given language. But this ain't rocket science, I can mix and match units if I want.
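Here is the same back-of-the-envelope multiplication as code, using the numbers from the list above. The variable names are mine; working backwards from the LangPop-based figure shows what share it implies.

```python
programmers_worldwide = 43_000_000  # estimate from the linked Java blog post

# TIOBE rating for Python, treated (very loosely) as a share of programmers.
tiobe_share = 0.04183
tiobe_estimate = programmers_worldwide * tiobe_share

# The LangPop-based figure quoted above, and the share it implies.
langpop_estimate = 2_841_410
implied_langpop_share = langpop_estimate / programmers_worldwide

print(f"TIOBE-based estimate: {round(tiobe_estimate):,}")
print(f"share implied by the LangPop figure: {implied_langpop_share:.1%}")
```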
The good news is, we're in the same order of magnitude; we've got numbers of 1.8 million, 2.8 million, 3.5 million, and 4.3 million. So, based on some super-unscientific guesswork, I think I can happily say that the number of Python programmers in the world is in the low millions.
What do you think? Are there other ways of working this out that I've missed? Does anyone have (gasp!) hard numbers?
PythonAnywhereAnywhere
We recently added something cool to PythonAnywhere, our Python online IDE and web hosting environment -- if you're writing a tutorial, or anything else where you'd find a Python console useful in a web page, you can use one of ours! Check it out:
What's particularly cool about these consoles (apart from the fact that they advertise the world's best Python IDE-in-a-browser) is that they keep the session data on a per-client basis -- so, if you put one on multiple pages of your tutorial, the user's previous state is kept as they navigate from page to page! The downside (or is it an upside?) is that this state is also kept from site to site, so if they go from your page to someone else's, they'll have the state they had when they were trying out yours.
Bug or feature? Let me know what you think in the comments...