Writing a reverse proxy/loadbalancer from the ground up in C, part 1: a trivial single-threaded proxy
This is the first step along my road to building a simple C-based reverse proxy/loadbalancer so that I can understand how nginx/OpenResty works -- more explanation here. It's called rsp
, for Really Simple Proxy. This version listens for connections on a particular port, specified on the command line; when one is made it sends the request down to a backend -- another server with an associated port, also specified on the command line -- and sends whatever comes back from the backend back to the person who made the original connection. It can only handle one connection at a time -- while it's handling one, it just queues up others, and it handles them in turn. This will, of course, change later.
I'm posting this in the hope that it might help people who know Python, and some basic C, but want to learn more about how the OS-level networking stuff works. I'm also vaguely hoping that any readers who code in C day to day might take a look and tell me what I'm doing wrong :-)
Writing a reverse proxy/loadbalancer from the ground up in C, part 0: introduction
We're spending a lot of time on nginx configuration at PythonAnywhere. We're a platform-as-a-service, and a lot of people host their websites with us, so it's important that we have a reliable load-balancer to receive all of the incoming web traffic and appropriately distribute it around backend web-server nodes.
nginx is a fantastic, possibly unbeatable tool for this. It's fast, reliable, and lightweight in terms of CPU resources. We're using the OpenResty variant of it, which adds a number of useful modules -- most importantly for us, one for Lua scripting, which means that we can dynamically work out where to send traffic as the hits come in.
It's also quite simple to configure at a basic level. You want all incoming requests for site X to go to backend Y? Just write something like this:
server { server_name X listen 80;location / { proxy_set_header Host $host; proxy_pass Y; } }
Simple enough. Lua scripting is pretty easy to add -- you just put an extra directive before the proxy_pass
that provides some Lua code to run, and then variables you set in the code can be accessed from the proxy_pass
.
But there are many more complicated options. worker_connections
, tcp_nopush
, sendfile
, types_hash_max_size
... Some are reasonably easy to understand with a certain amount of reading, some are harder.
I'm a big believer that the best way to understand something complex is to try to build your own simple version of it. So, in my copious free time, I'm going to start putting together a simple loadbalancer in C. The aim isn't to rewrite nginx or OpenResty; it's to write enough equivalent functionality that I can better understand what they are really doing under the hood, in the same way as writing a compiler for a toy language gives you a better understanding of how proper compilers work. I'll get a good grasp on some underlying OS concepts that I have only a vague appreciation of now. It's also going to be quite fun coding in C again. I've not really written any since 1997.
Anyway, I'll document the steps I take here on this blog; partly because there's a faint chance that it might be interesting to other experienced Python programmers whose C is rusty or nonexistent and want to get a view under the hood, but mostly because the best way to be sure you really understand it is to try to explain it to other people.
I hope it'll be interesting!
Here's a link to the first post in the series: Writing a reverse proxy/loadbalancer from the ground up in C, part 1: a trivial one-shot proxy