Writing a reverse proxy/loadbalancer from the ground up in C, part 2: handling multiple connections with epoll
This is the second step along my road to building a simple C-based reverse proxy/loadbalancer so that I can understand how nginx/OpenResty works -- more background here. Here's a link to the first part, where I showed the basic networking code required to write a proxy that could handle one incoming connection at a time and connect it with a single backend.
This (rather long) post describes a version that uses Linux's
epoll API to handle multiple simultaneous
connections -- but it still just sends all of them down to the same backend
server. I've tested it using the Apache ab
server benchmarking tool,
and over a million requests, 100 running concurrently, it adds about 0.1ms to
the average request time as compared to a direct connection to the web server,
which is pretty good going at this early stage. It also doesn't appear to leak
memory, which is doubly good going for someone who's not coded in C since the
late 90s. I'm pretty sure it's not totally stupid code, though obviously
comments and corrections would be much appreciated!
[UPDATE: there's definitely one bug in this version -- it doesn't gracefully handle cases when we can't send data to the client as fast as we're receiving it from the backend. More info here.]
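To give a flavour of what the epoll approach looks like, here's a minimal sketch. It's not the code from the post itself: the helper names `watch_for_input` and `collect_ready` are made up for illustration, and a real proxy would register sockets rather than arbitrary file descriptors. The idea is just that you register every descriptor you care about once, and then ask the kernel which of them have data waiting.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 16

/* Hypothetical helper: ask the epoll instance to tell us when fd has
   data to read. Returns 0 on success, -1 on error (errno is set). */
int watch_for_input(int epoll_fd, int fd)
{
    struct epoll_event ev = {0};
    ev.events = EPOLLIN;   /* only interested in "readable" */
    ev.data.fd = fd;       /* handed back to us by epoll_wait */
    return epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: wait up to timeout_ms milliseconds and copy the
   descriptors that are ready into ready[]. Returns how many are ready
   (possibly 0 on timeout), or -1 on error. */
int collect_ready(int epoll_fd, int *ready, int max_ready, int timeout_ms)
{
    struct epoll_event events[MAX_EVENTS];
    int i, n;

    assert(max_ready > 0);
    if (max_ready > MAX_EVENTS)
        max_ready = MAX_EVENTS;

    n = epoll_wait(epoll_fd, events, max_ready, timeout_ms);
    for (i = 0; i < n; i++)
        ready[i] = events[i].data.fd;
    return n;
}
```

A server built this way would create the epoll instance with `epoll_create1(0)`, call `watch_for_input` for the listening socket and for each accepted connection, and then loop on `collect_ready`, handling whichever descriptors come back.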
Writing a reverse proxy/loadbalancer from the ground up in C, part 1: a trivial single-threaded proxy
This is the first step along my road to building a simple C-based reverse
proxy/loadbalancer so that I can understand how nginx / OpenResty works --
more explanation here.
It's called rsp, for Really Simple Proxy. This version listens for
connections on a particular port, specified on the command line; when one is
made it sends the request down to a backend -- another server with an associated
port, also specified on the command line -- and sends whatever comes back from
the backend back to the person who made the original connection. It can only
handle one connection at a time -- while it's handling one, it just queues up
others, and it handles them in turn. This will, of course, change later.
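The heart of a one-connection-at-a-time proxy like this is a loop that copies bytes from one file descriptor to another until the sender closes its end. Here's a rough sketch of that idea. It's an illustration rather than the actual rsp source, and the names `write_all` and `relay_until_eof` are my own invention for this example.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes have
   gone out, because write() may accept fewer bytes than requested.
   Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t written = write(fd, buf, len);
        if (written < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        buf += written;
        len -= (size_t)written;
    }
    return 0;
}

/* Hypothetical helper: copy everything from from_fd to to_fd until
   from_fd reports end-of-file. Returns the number of bytes copied,
   or -1 on error. */
long relay_until_eof(int from_fd, int to_fd)
{
    char buf[4096];
    long total = 0;

    assert(from_fd >= 0 && to_fd >= 0);
    for (;;) {
        ssize_t got = read(from_fd, buf, sizeof buf);
        if (got == 0)
            return total;          /* the other side closed the connection */
        if (got < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(to_fd, buf, (size_t)got) < 0)
            return -1;
        total += got;
    }
}
```

In a proxy, `from_fd` would be the backend's socket and `to_fd` the client's for the response, with the roles swapped for the request. Because both `read` and `write` block here, nothing else can happen while one connection is being served, which is exactly the one-at-a-time limitation described above.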
I'm posting this in the hope that it might help people who know Python, and some basic C, but want to learn more about how the OS-level networking stuff works. I'm also vaguely hoping that any readers who code in C day to day might take a look and tell me what I'm doing wrong :-)
Writing a reverse proxy/loadbalancer from the ground up in C, part 0: introduction
We're spending a lot of time on nginx configuration at PythonAnywhere. We're a platform-as-a-service, and a lot of people host their websites with us, so it's important that we have a reliable load-balancer to receive all of the incoming web traffic and appropriately distribute it around backend web-server nodes.
nginx is a fantastic, possibly unbeatable tool for this. It's fast, reliable, and lightweight in terms of CPU resources. We're using the OpenResty variant of it, which adds a number of useful modules -- most importantly for us, one for Lua scripting, which means that we can dynamically work out where to send traffic as the hits come in.
It's also quite simple to configure at a basic level. You want all incoming requests for site X to go to backend Y? Just write something like this:
    server {
        server_name X;
        listen 80;
        location / {
            proxy_set_header Host $host;
            proxy_pass Y;
        }
    }
Simple enough. Lua scripting is pretty easy to add -- you just put an extra
directive before the proxy_pass that provides some Lua code to run, and then
variables you set in the code can be accessed from the proxy_pass.
But there are many more complicated options. worker_connections,
tcp_nopush, sendfile, types_hash_max_size... Some are reasonably easy to
understand with a certain amount of reading, some are harder.
I'm a big believer that the best way to understand something complex is to try to build your own simple version of it. So, in my copious free time, I'm going to start putting together a simple loadbalancer in C. The aim isn't to rewrite nginx or OpenResty; it's to write enough equivalent functionality that I can better understand what they are really doing under the hood, in the same way as writing a compiler for a toy language gives you a better understanding of how proper compilers work. I'll get a good grasp on some underlying OS concepts that I have only a vague appreciation of now. It's also going to be quite fun coding in C again. I've not really written any since 1997.
Anyway, I'll document the steps I take here on this blog; partly because there's a faint chance that it might be interesting to other experienced Python programmers whose C is rusty or nonexistent and who want to get a view under the hood, but mostly because the best way to be sure you really understand something is to try to explain it to other people.
I hope it'll be interesting!
Here's a link to the first post in the series: Writing a reverse proxy/loadbalancer from the ground up in C, part 1: a trivial one-shot proxy.
SNI-based reverse proxying with Go(lang)
Short version for readers who know all about this kind of stuff: we built a simple reverse-proxy server in Go that load-balances HTTP requests using the Host header and HTTPS using the SNIs from the client handshake. Backends are selected per-host from sets stored in a redis database. It works pretty well, but we won't be using it because it can't send the originating client IP to the backends when it's handling HTTPS. Code here.
We've been looking at options to load-balance our users' web applications at PythonAnywhere; this post is about something we considered but eventually abandoned; I'm posting it because the code might turn out to be useful to other people.
OpenCL: .NET, C# and Resolver One integration -- the very beginnings
Today I wrote the code required to call part of the OpenCL API from Resolver One; just one function so far, and all it does is get some information about your hardware setup, but it was great to get it working. There are already .NET bindings for OpenCL, but I felt that it was worthwhile reinventing the wheel -- largely as a way of making sure I understood every spoke, but also because I wanted the simplest possible API, with no extra code to make it more .NETty. It should also work as an example of how you can integrate a C library into a .NET/IronPython application like Resolver One.
I'll be documenting the whole thing when it's a bit more finished, but if you want to try out the work in progress, and are willing to build the interop code, here's how:
- Make sure you have OpenCL installed -- here's the NVIDIA OpenCL download page, and here's the OpenCL page for ATI. I've only tested this with NVIDIA so far, so I'm keen to hear of any incompatibilities.
- Clone the dot-net-opencl project from Resolver Systems' GitHub account.
- Load up the DotNetOpenCL.sln project file in the root of the project using Visual C# 2008 (here's the free "Express" version if you don't have it already).
- Build the project
- To try it out from IronPython, run ipy test_clGetPlatformIDs.py
- To try it in Resolver One, load test_clGetPlatformIDs.rsl
That should be it! If you want to look at the code, the only important bit is in
DotNetOpenCL.cs -- and it's simply an external method definition... the tricky
bit was in working out which OpenCL function to write an external definition for,
and what that definition should look like.
I've put a slightly tidied version of the notes I kept as I implemented this below, for posterity's sake; if you're interested in finding out how the implementation went, read on...
OpenCL: first investigations with an NVIDIA card
I'm taking a look at OpenCL at the moment, with the vague intention of hooking it up to Resolver One. In case you've not heard about it, OpenCL is a language that allows you to do non-graphical computing on your graphics card (GPU). Because GPUs have more raw computational power than even modern CPUs, in the form of a large number of relatively slow stream processors, this can speed up certain kinds of calculations -- in particular, those that are amenable to massive parallelisation.
Until recently, the two main graphics card manufacturers had their own languages for this kind of general-purpose GPU computing; NVIDIA had CUDA, and ATI/AMD had their Stream technology. OpenCL was created as a way of having one language that would work on all graphics cards, so although the tools for developing using it are not currently as good as those for CUDA (which has been around for a long time and has great support), as a longer-term investment OpenCL looks to me like the best one to be looking at.
It took a little bit of work to get something up and running on my machine here at work, so it's probably worth documenting to help others who are trying the same.
Getting phpBB to accept Django sessions
phpBB is a fantastic bulletin board system. We use it at Resolver Systems for our forums, and it does a great job.
However, we're a Python shop, so we prefer to do our serious web development -- for example, the login system that allows our paying customers to download fully-featured unlocked versions of our software -- in Django.
We needed to have a single sign-on system for both parts of our website. Specifically, we wanted people to be able to log in using the Django authentication module, and then to be able to post on the forums without logging in again. This post is an overview of the code we used; I've had to extract it from various sources, so it might not be complete -- let me know in the comments if anything's missing.