Giles' blog

About

Contact

Writing a reverse proxy/loadbalancer from the ground up in C, pause to regroup: fixed it!

Posted on 29 September 2013 in C, Linux, TIL deep dives |

It took a bit of work, but the bug is fixed: rsp now handles correctly the case when it can't write as much as it wants to the client side. I think this is enough for it to properly work as a front-end for this website, so it's installed and running here. If you're reading this (and I've not had to switch it off in the meantime) then the pages you're reading were served over rsp. Which is very pleasing :-)

The code needs a bit of refactoring before I can present it, and the same bug still exists on the communicating-to-backends side (which is one of the reasons it needs refactoring -- this is something I should have been able to fix in one place only) so I'll do that over the coming days, and then do another post.

Writing a reverse proxy/loadbalancer from the ground up in C, pause to regroup: non-blocking output

Posted on 28 September 2013 in C, Linux, TIL deep dives |

Before moving on to the next step in my from-scratch reverse proxy, I thought it would be nice to install it on the machine where this blog runs, and proxy all access to the blog through it. It would be useful dogfooding and might show any non-obvious errors in the code. And it did.

I found that while short pages were served up perfectly well, longer pages were corrupted and interrupted halfway through. Using curl gave various weird errors, eg.

curl: (56) Problem (3) in the Chunked-Encoded data

...which is a general error saying that it's receiving chunked data and the chunking is invalid.

Doubly strangely, these problems didn't happen when I ran the proxy on the machine where I'm developing it and got it to proxy the blog; only when I ran it on the same machine as the blog. They're different versions of Ubuntu, the blog server being slightly older, but not drastically so -- and none of the stuff I'm using is that new, so it seemed unlikely to be a bug in the blog server's OS. And anyway, select isn't broken.

After a ton of debugging with printfs here there and everywhere, I tracked it down.

[ Read more ]

Writing a reverse proxy/loadbalancer from the ground up in C, part 3: Lua-based configuration

Posted on 11 September 2013 in C, Linux, TIL deep dives |

This is the third step along my road to building a simple C-based reverse proxy/loadbalancer so that I can understand how nginx/OpenResty works -- more background here. Here's a link to the first part, where I showed the basic networking code required to write a proxy that could handle one incoming connection at a time and connect it with a single backend, and the second part, where I added the code to handle multiple connections by using epoll.

This post is much shorter than the last one. I wanted to make the minimum changes to introduce some Lua-based scripting -- specifically, I wanted to keep the same proxy with the same behaviour, and just move the stuff that was being configured via command-line parameters into a Lua script, so that just the name of that script would be specified on the command line. It was really easy :-) -- but obviously I may have got it wrong, so as ever, any comments and corrections would be much appreciated.

[ Read more ]

Writing a reverse proxy/loadbalancer from the ground up in C, part 2: handling multiple connections with epoll

Posted on 7 September 2013 in C, Linux, TIL deep dives |

This is the second step along my road to building a simple C-based reverse proxy/loadbalancer so that I can understand how nginx/OpenResty works -- more background here. Here's a link to the first part, where I showed the basic networking code required to write a proxy that could handle one incoming connection at a time and connect it with a single backend.

This (rather long) post describes a version that uses Linux's epoll API to handle multiple simultaneous connections -- but it still just sends all of them down to the same backend server. I've tested it using the Apache ab server benchmarking tool, and over a million requests, 100 running concurrently, it adds about 0.1ms to the average request time as compared to a direct connection to the web server, which is pretty good going at this early stage. It also doesn't appear to leak memory, which is doubly good going for someone who's not coded in C since the late 90s. I'm pretty sure it's not totally stupid code, though obviously comments and corrections would be much appreciated!

[UPDATE: there's definitely one bug in this version -- it doesn't gracefully handle cases when the we can't send data to the client as fast as we're receiving it from the backend. More info here.]

[ Read more ]