Giles' blog

An HTTP request's journey through a platform-as-a-service

Posted on 20 August 2014 in Programming, Python, PythonAnywhere, Talks |

I'm definitely getting better as a public speaker :-) At EuroPython in Berlin last month, I gave a high-level introduction to PythonAnywhere's load-balancing system. There's a video up on PyVideo: An HTTP request's journey through a platform-as-a-service. And here are the slides [PDF].

A fun bug

Posted on 28 March 2014 in Programming, PythonAnywhere |

While I'm plugging the memory leaks in my epoll-based C reverse proxy, I thought I might share an interesting bug we found today on PythonAnywhere. The following is the bug report I posted to our forums.

So, here's what was happening.

Each web app someone has on PythonAnywhere runs on a backend server. We have a cluster of these backends, and the cluster is behind a loadbalancer. Every backend server in the cluster is capable of running any web app; the loadbalancer's job is to spread things out between them so that each one at any given time is only running an appropriately-sized subset of them. It has a list of backends, which we can update in realtime as we add or remove backends to scale up or down, and it looks at incoming requests and uses the domain name to work out which backend to route a request to.

That's all pretty simple. The twist comes when we add the code that reload web apps to the mix.

Reloading a PythonAnywhere web app is simply a case of making an authenticated request to a specific URL. For example, right now (and this might change, it's not an official API, so don't do anything that relies on it) to reload www.foo.com owned by user fred, you'd hit the URL http://www.pythonanywhere.com/user/fred/webapps/www.foo.com/reload

Now, the PythonAnywhere website itself is just another web app running on one of the backends (a bit recursive, I know). So most requests to it are routed based on the normal loadbalancing algorithm. But calls specifically to that "reload" URL need to be routed differently -- they need to go to the specific backend that is running the site that needs to be reloaded. So, for that URL, and that URL only, the loadbalancer uses the domain name that's specified second-to-the-end in the path bit of the URL to choose which backend to route the request to, instead of using the hostname at the start of the URL.

So, what happened here? Well, the clue was in the usernames of the people who were affected by the problem -- IronHand and JoeButy. Both of you have mixed-case usernames. And your web apps are ironhand.pythonanywhere.com and joebuty.pythonanywhere.com.

But the code on the "Web" tab that specifies the URL for reloading the selected domain specifies it using your mixed-case usernames -- that is, it specifies that the reload calls should go to the URL for IronHand.pythonanywhere.com or JoeButy.pythonanywhere.com.

And you can probably guess what the problem was -- the backend selection code was case-sensitive. So requests to your web apps were going to one backend, but reload messages were going to another different backend. The fix I just pushed made the backend selection code case-insensitive, as it should have been.

The remaining question -- why did this suddenly crop up today? My best guess is that it's been there for a while, but it was significantly less likely to happen, and so it was written off as a glitch when it happened in the past.

The reason it's become more common is that we actually more than doubled the number of backends yesterday. Because of the way the backend selection code works, when there's a relatively small number of backends it's actually quite likely that the lower-case version of your domain will, by chance, route to the same backend as the mixed-case one. But the doubling of the number of servers changed that, and suddenly the probability that they'd route differently went up drastically.

Why did we double the number of servers? Previously, backends were m1.xlarge AWS instances. We decided that it would be better to have a larger number of smaller backends, so that problems on one server impacted a smaller number of people. So we changed our system to use m1.large instances instead, span up slightly more than twice as many backend servers, and switched the loadbalancer across.

So, there you have it. I hope it was as interesting to read about as it was to figure out :-)

...just resting...

Posted on 12 December 2013 in Linux, Programming |

Just a quick note to say that I'm still here! Using rsp as a front-end for this site has usefully shown up some weird bugs, and I'm tracking them down. I'll do a new post about it when there's something useful to say...

Writing a reverse proxy/loadbalancer from the ground up in C, part 4: Dealing with slow writes to the network

Posted on 10 October 2013 in Linux, Programming |

This is the fourth step along my road to building a simple C-based reverse proxy/loadbalancer, rsp, so that I can understand how nginx/OpenResty works -- more background here. Here are links to the first part, where I showed the basic networking code required to write a proxy that could handle one incoming connection at a time and connect it with a single backend, to the second part, where I added the code to handle multiple connections by using epoll, and to the third part, where I started using Lua to configure the proxy.

This post was was unplanned; it shows how I fixed a bug that I discovered when I first tried to use rsp to act as a reverse proxy in front of this blog. The bug is fixed, ~~and you're now reading this via rsp~~ [update, later: sadly, no longer true]. The problem was that when the connection from a browser to the proxy was slower than the connection from the proxy to the backend (that is, most of the time), then when new data was received from the backend and we tried to send it to the client, we sometimes got an error to tell us that the client was not ready. This error was being ignored, so a block of data would be skipped, so the pages you got back would be missing chunks. There's more about the bug here.

This post describes the fix.

[ Read more ]

...and another sidetrack -- a new theme!

Posted on 3 October 2013 in Blogging, Meta, Website design |

While I was at it, I figured that this blog was looking ridiculously dated. So I've fixed that with the Iconic One Wordpress theme, with a few tweaks that I think make it look a bit cleaner.

A brief sidetrack: Varnish

Posted on 2 October 2013 in Linux, Programming |

In order to use this blog as a decent real-world test of rsp, I figured that I should make it as fast as possible. The quickest way to do that was to install Varnish, which is essentially a reverse proxy that caches stuff. You configure it to say what is cachable, and then it runs in place of the web server and proxies anything it can't cache back to it.

I basically used the instructions from Ewan Leith's excellent "10 Million hits a day with Wordpress using a $15 server" post.

So now, this server has:

rsp running on port 80, proxying everything to port 83.
varnish running on port 83, caching what it can and proxying the rest to port 81.
nginx running on port 81, serving static pages and sending PHP stuff to php5-fpm on port 9000.

I've also got haproxy running on port 82, doing the same as rsp -- proxying everything to varnish -- so that I can do some comparative speed tests once rsp does enough for such tests to give interesting results. Right now, all of the speed differences seem to be in the noise, with a run of ab pointed at varnish actually coming out slower than the two proxies.

Writing a reverse proxy/loadbalancer from the ground up in C, pause to regroup: fixed it!

Posted on 29 September 2013 in Linux, Programming |

It took a bit of work, but the bug is fixed: rsp now handles correctly the case when it can't write as much as it wants to the client side. I think this is enough for it to properly work as a front-end for this website, so it's installed and running here. If you're reading this (and I've not had to switch it off in the meantime) then the pages you're reading were served over rsp. Which is very pleasing :-)

The code needs a bit of refactoring before I can present it, and the same bug still exists on the communicating-to-backends side (which is one of the reasons it needs refactoring -- this is something I should have been able to fix in one place only) so I'll do that over the coming days, and then do another post.

Writing a reverse proxy/loadbalancer from the ground up in C, pause to regroup: non-blocking output

Posted on 28 September 2013 in Linux, Programming |

Before moving on to the next step in my from-scratch reverse proxy, I thought it would be nice to install it on the machine where this blog runs, and proxy all access to the blog through it. It would be useful dogfooding and might show any non-obvious errors in the code. And it did.

I found that while short pages were served up perfectly well, longer pages were corrupted and interrupted halfway through. Using curl gave various weird errors, eg.

curl: (56) Problem (3) in the Chunked-Encoded data

...which is a general error saying that it's receiving chunked data and the chunking is invalid.

Doubly strangely, these problems didn't happen when I ran the proxy on the machine where I'm developing it and got it to proxy the blog; only when I ran it on the same machine as the blog. They're different versions of Ubuntu, the blog server being slightly older, but not drastically so -- and none of the stuff I'm using is that new, so it seemed unlikely to be a bug in the blog server's OS. And anyway, select isn't broken.

After a ton of debugging with printfs here there and everywhere, I tracked it down.

[ Read more ]

Writing a reverse proxy/loadbalancer from the ground up in C, part 3: Lua-based configuration

Posted on 11 September 2013 in Linux, Programming |

This is the third step along my road to building a simple C-based reverse proxy/loadbalancer so that I can understand how nginx/OpenResty works -- more background here. Here's a link to the first part, where I showed the basic networking code required to write a proxy that could handle one incoming connection at a time and connect it with a single backend, and the second part, where I added the code to handle multiple connections by using epoll.

This post is much shorter than the last one. I wanted to make the minimum changes to introduce some Lua-based scripting -- specifically, I wanted to keep the same proxy with the same behaviour, and just move the stuff that was being configured via command-line parameters into a Lua script, so that just the name of that script would be specified on the command line. It was really easy :-) -- but obviously I may have got it wrong, so as ever, any comments and corrections would be much appreciated.

[ Read more ]

Writing a reverse proxy/loadbalancer from the ground up in C, part 2: handling multiple connections with epoll

Posted on 7 September 2013 in Linux, Programming |

This is the second step along my road to building a simple C-based reverse proxy/loadbalancer so that I can understand how nginx/OpenResty works -- more background here. Here's a link to the first part, where I showed the basic networking code required to write a proxy that could handle one incoming connection at a time and connect it with a single backend.

This (rather long) post describes a version that uses Linux's epoll API to handle multiple simultaneous connections -- but it still just sends all of them down to the same backend server. I've tested it using the Apache ab server benchmarking tool, and over a million requests, 100 running concurrently, it adds about 0.1ms to the average request time as compared to a direct connection to the web server, which is pretty good going at this early stage. It also doesn't appear to leak memory, which is doubly good going for someone who's not coded in C since the late 90s. I'm pretty sure it's not totally stupid code, though obviously comments and corrections would be much appreciated!

[UPDATE: there's definitely one bug in this version -- it doesn't gracefully handle cases when the we can't send data to the client as fast as we're receiving it from the backend. More info here.]

[ Read more ]