pam-unshare: a PAM module that switches into a PID namespace
Today in my 10% time at PythonAnywhere (we're a bit less lax than Google) I wrote a PAM module that lets you configure a Linux system so that when someone runs su or sudo, or comes in over ssh, they are put into a private PID namespace. This means that they can't see anyone else's processes, either via ps or via /proc. It's definitely not production-ready, but any feedback on it would be very welcome.
In this blog post I explain why I wrote it, and how it all works, including some of the pitfalls of using PID namespaces like this and how I worked around them.
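The module itself isn't reproduced here. As a rough illustration of the kernel behaviour it builds on, here is a minimal C sketch (the function name first_child_is_pid1 is hypothetical, and it needs root or CAP_SYS_ADMIN to do anything useful) showing that unshare(CLONE_NEWPID) doesn't move the caller into the new namespace: only processes forked afterwards land there, and the first of them sees itself as PID 1. Giving those processes a private /proc as well would additionally need a new mount namespace (CLONE_NEWNS) and a fresh mount of proc, which this sketch leaves out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Sketch only (not the PAM module's code): after unshare(CLONE_NEWPID) the
 * caller stays in its original PID namespace; children forked afterwards are
 * created in the new one, and the first such child is PID 1 there.
 *
 * Returns 1 if the forked child saw itself as PID 1, 0 if it did not,
 * and -1 if unshare() or fork() failed (e.g. EPERM without CAP_SYS_ADMIN).
 */
int first_child_is_pid1(void)
{
    if (unshare(CLONE_NEWPID) != 0)
        return -1;

    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Inside the new namespace: report what we see via the exit status. */
        _exit(getpid() == 1 ? 0 : 1);
    }

    int status = 0;
    if (waitpid(child, &status, 0) != child)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : 0;
}
```

One consequence worth keeping in mind: whichever process calls unshare() stays where it was, so the process you actually want isolated has to be created by a fork() after the call. And once that PID 1 process exits, the kernel won't allow any more processes to be created in the namespace.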
Writing a reverse proxy/loadbalancer from the ground up in C, part 4: Dealing with slow writes to the network
This is the fourth step along my road to building a simple C-based reverse proxy/loadbalancer, rsp, so that I can understand how nginx/OpenResty works -- more background here.
Here are links to the first part, where I showed the basic networking code required to write a proxy that could handle one incoming connection at a time and connect it with a single backend, to the second part, where I added the code to handle multiple connections by using epoll, and to the third part, where I started using Lua to configure the proxy.
This post was unplanned; it shows how I fixed a bug that I discovered when I first tried to use rsp to act as a reverse proxy in front of this blog. The bug is fixed, and you're now reading this via rsp [update, later: sadly, no longer true].
The problem was that when the connection from a browser to the proxy was slower than the connection from the proxy to the backend (that is, most of the time), then when we received new data from the backend and tried to send it to the client, we sometimes got an error telling us that the client was not ready to accept it. That error was being ignored, so a block of data would be skipped, and the pages you got back would be missing chunks.
There's more about the bug here.
This post describes the fix.
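The general shape of that kind of fix can be sketched in C. This is a minimal illustration under my own assumptions, not rsp's actual code: struct pending_output, pending_append and pending_flush are hypothetical names. The idea is that bytes the client socket won't accept yet are queued rather than dropped, and the queue is flushed again when epoll reports the socket as writable (EPOLLOUT).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Put a file descriptor into non-blocking mode. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical per-connection queue of bytes the peer hasn't accepted yet. */
struct pending_output {
    char *data;
    size_t len;
};

/* Queue n bytes after whatever is already waiting. Returns 0, or -1 on OOM. */
int pending_append(struct pending_output *p, const char *buf, size_t n)
{
    assert(p != NULL);
    if (n == 0)
        return 0;
    char *grown = realloc(p->data, p->len + n);
    if (grown == NULL)
        return -1;
    memcpy(grown + p->len, buf, n);
    p->data = grown;
    p->len += n;
    return 0;
}

/*
 * Write as much queued data as the fd will take right now.
 * Returns 1 when the queue is empty, 0 when the fd would block (wait for
 * EPOLLOUT and call again), or -1 on a real error.
 */
int pending_flush(struct pending_output *p, int fd)
{
    size_t written = 0;
    while (written < p->len) {
        ssize_t rc = write(fd, p->data + written, p->len - written);
        if (rc > 0)
            written += (size_t)rc;
        else if (rc < 0 && errno == EINTR)
            continue;
        else if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;
        else
            return -1;
    }
    /* Keep only the bytes that haven't gone out yet. */
    if (written > 0) {
        memmove(p->data, p->data + written, p->len - written);
        p->len -= written;
    }
    return p->len == 0 ? 1 : 0;
}
```

The design point that matters most is ordering: while anything is still queued, new data from the backend has to be appended to the queue rather than written directly, or it could overtake the earlier bytes. One reasonable policy (again, an assumption) is to switch the client's epoll registration to EPOLLOUT with EPOLL_CTL_MOD when pending_flush returns 0, and to stop reading from the backend until it returns 1, so the queue can't grow without bound.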
Writing a reverse proxy/loadbalancer from the ground up in C, pause to regroup: fixed it!
It took a bit of work, but the bug is fixed: rsp now correctly handles the case when it can't write as much as it wants to the client side. I think this is enough for it to properly work as a front-end for this website, so it's installed and running here. If you're reading this (and I've not had to switch it off in the meantime) then the pages you're reading were served over rsp. Which is very pleasing :-)
The code needs a bit of refactoring before I can present it, and the same bug still exists on the communicating-to-backends side (which is one of the reasons it needs refactoring -- this is something I should have been able to fix in one place only) so I'll do that over the coming days, and then do another post.
Writing a reverse proxy/loadbalancer from the ground up in C, pause to regroup: non-blocking output
Before moving on to the next step in my from-scratch reverse proxy, I thought it would be nice to install it on the machine where this blog runs, and proxy all access to the blog through it. It would be useful dogfooding and might show any non-obvious errors in the code. And it did.
I found that while short pages were served up perfectly well, longer pages were corrupted and interrupted halfway through. Using curl gave various weird errors, eg.
curl: (56) Problem (3) in the Chunked-Encoded data
...which is a general error saying that it's receiving chunked data and the chunking is invalid.
Doubly strangely, these problems didn't happen when I ran the proxy on the machine where I'm developing it and got it to proxy the blog; only when I ran it on the same machine as the blog. They're different versions of Ubuntu, the blog server being slightly older, but not drastically so -- and none of the stuff I'm using is that new, so it seemed unlikely to be a bug in the blog server's OS. And anyway, select isn't broken.
After a ton of debugging with printfs here, there and everywhere, I tracked it down.
Writing a reverse proxy/loadbalancer from the ground up in C, part 3: Lua-based configuration
This is the third step along my road to building a simple C-based reverse proxy/loadbalancer so that I can understand how nginx/OpenResty works -- more background here. Here are links to the first part, where I showed the basic networking code required to write a proxy that could handle one incoming connection at a time and connect it with a single backend, and to the second part, where I added the code to handle multiple connections by using epoll.
This post is much shorter than the last one. I wanted to make the minimum changes to introduce some Lua-based scripting -- specifically, I wanted to keep the same proxy with the same behaviour, and just move the stuff that was being configured via command-line parameters into a Lua script, so that just the name of that script would be specified on the command line. It was really easy :-) -- but obviously I may have got it wrong, so as ever, any comments and corrections would be much appreciated.
Writing a reverse proxy/loadbalancer from the ground up in C, part 2: handling multiple connections with epoll
This is the second step along my road to building a simple C-based reverse proxy/loadbalancer so that I can understand how nginx/OpenResty works -- more background here. Here's a link to the first part, where I showed the basic networking code required to write a proxy that could handle one incoming connection at a time and connect it with a single backend.
This (rather long) post describes a version that uses Linux's epoll API to handle multiple simultaneous connections -- but it still just sends all of them down to the same backend server. I've tested it using the Apache ab server benchmarking tool, and over a million requests, 100 running concurrently, it adds about 0.1ms to the average request time as compared to a direct connection to the web server, which is pretty good going at this early stage. It also doesn't appear to leak memory, which is doubly good going for someone who's not coded in C since the late 90s. I'm pretty sure it's not totally stupid code, though obviously comments and corrections would be much appreciated!
[UPDATE: there's definitely one bug in this version -- it doesn't gracefully handle cases when we can't send data to the client as fast as we're receiving it from the backend. More info here.]
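As a hedged sketch of a pattern commonly used by epoll-based servers (the names struct event_handler, add_handler, run_once and count_readable are mine, not necessarily rsp's), each watched file descriptor can be registered together with a pointer to a handler record, and the event loop simply calls whichever handler epoll hands back:

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-fd handler record: epoll returns whatever pointer was
   registered, so one loop can dispatch to listener, client and backend
   code alike. */
struct event_handler {
    int fd;
    void (*handle)(struct event_handler *self, uint32_t events);
    void *closure;
};

/* Ask epoll to watch h->fd for the given events. Returns 0 or -1. */
int add_handler(int epoll_fd, struct event_handler *h, uint32_t events)
{
    assert(h != NULL);
    struct epoll_event ev = { 0 };
    ev.events = events;
    ev.data.ptr = h;
    return epoll_ctl(epoll_fd, EPOLL_CTL_ADD, h->fd, &ev);
}

/* One pass of the event loop: returns the number of events dispatched,
   0 on timeout, or -1 on error. A real server would call this forever. */
int run_once(int epoll_fd, int timeout_ms)
{
    struct epoll_event events[64];
    int n = epoll_wait(epoll_fd, events, 64, timeout_ms);
    for (int i = 0; i < n; i++) {
        struct event_handler *h = events[i].data.ptr;
        h->handle(h, events[i].events);
    }
    return n;
}

/* Example handler: counts "readable" notifications in an int closure. */
void count_readable(struct event_handler *self, uint32_t events)
{
    if (events & EPOLLIN)
        ++*(int *)self->closure;
}
```

Note that epoll is level-triggered unless you pass EPOLLET, so a handler that doesn't consume all the available data will simply be called again on the next pass of the loop.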
Writing a reverse proxy/loadbalancer from the ground up in C, part 1: a trivial single-threaded proxy
This is the first step along my road to building a simple C-based reverse proxy/loadbalancer so that I can understand how nginx/OpenResty works -- more explanation here.
It's called rsp, for Really Simple Proxy. This version listens for connections on a particular port, specified on the command line; when one is made it sends the request down to a backend -- another server with an associated port, also specified on the command line -- and sends whatever comes back from the backend back to the person who made the original connection. It can only handle one connection at a time -- while it's handling one, it just queues up others, and it handles them in turn. This will, of course, change later.
I'm posting this in the hope that it might help people who know Python, and some basic C, but want to learn more about how the OS-level networking stuff works. I'm also vaguely hoping that any readers who code in C day to day might take a look and tell me what I'm doing wrong :-)
Writing a reverse proxy/loadbalancer from the ground up in C, part 0: introduction
We're spending a lot of time on nginx configuration at PythonAnywhere. We're a platform-as-a-service, and a lot of people host their websites with us, so it's important that we have a reliable load-balancer to receive all of the incoming web traffic and appropriately distribute it around backend web-server nodes.
nginx is a fantastic, possibly unbeatable tool for this. It's fast, reliable, and lightweight in terms of CPU resources. We're using the OpenResty variant of it, which adds a number of useful modules -- most importantly for us, one for Lua scripting, which means that we can dynamically work out where to send traffic as the hits come in.
It's also quite simple to configure at a basic level. You want all incoming requests for site X to go to backend Y? Just write something like this:
    server {
        server_name X;
        listen 80;
        location / {
            proxy_set_header Host $host;
            proxy_pass Y;
        }
    }
Simple enough. Lua scripting is pretty easy to add -- you just put an extra directive before the proxy_pass that provides some Lua code to run, and then variables you set in the code can be accessed from the proxy_pass.
But there are many more complicated options. worker_connections, tcp_nopush, sendfile, types_hash_max_size... Some are reasonably easy to understand with a certain amount of reading, some are harder.
I'm a big believer that the best way to understand something complex is to try to build your own simple version of it. So, in my copious free time, I'm going to start putting together a simple loadbalancer in C. The aim isn't to rewrite nginx or OpenResty; it's to write enough equivalent functionality that I can better understand what they are really doing under the hood, in the same way as writing a compiler for a toy language gives you a better understanding of how proper compilers work. I'll get a good grasp on some underlying OS concepts that I have only a vague appreciation of now. It's also going to be quite fun coding in C again. I've not really written any since 1997.
Anyway, I'll document the steps I take here on this blog; partly because there's a faint chance that it might be interesting to other experienced Python programmers whose C is rusty or nonexistent and want to get a view under the hood, but mostly because the best way to be sure you really understand it is to try to explain it to other people.
I hope it'll be interesting!
Here's a link to the first post in the series: Writing a reverse proxy/loadbalancer from the ground up in C, part 1: a trivial single-threaded proxy.
A bit of fun
This week's unofficial meme on the Unofficial Planet Python seems to be to name the programming languages you've learned. Here's Eric Florenzano's list (hat tip) -- it looks like the meme was started by Corey Goldberg -- and here's my list:
- BASIC (an odd ICL dialect, then Spectrum, Commodore, Amstrad, BBC, and QuickBasic)
- Z80 Assembler
- Pascal
- C
- Hypertalk (remember that? Answer 'Are you sure?' with 'Yes' or 'No'. If it is 'Yes' then...)
- Logo
- Prolog
- LISP
- C++
- ML
- Modula-3
- Neil (proprietary, probably still in use at IST)
- Java
- JavaScript
- C#
- Python
Hmm. It looks like I've slowed down. Time to pick up that Erlang tutorial again...