Fun with network namespaces, part 1
Linux has some amazing kernel features to enable containerization. Tools like Docker are built on top of them, and at PythonAnywhere we have built our own virtualization system using them.
One part of these systems that I've not spent much time poking into is network
namespaces. Namespaces are a
general abstraction that allows you to separate out system resources; for example,
if a process is in a mount namespace, then it has its own set of mounted disks
that is separate from those seen by the other processes on a machine -- or if it's in a
process namespace,
then it has its own cordoned-off set of processes visible to it (so,
say, ps auxwf
will just show the ones in its namespace).
As you might expect from that, if you put a process into a network namespace, it will have its own restricted view of what the networking environment looks like -- it won't see the machine's main network interface,
This provides certain advantages when it comes to security, but one that I thought was interesting is that because two processes inside different namespaces would have different networking environments, they could both bind to the same port -- and then could be accessed from outside via port forwarding.
To put that in more concrete terms: my goal was to be able to start up two Flask servers on the same machine, both bound to port 8080 inside their own namespace. I wanted to be able to access one of them from outside by hitting port 6000 on the machine, and the other by hitting port 6001.
Here is a run through how I got that working; it's a lightly-edited set of my "lab notes".
Creating a network namespace and looking inside
The first thing to try is just creating a network namespace, called netns1
:
# ip netns add netns1
Now, you can "go into" the created namespace by using ip netns exec
namespace-name,
so we can run Bash there and then use ip a
to see what network interfaces we have
available:
# ip netns exec netns1 /bin/bash
# ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# exit
(I'll put the ip netns exec
command at the start of all code blocks below if
the block in question needs to be run inside the namespace, even when it's not
necessary, so that it's reasonably clear which commands are to be run inside
and which are not.)
So, we have a new namespace, and when we're inside it, there's only one interface available, a basic loopback interface. We can compare that with what we see with the same command outside:
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
link/ether 0a:d6:01:7e:06:5b brd ff:ff:ff:ff:ff:ff
inet 10.0.0.173/24 brd 10.0.0.255 scope global dynamic ens5
valid_lft 2802sec preferred_lft 2802sec
inet6 fe80::8d6:1ff:fe7e:65b/64 scope link
valid_lft forever preferred_lft forever
There we can see the actual network card attached to the machine, which has the name ens5
.
Getting the loopback interface working
You might have noticed that the details shown for the loopback interface inside the namespace were much shorter, too -- no IPv4 or IPv6 addresses, for example. That's because the interface is down by default. Let's see if we can fix that:
# ip netns exec netns1 /bin/bash
# ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# ping 127.0.0.1
ping: connect: Network is unreachable
# ip link set dev lo up
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
# ping 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.019 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
^C
--- 127.0.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1022ms
rtt min/avg/max/mdev = 0.019/0.023/0.027/0.004 ms
So, we could not ping the loopback when it was down (unsurprisingly) but once we
used the ip link set dev lo up
command, it showed up as configured and was pingable.
Now we have a working loopback interface, but the external network still is down:
# ip netns exec netns1 /bin/bash
# ping 8.8.8.8
ping: connect: Network is unreachable
Again, that makes sense. There's no non-loopback interface, so there's no way to send packets to anywhere but the loopback network.
Virtual network interfaces: connecting the namespace
What we need is some kind of non-loopback network interface inside the namespace.
However, we can't just put the external interface ens5
inside there; an interface
can only be in one namespace at a time, so if we put that one in there, the external
machine would lose networking.
What we need to do is create a virtual network interface. These are created in pairs, and are essentially connected to each other. This command:
# ip link add veth0 type veth peer name veth1
Creates interfaces called veth0
and veth1
. Anything sent to veth0
will
appear on veth1
, and vice versa. It's as if they were two separate ethernet
cards, connected to the same hub (but not to anything else). Having run that
command (outside the network namespace) we can list all of our available interfaces:
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
link/ether 0a:d6:01:7e:06:5b brd ff:ff:ff:ff:ff:ff
inet 10.0.0.173/24 brd 10.0.0.255 scope global dynamic ens5
valid_lft 2375sec preferred_lft 2375sec
inet6 fe80::8d6:1ff:fe7e:65b/64 scope link
valid_lft forever preferred_lft forever
5: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether ce:d5:74:80:65:08 brd ff:ff:ff:ff:ff:ff
6: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 22:55:4e:34:ce:ba brd ff:ff:ff:ff:ff:ff
You can see that they're both there, and are currently down. I read the veth1@veth0
notation as meaning "virtual interface veth1
, which is connected to the virtual
interface veth0
".
We can now move one of them -- veth1
-- into the network namespace netns1
, which means that we have
the interface outside connected to the one inside:
# ip link set veth1 netns netns1
Now, from outside, we see this:
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
link/ether 0a:d6:01:7e:06:5b brd ff:ff:ff:ff:ff:ff
inet 10.0.0.173/24 brd 10.0.0.255 scope global dynamic ens5
valid_lft 2368sec preferred_lft 2368sec
inet6 fe80::8d6:1ff:fe7e:65b/64 scope link
valid_lft forever preferred_lft forever
6: veth0@if5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 22:55:4e:34:ce:ba brd ff:ff:ff:ff:ff:ff link-netns netns1
veth1
has disappeared (and veth0
is now @if5
, which is interesting -- not sure why,
though it seems to make some kind of sense given that veth1
is now inside another
namespace). But anyway, inside, we can see our moved interface:
root@giles-devweb1:~# ip netns exec netns1 /bin/bash
root@giles-devweb1:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
5: veth1@if6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether ce:d5:74:80:65:08 brd ff:ff:ff:ff:ff:ff link-netnsid 0
At this point we have a network interface outside the namespace, which is connected to an interface
inside. However, in order to actually use them, we'll need to bring the interfaces
up and set up routing. The first step is to bring the outside one up; we'll give
it the IP address 192.168.0.1
on the 192.168.0.0/24
subnet (that is, the network
covering all addresses from 192.168.0.0
to 192.168.0.255
)
# ip addr add 192.168.0.1/24 dev veth0
# ip link set dev veth0 up
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
link/ether 0a:d6:01:7e:06:5b brd ff:ff:ff:ff:ff:ff
inet 10.0.0.173/24 brd 10.0.0.255 scope global dynamic ens5
valid_lft 3567sec preferred_lft 3567sec
inet6 fe80::8d6:1ff:fe7e:65b/64 scope link
valid_lft forever preferred_lft forever
6: veth0@if5: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000
link/ether 22:55:4e:34:ce:ba brd ff:ff:ff:ff:ff:ff link-netns netns1
inet 192.168.0.1/24 scope global veth0
valid_lft forever preferred_lft forever
So that's all looking good; it reports "no carrier" at the moment, of course, because there's
nothing at the other end yet. Let's go into the namespace and sort that out by
bringing it up on 192.168.0.2
on the same network:
# ip netns exec netns1 /bin/bash
# ip addr add 192.168.0.2/24 dev veth1
# ip link set dev veth1 up
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
5: veth1@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether ce:d5:74:80:65:08 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 192.168.0.2/24 scope global veth1
valid_lft forever preferred_lft forever
inet6 fe80::ccd5:74ff:fe80:6508/64 scope link tentative
valid_lft forever preferred_lft forever
Now, let's try pinging from inside the namespace to the outside interface:
# ip netns exec netns1 /bin/bash
# ping 192.168.0.1
PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
64 bytes from 192.168.0.1: icmp_seq=1 ttl=64 time=0.069 ms
64 bytes from 192.168.0.1: icmp_seq=2 ttl=64 time=0.042 ms
^C
--- 192.168.0.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1024ms
rtt min/avg/max/mdev = 0.042/0.055/0.069/0.013 ms
And from outside to the inside:
# ping 192.168.0.2
PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.
64 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=0.039 ms
64 bytes from 192.168.0.2: icmp_seq=2 ttl=64 time=0.043 ms
^C
--- 192.168.0.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1018ms
rtt min/avg/max/mdev = 0.039/0.041/0.043/0.002 ms
Great!
However, of course, it's still not routed -- from inside the interface, we still can't ping Google's DNS server:
# ip netns exec netns1 /bin/bash
# ping 8.8.8.8
ping: connect: Network is unreachable
Connecting the namespace to the outside world with NAT
We need to somehow connect the network defined by our pair of virtual interfaces to the one that is accessed via our real hardware network interface, either by setting up bridging or NAT. I'm running this experiment on a machine on AWS, and I'm not sure how well that would work with bridging (my guess is, really badly), so let's go with NAT.
First we tell the network stack inside the namespace to route everything via the machine
at the other end of the connection defined by its internal veth1
IP address:
# ip netns exec netns1 /bin/bash
# ip route add default via 192.168.0.1
# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
^C
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2028ms
Note that the address we specify in the ip route add default
command is the one for the end of the virtual
interface pair that is outside the process namespace, which makes sense -- we're
saying "this other machine is our router". The first time I tried this I put the
address of the interface inside the namespace there, which obviously didn't work,
as it was trying to send packets to itself for routing.
So now the networking stack inside the namespace knows where to route stuff, which
is why it no longer says "Network is unreachable", but
of course there's nothing on the other side to send it onwards, so our ping packets
are getting dropped on the floor. We need to use
iptables
to set up that side of things outside the namespace.
The first step is to tell the host that it can route stuff:
# cat /proc/sys/net/ipv4/ip_forward
0
# echo 1 > /proc/sys/net/ipv4/ip_forward
# cat /proc/sys/net/ipv4/ip_forward
1
Now that we're forwarding packets, we want to make sure that we're not just forwarding them willy-nilly around the network. If we check the current rules in the FORWARD chain (in the default "filter" table):
# iptables -L FORWARD
Chain FORWARD (policy ACCEPT)
target prot opt source destination
#
We see that the default is ACCEPT, so we'll change that to DROP:
# iptables -P FORWARD DROP
# iptables -L FORWARD
Chain FORWARD (policy DROP)
target prot opt source destination
#
OK, now we want to make some changes to the nat
iptable so that we have routing.
Let's see what we have first:
# iptables -t nat -L
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
DOCKER all -- anywhere anywhere ADDRTYPE match dst-type LOCAL
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
DOCKER all -- anywhere !localhost/8 ADDRTYPE match dst-type LOCAL
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
MASQUERADE all -- ip-172-17-0-0.ec2.internal/16 anywhere
Chain DOCKER (2 references)
target prot opt source destination
RETURN all -- anywhere anywhere
#
I have Docker installed on the machine already, and it's got some of its own NAT-based routing configured there. I don't think there's any harm in leaving that there; it's on a different subnet to the one I chose for my own stuff.
So, firstly, we'll enable masquerading from the 192.168.0.*
network onto our
main ethernet interface ens5
:
# iptables -t nat -A POSTROUTING -s 192.168.0.0/255.255.255.0 -o ens5 -j MASQUERADE
Now we'll say that we'll forward stuff that comes in on ens5
can be forwarded
to our veth0
interface, which you'll remember is the end of the virtual network
pair that is outside the namespace:
# iptables -A FORWARD -i ens5 -o veth0 -j ACCEPT
...and then the routing in the other direction:
# iptables -A FORWARD -o ens5 -i veth0 -j ACCEPT
Now, let's see what happens if we try to ping from inside the namespace
# ip netns exec netns1 /bin/bash
# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=112 time=0.604 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=112 time=0.609 ms
^C
--- 8.8.8.8 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1003ms
rtt min/avg/max/mdev = 0.604/0.606/0.609/0.002 ms
w00t!
Running a server with port-forwarding
Right, now we have a network namespace where we can operate as a network client -- processes running inside it can access the external Internet.
However, we don't have things working the other way around; we cannot run a server inside the namespace and access it from outside. For that, we need to configure port-forwarding. I'm not perfectly clear in my own mind exactly how this all works; take my explanations below with a cellar of salt...
We use the "Destination NAT" chain in iptables:
# iptables -t nat -A PREROUTING -p tcp -i ens5 --dport 6000 -j DNAT --to-destination 192.168.0.2:8080
Or, in other words, if something comes in for port 6000
then we should sent it on
to port 8080 on the interface at 192.168.0.2
(which is the end of the virtual
interface pair that is inside the namespace).
Next, we say that we're happy to forward stuff back and forth over new, established and related (not sure what that last one is) connections to the IP of our namespaced interface:
# iptables -A FORWARD -p tcp -d 192.168.0.2 --dport 8080 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
So, with that set up, we should be able to run a server inside the namespace on
port 8080. Using this Python code in the file server.py
from flask import Flask
app = Flask(__name__)
@app.route('/')
def hello_world():
return 'Hello from Flask!\n'
if __name__ == "__main__":
app.run(host='0.0.0.0', port=8080)
...then we run it:
# ip netns exec netns1 /bin/bash
# python3.7 server.py
* Serving Flask app "server" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://0.0.0.0:8080/ (Press CTRL+C to quit)
...and from a completely separate machine on the same network as the one where
we're running the server, we curl
it using the machine's external IP address,
on port 6000
:
$ curl http://10.0.0.233:6000/
Hello from Flask!
$
Yay! We've successfully got a Flask server running inside a network namespace, with routing from the external network.
Running a second server in a separate namespace
Now we can set up the second server in its own namespace. Leaving the existing Flask running in the session where we started it just now, we can run through all of the steps above at speed in another:
# ip netns add netns2
# ip link add veth2 type veth peer name veth3
# ip link set veth3 netns netns2
# ip addr add 192.168.1.1/24 dev veth2
# ip link set dev veth2 up
# iptables -t nat -A POSTROUTING -s 192.168.1.0/255.255.255.0 -o ens5 -j MASQUERADE
# iptables -A FORWARD -i ens5 -o veth2 -j ACCEPT
# iptables -A FORWARD -o ens5 -i veth2 -j ACCEPT
# iptables -t nat -A PREROUTING -p tcp -i ens5 --dport 6001 -j DNAT --to-destination 192.168.1.2:8080
# iptables -A FORWARD -p tcp -d 192.168.1.2 --dport 8080 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
# ip netns exec netns2 /bin/bash
# ip link set dev lo up
# ip addr add 192.168.1.2/24 dev veth3
# ip link set dev veth3 up
# ip route add default via 192.168.1.1
# python3.7 server2.py
* Serving Flask app "server2" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://0.0.0.0:8080/ (Press CTRL+C to quit)
...where server2.py
is the same Python code, modified to return a slightly different
string. Now, from our other server:
$ curl http://10.0.0.233:6000/
Hello from Flask!
$ curl http://10.0.0.233:6001/
Hello from the other Flask!
$
And we're done :-) We have two separate servers on the same machine, both of
which are bound to port 8080
inside their own network namespace, with routing
set up so they can connect out to the external network, and port-forwarding
so that they can be accessed from outside.
Summing up
This was easier than I thought it was going to be; the only part of the networking that is still fuzzy in my own mind is the port-forwarding, and I think I just need to read up a bit on exactly how that works.
It was made a lot easier by various other tutorials on network namespaces that I found around the Internet. Some shout-outs:
- I felt this LWN post summarised the basics really well.
- Ivan Sim's article on itnext.io had some good hints and tips.
- As did Diego Pino García's at Unweaving the Web.
- ...and Scott Lowes's blog was also useful.
- My port-forwarding setup was a good example of full-stack-overflow programming.
- The
iptables
man page clarified a bunch of stuff for me.
Many thanks to all of them!