I started playing with docker a while ago, and like most people I was instantly impressed with its power and ease of use. Simplicity is one of docker's core tenets, and much of docker's power is abstracted behind simple CLI commands. As I was learning to use docker, I wanted to know what it was doing in the background to make things happen, especially around networking (one of my primary areas of interest).
I found plenty of documentation on how to create and manipulate container networks, but much less on how docker actually makes container networking work. Docker makes extensive use of linux iptables and bridge interfaces, and this post is my summary of how those are used to create container networks. Most of this information came from github discussion threads, presentations, and my own testing, and I link to a number of helpful resources at the end of this post.
I used docker 1.12.3 for the examples in this post. This is not meant to be a comprehensive description of docker networking, nor an introduction to it. I hope it adds some insight for users, and I would appreciate any feedback or comments on errors or omissions.
Contents
- Docker Networks Overview
- Docker Bridge Networks
- Summary
- Links/Resources
- Part II: Container Connectivity in Docker Swarm and Overlay Networks
Docker Networks Overview
Docker’s networking is built on top of the Container Network Model (CNM) which allows any party to write their own network driver. This allows for different network types to be available to containers running on the docker engine, and containers can connect to more than one network type at the same time. In addition to the various third party network drivers available, docker comes with four built-in network drivers:
- Bridge: This is the default network that containers are launched in. Connectivity is facilitated through a bridge interface on the docker host. Each bridge network has its own subnet, and containers attached to it can communicate with each other (by default).
- Host: This driver gives a container access to the docker host's own network namespace (the container sees and uses the same interfaces as the docker host).
- Macvlan: This driver allows containers to have direct access to an interface or subinterface (vlan) of the host. It also allows trunking.
- Overlay: This driver allows networks to be built across multiple hosts running docker (usually a docker swarm cluster). Containers on an overlay network have their own subnet and network addresses, and can communicate with each other directly even when they run on different physical hosts.
Bridge and overlay networks are probably the most commonly used network drivers, and I will be mostly concerned with these two drivers in this article and the next.
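As a quick illustration of how a network is selected (a minimal sketch; the container names and the use of sleep to keep the containers running are arbitrary choices, not from the original examples), the network a container joins is chosen with the --net flag at run time:

$ docker run -d --name test-bridge --net bridge ubuntu:14.04 sleep 3600
$ docker run -d --name test-host --net host ubuntu:14.04 sleep 3600

The first container gets its own network namespace on the default bridge network, while the second shares the docker host's interfaces directly.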
Docker Bridge Networks
The default network for containers running on a docker host is a bridge network. Docker creates a default bridge network named ‘bridge’ when first installed. We can see this network by listing all networks with docker network ls:
$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
3e8110efa04a        bridge              bridge              local
bb3cd79b9236        docker_gwbridge     bridge              local
22849c4d1c3a        host                host                local
3kuba8yq3c27        ingress             overlay             swarm
ecbd1c6c193a        none                null                local
To inspect its properties run docker network inspect bridge:
$ docker network inspect bridge
[
    {
        "Name": "bridge",
        "Id": "3e8110efa04a1eb0923d863af719abf5eac871dbac4ae74f133894b8df4b9f5f",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.18.0.0/16",
                    "Gateway": "172.18.0.1"
                }
            ]
        },
        "Internal": false,
        "Containers": {},
        "Options": {
            "com.docker.network.bridge.default_bridge": "true",
            "com.docker.network.bridge.enable_icc": "true",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
            "com.docker.network.bridge.name": "docker0",
            "com.docker.network.driver.mtu": "1500"
        },
        "Labels": {}
    }
]
You can also create your own bridge networks with the docker network create command by specifying --driver bridge. For example, docker network create --driver bridge --subnet 192.168.100.0/24 --ip-range 192.168.100.0/24 my-bridge-network creates another bridge network named ‘my-bridge-network’ with the subnet 192.168.100.0/24.
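To put containers on this new network, you can either launch them directly on it with the --net flag or attach an already-running container to it with docker network connect. A minimal sketch (the container names c1 and c2 are hypothetical):

$ docker run -dti --net my-bridge-network --name c1 ubuntu:14.04
$ docker network connect my-bridge-network c2

The first command starts a new container on my-bridge-network, while the second attaches an existing container named c2 to it; a container can be connected to several networks at the same time.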
Linux bridge interfaces
Each bridge network that docker creates is represented by a bridge interface on the docker host. The default bridge network ‘bridge’ usually has the interface docker0 associated with it, and each subsequent bridge network that is created with the docker network create command will have a new interface associated with it.
$ ifconfig docker0
docker0   Link encap:Ethernet  HWaddr 02:42:44:88:bd:75
          inet addr:172.18.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
To find the linux interface associated with a docker network you created, you can use ifconfig to list all interfaces and then find the one with the subnet you specified. For example, to look up the bridge interface for my-bridge-network, which we created above, we can run the following:
$ ifconfig | grep 192.168.100. -B 1
br-e6bc7d6b75f3 Link encap:Ethernet  HWaddr 02:42:bc:f1:91:09
          inet addr:192.168.100.1  Bcast:0.0.0.0  Mask:255.255.255.0
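Alternatively, for user-created networks the interface name is normally ‘br-’ followed by the first 12 characters of the network ID, so you can derive it directly from the network itself. A small sketch, assuming this default naming (it does not apply if com.docker.network.bridge.name was set explicitly when the network was created):

$ docker network inspect -f '{{.Id}}' my-bridge-network | cut -c 1-12
e6bc7d6b75f3

In this case the corresponding interface would be br-e6bc7d6b75f3, matching the ifconfig output above.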
Linux bridge interfaces function much like switches: they connect different interfaces on the same subnet and forward traffic based on MAC addresses. As we will see below, each container connected to a bridge network has its own virtual interface created on the docker host, and the docker engine connects all containers in the same network to the same bridge interface, which allows them to communicate with each other. You can get more details about the status of a bridge by using the brctl utility:
$ brctl show docker0
bridge name     bridge id               STP enabled     interfaces
docker0         8000.02424488bd75       no
Once we have containers running and connected to this network, we will see each container's interface listed under the interfaces column, and running a traffic capture on the bridge interface will let us see the communication between containers on the same subnet.
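If you want to see which MAC addresses the bridge has learned on each of its ports (useful later when tracing which veth belongs to which container), brctl can also dump the bridge's forwarding table. A quick sketch, assuming the bridge-utils package is installed on the docker host:

$ sudo brctl showmacs docker0

Entries whose "is local?" column shows "no" are MAC addresses learned from the attached container interfaces.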
Linux virtual interfaces (veth)
The Container Network Model (CNM) gives each container its own network namespace. Running ifconfig from inside the container will show its interfaces as the container sees them:
$ docker run -ti ubuntu:14.04 /bin/bash
root@6622112b507c:/# ifconfig
eth0      Link encap:Ethernet  HWaddr 02:42:ac:12:00:02
          inet addr:172.18.0.2  Bcast:0.0.0.0  Mask:255.255.0.0
          inet6 addr: fe80::42:acff:fe12:2/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:9 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:766 (766.0 B)  TX bytes:508 (508.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
However, the eth0 seen above exists only inside that container. On the docker host, docker creates a twin virtual interface that corresponds to it and acts as its link to the outside world. These virtual interfaces are then connected to the bridge interfaces discussed above to provide connectivity between containers on the same subnet.
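To list just the host-side halves of these virtual interface pairs, you can filter by interface type with iproute2 (a small sketch; it assumes a reasonably recent iproute2 on the docker host):

$ ip link show type veth

Each veth* entry listed is the host end of a container interface, and its "master" field shows which bridge it is attached to.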
We can review this process by starting two containers connected to the default bridge network, and then view the interface configuration on the docker host.
Before starting any containers, the docker0 bridge interface has no interfaces attached:
$ sudo brctl show docker0
bridge name     bridge id               STP enabled     interfaces
docker0         8000.02424488bd75       no
I then started two containers from the ubuntu:14.04 image:
$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
a754719db594        ubuntu:14.04        "/bin/bash"         5 seconds ago       Up 4 seconds                            zen_kalam
976041ec420f        ubuntu:14.04        "/bin/bash"         7 seconds ago       Up 5 seconds                            stupefied_easley
You can immediately see that there are now two interfaces attached to the docker0 bridge interface (one for each container):
$ sudo brctl show docker0
bridge name     bridge id               STP enabled     interfaces
docker0         8000.02424488bd75       no              veth2177159
                                                        vethd8e05dd
Starting a ping to google.com from one of the containers and then capturing traffic on that container's virtual interface from the docker host will show us the container's traffic:
$ docker exec a754719db594 ping google.com
PING google.com (216.58.217.110) 56(84) bytes of data.
64 bytes from iad23s42-in-f110.1e100.net (216.58.217.110): icmp_seq=1 ttl=48 time=0.849 ms
64 bytes from iad23s42-in-f110.1e100.net (216.58.217.110): icmp_seq=2 ttl=48 time=0.965 ms

ubuntu@swarm02:~$ sudo tcpdump -i veth2177159 icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on veth2177159, link-type EN10MB (Ethernet), capture size 262144 bytes
20:47:12.170815 IP 172.18.0.3 > iad23s42-in-f14.1e100.net: ICMP echo request, id 14, seq 55, length 64
20:47:12.171654 IP iad23s42-in-f14.1e100.net > 172.18.0.3: ICMP echo reply, id 14, seq 55, length 64
20:47:13.170821 IP 172.18.0.3 > iad23s42-in-f14.1e100.net: ICMP echo request, id 14, seq 56, length 64
20:47:13.171694 IP iad23s42-in-f14.1e100.net > 172.18.0.3: ICMP echo reply, id 14, seq 56, length 64
Similarly we can do a ping from one container to another.
First, we need to get the IP address of the container, which we can do either by running ifconfig inside the container or by inspecting the container with the docker inspect command:
$ docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' a754719db594
172.18.0.3
Then start a ping from one container to another:
$ docker exec 976041ec420f ping 172.18.0.3
PING 172.18.0.3 (172.18.0.3) 56(84) bytes of data.
64 bytes from 172.18.0.3: icmp_seq=1 ttl=64 time=0.070 ms
64 bytes from 172.18.0.3: icmp_seq=2 ttl=64 time=0.053 ms
To see this traffic from the docker host, we can capture on either of the virtual interfaces corresponding to the containers, or we can capture on the bridge interface (docker0 in this instance), which shows all inter-container communication on that subnet:
$ sudo tcpdump -ni docker0 host 172.18.0.2 and host 172.18.0.3
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on docker0, link-type EN10MB (Ethernet), capture size 262144 bytes
20:55:37.990831 IP 172.18.0.2 > 172.18.0.3: ICMP echo request, id 14, seq 200, length 64
20:55:37.990865 IP 172.18.0.3 > 172.18.0.2: ICMP echo reply, id 14, seq 200, length 64
20:55:38.990828 IP 172.18.0.2 > 172.18.0.3: ICMP echo request, id 14, seq 201, length 64
20:55:38.990866 IP 172.18.0.3 > 172.18.0.2: ICMP echo reply, id 14, seq 201, length 64
Locate a container’s veth interface
There is no straightforward way to find which veth interface on the docker host is linked to the interface inside a given container, but several methods are discussed in various docker forum and github threads. The easiest, in my opinion, is the following (based on the solution in this thread, with a slight modification), which depends on ethtool being accessible in the container:
For example, I have three containers running on my system:
$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
ccbf97c72bf5        ubuntu:14.04        "/bin/bash"         3 seconds ago       Up 3 seconds                            admiring_torvalds
77d9f02d61f2        ubuntu:14.04        "/bin/bash"         4 seconds ago       Up 4 seconds                            goofy_borg
19743c0ddf24        ubuntu:14.04        "/bin/sh"           8 minutes ago       Up 8 minutes                            high_engelbart
First, I execute the following in the container to get the peer_ifindex number:
$ docker exec 77d9f02d61f2 sudo ethtool -S eth0
NIC statistics:
     peer_ifindex: 16
Then on the docker host, I use the peer_ifindex to find the interface name:
$ sudo ip link | grep 16
16: veth7bd3604@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP mode DEFAULT group default
So the interface name in this case is veth7bd3604.
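If ethtool is not available inside the container, the same peer index can usually be read from /sys instead, since the two halves of a veth pair point at each other through their iflink/ifindex values. A sketch using the same container and interface as the example above:

$ docker exec 77d9f02d61f2 cat /sys/class/net/eth0/iflink
16
$ grep -l '^16$' /sys/class/net/veth*/ifindex
/sys/class/net/veth7bd3604/ifindex

The first command prints the peer interface index from inside the container, and the second finds the host-side veth whose ifindex matches it.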
iptables
Docker uses linux iptables to control communication to and from the interfaces and networks it creates. Linux iptables consists of different tables, but we are primarily concerned with two of them: filter and nat. The filter table holds the security rules used to allow or deny traffic to IP addresses, networks or interfaces, whereas the nat table contains the rules responsible for masking IP addresses or ports. Docker uses nat to allow containers on bridge networks to communicate with destinations outside the docker host (otherwise routes pointing to the container subnets would have to be added on the docker host's upstream network).
iptables:filter
Tables in iptables consist of different chains that correspond to different conditions or stages in processing a packet on the docker host. The filter table has three chains by default: the INPUT chain processes packets arriving at the host and destined for the host itself, the OUTPUT chain processes packets originating on the host and bound for an outside destination, and the FORWARD chain processes packets that enter the host but are destined for somewhere else (which is the case for traffic routed to and from containers). Each chain consists of rules that specify an action to take on a packet (for example, accept or reject it) as well as the conditions for matching the rule. Rules are processed in sequence until a match is found; otherwise, the chain's default policy is applied. It is also possible to define custom chains in a table.
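As a concrete (and entirely hypothetical) example of such a rule, the following would block the first container from our earlier example (172.18.0.3) from reaching anything outside its own bridge network, while leaving traffic to other containers on docker0 untouched:

$ sudo iptables -I FORWARD -i docker0 ! -o docker0 -s 172.18.0.3 -j DROP

The -I flag inserts the rule at the top of the FORWARD chain so it is evaluated before docker's own ACCEPT rules; this is just an illustration of how rules, conditions and actions fit together, not something docker configures itself.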
To view the currently configured rules and default policies for chains in the filter table, run iptables -t filter -L (or iptables -L since the filter table is used by default if no table is specified):
$ sudo iptables -t filter -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:domain
ACCEPT     udp  --  anywhere             anywhere             udp dpt:domain
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:bootps
ACCEPT     udp  --  anywhere             anywhere             udp dpt:bootps

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination
DOCKER-ISOLATION  all  --  anywhere      anywhere
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
DOCKER     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
ACCEPT     all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain DOCKER (3 references)
target     prot opt source               destination

Chain DOCKER-ISOLATION (1 references)
target     prot opt source               destination
DROP       all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere
DROP       all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere
You can see the different chains and the default policy for each chain (custom chains have no default policy). We can also see that docker has added two custom chains, DOCKER and DOCKER-ISOLATION, and has inserted rules into the FORWARD chain that have these two new chains as targets.
Docker-isolation chain
The DOCKER-ISOLATION chain contains rules that restrict access between the different container networks. To see more detail, use the -v option when running iptables:
$ sudo iptables -t filter -L -v
....
Chain DOCKER-ISOLATION (1 references)
 pkts bytes target     prot opt in               out              source               destination
    0     0 DROP       all  --  br-e6bc7d6b75f3  docker0          anywhere             anywhere
    0     0 DROP       all  --  docker0          br-e6bc7d6b75f3  anywhere             anywhere
    0     0 DROP       all  --  docker_gwbridge  docker0          anywhere             anywhere
    0     0 DROP       all  --  docker0          docker_gwbridge  anywhere             anywhere
    0     0 DROP       all  --  docker_gwbridge  br-e6bc7d6b75f3  anywhere             anywhere
    0     0 DROP       all  --  br-e6bc7d6b75f3  docker_gwbridge  anywhere             anywhere
36991 3107K RETURN     all  --  any              any              anywhere             anywhere
You can see above a number of DROP rules blocking traffic between every pair of bridge interfaces created by docker, which ensures that the different container networks cannot communicate with each other.
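If two containers on different bridge networks do need to talk to each other, the supported approach is not to edit these rules but to attach one of the containers to the second network as well, for example (the container name c1 is hypothetical):

$ docker network connect my-bridge-network c1

The container then gets an additional interface and address on my-bridge-network, so traffic between the two containers never has to cross the isolation rules.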
icc=false
One of the options that can be passed to the docker network create command is com.docker.network.bridge.enable_icc, which stands for inter-container communication. Setting this option to false blocks containers on the same network from communicating with each other. This is done by adding a DROP rule to the FORWARD chain that matches packets coming from the bridge interface associated with the network and destined for that same interface.
For example, if we create a new network with the command docker network create --driver bridge --subnet 192.168.200.0/24 --ip-range 192.168.200.0/24 -o "com.docker.network.bridge.enable_icc"="false" no-icc-network, we can see the new bridge interface and the corresponding DROP rule at the end of the FORWARD chain:
$ ifconfig | grep 192.168.200 -B 1
br-8e3f0d353353 Link encap:Ethernet  HWaddr 02:42:c4:6b:f1:40
          inet addr:192.168.200.1  Bcast:0.0.0.0  Mask:255.255.255.0

$ sudo iptables -t filter -S FORWARD
-P FORWARD ACCEPT
-A FORWARD -j DOCKER-ISOLATION
-A FORWARD -o br-8e3f0d353353 -j DOCKER
-A FORWARD -o br-8e3f0d353353 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i br-8e3f0d353353 ! -o br-8e3f0d353353 -j ACCEPT
-A FORWARD -o docker0 -j DOCKER
-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
-A FORWARD -i docker0 -o docker0 -j ACCEPT
-A FORWARD -o br-e6bc7d6b75f3 -j DOCKER
-A FORWARD -o br-e6bc7d6b75f3 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i br-e6bc7d6b75f3 ! -o br-e6bc7d6b75f3 -j ACCEPT
-A FORWARD -i br-e6bc7d6b75f3 -o br-e6bc7d6b75f3 -j ACCEPT
-A FORWARD -o docker_gwbridge -j DOCKER
-A FORWARD -o docker_gwbridge -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i docker_gwbridge ! -o docker_gwbridge -j ACCEPT
-A FORWARD -o lxcbr0 -j ACCEPT
-A FORWARD -i lxcbr0 -j ACCEPT
-A FORWARD -i docker_gwbridge -o docker_gwbridge -j DROP
-A FORWARD -i br-8e3f0d353353 -o br-8e3f0d353353 -j DROP
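To confirm the effect, you can start two containers on this network and try a ping between them; with icc disabled the ping should get no replies. A quick sketch (the container names are arbitrary, and you should substitute the address actually assigned to the second container, which you can look up with docker inspect as shown earlier):

$ docker run -dti --net no-icc-network --name icc-test1 ubuntu:14.04
$ docker run -dti --net no-icc-network --name icc-test2 ubuntu:14.04
$ docker exec icc-test1 ping -c 2 192.168.200.3

Both containers can still reach destinations outside the host (the ! -o br-8e3f0d353353 ACCEPT rule is still present); only traffic between containers on the same bridge is dropped.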
iptables:nat
NAT allows the host to change the IP address or port of a packet. In this instance, it is used to mask the source IP address of packets coming from docker bridge networks (for example, hosts in the 172.18.0.0/16 subnet) destined for the outside world, behind the IP address of the docker host. This feature is controlled by the com.docker.network.bridge.enable_ip_masquerade option that can be passed to docker network create (if not specified, it defaults to true).
You can see the effect of this option in the nat table of iptables:
$ sudo iptables -t nat -L
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination
DOCKER     all  --  anywhere             anywhere             ADDRTYPE match dst-type LOCAL

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
DOCKER     all  --  anywhere            !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
MASQUERADE  all  --  172.18.0.0/16       anywhere
MASQUERADE  all  --  192.168.100.0/24    anywhere
MASQUERADE  all  --  172.19.0.0/16       anywhere
MASQUERADE  all  --  10.0.3.0/24        !10.0.3.0/24

Chain DOCKER (2 references)
target     prot opt source               destination
RETURN     all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere
RETURN     all  --  anywhere             anywhere
In the POSTROUTING chain, you can see all the docker bridge networks that were created, with the MASQUERADE action applied to their traffic whenever it is destined for any host outside their own network.
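If you create a network with masquerading turned off, no corresponding MASQUERADE rule is added, and return traffic from outside the host would then need an explicit route back to the container subnet. A minimal sketch (the network name and subnet here are arbitrary, not from the earlier examples):

$ docker network create --driver bridge --subnet 192.168.210.0/24 -o "com.docker.network.bridge.enable_ip_masquerade"="false" no-masq-network

Re-running iptables -t nat -L afterwards should show no new POSTROUTING entry for 192.168.210.0/24.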
Summary
- A bridge network has a corresponding linux bridge interface on the docker host that acts as a layer 2 switch, connecting the different containers on the same subnet.
- Each network interface in a container has a corresponding virtual interface on the docker host that is created while the container is running.
- A traffic capture from the docker host on the bridge interface is equivalent to configuring a SPAN port on a switch in that you can see all inter-container communication on that network.
- A traffic capture from the docker host on a virtual interface (veth*) will show all traffic the container is sending on a particular subnet.
- Linux iptables rules in the filter table are used to block different networks (and sometimes specific hosts within a network) from communicating. These rules are usually added to the DOCKER-ISOLATION chain.
- Containers communicating with the outside world through a bridge interface have their IP hidden behind the docker host’s IP address. This is done by adding rules to the nat table in iptables.
Comments
Comment: Do you know how to block all outgoing traffic? The internal network type blocks incoming traffic as well.
Reply: There are multiple ways to do this. One way would be to remove the iptables rules that allow the bridge networks' outgoing traffic to any destination, which would be the following rules in the FORWARD chain (from the example in the post there were two bridge networks, and each has its own rule):
-A FORWARD -i br-8e3f0d353353 ! -o br-8e3f0d353353 -j ACCEPT
-A FORWARD -i docker0 ! -o docker0 -j ACCEPT
Another option is to add a DROP rule to the DOCKER-ISOLATION chain. If we wanted to do this for the docker0 bridge network, the rule could be added with the following command (make sure the rule ends up above the last RETURN rule in that chain):
sudo iptables -A DOCKER-ISOLATION -i docker0 ! -o docker0 -j DROP
You can of course be more granular and control which destinations are allowed or denied. I hope this helps.
Comment: I think it is useless to add a drop rule to the DOCKER-ISOLATION chain. It's really strange that I can't block my telnet hostip:30004 request no matter where I add the drop rule, in the FORWARD or the DOCKER-ISOLATION chain, like this:
-A FORWARD -i docker0 -o docker0 -p tcp -m tcp --dport 30004 -j DROP
-A DOCKER-ISOLATION -i docker0 -p tcp -m tcp --dport 30004 -j DROP
It does work when I add the drop rule to FORWARD on docker 1.9.1, though.
Comment: Any idea how to pass multicast requests on standard ports? For example, I am running ws-discovery as a docker container, and in this case the discovery requests arrive on port 3702. For the requests to reach the ws-discovery container from the host, what changes, if any, need to be made?