A bit of background and the “old/normal way”

If you use Docker, you very quickly run into a common question: how do you make Docker work across multiple hosts, datacenters, and different clouds? One of the simplest solutions is Docker Swarm. Docker summarizes it best as “native clustering for Docker… [which] allows you to create and access a pool of Docker hosts using the full suite of Docker tools.”

One of the biggest benefits of using Docker Swarm is that it provides the standard Docker API, which means that all of the existing Docker management tools (and third-party products) work out of the box just as they do with a single host. The only difference is that they now scale transparently over multiple hosts.

After reading up on it HERE and HERE, it was evident that this is a pretty simple service, but it wasn’t 100% clear what went where. After searching around the web, I realized that almost all of the tutorials and examples on Docker Swarm involved either docker-machine or very convoluted setups which did not explain what was happening on which component. With that said, here is a very simple Docker Swarm tutorial with some practical examples.

Assuming you have a bunch of servers running docker:
vm01 (10.0.0.101), vm02 (10.0.0.102), vm03 (10.0.0.103), vm04 (10.0.0.104)

Normally, you can run “docker ps” on each host, for example:
ssh vm01 'docker ps'
ssh vm04 'docker ps'

If you enable the API for remote bind on each host, you can manage them all from a central place:
docker -H tcp://vm01:2375 ps
docker -H tcp://vm04:2375 ps
(note: the port is optional, since 2375 is the default)

But if you want to use all of these docker engines as a cluster, you need Swarm.

Docker Swarm Tutorial and How-To/Examples

A swarm contains only two components: agents (the workers in the cluster) and manager(s).

First, grab the swarm image on each docker host:
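For example (a minimal sketch, assuming the official “swarm” image on Docker Hub):

docker pull swarm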

Then, make sure the API is enabled for remote bind on each host (NOTE: see below if using a systemd-based OS):
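On a non-systemd OS, this is typically a daemon option, e.g. in /etc/default/docker (a sketch, assuming the Debian/Ubuntu packaging):

DOCKER_OPTS="-H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock"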

To get everything started, go to whatever docker host you pick as the manager (in our case vm01), and create the swarm:
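A sketch of the create step (the token comes back on stdout):

docker run --rm swarm create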

This will generate a unique token like:
c05c3ef4c4b15821a8e8e2ef6bdf192d

Now, on *each* AGENT (including the manager if you want to use it as a worker) run:
docker run -d swarm join --advertise agentIP:2375 token://c05c3ef4c4b15821a8e8e2ef6bdf192d

You would do this for *each* agent (and in our case, vm01 is also an agent).

Lastly, you need to run a manager service on your chosen manager host (in our case, vm01) to manage the swarm:
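A sketch, reusing the token generated above (see the note on ports below):

docker run -d -p 2376:2375 swarm manage token://c05c3ef4c4b15821a8e8e2ef6bdf192d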

The idea is that the manager provides the standard Docker API on port 2375 inside its container. Since the host’s own Docker daemon is already bound to 2375, we publish the manager on host port 2376. If your manager is NOT an agent, you can simply bind it on 2375 by doing a “run -d -P swarm manage token://…”. In that case, you would NOT run the “swarm join” command on your manager. However, in our case we want all of the hosts to be agents, including the manager.
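For example, the non-agent variant might look like this (a sketch; an explicit -p 2375:2375 is used instead of -P so the host port is deterministic):

docker run -d -p 2375:2375 swarm manage token://c05c3ef4c4b15821a8e8e2ef6bdf192d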

The last step is to query the cluster:
docker -H tcp://managerIP:2376 info

In our case, we use vm01:
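That is:

docker -H tcp://vm01:2376 info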

Again, if your manager is NOT an agent, you would simply run:
“docker -H tcp://managerIP:2375 info” or even “docker -H tcp://managerIP”

Don’t forget to start the manager on reboot, and to re-run the join on each agent on reboot.
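One simple option (a sketch, using Docker’s restart policy rather than an init script, and reusing the token from above):

docker run -d --restart=always -p 2376:2375 swarm manage token://c05c3ef4c4b15821a8e8e2ef6bdf192d
docker run -d --restart=always swarm join --advertise agentIP:2375 token://c05c3ef4c4b15821a8e8e2ef6bdf192d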

35 Thoughts on “Docker Swarm Tutorial and Examples”

  1. Arslan Qadeer on August 3, 2016 at 3:27 pm said:

    Hi,
    The link you mentioned above and this one https://hub.docker.com/r/progrium/consul/ were quite helpful for me. Swarm Mode probably does not allow you to use your own plugins.

    Thanks 🙂

  2. Arslan Qadeer on August 3, 2016 at 2:39 pm said:

    Hi Ventz,

    Thanks for your immediate help. Actually, what I am trying to do is:
    Node1 = Swarm Manager + Consul
    Node2 = Swarm Manager + Consul
    Node3 = Swarm Manager + Consul
    Node4 = Swarm Agent
    Node5 = Swarm Agent

    Apparently I can install Consul and the Manager on the same nodes as mentioned above, but the question is: when installing an Agent, which Consul node do I join it to? Do I join it to just one of the Consul nodes, or do I have to join it to all three separately? Also, how will this work in case of failure, if one Consul/manager node goes down?

    Thanks

  3. Arslan Qadeer on August 3, 2016 at 12:50 pm said:

    Hi,
    I am new to Docker Swarm; I want to deploy it in an HA environment using Consul. What I have in mind is to install Consul on one node, then register one Swarm manager node with Consul, and then create two more replicas. In this case, if one manager goes down, one of the other two will become leader.
    I got this idea from https://docs.docker.com/swarm/multi-manager-setup/
    But what if the consul server goes down?

    • Ventz on August 3, 2016 at 1:36 pm said:

      Hi — The official (https://docs.docker.com/swarm/install-manual/) answer suggests: “To increase its reliability, you can create a high-availability cluster using a trio of consul nodes using the link mentioned at the end of this page.”

      I think that still limits you to the same server/blade.

      With the new Docker 1.12+, you can actually create services on top of swarm and distribute them, have them HA (+VIP).
      The trick here is that Docker WILL start even if it can’t reach the consul service, and then you can start the consul service via docker — and docker will “re-connect” to the consul service.

      See this: https://medium.com/on-docker/toward-a-production-ready-docker-swarm-cluster-with-consul-9ecd36533bb8#.2iew6cdse

      Specifically, part that mentions “But how can this work? I see an apparent reference to Consul but it hasn’t yet been installed!” … “Fortunately, the Docker daemon will happily retry to connect to the cluster-store aka the KVS every so often”

    • Ventz on August 3, 2016 at 1:58 pm said:

      I wanted to add one more comment — with the new Docker 1.12 and up, you don’t even need consul. Service Discovery is actually built in.

      I’ll try to post an updated article, but many things have been improved and fixed. For example, SSL/certs are now fully baked in and automated. Also, it’s much easier to start and join a swarm, and then create services on top of the swarm.

      In summary: upgrade to Docker 1.12, and you will get a distributed service discovery backend automatically. Nodes can auto-elect to be “managers”, so you really can scale out to production.

  4. Hi Ventz,

    Actually, I just use “swarm …” directly, not “docker swarm …”, because in my company it involves complicated network rules, such as firewalls, proxies, etc. So I think using the “swarm” commands directly is enough for me to learn Swarm.

    Thanks again for your help and time!

    • Ventz on June 7, 2016 at 10:32 pm said:

      Great to hear – glad you found the main problem, and even better that you have a working solution!

      It’s absolutely great technology.

  5. Hi Ventz,

    Inspired by your post and help, I successfully deployed a Swarm cluster on one host: http://nanxiao.me/en/deploy-docker-swarm-cluster-on-one-host/, and I hope the experience can help others.

  6. Pingback: Deploy Docker Swarm cluster on one host | Nan Xiao's Blog

  7. ibolcina on June 7, 2016 at 8:04 am said:

    Hi.

    After having set up the cluster on vm01 (10.0.0.101), vm02 (10.0.0.102), vm03 (10.0.0.103), and vm04 (10.0.0.104) as you described in your tutorial, what would be the best way to set up etcd or consul for service discovery and an overlay network?

    I would need to install etcd0 on vm01, then do “docker network create swarm_network”, but what about --cluster-store and --cluster-advertise?
    Documentation is somewhat confusing…

    br,ivan

  8. Ventz on June 6, 2016 at 11:05 am said:

    This eliminates local connections.

    So what’s left is the proxy and the IP docker binds on, which could be the issue. There might be issues with the version of docker and such (try the latest, NOT the one that comes pre-packaged with RHEL).

    Sorry that I can’t be of more help remotely. Try spinning up a Digital Ocean VM (just for pennies per hour) and try everything there just to eliminate your environment.

  9. Ahmed Shendy on June 6, 2016 at 6:51 am said:

    Very nice, thank you 🙂

  10. Hi Ventz,

    Thanks very much for your time! Changing the IP to 127.0.0.1 doesn’t take effect:

    # docker run -d swarm join --advertise 127.0.0.1:2375 token://bdfcbdf8e82b8ad484f027362ec010e5
    93f1adee51fda62eaff3648fb05e04263731a5a2e535f24bf677cb72f5fa2062
    Usage of loopback devices is strongly discouraged for production use. Either use --storage-opt dm.thinpooldev or use --storage-opt dm.no_warn_on_loop_devices=true to suppress this warning.

    # docker run -d -p 2376:2375 swarm manage token://bdfcbdf8e82b8ad484f027
    09d0c4c882012e0e50b1ef48f8c9ba7e54476d614cb2cb2d56105e5b7d48621b
    Usage of loopback devices is strongly discouraged for production use. Either use --storage-opt dm.thinpooldev or use --storage-opt dm.no_warn_on_loop_devices=true to suppress this warning.

    # docker -H tcp://127.0.0.1:2376 info
    Containers: 0
    Images: 0
    Server Version: swarm/1.2.3
    Role: primary
    Strategy: spread
    Filters: health, port, containerslots, dependency, affinity, constraint
    Nodes: 0
    Kernel Version: 4.5.0
    Operating System: linux
    CPUs: 0
    Total Memory: 0 B
    Name: 09d0c4c88201

    Anyway, thanks very much again! I think maybe I should drill down network related stuff.

  11. “ifconfig” output (note: public IP omitted):

    # ifconfig
    docker0: flags=4163 mtu 1500
    inet 172.17.0.1 netmask 255.255.0.0 broadcast 0.0.0.0
    inet6 fe80::42:c9ff:fe90:e973 prefixlen 64 scopeid 0x20
    ether 02:42:c9:90:e9:73 txqueuelen 0 (Ethernet)
    RX packets 4031 bytes 297710 (290.7 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 3377 bytes 669830 (654.1 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    eno1: flags=4163 mtu 1500
    inet a.b.c.d netmask 255.255.252.0 broadcast a.b.c.255
    inet6 fe80::26be:5ff:fe18:a746 prefixlen 64 scopeid 0x20
    ether 24:be:05:18:a7:46 txqueuelen 1000 (Ethernet)
    RX packets 62032 bytes 13066440 (12.4 MiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 5465 bytes 1557769 (1.4 MiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
    device interrupt 20 memory 0xf7f00000-f7f20000

    lo: flags=73 mtu 65536
    inet 127.0.0.1 netmask 255.0.0.0
    inet6 ::1 prefixlen 128 scopeid 0x10
    loop txqueuelen 1 (Local Loopback)
    RX packets 4 bytes 340 (340.0 B)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 4 bytes 340 (340.0 B)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    veth3745ef9: flags=4163 mtu 1500
    inet6 fe80::e40d:29ff:fefb:9c38 prefixlen 64 scopeid 0x20
    ether e6:0d:29:fb:9c:38 txqueuelen 0 (Ethernet)
    RX packets 2071 bytes 170837 (166.8 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 1947 bytes 218288 (213.1 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    veth8aeee6b: flags=4163 mtu 1500
    inet6 fe80::fcdb:feff:fe4f:e245 prefixlen 64 scopeid 0x20
    ether fe:db:fe:4f:e2:45 txqueuelen 0 (Ethernet)
    RX packets 1666 bytes 155935 (152.2 KiB)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 1328 bytes 394112 (384.8 KiB)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    virbr0: flags=4099 mtu 1500
    inet 192.168.122.1 netmask 255.255.255.0 broadcast 192.168.122.255
    ether 52:54:00:49:c4:e3 txqueuelen 1000 (Ethernet)
    RX packets 0 bytes 0 (0.0 B)
    RX errors 0 dropped 0 overruns 0 frame 0
    TX packets 0 bytes 0 (0.0 B)
    TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

    • Ventz on June 6, 2016 at 5:12 am said:

      My guess is that due to the firewall/NAT out IP, your public IP is not actually reachable on that port.

      You could re-bind the docker daemon (-H tcp://a.b.c.d:2375) to listen on 127.0.0.1 and try it everywhere with 127.0.0.1.

      But it really comes down to:
      * The proxy
      * The NAT OUT IP vs what the machine thinks it is
      * Making sure that docker is “set up” against one IP, and that IP is being used everywhere. I noticed that a mixture causes problems.
      * Try the same steps on a RHEL7 VM in your own env (laptop/etc) to eliminate the proxy/double check if the IP is really needed in the /etc/hosts.

  12. Ventz on June 6, 2016 at 4:59 am said:

    I think the proxy is causing you problems, but also that error message about the “loopback device” means that docker is NOT picking up your public IP for some reason (it is not listening on or binding to the public IP). Double-check the “docker0” bridge to make sure it’s there.

  13. Yes, the IP from “docker run --rm swarm list …” is the public IP, and it matches the IP address of the node.

  14. The output of “docker run --rm swarm list token://bdfcbdf8e82b8ad484f027362ec010e5”:

    # docker run --rm swarm list token://bdfcbdf8e82b8ad484f027362ec010e5
    Usage of loopback devices is strongly discouraged for production use. Either use --storage-opt dm.thinpooldev or use --storage-opt dm.no_warn_on_loop_devices=true to suppress this warning.
    a.b.c.d:2375

    My IP is a public IP, but since it is in a company, I need to configure a proxy. Do you mean I need to add this IP to /etc/hosts? Thanks!

  15. # cat /etc/hosts
    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

    The detailed commands and outputs:

    # docker run --rm swarm create
    Usage of loopback devices is strongly discouraged for production use. Either use --storage-opt dm.thinpooldev or use --storage-opt dm.no_warn_on_loop_devices=true to suppress this warning.
    bdfcbdf8e82b8ad484f027362ec010e5
    # docker run -d swarm join --advertise a.b.c.d:2375 token://bdfcbdf8e82b8ad484f027362ec010e5
    fa65ef051d09551b178bda91d3523e322ed678e7edf6b525662f2b44a610ff55
    Usage of loopback devices is strongly discouraged for production use. Either use --storage-opt dm.thinpooldev or use --storage-opt dm.no_warn_on_loop_devices=true to suppress this warning.
    # docker ps -a
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    fa65ef051d09 swarm "/swarm join --advert" 10 seconds ago Up 8 seconds 2375/tcp backstabbing_joliot
    # docker run -d -p 2376:2375 swarm manage token://bdfcbdf8e82b8ad484f027362ec010e5
    9fb4efb2bbef9662ffc50550ae5631201cc97561fe13993e4bfa70e074351e3c
    Usage of loopback devices is strongly discouraged for production use. Either use --storage-opt dm.thinpooldev or use --storage-opt dm.no_warn_on_loop_devices=true to suppress this warning.
    # docker ps -a
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    9fb4efb2bbef swarm "/swarm manage token:" 5 seconds ago Up 4 seconds 0.0.0.0:2376->2375/tcp determined_mcclintock
    fa65ef051d09 swarm "/swarm join --advert" About a minute ago Up About a minute 2375/tcp backstabbing_joliot
    # docker -H tcp://a.b.c.d:2376 info
    Containers: 0
    Images: 0
    Server Version: swarm/1.2.3
    Role: primary
    Strategy: spread
    Filters: health, port, containerslots, dependency, affinity, constraint
    Nodes: 0
    Kernel Version: 4.5.0
    Operating System: linux
    CPUs: 0
    Total Memory: 0 B

    Also the docker info:

    # systemctl status docker
    ● docker.service - Docker Application Container Engine
    Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
    Active: active (running) since Mon 2016-06-06 17:32:32 CST; 1h 28min ago
    Docs: http://docs.docker.com
    Main PID: 5819 (sh)
    Memory: 46.2M
    CGroup: /system.slice/docker.service
    ├─5819 /bin/sh -c /usr/bin/docker-current daemon $OPTIONS $DOCKER_STORAGE_OPTIONS $DOCKER_NETWORK_OPTIO…
    ├─5821 /usr/bin/docker-current daemon --selinux-enabled -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock --add-registry r…
    ├─5822 /usr/bin/forward-journald -tag docker
    └─6295 docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 2376 -container-ip 172.17.0.3 -container-port 2375
    ……

    • Ventz on June 6, 2016 at 4:16 am said:

      What do you see with:

      “docker run --rm swarm list token://bdfcbdf8e82b8ad484f027362ec010e5” (note: change the token if you have reset it)

      This should show you the IP of the node in the cluster. Does that IP match the one of the node?

      About /etc/hosts — I would enter your IP in there. Is the node directly on a public IP or is it on a private (rfc1918) IP?

  16. Hi Ventz,

    Firstly, thanks very much for your comments!

    My OS is RHEL 7, and running “netstat -an | grep 2375” outputs the following:

    # netstat -an | grep 2375
    tcp6 0 0 :::2375 :::* LISTEN

    So I can’t figure out why. Thanks!

    • Ventz on June 6, 2016 at 4:04 am said:

      Can you post here: “cat /etc/hosts”
      and also, the full commands you are entering on the command line to launch both containers?

      Try the local IP listed in /etc/hosts both for the “--advertise” IP, and later for the “docker -H tcp://…” IP for the info.

  17. Firstly, thanks very much for your simple but very clear tutorial!

    I only have one machine now, so I am trying to deploy the agent and manager on the same machine, using the same IP (a.b.c.d). The output of the “docker ps” command looks like this:

    # docker ps
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    9fb4efb2bbef swarm "/swarm manage token:" 18 minutes ago Up 18 minutes 0.0.0.0:2376->2375/tcp determined_mcclintock
    fa65ef051d09 swarm "/swarm join --advert" 19 minutes ago Up 19 minutes 2375/tcp backstabbing_joliot

    And the output of “docker -H tcp://a.b.c.d:2376 info” looks like this:
    # docker -H tcp://a.b.c.d:2376 info
    Containers: 0
    Images: 0
    Server Version: swarm/1.2.3
    Role: primary
    Strategy: spread
    Filters: health, port, containerslots, dependency, affinity, constraint
    Nodes: 0
    Kernel Version: 4.5.0
    Operating System: linux
    CPUs: 0
    Total Memory: 0 B
    Name: 9fb4efb2bbef

    I am a little confused about the output of “docker -H tcp://a.b.c.d:2376 info”: since there is one agent (which is also the manager), why are the resources all 0?

    Thanks very much!

    • Ventz on June 6, 2016 at 3:47 am said:

      If I had to guess, it would be one of two things:

      1.) Your agent is simply not communicating with the manager.
      The way to debug this would be to make sure you are using the same subnets (ex: if all local/same system, join with --advertise nonRoutedIP:2375 … or opt for 127.0.0.1:2375). Given it’s on the same system though, this is pretty unlikely.

      or

      2.) It’s possible that your docker options are not open – DOCKER_OPTS=”-H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock …”
      One thing I noticed while testing this on a new Ubuntu 16.04, is that docker is now fully controlled by systemd.

      Aha! So that would do it — and this is my guess at what’s happening.

      First check this:
      netstat -an | grep 2375

      I bet you won’t see anything. If so, try this:

      Make sure you have:
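      (A sketch of the likely config, assuming a systemd drop-in such as /etc/systemd/system/docker.service.d/override.conf; the daemon invocation may be “docker daemon” or “dockerd” depending on your version:)

      [Service]
      # The empty ExecStart= clears the packaged value before re-declaring it with the TCP bind:
      ExecStart=
      ExecStart=/usr/bin/docker daemon -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock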

      Then:
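      (presumably a systemd reload plus a daemon restart:)

      systemctl daemon-reload
      systemctl restart docker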

      Now check again: netstat -an | grep 2375

      Here is an example of launching all of this on one system:
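      (a sketch of the first step, matching the tutorial above:)

      docker run --rm swarm create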

      Got a token of: 3793a96dacb8179ab7dc1f258b7bba90
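      Then joining and managing with that token (a sketch; “localDockerIp” is a placeholder for the machine’s own IP):

      docker run -d swarm join --advertise localDockerIp:2375 token://3793a96dacb8179ab7dc1f258b7bba90
      docker run -d -p 2376:2375 swarm manage token://3793a96dacb8179ab7dc1f258b7bba90
      docker -H tcp://localDockerIp:2376 info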

      Now here’s the key part — when I used 127.0.0.1 for “localDockerIp”, I got the same thing as you.

      Use the same IP as /etc/hosts — the private, local, non-routable IP (unless your node has a direct public IP assigned – in which case use the public).
      Anyway, this worked! Good luck.

      ps: removed your public IP so it’s not in the comments.

  18. ibolcina on June 2, 2016 at 1:01 pm said:

    Would you recommend weave for a complex app with 3 layers (web, app, services)? There would be around 10 different services. I would like them to be on the same network, and visible to each other by name (front, app, db, …). Front would be exposed to the web. There would be 3 physical boxes, and I would like to have control over where to put each service.
    And what do you think about vswitch for this?

    • Ventz on June 2, 2016 at 9:16 pm said:

      I would *absolutely* recommend weave for this. I think it was designed for exactly these types of scenarios, and much much more complicated ones.
      One suggestion would be to separate out your layers (front, app, db, etc.) via subnets. It’s extremely easy to propagate different subnets over your mesh, and you can easily separate them out — so that you can apply the correct ACLs/firewall rules. Weave actually creates vSwitches underneath all this. If you care about location, you could potentially even skip swarm and just manually specify where you want to launch them. Alternatively, you could set policies in swarm and launch based on those (ex: front-ends only run on server01, and DBs only run on server02, etc… no matter what the available resources/load are). You have many different options on how to do this.

  19. ibolcina on June 2, 2016 at 5:18 am said:

    Thanks for the tutorial.
    One question: I created a “runon1” container on vm01, and a “runon2” container on vm02.
    “runon1” gets IP 172.17.0.4, and “runon2” gets IP 172.17.0.3.

    I enter into “runon1” and can ping “runon2” by IP, but not by name. Any ideas?

    • Ventz on June 2, 2016 at 10:01 am said:

      Just note that the two different servers (vm##) each have their own bridge (docker0), and those bridges are not linked. So for example, 172.17.0.4 (on vm01) and 172.17.0.3 (on vm02) are technically not connected. However, if you can ping them — I am assuming you are using a common backend bridge. With that assumption, you have a few options:

      * The easiest is to simply populate the /etc/hosts file. This is a lame “hacky” way to do this, but it works and it was “the official solution” for a long time.

      or

      * You can use full DNS or even something like mDNS (avahi or zeroconf on linux)

      or

      * You can use a service discovery system like consul (https://www.consul.io/) or etcd (https://coreos.com/etcd/) to do a distribution of the IPs <-> hostnames.

      or

      * The last suggestion – there are solutions that already utilize a service discovery + DNS combination. My favorite is “weave” from weaveworks. It does a whole lot more — for example, you can create your common overlay/mesh network, but it also allows you to ping all containers by the name you have given them. If your goal is multi-host docker containers, this is probably the most ideal/easiest system right now. Eventually, docker is going to publish better network drivers (https://gist.github.com/nerdalert/c0363c15d20986633fda)

      It really depends on what you are doing manually and what you already have, but generally, I think using weave is simply a fantastic solution. It takes care of a lot of things (common bridges, AND multi provider/datacenter overlays, DNS, etc.)

  20. Ivan on May 13, 2016 at 3:26 am said:

    I want to move to Docker, but having a mix of Vagrant + SaltStack currently in production, I need to be sure about every component before moving to Swarm. I also need high availability on data volumes (which means Flocker), load balancing, etc., but I have no spare time to dig into badly or incompletely explained tutorials. And when you read the documentation, you get the feeling you need to understand every bit and byte before starting to work with it, but maybe that’s because I am the kind of person who does not put anything in production that I don’t fully understand.

  21. Iván on May 12, 2016 at 12:02 pm said:

    Congrats!!!
    It is the simplest, easiest, clearest, and most straightforward tutorial about Swarm I have ever seen.

    Typically you find tutorials that try to explain unneeded things at the very beginning, which leaves newbies confused after following a few pages.
    It is very, very frustrating to have the impression that you are missing something in the explanation, because they omit part of the things they are trying to explain.

    imho, your approach is way better: make it work simply, and then, with the basic idea running, deep-dive into it at your own pace.

    Thank you!

    • Ventz on May 12, 2016 at 1:08 pm said:

      Thanks. I am trying to be better about the way I write some of these “tutorial”-like blog posts. I realized that, from a personal perspective, I would want quick blocks that I can skim, plus “configs/code”. If something interests you, you can always read up on it later/dig deeper. Anyway, glad it’s helpful. I’ve been meaning to add using Swarm with Consul, and securing the communication between Docker and Swarm, and among Swarm clients.
