If you have not used Swarm before, skim the non-service-discovery tutorial first:
https://blog.vpetkov.net/2015/12/07/docker-swarm-tutorial-and-examples. It’s very easy, and it should give you a feel for how Swarm works within a couple of minutes.
Using Swarm with pre-generated static tokens is useful, but a service discovery backend has many benefits. For example, you can use overlay networks and have common “bridges” that span multiple hosts (https://docs.docker.com/engine/userguide/networking/get-started-overlay/). The backend also provides service registration and discovery for the Docker containers launched into the Swarm. Now let’s get into how to use Swarm with service discovery, which is what you would use in a scaled-out or production environment.
Again, assuming you have a bunch of servers running docker:
vm01 (10.0.0.101), vm02 (10.0.0.102), vm03 (10.0.0.103), vm04 (10.0.0.104)
Normally, you can run “docker ps” on each host, for example:
ssh vm01 'docker ps'
ssh vm04 'docker ps'
If you enable the API for remote bind on each host you can manage them from a central place:
docker -H tcp://vm01:2375 ps
docker -H tcp://vm04:2375 ps
(Note: the port is optional if you use the default.)
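The same idea extends to all four hosts with a small loop. This is a minimal sketch, assuming the host names from this post resolve and the API listens on port 2375 on each of them. The `DOCKER` override and the function name are illustrative and not part of Docker itself:

```shell
#!/bin/sh
# Hosts from this tutorial; adjust for your environment.
HOSTS="vm01 vm02 vm03 vm04"

# Hypothetical override hook: set DOCKER="echo docker" to print the
# commands instead of running them.
DOCKER="${DOCKER:-docker}"

# Run "docker ps" against every host's remote API.
ps_all_hosts() {
    for host in $HOSTS; do
        echo "== $host =="
        $DOCKER -H "tcp://$host:2375" ps
    done
}
```

Call `ps_all_hosts` after sourcing the snippet. It still talks to each engine separately, which is exactly the limitation Swarm removes.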
But if you want to use all of these docker engines as a cluster, you need Swarm.
Here we will go one step further and use a common service discovery backend (Consul).
Docker Swarm Tutorial with Consul and How-To/Examples
A swarm contains only two components: agents (the workers in the cluster) and manager(s).
We are also going to add consul (the service discovery backend).
First, grab the swarm and the consul images on each docker host:
docker pull swarm
docker pull progrium/consul
Then, make sure the API is enabled for remote bind on each host (NOTE: see below if using a systemd OS):
# (on Ubuntu) cat /etc/default/docker:
DOCKER_OPTS="-H tcp://0.0.0.0:2375 \
-H unix:///var/run/docker.sock --dns 126.96.36.199 --dns 188.8.131.52 ..."
# And then restart:
service docker restart
# -- OR --
# IF USING a Systemd OS -- Ubuntu 16.04/CentOS7 and up, use this instead:
# (on Ubuntu) cat /etc/systemd/system/docker.service
ExecStart=/usr/bin/docker daemon -H tcp://0.0.0.0:2375 \
-H unix:///var/run/docker.sock --dns 184.108.40.206 --dns 220.127.116.11
# And then reload systemd (so it picks up the edited unit file) and restart Docker
systemctl daemon-reload
systemctl restart docker
Don’t panic here! It looks complicated, but it’s actually incredibly easy. The docker daemon options on each host also take two discovery flags, “--cluster-store” and “--cluster-advertise”.
The consulIp in “--cluster-store=consul://consulIp:8500” is the docker host that will run the consul service (much like the swarm manager). Since you will map the port to the docker host itself, that’s simply the IP of the docker host (in our case, vm01).
The managerIp in “--cluster-advertise=managerIp:2376” is the docker host that will run the swarm manager service. Since you will map the port to the docker host itself, that’s simply the IP of the docker host (in our case, vm01).
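To see where those two flags sit in the daemon command line, here is a small sketch that prints an ExecStart line with both values filled in. The flag values follow the description above; the function name and argument order are made up for illustration, the DNS options are left out, and older releases use “docker daemon” instead of “dockerd”. Docker’s own documentation describes --cluster-advertise as the address the daemon advertises to the cluster, so check the value against the docs for your version:

```shell
#!/bin/sh
# Print a dockerd ExecStart line with the discovery flags filled in.
# Usage: daemon_exec_start <consulIp> <managerIp>
# (Hypothetical helper; it only prints text and changes nothing.)
daemon_exec_start() {
    fmt='ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2375'
    fmt="$fmt --cluster-store=consul://%s:8500"
    fmt="$fmt --cluster-advertise=%s:2376"
    fmt="$fmt -H unix:///var/run/docker.sock\n"
    # shellcheck disable=SC2059  # the format string is built on purpose
    printf "$fmt" "$1" "$2"
}
```

For the setup in this post you would call `daemon_exec_start 10.0.0.101 10.0.0.101` and paste the result into the unit file.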
To get everything started, go to whatever docker host you pick as the manager (in our case vm01), and create the consul server:
docker run -d -p "8500:8500" -h "consul" progrium/consul -server -bootstrap
Now, on *each* AGENT (including the manager if you want to use it as a worker) run:
docker run -d swarm join --advertise agentIp:2375 consul://consulIp:8500/swarm
docker run -d swarm join --advertise 10.0.0.101:2375 consul://10.0.0.101:8500/swarm
docker run -d swarm join --advertise 10.0.0.102:2375 consul://10.0.0.101:8500/swarm
docker run -d swarm join --advertise 10.0.0.103:2375 consul://10.0.0.101:8500/swarm
docker run -d swarm join --advertise 10.0.0.104:2375 consul://10.0.0.101:8500/swarm
You would do this for *each* agent and in our case vm01 is also an agent.
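If the remote API is already enabled on every host, the joins can be started from one place. A minimal sketch, assuming the IPs used in this post and the same `DOCKER` override hook for dry runs (the variable and function names are illustrative):

```shell
#!/bin/sh
# Values from this tutorial; adjust for your environment.
CONSUL_IP="10.0.0.101"
AGENT_IPS="10.0.0.101 10.0.0.102 10.0.0.103 10.0.0.104"

# Hypothetical override hook: DOCKER="echo docker" prints the commands.
DOCKER="${DOCKER:-docker}"

# Start one "swarm join" container on each agent through its remote API.
join_all_agents() {
    for ip in $AGENT_IPS; do
        $DOCKER -H "tcp://$ip:2375" run -d swarm join \
            --advertise "$ip:2375" "consul://$CONSUL_IP:8500/swarm"
    done
}
```

Each agent advertises its own IP and engine port, while all of them point at the same Consul path.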
Finally, you need to run a manager service on your chosen manager host (in our case, vm01) to manage the swarm:
docker run -d -p 2376:2375 swarm manage consul://consulIp:8500/swarm
docker run -d -p 2376:2375 swarm manage consul://10.0.0.101:8500/swarm
The idea is that the manager provides its API on port 2375 inside the container. The docker daemon on the host already listens on 2375, so we publish the manager on port 2376 of the host instead. If your manager is NOT an agent and nothing else on that host listens on 2375, you can publish it on 2375 with “run -d -p 2375:2375 swarm manage consul://…” (note that -P would map it to a random host port). In that case, you would NOT run the “swarm join” command on your manager. However, in our case we want all of the hosts to be agents, including the manager.
The last step is to query the cluster:
docker -H tcp://managerIP:2376 info
In our case, we use vm01:
docker -H tcp://vm01:2376 info
docker -H tcp://vm01:2376 ps
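If “info” shows fewer nodes than you expect, it can help to ask the discovery backend directly which agents have registered. The swarm image has a “list” command for this. A minimal sketch, assuming the Consul address from this post (the `DOCKER` override and the function name are illustrative):

```shell
#!/bin/sh
CONSUL_IP="10.0.0.101"

# Hypothetical override hook: DOCKER="echo docker" prints the command.
DOCKER="${DOCKER:-docker}"

# List the nodes registered under the /swarm path in Consul.
list_swarm_nodes() {
    $DOCKER run --rm swarm list "consul://$CONSUL_IP:8500/swarm"
}
```

With all four agents joined, you would expect one address per agent in the output.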
Again, if your manager is NOT an agent, you would simply run:
“docker -H tcp://managerIP:2375 info” or even “docker -H tcp://managerIP”
Don’t forget to start the manager on reboot, and each join on the agents on reboot.
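One way to handle reboots is Docker’s restart policy, so the daemon brings the containers back by itself. A minimal sketch, assuming the same addresses as above; `--restart=always` is a standard “docker run” option, while the function names and the `DOCKER` override are illustrative:

```shell
#!/bin/sh
CONSUL_IP="10.0.0.101"

# Hypothetical override hook: DOCKER="echo docker" prints the commands.
DOCKER="${DOCKER:-docker}"

# Start the manager so that Docker restarts it after a reboot.
start_manager() {
    $DOCKER run -d --restart=always -p 2376:2375 \
        swarm manage "consul://$CONSUL_IP:8500/swarm"
}

# Start an agent the same way. $1 is this host's own IP.
start_agent() {
    $DOCKER run -d --restart=always \
        swarm join --advertise "$1:2375" "consul://$CONSUL_IP:8500/swarm"
}
```

The consul container would need the same treatment, otherwise the manager and agents come back up with nothing to register with.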
Hi sir,
Just ignore my previous comment.
I already successfully ran this command:
ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2375 \
-H unix:///var/run/docker.sock --dns 22.214.171.124 --dns 126.96.36.199
After that I got this error in the docker status:
level=warning msg="Registering as \"188.8.131.52:2376\" in discovery failed
[root@sw ~]# systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Sen 2017-07-24 07:47:23 EDT; 1h 12min ago
Main PID: 31360 (dockerd)
├─31360 /usr/bin/dockerd -H tcp://0.0.0.0:2375 --cluster-store=consul://184.108.40.206:8500 --cluster-advertise=220.127.116.11:2376 -H unix:///var/run/docker.sock --dns 8.8….
├─31373 docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir /var/run/docker/libcontainerd/cont…
├─31593 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 8500 -container-ip 172.17.0.2 -container-port 8500
├─31599 docker-containerd-shim 410b9ccd8458c2782215f8f0ad08294b8cbf3c3b8184cc57c1abde6a19300752 /var/run/docker/libcontainerd/410b9ccd8458c2782215f8f0ad08294b8cbf3c3b8184cc57c…
├─31733 docker-containerd-shim 692b70da1a03dd453350a2ec9b8eda68ab88fffec5174f3574c774cfe8ba7a0c /var/run/docker/libcontainerd/692b70da1a03dd453350a2ec9b8eda68ab88fffec5174f357…
├─31890 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 2376 -container-ip 172.17.0.4 -container-port 2375
└─31896 docker-containerd-shim f25ba779f0aaad2d01bd96a1c3f3da2f16160ce5b9af6d5956416a1b8ea63741 /var/run/docker/libcontainerd/f25ba779f0aaad2d01bd96a1c3f3da2f16160ce5b9af6d595…
Jul 24 07:47:23 : Started Docker Application Container Engine.
Jul 24 07:47:42 : time="2017-07-24T07:47:42.917802299-04:00" level=error msg="discovery error: Get http://18.104.22.168:8500/v1/kv/docker/…on refused"
Jul 24 07:47:42 dockerd: time="2017-07-24T07:47:42.917898573-04:00" level=error msg="discovery error: Put http://22.214.171.124:8500/v1/kv/docker/…on refused"
Jul 24 07:47:42 dockerd: time="2017-07-24T07:47:42.918050458-04:00" level=error msg="discovery error: Unexpected watch error"
Jul 24 07:48:05 : time="2017-07-24T07:48:05.924421036-04:00" level=error msg="discovery error: Get http://126.96.36.199:8500/v1/kv/docker/…te to host"
Jul 24 07:48:08 : time="2017-07-24T07:48:08.930381847-04:00" level=error msg="discovery error: Put http://188.8.131.52:8500/v1/kv/docker/…te to host"
Jul 24 07:48:11 : time="2017-07-24T07:48:11.936205421-04:00" level=error msg="discovery error: Unexpected watch error"
Jul 24 07:48:27 : time="2017-07-24T07:48:27.975333177-04:00" level=warning msg="Registering as \"184.108.40.206:2376\" in discovery failed:…n sessions"
Jul 24 07:48:42 : time="2017-07-24T07:48:42.796836356-04:00" level=warning msg="2017/07/24 07:48:42 [WARN] memberlist: Binding to public add…ryption!\n"
Jul 24 07:48:42 : time="2017-07-24T07:48:42.796966912-04:00" level=info msg="2017/07/24 07:48:42 [INFO] serf: EventMemberJoin: ….172.104\n"
Hint: Some lines were ellipsized, use -l to show in full.
No problem. Glad you figured it out.
By the way, you should look at the new Docker “Swarm Mode”. It fixes a lot of the design issues that the “older style” cluster introduces (with consul and such).
I just followed the steps, but I got an error when I tried to start docker after I inserted this:
ExecStart=/usr/bin/docker daemon -H tcp://0.0.0.0:2375 \
-H unix:///var/run/docker.sock --dns 220.127.116.11 --dns 18.104.22.168
# here’s the error
Active: failed (Result: start-limit) since Sen 2017-07-24 05:57:50 EDT; 11s ago
Process: 29096 ExecStart=/usr/bin/dockerd -H tcp://0.0.0.0:2375 --cluster-store=consul://22.214.171.124:8500 --cluster-advertise=126.96.36.199:2376 -H unix:///var/run/docker.sock --dns 188.8.131.52 --dns 184.108.40.206 (code=exited, status=1/FAILURE)
Main PID: 29096 (code=exited, status=1/FAILURE)
Jul 24 05:57:49 sw.docker.mbiz.co.id systemd: Failed to start Docker Application Container Engine.
Jul 24 05:57:49 sw.docker.mbiz.co.id systemd: Unit docker.service entered failed state.
Jul 24 05:57:49 sw.docker.mbiz.co.id systemd: docker.service failed.
Jul 24 05:57:50 sw.docker.mbiz.co.id systemd: docker.service holdoff time over, scheduling restart.
Jul 24 05:57:50 sw.docker.mbiz.co.id systemd: start request repeated too quickly for docker.service
Jul 24 05:57:50 sw.docker.mbiz.co.id systemd: Failed to start Docker Application Container Engine.
Jul 24 05:57:50 sw.docker.mbiz.co.id systemd: Unit docker.service entered failed state.
Jul 24 05:57:50 sw.docker.mbiz.co.id systemd: docker.service failed.
here’s the location of the config :
Yes, sure, I got that running in no time, it is really easy. But I thought it wasn’t ready for production use yet. I had a lot of problems with the old swarm and the overlay network. A lot of containers went stale and there was no way to disconnect them from the network, compose sometimes just wouldn’t find the network, it was a real mess.
I’ll give it a shot with swarm mode and leave Consul just for general service discovery and health checks.
Hi, thanks for the reply,
I could not really see any difference from your post, except when using Docker Hub’s token mode. The docs are not very clear on how to create the swarm using an external discovery backend like consul.
The new one is completely different and incredibly simple.
It has discovery (ex: consul) built in, and you no longer have to worry about scaling discovery services or having redundancy — it “just works”.
Each “engine” can be a manager/master or a regular node. It can be promoted and demoted. It also comes with a bunch of other nice features like auto load balancing, services, etc.
The only downside I’ve noticed so far is that it really doesn’t make sense to have 2 “nodes” as a manager and non-manager. You really want an odd # of nodes (to deal with contention), and the minimum number is 3.
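To make the “odd number, minimum 3” point concrete, here is the arithmetic as a small sketch. It assumes the managers need a strict majority to agree (swarm mode managers use Raft consensus, which works this way); the function names are just for illustration:

```shell
#!/bin/sh
# Majority needed among n managers: floor(n/2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Managers that can fail while a majority survives: n - quorum(n).
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Print a small table for 1 to 5 managers.
quorum_table() {
    for n in 1 2 3 4 5; do
        echo "$n managers: quorum $(quorum "$n"), tolerates $(tolerated_failures "$n")"
    done
}
```

Running `quorum_table` shows that 2 managers tolerate no failures at all (same as 1), and 4 tolerate only one (same as 3), which is why an even count buys you nothing.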
Here’s an article that compares the two types of docker swarm.
Hi, thanks for the post.
Maybe it is too late to ask, but running docker -H :2376 info shows detailed info about the swarm manager. I found the output strange because I noticed that it has the following info:
Is Manager: false
If I run -H :2376 node ls, I get: Error response from daemon: 404 page not found.
Also, automatic load balancing does not work, and neither do regular swarm commands like node ls or service create/ls/tasks.
What am I missing?
This is for the old “swarm engine”. It was extremely confusing and I kept seeing people wanting “simple tutorials”.
Now that Docker’s 1.12 “swarm mode” is out, you should use that. Here is their official tutorial: https://docs.docker.com/engine/swarm/swarm-tutorial/
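For comparison, this is roughly what the swarm mode setup looks like. It is a minimal sketch and not a replacement for the official tutorial: it assumes vm01 (10.0.0.101) becomes the first manager and that the default swarm mode port 2377 is reachable. The token is whatever “docker swarm join-token worker” prints on the manager, and the function names and `DOCKER` override are illustrative:

```shell
#!/bin/sh
MANAGER_IP="10.0.0.101"

# Hypothetical override hook: DOCKER="echo docker" prints the commands.
DOCKER="${DOCKER:-docker}"

# On the first manager: create the swarm.
init_swarm() {
    $DOCKER swarm init --advertise-addr "$MANAGER_IP"
}

# On each worker: join using the worker token from the manager. $1 is the token.
join_swarm() {
    $DOCKER swarm join --token "$1" "$MANAGER_IP:2377"
}
```

After that, commands like “docker node ls” and “docker service create” work when run against a manager, with no separate consul or swarm containers involved.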
I agree. Weave is great. I just hope it will be accepted into community.
Found something interesting which is relevant to what we were talking about:
Look at this section: “Network-scoped alias”
This looks like the exact solution you want.
The limitation (it doesn’t list, but I am assuming) is multiple hosts and container across those. However, between the overlay networking, and the new macvlans which came out in Docker 1.12, you can fix that now easily too — either an overlay on top of swarm over multiple hosts, or having multiple hosts just plug into the same vlan with macvlan and 802.1q.
You would have to first expose the DNS ports in Consul (-p 8600:53/udp) and then point your docker container’s DNS server to the consul server’s IP. (Better yet, add “--dns-search service.consul” to your DOCKER_OPTS, since it seems it has recursion enabled and it’s using Google’s servers.)
At this point each docker container you start will register its “--name” into consul. This will be accessible via:
$name.service.consul. If you add that to the DNS search domain, you can simply use the $name at that point.
I haven’t tried this myself yet because for me Weave (mentioned this earlier) is simply the perfect solution.
You might need a service like “registrator” (another docker container), which monitors the unix socket and then registers container names/ips/data into consul. Ex:
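A minimal sketch of starting Registrator against the Consul server from this post, based on the invocation in the gliderlabs/registrator README. The Consul address, the function name and the `DOCKER` override are assumptions for illustration, and you should check Registrator’s documentation for how it derives service names:

```shell
#!/bin/sh
CONSUL_IP="10.0.0.101"

# Hypothetical override hook: DOCKER="echo docker" prints the command.
DOCKER="${DOCKER:-docker}"

# Run Registrator on a docker host. It watches the mounted docker socket
# and registers containers with the Consul server.
start_registrator() {
    $DOCKER run -d --name=registrator --net=host \
        --volume=/var/run/docker.sock:/tmp/docker.sock \
        gliderlabs/registrator:latest "consul://$CONSUL_IP:8500"
}
```

You would run this once on every docker host whose containers should show up in consul.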
Give it a try and let me know.
I still think using something like Weave is the way to go for the time being. It solves the cross-datacenter/cross-cloud problem, and it gives you a bunch of freebies in the process.
Hi. Thanks for the tutorial.
I’m trying to ping machines by name.
I create a network:
docker -H tcp://10.0.1.101:2376 network create my_swarm_net
Then I start the “u1” and “u2” containers:
docker -H tcp://10.0.1.101:2376 run -ti --name u1 --net my_swarm_net ubuntu bash
docker -H tcp://10.0.1.101:2376 run -ti --name u2 --net my_swarm_net ubuntu bash
They get addresses like “10.0.0.2” and “10.0.0.3”.
Host names are correctly resolved,
but ping (from u2 -> u1) returns:
root@2ef627851fd7:/# ping u1
PING u1 (10.0.0.2) 56(84) bytes of data.
From 2ef627851fd7 (10.0.0.3) icmp_seq=1 Destination Host Unreachable
From 2ef627851fd7 (10.0.0.3) icmp_seq=2 Destination Host Unreachable