Thursday, 26 June 2014

CoreOS Review

I have spent a few days now playing with CoreOS and helping other members of HP's Advanced Technology Group get it running on their setups.

Today I thought I would write about the good and the bad about CoreOS so far.  Many components are in an alpha or beta state so things may change over the coming months.  Also as a disclaimer, views in this post are my own and not necessarily those of HP.

Installation


As stated in my blog post yesterday, I have been using CoreOS on my Macbook Pro using Vagrant and VirtualBox.  This made it extremely easy to setup the CoreOS cluster on my Mac.  I made a minor mistake to start with, and that is to not configure the unique URL required for Etcd correctly.  A mistake a colleague of mine also made on his first try so it likely to be a common one to make.

I initially had VirtualBox configured to use a Mac formatted USB drive I have hooked-up.  Vagrant tried to create my CoreOS cluster there and during the setup the Ruby in Vagrant kept spinning on some disk reading routine and not completing the setup.  Debug output didn't help find the cause so I switched to using the internal SSD instead.

CoreOS


CoreOS itself appears to be derived from Chrome OS which itself is a fork of Gentoo.  It is incredibly minimal, there isn't even a package manager that comes with it.  But that is the whole point.  It is designed so that Docker Containers are run on top of it providing the application support.  Almost completely isolating the underlying OS from the applications running on top of it.  This also provides excellent isolation between say MySQL and Apache for a LAMP stack.

It is a clean, fast OS using many modern concepts such as systemd and journald.  Some of these are only in the bleeding-edge distributions at the moment so many people may not be familiar with using them.  Luckily one of my machines is running Fedora 20 so I've had a play with these technologies before.

Etcd


CoreOS provides a clustered key/value store system called 'etcd'.  The name of this confused many people I spoke to before we tried it.  We all assumed it was a clustered file store for the /etc/ path on CoreOS.  We were wrong, although that is maybe the direction it will eventually take.  It actually uses a REST based interface to communicate with it.

Etcd has been pretty much created as a new project from the ground-up by the CoreOS team.  The project is written in Go and can be found on Github.  Creating a reliable clustered key/value store is hard, really hard.  There are so many edge cases that can cause horrific problems.  I cannot understand why the CoreOS team decided to roll their own instead of using one of the many that have been well-tested.

Under the hood the nodes communicate to each other using what appears to be JSON (REST) for internal admin commands and Google Protobufs over HTTP for the Raft Consensus Algorithm library used.  Whilst I commend them for using Protobufs in a few places, HTTP and JSON are both bad ideas for what they are trying to achieve.  JSON will cause massive headaches for protocol upgrades/downgrades and HTTP really wasn't designed for this purpose.

At the moment this appears to be designed more for very small scale installations instead of hundreds to thousands of instances.  Hopefully at some point it will gain its own protocol based on Protobufs or similar and have code to work with the many edge cases of weird and wonderful machine and network configurations.

Fleet


Fleet is another service written in Go and created by the CoreOS team.  It is still a very new project aimed at being a scheduler for a CoreOS cluster.

To use fleet you basically create a systemd configuration file with an optional extra section to tell Fleet what CoreOS instance types it can run on and what it would conflict with.  Fleet communicates with Etcd and via. some handshaking figures out a CoreOS instance to run the service on.  A daemon on that instance handles the rest.  The general idea is you use this to have a systemd file to manage docker instances, there is also a kind-of hack so that it will notify/re-schedule when something has failed using a separate systemd file per service.

Whilst it is quite simple in design it has many flaws and for me was the most disappointing part of CoreOS so far.  Fleet breaks, a lot.  I must have found half a dozen bugs in it in the last few days, mainly around it getting completely confused as to which service is running in which instance.

Also the way that configurations are expressed to Fleet are totally wrong in my opinion.  Say, for example, you want ten MySQL docker containers across your CoreOS cluster.  To express this in Fleet you need to create ten separate systemd files and send them up.  Even though those files are likely identical.

This is how it should work in my opinion:  You create a YAML file which specifies what a MySQL docker container is and what an Apache/PHP container is.  In this YAML you group these and call them a LAMP stack.  Then in the YAML file you specify that your CoreOS cluster needs five LAMP stacks, and maybe two load balancers.

Not only would my method scale a lot better but you can then start to have a front-end interface which would be able to accept customers.

Conclusion


CoreOS is very ambitions project that in some ways becomes the "hypervisor"/scheduler for a private Docker cloud.  It can easily sit on top of a cloud installation or on top of bare metal.  It requires a totally different way of thinking and I really think the ideas behind it are the way forward.  Unfortunately it is a little too early to be using it in anything more than a few machines in production, and even then it is more work to manage than it should be.

Wednesday, 25 June 2014

Test drive of CoreOS from a Mac

Yes, I am still the LinuxJedi, and whilst I have Linux machines in my office for day-to-day work the main desktop I use is Mac OS.

I have recently been giving CoreOS a spin for HP's Advanced Technology Group and so for this blog post I'm going to run through how to setup a basic CoreOS cluster on a Mac.  These instructions should actually be very similar to how you would do it in Linux too.

Prerequisites


There are a few things you need to install before you can get things going:
  1. VirtualBox for the CoreOS virtual machines to sit on
  2. Vagrant to spin-up and configure the virtual machines
  3. Git to grab the vagrant files for CoreOS
  4. Homebrew to install 'fleetctl' (also generally really useful on a developer machine)
You then need to install 'fleetctl' which will be used to spin up services on the cluster:

$ brew update
$ brew install fleetctl

Configuring


Once everything is installed the git repository for the vagrant bootstrap is needed:

$ git clone git://github.com/coreos/coreos-vagrant/
$ cd coreos-vagrant/
$ cp config.rb.sample config.rb
$ cp user-data.sample user-data

The last two commands here copy the sample configuration files to configuration files we will use for Vagrant.

Edit the config.rb file and look for the line:

#$num_instances=1

Uncomment this and set the number to '3'.  When I was testing I also enabled the 'beta' channel in this file instead of the default 'alpha' channel.  This is entirely up to personal preference and both will work with these instructions at time of writing.

With your web browser go to https://discovery.etcd.io/new.  This will give you a token URL.  This URL will be used to help synchronise Etcd which is a clustered key/value store in CoreOS.  In the user-data YAML file uncomment the 'discovery' line and set the URL to the one the link above gave you.  It is important that you generate a new URL for every cluster and every time you burn down / spin up  a cluster on your machine.  Otherwise Etcd will not behave as expected.

Spinning Up


At this point everything is configured to spin-up the CoreOS cluster.  From the 'coreos-vagrant' directory run the following:

$ vagrant up

This may take a few minutes as it downloads the CoreOS base image and creates the virtual machines required.  It will also configure Etcd and setup SSH keys so that you can communicate with the virtual machines.

You can SSH into any of these virtual machines using the following and substituting '01' with the CoreOS instance you wish to connect to:

$ vagrant ssh core-01 -- -A

Using Fleet


Once you have everything up and running you can use 'fleet' to send systemd configurations to the cluster.  Fleet will schedule which CoreOS machine the configuration file will run on.  Fleet internally uses Etcd to discover all the nodes of your CoreOS cluster.

First of all we need to tell 'fleetctl' on the Mac how to talk to the CoreOS cluster.  This requires two things, the IP/port and the SSH key:

$ export FLEETCTL_TUNNEL=127.0.0.1:2222
$ ssh-add ~/.vagrant.d/insecure_private_key

For this example I'm going to spin up three memcached docker instances inside CoreOS, I'll let Fleet figure out where to put them only telling it that two memcached instances cannot run on the same CoreOS machine.  To do this create three identical files, 'memcached.1.service', 'memcached.2.service' and 'memcached.3.service' as follows:

[Unit]
Description=Memcached server
After=docker.service

[Service]
ExecStart=/usr/bin/docker run -p 11211:11211 fedora/memcached

[X-Fleet]
X-Conflicts=memcached*

This is a very simple systemd based script which will tell Docker in CoreOS to get a container of a Fedora based memcached and run it.  The 'X-Conflicts' option tells Fleet that two memcached systemd configurations cannot run on the same CoreOS instance.

To run these we need to do:

$ fleetctl start memcached.1.service
$ fleetctl start memcached.2.service
$ fleetctl start memcached.3.service

This will take a while and show as started whilst it downloads the docker container and executes it.  Progress can be seen with the following commands:

$ fleetctl list-units
$ fleetctl journal memcached.1.service

Once complete your three CoreOS machines should all be listening on port 11211 to a Docker installed memcached.

Thursday, 5 June 2014

Live Kernel Patching Video Demo

Earlier today I posted my summary of testing the live kernel patching solutions so far as part of my work for HP's Advanced Technology Group.  I have created a video demo of one of these technologies, kpatch.


This demo goes over the basics of how easy kpatch is to use.  The toolset for now will only work with Fedora 20  for now but judging by the documentation Ubuntu support is coming very soon.

Live Kernel Patching Testing

At the end of March I gave a summary of some of the work I was doing for HP's Advanced Technology Group on evaluating kernel patching solutions.  Since then I have been trying out each one whilst also working on other projects and this blog post is to show my progress so far.

After my first blog post there were a few comments as to why such a thing would be required.  I'll be the first to admit that when Ksplice was still around I felt it better to have a redundant setup which you could perform rolling reboots on.  That is still true in many cases.  But with OpenStack based public clouds a reboot means that customers sitting on top of that hardware will have their virtual machines shut down or suspended.  This is not ideal when with big iron machine you can have 20 or more customers to one server.  A good live kernel patching solution means that in such a scenario the worst case would be a kernel pause for a few seconds.

In all cases my initial testing was to create a patch which would modify the output of /proc/meminfo. This is very simple and doesn't go as far as testing what would happen patching a busy function under load but at least gets the basics out of the way.

Ksplice


As previously discussed, Ksplice was really the only toolkit to do live kernel patching in Linux for a long time.  Unfortunately since the Oracle acquisition there have been no Open Source updates to the Ksplice codebase.  In my research I found a Git repository for kSplice which had recent updates to it. I tried using the with Fedora 20 but my guess is either Fedora does something unique with the kernels or it is just too outdated for modern 3.x Linux kernels.  My attempts to compile the patches hit several problems inside the toolset which could not easily be resolved.

kGraft


After Ksplice I moved on to kGraft.  This takes on a similar approach to Ksplice but uses much newer techniques.  Of all the solutions I believe the technology in kGraft is the most advanced.  It is the only solution that is capable of patching a kernel without pausing it first.  To use this solution there is a requirement for now to use SUSE's kGraft kernel tree.  I tested this on an OpenSUSE installation so I could be close to the systems it was designed on.  Unfortunately whilst the internals are quite well documented, the toolset is not.  After several attempts I could not find a way to get this solution to work end-to-end.  I do think this solution has massive potential, especially if it gets added to the mainline kernel.  I hope in the future the documentation is improved and I can revisit this.

kpatch


Finally I tried Red Hat's kpatch.  This solution was released around the same time as kGraft and likewise is in a development stage with warnings that it probably shouldn't be used in production yet. The technology again is similar to Ksplice with more modern approaches but what really impressed me is the ease-of-use.  The kpatch source does not require compiling as part of the kernel, it can be compiled separately and is just a case of 'make && make install'.  The toolset will take a standard unified diff file and using this will compile the required parts of the kernel to create the patch module.  Whilst this sounds complex it is a single command to do this.  Applying the patch to the kernel is equally as easy.  Running a single command will do all the required steps and apply the patch.

Conclusion


If public clouds with shared hardware and going to continue to grow this should be a technology that is invested in.  I was very quickly drawn to Red Hat's solution because they have put a high priority on how the user would interact with the toolset.  I'm sure many Devops engineers will appreciate that.  What I would love to see is kGraft's technology combined with kpatch's toolset.  This will certainly be an interesting field to watch as both projects mature.