Showing posts with label kernel. Show all posts
Showing posts with label kernel. Show all posts

Thursday, 5 June 2014

Live Kernel Patching Video Demo

Earlier today I posted my summary of testing the live kernel patching solutions so far as part of my work for HP's Advanced Technology Group.  I have created a video demo of one of these technologies, kpatch.


This demo goes over the basics of how easy kpatch is to use.  The toolset for now will only work with Fedora 20  for now but judging by the documentation Ubuntu support is coming very soon.

Live Kernel Patching Testing

At the end of March I gave a summary of some of the work I was doing for HP's Advanced Technology Group on evaluating kernel patching solutions.  Since then I have been trying out each one whilst also working on other projects and this blog post is to show my progress so far.

After my first blog post there were a few comments as to why such a thing would be required.  I'll be the first to admit that when Ksplice was still around I felt it better to have a redundant setup which you could perform rolling reboots on.  That is still true in many cases.  But with OpenStack based public clouds a reboot means that customers sitting on top of that hardware will have their virtual machines shut down or suspended.  This is not ideal when with big iron machine you can have 20 or more customers to one server.  A good live kernel patching solution means that in such a scenario the worst case would be a kernel pause for a few seconds.

In all cases my initial testing was to create a patch which would modify the output of /proc/meminfo. This is very simple and doesn't go as far as testing what would happen patching a busy function under load but at least gets the basics out of the way.

Ksplice


As previously discussed, Ksplice was really the only toolkit to do live kernel patching in Linux for a long time.  Unfortunately since the Oracle acquisition there have been no Open Source updates to the Ksplice codebase.  In my research I found a Git repository for kSplice which had recent updates to it. I tried using the with Fedora 20 but my guess is either Fedora does something unique with the kernels or it is just too outdated for modern 3.x Linux kernels.  My attempts to compile the patches hit several problems inside the toolset which could not easily be resolved.

kGraft


After Ksplice I moved on to kGraft.  This takes on a similar approach to Ksplice but uses much newer techniques.  Of all the solutions I believe the technology in kGraft is the most advanced.  It is the only solution that is capable of patching a kernel without pausing it first.  To use this solution there is a requirement for now to use SUSE's kGraft kernel tree.  I tested this on an OpenSUSE installation so I could be close to the systems it was designed on.  Unfortunately whilst the internals are quite well documented, the toolset is not.  After several attempts I could not find a way to get this solution to work end-to-end.  I do think this solution has massive potential, especially if it gets added to the mainline kernel.  I hope in the future the documentation is improved and I can revisit this.

kpatch


Finally I tried Red Hat's kpatch.  This solution was released around the same time as kGraft and likewise is in a development stage with warnings that it probably shouldn't be used in production yet. The technology again is similar to Ksplice with more modern approaches but what really impressed me is the ease-of-use.  The kpatch source does not require compiling as part of the kernel, it can be compiled separately and is just a case of 'make && make install'.  The toolset will take a standard unified diff file and using this will compile the required parts of the kernel to create the patch module.  Whilst this sounds complex it is a single command to do this.  Applying the patch to the kernel is equally as easy.  Running a single command will do all the required steps and apply the patch.

Conclusion


If public clouds with shared hardware and going to continue to grow this should be a technology that is invested in.  I was very quickly drawn to Red Hat's solution because they have put a high priority on how the user would interact with the toolset.  I'm sure many Devops engineers will appreciate that.  What I would love to see is kGraft's technology combined with kpatch's toolset.  This will certainly be an interesting field to watch as both projects mature.

Monday, 31 March 2014

Live Kernel Patching Solutions

A large public cloud is quite a complex thing to manage, even with toolsets such as Openstack to deploy and manage it.  You often have many customers with compute instances on a single server so security is a high priority.  This also introduces an interesting problem, when you need to deploy a Linux Kernel security patch to the host (sometimes thousands of hosts) how do you do it with the least disruption?

One of the currently most accepted methods is to suspend the instances on a box, deploy the patch, reboot the host and resume the instances.  Whilst this works in many cases you have customers who have had their instances down for X minutes (even if it is planned and the customer is notified it can be inconvenient) and instance kernels that suddenly think "my clock just jumped, WTF just happened?".  This in-turn can cause problems for long running clients talking to servers as well as any time-sensitive applications.  Long running clients will often think they are still connected to the server when they really are not (there are ways around this with TCP Keepalive).

There are a couple of old solutions to this and a couple of new ones and as part of my work for HP's Advanced Technology Group I will be taking a deep dive into them in the coming weeks.  For now here is a quick summary of what is around:

kexec


This is probably the oldest technology on the list.  It doesn't quite fall under the "Live Kernel Patching" umbrella but is close enough that it warrants a mention.  It works by basically ejecting the current kernel and userspace and starting a new kernel.  It effectively reboots the machine without a POST and BIOS/EFI initialisation.  Today this only really shaves a few seconds off the boot time and can leave hardware in an inconsistent state.  Booting a machine using the BIOS/EFI sets the hardware up in an initialised state, with kexec the hardware could be in the middle of reading/writing data at the point the new kernel is loaded causing all sorts of issues.

Whilst this solution is very interesting, I personally would not recommend using it as during a mass deployment you are likely to see failures.  More information on kexec can be found on the Wikipedia entry.

Ksplice


Ksplice was really the first toolset to implement Live Kernel Patching.  It was created by several MIT students who subsequently spun-off a company from it which supplied patches on a subscription model.  In 2011 this company was acquired by Oracle and since then there have been no more official Open Source releases of the technology.  Github trees which are updated to work with current kernels still exist.

The toolset works by taking a kernel patch and converting it into a module which will apply the changes to functions to the kernel without requiring a reboot.  It also supports changes to the data structures with some additional developer code.  It does temporarily pause the kernel whilst the patch is being applied (a very quick process), but this is far better than rebooting and should mean that the instances do not need suspending.

kpatch


Both Red Hat and SUSE realised that a more modern Open Source solution is needed for the problem, and whilst SUSE announce their solution (kGraft) first, Red Hat's kpatch solution was the first to show actual code to the solution.

Red Hat's kpatch solution gives you a toolset which creates a binary diff of a kernel object file before and after a patch has been applied.  It then turns this into a kernel module which can be applied to any machine with the same kernel (along with kpatch's loaded module).  Like Ksplice, it does need to pause the kernel whilst patching the functions.  It also as-yet does not support changes to data structures.

It is still very early days for this solution but development is has been progressing rapidly.  I believe the intention is to create a toolset that will take a unified diff file and turn that into a kpatch module automatically.

For more information on this solution take a look at their blog post and Github repository.

kGraft


SUSE Labs announced kGraft earlier this year but only very recently produced code to show their solution.

From the documentation I've seen so far their solution appears to work in a similar way to Red Hat's but they have the unique feature which gives the ability for the patch to be applied to the kernel without pausing it.  Both the old and the replacement functions can exist at the same time, old executions will finish using the old function and new executions will use the new function.

This solution seems to have gone down the route of bundling their code on top of a Linux kernel git tree which means it took an entire night for me to download the git history.  I'm looking forward to digging through the code of this solution to see how it works.

The git tree for this can be found in the kgraft kernel repository (make sure to checkout the origin/kgraft branch after you have cloned it) and SUSE's site on the technology can be found here.

Summary


All three of the above solutions are very interesting.  Combined with a deployment technology such as Salt or Ansible it could mean the end to maintenance downtime for cloud compute instances.  As soon as I have done more research on the technologies I will be writing more details and hopefully even contributing where possible.