LinuxJedi's /dev/null: mysql


Monday, 22 September 2014

libAttachSQL now in Beta!

After three successful alpha releases, today I pushed up the last major feature planned before the 1.0 release of libAttachSQL, so v0.4.0 has been released today as the first beta. I have been working on a few other things for HP's Advanced Technology Group, which is why this is a little later than I would have liked. But I have made sure there are a lot of good things in this release.

The big new feature in this release is server-side prepared statement support. You can now prepare and execute a prepared statement and fetch its results, and, as with the rest of the library, the API for this is non-blocking.

Other things in this release are:

Compiler fixes for GCC 4.9, RedHat/CentOS 6.x and Python 3.x (Python is used for building the documentation)
Bug fixes for SSL and buffer issues
Much faster building on multi-core systems

In the next couple of weeks I'll write some blog posts on how to use this API as well as the rest of libAttachSQL.

Monday, 15 September 2014

Speaking about libAttachSQL at Percona Live London

As many of you know, I'm actively developing libAttachSQL and am rapidly heading towards the first beta release. For those who don't, libAttachSQL is a lightweight C connector for MySQL servers with a non-blocking API. I am developing it as part of my day job for HP's Advanced Technology Group. It was in part born out of my frustration when dealing with MySQL and eventlet in Python back when I was working on various Openstack projects. But there are many reasons why this is a good thing for C/C++ applications as well.

What you may not know is that I will be giving a talk at Percona Live London about libAttachSQL, the technology behind it and the decisions we made to get here. The event is on the 3rd and 4th of November at the Millennium Gloucester Conference Centre. I highly recommend attending if you wish to find out more about libAttachSQL or any of the new things going on in the MySQL world.

As for the project itself, I'm currently working on the prepared statement code, which I hope to have ready in the next few days. 0.4.0 will certainly be a big release in terms of changes. There has been feedback from some big companies, which is awesome to hear, and I have fixed a few problems they found for 0.4.0. Hopefully you will be hearing more about that in the future.

For anyone attending, I'll be in London from the 2nd to the 5th of November and am happy to meet up and chat about the work we are doing.

Monday, 1 September 2014

New version of libAttachSQL, C connector for MySQL released!

It has been just over 2 weeks since the last libAttachSQL version was released. I had a great vacation in between, and for once I didn't do any work during the week I was away :)

For those who don't know about it, libAttachSQL is a lightweight, non-blocking C connector for MySQL servers. It is Apache 2.0 licensed, so it plays well with both Open Source and commercially licensed applications. I have been developing it for 2 months now as part of my work for HP's Advanced Technology Group. It is hosted on GitHub and uses many freely available tools (such as Travis CI) to host and test various parts of the project.

Once again I thank everyone for the feedback I have received. You all make it even more awesome to be working on this :)

So, on to the new version 0.3.0 alpha release. This time round we have been focusing on zlib compression and SSL support. Both of these features have been added and neither affects the non-blocking nature of the library. The SSL part in particular was quite new to me: I've coded SSL into applications many times in the past, but I've never done it in a non-blocking way before. It posed some interesting challenges, but it was fun and appears to be working great now.

The biggest changes in this release are:

Fixes to the test cases and improvements to the CI used
Documentation improvements
Many minor bug fixes
Protocol compression (zlib) support
SSL encryption (OpenSSL) support
32-bit builds now work

For more information see the Version History section of the docs.

On to the next release, which should complete the biggest remaining features. From there we can head towards our first GA release.

If you have any questions, feedback, etc... please feel free to leave comments, email me or open a GitHub issue.

Friday, 15 August 2014

libAttachSQL 0.2.0 alpha released!

Hot on the heels of last week's release, we have released version 0.2.0 alpha of libAttachSQL. For those who have missed my previous blog posts, libAttachSQL is a lightweight C connector for MySQL servers I'm creating with HP's Advanced Technology Group. It has an Apache 2 license, so it is suitable for linking with software under most Open Source licenses as well as with commercial software projects.

Changes in this release:

Added support for query result buffering
Passive connect on first query is now asynchronous
Improved memory handling
Many documentation changes, including API examples
Many other smaller fixes

For more information see the libAttachSQL documentation and the release itself can be found on the libAttachSQL website.

We have had some great feedback so far. Thanks to everyone who has contacted me since the first release. As always if you have any questions feel free to contact me or file an issue on GitHub.

Sunday, 10 August 2014

libAttachSQL 0.1.0 alpha released!

As I briefly mentioned in my previous post, I have been working on a new project for HP's Advanced Technology Group called libAttachSQL.

libAttachSQL is a lightweight C connector for MySQL servers. It is Apache 2 licensed (and therefore compatible with many open source licenses as well as commercial use) and has a new asynchronous API. With this API you send a command, which returns immediately, and then poll until the library tells you that results are ready. This is very useful for applications that have many things going on and that you do not want held up waiting for the MySQL server to process a query. In later posts I will give usage examples of this.
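To make the send-then-poll pattern concrete, here is a minimal Python sketch. The names `HypotheticalAsyncQuery` and `send_query` are invented for illustration and are not libAttachSQL's API (which is C); a background thread stands in for the MySQL server working on the query, while the caller keeps doing other work between polls.

```python
import threading
import time


class HypotheticalAsyncQuery:
    """Illustrative non-blocking query handle (not libAttachSQL's API)."""

    def __init__(self, sql, server_delay=0.05):
        self.sql = sql
        self._rows = None
        self._ready = threading.Event()
        # A background thread simulates the server executing the query.
        worker = threading.Thread(target=self._simulate_server,
                                  args=(server_delay,), daemon=True)
        worker.start()

    def _simulate_server(self, delay):
        time.sleep(delay)           # pretend the server is busy
        self._rows = [("row", 1)]   # canned result for the demo
        self._ready.set()

    def poll(self):
        """Return immediately: True if results are ready, else False."""
        return self._ready.is_set()

    def rows(self):
        return self._rows


def send_query(sql):
    """Hypothetical: returns a handle at once instead of waiting."""
    return HypotheticalAsyncQuery(sql)


query = send_query("SELECT 1")
other_work_done = 0
while not query.poll():
    other_work_done += 1   # the application carries on with other work
    time.sleep(0.001)
print(query.rows(), "after", other_work_done, "rounds of other work")
```

In a real event-driven application the poll would usually be driven by the application's main loop rather than a tight `while` loop.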

I am a great believer in "release early, release often", so on Friday, 5 weeks after I started writing code (and docs), I released the first alpha version of this connector. The source of this release can be downloaded here. For now this is a source-only release, just to give a taste of the project so far; binary packages will also be released at some point before GA. Documentation for the library can be found on Read The Docs.

What it can currently do:

Compile with Clang and GCC on Linux and Mac
Cross-compile for Windows using MinGW64 (in Fedora only)
Connect to MySQL servers using TCP or Unix socket file
Send basic MySQL queries and retrieve results
Automatically escape and convert data for your queries, using an API similar to prepared statements
Not a lot else
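The automatic escaping and conversion in the list above can be sketched in Python as follows. This is not libAttachSQL's code or API: `bind_parameters` and `to_sql_literal` are hypothetical helpers, and the escape table covers the characters MySQL's C API function `mysql_real_escape_string()` escapes, ignoring multi-byte character sets and the `NO_BACKSLASH_ESCAPES` SQL mode.

```python
# Characters escaped by MySQL's mysql_real_escape_string() (ignoring
# multi-byte character sets and the NO_BACKSLASH_ESCAPES SQL mode).
_ESCAPES = {
    "\0": "\\0",
    "\n": "\\n",
    "\r": "\\r",
    "\\": "\\\\",
    "'": "\\'",
    '"': '\\"',
    "\x1a": "\\Z",
}


def escape_mysql_string(value):
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def to_sql_literal(value):
    """Convert a Python value into SQL literal text (hypothetical helper)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):          # check before int: bool is an int
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + escape_mysql_string(str(value)) + "'"


def bind_parameters(query, params):
    """Substitute each '?' placeholder with an escaped literal.

    Naive sketch: a '?' inside a quoted string in `query` would also be
    treated as a placeholder.
    """
    pieces = query.split("?")
    if len(pieces) - 1 != len(params):
        raise ValueError("placeholder count does not match parameter count")
    parts = [pieces[0]]
    for param, tail in zip(params, pieces[1:]):
        parts.append(to_sql_literal(param))
        parts.append(tail)
    return "".join(parts)


print(bind_parameters("SELECT * FROM users WHERE name = ? AND id = ?",
                      ["O'Brien", 42]))
# SELECT * FROM users WHERE name = 'O\'Brien' AND id = 42
```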

As the project progresses we will be adding many more features such as prepared statement support.

This project is completely open, using many free services as described in my previous blog post. We welcome people to come and kick the tyres and contribute in as small or as large a way as they like. This can be simply filing a bug or feature request, contributing docs or code, etc... One thing we could really use right now is someone with Debian/Ubuntu expertise to help us create the Debian packaging scripts (I'm not an expert at these and am struggling to make them work). There is a GitHub issue open for this.

If you have any questions about the library feel free to contact me, comment on this blog post, open a GitHub issue or come chat on the #libAttachSQL channel on Freenode.

Friday, 8 August 2014

How cloud hosted services are helping open source

One big project I'm working on for HP's Advanced Technology Group right now is an Apache 2.0 licensed C connector for MySQL servers called libAttachSQL. The whole process, not just the code itself, is helping us learn about new and current techniques in Open Source development. Whilst I will be writing many posts about libAttachSQL in the future, today's post is about the free hosted services we are using around it.

GitHub

Almost all the Open Source projects I worked on previously were hosted on Canonical's Launchpad platform. Over the last couple of years there has been a shift to GitHub, and almost everything I have worked on at HP has been hosted there. There are now many services that hook into GitHub, so this seemed like the perfect opportunity to try some of them out.

The libAttachSQL project has its own organisation on GitHub with a couple of trees under it. The service is fantastic and has grown a lot over the years in features and reliability. The only thing I don't quite agree with is that it favours its own flavour of Markdown for documentation over other formats. There is some reStructuredText support, but it isn't yet as good as I would hope. This is a really minor issue though and not something they should be marked down for.

GitHub Pages

GitHub Pages is a relatively new service created by GitHub. Simply create a tree with a specific name, push some static content, and you are done! There is also an easy way to point a domain at it, so we use a GitHub page as the site for libattachsql.org.

Read The Docs

Every Open Source project I have worked on from Drizzle onwards has had its documentation in reStructuredText format which compiles into HTML, PDF and many other formats using a Python based tool called Sphinx (not to be confused with the search server). In my opinion it is more flexible than Markdown format, especially when documenting APIs.

libAttachSQL's documentation was again written in reStructuredText format and is automatically compiled into HTML and PDF documentation using the free service Read The Docs. This is hooked into GitHub so on a new push/merge Read The Docs will automatically generate a new version of the documentation. We have pointed a subdomain to the Read The Docs output so that it can be easily accessed, docs.libattachsql.org.

I am extremely pleased with this service: not only is it free for Open Source projects, it also makes documentation even more aesthetically pleasing than the basic Sphinx templates do.

Travis CI

Every source code project needs Continuous Integration. There are many solutions to this, one of the most popular being Jenkins. As with the RST documentation format, every project I have worked on from Drizzle onwards uses Jenkins to test every branch before and after merging. I could have used Jenkins for this project, but my goal is not to own the hosting of anything. So, for libAttachSQL I set up Travis CI. This is a hosted service that is free for Open Source projects and has a paid-for variant for private projects.

Our Travis setup tests compiling with Clang and GCC on Linux (Travis uses Ubuntu 12.04), running a test suite with each. Every virtual test host comes with a MySQL server already running for you to use in your tests, and it was very simple to set this up. We also get the Travis tests to build the documentation in nitpick mode with warnings as errors, so any minor documentation problems are picked up early. All this is done with a very simple YAML script (although ours has got a little more complex with the addition of Coverity support).

I would also like our builds to run Valgrind checks and to test on the OS X platform Travis provides, but I will work on getting those running later.

Travis is a fantastic service and a breeze to set up and use. The interface shouldn't be too unfamiliar if you have used Jenkins before. My wish is that it supported more platforms: I would really love a Fedora-based builder, a more up-to-date Ubuntu and possibly Windows builders. They do have OS X builders, though, which is fantastic.

Coverity Scan

Coverity Scan is a static code analyser which is free for Open Source projects hosted on GitHub. It also hooks in nicely with Travis CI, with Travis providing the analysis data from the code and builds for the site to break down. This was the most complex of all of these services to set up, but it has given some fantastic results so far. It has found 13 potential bugs in my code that Clang's lint and Valgrind didn't find. This is really impressive: for starters, the project is built from git with incredibly strict compiler flags, and there was only one false positive. Unfortunately there is a quota limit for Open Source projects, so we only run this occasionally rather than on every merge.

Conclusion

We have managed to get all of the services we would otherwise need to set up and manage provided for us completely free, with no hosting for us to manage. These are all awesome services and most were very quick to set up. I thank all of the companies providing these services; they have easily shaved a week off my time setting up machines to host our project, and saved many more hours managing the services.

Over the next couple of weeks I will be talking a lot more about the libAttachSQL project, so look out for those posts.

Saturday, 22 February 2014

Is Drizzle dead?

Yesterday someone opened a Launchpad question asking "is Drizzle dead?". I have answered that question on Launchpad but wanted to blog about it to give a bit of background context.

As I am sure most of the people who read this know, Drizzle is an Open Source database which was originally forked from the alpha version of MySQL 6.0. At the time it took an extremely radical approach to Open Source development: many features were stripped out and rewritten as plugins to turn it into a micro-kernel style architecture. Every merge request was automatically and thoroughly tested on several platforms for regressions, memory leaks and even positive/negative performance changes.

In fact Drizzle has influenced many Open Source projects today. Openstack's Continuous Integration was born from the advanced testing we did on Drizzle. MariaDB's Java connector was originally based on Drizzle's Java connector. Even MySQL itself picked up a few things from it.

Development of Drizzle started off as a "What if?" R&D project at Sun Microsystems spearheaded by Brian Aker. Once Oracle acquired Sun Microsystems, a new corporate sponsor was found for Drizzle: Rackspace.

Rackspace hired all the core developers (that is the point where I joined) and development progressed through to the first GA release of Drizzle. Unfortunately Rackspace decided to no longer sponsor the development of Drizzle and we had to disband. I've heard many reasons for this decision; I don't want to dwell on them, I just want to thank Rackspace for that time.

Where are we now? Of the core team whilst I was at Rackspace:

Brian and I work for HP's Advanced Technology Group.
David Shrewsbury, Monty Taylor and Patrick Crews all work for various parts of HP Cloud.
Stewart Smith works for IBM's Linux Technology Centre.

So, back to the core question: "Is Drizzle dead?". The core team all work long hours in our respective jobs to make some awesome Open Source products and in what little spare time we have we all work on many Open Source projects. Unfortunately splitting our time to work on Drizzle is hard, so the pace has dramatically slowed. But it isn't dead. We have been part of Google Summer of Code, we still get commits in from all over the place and Drizzle is still part of the SPI.

Having said this, Drizzle no longer has a corporate sponsor. Whilst Drizzle can live and go on without one, it is unlikely to thrive without one.

Another thing that is frequently asked is: "What happened to the docs and wiki?". As a cloud database, Drizzle had all of its development and public documentation servers hosted in the cloud. Unfortunately the kill switch was accidentally hit prematurely on the cloud account used. This means we lost not only the servers but also the storage space being used for backups. This also affected other Open Source projects such as Gearman. The old wiki is dead; we cannot recover that content. The docs, however, were auto-generated from the reStructuredText documentation in the source; the site simply compiled and rendered them for easy reading.

What I would personally like to see is the docs going to Read The Docs automatically (there is an attempt to do this, but it is currently failing to build) and the main site moved to DokuWiki similar to the new Gearman site.

As for Drizzle itself... It was, in my opinion, pretty much exactly what an Open Source project should be, and was indeed developing into what I think an Open Source database should be. It just needs a little sponsorship and a core team that are paid to develop it and mentor others who wish to contribute. Given that it was designed from the ground up to be a multi-tenant in-cloud database (perfect for a DBaaS), I suspect that could still happen, especially now that projects like Docker are emerging for it to sit on.

Thursday, 13 February 2014

Caveats with Eventlet

The Stackforge Libra project, like most Openstack-based projects, is written in Python. As anyone who has used Python before probably knows, Python has something called a GIL (Global Interpreter Lock). The GIL means only one thread executes Python code at a time, with the interpreter switching between threads. This means you can't really use threads to improve performance in Python, especially for CPU-bound work.

One way to get a little more performance is to use Eventlet. Eventlet is a library which uses what are called "Green Threads" and hacks on top of the networking libraries to give an application a multi-threaded feel. As part of this blogging series for HP's Advanced Technology Group, I'll write about some of the things I found out the hard way about Eventlet, so hopefully you don't hit them.

What are Green Threads?

Green Threads are basically a way of doing multi-tasking on a single real thread. They use what is called "Cooperative Yielding" to allow each other to run rather than being explicitly scheduled. This has the advantage of removing the need for locks in many cases and making asynchronous IO easier. But they come with caveats which can hurt if you don't know about them.
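Cooperative yielding can be illustrated with plain Python generators and a hypothetical round-robin scheduler. This is a minimal sketch of the concept, not how Eventlet is implemented (Eventlet builds its green threads on the greenlet library): each task runs until it voluntarily yields, so the interleaving is entirely predictable.

```python
from collections import deque


def worker(name, steps, log):
    """A toy green-thread-like task that records each step it runs."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # cooperative yield: hand control back to the scheduler


def run(tasks):
    """Round-robin scheduler: each task runs until it chooses to yield."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume the task up to its next yield
        except StopIteration:
            continue            # task finished: drop it from the queue
        ready.append(task)      # still running: queue it for another turn


log = []
run([worker("a", 3, log), worker("b", 2, log)])
print(log)  # ['a0', 'b0', 'a1', 'b1', 'a2']
```

Because a task can only be switched out at a `yield`, the code between two yields never needs a lock to protect it from the other tasks, which is where the reduced need for locking comes from.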

Threading library patched

One of the first things you typically do with Eventlet is "monkey patch" standard Python library functions so that they are compatible with cooperative yielding. For example, you want the sleep() function to yield to the other green threads rather than holding all of them up until it has finished.

The threading library is one of the libraries that is monkey patched, and its behaviour becomes subtly different. When you spawn a thread, control will not return to the main thread until the child thread has finished executing. So a loop that tries to spawn X threads will only have 1 thread running at a time, and will not spawn the next until that one has finished. It is recommended that you use Eventlet's green thread calls instead (which will actually work as expected).

Application hangs

Cooperative yielding relies on library functions being able to yield. This means that if you use functions that do not understand this, the yield never happens and all your green threads (including the main thread) hang waiting. Any unpatched system call (such as a call into C/C++ code) falls into this category.
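The same hang can be reproduced with Python's standard `asyncio` module, used here purely as a stand-in for Eventlet because both rely on cooperative yielding. In this sketch `time.sleep()` plays the part of an unpatched blocking call (for example a slow query made through a C library), `asyncio.sleep()` is the properly yielding equivalent, and the hypothetical `ticker` task measures how long the event loop is stalled.

```python
import asyncio
import time


async def ticker(log, stop_at):
    """Log a timestamp each time the event loop gives this task a turn."""
    while time.monotonic() < stop_at:
        log.append(time.monotonic())
        await asyncio.sleep(0.01)


async def blocking_query():
    time.sleep(0.2)           # never yields: the whole loop stalls


async def cooperative_query():
    await asyncio.sleep(0.2)  # yields: other tasks keep running


async def worst_stall(query):
    log = []
    tick = asyncio.create_task(ticker(log, time.monotonic() + 0.35))
    await asyncio.sleep(0)    # give the ticker its first turn
    await query()
    await tick
    # The largest gap between ticks is the longest time the loop was stuck.
    return max(b - a for a, b in zip(log, log[1:]))


blocked = asyncio.run(worst_stall(blocking_query))
cooperative = asyncio.run(worst_stall(cooperative_query))
print(f"worst stall: blocking={blocked:.2f}s, cooperative={cooperative:.2f}s")
```

With the blocking call, the ticker cannot run for the full 0.2 seconds; with the yielding call it keeps ticking roughly every 10 ms.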

A common place you can see this is with the MySQLdb library, which is a wrapper for the MySQL C connector (libmysqlclient). If you execute a complex query that takes some time, all green threads will wait. If your MySQL connection hangs for any reason... well, you are stuck. I recommend using one of the native Python MySQL connectors instead.

Another place I have seen this is with any library that relies on epoll. Python-gearman is an example of this. It seems that Eventlet only patches the select() calls, so anything that uses epoll.poll() is actually blocking with Eventlet.

In summary there are cases where Eventlet can be useful. But be careful where you are using it or things can grind to a halt really quickly.

Thursday, 5 May 2011

My contribution to MySQL 5.6

[Photo by Stéfan, used under a CC BY-NC-SA 2.0 license]

If you have been reading Planet MySQL over April, you will have seen many blog posts on the new features in MySQL 5.6 (currently a development release). I developed several patches that are in 5.6, including the 'Slave_last_heartbeat' status variable, which shows the time of the last replication heartbeat received. The new feature I developed that I am most proud of is the option to back up your binary logs remotely without a MySQL slave:

Remote Binlog Back-up

Enhances operational efficiency by using the replication channel to create real-time back-ups from the binary log.

By adding a raw flag, the binlog is written out to remote back-up servers, without having a MySQL database instance translating it into SQL statements, and without the DBA needing SSH access to each master server.

Here is a quick story as to why I developed it and how it can help people.

Back then I was a MySQL Support Engineer, and a customer asked if it was possible to retrieve binary logs from a remote server in real time without needing a MySQL slave running the Blackhole engine. The customer had many servers that they wanted to back up to just a few backup servers. Unfortunately at the time there was no such tool, but within 24 hours I had hacked a patch into mysqlbinlog to provide this. The patch had bugs and lacked a lot of features back then, but the proof of concept was good enough to show that a real patch could be made.

The new 'raw mode' option to mysqlbinlog can connect to a remote MySQL server, retrieve the binary logs and can continue retrieving them until an error occurs. So it is possible to have a backup of your binary logs up to the second that your primary data centre bursts into flames.
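As a sketch of how raw mode might be invoked for a continuous remote backup, this Python snippet only builds and prints a `mysqlbinlog` command line. The host, user, first binary log name and output prefix are placeholders, and the options shown (`--read-from-remote-server`, `--raw`, `--stop-never`, `--result-file`) should be checked against the MySQL manual for your server version before relying on them.

```python
import shlex


def remote_binlog_backup_command(host, user, first_binlog, result_prefix):
    """Build a mysqlbinlog command that keeps streaming raw binary logs.

    All arguments are placeholders for your own servers and paths.
    """
    return [
        "mysqlbinlog",
        "--read-from-remote-server",        # fetch logs from the server over the replication protocol
        "--raw",                            # write binary log files, not SQL text
        "--stop-never",                     # stay connected and wait for new events
        f"--host={host}",
        f"--user={user}",
        "--password",                       # prompt for the password
        f"--result-file={result_prefix}",   # with --raw: prefix for output files
        first_binlog,                       # first binary log to fetch
    ]


cmd = remote_binlog_backup_command("db1.example.com", "backup",
                                   "mysql-bin.000001", "/backups/db1/")
print(shlex.join(cmd))
# To start the backup for real: subprocess.run(cmd, check=True)
```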

You can read up more about how to use this in the MySQL manual.

Wednesday, 4 May 2011

libeatmydata - Feed me, Seymour!

Whilst supporting customers at SkySQL I often have to load gigabytes of SQL data into MySQL servers to run tests. This process can be slow especially for InnoDB because in a standard dump file every insert is a transaction and every transaction has to be synchronised to disk for crash safety. The thing is, most of the time I don't care if the machine I'm using crashes whilst I'm loading this data into the server.

There are of course many ways around this, such as editing the SQL files and wrapping transactions around batches of inserts and editing the configuration files to disable all the syncing involved. But I don't want one configuration to load in data and then another to play with the data, so this is where libeatmydata comes in.
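The first workaround above, wrapping batches of inserts in transactions, can be sketched as a small Python filter. `batch_inserts` is a hypothetical helper that assumes each list item is one complete SQL statement. With InnoDB's default `innodb_flush_log_at_trx_commit=1`, the log is flushed to disk once per `COMMIT`, so grouping inserts means far fewer syncs than one per row.

```python
def batch_inserts(statements, batch_size=1000):
    """Wrap runs of INSERT statements in explicit transactions.

    Hypothetical helper: assumes each item is one complete SQL statement.
    Any other statement closes the open batch and is passed through as-is.
    """
    out = []
    open_count = 0  # INSERTs in the currently open transaction
    for stmt in statements:
        if stmt.lstrip().upper().startswith("INSERT"):
            if open_count == 0:
                out.append("START TRANSACTION;")
            out.append(stmt)
            open_count += 1
            if open_count == batch_size:
                out.append("COMMIT;")
                open_count = 0
        else:
            if open_count:
                out.append("COMMIT;")
                open_count = 0
            out.append(stmt)
    if open_count:
        out.append("COMMIT;")
    return out


dump = [
    "CREATE TABLE t (a INT);",
    "INSERT INTO t VALUES (1);",
    "INSERT INTO t VALUES (2);",
    "INSERT INTO t VALUES (3);",
]
print("\n".join(batch_inserts(dump, batch_size=2)))
```

Closing the batch before any non-INSERT statement also sidesteps the implicit commit that DDL statements cause in MySQL.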

libeatmydata is a library, loaded with LD_PRELOAD, that stops the disk-syncing functions (such as fsync()) from doing just that: the calls become no-ops and the OS decides when to write the data to disk. This is great for loading in an SQL dump file, taking single-insert dumps on a default configuration down from hours to minutes. But you wouldn't want to use it while running your server in production, because a power failure would certainly lose you some data.

So, how do you use libeatmydata with MySQL? Simple: this is the command to start it:

LD_PRELOAD=/usr/lib/libeatmydata.so mysqld

Then you can load in your dump file, shutdown mysqld safely and start it up again without libeatmydata.

A great application I could see for this is scripting the startup of slaves: feeding a dump file into the server with libeatmydata and then restarting without it once the slave is ready.

UPDATE

Kristian Nielsen asked in the comments on SkySQL's blog how much faster it is, so I have run a quick benchmark to find out. In this test I am using a 218MB test file of single row inserts I had generated for an old support issue. I am also using a clean MySQL 5.1.51 installation (cleaned on each run) on my i7 based laptop:

Vanilla MySQL 5.1.51

real    166m19.504s
user    0m23.891s
sys     0m6.084s

MySQL 5.1.51 with --sync-binlog=0 --innodb_flush_log_at_trx_commit=0

real    5m33.578s
user    0m11.096s
sys     0m3.215s

MySQL 5.1.51 with libeatmydata

real    3m14.123s
user    0m10.932s
sys     0m3.108s
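To put the three `real` timings into ratios, this short Python snippet converts the `time` output above into seconds and compares each run with vanilla MySQL.

```python
import re


def parse_time(value):
    """Convert bash `time` output such as '166m19.504s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognised time: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


real = {
    "vanilla": parse_time("166m19.504s"),
    "sync disabled in config": parse_time("5m33.578s"),
    "libeatmydata": parse_time("3m14.123s"),
}
baseline = real["vanilla"]
for name, seconds in real.items():
    print(f"{name:>24}: {seconds:8.1f}s  ({baseline / seconds:5.1f}x vs vanilla)")
```

This works out to roughly a 30x speed-up from disabling syncing in the configuration and about 51x with libeatmydata, which was in turn about 1.7x faster than the configuration change in this test.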