LinuxJedi's /dev/null: 2011

Friday, 12 August 2011

Mydumper 0.5.1 released

After much feedback from the community who have been using mydumper I have created the first bugfix release of the 0.5 series of mydumper. Changes in this release are as follows:

Fix MySQL 5.0 compile issue

Make the metadata file visible (after multiple requests)

Add --no-lock option to mydumper

First --daemon snapshot is now at execution instead of the first timed interval

Fix CentOS 5.x compile issue (also affected Mac OSX)

Fix libmysqlclient search bug

Add cppcheck target

Fix errors flagged by cppcheck

Add option to turn off docs build

Add status output at end of CMake

To obtain this release you can download it from Launchpad.

Many thanks to everyone who has been trying it and giving feedback.

Saturday, 23 July 2011

About a week ago I appeared to have started an argument which has now spanned several blog post comments, 39 Google+ comments and two other blog posts. So the first thing I want to do is apologise to the MySQL@Oracle staff (and Sheeri). I was disagreeing on the use of 'The' whilst the real problem is my personal understanding of the meaning of the word/name/trademark 'MySQL'. You guys are right and I am wrong. To correct this I'll refer to what I used to call the "MySQL Ecosystem" as the Open Database Community until someone has a cool name for it.

The arguments were so severe that last week I was even thinking about leaving the Open Database Community. Apart from Drizzle, mydumper and a few other things I don't contribute as much useful content as I used to now anyway. I would need about another 20 hours in every day to do as much as I would like to do. But for now I will just stay more in the background just pushing out code to anyone who wants it.

In reply to Herik's post, please don't blame Monty Taylor, this was all my fault. Also I never intended for the argument to be hidden (or exist at all) on Google+. Someone at Oracle pointed out that it was wrong to make any of those comments public, so they will stay hidden for now.

Tuesday, 14 June 2011

mydumper, now with continuous backup mode!

It has happened to the best of us. Our MySQL servers are running nicely when the tea-lady in the data centre trips over a wire, knocks a hot cup of java down the UPS batteries and fries every server in the data centre. OK, so that doesn't happen very often, but there have been cases where data centre failures have physically damaged servers.

So, if this happens with your MySQL server, what do you do? You can set up new servers in a new data centre and hope you have an up-to-date backup. But unless you had remote slaves these backups probably won't be as current as they could be. When you get to the point where you have many servers, keeping slaves for every one just for backups can be difficult to maintain. Many DBAs have different backup strategies to cover this potential problem.

This is where the new mydumper 0.5.0 comes in. It has a new 'daemon' mode which lets it run in the background. When in daemon mode it will take multi-threaded consistent snapshots on a regular basis (60 minutes by default) whilst also pretending to be a MySQL slave and continuously retrieving a local copy of binary logs. This can, in theory, give an up-to-the-second backup of your server. Right up until the point where you need to fire the tea-lady. As an example you can use it in this mode as follows to make snapshots every 2 hours and constantly retrieve the binary logs:

mydumper --daemon --snapshot-interval=120 --logfile=mydumper.log --user=mydumper_user \
    --password=mydumper_password --host=mysql_server

A maximum of 2 snapshots are retained, one for the previous snapshot and one for the current one in case a failure happens during the current snapshot. A symbolic link inside the dump directory will always point to the last good snapshot. Each snapshot contains the binary log positions from that snapshot so at any given time the snapshot can be loaded in using myloader (a multi-threaded restoration tool bundled with mydumper) and then the log applied from that point using mysqlbinlog. In later versions I plan to have myloader handle the binary logs too.
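The two-snapshot retention described above can be pictured with a small Python sketch. This is only an illustration of the idea, not mydumper's actual C code: the function name `take_snapshot`, the callback `dump_fn`, the slot directories `0` and `1` and the link name `last_dump` are all assumptions made for the example.

```python
import os
import shutil


def take_snapshot(dump_dir, slot, dump_fn, link_name="last_dump"):
    """Dump into dump_dir/<slot>; repoint the symlink only if the dump succeeds."""
    target = os.path.join(dump_dir, str(slot))
    # Reuse the slot that holds the older snapshot (or a failed attempt).
    shutil.rmtree(target, ignore_errors=True)
    os.makedirs(target)
    try:
        dump_fn(target)  # hypothetical callback that writes the dump files
    except Exception:
        # The link is untouched, so it still names the previous good snapshot.
        return False
    # Build the new link under a temporary name and rename it over the old
    # one, so there is never a moment when the link is missing.
    tmp_link = os.path.join(dump_dir, link_name + ".tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(str(slot), tmp_link)  # relative link inside dump_dir
    os.replace(tmp_link, os.path.join(dump_dir, link_name))
    return True
```

A caller would alternate `slot` between 0 and 1 on each interval. Because the link is only moved after `dump_fn` returns, a failure part-way through a snapshot leaves the previous one available for restoring.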

Mydumper 0.5.0 is currently considered in an 'alpha' stage (0.2.3 is the latest stable version but this will only take single snapshots). This means there are probably bugs I haven't spotted yet and there is still a little bit of work to do (such as better myloader support for the daemon mode snapshots). You can download it from Launchpad here.

Tuesday, 31 May 2011

Mydumper 0.2.3 released!

Today marks the release of mydumper 0.2.3. Mydumper is a multi-threaded high-performance data dumper (and loader) for MySQL and Drizzle written in C. This is a minor bugfix release whilst I work on the upcoming 0.5 version. The end goal here will be that there is always a 'stable' and 'development' version. 0.2 will be the first stable version and will only have bug fixes. 0.5 will be the first 'development' version where the next planned set of new features will hit.

So, the changes since 0.2.2 are:

Drizzle support now fully works again

Fixes so mydumper compiles in FreeBSD (thanks to Kirill A. Korinskiy)

If you wish to try this version the source can be downloaded here.

Friday, 20 May 2011

Mydumper now with MyISAM consistent snapshots!

Mydumper 0.2.2 has been released today with a number of fixes and new features. The one that most people have been asking for is consistent snapshots for non-InnoDB tables (such as MyISAM). We have been able to achieve this without locking the database for the entire backup using the following method:

Flush tables with read lock (and start transaction with consistent snapshot on all threads)

Dump non-InnoDB

Start InnoDB dump

When non-InnoDB dump has finished (whilst InnoDB is dumping) unlock tables

Profit
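The ordering of those steps can be sketched as a small Python simulation. Nothing here talks to a real server and it is not how mydumper itself is written (mydumper is C); the function name `consistent_dump` and the table names are made up for illustration, and the SQL statements are only recorded as log entries.

```python
import threading


def consistent_dump(non_innodb_tables, innodb_tables):
    """Simulate the lock / dump / unlock ordering and return the event log."""
    log = []
    log_guard = threading.Lock()  # protects the shared log list

    def record(event):
        with log_guard:
            log.append(event)

    def dump(table):
        record(f"dump {table}")

    # Step 1: global read lock, then every worker starts its snapshot
    # (recorded here as log entries only).
    record("FLUSH TABLES WITH READ LOCK")
    record("START TRANSACTION WITH CONSISTENT SNAPSHOT (all threads)")

    # Steps 2 and 3: non-InnoDB and InnoDB dumps run side by side.
    non_innodb = [threading.Thread(target=dump, args=(t,)) for t in non_innodb_tables]
    innodb = [threading.Thread(target=dump, args=(t,)) for t in innodb_tables]
    for worker in non_innodb + innodb:
        worker.start()

    # Step 4: unlock as soon as the non-InnoDB tables are done,
    # without waiting for the InnoDB workers.
    for worker in non_innodb:
        worker.join()
    record("UNLOCK TABLES")

    for worker in innodb:
        worker.join()
    return log


for event in consistent_dump(["myisam_a", "myisam_b"], ["innodb_a", "innodb_b"]):
    print(event)
```

The point of the sketch is that the lock only needs to cover the non-transactional tables; the InnoDB workers rely on the snapshot they opened while the lock was held.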

I have also started work on Drizzle support this week. The Drizzle support is not entirely complete, mostly down to Drizzle's handling of SHOW TABLE STATUS, and I expect to have it fully working next week.

Here is the list of changes since 0.2.1:

Consistent snapshots for non-InnoDB (non-transactional) tables

Fix --binlogs breaking consistent snapshots

(very) Minor performance improvement to mydumper

Initial support for Drizzle

Add --verbose option

Fix multiple ';' at end of file

Fix myloader not closing files

Improve myloader error messages

Several fixes to documentation building

Make myloader tell mysqld to not binlog imports by default (new option --enable-binlog to log them)

Add --database to myloader for single database import to a different database

Change mydumper's --schemas to --no-schemas (--schemas is now default)

Add verbose messages (with --verbose=3)

Fix memory leaks in myloader

You can download mydumper 0.2.2 from Launchpad or by clicking here.

Many thanks to everyone who has been testing mydumper and giving feedback. Your comments and suggestions go a long way to improving mydumper.

Friday, 13 May 2011

Mydumper now with myloader!

It has only been a few days since the 0.2.0 release of mydumper but there have been some big changes since then. I will try and go over them all here.

Mydumper 0.2.1

Mydumper 0.2.1 has been released today. Many thanks to all those who have been testing the trunk source; the feedback has gone a long way to making fixes and improvements to mydumper. You can download the source for it here.

Myloader

A few days ago Mark Callaghan asked about restoring mydumper backups. There is of course the great 'myimporter' tool written by Mikael Fridh, but I wanted something that could integrate into the mydumper source better. So, I have been busy hacking for the last few days and the big news in today's release is the addition of 'myloader'. Myloader is a high-performance, multi-threaded tool written in C designed to read mydumper backups and apply them in parallel.

There are still many features I wish to add to it, but it is good for basic restorations.

New Website

Last night I knocked together a site for mydumper. There is not much in the way of content yet, but you can take a look at http://www.mydumper.org/

Monday, 9 May 2011

MySQL Data Dumper 0.2.0 released!

A couple of years ago Domas Mituzas created a tool which could basically be thought of as a lightweight multi-threaded mysqldump. By this I mean it can retrieve data from multiple tables simultaneously and can even break a table down into parts for simultaneous retrieval. Sometime around 2010 I started hacking on mydumper too but stopped whilst working at Rackspace (Drizzle was way more than full-time for me).

Back when Domas first blogged about it he was managing to dump his sample data over 10x faster than mysqldump!
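Breaking a table down into parts for simultaneous retrieval can be pictured as splitting a numeric key range into chunks, each of which a separate thread could fetch. The Python sketch below is only an illustration under that assumption; it is not mydumper's actual chunking logic, and `chunk_ranges`, the table name `t` and the column `id` are hypothetical.

```python
def chunk_ranges(min_id, max_id, chunks):
    """Split the inclusive range [min_id, max_id] into at most `chunks` parts."""
    total = max_id - min_id + 1
    size = -(-total // chunks)  # ceiling division
    return [
        (start, min(start + size - 1, max_id))
        for start in range(min_id, max_id + 1, size)
    ]


# Each range becomes one query that a worker thread could run on its own.
for low, high in chunk_ranges(1, 1000, 4):
    print(f"SELECT * FROM t WHERE id BETWEEN {low} AND {high}")
```

Even-sized ranges only balance well when the key values are spread fairly evenly, so a real tool would need to be smarter about sparse or skewed keys.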

Since the 0.1 series Domas has fixed a lot of problems and since starting at SkySQL I have finished a lot of work that I started over a year ago.

Today sees the first release of the 0.2 series of mydumper, called 0.2.0. It has many changes over the last 0.1 release:

Better error handling

Many bug fixes

Documentation

Binary log dumping

Compression protocol support

Schema dumps

My personal favourite feature I added here is the binary log dumping. Mydumper can connect to a MySQL server and retrieve all the binary logs in parallel whilst also retrieving the table data.

Many more features are already in development including additional features on the binary log dumping but for those who want to try it now you can obtain the source from Launchpad here.

Friday, 6 May 2011

Viewing the MySQL dump import progress

A couple of years ago I wrote a patch for the MySQL command line client which shows the progress of an import as it happens (I also created a similar patch for mysqldump which later made it into Drizzle). I don't have the blog archives from back then but Harrison Fisk commented suggesting I use a utility called 'bar' instead.

The 'bar' utility actually is a lot better than the patch I wrote and I highly recommend it when you are importing a large dump file. To use it simply run:

shell> bar -if=data.sql | mysql

This will display a progress bar while the dump file is being imported.

If you are using Ubuntu then it is a simple case of 'sudo apt-get install bar' to install it. Enjoy!

Thursday, 5 May 2011

My contribution to MySQL 5.6

If you have been reading Planet MySQL over April you will have seen many blog posts on the new features in MySQL 5.6 (currently a development release). I developed several patches that are in 5.6 including the 'Slave_last_heartbeat' status variable to show the time of the last replication heartbeat received. One of the cool new features I developed which I am most proud of is the option to remotely back up your binary logs without a MySQL slave:

Remote Binlog Back-up

Enhances operational efficiency by using the replication channel to create real-time back-ups from the binary log.

By adding a raw flag, the binlog is written out to remote back-up servers, without having a MySQL database instance translating it into SQL statements, and without the DBA needing SSH access to each master server.

Here is a quick story as to why I developed it and how it can help people.

Back then I was a MySQL Support Engineer and a customer asked if it was possible to retrieve binary logs from a remote server in real time without needing a MySQL slave using the blackhole engine. The customer had many servers that they wanted to back up onto just a few backup servers. Unfortunately at the time there was no such tool, but within 24 hours I had hacked a patch into mysqlbinlog to provide this. The patch had bugs and missed a lot of features back then but the proof of concept was good enough to show that a real patch could be made.

The new 'raw mode' option to mysqlbinlog can connect to a remote MySQL server, retrieve the binary logs and can continue retrieving them until an error occurs. So it is possible to have a backup of your binary logs up to the second that your primary data centre bursts into flames.

You can read up more about how to use this in the MySQL manual.
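As a rough sketch of what a raw mode invocation looks like, the Python helper below builds a mysqlbinlog command line from the MySQL 5.6 options `--read-from-remote-server`, `--raw`, `--stop-never` and `--result-file`. The helper name `binlog_backup_command` and the host, user, file name and output prefix in the example are placeholders made up for illustration, and credentials handling is left out entirely.

```python
def binlog_backup_command(host, user, first_log, out_prefix):
    """Build a mysqlbinlog command that streams raw binary logs from `host`."""
    return [
        "mysqlbinlog",
        "--read-from-remote-server",    # fetch logs over the replication protocol
        "--raw",                        # write the binary log as-is, not as SQL
        "--stop-never",                 # keep waiting for new events
        f"--host={host}",
        f"--user={user}",
        f"--result-file={out_prefix}",  # in raw mode this is a file name prefix
        first_log,                      # first binary log file to fetch
    ]


# Placeholder values; pass the list to subprocess.run() to actually start it.
print(" ".join(binlog_backup_command(
    "db1.example.com", "backup_user", "mysql-bin.000001", "/backups/db1-")))
```

Check the option list against the manual for the server version in use before relying on this.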

Wednesday, 4 May 2011

libeatmydata - Feed me, Seymour!

Whilst supporting customers at SkySQL I often have to load gigabytes of SQL data into MySQL servers to run tests. This process can be slow especially for InnoDB because in a standard dump file every insert is a transaction and every transaction has to be synchronised to disk for crash safety. The thing is, most of the time I don't care if the machine I'm using crashes whilst I'm loading this data into the server.

There are of course many ways around this, such as editing the SQL files and wrapping transactions around batches of inserts and editing the configuration files to disable all the syncing involved. But I don't want one configuration to load in data and then another to play with the data, so this is where libeatmydata comes in.

libeatmydata is a preloaded library that stops the disk syncing functions from doing just that. The OS will decide when to sync the data to disk. This is great for loading in an SQL dump file, taking single insert dumps on default configuration down from hours to minutes. But you wouldn't want to do it during the production running of your server because power failure would certainly lose you some data.

So, how do you use libeatmydata with MySQL? Simple, this is the command to start it:

LD_PRELOAD=/usr/lib/libeatmydata.so mysqld

Then you can load in your dump file, shut down mysqld safely and start it up again without libeatmydata.

A great application I could see for this is scripting the startup of slaves, feeding a dump file into the server with libeatmydata and then restarting without this once the slave is ready.

UPDATE

Kristian Nielsen asked in the comments on SkySQL's blog how much faster it is, so I have run a quick benchmark to find out. In this test I am using a 218MB test file of single row inserts I had generated for an old support issue. I am also using a clean MySQL 5.1.51 installation (cleaned on each run) on my i7 based laptop:

Vanilla MySQL 5.1.51

real    166m19.504s
user    0m23.891s
sys     0m6.084s

MySQL 5.1.51 with --sync-binlog=0 --innodb_flush_log_at_trx_commit=0

real    5m33.578s
user    0m11.096s
sys     0m3.215s

MySQL 5.1.51 with libeatmydata

real    3m14.123s
user    0m10.932s
sys     0m3.108s
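To put those timings side by side, here is a short Python sketch that converts the 'real' values above into seconds and works out the speed-up over the vanilla run. The helper name `parse_real` is made up for this example; the timings themselves are copied from the results above.

```python
import re


def parse_real(value):
    """Convert a `time` value such as '166m19.504s' into seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", value).groups()
    return int(minutes) * 60 + float(seconds)


runs = {
    "vanilla": "166m19.504s",
    "sync-binlog=0, innodb_flush_log_at_trx_commit=0": "5m33.578s",
    "libeatmydata": "3m14.123s",
}

baseline = parse_real(runs["vanilla"])
for name, real in runs.items():
    elapsed = parse_real(real)
    print(f"{name}: {elapsed:.1f}s, {baseline / elapsed:.1f}x the vanilla speed")
```

On these numbers the relaxed configuration comes out at roughly 30x faster than vanilla and libeatmydata at roughly 51x, for this one test file on this one laptop.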

Wednesday, 27 April 2011

SkySQL - The Return of the Jedi

The last few weeks have been particularly quiet from me on the blogging front. Behind the scenes things have been quite the opposite so here is a summary of things past, present and future.

Rackspace and Drizzle

If you have read my last 'Last Week in Drizzle' post you will know that Rackspace are no longer supporting Drizzle. They have done a fantastic job so far and have decided to pass the baton to other companies. As for the staff, they wished to redeploy us to other teams which is something I personally was not keen on. I would rather remain within the MySQL/Drizzle sphere which I would have no longer been able to do effectively inside Rackspace any more.

Drizzle itself will go on to do great things without Rackspace, there are a number of companies that announced support for Drizzle during the O'Reilly MySQL Conference and Expo and Google Summer of Code is still going ahead as planned.

MySQL Conference

For me personally it was the busiest conference I have ever attended, this is mostly down to the three talks I had to give on top of booth duty, meetings and Drizzle Developer Day. I had some fantastic feedback from people whilst there on many subjects such as Drizzle and the MySQL 5.1 Plugins Development book. It was great to meet up with old friends and make some new ones and I hope that the conference will continue for many years to come.

SkySQL

The day after returning from the conference I started my new role as Senior Sustaining Engineer at SkySQL (very jetlagged and in hindsight I should have given myself a day or two to recover!). In this role I not only go back into supporting customers but also developing tools around the MySQL/Drizzle sphere. I feel very honoured to be working with the team (many of whom I am working with for a second time), they have really done a great job of capturing the traditional MySQL spirit.

One of the first things I have been working on is a new version of mydumper, once this is ready I will create a separate blog post about it. I think it is a fantastic tool and hope that it will be able to help many users in the future.

Google Summer of Code

SkySQL have encouraged me to continue my work on Drizzle which I have also been doing. As part of this I am a mentor for Google Summer of Code, a student called Olaf van der Spek will be working on improving the libdrizzle client API under my guidance. Something I am very much looking forward to.

The Return of the Jedi

So, I am back in a support type role whilst also developing useful tools and patches to enhance the usability of MySQL, I will also be blogging more and getting involved in the community/ecosystem in other ways. This is very similar to what I was doing at Sun/Oracle but for a company designed from the ground up to be much better for the staff and customers. I am looking forward to the bright future of SkySQL.

Monday, 18 April 2011

Last Week in Drizzle

Welcome to this week's edition of Last Week in Drizzle. Unfortunately I could not write this at the Drizzle Developer Day because it was much busier than I expected. So this one had to wait until I landed in the UK :)

O'Reilly MySQL Conference and Expo

Last week was the MySQL Conference which at this point should probably be called the MySQL & friends or the Open Database Conference. We had many talks, great exposure and some fantastic questions and feedback of ideas we had never thought of during the week. I urge anyone who wasn't there to watch Brian Aker's keynote on the State of Drizzle.

Drizzle Developer Day

On the Friday after the conference we had Drizzle Developer Day which contained people from every level, new users to some of the biggest names in MySQL development. A great many topics were discussed such as the catalogs work, replication and storage engines. There were also discussions on governance and I hope there will be announcements on this in the near future.

Development Goes On!

All last week the Drizzle Developers at Rackspace were giving talks and meeting people at the conference so we didn't get a lot of time for actually writing code, but there have been many branches and merge requests thanks to the fantastic community around Drizzle. I still haven't caught up with all the work merged in within the last week!

Drizzle Support

A few companies have come out in the last week offering support services for Drizzle such as Blue Gecko and Percona which is fantastic to see. This is on top of companies such as SkySQL already providing support for Drizzle. There could also be announcements from more companies in the near future, so watch this space!

Rackspace and Drizzle

If you have watched the keynote from Brian Aker referenced above you will notice near the end he talks briefly about the developers getting hired by other companies. Rackspace have done a fantastic job in supporting and funding us right up through the GA but unfortunately can no longer go on providing resources for us. What does that mean for us? Most of the developers who work for Rackspace are moving to other companies, many of whom will continue to work on Drizzle. What does this mean for Drizzle? Not a whole lot really, development will continue as before. The great thing about Drizzle is the Rackspace developers were actually the minority, there are many other companies as well as community developers involved. The features that were originally planned will continue to be developed, Google Summer of Code will also go on as before.

My Next Adventure

My last day at Rackspace was last Friday, today I start my new adventure as Senior Sustaining Engineer at SkySQL. This means I will still get to work on Drizzle as well as providing fantastic support and development resources for MySQL and MariaDB.

Final Thoughts

I'd personally like to thank everyone for their fantastic feedback at the conference and developer day last week. It is great to hear that people think we are on the right path with the technology and many of the ideas and discussions that came up last week will help shape the future of Drizzle.

As always if you have any feedback or topics you would like me to cover, please let me know.

Friday, 8 April 2011

Last Week in Drizzle

Welcome to this week’s Last Week in Drizzle. This again will be a relatively short edition as the 2011 O'Reilly MySQL Conference and Expo is next week and I'm currently packing for it!

Drizzle in Real Time Data Visualization

Many of you will have seen the awesome real time data map of Mozilla's downloads on their glow site. One thing that got me really excited this week was work by Marcus Eriksson to do the same thing using Drizzle and its RabbitMQ connector. The live demo of this has been hosted on a Rackspace cloud server and can be found here.

Percona's Contributions

It has been very encouraging this week to see staff at Percona submit merge requests. So far there have been several InnoDB branches committed which add cool features such as saving the buffer pool to disk (actually identifiers for the pool so it is really small) for restoring upon server start as well as performance improvements. Many thanks to the Percona guys for your hard work so far!

Drizzle Amazon AMI

Blue Gecko have created a 32-bit Drizzle AMI image using RPMs (which I believe was not an easy feat given how different it is to the other OSs we compile for). Anyone wanting to try this should check out their blog post on it which gives details on how to access the AMI.

Google Summer of Code

We have had many submissions to Google Summer of Code this year and it is unfortunate that we cannot take everyone on via GSoC. But there have been some awesome submissions this year, and we are really impressed with the amount of effort people have put into their proposals. If you want to get a last-minute submission in you can do so via the GSoC site; the deadline is 12:00 Pacific Time today.

Final Thoughts

That about wraps up the big events of the past week. If you are in Santa Clara, CA next week please come along to one of the many Drizzle talks at the 2011 O'Reilly MySQL Conference and Expo. We will also have a booth in the Expo hall where you can come and chat to us.

Next week's edition will be coming to you live from the Drizzle Developer Day, Friday 15th April at the Hilton across the road from the Hyatt (where the conference is held). As always if you have any feedback or topics you would like me to cover, please let me know.

Friday, 1 April 2011

Last Week in Drizzle

Welcome to this week's Last Week in Drizzle. Today will be a relatively short edition due to the work everyone is doing preparing for the 2011 O'Reilly MySQL Conference and Expo and Google Summer of Code.

First Fremont Tarball

The first tarball of the Fremont development branch of Drizzle was created this week, following our tradition of releasing a tarball every two weeks. It includes many experimental things such as the libdrizzle-2.0 separation and the multiple master to single slave replication.

For those wanting the stable release we suggest sticking to the Elliott branch which our GA was cut from. New releases for this will be created much less frequently and will only include bug fixes.

Xtrabackup

Stewart Smith's work on integrating Xtrabackup into Drizzle has now been merged into Fremont. When installing it will create a binary called 'drizzlebackup.innobase'.

Multi-Master Replication

By this I mean the work going into replication of multiple masters to a single slave. Patrick Crews has written the first part of a blog post covering the testing of this new feature.

Drizzle Migration

I have written an article for The H Online called 'A migrator's guide to Drizzle' which was published yesterday. For anyone migrating from MySQL to Drizzle this should act as a useful guide. I will also be giving a talk at the 2011 O'Reilly MySQL Conference and Expo entitled 'MySQL to Drizzle, stress free migration' which will be along similar lines.

FreeBSD

FreeBSD support has initially been dropped for Drizzle due to issues we were having with that platform. It may well be coming back soon with the aid of Greg Larkin from FreeBSD who has offered to help get things running again.

Final Thoughts

If you happen to be in the vicinity of Santa Clara, CA between the 11th and 14th April we have quite a few talks on Drizzle as well as a booth in the expo hall. It is a great opportunity to meet the team behind Drizzle.

As always if you have any feedback or topics you would like me to cover, please let me know.

Monday, 28 March 2011

Last Week in Drizzle

Welcome to this week's (slightly late) edition of Last Week in Drizzle. This week sees the kick-off of many new features for the next release of Drizzle codenamed 'Fremont' and the mailing list is a hive of activity around Google Summer of Code. I apologise for publishing a few days late this week and will try and stay on-track for future editions.

Fremont

In the tradition of Drizzle using Seattle road names in alphabetical order for codenames the next release of Drizzle is codenamed 'Fremont' (the current GA release is codenamed 'Elliott'). Monty Taylor has outlined the merge process going forward as can be seen in his mailing list post.

Google Summer of Code

We have been accepted for Google Summer of Code 2011 and are getting a lot of interest from potential applicants. If you are interested in working on the Drizzle project as part of GSoC we have the following recommended instructions:

Check out our wiki page on potential projects

Email the mailing list with an introduction about yourself and join the Freenode #drizzle IRC channel to chat to us

Look at our low-hanging-fruit tasks and try to take one or two on. This gets you used to the code and Launchpad processes as well as giving us an insight into your abilities

Xtrabackup

Stewart Smith has been working hard on integrating Percona's Xtrabackup with Drizzle. Xtrabackup is an online backup tool for InnoDB much like MySQL's Enterprise Backup. This is nearly ready for merging into Drizzle and Stewart has written a great blog post on the subject.

Catalogs

Stewart has been a real busy guy this last week, another project he has been working on is getting catalogs support working with more than one catalog. For those not familiar with catalogs they are a way of totally isolating one user's databases from another, similar to having multiple installations of Drizzle in one box but all running from one daemon. In the GA release a lot of the framework already existed for catalogs and everything in it runs from a catalog called 'local'. More information on the progress Stewart has made can be found on his blog post.

Libdrizzle 2.0

In Fremont we are working towards Libdrizzle 2.0. This is a C++ version of Libdrizzle with a C compatible API. Eventually it will contain new features such as native sharding (we are still working on filling out a potential features list for it). For now in the Drizzle trunk you can see libdrizzle has been moved to libdrizzle-1.0 and a new libdrizzle-2.0 directory exists for the new work.

Multi-Master Replication

David Shrewsbury has been working on multi-master replication in Drizzle with a beta release ready to try. By multi-master I mean having multiple masters write to a single slave. For more information on this work take a look at his blog post.

Final Thoughts

Development is starting to move forward at a rapid pace for our next GA and we have had a lot of branches merged that I haven't discussed here from people such as Olaf van der Spek who has contributed a lot towards code cleanups.

As always if you have any feedback or topics you would like me to cover, please let me know.

Tuesday, 22 March 2011

Using Wordpress 3.1 on Drizzle

Since the GA release of Drizzle7 I've had several people asking me about how to convert their MySQL sites to use Drizzle instead. By far the most common one to crop-up is Wordpress. This is aimed to be a simple guide to starting a new blog using Wordpress 3.1 and Drizzle.

Initial Problems

Wordpress by design is very MySQL orientated. For the most part this is a good thing, but when trying to switch to another database for it there can be complications. An attempt has been made to create a plugin to use Drizzle, but unfortunately it has side-effects such as modifying your content if you happen to blog about anything related to MySQL or Drizzle. For the purposes of this blog post I have created a patch and will give instructions on how to use it below. If any Wordpress guru has a way to make this into a good plugin, please get in touch!

Conversions Needed

Almost all the conversions for Wordpress 3.1 revolve around the date. When creating a draft or any other table entry Wordpress uses the date '0000-00-00' in several columns. In Drizzle we try to be closer to the SQL standards, and this means that the first valid date is '0001-01-01'. A large majority of the patch is this particular conversion for the queries throughout the PHP code. The rest is to do with schema creation, to be specific:

Drizzle has no LONGTEXT, TINYTEXT, etc... Just TEXT

Drizzle doesn't support multiple character sets, just UTF-8, so we need to drop the character set part of schema creation
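The kind of rewriting these conversions involve can be illustrated with a few lines of Python. This is not the patch itself (the patch edits the Wordpress PHP source directly); `drizzle_compatible` is a hypothetical helper, and the regular expressions are a blunt text rewrite intended only for schema statements. Run over post content it would have exactly the side-effect problem described earlier.

```python
import re


def drizzle_compatible(sql):
    """Apply the three conversions to a MySQL schema statement (sketch only)."""
    # '0000-00-00' is rejected by Drizzle; the first valid date is '0001-01-01'.
    sql = sql.replace("0000-00-00", "0001-01-01")
    # Drizzle has a single TEXT type, so collapse the MySQL variants.
    sql = re.sub(r"\b(?:TINY|MEDIUM|LONG)TEXT\b", "TEXT", sql, flags=re.IGNORECASE)
    # Drizzle is UTF-8 only, so drop any character set clause.
    sql = re.sub(
        r"\s*(?:DEFAULT\s+)?(?:CHARACTER\s+SET|CHARSET)\s*=?\s*\w+",
        "",
        sql,
        flags=re.IGNORECASE,
    )
    return sql


# A made-up, cut-down table definition in the style Wordpress generates.
print(drizzle_compatible(
    "CREATE TABLE wp_posts ("
    "post_date datetime NOT NULL default '0000-00-00 00:00:00', "
    "post_content longtext NOT NULL) DEFAULT CHARACTER SET utf8"
))
```

COLLATE clauses and anything else a real installation generates are not handled here; the sketch only covers the conversions listed above.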

The Patch

To patch your Wordpress 3.1 source:

Download the patch

Enter the directory of your wordpress installation

Run the following

patch -p1 < wordpress-drizzle.diff

You should now be good to run the install as normal. Note that if you are not using the mysql-unix-socket-protocol plugin you should tell Wordpress to connect to '127.0.0.1' for a local database instead of 'localhost'.

Converting an Installation

If you already have Wordpress 3.1 installed and using MySQL, the patch combined with drizzledump's migration function should still work but I have not tried this, so please back up first before attempting it.

Friday, 18 March 2011

Last Week in Drizzle

Welcome to the latest edition of Last Week in Drizzle. This week we announced our GA release!! Interest in Drizzle in the last week has been much higher than anticipated, this blog alone got 4,500 visitors on Wednesday! (which was also a nice test for the Drizzle database powering it)

GA Release

So, on Tuesday the tarball was cut for our GA release called Drizzle7. Most of the changes from the last week relate to code cleanup, documentation and test suite improvements so that we could keep the codebase stable ready for the release (also many of us are busy writing conference talks around now :) ). For a quick summary of what to expect in Drizzle7 and the future you can see my three-part special called "Drizzle - The Icing on the Cake": part 1 part 2 part 3

There have also been blog posts from several members of the Drizzle team: Brian Aker, Patrick Crews, Stewart Smith and more can be found on Planet Drizzle

Drizzle in the Media

We have had huge coverage by many technology publications which has been fantastic to read, I'll try to link to some of them here but you should be able to find more by searching Google News:

The H: Cloud and web database Drizzle reaches general availability

The Register: Drizzle: Big-Data-happy MySQL fork debuts (some technical inaccuracies in this, HailDB isn't the default engine, InnoDB is, but still great to see The Register covering us)

InternetNews.com: Open Source Drizzle database now GA - Should Oracle worry?

InfoWorld: Non-Oracle MySQL fork deemed ready for prime time

NetworkWorld: Is the 'database for the cloud' ready? Drizzle GA released

ZDNet: MySQL fork Drizzle gets general release

Rackspace Cloud Blog: Drizzle7: The First GA release of the Drizzle Project (OK, I wrote that one ;) )

There has also been a lot of coverage in foreign media which is also fantastic to see.

Drizzle Downloads

The only metric we have for the number of Drizzle downloads is the source downloads. We can't record the usage of the Ubuntu PPA or the installations using the pre-release of Ubuntu Natty (which has Drizzle in its repositories) and currently don't log the RPM yum repository. But on source downloads alone there were more downloads in the first 2 days of GA than the 2 weeks of the RC2 release (I count a shade under 400 downloads at the time of publishing this). On top of this many more people have come to the #drizzle IRC channel on Freenode to ask us questions and even one or two minor bugs have already been found by new users.

Criticisms

The biggest criticism I have seen so far is the name 'Drizzle' for a database. I personally like the name 'Drizzle', but I come from the UK where Drizzle is a very regular weather condition. I actually think that is a really good thing, if the only big complaint is the name we must be going right somewhere :)

Fremont

Development has already started on the next version of Drizzle codename Fremont (Drizzle7's codename was Elliott). This time we aim to make you wait a little less time, with the next GA scheduled for later this year. For those who haven't guessed it already the codenames for Drizzle are based on road names in Seattle (another place where drizzle regularly happens). It hasn't yet been decided whether this will be Drizzle7.1 or Drizzle8 and I'm sure we would take suggestions on this at the upcoming Drizzle Developer Day.

Final Thoughts

The overall feedback from the last week has been fantastic. I'd like to thank everyone who has given us great coverage, everyone who has tried Drizzle and last but not least everyone who develops Drizzle whether it be at Rackspace, another company or just a community developer.

As always if you have any feedback or topics you would like me to cover, please let me know.

Thursday, 17 March 2011

Drizzle - The Icing on the Cake - part 3

(Photo by Sifu Renka under a CC BY-NC-SA 2.0 license)

In the second part of this three part special on the GA release of Drizzle7 I covered the development and testing model we use for Drizzle. In this final part I will cover what you can expect from the future of Drizzle.

What to expect in the future

Whilst we are very proud of the GA release of Drizzle7 there are still features we would like to implement that we could not complete before the release. Whether the next release is Drizzle7.1 or Drizzle8, we haven't quite decided yet. But one thing is for sure, we will not be making you wait 3 years for it, expect the next GA to come later this year! Some of the features I outline here might not make the next GA but we will work our hardest to make sure as much gets in as possible.

Several of the features outlined in this post are planned as possible Google Summer of Code projects, so if you are interested in picking one up please see our GSoC wiki page.

Data Types

We have done a lot of work on data types in Drizzle7 including microsecond precision TIMESTAMP along with new native BOOLEAN and UUID types. We are planning a few additional types including a native IP address type and a TUPLE type which will encapsulate a replacement for the currently missing SET data type.

Catalogs

This is a big one and can be difficult to explain, but I will give it a go. We have already put much of the framework for catalogs into Drizzle7 and hidden away from the user there is a built in 'local' catalog used when starting Drizzle. But what is a catalog?

Think of it as a way of running multiple instances of the Drizzle server under one daemon. So, inside a catalog you have users, schemas, tables, etc... and each catalog is isolated from the others. If you think about this for a moment it is a huge deal in many ways. For example, if you have some kind of shared service (such as a cloud), each user of the cloud can have their own catalog, completely isolating their data from everyone else.

On top of this we are planning to add a tunable per-catalog limits system so that some catalogs can have a higher priority for resources than others.
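
To make the isolation idea concrete, here is a toy model in Python. It is only a sketch of the concept described above and says nothing about how Drizzle implements catalogs internally; the class names, the `tables_visible_to` helper and the 'alice' and 'bob' tenants are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """One isolated namespace with its own users, schemas and tables."""
    name: str
    users: set = field(default_factory=set)
    # schema name -> table name -> list of rows
    schemas: dict = field(default_factory=dict)


class Daemon:
    """A single server process hosting many catalogs (illustrative only)."""

    def __init__(self):
        # Mirrors the built-in 'local' catalog mentioned in the post.
        self.catalogs = {"local": Catalog("local")}

    def create_catalog(self, name):
        self.catalogs[name] = Catalog(name)
        return self.catalogs[name]

    def tables_visible_to(self, catalog_name, schema):
        # Lookups never leave the caller's own catalog.
        return sorted(self.catalogs[catalog_name].schemas.get(schema, {}))


daemon = Daemon()
alice = daemon.create_catalog("alice")   # hypothetical tenant
bob = daemon.create_catalog("bob")       # hypothetical tenant
alice.schemas["shop"] = {"orders": []}
bob.schemas["shop"] = {"invoices": []}

print(daemon.tables_visible_to("alice", "shop"))
print(daemon.tables_visible_to("bob", "shop"))
```

Both tenants have a schema called 'shop', yet neither can see the other's tables, which is the property that matters for shared hosting.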

Stored Procedures

One of the first things we dropped in Drizzle was stored procedures. In many articles it has been written that they are gone and will never return. But in Drizzle if there is demand for something and the developer time to do it, we will bring it in. What people may not realise is part of the framework for this already exists and is used for the slave applier (much like the slave SQL thread in MySQL). We will be doing stored procedures, we will be doing them properly and they will be done as a plugin so that if you don't want them you don't have to have them.

HailDB

Drizzle7 has an optional HailDB plugin. But let's take a step back as many won't know what HailDB is. HailDB is a fork of the Embedded InnoDB source code with many fixes and improvements in it. It can be used completely independently of Drizzle and integrated into your own code. It is the eventual goal to have HailDB take a bigger role in Drizzle, possibly replacing InnoDB as the primary storage engine.

Summary

This is just a small sample of the things we have planned. But the great thing about Drizzle is you, the community, help shape it. If there is something you feel Drizzle needs we may be able to include it. Being able to code helps but contributions come in many forms, from helping in mailing lists and the IRC channel, to documentation, to filing bugs, to raw code. Every contribution is valuable, and every contribution helps to evolve Drizzle.

I hope this three-part blog post has been useful. If you have any questions please direct them to the #drizzle Freenode IRC channel, the mailing list or even contact me directly.

Wednesday, 16 March 2011

Drizzle - The Icing on the Cake - part 2

(Photo by Stéfan under a CC BY-NC-SA 2.0 license)

In part 1 of this 3 part series I talked about what is new in the recently released Drizzle7 and what makes it different to MySQL. In this part I will talk about the development and testing processes behind Drizzle.

The Development Model

Drizzle is developed differently to many open source products. Instead of dual-licensing, Drizzle is developed by companies and users that actually use the product. No part of it is closed-source and there is no contributor agreement to sign. We have had many open source developers come from seemingly nowhere to join in development, which is fantastic. Development happens on Launchpad using the Bazaar version control system so that everyone can see what is happening.

Bug reports are also on Launchpad, where it is pretty easy to search, track and file bugs. If you have a problem running Drizzle that may or may not be a bug, Launchpad has 'Questions', which are a bit like support tickets. We also have the mailing list and Freenode #drizzle IRC channel to ask any questions on.

Testing

Every code branch to be merged goes through the same process regardless of whether it came from a developer from Rackspace or a general community developer. First the code goes through a peer review and then gets tested on every platform we support using the Jenkins Continuous Integration system. This doesn't just test to see if the code compiles and runs the test suite, every branch also goes through a Valgrind run and multiple performance benchmarks to make sure there are no regressions (and also to see if a branch improves performance). All results of these tests are publicly available on our Builds and Benchmarks mailing lists.

Google Summer of Code

We are big fans of GSoC at Drizzle, and every year we have more and more students come to us asking to be a part of GSoC. Many of these students have gone on to get really good jobs in the database industry straight after GSoC. If you are a student and are interested in being a part of GSoC you can find a list of projects on our wiki page, to register your interest please contact the mailing list.

O'Reilly MySQL Conference and Expo

We have 12 talks lined up for the conference as well as a section in the "Mastering the MySQL And Drizzle Plugin Development" tutorial. This year the focus is much more on using Drizzle rather than the development of it. But anyone interested is welcome to ask us questions. We will have a Drizzle booth in the Expo hall for those who wish to come and have a chat.

Drizzle Developer Day

If you are at the MySQL conference and want to take part in shaping the future of Drizzle or just want to listen to talks about Drizzle development processes please come along to the Drizzle Developer Day which will be on Friday 15th April (the day after the UC).

Summary

One of the keys to Drizzle's success is the open development model. Anyone wanting to join in can see our documentation on the subject or contact us on the Freenode #drizzle channel and the mailing list.

In part 3 I will discuss the features currently planned and in-progress for future versions of Drizzle.

Tuesday, 15 March 2011

Drizzle - The Icing on the Cake - part 1

As I'm sure all of you know already, today marks the GA release of Drizzle7. But what was the recipe behind Drizzle?

Take the raw ingredients from previous delicious, well-tried recipes

Sieve out the lumps, separate the eggs and mix

Bake using many cooks in many ovens for around 3 years

Drizzle the special source on top

What is Drizzle?

There are many marketing buzzwords which can describe what Drizzle is such as "A lightweight, microkernel fork of MySQL optimized for the web and cloud". To me such things are pretty meaningless. So, let's start at the beginning...

Several people inside MySQL saw that the code could really do with re-factoring. At the same time they believed that the focus was heading away from its core web based installations. They also loved open source, and whilst MySQL is open source, community contributions can be difficult. These people got together inside Sun Microsystems (and other companies) to create a completely open development of a fork of MySQL 6.0 called "Drizzle". They aimed to make it easy for new developers to pick up and develop on, and moved many parts out to plugins, thereby making it light on resources when features are not needed.

In 2010 the original development team moved to Rackspace and several more members were hired (including me), with the aim of Drizzle being used in its cloud based products. Even today the number of active community contributors is higher than the number of developers inside Rackspace working on Drizzle.

Differences From MySQL

I have been asked many times what the differences are between MySQL and Drizzle. This is something I could probably write a book on now. Something that should be clear at this stage is Drizzle is not MySQL; it was MySQL over 3 years ago but a lot has changed since then. Having said that, applications that use MySQL can usually be converted to use Drizzle relatively easily. For new users to Drizzle, here are a few of the key differences:

Strictness - Drizzle doesn't assume what you mean (which can cause incorrectly recorded data). For example trying to store an invalid ENUM will error instead of storing an empty value.

Data Types - Drizzle has removed, altered and added data types to simplify things and become closer to the SQL standard. For example:
- There is no TINY/SMALL/MEDIUM INT, just INT (and BIGINT).
- There is no TINY/MEDIUM/LONG TEXT/BLOB, there is just TEXT/BLOB.
- TIMESTAMP supports microsecond precision.
- UUID and a true BOOLEAN type added.

Replication - Drizzle's replication uses Google Protocol Buffer messages so a replication reader can be written in any language in minutes. The replication data is stored in InnoDB as part of the transaction as it is being committed so that writing the replication log is very fast.

Development - Drizzle is developed using a completely open development model which I will discuss in part 2.

Licensing - The main drizzle source is GPLv2 licensed, libdrizzle is BSD licensed and the docs (which are also included in the docs directory of the source) are CC SA 3.0 licensed. There is no proprietary licensing for any part of Drizzle.
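
As an illustration of the data type simplification listed above, here is a small Python sketch of how MySQL column types might collapse onto the smaller Drizzle set. The mapping table and the `to_drizzle_type` helper are hypothetical and are not taken from drizzledump; they simply assume each removed type widens to the nearest type that still exists.

```python
# Hypothetical mapping, for illustration only: each removed MySQL type is
# assumed to widen to the closest type that Drizzle still has.
DRIZZLE_TYPE_MAP = {
    "TINYINT": "INT",
    "SMALLINT": "INT",
    "MEDIUMINT": "INT",
    "TINYTEXT": "TEXT",
    "MEDIUMTEXT": "TEXT",
    "LONGTEXT": "TEXT",
    "TINYBLOB": "BLOB",
    "MEDIUMBLOB": "BLOB",
    "LONGBLOB": "BLOB",
}


def to_drizzle_type(mysql_type: str) -> str:
    """Return the type name a MySQL column type collapses to.

    Types that are not in the table (INT, BIGINT, TEXT, ...) pass through
    unchanged, just normalised to upper case.
    """
    name = mysql_type.strip().upper()
    return DRIZZLE_TYPE_MAP.get(name, name)


for column_type in ("tinyint", "BIGINT", "longtext", "mediumblob"):
    print(column_type, "->", to_drizzle_type(column_type))
```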

Compatibility With MySQL

Despite many changes there is still a great deal of compatibility with MySQL. Drizzle speaks the MySQL protocol, so existing MySQL connectors for PHP/Perl/etc... will also connect to and query Drizzle. The SQL syntax is still very similar to MySQL and on top of all this, drizzledump (which is very similar to mysqldump) can convert table structures and data from MySQL to Drizzle on-the-fly.

Drizzle also includes libdrizzle. This is a BSD licensed client library written in C which can talk to MySQL and Drizzle servers. From our testing, as well as the testing of developers who are integrating libdrizzle into their products, it appears that libdrizzle performs better than libmysqlclient too. Connectors for libdrizzle have been written for most widely used languages such as Python, Java, Perl and PHP.

Plugins

Drizzle uses a completely new plugin architecture so that almost everything is a plugin. From storage engines, to functions, to protocols, to authentication and even query cache. This makes it much simpler to switch off the parts you don't use as well as customising Drizzle for your unique application. In total there are around 80 plugins bundled in the Drizzle source and several others available around the web.

Despite this we have tried to make this easy for most people by having the plugins that most people will use compiled and running by default.

Summary

It is almost impossible to get a feel for what Drizzle is like without trying it for yourself. We have had some great feedback both positive and negative, and have made changes thanks to this feedback. We are all very approachable on #drizzle on Freenode and the Drizzle mailing list.

In part 2 I will discuss the open development model behind Drizzle.

Friday, 11 March 2011

Last Week in Drizzle

Welcome to this week's edition of "Last Week in Drizzle". As an introduction this week I would like to quote John David Duncan's recent Facebook post: "And what's in the weather forecast for next week? Drizzle.". Yes, our first GA release is due next week, does that mean the development pace has slowed? Heck no! Over 150,000 lines of bzr diff in the trunk since last week and quite a few branches still in the merge queue going through our extensive regression testing system.

Google Summer of Code

We have once again applied to be part of the Google Summer of Code program. We had some great students last year and some new faces interested in being students on projects for Drizzle have already started taking on some low-hanging-fruit tasks to get them used to our code and processes. We will have a sign-up form up soon for anyone interested in being part of the program, which I will blog about when ready. In the mean time you can read our wiki page about participation and if you have any suggestions for projects this year, please let us know.

Race to GA

We are just a few short days away now from the first Drizzle GA. The release schedule for Drizzle7 is as follows:

RC1 - ~~14th February 2011~~ Released
RC2 - ~~28th February 2011~~ Released
GA - 14th March 2011

Engine Removal

By "engine removal" I don't mean the poor state of my car but the fact that we have removed some of the bundled storage engines from Drizzle this week. This is because some needed maintaining, some didn't quite fit in with Drizzle and some just plain didn't compile any more. This also helps us as developers support Drizzle by concentrating on the storage engines that are important to users. Will the removed engines be gone forever? If there is demand for them and they can be maintained, they will return. The removed engines are archive, blackhole, filesystem_engine, blitzdb, csv and pbxt.

Node.js

Mariano Iglesias has created a node.js binding for libdrizzle. In his words "the libdrizzle binding is outperforming node.js mysql bindings by a factor of at least 2 to 1" which is great to hear, especially since it can be used against a MySQL server. An example of how to use it can be found here.

Libdrizzle Only Option

Monty Taylor has added the configure option --without-server to go along with "make libdrizzle" which will only compile libdrizzle from the drizzle trunk. This should help anyone who only requires libdrizzle from source and doesn't want to have the dependencies required for the server to get it.

Authentication Defaults

Brian Aker has outlined changes to authentication such as the requirement of a username to connect to Drizzle and only listening on localhost by default. Further details can be found on the mailing list where he is also asking for feedback on changes.

Final Thoughts

This time of year is incredibly busy for us, preparing for the GA release whilst getting ready to give lots of conference talks and other such things. But despite this spirits are still high. I for one am very proud of what has been achieved in Drizzle by the team at Rackspace and other companies and community members involved. I hope new users coming to Drizzle find it as exciting as we do.

As always if you have any feedback or topics you would like me to cover, please let me know.

Friday, 4 March 2011

Last Week in Drizzle

Welcome to the third edition of Last Week in Drizzle. The diff of the trunk between last Friday and right now is just over 230,000 lines in size, 10x the size of the previous week! This includes many changes to the documentation, code clean-ups and Patrick Crews' continued work on our new DBQP test suite.

Replication

David Shrewsbury (I'm going to spell his name correctly this week ;)) and Patrick Crews have been working hard on making replication even more rock solid. The slave plugin is in, working and is stable with everything we can throw at it.

Drizzle developer day

We have a Drizzle Developer Day at the 2011 O'Reilly MySQL Conference and Expo. Anyone is welcome to come and learn, contribute and make suggestions about Drizzle. It will be on Friday 15th April from 9:30 a.m. to 5:00 p.m. Although the exact location is to be confirmed, it will likely take place in the Santa Clara Convention Centre.

If you wish to come along please sign-up here.

Race to GA

Our RC2 was released on the 28th of February and included the stabilised replication code amongst other fixes. So the release schedule for Drizzle7 is:

RC1 – ~~14th February 2011~~ Released
RC2 – ~~28th February 2011~~ Released
GA – 14th March 2011

SQLAlchemy

Monty Taylor has been working hard to get Drizzle working with SQLAlchemy which has a very rigorous test suite. This will help make it easier for Drizzle to act as a data store for OpenStack Compute which uses SQLAlchemy.

Libdrizzle

We have noticed that people have been downloading libdrizzle from the old libdrizzle Launchpad page. Back in October we merged libdrizzle into the main Drizzle trunk and since then all bug fixes/development has happened there. We highly recommend not using the old libdrizzle (which has known bugs) and we are in the process of shutting down the old libdrizzle development page.

Drizzle module for PHP

We have started developing the Drizzle module for PHP in Launchpad and have created a new release of this which is compatible with the libdrizzle in the Drizzle trunk. We are working to get the fixes in PECL but in the mean time we will be developing in Launchpad and basing any binary releases from this.

Of course, Drizzle is compatible with the MySQL protocol so the existing MySQL functions and classes will work with Drizzle.

Drizzle module for Perl

Patrick Galbraith has released version 0.303 of DBD::Drizzle which contains several fixes for talking to a Drizzle server.

Final Thoughts

We are pretty much in the home stretch for the first GA; years of work from a huge number of contributors will soon result in a stable release. Thanks to the many people who have helped make this happen. We receive valuable feedback every day and all of it goes to make Drizzle a better product.

As always if you have any feedback or topics you would like me to cover, please let me know.

Thursday, 3 March 2011

libdrizzle

We have had several users report issues with libdrizzle lately, but on closer inspection it has been found they are using an old version with known problems.

Back in October we merged libdrizzle into the main drizzle trunk. All libdrizzle development since then has happened in drizzle rather than the separate libdrizzle project. We had intended to shut down the libdrizzle project page but for several reasons it had not happened. The libdrizzle project page now has a message to state that you should use drizzle instead and we have pulled the downloads down. In the next few weeks we intend to:

kill the libdrizzle project page completely

devise a way to compile just libdrizzle when using the drizzle trunk, omitting the need to have all of drizzle's dependencies to compile it.

On a related note, I have created a release of the drizzle module for PHP on Launchpad which is now compatible with current libdrizzle. I am also in the process of working on packaging for this as well as a PDO module.

Friday, 25 February 2011

Last Week in Drizzle

Welcome to the second edition of Last Week in Drizzle. The diff of the trunk between last Friday and right now is just under 23,000 lines in size, so I will do my best to summarise the important parts of this.

Replication

A lot of great work has gone into solidifying replication this week. The slave code has not been merged yet since it has been triggering bugs in the Solaris compiler (see further down this post) but we plan to have it included in the next few hours ready for RC2. For a quick summary of our current progress I turn to Patrick Crews:

"In our testing, we create a master-slave setup, then run the transaction log tests that we have been using since we first started beating on the trx log.

There are a variety of different grammars that produce transactions (autocommit=off) and single queries (with autocommit=on). The various grammars produce different levels of valid queries - some make more valid queries and produce more deadlocks in multi-threaded testing, others create invalid queries to test how the log handles bad input. We test with simple scenarios like 100 cycles and 1 thread, then move on to more complex tests like 10000 cycles and 10 threads as well as setup 1 million cycle tests to stress the server long-term.

There are still bugs to be found, but we can say with confidence that basic replication, even in highly concurrent / high stress (lots of deadlocks, rollbacks, and good commits) scenarios is working well. Data is replicated correctly and that is HUGE!

We'll now be moving onto other tests like different configurations (master + slave restarts and crashes, adding a slave to a populated master, KILLing queries, etc)."

More information about setting up replication can be found in Patrick's blog post and David Shrewsbury will have a blog post on it shortly. Documentation on the docs site will be coming soon.

Race to GA

Our RC2 release is due for tagging on the 28th of February and will include the stabilised replication code. So the release schedule, as last week, is:

RC1 - 14th February 2011 Released
RC2 - 28th February 2011
GA - 14th March 2011

BOOLEAN data type

There has been a slight change to the output of the new native BOOLEAN data type. In the first RC release the MySQL client API returned this as the varchar 'TRUE' or 'FALSE'. Unfortunately some languages such as PHP and Python did not handle this too well, so thanks to Monty Taylor we now call this TINYINT on the wire for MySQL protocol and return 0 and 1. Our command line utilities know when they are connecting to Drizzle instead of MySQL and will do the conversion to 'TRUE' or 'FALSE' when displaying output.
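
The client-side conversion described above can be sketched in a few lines of Python. This is not the code used by the Drizzle command line utilities; `display_boolean` is a hypothetical helper, and it assumes the value arrives as the integer 0 or 1 (or its string form) because the column is reported as TINYINT on the wire.

```python
def display_boolean(wire_value):
    """Render a BOOLEAN column value that arrived as 0/1 over the MySQL protocol.

    Hypothetical helper for illustration: connectors hand back 0 or 1,
    and only the display layer turns that into 'TRUE' or 'FALSE'.
    """
    if wire_value is None:
        return "NULL"
    return "TRUE" if int(wire_value) else "FALSE"


for value in (1, 0, "1", None):
    print(repr(value), "->", display_boolean(value))
```

Keeping the wire format numeric means PHP and Python connectors see an ordinary integer, and only tools that know they are talking to Drizzle need the extra display step.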

Docs day

Wednesday 23rd was our documentation day and I would like to thank everyone who took the time to read through the docs and make suggestions and fixes. Since Wednesday morning the diff of the docs directory in trunk is just under 4500 lines in size and there are plenty more changes on their way based on the feedback gained. I'd also like to thank Marisa Plumb, our main documentation writer, for all her hard work so far.

BIT operators

Brian Aker has (by popular demand) added back the SQL bit operators that MySQL has into Drizzle, this includes bit shifting operators.
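
For anyone who has not used these operators, the Python sketch below mimics how MySQL evaluates them, namely on unsigned 64-bit integers. Whether Drizzle matches MySQL in every edge case is an assumption here, and the helper names are purely illustrative.

```python
# MySQL evaluates bit operators on unsigned 64-bit integers; Python integers
# are unbounded, so results are masked to 64 bits to imitate that behaviour.
MASK64 = (1 << 64) - 1


def bit_not(x):
    """Equivalent of SQL ~x."""
    return ~x & MASK64


def shift_left(x, n):
    """Equivalent of SQL x << n; bits pushed past bit 63 are dropped."""
    return (x << n) & MASK64


def shift_right(x, n):
    """Equivalent of SQL x >> n."""
    return (x & MASK64) >> n


print(bit_not(1))         # compare with SELECT ~1;
print(shift_left(1, 2))   # compare with SELECT 1 << 2;
print(shift_right(8, 2))  # compare with SELECT 8 >> 2;
print(5 & 3, 5 | 3, 5 ^ 3)
```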

Message verboseness

Brian has also added a new option to drizzled to set how verbose the output messages should be, defaulting to ERROR only. More information can be found in the mailing list.

Solaris Jenkins Slave

We have hit several bugs in the Solaris compiler in the last week, Monty Taylor has been fire-fighting this but for now our Jenkins Solaris slave is not creating working builds. This does mean that there is a chance that RC2 won't be fully tested on Solaris when released.

PHP module

I have been working with the PECL guys to get the Drizzle module working again. The current release will not compile with the libdrizzle inside Drizzle. The fix for this has been pushed to the SVN trunk in PECL and we are looking to generate a release of this soon. In the mean time, the mysql and mysqli PHP connectors work great with Drizzle.

I have also received a lot of feedback from the PHP community that many of them use PDO for database connections. We currently have no PDO module so I am working on writing one which hopefully will be ready for testing in the next week.

Final Thoughts

A massive amount of bug fixes and improvements have gone into the trunk ready for RC2, including many code clean-ups from Olaf van der Spek who is a new contributor to Drizzle. I for one am looking forward to a very exciting GA release.

As always if you have any feedback or topics you would like me to cover, please let me know.

Friday, 18 February 2011

Last Week in Drizzle

It has been a while since we have done one of these so I thought I might try and resurrect the tradition. So here is my first "Last Week in Drizzle".

Replication

The original plan for replication was to use a Tungsten Replicator based solution to transfer the transaction logs (similar to MySQL's binary logs). Unfortunately this can't be completed in time for the GA release so we have switched to a master-slave solution similar to MySQL. The Tungsten solution is still something we plan to finish though.

Replication events are stored using Google Protocol Buffer messages in an InnoDB table, these events are read by the slave, stored locally and applied. The advantage of the Google Protocol Buffer messages is a script or program can be knocked up in pretty much any language in minutes to read our replication log.

Unfortunately this sudden change in replication method means we could not complete the slave code in time for the RC release, which in turn means we are creating a second RC release as explained further down in this blog post.

Special thanks goes to David Shrewsbury, Patrick Crews and Joe Daly for making this happen.

New Release

Our first RC has been released this week. In this release we have:

Drizzle server can now fork to background via --daemon. This was primarily implemented to help RedHat/Fedora init.d scripts.

Implicit Cartesian Joins no longer work; this is to prevent runaway queries.

Improvements to the replication transaction log.

Many other bug fixes and improvements.

Race to GA

Due to the late entry replication code we intend to have one more RC whilst we test it to death in as many horrid ways as Patrick Crews can find. So the current release plan for Drizzle7 is now:

RC2 - 28th February 2011
GA - 14th March 2011

New RPM Repository

Derks has created a new RPM repository for us at rpm.drizzle.org, more details on this can be seen here.

Windows Jenkins Slave

Monty Taylor has created a Windows slave for our Jenkins Continuous Integration testing system. This means we now test every trunk merge for libdrizzle regressions in Windows.

Docs Day

On Monday 21st February we have our docs day, the developers will be reviewing the entire docs site for technical errors and any improvements that can be made (such as missing topics). We encourage anyone who would like to improve the quality of our docs to join in this effort, contact us on #drizzle on Freenode or file a bug if you spot anything we could improve on.

Update 2011-02-20: We are postponing this until Wednesday 23rd February due to documentation merges which won't have quite hit trunk by Monday.

Final Thoughts

I'm going to try and do one of these every week, so if you have any feedback or topics you would like me to cover, please let me know.

Monday, 14 February 2011

From Drizzle with love

drizzle> select concat(char(0xe299a5,0xe29da4,0xe299a5), ' Happy Valentine\'s Day ', char(0xe299a5,0xe29da4,0xe299a5)) as Message\G

*************************** 1. row ***************************
Message: ♥❤♥ Happy Valentine's Day ♥❤♥
1 row in set (0 sec)
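
For the curious, each argument to char() above is simply the UTF-8 byte sequence of one character. The same bytes can be decoded in Python as a quick check; this is only a sketch to show what the hex values mean and has nothing to do with how Drizzle evaluates the function.

```python
# Each hex group is the UTF-8 encoding of one character:
#   e2 99 a5 -> U+2665 BLACK HEART SUIT
#   e2 9d a4 -> U+2764 HEAVY BLACK HEART
hearts = bytes.fromhex("e299a5" "e29da4" "e299a5").decode("utf-8")

message = f"{hearts} Happy Valentine's Day {hearts}"
print(message)
```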

I may be British, but I am no MI6 spy (and I'm certainly no Sean Connery). Although the 007 reference is still relevant because we are nearing the first GA release of Drizzle7.

This week is a special week for many reasons. First of all, probably the most obvious, it is Valentine's day (yes guys, the flower shop is still open, run before she finds out you forgot!).

Probably not quite as important, depending on your relationship status, is Drizzle's first RC release will be out over the next couple of days. The amount of work that has gone into Drizzle is staggering, even the 6 months I have been working on it full-time has seen an amazing amount of change. For those who don't know about the Drizzle project, here is a quick recap:

Drizzle is a microkernel database primarily aimed at web and cloud installations. It started life in 2008 as a fork of MySQL 6.0, since then it has gone through extensive changes such as migrating a lot of functionality into a new plugin architecture so that parts can be changed easily. It uses InnoDB as its primary storage engine, much like MySQL 5.5, and has much the same SQL syntax as MySQL. There are many things that have been ripped out of Drizzle that are in MySQL as part of this process but also many more things that have been added in, mainly as plugins.

For example, Drizzle does not have stored procedures. That is not to say it cannot or won't have them, and with the plugin architecture it would be quite easy to add them, but we feel in a cloud environment that logic should be at the application layer. If we find many people need them or a developer wants to work on them, we would be happy for plugins to exist to implement this.

We have modified the parser so that it is a bit stricter, by this I mean if the database can't figure out what you mean it errors instead of making assumptions which can lead to bad data. UTF-8 is the only supported character set, supporting multiple character sets is very useful in some scenarios, but the web is pretty much all UTF-8 now and confusion and corruption could occur if character sets are not handled correctly.

There is a testing build of Drizzle made every two weeks which is basically just a tagged release of our trunk, we run a full regression and benchmark suite on every merge to the trunk to make sure that it is constantly stable, so the fortnightly tagging is not a mad fight to get things stable.

Speaking of which, I'd like to thank the many community users and Rackspace DBAs who have tested Drizzle so far. The feedback has been amazing and we have made many improvements to Drizzle and the documentation because of it.

Is Drizzle fast?

I'm not going to benchmark MySQL/Postgres/Oracle/etc. against Drizzle, as I could probably make any database look favourable in such a benchmark; I'll leave that to unbiased third parties. We do, however, benchmark every single build to check for performance regressions (and indeed improvements). This information can be found in our benchmarks mailing list. In my personal opinion, we should be very fast at most things.

So, what can you expect to see in Drizzle7 RC?

I have gone over many of these in previous blog posts but here is a recap of a few things:

Drizzledump can do on-the-fly MySQL->Drizzle conversions without an intermediate file

Microsecond precision for TIMESTAMP data types (note that it is the 6th anniversary of the MySQL feature request for this tomorrow) as well as all-new data types

The framework for an entirely new and portable replication system

Basic Catalogs support (much more to come here)

And these are just a few of the recent changes. Going over the changelogs I could probably make this list many pages long with really exciting features. Don't just take my word for it though, try it for yourself!

Tuesday, 8 February 2011

The end of implicit cartesian products

I've done it before, and I'm sure many others have. You type:

SELECT * FROM t1,t2;

Without any conditions, and then just wait as your console spews out every possible combination of rows from the two tables in what is called an implicit cartesian join. Worse still is when you are hosting and one of your client's apps does this (I've seen this too, many moons ago).

So, in Drizzle trunk today and in our RC release next week we have a new error "Implicit cartesian join attempted." which will fire every time you try a query such as the one above. If you really want a full cartesian join without a WHERE or ON condition (sometimes, it is needed) then you can use the CROSS keyword. For example:

SELECT * FROM t1 CROSS JOIN t2;
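To get a feel for how quickly the row count grows, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here (an assumption for illustration): it accepts the comma form silently and does not raise Drizzle's new error, but it does show that both forms return every combination of rows. The table contents are made up.

```python
import sqlite3

# SQLite (in memory) stands in for Drizzle; it does NOT raise
# "Implicit cartesian join attempted." for the comma form.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INTEGER)")
conn.execute("CREATE TABLE t2 (b INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(10,), (20,)])

# Implicit form: no WHERE or ON condition, so every row pairs with every row.
implicit_rows = conn.execute("SELECT * FROM t1, t2").fetchall()

# Explicit form: the same result, but the intent is spelled out.
explicit_rows = conn.execute("SELECT * FROM t1 CROSS JOIN t2").fetchall()

print(len(implicit_rows))  # 3 rows x 2 rows = 6 combinations
print(sorted(implicit_rows) == sorted(explicit_rows))
```

Three rows joined against two is harmless, but the result is always rows(t1) multiplied by rows(t2), so two tables of a million rows each would try to return a trillion rows.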

Friday, 28 January 2011

MySQL to Drizzle character set considerations

Drizzle supports one character set (unless you include binary) which is UTF8. It is the character set used by most of the web and supporting many different character sets can lead to complications. That is not to say that there are not advantages of supporting many different types of character sets like MySQL does, but more care is needed when using them.

As an example of this, a new Drizzle user came online today saying that drizzledump's MySQL to Drizzle conversion was turning 'è' into 'Ã¨'. When drizzledump connects to a MySQL server it sets the connection to UTF8 so that the dump output is compatible with Drizzle. After a bit of discussion it was discovered that the user's table was latin1 and the connection was latin1 (PHP does this by default) but they were storing and retrieving UTF8 data. Essentially their data was getting mangled but it happened to work. The problem came when telling MySQL to export this data as UTF8: it was effectively doing a double UTF8 conversion of the data.

With this in mind we have added a new option to drizzledump so that it stops setting the character set for the connection in these situations, '--my-data-is-mangled'.
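The double conversion is easy to reproduce outside the database. The following is a small Python sketch of the effect rather than of what MySQL or drizzledump do internally: it takes the UTF8 bytes for 'è', reads them as if they were latin1, and then converts the result to UTF8 again.

```python
original = "è"

# The application stored UTF8 bytes in a latin1 column over a latin1 connection.
utf8_bytes = original.encode("utf-8")  # b'\xc3\xa8'

# Treating those bytes as latin1 turns each byte into its own character.
mangled = utf8_bytes.decode("latin-1")
print(mangled)  # Ã¨

# Exporting as UTF8 then encodes those two characters a second time.
double_encoded = mangled.encode("utf-8")
print(double_encoded)  # b'\xc3\x83\xc2\xa8'

# Reversing the steps recovers the original character.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired)  # è
```

Leaving the connection character set alone avoids the second conversion, so the bytes arrive exactly as the application stored them.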

Thursday, 20 January 2011

drizzle.org site updates

We have made a few modifications to the drizzle.org website today. Most of them are to do with layout reorganisation, but there are a couple of new features:

A DrizzleDB twitter feed box (not quite live, we will flip the switch on that part soon)

Binary download links for RPMs and DEBs for Drizzle.

Please let us know what you think about the changes.

Also, please visit and sign up for the Drizzle Developer Day on April 15th. The venue is TBC but will be in or around the Hyatt Regency in Santa Clara.

Wednesday, 19 January 2011

Change of Drizzle logo

Way back in 2008 (which is a long time ago in Drizzle's history) Zak Greant created a logo for the Drizzle project (as seen above). For some reason which has been lost through the ages we switched from this to using the rain cloud logo which was part of the Tango project.

After a lot of discussion it was found that most of us preferred the old logo, not only does it actually have the name in it but in my opinion it is more stylish than the rain cloud. As a result we have switched back to the old logo and are making this the official Drizzle logo. We will be rolling it out over the various Drizzle related sites shortly.

It is licensed under a Creative Commons Attribution ShareAlike 3.0 License and an SVG of it is available here.

Monday, 17 January 2011

Building Drizzle in RedHat Enterprise Linux 6 and derivatives

Over this weekend I have been playing with a test release of Scientific Linux 6 which is a binary-compatible rebuild of the source for RedHat Enterprise Linux 6 with a few additions, very similar to CentOS (CentOS 6 should be out in the next few weeks). Specifically I have been testing Drizzle in it to see if we can compile it and that it will pass our regression suite. The good news is 'yes' to both. Here is how to do it.

First of all you need some pre-requisites installed. Almost all are available from the operating system's repositories apart from one which I will come to in a minute. You need to install the following packages using yum or any other package manager:

boost-devel

autoconf

automake

gcc-c++

libtool

gperf

libuuid-devel

zlib-devel

pcre-devel

readline-devel

flex

bison
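If you would rather not type the list out by hand, the packages above can be assembled into a single yum command. This is only a convenience sketch: the `-y` flag (answer yes to prompts) is my addition, and the script prints the command for you to run as root rather than running it itself.

```python
import shlex

# The build prerequisites listed above.
PACKAGES = [
    "boost-devel", "autoconf", "automake", "gcc-c++", "libtool", "gperf",
    "libuuid-devel", "zlib-devel", "pcre-devel", "readline-devel",
    "flex", "bison",
]

# "-y" is an assumption for unattended installs; drop it to be prompted.
command = ["yum", "install", "-y", *PACKAGES]

# Print a copy-and-paste friendly command line instead of executing it.
print(shlex.join(command))
```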

Now, the key thing missing from the above list is Google's Protocol Buffers. Unfortunately, unlike Fedora, this does not seem to be in the RedHat repositories so we need to roll our own. To do this:

Install the following packages using yum or any other package manager:
- rpm-build
- python-devel
- python-setuptools

Download the protobuf source package from here.

Run the following (as root):

rpmbuild --rebuild protobuf-2.2.0-3.el5.src.rpm

Install the protobuf packages as follows (again as root):
```
rpm -Uvh /root/rpmbuild/RPMS/x86_64/*
```
or if you are using 32bit:
```
rpm -Uvh /root/rpmbuild/RPMS/i686/*
```

You are now good to go. You should be able to compile Drizzle in the normal way.

Monday, 10 January 2011

TIMESTAMP with microseconds

Back in 2005 a user requested on the MySQL bug tracker that the TIME/DATE based data types store microseconds. I personally don't think this is an unreasonable request and judging by the many posts to the bug report by users between then and now this is something quite a few people would like to see.

In Drizzle we asked 'What if...' and Brian came up with the answer. We now (in trunk and in next week's release) have TIMESTAMP and NOW() with microsecond precision.

To create a TIMESTAMP column that uses microseconds you simply need to specify TIMESTAMP(6) in your table definition, for example:

CREATE TABLE `t1` (
 `a` INT DEFAULT NULL,
 `b` TIMESTAMP(6) NULL DEFAULT NULL
) ENGINE=InnoDB

You can then use the following (note that DEFAULT CURRENT_TIMESTAMP and ON UPDATE CURRENT_TIMESTAMP work with microseconds as well):

drizzle> insert into t1 values (1, '2010-01-10 07:32:43.234567');
Query OK, 1 row affected (0.07 sec)

drizzle> select * from t1;
+------+----------------------------+
| a    | b                          |
+------+----------------------------+
|    1 | 2010-01-10 07:32:43.234567 |
+------+----------------------------+
1 row in set (0 sec)
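To see what the six digits buy you on the application side, here is a small Python sketch that parses the value returned above and compares it with what a seconds-only column would keep. It only uses the standard library's datetime module and does not talk to Drizzle.

```python
from datetime import datetime

# The value stored in column `b` above, parsed with a fractional-seconds field.
value = datetime.strptime("2010-01-10 07:32:43.234567", "%Y-%m-%d %H:%M:%S.%f")
print(value.microsecond)  # 234567

# What survives if the fractional part is dropped, as with a seconds-only TIMESTAMP.
truncated = value.replace(microsecond=0)
print(truncated.isoformat(sep=" "))  # 2010-01-10 07:32:43

# The six digits in TIMESTAMP(6) correspond to a resolution of one millionth of a second.
print(value - truncated)  # 0:00:00.234567
```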

Sunday, 9 January 2011

What if...

I was looking back at Drizzle blog posts today, and noticed the first couple announcing to the world that Drizzle exists are entitled 'What if'. These are two powerful words which can often drive innovation. I was suddenly reminded of a UK advert for Honda called 'OK Factory' where one worker decides to go against the grain and ponder 'What if':

(for those who have problems with the embedded Daily Motion a YouTube version is available here)

In February we aim to show you what these two powerful words can produce. At the very least, I hope the Drizzle project inspires others to ponder 'What if...'

Monday, 3 January 2011

Year 7DB!

OK, yes, I was kinda sad enough at the end of December to convert 2011 to hex to find it is 7DB. Which I think is appropriate as for me at least this will be the year of Drizzle7 DBMS.
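For anyone who wants to check my arithmetic, the conversion is a one-liner in Python:

```python
year = 2011

# Format the year as upper-case hexadecimal without the 0x prefix.
year_hex = format(year, "X")
print(year_hex)  # 7DB

# And back again: 7*256 + 13*16 + 11 = 2011
print(int(year_hex, 16))  # 2011
```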

It may be the winter holiday season but many of us Drizzle developers haven't taken much of a break. I have been working hard to keep our bug count down and will be working on some really cool new features this month. Brian Aker has also announced things that he has been working on in this mailing list post.

So, what can you expect from the Drizzle project in year 7DB? Here is a sneak preview of a few things off the top of my head (I really hope I haven't missed anyone here):

Completion of replication support (thanks to David Shrewsbury, Joe Daly and everyone else involved there)

Many bugs killed (thanks as always to Patrick Crews for making our jobs that much harder by finding the bugs :) ). We also have a bug killing week later this month.

Support for a few more data types (and probably improved support for current types)

Catalogs support

Improvements to system variables and options (thanks to Monty Taylor and Vijay Samuel)

InnoDB 1.1.4 (in fact, it has already been merged in, thanks to Stewart Smith)

Some really rocking documentation taking shape (thanks to our new documentation writer Marisa Plumb)

Some O'Reilly MySQL Conference and Expo talks (I will be there, details in another blog post soon)

Drizzle 7 GA release

And much of this will be done very early on this year. Development is happening at a very rapid pace and we have made some great achievements so far. As always we would love for you to try the source or our binaries, and we love to hear feedback. We have had some fantastic feedback so far which has led to very rapid fixes and improvements in various areas.
