Friday 25 February 2011

Last Week in Drizzle

Welcome to the second edition of Last Week in Drizzle.  The diff of the trunk between last Friday and right now is just under 23,000 lines in size, so I will do my best to summarise the important parts of this.

Replication


A lot of great work has gone into solidifying replication this week.  The slave code has not been merged yet since it has been triggering bugs in the Solaris compiler (see further down this post) but we plan to have it included in the next few hours ready for RC2.  For a quick summary of our current progress I turn to Patrick Crews:

"In our testing, we create a master-slave setup, then run the transaction log tests that we have been using since we first started beating on the trx log.

There are a variety of different grammars that produce transactions (autocommit=off) and single queries (with autocommit=on).  The various grammars produce different levels of valid queries - some make more valid queries and produce more deadlocks in multi-threaded testing, others create invalid queries to test how the log handles bad input.  We test with simple scenarios like 100 cycles and 1 thread, then move on to more complex tests like 10000 cycles and 10 threads as well as setup 1 million cycle tests to stress the server long-term.

There are still bugs to be found, but we can say with confidence that basic replication, even in highly concurrent / high stress (lots of deadlocks, rollbacks, and good commits) scenarios is working well.  Data is replicated correctly and that is HUGE!

We'll now be moving onto other tests like different configurations (master + slave restarts and crashes, adding a slave to a populated master, KILLing queries, etc)."

More information about setting up replication can be found in Patrick's blog post and David Shrewsbury will have a blog post on it shortly.  Documentation on the docs site will be coming soon.

Race to GA


Our RC2 release is due for tagging on the 28th of February and will include the stabilised replication code.  So the release schedule, unchanged from last week, is:

RC1 - 14th February 2011 (released)
RC2 - 28th February 2011
GA - 14th March 2011

BOOLEAN data type


There has been a slight change to the output of the new native BOOLEAN data type.  In the first RC release the MySQL client API returned this as the varchar 'TRUE' or 'FALSE'.  Unfortunately some languages such as PHP and Python did not handle this too well, so thanks to Monty Taylor we now call this TINYINT on the wire for MySQL protocol and return 0 and 1.  Our command line utilities know when they are connecting to Drizzle instead of MySQL and will do the conversion to 'TRUE' or 'FALSE' when displaying output.
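
As a rough illustration of the display-side conversion described above, here is a minimal Python sketch.  It is not the code used by the Drizzle command line utilities; the function name display_boolean and the sample rows are made up for this example, and it simply assumes the driver hands back 0, 1 or None (for NULL).

```python
def display_boolean(wire_value):
    """Map a TINYINT wire value (0, 1, or None for NULL) to display text.

    Hypothetical helper for illustration only; not part of any Drizzle client.
    """
    if wire_value is None:
        return "NULL"
    return "TRUE" if int(wire_value) else "FALSE"


# Made-up rows of (id, boolean column as sent on the wire)
rows = [(1, 1), (2, 0), (3, None)]
shown = [(row_id, display_boolean(flag)) for row_id, flag in rows]
print(shown)
```

Languages such as PHP and Python can use the 0 and 1 values directly in conditionals, so a step like this is only needed when the value is shown to a person.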

Docs day


Wednesday 23rd was our documentation day and I would like to thank everyone who took the time to read through the docs and make suggestions and fixes.  Since Wednesday morning the diff of the docs directory in trunk is just under 4500 lines in size and there are plenty more changes on their way based on the feedback gained.  I'd also like to thank Marisa Plumb, our main documentation writer, for all her hard work so far.

BIT operators


Brian Aker has (by popular demand) added the SQL bit operators that MySQL has back into Drizzle, including the bit shifting operators.
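
For anyone unfamiliar with these operators, the Python sketch below mimics how MySQL has traditionally evaluated them, as unsigned 64-bit integers.  Whether Drizzle follows exactly the same 64-bit semantics is an assumption here, and the function names are invented for the example.

```python
MASK64 = (1 << 64) - 1  # keep every result inside the unsigned 64-bit range


def bit_and(a, b):   # SQL: a & b
    return (a & b) & MASK64


def bit_or(a, b):    # SQL: a | b
    return (a | b) & MASK64


def bit_xor(a, b):   # SQL: a ^ b
    return (a ^ b) & MASK64


def bit_not(a):      # SQL: ~a
    return ~a & MASK64


def shift_left(a, n):   # SQL: a << n, bits pushed past position 63 are dropped
    return (a << n) & MASK64


def shift_right(a, n):  # SQL: a >> n
    return (a & MASK64) >> n


print(bit_and(29, 15), bit_or(29, 15), shift_left(1, 2), shift_right(4, 2))
```

The masking matters because Python integers are unbounded, whereas a 64-bit unsigned value wraps; ~0, for instance, comes out as 18446744073709551615 rather than -1.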

Message verboseness


Brian has also added a new option to drizzled to set how verbose the output messages should be, defaulting to ERROR only.  More information can be found on the mailing list.

Solaris Jenkins Slave


We have hit several bugs in the Solaris compiler in the last week.  Monty Taylor has been fire-fighting these, but for now our Jenkins Solaris slave is not creating working builds.  This does mean that there is a chance that RC2 won't be fully tested on Solaris when released.

PHP module


I have been working with the PECL guys to get the Drizzle module working again.  The current release will not compile with the libdrizzle inside Drizzle.  The fix for this has been pushed to the SVN trunk in PECL and we are looking to generate a release of this soon.  In the meantime, the mysql and mysqli PHP connectors work great with Drizzle.

I have also received a lot of feedback from the PHP community that many of them use PDO for database connections.  We currently have no PDO module so I am working on writing one which hopefully will be ready for testing in the next week.

Final Thoughts


A massive number of bug fixes and improvements have gone into the trunk ready for RC2, including many code clean-ups from Olaf van der Spek, who is a new contributor to Drizzle.  I for one am looking forward to a very exciting GA release.

As always if you have any feedback or topics you would like me to cover, please let me know.

Friday 18 February 2011

Last Week in Drizzle

It has been a while since we have done one of these so I thought I might try and resurrect the tradition.  So here is my first "Last Week in Drizzle".

Replication


The original plan for replication was to use a Tungsten Replicator based solution to transfer the transaction logs (similar to MySQL's binary logs).  Unfortunately this can't be completed in time for the GA release so we have switched to a master-slave solution similar to MySQL.  The Tungsten solution is still something we plan to finish though.

Replication events are stored using Google Protocol Buffer messages in an InnoDB table; these events are read by the slave, stored locally and applied.  The advantage of the Google Protocol Buffer messages is that a script or program can be knocked up in pretty much any language in minutes to read our replication log.
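
To give a feel for how little code the Protocol Buffer wire format needs, here is a minimal Python sketch that walks the fields of a serialised message.  It deliberately knows nothing about Drizzle's actual message definitions: the sample bytes are hand-built, the function names are made up, and only varint and length-delimited fields are handled.  A real reader would use the generated classes from the project's .proto files instead.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift   # low 7 bits carry the payload
        if not byte & 0x80:                # high bit clear marks the last byte
            return result, pos
        shift += 7


def walk_fields(buf):
    """Yield (field_number, wire_type, value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:                 # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:               # length-delimited (bytes, strings, sub-messages)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        yield field_number, wire_type, value


# Hand-built sample, not a Drizzle message: field 1 = varint 150, field 2 = b"hi"
sample = bytes([0x08, 0x96, 0x01, 0x12, 0x02]) + b"hi"
fields = list(walk_fields(sample))
print(fields)
```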

Unfortunately this sudden change in replication method means we could not complete the slave code in time for the RC release, which in turn means we are creating a second RC release as explained further down in this blog post.

Special thanks go to David Shrewsbury, Patrick Crews and Joe Daly for making this happen.

New Release


Our first RC has been released this week.  In this release we have:

  • Drizzle server can now fork to background via --daemon.  This was primarily implemented to help RedHat/Fedora init.d scripts.

  • Implicit Cartesian joins no longer work; this is to prevent runaway queries.

  • Improvements to the replication transaction log.

  • Many other bug fixes and improvements.


Race to GA


Due to the late entry of the replication code we intend to have one more RC whilst we test it to death in as many horrid ways as Patrick Crews can find.  So the current release plan for Drizzle7 is now:

RC2 - 28th February 2011
GA - 14th March 2011

New RPM Repository


Derks has created a new RPM repository for us at rpm.drizzle.org; more details on this can be seen here.

Windows Jenkins Slave


Monty Taylor has created a Windows slave for our Jenkins Continuous Integration testing system.  This means we now test every trunk merge for libdrizzle regressions in Windows.

Docs Day


On Monday 21st February we have our docs day, when the developers will be reviewing the entire docs site for technical errors and any improvements that can be made (such as missing topics).  We encourage anyone who would like to improve the quality of our docs to join in this effort.  Contact us on #drizzle on Freenode or file a bug if you spot anything we could improve on.

Update 2011-02-20: We are postponing this until Wednesday 23rd February due to documentation merges which won't have quite hit trunk by Monday.

Final Thoughts


I'm going to try and do one of these every week, so if you have any feedback or topics you would like me to cover, please let me know.

Monday 14 February 2011

From Drizzle with love

drizzle> select concat(char(0xe299a5,0xe29da4,0xe299a5), ' Happy Valentine\'s Day ', char(0xe299a5,0xe29da4,0xe299a5)) as Message\G

*************************** 1. row ***************************
Message: ♥❤♥ Happy Valentine's Day ♥❤♥
1 row in set (0 sec)
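
The hex values passed to char() above are simply the UTF-8 byte sequences for the two heart characters.  The same decoding can be checked outside the database with a few lines of Python (a standalone sketch; nothing here talks to Drizzle):

```python
# The same three byte sequences that the query passes to char()
heart_bytes = bytes.fromhex("e299a5" "e29da4" "e299a5")

# Decoding as UTF-8 gives U+2665, U+2764, U+2665
hearts = heart_bytes.decode("utf-8")

message = f"{hearts} Happy Valentine's Day {hearts}"
print(message)
```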

I may be British, but I am no MI6 spy (and I'm certainly no Sean Connery).  The 007 reference is still relevant, though, because we are nearing the first GA release of Drizzle7.

This week is a special week for many reasons.  First of all, probably the most obvious, it is Valentine's day (yes guys, the flower shop is still open, run before she finds out you forgot!).

Probably not quite as important, depending on your relationship status, is that Drizzle's first RC release will be out over the next couple of days.  The amount of work that has gone into Drizzle is staggering; even the 6 months I have been working on it full-time have seen an amazing amount of change.  For those who don't know about the Drizzle project, here is a quick recap:

Drizzle is a microkernel database primarily aimed at web and cloud installations.  It started life in 2008 as a fork of MySQL 6.0; since then it has gone through extensive changes, such as migrating a lot of functionality into a new plugin architecture so that parts can be changed easily.  It uses InnoDB as its primary storage engine, much like MySQL 5.5, and has much the same SQL syntax as MySQL.  Many things that are in MySQL have been ripped out of Drizzle as part of this process, but many more things have been added in, mainly as plugins.

For example, Drizzle does not have stored procedures.  That is not to say it cannot or won't have them, and with the plugin architecture it would be quite easy to add them, but we feel in a cloud environment that logic should be at the application layer.  If we find many people need them or a developer wants to work on them, we would be happy for plugins to exist to implement this.

We have modified the parser so that it is a bit stricter.  By this I mean that if the database can't figure out what you mean, it errors instead of making assumptions which can lead to bad data.  UTF-8 is the only supported character set.  Supporting multiple character sets is very useful in some scenarios, but the web is pretty much all UTF-8 now and confusion and corruption could occur if character sets are not handled correctly.

There is a testing build of Drizzle made every two weeks, which is basically just a tagged release of our trunk.  We run a full regression and benchmark suite on every merge to the trunk to make sure that it is constantly stable, so the fortnightly tagging is not a mad fight to get things stable.

Speaking of which, I'd like to thank the many community users and Rackspace DBAs who have tested Drizzle so far.  The feedback has been amazing and we have made many improvements to Drizzle and the documentation due to this.

Is Drizzle fast?


I'm not going to benchmark MySQL/Postgres/Oracle/etc... vs. Drizzle, as I can probably make any database look favourable in any such benchmark; I'll leave that to unbiased third parties to decide.  But we do benchmark every single build to check for any performance regressions and indeed improvements.  This information can be found on our benchmarks mailing list.  In my personal opinion, we should be very fast at most things.

So, what can you expect to see in Drizzle7 RC?


I have gone over many of these in previous blog posts but here is a recap of a few things:

  • Drizzledump can do on-the-fly MySQL->Drizzle conversions without an intermediate file

  • Microsecond precision for TIMESTAMP data types (note that tomorrow is the 6th anniversary of the MySQL feature request for this) as well as all-new data types

  • The framework for an entirely new and portable replication system

  • Basic Catalogs support (much more to come here)


And these are just a few of the recent changes.  Going over the changelogs, I could probably make this list many pages long with really exciting features.  Don't just take my word for it though, try it for yourself!

Tuesday 8 February 2011

The end of implicit cartesian products

I've done it before, and I'm sure many others have.  You type:
SELECT * FROM t1,t2;

Without any conditions, and then just wait as your console spews out every possible combination of rows from the two tables, in what is called an implicit cartesian join.  Worse still is when you are hosting and one of your clients' apps does this (I've seen this too, many moons ago).

So, in Drizzle trunk today and in our RC release next week we have a new error "Implicit cartesian join attempted." which will fire every time you try a query such as the one above.  If you really want a full cartesian join without a WHERE or ON condition (sometimes, it is needed) then you can use the CROSS keyword.  For example:
SELECT * FROM t1 CROSS JOIN t2;
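
To see why these queries run away, the Python sketch below uses the standard library's sqlite3 module as a stand-in (it is not Drizzle, and unlike Drizzle trunk SQLite happily accepts the implicit form).  The tables and row counts are made up; the point is that the result size is the product of the two table sizes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INTEGER)")
conn.execute("CREATE TABLE t2 (b INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(10,), (20,)])

# Implicit form: SQLite allows it, Drizzle now rejects it with an error
implicit_rows = conn.execute("SELECT * FROM t1, t2").fetchall()

# Explicit form: the spelling Drizzle asks for when the product is intended
explicit_rows = conn.execute("SELECT * FROM t1 CROSS JOIN t2").fetchall()

# 3 rows x 2 rows gives 6 combinations either way
print(len(implicit_rows), len(explicit_rows))
conn.close()
```

With two tables of a million rows each, the same query would try to return a trillion rows, which is why forcing the explicit CROSS keyword is a sensible safety net.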