Tuesday 31 May 2011

Mydumper 0.2.3 released!

Today marks the release of mydumper 0.2.3.  Mydumper is a multi-threaded high-performance data dumper (and loader) for MySQL and Drizzle written in C. This is a minor bugfix release whilst I work on the upcoming 0.5 version. The end goal here will be that there is always a 'stable' and 'development' version. 0.2 will be the first stable version and will only have bug fixes. 0.5 will be the first 'development' version where the next planned set of new features will hit.

So, the changes since 0.2.2 are:

  • Drizzle support now fully works again

  • Fixes so mydumper compiles in FreeBSD (thanks to Kirill A. Korinskiy)


If you wish to try this version the source can be downloaded here.

    Friday 20 May 2011

    Mydumper now with MyISAM consistent snapshots!

    Mydumper 0.2.2 has been released today with a number of fixes and new features.  The one that most people have been asking for is consistent snapshots for non-InnoDB tables (such as MyISAM).  We have been able to achieve this without locking the database for the entire backup using the following method:

    1. Flush tables with read lock (and start transaction with consistent snapshot on all threads)

    2. Dump non-InnoDB

    3. Start InnoDB dump

    4. When non-InnoDB dump has finished (whilst InnoDB is dumping) unlock tables

    5. ...

    6. Profit

    I have also started work on Drizzle support this week.  The Drizzle support is not entirely complete mostly down to Drizzle's handling of SHOW TABLE STATUS and I expect to have it fully working next week.

    Here is the list of changes since 0.2.1:

    • Consistent snapshots for non-InnoDB (non-transactional) tables

    • Fix --binlogs breaking consistent snapshots

    • (very) Minor performance improvement to mydumper

    • Initial support for Drizzle

    • Add --verbose option

    • Fix multiple ';' at end of file

    • Fix myloader not closing files

    • Improve myloader error messages

    • Several fixes to documentation building

    • Make myloader tell mysqld to not binlog imports by default (new option --enable-binlog to log them)

    • Add --database to myloader for single database import to a different database

    • Change mydumper's --schemas to --no-schemas (--schemas is now default)

    • Add verbose messages (with --verbose=3)

    • Fix memory leaks in myloader

    You can download mydumper 0.2.2 from Launchpad or by clicking here.

    Many thanks to everyone who has been testing mydumper and giving feedback.  Your comments and suggestions go a long way to improving mydumper.

    Friday 13 May 2011

    Mydumper now with myloader!

    It has only been a few days since the 0.2.0 release of mydumper but there have been some big changes since then.  I will try and go over them all here.

    Mydumper 0.2.1


    Mydumper 0.2.1 has been released today, many thanks for all those who have been testing the trunk source, the feedback has gone a long way to making fixes and improvements to mydumper.  You can download the source for it here.

    Myloader


    A few days ago Mark Callaghan asked about restoring mydumper backups.  There is of course the great 'myimporter' tool written by Mikael Fridh, but I wanted something that could integrate into the mydumper source better.  So, I have been busy hacking for the last few days and the big news in today's release is the addition of 'myloader'.  Myloader is a high-performance, multi-threaded tool written in C designed to read mydumper backups and apply them in parallel.

    There are still many features I wish to add to it, but is good for basic restorations.

    New Website


    Last night I knocked together a site for mydumper.  There is not much in the way of content yet, but you can take a look at http://www.mydumper.org/

    Mikael FridhMikael Fridh

    Monday 9 May 2011

    MySQL Data Dumper 0.2.0 released!



    A couple of years ago Domas Mituzas created a tool which could basically be thought of as basically a lightweight multi-threaded mysqldump.  By this I mean it can retrieve data from multiple tables simultaneously and can even break a table down into parts for simultaneous retrieval.  Sometime around 2010 I started hacking on mydumper too but stopped whilst working at Rackspace (Drizzle was way more than full-time for me).

    Back when Domas first blogged about it he was managing to dump his sample data over 10x faster than mysqldump!

    Since the 0.1 series Domas has fixed a lot of problems and since starting at SkySQL I have finished a lot of work that I started over a year ago.

    Today sees the first release of the 0.2 series of mydumper, called 0.2.0.  It has many changes over the last 0.1 release:

    • Better error handling

    • Many bug fixes

    • Documentation

    • Binary log dumping

    • Compression protocol support

    • Schema dumps

    My personal favourite fetaure I added here is the binary log dumping.  Mydumper can connect to a MySQL server and retrieve all the binary logs in parallel whilst also retrieving the table data.

    Many more features are already in development including additional features on the binary log dumping but for those who want to try it now you can obtain the source from Launchpad here.

    Friday 6 May 2011

    Viewing the MySQL dump import progress

    A couple of years ago I wrote a patch for the MySQL command line client which shows the progress of a import as it happens (I also created a similar patch for mysqldump which later made it into Drizzle).  I don't have the blog archives from back then but Harrison Fisk commented suggesting I use a utility called 'bar' instead.

    The 'bar' utility actually is a lot better than the patch I wrote and I highly recommend it when you are importing a large dump file.  To use it simply run:
    shell> bar -if=data.sql | mysql

    This will generate an output such as:



    If you are using Ubuntu then it is a simple case of 'sudo apt-get install bar' to install it.  Enjoy!

    Thursday 5 May 2011

    My contribution to MySQL 5.6

    [caption id="attachment_209" align="alignright" width="240" caption="Photo by Stéfan under a CC by NC SA 2.0 license"][/caption]

    If you have been reading Planet MySQL over April you will have seen many blog posts on the new features in the MySQL 5.6 (currently a development release).  I developed several patches that are in 5.6 including the 'Slave_last_heartbeat' status variable to show the time of the last replication heartbeat received.  One of the cool new features I developed which I am most proud of is the option to remotely backup your binary logs without a MySQL slave:
    Remote Binlog Back-up

    Enhances operational efficiency by using the replication channel to create real-time back-ups from the binary log.

    By adding a raw flag, the binlog is written out to remote back-up servers, without having a MySQL database instance translating it into SQL statements, and without the DBA needing SSH access to each master server.

    Here is a quick story as to why I developed it and how it can help people.

    Back then I was a MySQL Support Engineer and a customer asked if it was possible to retrieve binary logs from a remote server in real time without needing a MySQL slave using the blackhole engine.  The customer had many servers that they wanted to backup into just a few backup servers.  Unfortunately at the time there was no such tool, but within 24 hours I had hacked a patch into mysqlbinlog to provide this.  The patch had bugs and missed a lot of features back then but the proof of concept was good enough to show that a real patch could be made.

    The new 'raw mode' option to mysqlbinlog can connect to a remote MySQL server, retrieve the binary logs and can continue retrieving them until an error occurs.  So it is possible to have a backup of your binary logs up to the second that your primary data centre bursts into flames.

    You can read up more about how to use this in the MySQL manual.

    Wednesday 4 May 2011

    libeatmydata - Feed me, Seymour!

    Whilst supporting customers at SkySQL I often have to load gigabytes of SQL data into MySQL servers to run tests.  This process can be slow especially for InnoDB because in a standard dump file every insert is a transaction and every transaction has to be synchronised to disk for crash safety.  The thing is, most of the time I don't care if the machine I'm using crashes whilst I'm loading this data into the server.

    There are of course many ways around this, such as editing the SQL files and wrapping transactions around batches of inserts and editing the configuration files to disable all the syncing involved.  But I don't want one configuration to load in data and then another to play with the data, so this is where libeatmydata comes in.

    libeatmydata is a preloaded library that disables disk syncing functionality from doing just that.  The OS will decide when to sync the data to disk.  This is great for loading in an SQL dump file, taking single insert dumps on default configuration down from hours to minutes.  But you wouldn't want to do it during the production running of your server because power failure would certainly lose you some data.

    So, how do you use libeatmydata with MySQL?  Simple, this is the command to start it:
    LD_PRELOAD=/usr/lib/libeatmydata.so mysqld

    Then you can load in your dump file, shutdown mysqld safely and start it up again without libeatmydata.

    A great application I could see for this is scripting the startup of slaves, feeding a dump file into the server with libeatmydata and then restarting without this once the slave is ready.

    UPDATE

    Kristian Nielsen asked in the comments on SkySQL's blog how much faster it is, so I have run a quick benchmark to find out. In this test I am using a 218MB test file of single row inserts I had generated for an old support issue. I am also using a clean MySQL 5.1.51 installation (cleaned on each run) on my i7 based laptop:

    Vanilla MySQL 5.1.51
    real    166m19.504s
    user 0m23.891s
    sys 0m6.084s

    MySQL 5.1.51 with --sync-binlog=0 --innodb_flush_log_at_trx_commit=0
    real    5m33.578s
    user 0m11.096s
    sys 0m3.215s

    MySQL 5.1.51 with libeatmydata
    real    3m14.123s
    user 0m10.932s
    sys 0m3.108s