Drizzle supports one character set (unless you include binary) which is UTF8. It is the character set used by most of the web and supporting many different character sets can lead to complications. That is not to say that there are not advantages of supporting many different types of character sets like MySQL does, but more care is needed when using them.
As an example of this, a new Drizzle user came online today saying that drizzledump's MySQL to Drizzle conversion was turning 'è' to 'è'. When drizzledump connects to a MySQL server it sets the connection to UTF8 so that the dump output is compatible with Drizzle. After a bit of discussion it was discovered that the user's table was latin1 and connection was latin1 (PHP does this by default) but they were storing and retrieving UTF8 data. Essentially their data was getting mangled but it happened to work. The problem came when telling MySQL to export this data as UTF8, it was effectively doing a double UTF8 conversion of the data.
With this in mind we have added a new option to drizzledump so that it stops setting the character set for the connection in these situations, '--my-data-is-mangled'.
[...] This post was mentioned on Twitter by drizzlenews, Andrew Hutchings. Andrew Hutchings said: MySQL to Drizzle character set considerations http://j.mp/gx4O81 [...]
ReplyDeleteThat is appropriately named.
ReplyDeleteThat sounds like a killer feature to me. Unmangling data in MySQL is quite a challenge.
ReplyDeleteJust out of curiosity what happens when you try to dump data where the MySQL table or column charset differs from the base charset? MySQL permits this and it leads to some rather ugly problems with replication at least.
ReplyDeleteWhen a table or column differs from the connection character set, that is fine. MySQL should convert the data. The problem comes when the character set for the data differs from the character set for the connection. In the case I outlined in my blog post the default mysqldump should also give a bad dump (I think, judging by the manual, I haven't tested it).
ReplyDeleteIt has been a while since I have worked on MySQL and can't remember about character set issues in replication unfortunately.
[...] View full post on Planet Drizzle [...]
ReplyDeleteA few years ago I wrote a blog post on character set settings: http://www.alberton.info/dbms_charset_settings_explained.html
ReplyDelete[...] has added –my-data-is-mangled for handling certain MySQL charset migration [...]
ReplyDelete