A community in which webmasters can ask for help with topics such as PHP coding , MySQL , IT jobs, web design, IT security.
Current location:homephp forumphp talk in 2008 yearWhat is the best collation to use for MySQL with PHP? - page 1
User InfoPosts
What is the best collation to use for MySQL with PHP?#1
I'm wondering if there is a "best" choice for collation in MySQL for a general website where you aren't 100% sure of what will be entered? I understand that all the encodings should be the same, such as MySQL, Apache, the HTML and anything inside PHP.

In the past I have set PHP to output in "UTF-8", but which collation does this match in MySQL? I'm thinking it's one of the UTF-8 ones, but I have used utf8_unicode_ci, utf8_general_ci, and utf8_bin before.

posted date: 2008-12-14 23:48:00


Re: What is the best collation to use for MySQL with PHP?#2
I had made out the solution of this problem. click to view my topic...

hope that hepls.

posted date: 2008-12-14 23:48:01


Re: What is the best collation to use for MySQL with PHP?#3
For UTF-8 textual information, you should use utf8_general_ci because...utf8_bin: compare strings by thebinary value of each character inthe stringutf8_general_ci: compare stringsusing general language rules andusing case-insensitive comparisonsa.k.a. it will should making searching and indexing the data faster/more efficient/more useful.

posted date: 2008-12-14 23:55:00


Re: What is the best collation to use for MySQL with PHP?#4
The main difference is sorting accuracy (when comparing characters in the language) and performance. The only special one is utf8_bin which is for comparing characters in binary format.utf8_general_ci is somewhat faster than utf8_unicode_ci, but less accurate (for sorting). The specific language utf8 encoding (such as utf8_swedish_ci) contain additional language rules that make them the most accurate to sort for those languages. Most of the time I use utf8_unicode_ci (I prefer accuracy to small performance improvements), unless I have a good reason to prefer a specific language.You can read more on specific unicode character sets on the MySQL manual - http://dev.mysql.com/doc/refman/5.0/en/charset-unicode-sets.html

posted date: 2008-12-14 23:58:00


Re: What is the best collation to use for MySQL with PHP?#5
Actually, you probably want to use utf8_unicode_ci or utf8_general_ci.utf8_general_ci sorts by stripping away all accents and sorting as if it were ASCIIutf8_unicode_ci uses the Unicode sort order, so it sorts correctly in more languagesHowever, if you are only using this to store English text, these shouldn't differ.

posted date: 2008-12-15 00:02:00


Re: What is the best collation to use for MySQL with PHP?#6
Collations affect how data is sorted and how strings are compared to each other. That means you should use the collation that most of your users expect.Example from the documentation:utf8_general_ci also is satisfactory for both German and French, except that ‘ß’ is equal to ‘s’, and not to ‘ss’. If this is acceptable for your application, then you should use utf8_general_ci because it is faster. Otherwise, use utf8_unicode_ci because it is more accurate.So - it depends on your expected user base and on how much you need correct sorting. For an English user base, utf8_general_ci should suffice, for other languages, like Swedish, special collations have been created.

posted date: 2008-12-15 00:04:00


Re: What is the best collation to use for MySQL with PHP?#7
Be very, very aware of this problem that can occur when using utf8_general_ci.MySQL will not distinguish between some characters in select statements, if the utf8_general_ci collation is used. This can lead to very nasty bugs - especially for example, where usernames are involved. Depending on the implementation that uses the database tables, this problem could allow malicious users to create a username matching an administrator account.This problem exposes itself at the very least in early 5.x versions - I'm not sure if this behaviour as changed later.I'm no DBA, but to avoid this problem, I always go with utf8-bin instead of a case-insensitive one.The script below describes the problem by example.-- first, create a sandbox to play inCREATE DATABASE `sandbox`;use `sandbox`;-- next, make sure that your client connection is of the same -- character/collate type as the one we're going to test next:charset utf8 collate utf8_general_ci-- now, create the table and fill it with valuesCREATE TABLE `test` (`key` VARCHAR(16), `value` VARCHAR(16) ) CHARACTER SET utf8 COLLATE utf8_general_ci;INSERT INTO `test` VALUES ('Key ONE', 'value'), ('Key TWO', 'valúe');-- (verify)SELECT * FROM `test`;-- now, expose the problem/bug:SELECT * FROM test WHERE `value` = 'value';---- Note that we get BOTH keys here! MySQLs UTF8 collates that are -- case insensitive (ending with _ci) do not distinguish between -- both values!---- collate 'utf8_bin' doesn't have this problem, as I'll show next:---- first, reset the client connection charset/collate typecharset utf8 collate utf8_bin-- next, convert the values that we've previously inserted in the tableALTER TABLE `test` CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;-- now, re-check for the bugSELECT * FROM test WHERE `value` = 'value';---- Note that we get just one key now, as you'd expect.---- This problem appears to be specific to utf8. Next, I'll try to -- do the same with the 'latin1' charset:---- first, reset the client connection charset/collate typecharset latin1 collate latin1_general_ci-- next, convert the values that we've previously inserted-- in the tableALTER TABLE `test` CONVERT TO CHARACTER SET latin1 COLLATE latin1_general_ci;-- now, re-check for the bugSELECT * FROM test WHERE `value` = 'value';---- Again, only one key is returned (expected). This shows -- that the problem with utf8/utf8_generic_ci isn't present -- in latin1/latin1_general_ci---- To complete the example, I'll check with the binary collate-- of latin1 as well:-- first, reset the client connection charset/collate typecharset latin1 collate latin1_bin-- next, convert the values that we've previously inserted in the tableALTER TABLE `test` CONVERT TO CHARACTER SET latin1 COLLATE latin1_bin;-- now, re-check for the bugSELECT * FROM test WHERE `value` = 'value';---- Again, only one key is returned (expected).---- Finally, I'll re-introduce the problem in the exact same -- way (for any sceptics out there):-- first, reset the client connection charset/collate typecharset utf8 collate utf8_generic_ci-- next, convert the values that we've previously inserted in the tableALTER TABLE `test` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;-- now, re-check for the problem/bugSELECT * FROM test WHERE `value` = 'value';---- Two keys.--DROP DATABASE sandbox;

posted date: 2010-06-13 03:02:00


Re: What is the best collation to use for MySQL with PHP?#8
small performance improvements ? are you sure about this ? publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp?topic=/… The collation you choose can significantly impact the performance of queries in the database.

posted date: 2010-08-06 23:54:00


Re: What is the best collation to use for MySQL with PHP?#9
This is for DB2 not MySQL. Also, there are no concrete numbers or benchmarks so you are just basing it on the opinion of the writer.

posted date: 2010-08-09 04:14:00


Re: What is the best collation to use for MySQL with PHP?#10
Essentially, it depends on how you think of a string.I always use utf8_bin because of the problem highlighted by Guus. In my opinion, as far as the database should be concerned, a string is still just a string. A string is a number of UTF-8 characters. A character has a binary representation so why does it need to know the language you're using? Usually, people will be constructing databases for systems with the scope for multilingual sites. This is the whole point of using UTF-8 as a character set. I'm a bit of a pureist but I think the bug risks heavily outweigh the slight advantage you may get on indexing. Any language related rules should be done at a much higher level than the DBMS.In my books "value" should never in a million years be equal to "valúe".If I want to store a text field and do a case insensitive search, I will use MYSQL string functions with PHP functions such as LOWER() and the php function strtolower().

posted date: 2010-12-06 17:42:00


Re: What is the best collation to use for MySQL with PHP?#11
Note that if you want to use functions, there is a bug in MySQL (most currently distributed versions) where functions always return the string using utf8_general_ci, causing problems if you're using another collation for your strings - see bugs.mysql.com/bug.php?id=24690

posted date: 2011-02-09 02:49:00


Re: What is the best collation to use for MySQL with PHP?#12
-1: This is surely remedied by applying a unique key to the relevant column. You would see the same behaviour if the two values were 'value' and 'valUe'. The whole point of a collation is that it provides rules for (among other things) when two strings are considered equal to one another.

posted date: 2011-06-09 02:26:00


select page: « 1 2...»
Copyright ©2008-2017 www.momige.com, all rights reserved.