[Solar-talk] Unicode

Antti Holvikari anttih at gmail.com
Thu Jul 19 12:33:25 CDT 2007


Hi All!

I've been researching how we could get *complete* Unicode support in
our Solar apps, and it seems to be almost impossible to accomplish.
Here's what I found out (mainly for MySQL, but I expect other
databases to behave similarly), along with some questions that need
answers:

* We need to have all our data in UTF-8 because all Solar files are in
UTF-8 (locale files and PHP code files).

* We need to declare charset=utf-8 in the Content-Type meta tag of our
HTML and also send a "Content-Type: text/html; charset=utf-8" header
with the HTTP response, so that the browser decodes and displays the
page as UTF-8.

* You can store your data in *any* character set in your (MySQL)
database, as long as the character set declared on your columns matches
the encoding the data is actually stored in. Then you need to set the
database connection's character set to UTF-8 ("SET CHARACTER SET
utf8"), so that the results of your queries are converted to UTF-8 for
you. Now we have UTF-8 data in our code, no matter what character
encoding the database uses.

* Now we have all files and data in UTF-8, but the str* functions work
on bytes, not characters, so they won't handle multibyte strings
correctly. What can we do? We can use the mb_* functions for string
operations. *But* the mbstring extension is not enabled by default :-(.
There is an ini setting, "mbstring.func_overload"[1], which overloads a
set of functions so that they work with multibyte strings. Does anyone
know whether this setting is enough for PHP to handle *all* string
operations? I doubt it.

* Paul pointed out on #solarphp that the /u modifier could (maybe) be
used with the preg_* functions so that they handle Unicode. Is this
true?
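The connection-charset point can be illustrated with a rough sketch. It uses Python purely for illustration (not PHP), and the helper `to_client_charset` is hypothetical, not a MySQL API: it mimics, in spirit, what the server does when the column charset and the connection charset differ, decoding with the column's declared charset and re-encoding as UTF-8. The last lines show what happens when the declared charset does *not* match the bytes actually stored.

```python
def to_client_charset(stored: bytes, column_charset: str,
                      client_charset: str = "utf-8") -> bytes:
    """Hypothetical helper (not a real MySQL call): decode the stored bytes
    using the column's declared charset, then re-encode them in the charset
    the connection asked for."""
    text = stored.decode(column_charset)
    return text.encode(client_charset)


# "Hyvää päivää" as it would sit in a latin1 column: one byte per character.
stored = "Hyvää päivää".encode("latin-1")
received = to_client_charset(stored, "latin-1")

print(len(stored))                # 12 bytes in latin1
print(len(received))              # 17 bytes in UTF-8 (each "ä" takes two bytes)
print(received.decode("utf-8"))   # Hyvää päivää

# If the column claims latin1 but actually holds UTF-8 bytes, the
# conversion still "succeeds", but the result is mojibake.
mislabeled = "päivää".encode("utf-8")
print(to_client_charset(mislabeled, "latin-1").decode("utf-8"))  # pÃ¤ivÃ¤Ã¤
```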
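To see why the byte-oriented str* functions break on UTF-8 text, here is a small Python sketch (Python used only for illustration). A Python `bytes` object behaves roughly like a PHP string, so `len()` and slicing on it correspond to strlen() and substr(), while the same operations on a `str` correspond, roughly, to mb_strlen() and mb_substr() with UTF-8 as the encoding. The mapping to PHP functions in the comments is an analogy, not PHP itself.

```python
s = "Hyvää päivää"
raw = s.encode("utf-8")  # roughly what a PHP string holds: UTF-8 bytes

# Like strlen(): counts bytes.
print(len(raw))   # 17
# Like mb_strlen() with UTF-8: counts characters.
print(len(s))     # 12

# Like substr(..., 0, 4): cuts after 4 bytes, splitting the first "ä"
# (encoded as 0xC3 0xA4) in half.
head = raw[:4]
print(head)       # b'Hyv\xc3'

# The cut leaves an incomplete multibyte sequence, which is invalid UTF-8.
try:
    head.decode("utf-8")
    cut_is_valid = True
except UnicodeDecodeError:
    cut_is_valid = False
print(cut_is_valid)  # False

# Like mb_substr(..., 0, 4) with UTF-8: cuts after 4 characters.
print(s[:4])      # Hyvä
```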
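On the /u question: as far as I know, /u tells PCRE to treat both the pattern and the subject as UTF-8, so that "." and similar constructs match whole characters instead of single bytes. Python's `re` module shows the same contrast between byte patterns and text patterns, so here is an illustrative sketch of the difference. It is an analogy only, not PHP's preg_* behavior itself.

```python
import re

word = "päivää"
raw = word.encode("utf-8")

# Byte-oriented matching, analogous to preg_* without /u:
# "." matches a single byte, so each two-byte "ä" yields two matches.
byte_matches = re.findall(rb".", raw)
print(len(byte_matches))   # 9

# Character-oriented matching, analogous to preg_* with /u:
# "." matches one whole character.
char_matches = re.findall(r".", word)
print(len(char_matches))   # 6

# A "single character" test fails on the bytes but succeeds on the text.
print(re.fullmatch(rb".", "ä".encode("utf-8")) is None)   # True
print(re.fullmatch(r".", "ä") is not None)                # True
```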

I'd love to see a PHP Unicode survival guide somewhere...

[1] http://docs.php.net/manual/en/ref.mbstring.php#mbstring.overload

P.S. Ohh, we love you already, PHP 6

-- 
Antti Holvikari

