
Using Multi-Byte Character Sets in PHP (Unicode, UTF-8, etc)

Wednesday, 15 October 08, 9:15 am
The following list details the PHP string functions which could cause problems when handling multi-byte strings. The multi-byte safe alternative is given when available:

Try mb_send_mail() instead.


Try mb_strlen() instead.


Try mb_strpos() instead.


Try mb_strrpos() instead.


Try mb_substr() instead.


Try mb_strtolower() instead.


Try mb_strtoupper() instead.


Try mb_substr_count() instead.


Try mb_ereg() instead.


Try mb_eregi() instead.


Try mb_ereg_replace() instead.


Try mb_eregi_replace() instead.


To avoid having to recompile PHP with the PCRE UTF-8 flag enabled, you can add the sequence (*UTF8) at the start of your pattern. For example, '/(*UTF8)[[:alnum:]]/' will match 'é', whereas '/[[:alnum:]]/' will not. The /u regex modifier also provides UTF-8 awareness. The preg_* functions are contentious because careful use can be safe. If you are unsure what to do, see mb_eregi() as a possible replacement.


Investigate the /u modifier, which provides UTF-8 awareness. The preg_* functions are contentious because careful use can be safe. If you are unsure what to do, see mb_ereg_replace() as a possible replacement.
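
As a rough illustration of what the /u modifier changes, the sketch below runs the same patterns with and without it. It assumes the script file is saved as UTF-8; the variable names are made up for this example.

```php
<?php
// Assumption: this file is saved as UTF-8, so 'é' is the two bytes 0xC3 0xA9.
$char = 'é';

// Without /u, '.' matches a single byte, so a two-byte character
// cannot satisfy '^.$'.
$byteMode = preg_match('/^.$/', $char);

// With /u, pattern and subject are treated as UTF-8 and '.' matches
// one whole character.
$utf8Mode = preg_match('/^.$/u', $char);

// The same difference shows up when replacing: byte mode replaces
// each byte separately, UTF-8 mode replaces the character once.
$byteReplace = preg_replace('/./', '*', $char);
$utf8Replace = preg_replace('/./u', '*', $char);

// A common idiom for splitting a UTF-8 string into characters.
$chars = preg_split('//u', 'héllo', -1, PREG_SPLIT_NO_EMPTY);

var_dump($byteMode, $utf8Mode, $byteReplace, $utf8Replace, count($chars));
```

With /u, a subject that is not valid UTF-8 makes the preg_* call fail rather than match, so the return value is worth checking when the input encoding is not guaranteed.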


Try mb_split() instead.


Try mb_split() instead.


Try mb_stripos() instead.


Try mb_stristr() instead.


Try mb_strrchr() instead.


Try mb_strripos() instead.


Try mb_strstr() instead.


View comments for possible workarounds.


View comments for possible workarounds.


No known workarounds yet.


View the comment posted on "11-Feb-2008 04:31" for a possible workaround.


This function is flagged because its companion function, ucfirst(), is not safe. It has not itself been tested.


It may be multi-byte safe if you use UTF-8 only (multi-byte UTF-8 characters contain no byte sequences that resemble whitespace). Avoid UTF-16 and UTF-32, among others.


It may be multi-byte safe if you use UTF-8 only (multi-byte UTF-8 characters contain no byte sequences that resemble less-than or greater-than symbols). Avoid UTF-16 and UTF-32, among others.


Try this code instead:
$str = mb_convert_case($str, MB_CASE_TITLE, "UTF-8");
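
To make the difference concrete, here is a minimal sketch comparing a few of the byte-oriented functions with their mb_* counterparts from the list above. It assumes the mbstring extension is loaded and that the script file is saved as UTF-8; the sample string and variable names are purely illustrative.

```php
<?php
// Assumption: mbstring is loaded and this file is saved as UTF-8.
// 'héllo' is 5 characters but 6 bytes, because 'é' takes 2 bytes.
$str = 'héllo';

// strlen() counts bytes; mb_strlen() counts characters.
$byteLength = strlen($str);
$charLength = mb_strlen($str, 'UTF-8');

// substr() can cut a multi-byte character in half, leaving invalid UTF-8.
$byteSlice = substr($str, 0, 2);
$charSlice = mb_substr($str, 0, 2, 'UTF-8');
$byteSliceIsValid = mb_check_encoding($byteSlice, 'UTF-8');

// strpos() returns a byte offset; mb_strpos() returns a character offset.
$bytePos = strpos($str, 'l');
$charPos = mb_strpos($str, 'l', 0, 'UTF-8');

// mb_strtoupper() handles the accented character as well.
$upper = mb_strtoupper($str, 'UTF-8');

var_dump($byteLength, $charLength, $charSlice, $byteSliceIsValid, $bytePos, $charPos, $upper);
```

Passing the encoding explicitly, as above, avoids depending on the configured internal encoding; calling mb_internal_encoding('UTF-8') once at startup is an alternative.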

 