Monday, December 31, 2007

non-UTF8 characters in XML

Have you ever used an Ajax tool that is very sensitive to non-UTF-8 characters in the XML coming back from the server-side script? I mean the tools that read the responseXML field of XMLHttpRequest instead of its responseText! Taconite is one of these tools.
It is a big headache when you can't see the result because of a single invalid character, which is there because you (or the operator) copied and pasted the text into your DB from MS Word or some other application that doesn't use Unicode by default!
Some browsers are not so sensitive to these invalid characters and correct them automatically: Firefox, Opera and Safari do this. But the most popular one (it's a pity), MS IE, doesn't! It is 100% sensitive to all of these characters! :(
At the company, every day one of us reports to the others:
- Oh shit! The search results for the ... part of the portal are not showing up in IE!!! Ehsuuuuuuuuuuuuuun!!!
- Oh, who entered the data for the ... part??? Did you copy it from Word?!!! One of you should go through the records and find the invalid character, which 99% of the time is a comma!

Tired of this lengthy solution, I decided to find a way to clean the invalid characters out of a string so that what remains is valid UTF-8! After some research in the PHP resources I found this nice function here! Note that //IGNORE drops the bytes that are not valid UTF-8 rather than converting them:

$text = iconv("UTF-8","UTF-8//IGNORE",$text);
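If you want to be a bit more defensive, here is a minimal sketch that wraps the one-liner. The helper names is_valid_utf8() and strip_invalid_utf8() are made up for this post; only preg_match(), preg_match_all() and iconv() are PHP built-ins. It assumes that on some setups iconv() with //IGNORE can return false instead of the cleaned string, so it falls back to a plain regex that keeps only well-formed UTF-8 sequences.

```php
<?php
// Hypothetical helpers for illustration; not part of any library.

// True when $text is well-formed UTF-8. With the /u modifier PCRE
// validates the subject, and preg_match() returns false if it is invalid.
function is_valid_utf8(string $text): bool
{
    return preg_match('//u', $text) === 1;
}

// Drops every byte that is not part of a well-formed UTF-8 sequence.
function strip_invalid_utf8(string $text): string
{
    if (is_valid_utf8($text)) {
        return $text; // nothing to clean
    }

    // Same call as in the post; @ hides the notice some builds raise.
    $clean = @iconv('UTF-8', 'UTF-8//IGNORE', $text);
    if ($clean !== false && is_valid_utf8($clean)) {
        return $clean;
    }

    // Assumed fallback: collect only the valid sequences, byte by byte.
    $valid = '/[\x00-\x7F]'                  // ASCII
        . '|[\xC2-\xDF][\x80-\xBF]'          // 2-byte sequences
        . '|\xE0[\xA0-\xBF][\x80-\xBF]'      // 3-byte, no overlongs
        . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
        . '|\xED[\x80-\x9F][\x80-\xBF]'      // 3-byte, no surrogates
        . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'   // 4-byte sequences
        . '|[\xF1-\xF3][\x80-\xBF]{3}'
        . '|\xF4[\x80-\x8F][\x80-\xBF]{2}/';
    preg_match_all($valid, $text, $matches);

    return implode('', $matches[0]);
}
```

You would call it right before building the XML response, for example $text = strip_invalid_utf8($row['title']); where $row['title'] stands for whatever field you read from the DB. Keep in mind that the offending character is removed, not repaired, so the cleaned text can end up one character shorter than what the operator typed.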
