Problem reading accented characters in PHP#1
Got a strange problem in PHP land. Here's a stripped-down example:

    $handle = fopen("file.txt", "r");
    // Read the file line by line and echo each line unchanged.
    while (($line = fgets($handle)) !== FALSE) {
        echo $line;
    }
    fclose($handle);

As an example, if I have a file that looks like this:

Lucien Frégis

Then the above code, run from the command line, outputs the same name, but instead of an e acute I get:

Lucien FrÚgis

Looking at a hex dump of the file, I see that the byte in question is E9, which is what I would expect for e acute in PHP's default encoding (ISO-8859-1), confirmed by outputting the current value of default_charset.
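For anyone who wants to reproduce that check without a hex editor, here is a minimal sketch. It uses a string literal instead of reading file.txt, with "\xE9" standing in for the accented byte (an assumption made so the snippet runs on its own), and dumps the bytes with bin2hex():

```php
<?php
// The name as ISO-8859-1 bytes; "\xE9" stands in for the accented character.
$line = "Lucien Fr\xE9gis";

// bin2hex() prints every byte as two hex digits, so the result
// does not depend on how the console renders the character.
$hex = bin2hex($line);
echo $hex, "\n"; // 4c756369656e204672e9676973
```

The e9 near the end of the dump is the byte being discussed.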

Any thoughts?


As suggested, I've checked the Windows codepage, and apparently it's 850, which is obsolete (but does explain why 0xE9 is being displayed the way it is...)

posted date: 2009-04-14 05:42:00

Re: Problem reading accented characters in PHP#2
I worked out the solution to this problem. Click to view my topic...

Hope that helps.

posted date: 2009-04-14 05:42:01

Re: Problem reading accented characters in PHP#3
I'm not sure how to set it (or what sets it), but what encoding is your shell/OS/terminal using?

posted date: 2009-04-14 05:46:00

Re: Problem reading accented characters in PHP#4
Currently running from a Windows command prompt. Not sure how to set the encoding. I'll have a look and update the question if I find anything.

posted date: 2009-04-14 05:50:00

Re: Problem reading accented characters in PHP#5
Windows CLI and special characters are so scary that I always ignored this and hoped it would just go away by itself. gshu gshu! But I'm pretty sure it's because Windows' default charset is not ISO-8859-1 but CP-something (CP850, I think, at least in the German version).

posted date: 2009-04-14 06:02:00

Re: Problem reading accented characters in PHP#6
The accent might be considered Unicode data, and you will have to store it as such. Take a look at the utf8_decode, utf8_encode, and iconv functions. No wait, it is in the ISO 8859-1 charset. I don't know. Have you tried reading in binary mode or using file_get_contents?

posted date: 2009-04-14 06:13:00

Re: Problem reading accented characters in PHP#7
0xE9 is the encoding for é in ISO-8859-1. It's also the Unicode codepoint for the same character. If your console interprets output in a different encoding (such as CP850), then the same byte will translate to a different codepoint, thus displaying a different character on screen. If you look at the code page for CP850, you can see that the byte 0xE9 translates to Ú (Unicode codepoint 0xDA). So basically your console interprets the bytes wrongly. I'm not sure how, but you should change the charset of your console to ISO-8859-1.
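To make this concrete, here is a minimal sketch, assuming the iconv extension is loaded and that your iconv build recognises the name CP850 (common, but worth checking on your system). It decodes the same byte under both encodings and prints the resulting UTF-8 bytes in hex, so the output does not depend on what the console does with accented characters:

```php
<?php
// The single byte in question, exactly as it is stored in file.txt.
$byte = "\xE9";

// Decode the same byte under two different assumptions about its encoding,
// converting to UTF-8 so the two results can be compared.
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $byte); // U+00E9, e acute
$asCp850  = iconv('CP850', 'UTF-8', $byte);      // U+00DA, U acute

echo bin2hex($asLatin1), "\n"; // c3a9
echo bin2hex($asCp850), "\n";  // c39a
```

Same byte in, two different characters out, which is exactly the é versus Ú mismatch seen on screen.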

posted date: 2009-04-14 06:25:00

Re: Problem reading accented characters in PHP#8
Before running your PHP script on the command line, try executing the command:

    chcp 1252

This will change the codepage to one where the accented characters are as you expect. See the following links for the difference between the 850 and 1252 codepages:

http://en.wikipedia.org/wiki/Code_page_850
http://en.wikipedia.org/wiki/Windows-1252
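If you would rather leave the console alone, another option is to re-encode the output inside the script. This is only a sketch, not something tested on the asker's machine: it assumes file.txt is ISO-8859-1, the console is on code page 850, and the iconv extension knows the name CP850. The helper name to_console is made up for illustration.

```php
<?php
// Hypothetical helper: re-encode text from the file's encoding to the
// console's encoding. Both defaults are assumptions taken from this thread.
function to_console(string $text, string $from = 'ISO-8859-1', string $to = 'CP850'): string
{
    $converted = iconv($from, $to, $text);
    // If the conversion fails, fall back to the original bytes.
    return $converted === false ? $text : $converted;
}

// Same loop as in the question, but each line is converted before output.
if (is_readable("file.txt")) {
    $handle = fopen("file.txt", "r");
    while (($line = fgets($handle)) !== false) {
        echo to_console($line);
    }
    fclose($handle);
}
```

With these defaults the byte 0xE9 is written as 0x82, which is where code page 850 keeps é. The catch is that the script is now tied to one console code page, so the output will look wrong again on a console set to 1252.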

posted date: 2009-04-14 06:48:00

