http://decodeunicode.org/
http://icu.sourceforge.net/download/
http://demo.icu-project.org/icu-bin/icudemos
Unicode & UTF-8 & ICU
How to store Unicode characters
The most common way of storing Unicode characters is UTF-8, which uses between 1 and 4 bytes for a single Unicode character. For example:
    $ cat unicodecharacters.txt
    é€㗊𝜜
    $ od -t c -t x1 uni
    0000000 303 251 342 202 254 343 227 212 360 235 234 234  \n
             c3  a9  e2  82  ac  e3  97  8a  f0  9d  9c  9c  0a
NOTE: od does not display Unicode correctly, so ignore the first line, which shows each byte as an octal number. The second line shows the same bytes in hexadecimal, and from it you can see that:
U+00E9 (é) is represented in UTF-8 by the 2 bytes C3 A9
U+20AC (€) is represented in UTF-8 by the 3 bytes E2 82 AC
U+35CA (㗊) is represented in UTF-8 by the 3 bytes E3 97 8A
U+1D71C (𝜜) is represented in UTF-8 by the 4 bytes F0 9D 9C 9C
Finally, the newline <NL> (U+000A) is represented by the single byte 0A
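To show where those byte sequences come from, here is a minimal sketch of the standard UTF-8 bit layout. It is written in Python, which is an assumption (this page itself only shows shell commands), and the function name utf8_encode is hypothetical. In real code you would simply call str.encode("utf-8"); the sketch uses that built-in codec only as a cross-check. The number of bytes depends on the code point range: below U+0080 one byte, below U+0800 two, below U+10000 three, otherwise four. Surrogates (U+D800 to U+DFFF) are rejected because UTF-8 may not encode them.

```python
def utf8_encode(code_point: int) -> bytes:
    """Hypothetical helper: encode one Unicode code point as UTF-8 by hand."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"surrogates cannot be encoded: {code_point:#x}")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx (plain ASCII, e.g. the newline 0A)
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# The five characters from the od example above.
for cp in (0x00E9, 0x20AC, 0x35CA, 0x1D71C, 0x000A):
    encoded = utf8_encode(cp)
    # Cross-check the hand-written encoder against Python's UTF-8 codec.
    assert encoded == chr(cp).encode("utf-8")
    # bytes.hex(' ') (Python 3.8+) prints lowercase hex, like od -t x1.
    print(f"U+{cp:04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
```

Run as a script, it prints one line per character, for example `U+20AC -> 3 byte(s): e2 82 ac`, matching the second line of the od output.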