Unicode & UTF-8 & ICU

How to store Unicode characters
The most common way of storing Unicode characters is UTF-8, a variable-length encoding that uses between 1 and 4 bytes per character (code point). Plain ASCII characters still take a single byte.
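UTF-8 picks the byte count from the code point's range: below U+0080 one byte, below U+0800 two, below U+10000 three, otherwise four. The lead byte carries a length marker (0, 110, 1110 or 11110) and each continuation byte starts with the bits 10 and carries 6 payload bits. A minimal sketch of that rule, assuming bash; `utf8_hex` is a hypothetical helper name, and it does not reject surrogates or values above U+10FFFF:

```bash
#!/usr/bin/env bash
# utf8_hex CODEPOINT: print the UTF-8 bytes of one code point as hex.
# Hypothetical helper for illustration; no range or surrogate checks.
utf8_hex() {
  local cp=$(( $1 ))                 # accepts 0xE9, 233, ...
  if (( cp < 0x80 )); then           # 1 byte:  0xxxxxxx
    printf '%02X\n' "$cp"
  elif (( cp < 0x800 )); then        # 2 bytes: 110xxxxx 10xxxxxx
    printf '%02X %02X\n' \
      $(( 0xC0 | (cp >> 6) )) \
      $(( 0x80 | (cp & 0x3F) ))
  elif (( cp < 0x10000 )); then      # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    printf '%02X %02X %02X\n' \
      $(( 0xE0 | (cp >> 12) )) \
      $(( 0x80 | ((cp >> 6) & 0x3F) )) \
      $(( 0x80 | (cp & 0x3F) ))
  else                               # 4 bytes: 11110xxx + three 10xxxxxx
    printf '%02X %02X %02X %02X\n' \
      $(( 0xF0 | (cp >> 18) )) \
      $(( 0x80 | ((cp >> 12) & 0x3F) )) \
      $(( 0x80 | ((cp >> 6) & 0x3F) )) \
      $(( 0x80 | (cp & 0x3F) ))
  fi
}

# Encode the characters used in the example file.
for cp in 0xE9 0x20AC 0x35CA 0x1D71C 0x0A; do
  printf 'U+%04X -> %s\n' "$cp" "$(utf8_hex "$cp")"
done
# U+00E9 -> C3 A9
# U+20AC -> E2 82 AC
# U+35CA -> E3 97 8A
# U+1D71C -> F0 9D 9C 9C
# U+000A -> 0A
```

The printed bytes match the hex row of the od dump in the example.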
For example, this file holds four non-ASCII characters followed by a newline:
$ cat unicodecharacters.txt
é€㗊𝜜
$ od -t c -t x1 unicodecharacters.txt
0000000  303 251  342 202 254  343 227 212  360 235 234 234   \n
          c3  a9   e2  82  ac   e3  97  8a   f0  9d  9c  9c   0a

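NOTE: od cannot display multi-byte UTF-8 characters. With -t c it prints each non-ASCII byte as a three-digit octal escape, so the first row (303 251 342 ...) is the octal form of the same bytes, not decimal. The second row (-t x1) shows them in hex.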
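Because -t c prints non-ASCII bytes in octal, the first row turns into the second simply by reading each value as base 8. A small sketch, assuming bash, whose printf treats a numeric argument with a leading 0 as an octal constant; `octal_row_to_hex` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# octal_row_to_hex: convert od's octal byte escapes to two-digit hex.
# Hypothetical helper for illustration.
octal_row_to_hex() {
  local o out=()
  for o in "$@"; do
    # Prefixing 0 makes printf parse the value as octal (0303 -> 195 -> c3).
    out+=("$(printf '%02x' "0$o")")
  done
  echo "${out[*]}"
}

octal_row_to_hex 303 251 342 202 254 343 227 212 360 235 234 234
# c3 a9 e2 82 ac e3 97 8a f0 9d 9c 9c
```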
From the hex row you can see that:
U+00E9 (é) is represented in UTF-8 by the 2 bytes C3 A9
U+20AC (€) is represented in UTF-8 by the 3 bytes E2 82 AC
U+35CA (㗊) is represented in UTF-8 by the 3 bytes E3 97 8A
U+1D71C (𝜜) is represented in UTF-8 by the 4 bytes F0 9D 9C 9C
Finally, the newline <NL> (U+000A) is represented by the single byte 0A
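Going the other way, a decoder reads the lead byte to learn how many continuation bytes follow, then appends the low 6 bits of each. This sketch assumes bash and well-formed input (it does not check for stray continuation bytes, overlong forms or truncated sequences); `utf8_decode` is a hypothetical helper name. Fed the hex row from od, it recovers the code points listed above:

```bash
#!/usr/bin/env bash
# utf8_decode HEXBYTE...: decode UTF-8 bytes (given as hex) into U+XXXX code points.
# Hypothetical helper for illustration; assumes well-formed input.
utf8_decode() {
  local bytes=("$@") i=0 b cp n out=()
  while (( i < ${#bytes[@]} )); do
    b=$(( 0x${bytes[i]} ))
    if   (( b < 0x80 )); then cp=$b;             n=0   # 0xxxxxxx: ASCII
    elif (( b < 0xE0 )); then cp=$(( b & 0x1F )); n=1   # 110xxxxx: 1 more byte
    elif (( b < 0xF0 )); then cp=$(( b & 0x0F )); n=2   # 1110xxxx: 2 more bytes
    else                      cp=$(( b & 0x07 )); n=3   # 11110xxx: 3 more bytes
    fi
    i=$(( i + 1 ))
    # Each continuation byte (10xxxxxx) contributes 6 more bits.
    while (( n > 0 )); do
      cp=$(( (cp << 6) | (0x${bytes[i]} & 0x3F) ))
      i=$(( i + 1 )); n=$(( n - 1 ))
    done
    out+=("$(printf 'U+%04X' "$cp")")
  done
  echo "${out[*]}"
}

utf8_decode c3 a9 e2 82 ac e3 97 8a f0 9d 9c 9c 0a
# U+00E9 U+20AC U+35CA U+1D71C U+000A
```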

