Bluetrait

Unicode Character Set

Posted by Michael Dale on Sat, 26 Mar 2005 10:37 PM

UTF-8 is an encoding of the Unicode character set that lets you mix languages and scripts within a single document without switching between different character sets.
The web is moving away from the older ISO-8859-1 (Latin-1) standard to UTF-8.

UTF-8 is great because it lets you use a much wider range of characters. For example, Greek:

Τη γλώσσα μου έδωσαν ελληνική
το σπίτι φτωχικό στις αμμουδιές του Ομήρου.
Μονάχη έγνοια η γλώσσα μου στις αμμουδιές του Ομήρου.
από το Άξιον Εστί
του Οδυσσέα Ελύτη
(from Axion Esti, by Odysseas Elytis)
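To make the "encoding" part concrete: in UTF-8, ASCII characters take one byte each, while Greek letters take two. The minimal PHP sketch below compares byte length with character length for a word from the sample. It assumes the script file itself is saved as UTF-8, and utf8_char_count() is a hypothetical helper, not a PHP built-in:

```php
<?php
// Sketch: bytes vs. characters in a UTF-8 string.
// Assumes this file is saved as UTF-8. utf8_char_count() is hypothetical.

// Count characters (code points) using PCRE's UTF-8 mode (the /u flag),
// which works in a stock PHP build without the mbstring extension.
function utf8_char_count(string $s): int
{
    $n = preg_match_all('/./su', $s);
    // preg_match_all() returns false if $s is not valid UTF-8
    return $n === false ? -1 : $n;
}

$ascii = 'abc';
$greek = 'γλώσσα';

printf("%s: %d bytes, %d characters\n", $ascii, strlen($ascii), utf8_char_count($ascii));
printf("%s: %d bytes, %d characters\n", $greek, strlen($greek), utf8_char_count($greek));
// The first Greek letter, gamma (U+03B3), is stored as the two bytes ce b3
printf("first Greek letter as hex: %s\n", bin2hex(substr($greek, 0, 2)));
```

Plain strlen() counts bytes, which is why code that assumes "one byte = one character" starts to misbehave once text like this appears.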

But sometimes there can be problems in the transition from ISO-8859-1 to UTF-8.

There is a whole range of examples:

  • Inserting UTF-8 data into a MySQL 4.0 database while the HTML form is set to use ISO-8859-1.
  • Reading UTF-8 data out of a database onto an ISO-8859-1 HTML page.
  • Sending an email where the input data is UTF-8 but the message is sent as ISO-8859-1.

The list could go on.
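Most of these failures look the same on screen: UTF-8 bytes get read as ISO-8859-1, so every byte of a multi-byte character shows up as its own character. Because each ISO-8859-1 byte value maps to the Unicode code point with the same number, that misreading can be reproduced by hand in a few lines of PHP. This is a sketch for illustration only; latin1_bytes_to_utf8() is a hypothetical name, written out manually to avoid depending on mbstring or iconv:

```php
<?php
// Sketch: reproduce what a reader sees when UTF-8 bytes are misread as ISO-8859-1.
// latin1_bytes_to_utf8() is a hypothetical helper, not a PHP built-in.
// In ISO-8859-1, byte 0xNN is code point U+00NN, so re-encoding each byte
// as UTF-8 yields the garbled text ("mojibake") a browser would display.
function latin1_bytes_to_utf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b); // ASCII is identical in both encodings
        } else {
            // Two-byte UTF-8 sequence for code points U+0080..U+00FF
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$original = 'café';                       // UTF-8 bytes: 63 61 66 c3 a9
$garbled  = latin1_bytes_to_utf8($original);
echo $garbled, "\n";                      // prints "cafÃ©"
```

The two bytes of "é" (c3 a9) come out as the two characters "Ã" and "©", which is the classic sign of a UTF-8/ISO-8859-1 mismatch somewhere in the chain.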

If you're planning on using UTF-8 (which you should), there is a simple way to set your website to this character set.
Using PHP:
<?php header('Content-Type: text/html; charset=UTF-8'); ?>

Now, I could go into the Content-Type header and using application/xhtml+xml if you're using XHTML 1.1 or higher, but I won't, because IE is crap and doesn't support it.
I might talk about it later, because it isn't really related to the character set.

One thing to note: MySQL 4.0 doesn't *really* support UTF-8, so that Greek might not work. That is one of the reasons I am looking at moving to MySQL 4.1 (which also adds sub-query support).
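One way to see why the Greek sample is a problem for anything that really works in Latin-1: ISO-8859-1 only covers code points U+0000 to U+00FF, and every Greek letter lies above that, so wherever text is genuinely converted to ISO-8859-1, the Greek cannot be represented. A minimal sketch of a check you could run on input before storing it; fits_in_latin1() is a hypothetical helper, and it assumes the input is meant to be UTF-8:

```php
<?php
// Sketch: can a UTF-8 string be represented in ISO-8859-1 (U+0000..U+00FF)?
// fits_in_latin1() is hypothetical. Returns null if the input isn't valid UTF-8.
function fits_in_latin1(string $s): ?bool
{
    // Search for any code point above U+00FF; the /u flag makes PCRE
    // match UTF-8 characters rather than single bytes.
    $found = preg_match('/[^\x{00}-\x{FF}]/u', $s);
    if ($found === false) {
        return null; // not valid UTF-8
    }
    return $found === 0;
}

var_dump(fits_in_latin1('café'));          // bool(true): é is U+00E9
var_dump(fits_in_latin1('Τη γλώσσα μου')); // bool(false): Greek starts at U+0370
```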

On Sun, 27 Mar 2005 at 2:55 AM, Matthom wrote:

Using php:
<?php header('Content-Type: text/html; charset=UTF-8'); ?>

Does that PHP print out this?:

Just curious... I'm just trying to figure out the difference between declaring that element through PHP and just writing the HTML out.


On Sun, 27 Mar 2005 at 2:57 AM, Matthom wrote:

Does that PHP print out this?:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

That is what I meant. Forgot the damn encoding...


On Sun, 27 Mar 2005 at 8:20 AM, Michael Dale wrote:

Header information is a bit different from metadata. The meta tag just states what the page *should* be; a charset sent in the HTTP header is what the browser acts on, and it takes precedence over the meta tag. (meta: data about data).

To get browsers to use UTF-8 reliably, you must send a Content-Type header telling the browser to switch to that character set. The PHP code above does that. But remember that headers *must* be sent before any HTML is output.

So you should use both the PHP code and the META html code.
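Putting that advice together: send the header first, before anything has been output, then repeat the charset in a meta tag inside the page. A minimal sketch; send_charset_header() and charset_meta_tag() are hypothetical names, and headers_sent() is used as a guard because header() can no longer take effect once output has started:

```php
<?php
// Sketch: send the charset as an HTTP header, then repeat it in a <meta> tag.
// send_charset_header() and charset_meta_tag() are hypothetical helpers.

function send_charset_header(string $charset = 'UTF-8'): bool
{
    // Headers can only be sent before the first byte of output.
    if (headers_sent()) {
        return false;
    }
    header('Content-Type: text/html; charset=' . $charset);
    return true;
}

function charset_meta_tag(string $charset = 'UTF-8'): string
{
    // Lowercase element name and self-closing slash, so it is also valid XHTML.
    return '<meta http-equiv="Content-Type" content="text/html; charset='
        . htmlspecialchars($charset) . '" />';
}

send_charset_header();      // must run before any HTML below is echoed
echo "<html>\n<head>\n";
echo charset_meta_tag(), "\n";
echo "<title>UTF-8 test</title>\n</head>\n";
```

Keep the header call at the very top of the script: even a blank line before the opening <?php tag counts as output and stops the header from being sent.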


On Sun, 27 Mar 2005 at 10:36 AM, Josh Street wrote:

And, just for the record, the "META html code" Dale mentions above should actually be "meta HTML code", if you're using XHTML...

<pedanticism />