I've noticed that Wikipedia's home page loads very slowly and always involves my PC banging on the drive. I am fairly positive that this is due to it loading a number of different character sets for its multilingual interface. Just an observation that this post reminded me of!
J.Ja
The UTF-8 encoding does not handle most character-based languages well, such as Chinese, Japanese and Korean.
The support for these character sets is improved under UTF-16.
If you find you need to support Asia-Pacific character sets, then using either a language-specific encoding or UTF-16 is going to make the sites display better.
[ most browsers will handle UTF-16, just not as well as UTF-8 ]
Other than that, I use UTF-8 as my default character set, even for my systems, rather than any regional or language-based encoding. I find that when I get an email on a list from regions using non-Latin characters, the support for displaying them is better.
[ UTF-8 is displaying Japanese correctly in these emails. ]
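For anyone who wants to check the coverage question directly, here is a minimal Python sketch. The sample strings are my own assumptions, not taken from any post. UTF-8 and UTF-16 are both encodings of the same Unicode character set, so the sketch round-trips the same Japanese and Korean text through each and compares byte counts; the difference it shows is size per character, not which characters can be represented.

```python
# Sample strings are illustrative assumptions, not taken from the thread.
samples = {
    "japanese": "\u65e5\u672c\u8a9e",   # "Japanese language" in Japanese
    "korean": "\ud55c\uad6d\uc5b4",     # "Korean language" in Korean
    "english": "hello",
}

def byte_sizes(text):
    """Return (utf8_len, utf16_len) after checking that both encodings round-trip."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # explicit byte order, so no BOM is prepended
    assert utf8.decode("utf-8") == text
    assert utf16.decode("utf-16-le") == text
    return len(utf8), len(utf16)

sizes = {name: byte_sizes(text) for name, text in samples.items()}
for name, (u8, u16) in sizes.items():
    print(f"{name}: utf-8={u8} bytes, utf-16={u16} bytes")
```

If a page in one of these languages displays badly, this suggests looking at fonts or at a mismatch between the declared and the actual encoding rather than at UTF-8 itself.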
Would this make sense then:
To make sure that the page works well, according to what was just written, you could declare two character sets, i.e.:
<meta http-equiv="content-type" content="text/html; charset=UTF-16">
<meta http-equiv="content-type" content="text/html; charset=euc-kr">
That would ensure that both Korean and English on the website work well.
I tested your code and found that the first META character set denoted is the one displayed. I am fairly certain that in HTML you can only use one character set at a time, but seeing as how I had never tested this before I wanted to find out. I only tested the method you used, but would assume (I know, I know) that this rule applies to other forms of specifying the character set.
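The "first declaration wins" result can be imitated with a small Python sketch. It only mirrors the behaviour observed in that test; the class name `FirstCharsetFinder` is hypothetical, and real browsers also take HTTP headers and byte order marks into account, which this ignores.

```python
from html.parser import HTMLParser
import re

class FirstCharsetFinder(HTMLParser):
    """Record the first charset declared in a <meta> tag and ignore later ones."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # Once a charset has been found, later declarations are skipped.
        if tag != "meta" or self.charset is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() == "content-type":
            match = re.search(r"charset\s*=\s*([^\s;]+)",
                              attrs.get("content") or "", re.IGNORECASE)
            if match:
                self.charset = match.group(1).lower()

# The two declarations from the post above, in the same order.
page = """
<meta http-equiv="content-type" content="text/html; charset=UTF-16">
<meta http-equiv="content-type" content="text/html; charset=euc-kr">
"""

finder = FirstCharsetFinder()
finder.feed(page)
print(finder.charset)
```

Swapping the order of the two tags in `page` makes the sketch report `euc-kr` instead, which matches the idea that a document gets exactly one encoding.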
Only one charset can be used, but UTF-16 will display the majority of languages that UTF-8 is missing, so by using UTF-16 you take the coverage from 80% of languages to 95%.
It's not a huge issue for most sites, just those that need the charset support that is missing from UTF-8.
In Firefox under View > Character Encoding, you have lots of options. My question is: does this setting override the settings in the HTML code?
You can go into the edit settings area and force Firefox to use your preferred character encoding. This will not fix badly done pages so that they display the characters properly. Only fixing the pages will.
edited to add an e to encoding
I always make my web pages on a server that specifies the character encoding as UTF-8. This ensures that characters are correctly displayed to users, at least in the English language. Any symbols or special characters can be denoted using the ASCII representation in the HTML, as they usually occur very infrequently. Of course, I'm one of those people who think that communication is very important and underappreciated, and that we should take the extra time and effort to guarantee that the ideas represented are properly conveyed to the user.
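As a rough illustration of writing special characters in an ASCII-only form, Python's built-in `xmlcharrefreplace` error handler turns anything outside ASCII into a numeric character reference. The sample text is an assumption of mine, and this is only a sketch of the idea, not a description of how any particular site is built.

```python
import html

# Illustrative text with a euro sign, an accented e and a division sign.
text = "Price: \u20ac5, caf\u00e9, 10 \u00f7 2"

# Characters outside ASCII become numeric character references such as &#8364;
ascii_html = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(ascii_html)

# A browser (or html.unescape) turns the references back into the characters.
restored = html.unescape(ascii_html)
print(restored == text)
```

The result is plain ASCII, so it survives whatever encoding the page is served with, at the cost of being harder to read in the source.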
Why not support more than English speakers if it will improve the return on investment?
[ it can't hurt the return on investment. ]
Additionally, there are reported rendering problems when using UTF-8, such as "?", "?" and "-" being rendered as "�".
These reports come not from unknowledgeable or inexperienced users, but from site designers themselves.
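One common way to end up with the replacement character, offered here as an assumption about the cause since the reports themselves are not quoted, is text saved in Windows-1252 (for example pasted from Word) and then served as UTF-8. A minimal Python sketch:

```python
# Curly quotes and a dash, written with escapes: U+201C, U+201D, U+2014.
original = "\u201cquoted\u201d \u2014 text"

# Saved as Windows-1252, each of those three characters becomes a single byte...
cp1252_bytes = original.encode("cp1252")

# ...which is not valid UTF-8, so a UTF-8 decoder substitutes U+FFFD for each one.
misread = cp1252_bytes.decode("utf-8", errors="replace")
print(misread)

# Encoded as UTF-8 in the first place, the same text decodes cleanly.
clean = original.encode("utf-8").decode("utf-8")
```

In that scenario the problem is the mismatch between the bytes and the declared encoding, not UTF-8's ability to represent the characters.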
Yes, there are many characters that can be represented by Unicode/UCS and be represented by UTF-8, such as maths and scientific symbols and notation, intermediate and advanced punctuation, diacritical marks such as accents and phonetic symbols, music symbols, currency and so on.
The question is then whether to enter these in your HTML document UTF-8 encoded, or use HTML entities (and should these use decimal/hex or labels).
For example, division symbol: ÷
and em dash: —
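To make the options concrete, here is a short Python sketch comparing the three entity spellings (named, decimal, hex) with the literal characters encoded as UTF-8. It uses the standard library's `html.unescape` as a stand-in for what a browser does with entities; the variable names are my own.

```python
import html

# The division sign (U+00F7) and em dash (U+2014) in each entity style.
forms = {
    "named": "&divide; &mdash;",
    "decimal": "&#247; &#8212;",
    "hex": "&#xF7; &#x2014;",
}

literal = "\u00f7 \u2014"  # the same two characters typed directly

# Every entity style decodes to the same two characters.
decoded = {kind: html.unescape(markup) for kind, markup in forms.items()}
for kind, value in decoded.items():
    print(kind, value == literal)

# Typed directly and saved as UTF-8, they take 2 and 3 bytes respectively.
literal_utf8 = literal.encode("utf-8")
print(literal_utf8)
```

All four spellings end up as the same characters on screen, so the choice is mostly about how readable the source is and whether you trust every tool in the chain to preserve the page's encoding.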
I am at a loss as to what advantages UTF-8 offers for an English language only site that needs nothing beyond the characters found on a standard keyboard.
Keyboards are configurable input devices, and are not themselves standardized in respect to languages; they have variable mappings which do that job. My Mac keyboard has a slightly different configuration to my PC one. Some keys are in different places, and there's a Euro symbol on the latter.
US keyboard configuration is different from UK, and EN-US is different from EN-GB.
And you have the issue of extensibility: a Euro symbol was added recently, and like the GB pound symbol it doesn't fit into lower ASCII, but it can easily be represented in UTF-8 based on a standard Unicode/UCS code.
Your website may, if you are the only person writing to it, accept only content written by you, but if that changes, or if you decide one day to add a price or some extended punctuation or a word in some other language or a technical quote or an equation, then maybe you would be better off future-proofing your site with UTF-8 rather than a tiny Latin subset. Or not, it's entirely up to the web author, of course.
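The point about the Euro and pound symbols is easy to check. This is a minimal Python sketch; the helper name `fits_ascii` is hypothetical.

```python
# Euro sign (U+20AC) and pound sign (U+00A3), written as escapes.
symbols = {"euro": "\u20ac", "pound": "\u00a3"}

def fits_ascii(ch):
    """Return True if the character can be encoded as 7-bit ASCII."""
    try:
        ch.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

# For each symbol: does it fit in ASCII, and what are its UTF-8 bytes?
report = {name: (fits_ascii(ch), ch.encode("utf-8")) for name, ch in symbols.items()}
for name, (in_ascii, utf8_bytes) in report.items():
    print(name, in_ascii, utf8_bytes)
```

Neither symbol fits in ASCII, while UTF-8 represents the pound in two bytes and the Euro in three.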
of the 8 or 9 variants of English?
[ some require the French accented e ]
which of the 10 different latin character based keyboards?
And don't forget, character encoding can cause glitches in how the site displays; look at how often TR blogs / articles have ? instead of " in them.
[ the MS Word smart quote doesn't fit into the ISO-8859-1 charset, yet it does fit into UTF-8 ]
I'll let the previous post about the Euro and GB Pound symbols stand for itself.
You seem to forget that if someone in, say, Taiwan is viewing your English site, it won't display right unless they have their browser configured to use UTF-8 only. The Taiwanese charset they would use by default doesn't support the ASCII encoding.
UTF-8 would allow even those non-English countries to accurately view your English website.
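The smart quote case can be reproduced in a few lines of Python. This is only a sketch of one way a ? can appear in place of a quote (lossy conversion to ISO-8859-1); I am not claiming it is what actually happens on the TR site.

```python
smart_quote = "\u201c"  # left double quotation mark, as produced by Word's smart quotes

# Strict encoding to ISO-8859-1 fails because the character is not in that set.
try:
    smart_quote.encode("iso-8859-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

# A lossy conversion substitutes a question mark instead of failing.
replaced = smart_quote.encode("iso-8859-1", errors="replace")

# UTF-8 represents the same character in three bytes.
as_utf8 = smart_quote.encode("utf-8")

print(fits_latin1, replaced, as_utf8)
```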
And, offering content that is strictly domestic; i.e., no need for references to any non-English, non-US characters.
See, for instance, Languages Spoken in the U.S. I presume you are talking about the USA rather than the North and South American continents. The web is global, human languages are fluid and irregular, and many symbols are used in 'English' communication that aren't in lower ASCII. Microsoft have developed the notion of 'culture' in their .NET framework to help developers meet the needs of their users, perhaps by reading browser settings.
I'm not saying that your intended restrictions are wrong, I'm saying that I can't see how you define them, or why you would want to.
Not all web content is suitable for all citizens of the world; some content is suitable for some only.
That should be self evident.