Another consideration when developing your web documents is character encoding. For example, when you include Chinese characters in your web documents, are you sure they will render correctly in viewers’ browsers? Have you visited websites that do not render all their characters correctly? Maybe you have seen little boxes, long dashes, strings of question marks, or Wingdings-style symbols on the screen. This usually happens because the document does not declare a character encoding, or declares one that does not match the bytes actually sent, so the browser has to guess.

With the global nature of the Internet and websites that continue to attract a worldwide following, it behooves web developers to recognize these issues and prepare their content for international audiences. It is considered a best practice to set the character encoding at the server level, either as a default that authors can override or on a per-document basis, and also to declare it inside each document, in the XML declaration (if applicable) and in the meta element, so that the encoding is still known when the document is used standalone (for example, opened from a local file).
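To make the server-level and document-level declarations concrete, here is a minimal sketch in Python of a WSGI application that declares the encoding in both places. The names `app` and `PAGE` are hypothetical and chosen only for illustration. A real site would more likely set the default in its web server configuration.

```python
# Hypothetical example page; it repeats the encoding in a meta element
# so the declaration survives when the file is saved and opened locally.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Encoding demo</title>
</head>
<body><p>你好, café</p></body>
</html>
"""


def app(environ, start_response):
    """Minimal WSGI app that declares UTF-8 in the HTTP header."""
    # Encode the body with the same encoding the header declares.
    body = PAGE.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

For local experiments, an app like this can be served with `wsgiref.simple_server.make_server` from the Python standard library.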

What is character encoding? A character encoding declaration tells the browser and validation tools which set of characters to use when converting a sequence of bytes into a sequence of characters.
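As a rough illustration of that bytes-to-characters conversion, the Python sketch below decodes the same bytes under different encodings. The sample string and variable names are assumptions made for the example. This is not how a browser is implemented, only the same idea on a small scale.

```python
text = "中文"

# The characters as they travel over the wire when saved as UTF-8.
data = text.encode("utf-8")
print(data.hex())  # six bytes for two characters

# Decoding with the encoding that was used to write the bytes works.
right = data.decode("utf-8")

# ISO-8859-1 maps every byte to some character, so decoding "succeeds"
# but produces six unrelated characters instead of two Chinese ones.
wrong = data.decode("iso-8859-1")

# A decoder that cannot interpret a byte may substitute U+FFFD,
# which many browsers draw as a boxed or diamond question mark.
replaced = data.decode("ascii", errors="replace")

# Going the other way, an encoding that lacks the characters
# can only emit literal question marks.
question_marks = text.encode("iso-8859-1", errors="replace")

print(right, repr(wrong), replaced, question_marks)
```

The garbled and question-mark results are the same kinds of symptoms visitors see when a page's declared encoding and its actual bytes disagree.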

Why do I need character encoding in my HTML documents? There are three reasons. First, declaring the character encoding is part of the HTML 5.2 specification. Second, if the encoding is not declared in the document, the browser will guess which character set to use and could select the wrong one based on your content. Third, the visitor may have changed the default character encoding in their browser so that it no longer matches the intended encoding of the web document.

How do you choose the correct character encoding? Authoring tools (e.g., Dreamweaver) save HTML documents in a character encoding of their choosing, and that choice largely depends on the conventions of the particular system software. The most common character encodings for the web include ISO-8859-1, ISO-8859-5, and UTF-8. For further reading, see “HTML Document Representation: Choosing an encoding.”
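One practical way to compare candidate encodings is to check which of them can represent your content at all. The Python sketch below does this for the three encodings named above. The helper `can_encode` and the sample strings are hypothetical and exist only for this illustration.

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


CANDIDATES = ("iso-8859-1", "iso-8859-5", "utf-8")

# Illustrative samples: Western European, Cyrillic, and Chinese text.
samples = {
    "French": "café",
    "Russian": "привет",
    "Chinese": "中文",
}

supported = {
    name: [enc for enc in CANDIDATES if can_encode(sample, enc)]
    for name, sample in samples.items()
}

for name, encodings in supported.items():
    print(name, encodings)
```

In this sketch only UTF-8 covers all three samples, which is one reason it is a common choice for sites with an international audience.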

According to the W3C, a browser determines the character encoding of a resource from the following sources, in order of precedence:

  1. The “charset” parameter of the HTTP “Content-Type” header sent by the server.
  2. A meta element in the HTML/XHTML document with “http-equiv” set to “Content-Type” and a charset value in its “content” attribute.
  3. The charset attribute set on an element that designates an external resource.
  4. Other methods, such as heuristic algorithms that user agents use to deduce the character encoding.

Because the HTTP Content-Type header has precedence and is the easiest information to retrieve, it is almost always the preferred method for providing the character encoding of any (X)HTML document.
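The precedence described above can be sketched in Python as follows. This is a simplified illustration, not a browser's actual algorithm: the function names, the regular expression, the 1024-byte scan window, and the `windows-1252` fallback are all assumptions made for the example, and the charset attribute on linking elements is left out.

```python
import re
from email.message import Message

# Matches both <meta charset="..."> and the http-equiv form, where
# charset= appears inside the content attribute (simplified pattern).
META_RE = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def charset_from_header(content_type):
    """Return the lowercased charset parameter of a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def detect_encoding(content_type, body, fallback="windows-1252"):
    """Pick an encoding: HTTP header first, then meta element, then fallback."""
    if content_type:
        declared = charset_from_header(content_type)
        if declared:
            return declared
    # Only scan the start of the document, where the head element lives.
    match = META_RE.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # Stand-in for the heuristics a real user agent would apply.
    return fallback


page = b'<html><head><meta charset="utf-8"></head><body></body></html>'
print(detect_encoding("text/html; charset=ISO-8859-1", page))
print(detect_encoding("text/html", page))
print(detect_encoding(None, b"<p>no declaration</p>"))
```

Note that in the first call the header wins even though the document itself says UTF-8, which is why the two declarations should always agree.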

How do you specify the character encoding? The following examples show the common ways to declare it:

In the HTTP response header sent by the server:

Content-Type: text/html; charset=ISO-8859-1

Within the <head> element:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

HTML5 Syntax:

<meta charset="utf-8">

XHTML Syntax:

<?xml version="1.0" encoding="ISO-8859-1"?>
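To see the XML declaration at work, the short Python sketch below parses a document whose bytes are ISO-8859-1 and whose declaration says so, then repeats the parse with a declaration that does not match the bytes. The sample markup and variable names are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Bytes written as ISO-8859-1, with a matching XML declaration.
matching = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><p>café</p>'
).encode("iso-8859-1")

root = ET.fromstring(matching)
print(root.text)  # the parser used the declared encoding to decode é

# Same ISO-8859-1 bytes, but the declaration claims UTF-8.
mismatched = matching.replace(b"ISO-8859-1", b"UTF-8")

try:
    ET.fromstring(mismatched)
    parsed_ok = True
except ET.ParseError:
    # The byte for é is not valid UTF-8, so the parser rejects the document.
    parsed_ok = False

print(parsed_ok)
```

An XML parser is strict about a mismatch like this, whereas an HTML parser would typically display garbled characters instead of failing.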

See also the more complete list of XML and HTML character entity references.

Would you like to share any examples or special situations for character encoding that your organization’s web documents use today?
