Character Encodings in .htaccess

When a browser receives a text file from your server, it needs to know which character encoding was used to convert human-readable characters into bytes. Without this information, the browser has to guess, and guessing often goes wrong. Accented characters turn into garbled symbols, quotation marks become question marks, and emoji disappear entirely.

This guide explains what character encoding means, why UTF-8 is the universal standard, and how to configure Apache to serve your files with the correct encoding using the AddDefaultCharset and AddCharset directives. We will also clarify the relationship between server-side charset headers, HTML meta tags, and byte order marks (BOM).

What Character Encoding Means

A character encoding is a system that maps characters (letters, numbers, symbols, emoji) to specific byte sequences that computers can store and transmit. When you type the letter "é", your text editor stores it as one or more bytes according to the encoding in use. When a browser receives those bytes, it must use the same encoding to convert them back into the correct character.

Consider the character é (e with acute accent):

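| Scenario | Byte Representation | What the Browser Displays |
|---|---|---|
| UTF-8, read as UTF-8 | 0xC3 0xA9 | Correct: é |
| ISO-8859-1, read as ISO-8859-1 | 0xE9 | Correct: é |
| UTF-8 bytes read as ISO-8859-1 | 0xC3 0xA9 | Garbled: Ã© |
| ISO-8859-1 bytes read as UTF-8 | 0xE9 | Error or: � |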

The last two rows show what happens when encodings are mismatched. The server sends bytes using one encoding, but the browser interprets them using a different one. The result is mojibake, the common term for garbled text caused by encoding mismatches.
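
The two mismatch cases can be reproduced in a few lines. The following is a minimal Python sketch (Python is used here only for illustration; it is not part of the Apache configuration) that encodes é both ways and then decodes each result with the wrong codec.

```python
# Encode the same character with two different encodings.
text = "é"
utf8_bytes = text.encode("utf-8")          # two bytes: 0xC3 0xA9
latin1_bytes = text.encode("iso-8859-1")   # one byte: 0xE9

# UTF-8 bytes interpreted as ISO-8859-1: each byte becomes its own character.
garbled = utf8_bytes.decode("iso-8859-1")

# ISO-8859-1 bytes interpreted as UTF-8: 0xE9 starts a multi-byte sequence
# that never completes, so a lenient decoder substitutes U+FFFD.
replaced = latin1_bytes.decode("utf-8", errors="replace")

print(utf8_bytes.hex(" "), "|", latin1_bytes.hex(" "))
print(garbled, "|", replaced)
```

Note that a strict decoder (the default in Python) would raise an error in the second case instead of substituting a replacement character, which is the "Error" outcome.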

This is not just a cosmetic issue. Broken characters in form submissions can corrupt database records. Broken characters in JavaScript or JSON can cause parsing errors that break your application. Broken characters in filenames can make downloads fail.

UTF-8 and Why It Matters

UTF-8 (Unicode Transformation Format, 8-bit) is the dominant character encoding on the web. Usage surveys such as W3Techs report that roughly 98% of websites with a known encoding use UTF-8, and it is supported by every modern browser and operating system and by all mainstream programming languages and databases.

UTF-8 has several key advantages:

  • Universal coverage: It can represent every character in the Unicode standard, which includes virtually every writing system on Earth, plus thousands of symbols, mathematical notation, and emoji.
  • Backward compatible with ASCII: The first 128 characters in UTF-8 are identical to ASCII. This means plain English text is byte-for-byte the same in both encodings.
  • Variable width: Common characters (Latin alphabet, digits, basic punctuation) use just one byte, while less common characters use two, three, or four bytes. This keeps file sizes efficient for most content.
  • Self-synchronizing: If a byte is lost or corrupted during transmission, the decoder can recover at the next character boundary without losing the rest of the data.
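
These properties are easy to observe directly. The sketch below is an illustration in Python, with sample characters chosen arbitrarily: it measures the encoded width of four characters, compares ASCII and UTF-8 output for plain English text, and simulates a lost byte to show the decoder resynchronizing.

```python
# Variable width: the number of bytes grows with the code point.
widths = {ch: len(ch.encode("utf-8")) for ch in ["A", "é", "€", "😀"]}

# ASCII compatibility: pure ASCII text is byte-for-byte identical.
ascii_identical = "Hello, world".encode("ascii") == "Hello, world".encode("utf-8")

# Self-synchronization: drop the lead byte of "é" and decode leniently.
# Only the damaged character is lost; the following "b" is still read correctly.
damaged = "aéb".encode("utf-8").replace(b"\xc3", b"")
recovered = damaged.decode("utf-8", errors="replace")

print(widths)
print(ascii_identical)
print(recovered)
```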
Tip: Unless you have a very specific legacy requirement, always use UTF-8 for all text-based content on the web. This includes HTML, CSS, JavaScript, JSON, XML, Markdown, and any other text format your server delivers.

How the Browser Determines the Encoding

The browser uses multiple signals to determine the encoding of a response. Under the encoding sniffing algorithm in the WHATWG HTML Standard, they are checked in this priority order:

  1. BOM (Byte Order Mark) at the beginning of the file (highest priority)
  2. HTTP Content-Type header
  3. HTML <meta> tag (for HTML documents only)
  4. Browser heuristics and guessing (unreliable, last resort)

Because a BOM is discouraged for UTF-8 on the web, the HTTP header is in practice the deciding signal, and it is the one we configure through .htaccess. When the server sends:

Content-Type: text/html; charset=utf-8

The browser knows immediately and unambiguously that the content is HTML encoded in UTF-8. No guessing needed.
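
To see how a client might pull the two pieces of information out of that header, here is a small Python sketch using the standard library's `email.message.Message`, which understands the `type/subtype; parameter=value` syntax. The helper name `parse_content_type` is made up for this example.

```python
from email.message import Message
from typing import Optional, Tuple


def parse_content_type(header_value: str) -> Tuple[str, Optional[str]]:
    """Split a Content-Type header value into (media type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # Both values are returned in lower case by the standard library.
    return msg.get_content_type(), msg.get_content_charset()


print(parse_content_type("text/html; charset=utf-8"))
print(parse_content_type("text/css"))
```

When the second element is `None`, the header carries no charset and the client has to fall back to the other signals in the list above.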

The AddDefaultCharset Directive

The AddDefaultCharset directive sets the character encoding for all responses that have a text/html or text/plain content type. It is the simplest way to ensure your HTML pages are served with the correct encoding.

The syntax is:

AddDefaultCharset charset

The most common and recommended usage:

AddDefaultCharset utf-8

When this directive is active, Apache adds charset=utf-8 to the Content-Type header for any response served as text/html or text/plain. The header looks like this:

Content-Type: text/html; charset=utf-8

What AddDefaultCharset Does NOT Cover

Despite its name, AddDefaultCharset only applies to text/html and text/plain responses. It does not affect:

  • CSS files (text/css)
  • JavaScript files (text/javascript)
  • JSON files (application/json)
  • XML files (application/xml)
  • Any other media types

For those file types, you need the AddCharset directive (covered in the next section).
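
The scope of the directive can be summarized as a small model. This Python function is not Apache code; it is a simplified sketch of the documented effect, and it assumes (consistent with the precedence described later in this guide) that a Content-Type which already carries a charset is left alone. The name `with_default_charset` is hypothetical.

```python
# Media types that AddDefaultCharset applies to.
DEFAULT_CHARSET_TYPES = {"text/html", "text/plain"}


def with_default_charset(content_type: str, default_charset: str = "utf-8") -> str:
    """Model the effect of `AddDefaultCharset <default_charset>` on a header value."""
    if "charset=" in content_type.lower():
        return content_type  # assumption: an explicit charset is not replaced
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in DEFAULT_CHARSET_TYPES:
        return f"{media_type}; charset={default_charset}"
    return content_type  # every other media type is untouched


for ct in ["text/html", "text/plain", "text/css", "application/json"]:
    print(ct, "->", with_default_charset(ct))
```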

Common Mistake: Assuming AddDefaultCharset Covers Everything

Wrong assumption:

# "This will set UTF-8 for all my text files, right?"
AddDefaultCharset utf-8

A developer adds this line and assumes CSS, JavaScript, and JSON files are covered. But when an international user reports broken characters in a CSS file that contains comments with accented characters, or a JSON API response with Unicode data displays incorrectly, the issue is that AddDefaultCharset never applied to those file types.

Correct approach:

Use AddDefaultCharset for HTML and plain text, and AddCharset for everything else:

AddDefaultCharset utf-8

<IfModule mod_mime.c>
AddCharset utf-8 .css .js .json
</IfModule>

Wrapping in IfModule

While AddDefaultCharset is a core Apache directive (it does not require any extra module), some administrators still wrap it in a module check for consistency:

<IfModule mod_mime.c>
AddDefaultCharset utf-8
</IfModule>

Technically, AddDefaultCharset works without mod_mime, so the wrapper is not required. Grouping all charset-related directives under one <IfModule mod_mime.c> block keeps your configuration organized, with one caveat: if mod_mime is not loaded, everything inside the block, including AddDefaultCharset, is silently skipped.

Setting Charset per Media Type

The AddCharset directive from mod_mime lets you attach a character encoding to specific file extensions, regardless of their media type. This is how you ensure that CSS, JavaScript, JSON, and other text-based files include the charset parameter in their Content-Type header.

The syntax is:

AddCharset charset extension [extension] ...

text/html

HTML files are already covered by AddDefaultCharset, but you can also explicitly set the charset using AddCharset for completeness:

<IfModule mod_mime.c>
AddCharset utf-8 .html .htm
</IfModule>

With both AddDefaultCharset utf-8 and AddCharset utf-8 .html in place, the resulting header is the same as with either directive alone. If the two ever name different charsets, the AddCharset value takes precedence.

text/plain

Plain text files (.txt, .text, .log) are covered by AddDefaultCharset as long as they are served as text/plain. You can still list them explicitly, together with any custom extensions you use for text-like formats:

<IfModule mod_mime.c>
AddCharset utf-8 .txt .text .log
</IfModule>

text/css

CSS files can contain non-ASCII characters in comments, content properties (content: "→"), or font-family names. Without the correct charset, these characters may render incorrectly.

<IfModule mod_mime.c>
AddCharset utf-8 .css
</IfModule>

This produces the header:

Content-Type: text/css; charset=utf-8

text/javascript (formerly application/javascript)

JavaScript files frequently contain string literals with Unicode characters, especially in internationalized applications. ES modules (.mjs files) are equally affected.

<IfModule mod_mime.c>
AddCharset utf-8 .js .mjs
</IfModule>

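When the server maps .js to text/javascript, this produces the header below (older MIME configurations may send application/javascript instead):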

Content-Type: text/javascript; charset=utf-8

application/json

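The JSON specification (RFC 8259) requires JSON exchanged between systems to be encoded as UTF-8, and the application/json media type does not define a charset parameter, so compliant parsers assume UTF-8 either way. Apache does not add the parameter automatically. Adding it is harmless and can help older or non-compliant clients that would otherwise guess the encoding.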

<IfModule mod_mime.c>
AddCharset utf-8 .json .map .topojson .geojson .jsonld
</IfModule>

The .map extension covers JavaScript and CSS source map files, which are JSON formatted. Including the charset ensures development tools can read them correctly.

Comprehensive AddCharset Configuration

Here is a complete configuration covering all common text-based file types that benefit from an explicit UTF-8 charset declaration:

Complete charset configuration
<IfModule mod_mime.c>

# Set default charset for text/html and text/plain
AddDefaultCharset utf-8

# Set charset for specific file types
AddCharset utf-8 .appcache \
.bbaw \
.css \
.htc \
.ics \
.js \
.json \
.manifest \
.map \
.markdown \
.md \
.mjs \
.topojson \
.vtt \
.vcard \
.vcf \
.webmanifest \
.xloc

</IfModule>

Each extension in this list represents a text-based format that may contain non-ASCII characters:

| Extension | Format | Why Charset Matters |
|---|---|---|
| .css | Stylesheets | Content properties, comments, font names |
| .js / .mjs | JavaScript / ES Modules | String literals, template literals |
| .json | JSON data | International content, API responses |
| .map | Source maps | JSON format with file paths |
| .md / .markdown | Markdown | International text content |
| .vtt | WebVTT subtitles | Subtitle text in any language |
| .vcard / .vcf | Contact cards | Names and addresses in any script |
| .webmanifest | Web App Manifest | App name and descriptions |
| .ics | Calendar events | Event titles and descriptions |
Note: The backslash (\) at the end of a line is a line continuation character. It lets you split a long directive across multiple lines for readability. Apache reads the entire block as a single AddCharset directive.
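
After deploying a configuration like this, it helps to confirm that the headers actually changed. The Python sketch below audits a set of Content-Type values and reports which ones still lack charset=utf-8. The paths and header values in `observed` are hypothetical sample data of the kind you might collect with `curl -I`; the helper names are made up for the example.

```python
from email.message import Message


def charset_of(content_type: str):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def missing_utf8(responses: dict) -> list:
    """Return the paths whose Content-Type does not declare charset=utf-8."""
    return [path for path, ct in responses.items() if charset_of(ct) != "utf-8"]


# Hypothetical header values gathered from a site before the fix is complete.
observed = {
    "/index.html": "text/html; charset=utf-8",
    "/css/site.css": "text/css",
    "/js/app.js": "text/javascript; charset=UTF-8",
    "/data/items.json": "application/json",
}

print(missing_utf8(observed))
```

Any path that shows up in the result points to an extension that is not yet listed in AddCharset, or to a directive that is not being applied.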

Charset vs Meta Tag vs BOM

There are three ways to declare the character encoding of a document. Understanding their relationship and priority is important for avoiding conflicts.

HTTP Content-Type Header (Server-Side)

This is what AddDefaultCharset and AddCharset configure. It takes precedence over the <meta> tag and over browser guessing; the only signal that outranks it is a BOM at the start of the file.

Content-Type: text/html; charset=utf-8

Advantages:

  • Applies before the browser starts parsing the document.
  • Works for all file types, not just HTML.
  • Cannot be overridden by a <meta> tag in the document; only a BOM takes precedence over it.

HTML Meta Tag (Document-Level)

The <meta charset> tag inside the HTML document declares the encoding:

<meta charset="utf-8">

Or the older, equivalent form:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Advantages:

  • Works even when the file is viewed locally (without a server).
  • Serves as a fallback if the server does not send a charset header.

Limitations:

  • Only works for HTML documents.
  • Overridden by the HTTP header if both are present and different.
  • Must appear within the first 1024 bytes of the document.
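
The 1024-byte limit is easy to check mechanically. The following Python sketch is a deliberately simplified stand-in for the browser's prescan: it uses a regular expression rather than a real HTML parser, so it will not handle every edge case (for example, a tag inside a comment). The function name and pattern are assumptions made for this example.

```python
import re

# Matches both <meta charset="..."> and the older http-equiv form.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_-]+)', re.IGNORECASE
)


def meta_charset_in_prescan(html_bytes: bytes, limit: int = 1024):
    """Return the charset declared by a <meta> tag within the first `limit` bytes."""
    match = META_CHARSET.search(html_bytes[:limit])
    return match.group(1).decode("ascii").lower() if match else None


early = b'<!doctype html><html><head><meta charset="utf-8"><title>x</title>'
late = b"<!doctype html><html><head><title>" + b"x" * 2000 + b'</title><meta charset="utf-8">'

print(meta_charset_in_prescan(early))
print(meta_charset_in_prescan(late))
```

The second document declares the same encoding, but too late for the declaration to be found in the scanned prefix, which is why the tag should be the first thing inside <head>.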

BOM (Byte Order Mark)

A BOM is a special invisible character (U+FEFF) placed at the very beginning of a file. For UTF-8, it is the byte sequence 0xEF 0xBB 0xBF. Some text editors insert it automatically.

Advantages:

  • Provides an encoding hint even without a server or meta tag.

Limitations and problems:

  • Not recommended for UTF-8 on the web. The Unicode Standard says a BOM is neither required nor recommended for UTF-8.
  • Can cause problems in PHP (output before headers), JavaScript, and JSON parsing.
  • Some editors and tools treat the BOM as visible garbage characters.
  • Takes precedence over both the HTTP header and the meta tag in modern browsers, so a stray BOM can silently override your server configuration.
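
The JSON problem mentioned above can be demonstrated with Python's standard library, which also shows one way to detect and strip a BOM. This is an illustrative sketch; the sample payload is invented.

```python
import codecs
import json

# A JSON payload that an editor saved with a UTF-8 BOM in front.
raw = codecs.BOM_UTF8 + b'{"name": "caf\xc3\xa9"}'

has_bom = raw.startswith(codecs.BOM_UTF8)   # BOM_UTF8 is b"\xef\xbb\xbf"

# Plain "utf-8" keeps the BOM as a leading U+FEFF character,
# while "utf-8-sig" removes it when present.
with_bom = raw.decode("utf-8")
without_bom = raw.decode("utf-8-sig")

try:
    json.loads(with_bom)
    bom_rejected = False
except json.JSONDecodeError:
    bom_rejected = True

data = json.loads(without_bom)
print(has_bom, bom_rejected, data["name"])
```

Saving files without a BOM in the first place avoids the need for this kind of defensive decoding.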

Priority Order

When multiple declarations exist, the browser follows this priority:

BOM (highest priority)

HTTP Header

Meta Tag

Browser Guessing (lowest priority, unreliable)

Common Mistake: Conflicting Declarations

Wrong approach:

The server sends:

Content-Type: text/html; charset=iso-8859-1

But the HTML document contains:

<meta charset="utf-8">

The HTTP header wins. The browser uses ISO-8859-1 despite the meta tag saying UTF-8. If the file was actually saved as UTF-8, every non-ASCII character will display incorrectly.

Correct approach:

Ensure all declarations agree:

.htaccess
AddDefaultCharset utf-8
HTML document
<meta charset="utf-8">

And save the file as UTF-8 without BOM in your text editor.
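
A check for this kind of disagreement can be automated. The sketch below is a hypothetical Python helper (the name `find_conflicts` and its messages are made up) that takes the charset from the HTTP header, the charset from the meta tag, and the raw file bytes, then lists anything that does not line up. It validates the bytes against the header's charset first, falling back to the meta tag's.

```python
import codecs


def find_conflicts(header_charset, meta_charset, body: bytes) -> list:
    """List the ways a document's encoding declarations disagree with each other."""
    problems = []
    declared = [c.lower() for c in (header_charset, meta_charset) if c]
    if len(set(declared)) > 1:
        problems.append(f"header says {declared[0]}, meta says {declared[1]}")
    if body.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 BOM")
    # Check that the bytes are valid in the encoding that was declared.
    effective = declared[0] if declared else "utf-8"
    try:
        body.decode(effective)
    except (UnicodeDecodeError, LookupError):
        problems.append(f"bytes are not valid {effective}")
    return problems


page = '<meta charset="utf-8"><p>café</p>'.encode("utf-8")
print(find_conflicts("iso-8859-1", "utf-8", page))
print(find_conflicts("utf-8", "utf-8", page))
```

An empty list means the header, the meta tag, and the bytes on disk all tell the same story. Note that the first call reports only the disagreement: ISO-8859-1 accepts any byte sequence, so the mismatch produces garbled text rather than a decoding error.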

Best Practice Summary

| Method | When to Use | Priority |
|---|---|---|
| HTTP Header | Always. Configure via AddDefaultCharset and AddCharset. | High (outranked only by a BOM) |
| HTML Meta Tag | Always for HTML. Serves as fallback for offline viewing. | Medium |
| BOM | Avoid for UTF-8. Only necessary for UTF-16. | Highest when present |
Warning: The safest strategy is to set the charset in both the HTTP header and the HTML meta tag, and make sure they agree. The HTTP header handles the server-to-browser communication, and the meta tag handles cases where the file might be opened locally or served by a different server without charset configuration. Never rely on browser guessing as your only encoding strategy.

Proper character encoding configuration is one of those invisible details that, when done right, nobody notices. But when done wrong, it creates visible, frustrating problems for every user who encounters a non-ASCII character on your site. Set AddDefaultCharset utf-8 as your baseline, add AddCharset utf-8 for all your text-based file types, include the <meta charset="utf-8"> tag in your HTML documents, and save all your files as UTF-8 without BOM. This consistent approach eliminates an entire category of bugs before they ever reach your users.