# How to Resolve a UnicodeDecodeError for a CSV File in Python
The `UnicodeDecodeError` is one of the most common errors encountered when reading CSV files in Python. It occurs when Python tries to decode the file's bytes with the wrong encoding, typically because the reader defaults to UTF-8 while the file was actually saved in a different encoding such as Latin-1, UTF-16, or Windows-1252.
In this guide, you'll learn why this error happens, how to identify the correct encoding, and multiple methods to resolve it when working with CSV files in Pandas and Python.
## Understanding the Error
Every text file is stored as a sequence of bytes on disk. An encoding scheme (like UTF-8, ASCII, or Latin-1) defines how those bytes map to characters. When Python reads a file, it must use the same encoding the file was saved with. If there's a mismatch, certain byte sequences can't be decoded, and Python raises a UnicodeDecodeError.
### Simple Example
```python
# ASCII can only represent characters 0–127
text = b"a".decode("ascii")  # ✅ Works: 'a' is within ASCII range
print(text)

# Byte 0xf1 (ñ) is outside ASCII range
try:
    text = b"a\xf1".decode("ascii")
except UnicodeDecodeError as e:
    print(f"Error: {e}")
```
Output:

```
a
Error: 'ascii' codec can't decode byte 0xf1 in position 1: ordinal not in range(128)
```
The fix is to use an encoding that supports the byte 0xf1:
```python
# ✅ Latin-1 (ISO-8859-1) supports bytes 0–255
text = b"a\xf1".decode("latin-1")
print(text)  # Output: añ
```
## The Error When Reading CSV Files
When using pd.read_csv(), Pandas defaults to UTF-8 encoding. If the CSV file was saved in a different encoding, you'll see an error like this:
```python
import pandas as pd

# ❌ Fails if the file isn't UTF-8 encoded
try:
    df = pd.read_csv('data.csv')
except UnicodeDecodeError as e:
    print(f"Error: {e}")
```
Output:

```
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
```
## Solution 1: Specify the Correct Encoding
If you know the file's encoding, pass it directly to pd.read_csv():
```python
import pandas as pd

# ✅ Specify the correct encoding
df = pd.read_csv('data.csv', encoding='utf-16')
print(df.head())
```
### Common Encodings

| Encoding | When to Use |
|---|---|
| `utf-8` | Default for most modern files, web data, Linux/Mac systems |
| `latin-1` (or `iso-8859-1`) | Western European languages, older systems |
| `utf-16` | Files from Windows apps, Excel exports |
| `cp1252` (or `windows-1252`) | Windows-generated files with special characters |
| `ascii` | Plain English text with no special characters |
| `utf-8-sig` | UTF-8 files with a BOM (Byte Order Mark), common from Excel |
```python
import pandas as pd

# Examples with different encodings
df = pd.read_csv('european_data.csv', encoding='latin-1')
df = pd.read_csv('windows_export.csv', encoding='cp1252')
df = pd.read_csv('excel_export.csv', encoding='utf-8-sig')
```
## Solution 2: Detect the Encoding Automatically
When you don't know the file's encoding, use the chardet library to detect it:
```shell
pip install chardet
```
```python
import chardet
import pandas as pd

# Detect the encoding
with open('data.csv', 'rb') as f:
    raw_data = f.read()

result = chardet.detect(raw_data)
print(f"Detected encoding: {result['encoding']}")
print(f"Confidence: {result['confidence']:.0%}")

# Use the detected encoding
df = pd.read_csv('data.csv', encoding=result['encoding'])
print(df.head())
```
Output:

```
Detected encoding: UTF-16
Confidence: 100%
```
For large files, reading only a portion is more efficient:
```python
import chardet

with open('large_file.csv', 'rb') as f:
    # Read only the first 100KB for detection
    raw_data = f.read(100_000)

result = chardet.detect(raw_data)
print(f"Detected: {result['encoding']} ({result['confidence']:.0%} confidence)")
```
An alternative to chardet is the charset-normalizer library (used internally by the requests library), which is often faster and more accurate:
```shell
pip install charset-normalizer
```

```python
from charset_normalizer import from_path

result = from_path('data.csv')
print(f"Detected: {result.best().encoding}")
```
## Solution 3: Use latin-1 as a Fallback
Latin-1 (ISO-8859-1) can decode any byte value (0–255) without raising an error, making it a reliable fallback when you can't determine the correct encoding:
```python
import pandas as pd

# ✅ latin-1 never raises UnicodeDecodeError
df = pd.read_csv('data.csv', encoding='latin-1')
print(df.head())
```
While latin-1 will never raise a UnicodeDecodeError, it may misinterpret characters if the file's actual encoding is something else (like UTF-16 or Shift-JIS). Special characters may appear garbled. Use this as a temporary workaround while you identify the correct encoding.
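To see why: every byte value is valid Latin-1, so decoding the wrong bytes succeeds silently and produces mojibake. A quick stdlib-only demonstration:

```python
# UTF-8 bytes for 'café' decode under Latin-1 without error,
# but the accented character comes out garbled (mojibake)
utf8_bytes = "café".encode("utf-8")     # b'caf\xc3\xa9'
garbled = utf8_bytes.decode("latin-1")  # no exception raised
print(garbled)                          # prints 'cafÃ©', not 'café'
```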
## Solution 4: Use the errors Parameter to Handle Bad Bytes

Python's built-in `open()` function supports an `errors` parameter that controls how decoding errors are handled (recent versions of Pandas expose the same behavior through the `encoding_errors` parameter of `pd.read_csv()`):
```python
import pandas as pd

# Option 1: Ignore problematic bytes
with open('data.csv', 'r', encoding='utf-8', errors='ignore') as f:
    df = pd.read_csv(f)
print("With errors='ignore':")
print(df.head())

# Option 2: Replace problematic bytes with '�'
with open('data.csv', 'r', encoding='utf-8', errors='replace') as f:
    df = pd.read_csv(f)
print("\nWith errors='replace':")
print(df.head())
```
| `errors` Value | Behavior |
|---|---|
| `'strict'` | Raises `UnicodeDecodeError` (default) |
| `'ignore'` | Silently skips undecodable bytes |
| `'replace'` | Replaces bad bytes with `�` (U+FFFD) |
Both 'ignore' and 'replace' can cause data loss or corruption. Characters may be silently dropped or replaced. Use these options only when you're confident the problematic bytes aren't critical to your analysis.
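To illustrate the trade-off at the byte level, here is a small sketch using a Latin-1-encoded é (byte 0xe9), which is invalid in UTF-8:

```python
bad_bytes = b"caf\xe9"  # 'café' encoded as Latin-1; 0xe9 is not valid UTF-8 here

print(bad_bytes.decode("utf-8", errors="ignore"))   # 'caf'  (the é is silently dropped)
print(bad_bytes.decode("utf-8", errors="replace"))  # 'caf�' (the é becomes U+FFFD)
```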
## Solution 5: Convert the File Encoding
Instead of changing your code, you can convert the file itself to UTF-8 before reading it:
### Using Python

```python
import shutil

input_file = 'data_utf16.csv'
output_file = 'data_utf8.csv'

# Convert from UTF-16 to UTF-8
with open(input_file, 'r', encoding='utf-16') as source:
    with open(output_file, 'w', encoding='utf-8') as target:
        shutil.copyfileobj(source, target)

print(f"Converted '{input_file}' to UTF-8 as '{output_file}'")
```
### Using a Text Editor
You can also convert the encoding using a text editor:
- Open the CSV file in Notepad, Notepad++, or VS Code.
- Go to File → Save As.
- Change the Encoding dropdown to UTF-8.
- Save the file.
After conversion, the file will work with pd.read_csv() without specifying an encoding.
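As a quick sanity check, the round trip can be verified end to end. The sketch below uses hypothetical file names and writes its own sample UTF-16 file first so it runs standalone:

```python
# Write a sample UTF-16 CSV (hypothetical file name) so this snippet is self-contained
with open('data_utf16.csv', 'w', encoding='utf-16', newline='') as f:
    f.write('name,price\nañejo,9.50\n')

# Convert it to UTF-8
with open('data_utf16.csv', 'r', encoding='utf-16') as src, \
     open('data_utf8.csv', 'w', encoding='utf-8', newline='') as dst:
    dst.write(src.read())

# The converted file now decodes as plain UTF-8, no encoding argument needed
with open('data_utf8.csv', 'r', encoding='utf-8') as f:
    print(f.read())
```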
## Reusable Function for Safe CSV Reading
Here's a utility function that tries multiple encodings automatically:
```python
import pandas as pd

def read_csv_safe(filepath, encodings=None, **kwargs):
    """
    Attempt to read a CSV file by trying multiple encodings.

    Args:
        filepath: Path to the CSV file.
        encodings: List of encodings to try. Defaults to common encodings.
        **kwargs: Additional arguments passed to pd.read_csv().

    Returns:
        A Pandas DataFrame.
    """
    if encodings is None:
        encodings = ['utf-8', 'utf-8-sig', 'latin-1', 'cp1252', 'utf-16', 'ascii']
    for encoding in encodings:
        try:
            df = pd.read_csv(filepath, encoding=encoding, **kwargs)
            print(f"Successfully read with encoding: {encoding}")
            return df
        except (UnicodeDecodeError, UnicodeError):
            continue
    raise ValueError(f"Could not read '{filepath}' with any of the tried encodings: {encodings}")

# Usage
df = read_csv_safe('mystery_file.csv')
print(df.head())
```
## Quick Troubleshooting Guide

| Error Message | Likely Cause | Fix |
|---|---|---|
| `can't decode byte 0xff in position 0` | File is UTF-16 encoded | `encoding='utf-16'` |
| `can't decode byte 0xe9 in position X` | File uses Latin-1 or cp1252 | `encoding='latin-1'` |
| `can't decode byte 0xef in position 0` | File has a BOM (Byte Order Mark) | `encoding='utf-8-sig'` |
| `ordinal not in range(128)` | File has non-ASCII characters | `encoding='utf-8'` or `encoding='latin-1'` |
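If the error message alone doesn't settle it, you can also sniff the file's first bytes for a BOM yourself. This sketch writes its own sample UTF-16 file (a hypothetical name) so it runs standalone:

```python
import codecs

# Create a sample UTF-16 file so the check below has something to inspect
with open('sample.csv', 'w', encoding='utf-16') as f:
    f.write('name,city\nAna,Madrid\n')

# Peek at the first bytes without decoding
with open('sample.csv', 'rb') as f:
    head = f.read(4)

if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
    print("Likely UTF-16: try encoding='utf-16'")
elif head.startswith(codecs.BOM_UTF8):
    print("UTF-8 with a BOM: try encoding='utf-8-sig'")
else:
    print(f"No BOM found; first bytes: {head!r}")
```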
## Summary
The UnicodeDecodeError occurs when Python tries to read a CSV file with the wrong encoding. To resolve it:
- **Specify the correct encoding**: `pd.read_csv('file.csv', encoding='latin-1')` is the best solution when you know the encoding.
- **Detect the encoding**: Use the `chardet` library to automatically identify the file's encoding.
- **Use `latin-1` as a fallback**: It accepts all byte values and never raises an error, though characters may be misinterpreted.
- **Handle errors gracefully**: Use `errors='ignore'` or `errors='replace'` to skip or substitute problematic bytes.
- **Convert the file**: Re-save the file as UTF-8 using Python or a text editor.
The most robust approach is to detect the encoding first with chardet, then read the file with the correct encoding explicitly set.