How to Convert Emojis to Text in Python
Emojis carry significant semantic meaning in text data. For natural language processing, sentiment analysis, or machine learning pipelines, converting emojis to their textual descriptions preserves this information that would otherwise be lost during text cleaning.
This guide covers conversion, removal, and extraction techniques using the emoji and demoji libraries.
Using the emoji Library (Recommended)
The emoji library provides bidirectional conversion between Unicode emojis and text shortcodes:
pip install emoji
import emoji
text = "Python is 🔥 and 🐍 is awesome! 💯"
# Convert emojis to text shortcodes
converted = emoji.demojize(text)
print(converted)
# Output: Python is :fire: and :snake: is awesome! :hundred_points:
# Convert shortcodes back to emojis
restored = emoji.emojize(converted)
print(restored)
# Output: Python is 🔥 and 🐍 is awesome! 💯
Output:
Python is :fire: and :snake: is awesome! :hundred_points:
Python is 🔥 and 🐍 is awesome! 💯
Customizing Delimiters
import emoji
text = "Great job! 👍✨"
# Use different delimiters
result = emoji.demojize(text, delimiters=(" [", "] "))
print(result)
# Output: Great job!  [thumbs_up]  [sparkles]
# Spaces as delimiters (just the name)
result = emoji.demojize(text, delimiters=(" ", " "))
print(result)
# Output: Great job!  thumbs_up  sparkles
Output (the spaces inside the delimiters add to those already in the text):
Great job!  [thumbs_up]  [sparkles]
Great job!  thumbs_up  sparkles
Language Support
import emoji
text = "Hello 👋 World 🌍"
# Spanish descriptions
result = emoji.demojize(text, language='es')
print(result)
Output:
Hello :mano_saludando: World :globo_terráqueo_mostrando_europa_y_áfrica:
Other supported languages include en, es, pt, it, fr, and de.
Using demoji for Descriptive Text
When you need plain-language Unicode descriptions rather than shortcodes, use demoji:
pip install demoji
import demoji
# Only needed on demoji releases before 1.0; newer releases bundle the emoji data
# demoji.download_codes()
text = "I love this! ❤️🚀 So exciting! 🎉"
# Replace with full descriptions, then collapse the extra spaces that sep adds
clean_text = " ".join(demoji.replace_with_desc(text, sep=" ").split())
print(clean_text)
# Output: I love this! red heart rocket So exciting! party popper
# Find all emojis and their descriptions
emoji_dict = demoji.findall(text)
print(emoji_dict)
# Output: {'❤️': 'red heart', '🚀': 'rocket', '🎉': 'party popper'}
- emoji: Best for shortcode format (:fire:), chat applications, and reversible conversions
- demoji: Best for NLP/ML features where you need natural language descriptions
Removing Emojis Entirely
Sometimes you need to strip emojis without replacement:
import emoji
def remove_emojis(text):
    """Remove all emojis from text."""
    return emoji.replace_emoji(text, replace='')
text = "Hello 👋 World 🌍!"
clean = remove_emojis(text)
print(clean) # Output: Hello  World !
# Collapse the double space left behind
clean = ' '.join(clean.split())
print(clean) # Output: Hello World ! (the space before "!" was in the original text)
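Stripping an emoji that sits directly before punctuation leaves a space behind, and split()/join() alone does not remove it. A minimal sketch of a cleanup step using only the standard library follows; the helper name tidy_spacing and the punctuation set are illustrative assumptions, not part of the emoji library:

```python
import re

def tidy_spacing(text):
    """Collapse whitespace runs and drop spaces left before punctuation."""
    # Collapse any run of whitespace into a single space
    text = re.sub(r"\s+", " ", text).strip()
    # Remove a space that precedes punctuation, e.g. "World !" -> "World!"
    return re.sub(r" ([!?.,])", r"\1", text)

print(tidy_spacing("Hello  World !"))  # Hello World!
```

The colon is deliberately left out of the punctuation set so that shortcode text such as "is :fire:" is not altered if the helper is applied to demojized strings.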
Using demoji for Removal
import demoji
text = "Check this out! 🔥💯🚀"
# Remove all emojis, then strip the trailing space they leave behind
clean = demoji.replace(text, "").strip()
print(clean) # Output: Check this out!
Extracting Emojis for Analysis
import emoji
def extract_emojis(text):
    """Extract all emojis from text, keeping multi-code-point emojis intact."""
    # emoji_list() returns one dict per emoji found; the matched text is under 'emoji'
    return [match['emoji'] for match in emoji.emoji_list(text)]
text = "Having a great day! 😊🌞 Let's celebrate 🎉"
emojis = extract_emojis(text)
print(emojis)
# ['😊', '🌞', '🎉']
# Get emoji names
names = [emoji.demojize(e) for e in emojis]
print(names)
# [':smiling_face_with_smiling_eyes:', ':sun_with_face:', ':party_popper:']
Output:
['😊', '🌞', '🎉']
[':smiling_face_with_smiling_eyes:', ':sun_with_face:', ':party_popper:']
Counting Emoji Usage
import emoji
from collections import Counter
def count_emojis(text):
    """Count emoji occurrences in text."""
    # emoji_list() keeps sequences such as ❤️ (heart + variation selector) together
    emojis = [match['emoji'] for match in emoji.emoji_list(text)]
    return Counter(emojis)
text = "Love it! ❤️❤️❤️ Amazing! 🔥🔥"
counts = count_emojis(text)
print(counts)
# Counter({'❤️': 3, '🔥': 2})
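count_emojis() works on a single string. For a collection of texts, the per-text counters can be merged with Counter.update(). The sketch below assumes the counters were already produced by a function like count_emojis(); merge_counts is a hypothetical helper name, and the sample counters are made-up stand-ins written with escape sequences:

```python
from collections import Counter

def merge_counts(per_text_counts):
    """Sum an iterable of Counter objects into one corpus-level Counter."""
    total = Counter()
    for counts in per_text_counts:
        total.update(counts)  # update() adds counts rather than replacing them
    return total

# Stand-ins for the results of count_emojis() on three texts
per_text = [
    Counter({"\U0001F525": 2}),                   # two fire emojis
    Counter({"\U0001F525": 1, "\U0001F389": 1}),  # one fire, one party popper
    Counter(),                                    # a text without emojis
]
total = merge_counts(per_text)
print(total.most_common(1))  # the single most frequent emoji with its count
```

most_common(n) then gives a ranked list that can feed a frequency table or a feature vector.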
Preprocessing for NLP Pipelines
import emoji
def preprocess_for_nlp(text, mode='convert'):
    """
    Preprocess text with emoji handling for NLP.
    Modes:
    - 'convert': Replace emojis with text descriptions
    - 'remove': Strip emojis entirely
    - 'separate': Move emoji shortcodes to the end
    """
    if mode == 'remove':
        return emoji.replace_emoji(text, replace='').strip()
    elif mode == 'convert':
        # Convert to words without colons
        converted = emoji.demojize(text, delimiters=(' ', ' '))
        # Replace underscores with spaces
        converted = converted.replace('_', ' ')
        return ' '.join(converted.split())
    elif mode == 'separate':
        # emoji_list() keeps multi-code-point emojis together
        emojis = [emoji.demojize(m['emoji']) for m in emoji.emoji_list(text)]
        clean = emoji.replace_emoji(text, replace='')
        return f"{clean.strip()} [{', '.join(emojis)}]"
    return text
text = "This movie is 🔥🔥🔥! Best ever 👍"
print(preprocess_for_nlp(text, 'convert'))
# Output: This movie is fire fire fire ! Best ever thumbs up
print(preprocess_for_nlp(text, 'remove'))
# Output: This movie is ! Best ever
print(preprocess_for_nlp(text, 'separate'))
# Output: This movie is ! Best ever [:fire:, :fire:, :fire:, :thumbs_up:]
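The 'convert' mode replaces every underscore in the string, which also changes ordinary tokens such as snake_case identifiers. One possible alternative is to rewrite only the shortcodes in text that has already been passed through emoji.demojize() with the default ":" delimiters. The sketch below works on such a string with the standard library only; the helper name shortcodes_to_words and the regular expression for shortcode names are assumptions, not part of the emoji library:

```python
import re

# Assumed shape of a shortcode: colon, then letters/digits/underscores/hyphens, then colon
SHORTCODE = re.compile(r":([A-Za-z0-9_\-]+):")

def shortcodes_to_words(text):
    """Turn ':thumbs_up:' into 'thumbs up' and leave other underscores alone."""
    def to_words(match):
        # Pad with spaces so adjacent shortcodes do not run together
        return " " + match.group(1).replace("_", " ") + " "
    replaced = SHORTCODE.sub(to_words, text)
    return " ".join(replaced.split())

demojized = "my_var is :fire::fire: Best ever :thumbs_up:"
print(shortcodes_to_words(demojized))
# my_var is fire fire Best ever thumbs up
```

The pattern is a heuristic: a time such as 10:30:45 also contains text between two colons and would be rewritten, so it may need tightening for data that contains timestamps.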
Handling Emoji Sequences
Some emojis are composed of multiple Unicode characters:
import emoji
# Compound emojis (skin tones, gender, flags)
text = "Hello 👨‍👩‍👧‍👦 and 🏳️‍🌈"
converted = emoji.demojize(text)
print(converted)
# Output: Hello :family_man_woman_girl_boy: and :rainbow_flag:
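To see why handling such emojis one character at a time breaks them apart, the sequence can be inspected with the standard library's unicodedata module. This is an illustrative sketch; the family emoji is written with escape sequences so the example does not depend on the file's encoding:

```python
import unicodedata

# Family emoji as explicit code points: man, ZWJ, woman, ZWJ, girl, ZWJ, boy
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

print(len(family))  # 7 code points, rendered as a single glyph

for ch in family:
    # unicodedata.name() returns the official Unicode character name
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The loop prints MAN, WOMAN, GIRL, and BOY separated by ZERO WIDTH JOINER entries, which is what a per-character loop would see instead of one family emoji.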
Summary
| Goal | Library | Function |
|---|---|---|
| Shortcodes (:smile:) | emoji | emoji.demojize() |
| Restore emojis | emoji | emoji.emojize() |
| Full descriptions | demoji | demoji.replace_with_desc() |
| Find all emojis | demoji | demoji.findall() |
| Remove emojis | emoji | emoji.replace_emoji() |
Use emoji for general applications, chat platforms, and when you need reversible conversions. Use demoji for data science and NLP tasks where natural language descriptions improve feature extraction and model understanding.