How to Detect Homoglyph Confusable Characters in Text (2026)
Cyrillic “а” and Latin “a” render identically in most fonts, yet they are different characters. Attackers exploit exactly that to register legitimate-looking domains and slip past keyword filters. The Homoglyph Detector scans text for these confusable lookalikes from Cyrillic, Greek, Armenian, and other scripts, flags each one with its codepoint and source script, and can normalize them back to plain ASCII.
What Are Homoglyph Characters
Homoglyphs are characters, often from different writing systems, that look identical or nearly identical. A homograph attack uses these lookalikes to register domains that appear legitimate: replacing a Latin “o” with a Cyrillic “о” can fool users and slip past filters that match on exact strings. Our tool detects these suspicious characters by comparing each character against a list of known confusable pairs across Unicode scripts, including Cyrillic, Greek, Armenian, and more.
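The comparison against known confusable pairs can be sketched in a few lines of Python. This is a minimal illustration, not the tool's implementation: the `CONFUSABLES` table is a hypothetical, deliberately tiny subset (a real detector would load a full list such as Unicode's `confusables.txt`), and `scan` and `Finding` are illustrative names.

```python
import unicodedata
from dataclasses import dataclass

# Hypothetical, tiny confusables table: non-Latin character -> Latin lookalike.
# A production detector would load a complete list (e.g. Unicode UTS #39 data).
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0441": "c",  # Cyrillic small es
    "\u0440": "p",  # Cyrillic small er
    "\u03bf": "o",  # Greek small omicron
    "\u0585": "o",  # Armenian small oh
}

@dataclass
class Finding:
    index: int        # position of the character in the input
    char: str         # the suspicious character itself
    looks_like: str   # its Latin lookalike
    codepoint: str    # e.g. "U+0430"
    script: str       # approximated from the Unicode character name

def scan(text: str) -> list[Finding]:
    """Return one Finding per character that appears in the confusables table."""
    findings = []
    for i, ch in enumerate(text):
        lookalike = CONFUSABLES.get(ch)
        if lookalike is None:
            continue
        # Approximate the script from the first word of the Unicode name,
        # e.g. "CYRILLIC SMALL LETTER A" -> "Cyrillic".
        script = unicodedata.name(ch).split()[0].title()
        findings.append(Finding(i, ch, lookalike, f"U+{ord(ch):04X}", script))
    return findings

# "paypal.com" spelled with two Cyrillic "а" characters
for hit in scan("p\u0430yp\u0430l.com"):
    print(hit.index, hit.codepoint, hit.script, "->", hit.looks_like)
```

Plain-ASCII input returns an empty list, because no character in it has a table entry.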
How to Use Our Homoglyph Detector
- Open the Homoglyph Detector and paste or type the text you want to scan.
- Click “Scan” or let the tool auto-detect suspicious characters instantly.
- Review detected homoglyphs in the results table — each entry shows the character, its ASCII lookalike, Unicode codepoint, and source script.
- Click “Replace with ASCII” to normalize all suspicious characters, or copy the detection report.
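The “Replace with ASCII” step amounts to a character-for-character substitution. A minimal Python sketch using `str.translate`, assuming a hypothetical `CONFUSABLES` subset rather than the tool's actual table; `replace_with_ascii` is an illustrative name.

```python
# Hypothetical confusables subset (non-Latin -> Latin lookalike), not the tool's real table.
CONFUSABLES = {
    "\u0430": "a", "\u0435": "e", "\u043e": "o",
    "\u0441": "c", "\u0440": "p", "\u03bf": "o", "\u0585": "o",
}

# str.maketrans accepts a dict whose keys are single characters.
TO_ASCII = str.maketrans(CONFUSABLES)

def replace_with_ascii(text: str) -> str:
    """Swap every known homoglyph for its Latin lookalike; other characters pass through."""
    return text.translate(TO_ASCII)

print(replace_with_ascii("p\u0430yp\u0430l.com"))  # paypal.com
```

Characters without an entry, such as emoji or accented letters, pass through unchanged, so the output is pure ASCII only when every non-ASCII character in the input has a mapping.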
Example Homoglyph Detections
| Character | Looks Like | Unicode Codepoint | Script |
|---|---|---|---|
| а | a | U+0430 | Cyrillic |
| е | e | U+0435 | Cyrillic |
| о | o | U+043E | Cyrillic |
| с | c | U+0441 | Cyrillic |
| р | p | U+0440 | Cyrillic |
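The codepoint and script columns above can be reproduced with Python's standard `unicodedata` module. One caveat: the standard library does not expose the Unicode Script property, so this hypothetical `describe` helper guesses the script from the first word of the character's Unicode name. That works for letters like these but not for every character (for example, digits are named "DIGIT ...").

```python
import unicodedata

def describe(ch: str) -> tuple[str, str, str]:
    """Return (character, codepoint in U+XXXX form, script guessed from the Unicode name)."""
    name = unicodedata.name(ch)          # e.g. "CYRILLIC SMALL LETTER ER"
    script = name.split()[0].title()     # heuristic: first word of the name
    return ch, f"U+{ord(ch):04X}", script

# The five Cyrillic rows from the table
for ch in "\u0430\u0435\u043e\u0441\u0440":
    print(*describe(ch), sep=" | ")
```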
Common Use Cases
- Security audits — scan domain names and user-submitted text for homograph attack indicators.
- Content validation — ensure submitted text uses only the intended script and character set.
- Data cleaning — normalize mixed-script text to standard ASCII before database storage or search indexing.
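For security audits and content validation, a common first check is whether a single token, such as a domain label or username, mixes letters from more than one script. A minimal sketch, assuming the same name-based script heuristic as above; `scripts_in` and `is_mixed_script` are hypothetical helpers.

```python
import unicodedata

def scripts_in(token: str) -> set[str]:
    """Approximate the set of scripts used by the letters in `token`."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]  # "LATIN", "CYRILLIC", "GREEK", ...
        for ch in token
        if ch.isalpha()                             # ignore digits, dots, hyphens
    }

def is_mixed_script(token: str) -> bool:
    """True when the token's letters come from more than one script."""
    return len(scripts_in(token)) > 1

print(is_mixed_script("paypal"))       # False: all Latin
print(is_mixed_script("p\u0430ypal"))  # True: Latin plus Cyrillic
```

A label written entirely in Cyrillic lookalikes would pass this check, which is why a mixed-script test complements a confusables lookup rather than replacing it.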
- Unicode research — identify and catalog confusable character pairs in multilingual text.
FAQ
What is a homoglyph?
A homoglyph is a character that looks like another character but has a different Unicode codepoint. For example, Cyrillic “а” (U+0430) looks identical to Latin “a” (U+0061) in most fonts, but it belongs to a different script and software treats it as a different character.
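A quick way to see that the two letters differ is to compare them directly. This short Python snippet checks equality, codepoints, and the UTF-8 bytes each one encodes to:

```python
cyrillic_a = "\u0430"  # CYRILLIC SMALL LETTER A
latin_a = "a"          # LATIN SMALL LETTER A

print(cyrillic_a == latin_a)                                # False
print(f"U+{ord(cyrillic_a):04X}", f"U+{ord(latin_a):04X}")  # U+0430 U+0061
print(cyrillic_a.encode("utf-8"), latin_a.encode("utf-8"))  # b'\xd0\xb0' b'a'
```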
Why should I detect homoglyphs?
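Homoglyphs are common in phishing, where lookalike domains impersonate real brands, and they quietly break text processing: a search for “paypal” will not match a spoof spelled with Cyrillic letters, and word-based filters can be bypassed the same way. Detecting them exposes spoofed names and keeps search, deduplication, and validation working on the characters you actually expect.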
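One way to catch lookalikes that plain string matching misses is to compare "skeletons": map every known confusable to a single representative, then compare the results. The sketch below is loosely modeled on the skeleton idea in Unicode Technical Standard #39, with a hypothetical, much smaller mapping table; `skeleton` and `looks_like` are illustrative names.

```python
import unicodedata

# Hypothetical confusables subset; a real check would use the full UTS #39 data.
SKELETON_MAP = str.maketrans({
    "\u0430": "a", "\u0435": "e", "\u043e": "o",
    "\u0441": "c", "\u0440": "p", "\u03bf": "o",
})

def skeleton(s: str) -> str:
    """Decompose, map confusables to one representative, then decompose again."""
    s = unicodedata.normalize("NFD", s).translate(SKELETON_MAP)
    return unicodedata.normalize("NFD", s)

def looks_like(candidate: str, protected: str) -> bool:
    """True when the two strings have the same skeleton (visually confusable)."""
    return skeleton(candidate) == skeleton(protected)

spoof = "p\u0430yp\u0430l"
print("paypal" in f"Log in to {spoof}")  # False: plain search misses the spoof
print(looks_like(spoof, "paypal"))       # True: the skeletons match
```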
Can the tool fix detected homoglyphs?
Yes. The Homoglyph Detector includes a “Replace with ASCII” option that normalizes all detected homoglyph characters back to their standard Latin equivalents, producing clean ASCII text.
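This kind of replacement needs an explicit confusables mapping; standard Unicode normalization does not do it. NFKC folds compatibility variants such as fullwidth letters into ASCII, but it leaves cross-script homoglyphs untouched because Cyrillic “а” is a distinct letter, not a variant of Latin “a”:

```python
import unicodedata

fullwidth_a = "\uff41"  # FULLWIDTH LATIN SMALL LETTER A
cyrillic_a = "\u0430"   # CYRILLIC SMALL LETTER A

# NFKC maps compatibility variants such as fullwidth forms to plain ASCII...
print(unicodedata.normalize("NFKC", fullwidth_a) == "a")  # True
# ...but leaves cross-script lookalikes unchanged.
print(unicodedata.normalize("NFKC", cyrillic_a) == "a")   # False
```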
What scripts are supported for detection?
The tool checks against known confusable pairs from Cyrillic, Greek, Armenian, Georgian, and several other Unicode scripts that contain characters visually similar to Latin letters.
Is this different from a Unicode inspector?
Yes. While a Unicode inspector shows every character’s codepoint, the homoglyph detector specifically flags characters that have known visual confusables — it highlights only the suspicious ones rather than listing every character.
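To make the distinction concrete, here is a minimal sketch contrasting the two approaches. `inspect_all`, `detect`, and the small `CONFUSABLES` table are hypothetical stand-ins, not the tool's internals.

```python
import unicodedata

# Hypothetical confusables subset used only for this comparison.
CONFUSABLES = {"\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0441": "c", "\u0440": "p"}

def inspect_all(text):
    """Unicode-inspector style: one row per character, suspicious or not."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

def detect(text):
    """Detector style: keep only rows for characters with a known Latin lookalike."""
    return [row for row in inspect_all(text) if row[0] in CONFUSABLES]

sample = "g\u043e\u043egle"   # "google" with two Cyrillic "о" characters
print(len(inspect_all(sample)))  # 6: one row per character
print(detect(sample))            # only the two Cyrillic rows
```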
Validating user input or auditing a suspicious domain or message? Run it through the Homoglyph Detector to surface hidden lookalikes — and for the flip side (creating them deliberately), see the Fake Text Generator guide.