Punycode/Unicode Converter

Convert internationalized domain names between Punycode and Unicode, in your browser.

Punycode converter for IDN domains

All Unicode scripts

Supports domains in Chinese, Arabic, Cyrillic, Devanagari, Hebrew, Japanese and other Unicode scripts.

100% private

Punycode conversion happens entirely in your browser using the native API. No servers.

Homograph detection

Reveals hidden Unicode characters in suspicious domains. Identifies potential phishing attempts.

Instant

Bidirectional real-time conversion. No signup, no waiting, RFC 3492 compliant.

Three steps, no hassle

1

Enter the domain

Type or paste a domain in Unicode (like café.com or 中文.com) or in Punycode (like xn--caf-dma.com). The tool detects the format automatically.
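The page does not say how the format is detected. One plausible rule, sketched below in Python as an assumption rather than this tool's actual logic, is: any non-ASCII character means Unicode input, and any label starting with xn-- means Punycode input. `detect_format` is a hypothetical helper name.

```python
def detect_format(domain: str) -> str:
    """Hypothetical helper: classify a domain as 'unicode', 'punycode' or 'ascii'."""
    labels = domain.rstrip(".").split(".")
    # Any non-ASCII character means the user typed the Unicode form.
    if any(not label.isascii() for label in labels):
        return "unicode"
    # An ACE prefix on any label means at least part of the name is Punycode.
    if any(label.lower().startswith("xn--") for label in labels):
        return "punycode"
    # Plain ASCII (e.g. example.com): nothing to convert.
    return "ascii"


for d in ["café.com", "中文.com", "xn--caf-dma.com", "example.com"]:
    print(d, "->", detect_format(d))
```

The check for non-ASCII characters runs first, so a name that mixes a Unicode label with an xn-- label is treated as Unicode input.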

2

Instant conversion

The domain is converted between its readable Unicode representation and its ASCII Punycode (xn--...) representation using the RFC 3492 standard.
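For the Punycode-to-Unicode direction, here is a minimal Python sketch of the RFC 3492 decoder applied label by label. It is an illustration, not this tool's code: `punycode_decode` and `to_unicode` are hypothetical names, and the sketch leaves out the validation and normalization (UTS #46 mapping, NFC, round-trip checks) that a production converter performs.

```python
# Bootstring parameters for Punycode (RFC 3492, section 5)
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def adapt(delta: int, numpoints: int, firsttime: bool) -> int:
    """Bias adaptation (RFC 3492, section 6.1)."""
    delta = delta // DAMP if firsttime else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def digit_value(ch: str) -> int:
    """Map a Punycode digit to 0..35: a-z/A-Z -> 0..25, 0-9 -> 26..35."""
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A")
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    raise ValueError(f"invalid Punycode digit: {ch!r}")


def punycode_decode(encoded: str) -> str:
    """Decode one Punycode label (without 'xn--') into Unicode."""
    # Basic (ASCII) code points precede the last '-', if there is one.
    pos = encoded.rfind("-")
    output = list(encoded[:pos]) if pos > 0 else []
    digits = encoded[pos + 1:] if pos > 0 else encoded

    n, i, bias, idx = INITIAL_N, 0, INITIAL_BIAS, 0
    while idx < len(digits):
        # Read one generalized variable-length integer.
        old_i, w, k = i, 1, BASE
        while True:
            if idx >= len(digits):
                raise ValueError("truncated Punycode input")
            digit = digit_value(digits[idx])
            idx += 1
            i += digit * w
            t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
            if digit < t:
                break
            w *= BASE - t
            k += BASE
        # The integer encodes both the code point (n) and its position (i).
        bias = adapt(i - old_i, len(output) + 1, old_i == 0)
        n += i // (len(output) + 1)
        i %= len(output) + 1
        output.insert(i, chr(n))
        i += 1
    return "".join(output)


def to_unicode(domain: str) -> str:
    """Decode every 'xn--' label of a domain; leave other labels unchanged."""
    return ".".join(
        punycode_decode(label[4:]) if label.lower().startswith("xn--") else label
        for label in domain.split(".")
    )


print(to_unicode("xn--mnchen-3ya.de"))  # münchen.de
print(to_unicode("xn--caf-dma.com"))    # café.com
```

Each integer read from the digits yields both the next code point and its insertion position, which is why the decoder inserts characters into the middle of the output instead of appending them.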

3

Verify and copy

Copy the result to use it in DNS configurations, domain registrations, or to check if a suspicious domain uses homograph characters.

Got questions?

Punycode is an encoding algorithm defined by the IETF in RFC 3492 (published in 2003) to represent internationalized domain names (IDNs) using only ASCII characters. DNS hostnames were originally restricted to ASCII letters (a-z), digits (0-9) and the hyphen, which excluded languages that use accented or non-Latin characters (Chinese, Arabic, Cyrillic, etc.). Punycode solves this by encoding the Unicode characters of each label as an ASCII string, which in domain names carries the prefix xn--. For example, münchen.de becomes xn--mnchen-3ya.de.
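The encoding step can be made concrete with a minimal Python sketch of the RFC 3492 encoder for one label. ASCII characters are copied first, followed by a '-' delimiter. Each non-ASCII code point is then encoded, in increasing numeric order, as a delta written in a variable-length base-36 notation with an adaptive bias. `punycode_encode` is a hypothetical name. The sketch skips the case mapping and validation that IDNA applies before Punycode, and it cross-checks its result against Python's built-in `punycode` codec.

```python
# Bootstring parameters for Punycode (RFC 3492, section 5)
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def adapt(delta: int, numpoints: int, firsttime: bool) -> int:
    """Bias adaptation (RFC 3492, section 6.1)."""
    delta = delta // DAMP if firsttime else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def encode_digit(d: int) -> str:
    """Map 0..25 to 'a'..'z' and 26..35 to '0'..'9'."""
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def punycode_encode(label: str) -> str:
    """Encode one Unicode label as Punycode (without the 'xn--' prefix)."""
    code_points = [ord(ch) for ch in label]
    # 1. Copy the basic (ASCII) code points, then '-' if any were copied.
    output = [ch for ch in label if ord(ch) < 0x80]
    b = h = len(output)
    if b:
        output.append("-")

    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    # 2. Handle the non-ASCII code points in increasing numeric order.
    while h < len(code_points):
        m = min(cp for cp in code_points if cp >= n)
        delta += (m - n) * (h + 1)  # skip over code points in [n, m)
        n = m
        for cp in code_points:
            if cp < n:
                delta += 1
            elif cp == n:
                # 3. Emit delta as a generalized variable-length integer.
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    output.append(encode_digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(encode_digit(q))
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)


print(punycode_encode("münchen"))                   # mnchen-3ya
print("xn--" + punycode_encode("münchen") + ".de")  # xn--mnchen-3ya.de
# Cross-check against Python's built-in RFC 3492 codec
print(punycode_encode("münchen") == "münchen".encode("punycode").decode("ascii"))  # True
```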

The xn-- prefix is the ACE (ASCII Compatible Encoding) prefix defined in the IDNA (Internationalizing Domain Names in Applications) standard. It indicates that the label following it is the Punycode representation of Unicode characters. The prefix is added only to the labels (dot-separated parts) of the domain that contain non-ASCII characters; if every label is non-ASCII, every label gets the prefix. For example, in bücher.de the label bücher becomes xn--bcher-kva, giving xn--bcher-kva.de. ASCII TLDs such as .de and .com need no encoding, but IDN TLDs do: .рф (Russia) is encoded as xn--p1ai.
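The per-label rule can be sketched with Python's built-in `punycode` codec. This is a simplified sketch under stated assumptions: the helper names are hypothetical, and real IDNA processing (nameprep in IDNA 2003, UTS #46 in browsers) also lowercases, normalizes and validates each label before encoding, which is skipped here.

```python
def label_to_ascii(label: str) -> str:
    """Add the ACE prefix and Punycode-encode a label only if it is non-ASCII."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def label_to_unicode(label: str) -> str:
    """Strip the ACE prefix and decode a label that starts with 'xn--'."""
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


def domain_to_ascii(domain: str) -> str:
    # Each dot-separated label is handled independently.
    return ".".join(label_to_ascii(label) for label in domain.split("."))


def domain_to_unicode(domain: str) -> str:
    return ".".join(label_to_unicode(label) for label in domain.split("."))


print(domain_to_ascii("bücher.de"))           # xn--bcher-kva.de (only 'bücher' changes)
print(domain_to_unicode("xn--bcher-kva.de"))  # bücher.de
print(label_to_ascii("рф"))                   # xn--p1ai (an IDN TLD is encoded too)
```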

An IDN homograph attack (also called a homoglyph attack or script spoofing) occurs when an attacker registers a domain that looks identical to a legitimate one but uses characters from a different script. For example, the Latin letter 'a' (U+0061) is visually identical to the Cyrillic letter 'а' (U+0430). A domain like pаypal.com with the Cyrillic 'а' is a different domain from paypal.com with the Latin 'a', even though the two look the same. This tool lets you check whether a suspicious domain contains non-ASCII characters by revealing its Punycode representation.
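Revealing a homograph takes little more than listing the non-ASCII code points in the name. Below is a minimal Python sketch using the standard `unicodedata` module and the built-in `idna` codec. The helper name `reveal_non_ascii` is hypothetical, and the type hints require Python 3.9+.

```python
import unicodedata


def reveal_non_ascii(domain: str) -> list[tuple[int, str, str]]:
    """List (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(domain)
        if ord(ch) > 0x7F
    ]


suspicious = "p\u0430ypal.com"  # renders like paypal.com, but index 1 is Cyrillic

print(suspicious == "paypal.com")    # False
print(reveal_non_ascii(suspicious))  # [(1, 'U+0430', 'CYRILLIC SMALL LETTER A')]
print(suspicious.encode("idna"))     # b'xn--pypal-4ve.com'
```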

ICANN (Internet Corporation for Assigned Names and Numbers) approved IDN ccTLDs (internationalized country-code top-level domains) through its Fast Track process in 2009, and opened applications for generic IDN TLDs in 2012 as part of the New gTLD Program. The first IDN ccTLDs to go live were in Arabic (Egypt, Saudi Arabia) and Cyrillic (Russia) in 2010. Today there are TLDs entirely in Chinese, Arabic, Devanagari, Hebrew and other scripts. ICANN's IDN guidelines also address homograph attacks: registries must not allow characters from different scripts to be mixed in the same label (the script-mixing restriction), except for languages whose established orthography requires it, such as Japanese.
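A script-mixing check can be approximated in a few lines. This Python sketch is a rough heuristic, not ICANN's rule or the Unicode Script property: it treats the first word of each letter's Unicode name (LATIN, CYRILLIC, GREEK, ...) as its script. `scripts_in_label` and `is_mixed_script` are hypothetical names. A real implementation would use the Script property from UAX #24 plus language-specific exceptions.

```python
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Approximate the scripts of the letters in one label (heuristic).

    Uses the first word of each letter's Unicode name, e.g. 'LATIN' or
    'CYRILLIC'. This is NOT the Unicode Script property (UAX #24).
    """
    scripts = set()
    for ch in label:
        if unicodedata.category(ch).startswith("L"):  # letters only; skip digits and '-'
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(domain: str) -> bool:
    """True if any single label mixes letters from more than one (approximate) script."""
    return any(len(scripts_in_label(label)) > 1 for label in domain.split("."))


print(sorted(scripts_in_label("p\u0430ypal")))  # ['CYRILLIC', 'LATIN']
print(is_mixed_script("p\u0430ypal.com"))       # True
print(is_mixed_script("paypal.com"))            # False
```

Two limits are worth noting. A Japanese label that legitimately combines kanji and kana is flagged (CJK plus HIRAGANA). A whole-script confusable such as аррӏе.com passes, because each of its labels uses a single script.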

There are several ways to check whether a domain uses Punycode:
1) Use this tool: paste the domain to see whether it contains hidden Unicode characters.
2) Inspect the browser's address bar: Chrome and Firefox display the Punycode form (xn--...) instead of Unicode when they detect a potential homograph attack.
3) Run a command in the terminal: python3 -c "import idna; print(idna.encode('domain.com'))" (requires the third-party idna package), or in Node.js: require('url').domainToASCII('domain.com'). The older require('punycode').toASCII('domain.com') still works, but Node's built-in punycode module is deprecated.
4) Query DNS with nslookup or dig: the name shown in the response may reveal the Punycode form.
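Option 3 can also be done without third-party packages. The Python sketch below relies on the standard library's `idna` codec, which implements the older IDNA 2003 rules (RFC 3490). `inspect_domain` is a hypothetical helper name. For IDNA 2008 behavior, such as keeping 'ß' instead of mapping it to 'ss', the third-party `idna` package is the better choice.

```python
def inspect_domain(domain: str) -> dict:
    """Hypothetical helper: ASCII form, Unicode form and whether Punycode is involved.

    Uses Python's built-in 'idna' codec (IDNA 2003 / RFC 3490), which applies
    nameprep (e.g. lowercasing) and Punycode label by label.
    """
    ascii_form = domain.encode("idna").decode("ascii")
    unicode_form = ascii_form.encode("ascii").decode("idna")
    return {
        "ascii": ascii_form,
        "unicode": unicode_form,
        "uses_punycode": any(
            label.lower().startswith("xn--") for label in ascii_form.split(".")
        ),
    }


print(inspect_domain("bücher.de"))
# {'ascii': 'xn--bcher-kva.de', 'unicode': 'bücher.de', 'uses_punycode': True}
print(inspect_domain("xn--pypal-4ve.com")["unicode"])  # Unicode form, Cyrillic letter at index 1
print(inspect_domain("example.com")["uses_punycode"])  # False
```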

IDN history (ICANN 2003), domain spoofing prevention, and web internationalization

The history of internationalized domain names (IDN) begins with the recognition that DNS, designed in the 1980s by Paul Mockapetris, only allowed ASCII in hostnames, excluding the languages of most of the world's population. The first standardization efforts began in the 1990s. RFC 3492 (Punycode) was published in March 2003, and the IDNA standard (RFC 3490) the same year. ICANN approved support for IDN ccTLDs in October 2009 after years of debate, and the first fully Arabic top-level domains (.مصر for Egypt, .السعودية for Saudi Arabia) went live in 2010.

Domain spoofing through Unicode homographs is a real security threat. Evgeniy Gabrilovich and Alex Gontmakher described the attack in 2001 and demonstrated it by registering a variant of microsoft.com that contained Cyrillic characters. In 2005, researchers from the Shmoo Group registered pаypal.com with a Cyrillic 'а' and showed that several browsers displayed it exactly like the real domain. In 2017, researcher Xudong Zheng demonstrated that it was possible to register аррӏе.com (spelled entirely with Cyrillic characters that look like Latin letters) and obtain a valid TLS certificate for it, making the fake site very hard to distinguish from the real one. Modern browsers responded by displaying Punycode in the address bar when they detect mixed scripts or characters from uncommon scripts.

Web internationalization (i18n) goes beyond domains. Unicode, developed since 1991 by the Unicode Consortium, defines over 140,000 characters from virtually all of the world's writing systems. UTF-8, which encodes Unicode as bytes while keeping ASCII characters as their single-byte values, is now the universal standard for the web (over 98% of websites use UTF-8, according to W3Techs). Support for internationalized domain names completes the picture for web addresses: with IDN, URLs can be written entirely in a native language. The WHATWG URL Standard, implemented by browsers, converts Unicode hostnames to ASCII using UTS #46 processing, which applies Punycode to each non-ASCII label.
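The UTF-8 point also explains why domain names still need Punycode. UTF-8 leaves ASCII characters as single bytes, but every non-ASCII character becomes bytes of 0x80 or above. Those bytes fall outside the letter-digit-hyphen (LDH) alphabet that hostname rules allow. A small Python sketch follows; the `is_ldh` helper is hypothetical.

```python
import string

# Letters, digits and hyphen: the traditional hostname (LDH) alphabet
LDH = set(string.ascii_letters + string.digits + "-")


def is_ldh(label: str) -> bool:
    """True if a non-empty label uses only LDH characters."""
    return bool(label) and all(ch in LDH for ch in label)


label = "münchen"
utf8 = label.encode("utf-8")  # ASCII letters stay 1 byte each; 'ü' becomes 2 bytes
puny = label.encode("punycode").decode("ascii")

print(utf8)                # b'm\xc3\xbcnchen'
print(is_ldh(label))       # False
print(puny, is_ldh(puny))  # mnchen-3ya True
```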