The Unicode Text Debugger @ unicode.run

Try an example!

🧑🏾‍❤️‍💋‍🧑🏻 The most complex emoji in the current Unicode standard is composed of 10 code points including skin color modifiers, zero-width joiners, and a variation selector.
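
As a rough illustration of what "10 code points" means, the sketch below spells the emoji out with explicit escapes and prints each code point with its Unicode name. The exact sequence is an assumption based on how this emoji is normally encoded, and `KISS` and `describe` are hypothetical names used only for this example.

```python
import unicodedata

# The example emoji written with explicit escapes so every code point is visible.
# Assumed sequence: person + skin tone, ZWJ, heart + VS16, ZWJ, kiss mark, ZWJ, person + skin tone.
KISS = (
    "\U0001F9D1\U0001F3FE"  # person + medium-dark skin tone modifier
    "\u200D"                # zero-width joiner
    "\u2764\uFE0F"          # heart + variation selector-16
    "\u200D"                # zero-width joiner
    "\U0001F48B"            # kiss mark
    "\u200D"                # zero-width joiner
    "\U0001F9D1\U0001F3FB"  # person + light skin tone modifier
)


def describe(text):
    """Return a (U+XXXX, name) pair for each code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


for code, name in describe(KISS):
    print(code, name)

print(len(KISS))  # Python's len() counts code points: 10
```

A renderer with emoji support draws all of this as a single glyph, which is why the visible "character" count and the code point count disagree.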

S̶t̶r̶i̶k̶e̶о𝘂𝘁 See how combining characters and misusing unusual characters can be used to create interesting text effects and homographs.

Å != Å Learn about composing characters and normalized forms.
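
A minimal sketch of why two strings that look identical can compare unequal, assuming the example pairs the precomposed letter (U+00C5) with a base letter plus a combining ring (U+0041 U+030A). The helper name `same_after_normalizing` is made up for this illustration.

```python
import unicodedata

composed = "\u00C5"     # LATIN CAPITAL LETTER A WITH RING ABOVE (one code point)
decomposed = "A\u030A"  # LATIN CAPITAL LETTER A + COMBINING RING ABOVE (two code points)


def same_after_normalizing(a, b, form="NFC"):
    """Compare two strings after converting both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(composed == decomposed)                        # False: different code points
print(same_after_normalizing(composed, decomposed))  # True: equal once normalized
print(unicodedata.normalize("NFD", composed) == decomposed)  # True: NFD splits the letter
```

Normalizing both sides to one form before comparing is the usual way to treat such strings as equal.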

‮12345‬ This text renders backwards from the order of its characters using BIDI control code points. Inspired by https://trojansource.codes/.
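
The hidden characters behind an effect like this can be located with a simple scan. The sketch below assumes the example wraps the digits in a right-to-left override (U+202E) and a pop directional formatting character (U+202C). `BIDI_CONTROLS` and `find_bidi_controls` are hypothetical names, and the set lists the explicit bidirectional formatting characters rather than everything a full checker might flag.

```python
# Explicit bidirectional formatting characters: marks, embeddings, overrides, isolates.
BIDI_CONTROLS = {
    "\u061C", "\u200E", "\u200F",                      # ALM, LRM, RLM
    "\u202A", "\u202B", "\u202C", "\u202D", "\u202E",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def find_bidi_controls(text):
    """Return (index, U+XXXX) for every bidi control character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


sample = "\u202E12345\u202C"  # stored order is 1 2 3 4 5, wrapped in RLO ... PDF
print(find_bidi_controls(sample))  # [(0, 'U+202E'), (6, 'U+202C')]
```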

Hi! ‏(שלום!)‏ This example contains bidirectional text with BIDI glyph mirroring and right-to-left markers. Inspired by https://blog.georeactor.com/osm-1.

(שלום!) This bidirectional text displays differently depending on context. Inspired by https://blog.georeactor.com/osm-1.

↙ ~ ↙️ and 你好！ ~ 你好！︁ Examples of an emoji variation sequence and an East Asian punctuation positional variant using variation selectors.

Send me other interesting Unicode examples at @josh@joshdata.me on Mastodon.

About Unicode.run

Text is unexpectedly complicated. Use Unicode.run to debug text.

Here are some things you can do:

See each code point’s escape code in a variety of programming languages.
See the “length” of the text as it would be reported in different programming languages.
See when characters (technically “extended grapheme clusters”) are composed of multiple code points.
Click code points in the debugger output to highlight them in the text. (In Firefox you can also select text to highlight the code points in the debugger output.)
Switch between the text and its UTF-32 or UTF-16BE hex encodings at the top of the page.
See where text changes direction in bidirectional text, and get warnings when text direction depends on where it is used. Mirrored glyphs in bidirectional text are also noted.
Get warnings about hidden code points that can alter the display of the text (see https://trojansource.codes/), invalidly placed combining code points, invalid code points, and characters that are not in normalized form.
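
To make the "length" and hex-encoding items above concrete, here is a small sketch of how the same text yields different counts depending on the unit being counted. It is an illustration, not the site's implementation, and `lengths` and `hex_dump` are hypothetical helpers. Roughly speaking, Python's `len()` counts code points, JavaScript's `.length` and Java's `String.length()` count UTF-16 code units, and Rust's `str::len()` and Go's `len()` count UTF-8 bytes.

```python
def lengths(text):
    """Count the same text in code points, UTF-16 code units, and UTF-8 bytes."""
    return {
        "code_points": len(text),
        "utf16_code_units": len(text.encode("utf-16-be")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }


def hex_dump(text, encoding):
    """Hex-encode text as space-separated units; expects 'utf-16-be' or 'utf-32-be'."""
    width = 4 if encoding == "utf-32-be" else 2  # bytes per unit
    data = text.encode(encoding)
    return " ".join(data[i:i + width].hex().upper() for i in range(0, len(data), width))


sample = "A\U0001F48B"  # "A" followed by the kiss mark emoji

print(lengths(sample))
# {'code_points': 2, 'utf16_code_units': 3, 'utf8_bytes': 5}
print(hex_dump(sample, "utf-16-be"))  # 0041 D83D DC8B
print(hex_dump(sample, "utf-32-be"))  # 00000041 0001F48B
```

The UTF-16BE dump shows the emoji as a surrogate pair (two code units), which is why UTF-16-based languages report a longer length than the code point count.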

This is a project by JoshData.

Thanks to ucd-full (based on Unicode 15.1), stdlib-js/string-split-grapheme-clusters (based on Unicode 13), bidi-js (based on Unicode 13), html-entities, and the Inter Typeface.

Nikita Prokopov’s “The Absolute Minimum Every Software Developer Must Know About Unicode in 2023 (Still No Excuses!)” was the inspiration for this project.