Track Awesome Unicode Updates Weekly

😂 👌 A curated list of delightful Unicode tidbits, packages and resources.

😺 jagracey/Awesome-Unicode · ⭐ 809 · 🏷️ Miscellaneous


Jul 04 - Jul 10, 2016

💥 Lowercase Transformation Collisions / Wait a second... what did I just read?

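String length is typically determined by counting code units rather than code points. JavaScript's String length, for example, counts UTF-16 code units, so a character stored as a surrogate pair counts as two. Combining diacritics also inflate the length, since several of them may be stacked over the same base character: a + U+0308 COMBINING DIAERESIS == ä, which increases the length while producing only a single visible character.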
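The gap between code points and UTF-16 code units is easy to observe from Python, whose len() counts code points. The helper utf16_length below is a hypothetical name used only for illustration; it approximates what JavaScript's String length reports by counting UTF-16 code units.

```python
def utf16_length(text: str) -> int:
    """Count UTF-16 code units, the unit JavaScript's String length reports."""
    # utf-16-le adds no byte order mark; every code unit is exactly 2 bytes.
    return len(text.encode("utf-16-le")) // 2


samples = {
    "ASCII letter": "a",
    "emoji (surrogate pair in UTF-16)": "\U0001F600",
    "a + COMBINING DIAERESIS": "a\u0308",
}

for label, text in samples.items():
    print(f"{label}: {len(text)} code point(s), {utf16_length(text)} UTF-16 code unit(s)")
```

Neither count matches what a reader sees for the last sample: one visible character, two code points.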

Similarly, reversing a string is often a non-trivial task. Again, surrogate pairs must not be split, and combining diacritics must stay attached to their base characters. ES Reverser (⭐859) provides a pretty good solution.
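A minimal Python sketch of the idea groups each base character with the combining marks (general category M*) that follow it, then reverses the groups. This is a deliberate simplification of the extended grapheme clusters defined in UAX #29: it ignores ZWJ emoji sequences, regional-indicator flags and Hangul jamo. Python strings hold code points, so surrogate pairs are not an issue here. The names split_clusters and reverse_text are hypothetical.

```python
import unicodedata


def split_clusters(text: str) -> list[str]:
    """Attach combining marks to the preceding base character (simplified clusters)."""
    clusters: list[str] = []
    for ch in text:
        # Mn, Mc and Me are the mark categories; they belong with the previous base.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def reverse_text(text: str) -> str:
    """Reverse the order of the simplified clusters, keeping marks on their bases."""
    return "".join(reversed(split_clusters(text)))


word = "man\u0303ana"  # "mañana" with ñ written as n + U+0303 COMBINING TILDE
print(repr(word[::-1]))          # naive reversal moves the tilde onto an "a"
print(repr(reverse_text(word)))  # the tilde stays on the "n"
```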

Upper and lower case mappings are not always one-to-one. They can also be:
- One-to-many: (ß → SS)
- Contextual: (…Σ ↔ …ς and …ΣΤ… ↔ …στ…)
- Locale-sensitive: (I ↔ ı and İ ↔ i)
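Python's built-in str.upper() and str.lower() apply Unicode's full default case mappings (including the SpecialCasing rules) but no locale tailoring, so they show the first two behaviours directly and the third only by its absence. Turkish-aware casing needs a locale-aware library such as ICU; that is a pointer, not something this sketch uses.

```python
import unicodedata

# One-to-many: uppercasing U+00DF LATIN SMALL LETTER SHARP S yields two letters.
upper_eszett = "\u00df".upper()             # 'SS'

# Contextual: capital sigma lowercases to final sigma at the end of a word
# and to ordinary sigma elsewhere. The input is "ΣΑΣ" in Greek capitals.
lower_sigma = "\u03a3\u0391\u03a3".lower()  # 'σας'

# Locale-sensitive: str.lower() ignores the locale, so Turkish rules are NOT
# applied. "I" becomes "i" (not dotless "ı"), and U+0130 "İ" becomes
# "i" followed by U+0307 COMBINING DOT ABOVE.
lower_capital_i = "I".lower()               # 'i'
lower_dotted_capital_i = "\u0130".lower()   # 'i\u0307'

for label, value in [
    ("ß.upper()", upper_eszett),
    ("ΣΑΣ.lower()", lower_sigma),
    ("I.lower()", lower_capital_i),
    ("İ.lower()", lower_dotted_capital_i),
]:
    names = ", ".join(unicodedata.name(ch) for ch in value)
    print(f"{label} -> {value!r} ({len(value)} code point(s): {names})")
```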

Unicode Blocks / Wait a second... what did I just read?

Version 9.0.0 (Latest Version, June 2016 - adds exactly 7,500 characters)

Version 8.0.0

Jun 13 - Jun 19, 2016

One-To-Many Case Mappings / Wait a second... what did I just read?

python-ftfy (⭐3.3k) - Given Unicode text, make its representation consistent and possibly less broken.

vim-troll-stopper (⭐166) - Stop Unicode trolls from messing with your code.

Recursive HTML Tag Renaming Script / Wait a second... what did I just read?

What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text

May 30 - Jun 05, 2016

Myths of Unicode

Unicode is simply a 16-bit code - Some people are under the misconception that Unicode is simply a 16-bit code where each character takes 16 bits and therefore there are 65,536 possible characters. This is not correct: code points range from U+0000 to U+10FFFF, giving 1,114,112 possible code points, and characters above U+FFFF need two 16-bit units in UTF-16. It is the single most common myth about Unicode, so if you thought that, don't feel bad.
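The arithmetic behind this can be sketched in a few lines of Python. The helper to_surrogate_pair is a hypothetical name; it applies the standard UTF-16 rule of subtracting 0x10000 and splitting the remaining 20 bits into two 10-bit halves.

```python
MAX_CODE_POINT = 0x10FFFF
TOTAL_CODE_POINTS = MAX_CODE_POINT + 1  # 17 planes x 65,536 = 1,114,112


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= MAX_CODE_POINT:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F600)  # U+1F600 GRINNING FACE
print(hex(high), hex(low))              # 0xd83d 0xde00
```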

You can use any unassigned codepoint for internal use - No. Eventually that hole will be filled with a different character. Instead, use private-use code points or noncharacters.

Every Unicode code point represents a character - No. There are 66 noncharacters (U+FFFE, U+FFFF, U+1FFFE, …). There are also surrogate code points, private-use and unassigned code points, and control/format “characters” (RLM, ZWNJ, …).
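A rough way to sort code points into these kinds is sketched below with Python's unicodedata module, loosely following the basic code point types described in chapter 2 of the Unicode Standard. classify_code_point is a hypothetical helper, and the "reserved" result depends on the Unicode version bundled with the Python build.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF plus the last two code points of each of the 17 planes
    # (U+xxFFFE and U+xxFFFF): 32 + 34 = 66 noncharacters. Assumes cp <= 0x10FFFF.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def classify_code_point(cp: int) -> str:
    if is_noncharacter(cp):
        return "noncharacter"  # unicodedata also reports these as "Cn"
    category = unicodedata.category(chr(cp))
    if category == "Cs":
        return "surrogate"
    if category == "Co":
        return "private-use"
    if category == "Cn":
        return "reserved"      # unassigned in this Python's Unicode version
    if category == "Cc":
        return "control"
    if category in ("Cf", "Zl", "Zp"):
        return "format"
    return "graphic"           # letters, marks, numbers, punctuation, symbols, spaces


for cp in (0x0041, 0x200F, 0x200C, 0xE000, 0xD800, 0xFFFE, 0x1FFFE, 0x0378):
    print(f"U+{cp:04X}: {classify_code_point(cp)}")
```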

Unicode will run out of space - If it were linear, we would run out in 2140 AD. But it isn't linear. See http://www.unicode.org/roadmaps/

Case mappings are one-to-one - No. They can also be:
- One-to-many: (ß → SS)
- Contextual: (…Σ ↔ …ς and …ΣΤ… ↔ …στ…)
- Locale-sensitive: (I ↔ ı and İ ↔ i)

One-To-Many Case Mappings / Wait a second... what did I just read?

PhantomScript (⭐39) - 👻 🔦 Invisible JavaScript code execution & social engineering

ESReverser (⭐859) - A Unicode-aware string reverser written in JavaScript.

mimic (⭐3.7k) - [ab]using Unicode to create tragedy

Unicode Consortium's Emoji Chart

Emojipedia - Information about specific emoji, plus a news blog.

emojitracker - Realtime emoji use on Twitter.

World Translation Foundation - A way to promote, explore, and translate the written word into the pictorial alphabet of Emoji.

Can I Emoji? - Displays the current status of native Emoji support across iOS, Android and Windows.

How to register an emoji URL

Recursive HTML Tag Renaming Script / Wait a second... what did I just read?

https://en.wikipedia.org/wiki/Unicode_font#List_of_Unicode_fonts

http://www.unifont.org/fontguide/

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets - By Joel Spolsky

The Unicode Consortium's Recommended Reading List

Space Yourself - Smashing Magazine's Spacing Guide

JavaScript has a Unicode Problem

Creative usernames and Spotify account hijacking

Shapecatcher - Draw the character you're looking for.

Confusable Unicode Characters

Unicode Character Database

Database Dumps of Codepoints.net

Unicode Blocks List

Unicode Character Code Charts

Unicode Case Charts

Unicode Normalization Chart

Unicode FAQ

Unicode Blocks / Wait a second... what did I just read?

Universal repertoire - Every writing system ever used shall be respected and represented in the standard.

Logical order - In bidirectional text, characters are stored in logical order, not in the order required for their visual representation.

Efficiency - The documentation must be efficient and complete.

Unification - Where different cultures or languages use the same character, it shall be included only once. This point is …

Characters, not glyphs - Only characters, not glyphs, shall be encoded. In a nutshell, glyphs are the actual graphical representations of characters.

Dynamic composition - New characters can be composed of other, already standardized characters. For example, the character “Ä” can be composed of an “A” followed by a combining diaeresis (U+0308, the combining form of “¨”).
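Python's unicodedata.normalize shows dynamic composition at work: normalization form NFC composes a base letter plus a combining mark into a precomposed character where one exists, and NFD splits it back apart. The variable names are illustrative only.

```python
import unicodedata

decomposed = "A\u0308"  # "A" + U+0308 COMBINING DIAERESIS
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))                        # 2 1
print(composed == "\u00c4")                                  # True: U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
print(unicodedata.normalize("NFD", "\u00c4") == decomposed)  # True
print(decomposed == "\u00c4")                                # False: normalize before comparing
```

The last line is why text comparison usually normalizes both sides to the same form first.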

Semantics - Included characters must be well defined and distinguished from others.

Stability - Once defined, characters shall never be removed and their code points never reassigned. In the case of an error, a code point shall be deprecated.

Plain Text - Characters in the standard are text and never mark-up or metacharacters.

Convertibility - Every other encoding in use shall be representable in terms of a Unicode encoding.
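Convertibility can be observed with Python's built-in codecs: text stored in a legacy character set decodes to a Unicode string, which can be kept as UTF-8 and converted back to the identical legacy bytes. The round_trips helper and the sample strings are illustrative assumptions; a successful round trip for these samples does not prove that every legacy mapping is lossless.

```python
def round_trips(text: str, legacy_codec: str) -> bool:
    """Legacy bytes -> Unicode -> UTF-8 -> Unicode -> legacy bytes, unchanged?"""
    legacy_bytes = text.encode(legacy_codec)     # text as a legacy system stores it
    unicode_text = legacy_bytes.decode(legacy_codec)
    utf8_bytes = unicode_text.encode("utf-8")    # store or transmit as UTF-8
    back = utf8_bytes.decode("utf-8").encode(legacy_codec)
    return back == legacy_bytes


legacy_samples = {
    "cp1252": "café, naïve, 5 €",
    "shift_jis": "日本語",
    "koi8_r": "Привет",
}

for codec, sample in legacy_samples.items():
    print(f"{codec}: round trip ok = {round_trips(sample, codec)}")
```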

Version 7.0.0

Version 6.3.0

Version 6.2.0

Version 6.1.0

Version 6.0.0

Version 5.2.0

Version 5.1.0

Version 5.0.0 (unavailable)

Version 4.0.1

Version 4.0.0