@flipeador
Last active January 24, 2025 17:28
Unicode characters and emojis.

Unicode Characters & Emojis

A character is an overloaded term that can mean many things. This gist will briefly touch on the differences between Code Point, Code Unit, Grapheme and Glyph, with a particular focus on emojis.

Tip

Use Full Emoji Support For All Websites to ensure correct rendering of all emoji graphemes on any website.

Code Point

A code point is the atomic unit of information: a numerical value that maps to a specific character, whose meaning is given by the Unicode® Standard. A code point represents a digit, letter, whitespace, punctuation mark, emoji, symbol, control character, or formatting character. Text is a sequence of code points.
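As a rough illustration (not part of the original gist), the sketch below converts between characters and their code point values with the standard String.fromCodePoint() and String.prototype.codePointAt() methods. The helper name toCodePointLabel is hypothetical.

```javascript
// Hypothetical helper: format a code point in the usual U+XXXX notation.
function toCodePointLabel(codePoint) {
  return 'U+' + codePoint.toString(16).toUpperCase().padStart(4, '0');
}

// From a character to its numerical value.
const grinning = '\u{1F600}'; // 😀
const value = grinning.codePointAt(0);
console.log(toCodePointLabel(value)); // 'U+1F600'

// From a numerical value back to a character.
console.log(String.fromCodePoint(0x41)); // 'A'
console.log(String.fromCodePoint(0xA9)); // '©'

// Text is a sequence of code points: Array.from() walks a string by code point.
const labels = Array.from('A\u00A9\u{1F600}', ch => toCodePointLabel(ch.codePointAt(0)));
console.log(labels); // ['U+0041', 'U+00A9', 'U+1F600']
```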

Code Unit

A code unit is the unit of storage of a part of an encoded code point.

  • Some characters are encoded using multiple code units, resulting in a variable-length encoding.
  • UTF-32 is a fixed-width encoding, hence every code point is encoded in a single 32-bit code unit.

The following table shows the three most common character encoding schemes and their corresponding word size. It also lists five sample characters with the number of code units (CU) each one needs in the corresponding encoding.

Encoding | Word Size | Width Type | A | © | | 😀 | 👉🏿
UTF-8 | 8-bit | Variable | 1 CU | 2 CU | 3 CU | 4 CU | 8 CU
UTF-16 | 16-bit | Variable | 1 CU | 1 CU | 1 CU | 2 CU | 4 CU
UTF-32 | 32-bit | Fixed | 1 CU | 1 CU | 1 CU | 1 CU | 2 CU

Note

  • The character 👉🏿 is a combination of two code points, which requires two 32-bit code units.
  • UTF-32 does not always have a 1:1 mapping between code points and what a user perceives as characters.
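The counts in the table can be reproduced with a small sketch like the one below. The helper name countCodeUnits is hypothetical, and the sketch assumes well-formed strings (TextEncoder replaces lone surrogates with U+FFFD, which would change the UTF-8 count).

```javascript
// Hypothetical helper: count the code units a string needs in each encoding.
function countCodeUnits(str) {
  return {
    utf8: new TextEncoder().encode(str).length, // 8-bit code units (bytes)
    utf16: str.length,                          // 16-bit code units
    utf32: Array.from(str).length,              // 32-bit code units (one per code point)
  };
}

const samples = [
  'A',                  // U+0041
  '\u00A9',             // © U+00A9
  '\u{1F600}',          // 😀 U+1F600
  '\u{1F449}\u{1F3FF}', // 👉🏿 U+1F449 U+1F3FF
];

for (const sample of samples) {
  console.log(sample, countCodeUnits(sample));
}
// A  → { utf8: 1, utf16: 1, utf32: 1 }
// ©  → { utf8: 2, utf16: 1, utf32: 1 }
// 😀 → { utf8: 4, utf16: 2, utf32: 1 }
// 👉🏿 → { utf8: 8, utf16: 4, utf32: 2 }
```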

In JavaScript, strings are represented fundamentally as sequences of UTF-16 code units:

  • Every UTF-16 code unit is exactly 16 bits long.
  • Code units can be written in a string with \u followed by exactly four hex digits.
  • Code points can be written in a string with \u{xxxxxx} where xxxxxx represents 1–6 hex digits.

The entire Unicode character set is much bigger than what a single UTF-16 code unit can represent, so supplementary characters are encoded as two 16-bit code units called a surrogate pair. The length data property of a String value counts UTF-16 code units, not code points.
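The sketch below illustrates the escape notations and the surrogate pair arithmetic. The helper name toSurrogatePair is hypothetical; the arithmetic is the standard UTF-16 encoding of a supplementary code point.

```javascript
// The same character written three ways: as a literal code point escape,
// and as a surrogate pair of two UTF-16 code unit escapes.
const byCodePoint = '\u{1F600}';     // 😀
const bySurrogates = '\uD83D\uDE00'; // high surrogate + low surrogate
console.log(byCodePoint === bySurrogates); // true

// length counts UTF-16 code units, not code points.
console.log(byCodePoint.length); // 2

// Hypothetical helper: compute the surrogate pair of a supplementary code point.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError('Not a supplementary code point');
  }
  const offset = codePoint - 0x10000;     // 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

const pair = toSurrogatePair(0x1F600).map(cu => cu.toString(16));
console.log(pair); // ['d83d', 'de00']
```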

Grapheme

A grapheme is a sequence of one or more code points that are displayed as a single, graphical unit that a reader recognizes as a single element in the context of a particular writing system.

  • User-perceived characters are referred to as graphemes.
  • Some code points are never part of any grapheme (e.g. the ZWNJ, or directional overrides).

The following table shows an example with 3 characters:

Character Code Point Description
ä 000E4 Single code point.
ä 00061 00308 Base character (a), and combining diaeresis (◌̈).
👉🏿 1F449 1F3FF Backhand index pointing right (👉), and dark skin tone (🏿).

Note

  • The characters ä (000E4) and ä (00061 00308) are the same grapheme, but have different code points.
  • The character 👉🏿 is two code points, but is displayed as a single character.
  • The diaeresis character (¨) and the combining diaeresis (◌̈) are not the same code point.
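The two forms of ä can be compared in code. The sketch below uses the standard String.prototype.normalize() method, which the text above does not mention; it is added here as an assumption about how one might treat both forms as equal. The helper name canonicallyEqual is hypothetical.

```javascript
const precomposed = '\u00E4'; // ä as a single code point
const decomposed = 'a\u0308'; // a + combining diaeresis

// Same grapheme, different code points: a plain comparison fails.
console.log(precomposed === decomposed); // false
console.log(precomposed.length, decomposed.length); // 1 2

// Normalization converts between the composed (NFC) and decomposed (NFD) forms.
console.log(precomposed.normalize('NFD') === decomposed); // true
console.log(decomposed.normalize('NFC') === precomposed); // true

// Hypothetical helper: compare two strings by canonical equivalence.
function canonicallyEqual(a, b) {
  return a.normalize('NFC') === b.normalize('NFC');
}
console.log(canonicallyEqual(precomposed, decomposed)); // true
```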

In JavaScript, Intl.Segmenter splits a string into segments at grapheme cluster boundaries, as determined by a specified locale. The following JavaScript example demonstrates the differences between code point, code unit, and grapheme in different string encodings:
// Backhand Index Pointing Right: Dark Skin Tone.
const str = '👉🏿'; // 1F449 1F3FF
console.log('String:', str);

// Split using UTF-8 code units.
const encoder = new TextEncoder();
const encoded = encoder.encode(str);
const u8_cu = [...encoded].map(byte => byte.toString(16));
console.log('UTF-8:', u8_cu);
// ['f0', '9f', '91', '89', 'f0', '9f', '8f', 'bf']

// Split using UTF-16 code units.
// This is the default encoding in JavaScript.
const u16_cu = str.split('');
console.log('UTF-16:', u16_cu);
// ['\uD83D', '\uDC49', '\uD83C', '\uDFFF']

// Split using UTF-32 code units.
// A UTF-32 code unit is always a single code point.
const u32_cu = Array.from(str); // or [...str]
console.log('UTF-32:', u32_cu); // ['👉', '🏿']

// Split using graphemes.
const segmenter = new Intl.Segmenter();
const segment = segmenter.segment(str);
const graphemes = [...segment].map(x => x.segment);
console.log('Grapheme:', graphemes); // ['👉🏿']

In JavaScript, String.prototype.split() and String.prototype[Symbol.iterator]() (also written @@iterator) split the characters of a string in different ways. String indexes and split('') operate on UTF-16 code units, whereas the string iterator iterates by code points.
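A short sketch of the practical consequences, assuming a runtime that provides Intl.Segmenter. The helper name countGraphemes is hypothetical.

```javascript
const text = '\u{1F449}\u{1F3FF}!'; // 👉🏿 followed by an exclamation mark

// Indexing and length work on UTF-16 code units.
console.log(text.length);          // 5
console.log(text[0] === '\uD83D'); // true (a lone high surrogate)

// codePointAt() reads a whole code point when the index points at a high surrogate.
console.log(text.codePointAt(0).toString(16)); // '1f449'

// Iteration (for...of, spread, Array.from) yields code points.
console.log([...text].length); // 3

// Hypothetical helper: count user-perceived characters (graphemes).
function countGraphemes(str, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  let count = 0;
  for (const _segment of segmenter.segment(str)) count++;
  return count;
}
console.log(countGraphemes(text)); // 2
```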

Glyph

A glyph is an image, usually stored in a font (which is a collection of glyphs), used to represent graphemes or parts thereof.

Fonts may compose multiple glyphs into a single representation. For example, even if the above ä is a single code point, a font may choose to render it as two separate, spatially overlaid glyphs.

For OpenType fonts (OTF), the font's GSUB and GPOS tables contain the substitution and positioning information that makes this work. A font may also contain multiple alternative glyphs for the same grapheme.

Modifiers & Variation Selectors

Variation Selectors (VS) is a block (FE00-FE0F) within the Unicode character set containing 16 selectors, each used to specify a glyph variant for the preceding character. Certain characters can have variant forms or styles, and variation selectors are used to indicate these variations. These selectors are generally applied to base characters to modify their appearance, providing a means to represent different writing styles, scripts, or regional preferences.

Emojis can have diverse presentations, including variations in skin tone, gender, or other contextual modifications. The following are the emoji-specific variation selectors.

  • The VS-15 (FE0E) is used to request a text presentation (monochrome) for an emoji character (♀︎).
  • The VS-16 (FE0F) is used to request an emoji presentation (polychrome) for an emoji character (♀️).
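As a minimal sketch, the two selectors can be appended to a base character as shown below. Whether the result is actually drawn as text or as emoji depends on the font and platform. The helper name stripPresentationSelectors is hypothetical.

```javascript
const FEMALE_SIGN = '\u2640'; // ♀
const VS15 = '\uFE0E';        // requests text presentation
const VS16 = '\uFE0F';        // requests emoji presentation

const asText = FEMALE_SIGN + VS15;
const asEmoji = FEMALE_SIGN + VS16;

// Each sequence is two code points (and two UTF-16 code units).
console.log(asText.length, asEmoji.length); // 2 2
console.log(asText === asEmoji);            // false

// Hypothetical helper: drop VS-15/VS-16 so that strings can be compared
// regardless of the requested presentation.
function stripPresentationSelectors(str) {
  return str.replace(/[\uFE0E\uFE0F]/g, '');
}
console.log(stripPresentationSelectors(asText) === stripPresentationSelectors(asEmoji)); // true
```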

Skin tone variations are achieved using modifiers from 1F3FB to 1F3FF (🏻🏼🏽🏾🏿). They are based on the Fitzpatrick scale, a formal classification of human skin tones. When one of these code points is appended to an emoji that supports skin tone modifiers, it changes the skin tone of the emoji.

The following table lists all skin tone modifiers:
Character Code Point Name
🏻 1F3FB Light skin tone.
🏼 1F3FC Medium-light skin tone.
🏽 1F3FD Medium skin tone.
🏾 1F3FE Medium-dark skin tone.
🏿 1F3FF Dark skin tone.
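The sketch below appends one of the modifiers from the table to a base emoji. The names SKIN_TONES and withSkinTone are hypothetical, and the sketch only handles a base emoji that is a single code point; it uses the Emoji_Modifier_Base Unicode property escape to reject characters that do not support skin tones.

```javascript
// Code points of the five skin tone modifiers.
const SKIN_TONES = {
  light: 0x1F3FB,
  mediumLight: 0x1F3FC,
  medium: 0x1F3FD,
  mediumDark: 0x1F3FE,
  dark: 0x1F3FF,
};

// Hypothetical helper: append a skin tone modifier to a single-code-point emoji.
function withSkinTone(baseEmoji, tone) {
  const modifier = SKIN_TONES[tone];
  if (modifier === undefined) {
    throw new RangeError(`Unknown skin tone: ${tone}`);
  }
  // Only emoji with the Emoji_Modifier_Base property accept a modifier.
  if (!/^\p{Emoji_Modifier_Base}$/u.test(baseEmoji)) {
    throw new TypeError('Base emoji does not support skin tone modifiers');
  }
  return baseEmoji + String.fromCodePoint(modifier);
}

const hand = withSkinTone('\u{1F449}', 'dark'); // 👉 + 🏿
console.log(hand); // 👉🏿
const handCodePoints = [...hand].map(ch => ch.codePointAt(0).toString(16));
console.log(handCodePoints); // ['1f449', '1f3ff']
```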

Zero Width Joiner

The zero-width joiner (ZWJ) is a non-printing character (200D) used in the computerized typesetting of writing systems in which the shape or positioning of a grapheme depends on its relation to other graphemes (complex scripts). When placed between two characters that would otherwise not be connected, a ZWJ causes them to be printed in their connected forms.

When a ZWJ is placed between two emoji characters (or interspersed between multiple), it can result in a single glyph or new emoji, such as a family emoji (👨‍👩‍👦), made up of two adult emoji (👨 and 👩) and one or two child emoji (👦) joined by ZWJs. This differs from the standalone family emoji (👪), which is a single code point.

Note

When an emoji ZWJ sequence is not supported, the ZWJ characters are ignored and a fallback sequence of separate emoji is displayed. Thus an emoji ZWJ sequence should only be supported where the fallback sequence would also make sense to a viewer. See the Recommended Emoji ZWJ Sequences.
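A minimal sketch of building and taking apart an emoji ZWJ sequence. The helper name joinWithZwj is hypothetical, and whether the result renders as one glyph depends on font and platform support.

```javascript
const ZWJ = '\u200D';

// Hypothetical helper: join emoji with zero-width joiners.
function joinWithZwj(...parts) {
  return parts.join(ZWJ);
}

// Man + ZWJ + Woman + ZWJ + Boy.
const family = joinWithZwj('\u{1F468}', '\u{1F469}', '\u{1F466}');
console.log(family); // a single family glyph where supported

console.log([...family].length); // 5 code points (3 emoji + 2 ZWJ)
console.log(family.length);      // 8 UTF-16 code units

// Removing the joiners yields the fallback sequence of separate emoji.
const fallback = family.split(ZWJ);
console.log(fallback.length); // 3
```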

When working with skin tones, you should be careful with those emojis that consist of two or more emojis joined by the ZWJ.

  • Skin tone modifiers must be included after the emoji but before the ZWJ.
  • Some ZWJ sequences include multiple emoji that each have different skin tone modifiers.

The following example combines characters and skin tones:
String.fromCodePoint(
  0x1FAF1, // Rightwards Hand (🫱)
  0x1F3FB, // Light skin tone (🏻)
  0x0200D, // Zero Width Joiner
  0x1FAF2, // Leftwards Hand (🫲)
  0x1F3FF, // Dark skin tone (🏿)
); // Handshake: Light Skin Tone, Dark Skin Tone (🫱🏻‍🫲🏿)

Useful Unicode Characters

Mathematical

Binary operators

Grapheme Code Point Description
− 2212 Minus sign
× 00D7 Multiplication sign
÷ 00F7 Division sign
⋅ 22C5 Dot operator (sdot)
∕ 2215 Division slash

Unary operators

Grapheme Code Point Description
√ 221A Square root
∛ 221B Cube root
∜ 221C Fourth root
± 00B1 Plus-minus sign
∓ 2213 Minus-or-plus sign
′ 2032 Prime
″ 2033 Double prime
‴ 2034 Triple prime
∏ 220F N-ary product
∑ 2211 N-ary summation

Fractions

Grapheme Code Point Description
½ 00BD One-half
⅓ 2153 One-third
⅔ 2154 Two-thirds
¼ 00BC One-quarter
¾ 00BE Three-quarters
⅕ 2155 One-fifth
⅖ 2156 Two-fifths
⅗ 2157 Three-fifths
⅘ 2158 Four-fifths
⅙ 2159 One-sixth
⅚ 215A Five-sixths
⅐ 2150 One-seventh
⅛ 215B One-eighth
⅜ 215C Three-eighths
⅝ 215D Five-eighths
⅞ 215E Seven-eighths
⅑ 2151 One-ninth
⅒ 2152 One-tenth
⅟ 215F Fraction numerator one
⁄ 2044 Fraction slash

Calculus & Equations

Grapheme Code Point Description
∞ 221E Infinity
∂ 2202 Partial differential
∫ 222B Integral
∬ 222C Double integral
∭ 222D Triple integral
⨌ 2A0C Quadruple integral
∮ 222E Contour integral
∯ 222F Surface integral
∰ 2230 Volume integral
∱ 2231 Clockwise integral
∲ 2232 Clockwise contour integral
∳ 2233 Anticlockwise contour integral
∇ 2207 Nabla

Sets

Grapheme Code Point Description
⊂ 2282 Subset of
⊃ 2283 Superset of
⊆ 2286 Subset of or equal to
⊇ 2287 Superset of or equal to
⊄ 2284 Not a subset of
∩ 2229 Intersection
∪ 222A Union
∈ 2208 Element of
∊ 220A Small element of
∉ 2209 Not an element of
∋ 220B Contains as member
∍ 220D Small contains as member
∌ 220C Does not contain as member
∅ 2205 Empty set

Relationships

Grapheme Code Point Description
∝ 221D Proportional to
∣ 2223 Divides
∤ 2224 Does not divide
≃ 2243 Asymptotically equal to
≄ 2244 Not asymptotically equal to
≅ 2245 Approximately equal to
≆ 2246 Approximately but not actually equal to
≇ 2247 Neither approximately nor actually equal to
≈ 2248 Almost equal to
≉ 2249 Not almost equal to
≜ 225C Delta equal to
≝ 225D Equal to by definition
≟ 225F Questioned equal to
≠ 2260 Not equal to
≡ 2261 Identical to
≤ 2264 Less-than or equal to
≥ 2265 Greater-than or equal to
≪ 226A Much less-than
≫ 226B Much greater-than

Geometry

Grapheme Code Point Description
∠ 2220 Angle
∡ 2221 Measured angle
∢ 2222 Spherical angle
⟂ 27C2 Perpendicular
∟ 221F Right angle

Logic

Grapheme Code Point Description
¬ 00AC Logical NOT
∧ 2227 Logical AND
∨ 2228 Logical OR
∎ 220E End of proof
∴ 2234 Therefore
∵ 2235 Because
∀ 2200 For all
∃ 2203 There exists
∄ 2204 There does not exist

Typography

Grapheme Code Point Description
⋮ 22EE Vertical ellipsis
⋯ 22EF Horizontal ellipsis
⋰ 22F0 Up right diagonal ellipsis
⋱ 22F1 Down right diagonal ellipsis
⟨ 27E8 Left angle bracket
⟩ 27E9 Right angle bracket
° 00B0 Degree sign

Roman Numeral

Grapheme Code Point Description
Ⅰ 2160 Roman numeral one
Ⅱ 2161 Roman numeral two
Ⅲ 2162 Roman numeral three
Ⅳ 2163 Roman numeral four
Ⅴ 2164 Roman numeral five
Ⅵ 2165 Roman numeral six
Ⅶ 2166 Roman numeral seven
Ⅷ 2167 Roman numeral eight
Ⅸ 2168 Roman numeral nine
Ⅹ 2169 Roman numeral ten
Ⅺ 216A Roman numeral eleven
Ⅻ 216B Roman numeral twelve
Ⅼ 216C Roman numeral fifty
Ⅽ 216D Roman numeral one hundred
Ⅾ 216E Roman numeral five hundred
Ⅿ 216F Roman numeral one thousand
ↀ 2180 Roman numeral one thousand C D
ↁ 2181 Roman numeral five thousand
ↂ 2182 Roman numeral ten thousand
ↇ 2187 Roman numeral fifty thousand
ↈ 2188 Roman numeral one hundred thousand
ⅰ 2170 Small roman numeral one
ⅱ 2171 Small roman numeral two
ⅲ 2172 Small roman numeral three
ⅳ 2173 Small roman numeral four
ⅴ 2174 Small roman numeral five
ⅵ 2175 Small roman numeral six
ⅶ 2176 Small roman numeral seven
ⅷ 2177 Small roman numeral eight
ⅸ 2178 Small roman numeral nine
ⅹ 2179 Small roman numeral ten
ⅺ 217A Small roman numeral eleven
ⅻ 217B Small roman numeral twelve
ⅼ 217C Small roman numeral fifty
ⅽ 217D Small roman numeral one hundred
ⅾ 217E Small roman numeral five hundred
ⅿ 217F Small roman numeral one thousand
ↅ 2185 Roman numeral six late form
ↆ 2186 Roman numeral fifty early form
Ↄ 2183 Roman numeral reversed one hundred

Other

Grapheme Code Point Description
μ 003BC Greek small letter mu
π 003C0 Greek small letter pi
ℂ 02102 Complex numbers
ℍ 0210D Hyperbolic plane, quaternions
ℓ 02113 Script small L
ℕ 02115 Natural numbers
𝕆 1D546 Octonions
ℙ 02119 Prime numbers
ℚ 0211A Rational numbers
ℝ 0211D Real numbers
ℤ 02124 Integers

Science

Physics

Grapheme Code Point Description
ħ 0127 Reduced Planck constant
ƛ 019B Reduced wavelength
Ψ 03A8 Greek capital letter psi
λ 03BB Greek small letter lambda
ν 03BD Greek small letter nu
ψ 03C8 Greek small letter psi
ω 03C9 Greek small letter omega

Chemistry

Grapheme Code Point Description
⇌ 21CC Equilibrium sign

Engineering

Grapheme Code Point Description
⌭ 232D Cylindricity
Ω 03A9 Omega
℄ 2104 Centre line
﹊ FE4A Centreline overline
﹎ FE4E Centreline low line

Power Symbols

Grapheme Code Point Description
⏻ 23FB Power symbol
⏼ 23FC Power on-off
⏽ 23FD Power on
⭘ 2B58 Power off
⏾ 23FE Power sleep

Financial

Currency

Grapheme Code Point Description
฿ 0E3F Thai Baht
₿ 20BF Bitcoin
¢ 00A2 Cent
¤ 00A4 Currency
$ 0024 Dollar
€ 20AC Euro
₴ 20B4 Hryvnia
₹ 20B9 Indian rupee
₤ 20A4 Lira
₪ 20AA New shekel
₱ 20B1 Peso
₽ 20BD Russian ruble
£ 00A3 Sterling pound
₺ 20BA Turkish lira
₩ 20A9 Won
¥ 00A5 Yen (Yuan)

Other

Grapheme Code Point Description
‰ 2030 Per mille
‱ 2031 Per ten thousand

Box Drawing

┌─┬─┐  ╭─┬─╮  ┏━┳━┓  ╔═╦═╗
│ │ │  │ │ │  ┃ ┃ ┃  ║ ║ ║
├─┼─┤  ╞═╪═╡  ┣━╋━┫  ╠═╬═╣
│ │ │  │ │ │  ┃ ┃ ┃  ║ ║ ║
└─┴─┘  ╰─┴─╯  ┗━┻━┛  ╚═╩═╝

Miscellaneous

Numbers

Grapheme Code Point Description
➀ 2780 Circled digit one
➁ 2781 Circled digit two
➂ 2782 Circled digit three
➃ 2783 Circled digit four
➄ 2784 Circled digit five
➅ 2785 Circled digit six
➆ 2786 Circled digit seven
➇ 2787 Circled digit eight
➈ 2788 Circled digit nine
➉ 2789 Circled number ten
❶ 2776 Negative circled digit one
❷ 2777 Negative circled digit two
❸ 2778 Negative circled digit three
❹ 2779 Negative circled digit four
❺ 277A Negative circled digit five
❻ 277B Negative circled digit six
❼ 277C Negative circled digit seven
❽ 277D Negative circled digit eight
❾ 277E Negative circled digit nine
❿ 277F Negative circled number ten

Design

Grapheme Code Point Description
· 00B7 Middle dot
․ 2024 One dot leader
… 2026 Horizontal ellipsis
“ 201C Left double quotation mark
” 201D Right double quotation mark
« 00AB Left-pointing double angle quotation mark
» 00BB Right-pointing double angle quotation mark
‹ 2039 Single left-pointing angle quotation mark
› 203A Single right-pointing angle quotation mark
„ 201E Double low-9 quotation mark
⟨ 27E8 Mathematical left angle bracket
⟩ 27E9 Mathematical right angle bracket
— 2014 Em dash
~ 007E Tilde
※ 203B Reference mark

Arrows

Grapheme Code Point Description
↶ 21B6 Anticlockwise top semicircle arrow
↷ 21B7 Clockwise top semicircle arrow
⮜ 2B9C Black leftwards equilateral arrowhead
⮝ 2B9D Black upwards equilateral arrowhead
⮞ 2B9E Black rightwards equilateral arrowhead
⮟ 2B9F Black downwards equilateral arrowhead
← 2190 Leftwards arrow
↑ 2191 Upwards arrow
→ 2192 Rightwards arrow
↓ 2193 Downwards arrow
⤩ 2929 South east arrow and south west arrow
⤪ 292A South west arrow and north west arrow
⤧ 2927 North west arrow and north east arrow
⤨ 2928 North east arrow and south east arrow
⤶ 2936 Arrow pointing downwards then curving leftwards
⤷ 2937 Arrow pointing downwards then curving rightwards
⤸ 2938 Right-side arc clockwise arrow
⤹ 2939 Left-side arc anticlockwise arrow
⤺ 293A Top arc anticlockwise arrow
⤻ 293B Bottom arc anticlockwise arrow
⤡ 2921 North west and south east arrow
⤢ 2922 North east and south west arrow
➛ 279B Drafting point rightwards arrow
➝ 279D Triangle-headed rightwards arrow
➞ 279E Heavy triangle-headed rightwards arrow
➟ 279F Dashed triangle-headed rightwards arrow
➜ 279C Heavy round-tipped rightwards arrow
➾ 27BE Open-outlined rightwards arrow
➼ 27BC Wedge-tailed rightwards arrow
➦ 27A6 Heavy black curved upwards and rightwards arrow
➧ 27A7 Squat black rightwards arrow
➥ 27A5 Heavy black curved downwards and rightwards arrow
➤ 27A4 Black rightwards arrowhead
➢ 27A2 Three-d top-lighted rightwards arrowhead
➘ 2798 Heavy south east arrow
➙ 2799 Heavy rightwards arrow
➚ 279A Heavy north east arrow
⇇ 21C7 Leftwards paired arrows
⇈ 21C8 Upwards paired arrows
⇉ 21C9 Rightwards paired arrows
⇊ 21CA Downwards paired arrows
⇶ 21F6 Three rightwards arrows
⇳ 21F3 Up down white arrow
⇱ 21F1 North west arrow to corner
⇲ 21F2 South east arrow to corner
⇤ 21E4 Leftwards arrow to bar
⇥ 21E5 Rightwards arrow to bar
⇽ 21FD Leftwards open-headed arrow
⇾ 21FE Rightwards open-headed arrow
⇿ 21FF Left right open-headed arrow
⇦ 21E6 Leftwards white arrow
⇧ 21E7 Upwards white arrow
⇨ 21E8 Rightwards white arrow
⇩ 21E9 Downwards white arrow
⏴ 23F4 Black medium left-pointing triangle
⏵ 23F5 Black medium right-pointing triangle
⏶ 23F6 Black medium up-pointing triangle
⏷ 23F7 Black medium down-pointing triangle
⇄ 21C4 Rightwards arrow over leftwards arrow
⇆ 21C6 Leftwards arrow over rightwards arrow
⇅ 21C5 Upwards arrow leftwards of downwards arrow
↺ 21BA Anticlockwise open circle arrow
↻ 21BB Clockwise open circle arrow

Temperature

Grapheme Code Point Description
℃ 2103 Degree celsius
℉ 2109 Degree fahrenheit
☆ 2606 White star
★ 2605 Black star
☁ 2601 Cloud
☽ 263D First quarter moon

Music

Grapheme Code Point Description
♩ 2669 Quarter note
♪ 266A Eighth note
♫ 266B Beamed eighth notes
♬ 266C Beamed sixteenth notes
♭ 266D Flat sign
♮ 266E Natural sign
♯ 266F Sharp sign
𝄆 1D106 Left repeat sign
𝄇 1D107 Right repeat sign
𝄐 1D110 Fermata
𝄜 1D11C Six-string fretboard
𝄞 1D11E G clef
𝄟 1D11F G clef ottava alta
𝄠 1D120 G clef ottava bassa
𝄡 1D121 C clef
𝄢 1D122 F clef
𝄣 1D123 F clef ottava alta
𝄤 1D124 F clef ottava bassa
𝄥 1D125 Drum clef-1
𝄦 1D126 Drum clef-2
𝇐 1D1D0 Gregorian c clef
𝇑 1D1D1 Gregorian f clef
𝄪 1D12A Double sharp
𝄫 1D12B Double flat
𝄴 1D134 Common time
𝄻 1D13B Whole rest
𝄼 1D13C Half rest
𝄽 1D13D Quarter rest
𝅗 1D157 Void notehead
𝅘 1D158 Notehead black
𝆒 1D192 Crescendo

Chess

Grapheme Code Point Description
♔ 2654 White king
♕ 2655 White queen
♖ 2656 White rook
♗ 2657 White bishop
♘ 2658 White knight
♙ 2659 White pawn
♚ 265A Black king
♛ 265B Black queen
♜ 265C Black rook
♝ 265D Black bishop
♞ 265E Black knight
♟ 265F Black pawn

Design

Grapheme Code Point Description
• 2022 Bullet
◎ 25CE Bullseye
◉ 25C9 Fisheye
⭘ 2B58 Heavy circle
✗ 2717 Ballot x
✓ 2713 Check mark
☐ 2610 Ballot box
☑ 2611 Ballot box with check
☒ 2612 Ballot box with x
✎ 270E Lower right pencil
✐ 2710 Upper right pencil
☛ 261B Black right pointing index
✥ 2725 Four club-spoked asterisk
✤ 2724 Heavy four balloon-spoked asterisk
✻ 273B Teardrop-spoked asterisk
✲ 2732 Open centre asterisk
✱ 2731 Heavy asterisk

Block Elements

https://www.unicode.org/charts/PDF/U2580.pdf

Grapheme Code Point Description
▀ 2580 Upper half block
▁ 2581 Lower one eighth block
▂ 2582 Lower one quarter block
▃ 2583 Lower three eighths block
▄ 2584 Lower half block
▅ 2585 Lower five eighths block
▆ 2586 Lower three quarters block
▇ 2587 Lower seven eighths block
█ 2588 Full block
▉ 2589 Left seven eighths block
▊ 258A Left three quarters block
▋ 258B Left five eighths block
▌ 258C Left half block
▍ 258D Left three eighths block
▎ 258E Left one quarter block
▏ 258F Left one eighth block
▐ 2590 Right half block
░ 2591 Light shade
▒ 2592 Medium shade
▓ 2593 Dark shade
▔ 2594 Upper one eighth block
▕ 2595 Right one eighth block
▖ 2596 Quadrant lower left
▗ 2597 Quadrant lower right
▘ 2598 Quadrant upper left
▙ 2599 Quadrant upper left and lower left and lower right
▚ 259A Quadrant upper left and lower right
▛ 259B Quadrant upper left and upper right and lower left
▜ 259C Quadrant upper left and upper right and lower right
▝ 259D Quadrant upper right
▞ 259E Quadrant upper right and lower left
▟ 259F Quadrant upper right and lower left and lower right

Control Pictures

https://www.unicode.org/charts/PDF/U2400.pdf

Grapheme Code Point Description
␀ 2400 Null
␁ 2401 Start of heading
␂ 2402 Start of text
␃ 2403 End of text
␄ 2404 End of transmission
␅ 2405 Enquiry
␆ 2406 Acknowledge
␇ 2407 Bell
␈ 2408 Backspace
␉ 2409 Horizontal tabulation
␊ 240A Line feed
␋ 240B Vertical tabulation
␌ 240C Form feed
␍ 240D Carriage return
␎ 240E Shift out
␏ 240F Shift in
␐ 2410 Data link escape
␑ 2411 Device control one
␒ 2412 Device control two
␓ 2413 Device control three
␔ 2414 Device control four
␕ 2415 Negative acknowledge
␖ 2416 Synchronous idle
␗ 2417 End of transmission block
␘ 2418 Cancel
␙ 2419 End of medium
␚ 241A Substitute
␛ 241B Escape
␜ 241C File separator
␝ 241D Group separator
␞ 241E Record separator
␟ 241F Unit separator
␠ 2420 Space
␡ 2421 Delete
␢ 2422 Blank symbol
␣ 2423 Open box
␤ 2424 Newline
␥ 2425 Delete form two
␦ 2426 Substitute form two
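One possible use of this block, sketched under the observation that the pictures for the ASCII control codes 00 to 20 sit at 2400 plus the control code, with Delete (7F) mapped to 2421. The helper name showControlCharacters is hypothetical.

```javascript
// Hypothetical helper: replace ASCII control characters (and space)
// with their visible Control Pictures counterparts.
function showControlCharacters(str) {
  return str.replace(/[\x00-\x20\x7F]/g, ch => {
    const code = ch.charCodeAt(0);
    if (code === 0x7F) return '\u2421';        // Delete
    return String.fromCharCode(0x2400 + code); // 00..20 map to 2400..2420
  });
}

const visible = showControlCharacters('a\tb\r\n');
console.log(visible); // 'a␉b␍␊'
```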
