module Termisu::UnicodeWidth
Overview
Unicode width calculation for terminal display.
This module applies Unicode Standard Annex #11 (East Asian Width) to determine the display column width of characters and grapheme clusters.
Based on Markus Kuhn's wcwidth.c reference implementation and Unicode 15.0.
Width Values
- 0: Combining marks, control characters, non-printable
- 1: Narrow characters (Latin, Greek, Cyrillic, most symbols)
- 2: Wide characters (CJK, fullwidth forms, emoji)
Ambiguous Width Characters
East Asian Ambiguous characters default to width 1 for consistency across terminals. See the AMBIGUOUS_WIDTH policy constant.
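One way to picture the policy is a single constant that is consulted whenever a codepoint falls in an Ambiguous range. The sketch below is an illustration only, not this module's code: `SAMPLE_AMBIGUOUS_WIDTH`, `SAMPLE_AMBIGUOUS`, and `sample_width` are hypothetical names, and the ranges are a small subset of the East Asian Ambiguous set.

```crystal
# Hypothetical stand-in for the module's policy constant.
SAMPLE_AMBIGUOUS_WIDTH = 1_u8

# A few East Asian Ambiguous ranges (illustrative subset, not the full table).
SAMPLE_AMBIGUOUS = [
  0x0391..0x03A1, # Greek capital letters alpha to rho
  0x0410..0x044F, # Cyrillic letters
  0x2460..0x24E9, # enclosed alphanumerics
]

# Returns the policy width for Ambiguous codepoints and 1 for everything else.
# Wide and zero-width handling is deliberately left out of this sketch.
def sample_width(cp : Int32) : UInt8
  return SAMPLE_AMBIGUOUS_WIDTH if SAMPLE_AMBIGUOUS.any? { |r| r.includes?(cp) }
  1_u8
end

puts sample_width(0x0391) # Greek capital alpha, resolved by the policy constant
```

Changing the constant to 2 would widen every Ambiguous character at once, which is the point of keeping the decision in one place.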
Defined in:
termisu/unicode_width.cr
Constant Summary
- AMBIGUOUS_WIDTH = 1_u8
Policy for East Asian Ambiguous width characters. These characters can render as width 1 or 2 depending on the terminal and font. We default to 1 for stable cross-terminal behavior.
- COMBINING_MARK_COUNT = COMBINING_MARK_RANGES.size // 2
Number of ranges in COMBINING_MARK_RANGES (array size / 2).
- COMBINING_MARK_RANGES =
[768, 879, 1155, 1161, 1552, 1562, 1611, 1631, 1648, 1648, 1750, 1756, 1759, 1764, 1767, 1768, 1770, 1773, 1809, 1809, 1840, 1866, 1958, 1968, 2027, 2035, 2045, 2045, 2070, 2073, 2075, 2083, 2085, 2087, 2089, 2093, 2137, 2139, 2200, 2207, 2250, 2273, 2275, 2306, 2362, 2362, 2364, 2364, 2369, 2376, 2381, 2381, 2385, 2391, 2402, 2403, 2433, 2433, 2492, 2492, 2497, 2500, 2509, 2509, 2530, 2531, 2558, 2558, 2561, 2562, 2620, 2620, 2625, 2626, 2631, 2632, 2635, 2637, 2641, 2641, 2672, 2673, 2677, 2677, 2689, 2690, 2748, 2748, 2753, 2757, 2759, 2760, 2765, 2765, 2786, 2787, 2810, 2815, 2817, 2817, 2876, 2876, 2879, 2879, 2881, 2884, 2893, 2893, 2901, 2902, 2914, 2915, 2946, 2946, 3008, 3008, 3021, 3021, 3072, 3072, 3076, 3076, 3132, 3132, 3134, 3136, 3142, 3144, 3146, 3149, 3157, 3158, 3170, 3171, 3201, 3201, 3260, 3260, 3263, 3263, 3270, 3270, 3276, 3277, 3298, 3299, 3328, 3329, 3387, 3388, 3393, 3396, 3405, 3405, 3426, 3427, 3457, 3457, 3530, 3530, 3538, 3540, 3542, 3542, 3633, 3633, 3636, 3642, 3655, 3662, 3761, 3761, 3764, 3772, 3784, 3790, 3864, 3865, 3893, 3893, 3895, 3895, 3897, 3897, 3953, 3966, 3968, 3972, 3974, 3975, 3981, 3991, 3993, 4028, 4038, 4038, 4141, 4144, 4146, 4151, 4153, 4154, 4157, 4158, 4184, 4185, 4190, 4192, 4209, 4212, 4226, 4226, 4229, 4230, 4237, 4237, 4253, 4253, 4957, 4959, 5906, 5908, 5938, 5939, 5970, 5971, 6002, 6003, 6068, 6069, 6071, 6077, 6086, 6086, 6089, 6099, 6109, 6109, 6155, 6157, 6159, 6159, 6277, 6278, 6313, 6313, 6432, 6434, 6439, 6440, 6450, 6450, 6457, 6459, 6679, 6680, 6683, 6683, 6742, 6742, 6744, 6750, 6752, 6752, 6754, 6754, 6757, 6764, 6771, 6780, 6783, 6783, 6832, 6862, 6912, 6915, 6964, 6964, 6966, 6970, 6972, 6972, 6978, 6978, 7019, 7027, 7040, 7041, 7074, 7077, 7080, 7081, 7083, 7085, 7142, 7142, 7144, 7145, 7149, 7149, 7151, 7153, 7212, 7219, 7222, 7223, 7376, 7378, 7380, 7392, 7394, 7400, 7405, 7405, 7412, 7412, 7416, 7417, 7616, 7679, 8400, 8432, 11503, 11505, 11647, 11647, 11744, 11775, 12330, 12333, 12441, 
12442, 42607, 42610, 42612, 42621, 42654, 42655, 42736, 42737, 43010, 43010, 43014, 43014, 43019, 43019, 43045, 43046, 43052, 43052, 43204, 43205, 43232, 43249, 43263, 43263, 43302, 43309, 43335, 43345, 43392, 43394, 43443, 43443, 43446, 43449, 43452, 43453, 43493, 43493, 43561, 43566, 43569, 43570, 43573, 43574, 43587, 43587, 43596, 43596, 43644, 43644, 43696, 43696, 43698, 43700, 43703, 43704, 43710, 43711, 43713, 43713, 43756, 43757, 43766, 43766, 44005, 44005, 44008, 44008, 44013, 44013, 65024, 65039, 65056, 65071, 66045, 66045, 66272, 66272, 66422, 66426, 68097, 68099, 68101, 68102, 68108, 68111, 68152, 68154, 68159, 68159, 68325, 68326, 68900, 68903, 69291, 69292, 69373, 69375, 69446, 69456, 69506, 69509, 69633, 69633, 69688, 69702, 69744, 69744, 69747, 69748, 69759, 69761, 69811, 69814, 69817, 69818, 69826, 69826, 69888, 69890, 69927, 69931, 69933, 69940, 70003, 70003, 70016, 70017, 70070, 70078, 70089, 70092, 70095, 70095, 70191, 70193, 70196, 70196, 70198, 70199, 70206, 70206, 70209, 70209, 70367, 70367, 70371, 70378, 70400, 70401, 70459, 70460, 70464, 70464, 70502, 70508, 70512, 70516, 70712, 70719, 70722, 70724, 70726, 70726, 70750, 70750, 70835, 70840, 70842, 70842, 70847, 70848, 70850, 70851, 71090, 71093, 71100, 71101, 71103, 71104, 71132, 71133, 71219, 71226, 71229, 71229, 71231, 71232, 71339, 71339, 71341, 71341, 71344, 71349, 71351, 71351, 71453, 71455, 71458, 71461, 71463, 71467, 71727, 71735, 71737, 71738, 71995, 71996, 71998, 71998, 72003, 72003, 72148, 72151, 72154, 72155, 72160, 72160, 72193, 72202, 72243, 72248, 72251, 72254, 72263, 72263, 72273, 72278, 72281, 72283, 72330, 72342, 72344, 72345, 72752, 72758, 72760, 72765, 72767, 72767, 72850, 72871, 72874, 72880, 72882, 72883, 72885, 72886, 73009, 73014, 73018, 73018, 73020, 73021, 73023, 73029, 73031, 73031, 73104, 73105, 73109, 73109, 73111, 73111, 73459, 73460, 73472, 73473, 73526, 73530, 73536, 73536, 73538, 73538, 78912, 78912, 78919, 78933, 92912, 92916, 92976, 92982, 94031, 94031, 
94095, 94098, 94180, 94180, 113821, 113822, 118528, 118573, 118576, 118598, 119143, 119145, 119163, 119170, 119173, 119179, 119210, 119213, 119362, 119364, 121344, 121398, 121403, 121452, 121461, 121461, 121476, 121476, 121499, 121503, 121505, 121519, 122880, 122886, 122888, 122904, 122907, 122913, 122915, 122916, 122918, 122922, 123023, 123023, 123184, 123190, 123566, 123566, 123628, 123631, 124140, 124143, 125136, 125142, 125252, 125258, 917760, 917999] -
Unicode combining mark ranges for categories Mn (Nonspacing_Mark) and Me (Enclosing_Mark). Sorted by start codepoint for binary search. Derived from the Unicode 15.0 character database.
Stored as a flat array of [start, end] pairs (inclusive on both sides). Index i * 2 is the range start and index i * 2 + 1 is the range end. Use COMBINING_MARK_COUNT for the number of ranges.
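The flat pair layout lends itself to a binary search over range indices rather than array indices. The following is a minimal sketch of such a lookup, assuming the layout described above; `SAMPLE_RANGES`, `SAMPLE_COUNT`, and `in_ranges?` are hypothetical names, and the table holds only three pairs copied from COMBINING_MARK_RANGES.

```crystal
# Shortened table in the same flat [start, end, start, end, ...] layout.
SAMPLE_RANGES = [768, 879, 1155, 1161, 8400, 8432]
SAMPLE_COUNT  = SAMPLE_RANGES.size // 2

# Returns true if `cp` falls inside any inclusive range of the flat table.
# `count` is the number of ranges, not the number of array elements.
def in_ranges?(cp : Int32, ranges : Array(Int32), count : Int32) : Bool
  lo = 0
  hi = count - 1
  while lo <= hi
    mid = lo + (hi - lo) // 2
    if cp < ranges[mid * 2]        # below this range's start
      hi = mid - 1
    elsif cp > ranges[mid * 2 + 1] # above this range's end
      lo = mid + 1
    else
      return true
    end
  end
  false
end

puts in_ranges?(0x0301, SAMPLE_RANGES, SAMPLE_COUNT) # combining acute accent
puts in_ranges?(0x0041, SAMPLE_RANGES, SAMPLE_COUNT) # 'A'
```

Because the ranges are sorted and do not overlap, each step discards half of the remaining pairs, so a lookup needs a number of comparisons logarithmic in the range count.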
Class Method Summary
- .codepoint_width(cp : Int32) : UInt8
Returns the display width of a single Unicode codepoint.
- .grapheme_width(grapheme : String) : UInt8
Returns the display width of a grapheme cluster (String).
- .string_width(text : String) : Int32
Returns the display width of a string (multiple grapheme clusters).
Class Method Detail
def self.codepoint_width(cp : Int32) : UInt8
Returns the display width of a single Unicode codepoint.
Parameters:
cp: Unicode codepoint as Int32
Returns the number of display columns: 0, 1, or 2.
UnicodeWidth.codepoint_width('A'.ord) # => 1
UnicodeWidth.codepoint_width('中'.ord) # => 2
UnicodeWidth.codepoint_width(0x0301) # => 0 (combining acute)
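The three width classes suggest a fixed order of checks: zero-width first, then wide, with narrow as the fallback. The sketch below illustrates that order under stated assumptions; it is not the module's implementation. `TOY_ZERO_WIDTH`, `TOY_WIDE`, and `toy_codepoint_width` are hypothetical names, and the tables are heavily abbreviated.

```crystal
# Hypothetical, heavily abbreviated tables; real data covers far more ranges.
TOY_ZERO_WIDTH = [0x0300..0x036F, 0x200B..0x200F] # combining diacritics, zero-width and format characters
TOY_WIDE       = [0x1100..0x115F, 0x4E00..0x9FFF, 0xFF01..0xFF60] # Hangul Jamo, CJK ideographs, fullwidth forms

def toy_codepoint_width(cp : Int32) : UInt8
  # C0 controls, DEL, and C1 controls occupy no columns.
  return 0_u8 if cp < 0x20 || (0x7F..0x9F).includes?(cp)
  return 0_u8 if TOY_ZERO_WIDTH.any? { |r| r.includes?(cp) }
  return 2_u8 if TOY_WIDE.any? { |r| r.includes?(cp) }
  1_u8 # everything else is treated as narrow
end

puts toy_codepoint_width('A'.ord)
puts toy_codepoint_width('中'.ord)
puts toy_codepoint_width(0x0301)
```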
def self.grapheme_width(grapheme : String) : UInt8
Returns the display width of a grapheme cluster (String).
Uses Crystal's built-in grapheme segmentation to handle combining sequences, ZWJ sequences, and emoji correctly.
Parameters:
grapheme: A String representing a single grapheme cluster
Returns the number of display columns: 0, 1, or 2.
UnicodeWidth.grapheme_width("e\u{301}") # => 1 (é as combining sequence)
UnicodeWidth.grapheme_width("\u{26A0}\u{FE0E}") # => 1 (⚠︎ text presentation)
UnicodeWidth.grapheme_width("\u{26A0}\u{FE0F}") # => 2 (⚠️ emoji presentation)
UnicodeWidth.grapheme_width("\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}") # => 2 (family emoji ZWJ sequence)
UnicodeWidth.grapheme_width("🇺🇸") # => 2 (regional indicator flag)
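The two U+26A0 examples differ only in their trailing variation selector, which hints at how presentation selectors can override a base width. The sketch below shows one possible way to express that rule; whether this module works the same way is an assumption. `VS15`, `VS16`, and `toy_presentation_width` are hypothetical names, and the base width is passed in so the example stays independent of any width table.

```crystal
VS15 = 0xFE0E # text presentation selector
VS16 = 0xFE0F # emoji presentation selector

# `base_width` is the width the first codepoint would have on its own.
# A variation selector anywhere in the cluster overrides it.
def toy_presentation_width(grapheme : String, base_width : UInt8) : UInt8
  grapheme.each_char do |ch|
    case ch.ord
    when VS15 then return 1_u8
    when VS16 then return 2_u8
    end
  end
  base_width
end

puts toy_presentation_width("\u{26A0}\u{FE0E}", 1_u8) # text presentation
puts toy_presentation_width("\u{26A0}\u{FE0F}", 1_u8) # emoji presentation
```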
def self.string_width(text : String) : Int32
Returns the display width of a string (multiple grapheme clusters).
Uses Crystal's grapheme segmentation and sums grapheme widths.
Parameters:
text: Any String
Returns the total column width.
UnicodeWidth.string_width("Hello") # => 5
UnicodeWidth.string_width("你好") # => 4
UnicodeWidth.string_width("café") # => 4
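A typical use of a column width is padding text so that table cells line up. The sketch below sums per-grapheme widths with String#each_grapheme and pads with spaces; it is a self-contained illustration, not this module's code. `toy_char_width`, `toy_string_width`, and `pad_to_columns` are hypothetical names, and the width rule is a deliberately crude stand-in.

```crystal
# Hypothetical stand-in for a real width function: CJK ideographs count as 2,
# combining diacritics as 0, everything else as 1.
def toy_char_width(ch : Char) : Int32
  cp = ch.ord
  return 0 if (0x0300..0x036F).includes?(cp)
  return 2 if (0x4E00..0x9FFF).includes?(cp)
  1
end

# Sums per-grapheme widths, taking each cluster's width from its first character.
def toy_string_width(text : String) : Int32
  total = 0
  text.each_grapheme do |g|
    total += toy_char_width(g.to_s[0])
  end
  total
end

# Pads on the right with spaces until the text fills `columns` display columns.
# Text that is already wide enough is returned unchanged.
def pad_to_columns(text : String, columns : Int32) : String
  missing = columns - toy_string_width(text)
  missing > 0 ? text + " " * missing : text
end

puts "[#{pad_to_columns("你好", 6)}]"
puts "[#{pad_to_columns("cafe\u{301}", 6)}]"
```

Padding by String#size instead would over-pad the first string and under-count nothing in the second only by accident, which is why the width is computed per grapheme cluster.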