module Termisu::UnicodeWidth

Overview

Unicode width calculation for terminal display.

This module implements Unicode Standard Annex #11 (East Asian Width) to determine how many terminal columns a character or grapheme cluster occupies.

Based on Markus Kuhn's wcwidth.c reference implementation and Unicode 15.0 data.

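Width Values

  • 0: combining marks (for example U+0301, combining acute)
  • 1: most characters (for example 'A')
  • 2: wide characters (for example '中' and emoji-presentation sequences)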

Ambiguous Width Characters

East Asian Ambiguous characters default to width 1 for consistency across terminals. See the AMBIGUOUS_WIDTH policy constant.

Defined in:

termisu/unicode_width.cr

Constant Summary

AMBIGUOUS_WIDTH = 1_u8

Policy for East Asian Ambiguous width characters. These characters can render as width 1 or 2 depending on terminal/font. We default to 1 for stable cross-terminal behavior.

COMBINING_MARK_COUNT = COMBINING_MARK_RANGES.size // 2

Number of ranges in COMBINING_MARK_RANGES (array size / 2).

COMBINING_MARK_RANGES = [768, 879, 1155, 1161, 1552, 1562, 1611, 1631, 1648, 1648, 1750, 1756, 1759, 1764, 1767, 1768, 1770, 1773, 1809, 1809, 1840, 1866, 1958, 1968, 2027, 2035, 2045, 2045, 2070, 2073, 2075, 2083, 2085, 2087, 2089, 2093, 2137, 2139, 2200, 2207, 2250, 2273, 2275, 2306, 2362, 2362, 2364, 2364, 2369, 2376, 2381, 2381, 2385, 2391, 2402, 2403, 2433, 2433, 2492, 2492, 2497, 2500, 2509, 2509, 2530, 2531, 2558, 2558, 2561, 2562, 2620, 2620, 2625, 2626, 2631, 2632, 2635, 2637, 2641, 2641, 2672, 2673, 2677, 2677, 2689, 2690, 2748, 2748, 2753, 2757, 2759, 2760, 2765, 2765, 2786, 2787, 2810, 2815, 2817, 2817, 2876, 2876, 2879, 2879, 2881, 2884, 2893, 2893, 2901, 2902, 2914, 2915, 2946, 2946, 3008, 3008, 3021, 3021, 3072, 3072, 3076, 3076, 3132, 3132, 3134, 3136, 3142, 3144, 3146, 3149, 3157, 3158, 3170, 3171, 3201, 3201, 3260, 3260, 3263, 3263, 3270, 3270, 3276, 3277, 3298, 3299, 3328, 3329, 3387, 3388, 3393, 3396, 3405, 3405, 3426, 3427, 3457, 3457, 3530, 3530, 3538, 3540, 3542, 3542, 3633, 3633, 3636, 3642, 3655, 3662, 3761, 3761, 3764, 3772, 3784, 3790, 3864, 3865, 3893, 3893, 3895, 3895, 3897, 3897, 3953, 3966, 3968, 3972, 3974, 3975, 3981, 3991, 3993, 4028, 4038, 4038, 4141, 4144, 4146, 4151, 4153, 4154, 4157, 4158, 4184, 4185, 4190, 4192, 4209, 4212, 4226, 4226, 4229, 4230, 4237, 4237, 4253, 4253, 4957, 4959, 5906, 5908, 5938, 5939, 5970, 5971, 6002, 6003, 6068, 6069, 6071, 6077, 6086, 6086, 6089, 6099, 6109, 6109, 6155, 6157, 6159, 6159, 6277, 6278, 6313, 6313, 6432, 6434, 6439, 6440, 6450, 6450, 6457, 6459, 6679, 6680, 6683, 6683, 6742, 6742, 6744, 6750, 6752, 6752, 6754, 6754, 6757, 6764, 6771, 6780, 6783, 6783, 6832, 6862, 6912, 6915, 6964, 6964, 6966, 6970, 6972, 6972, 6978, 6978, 7019, 7027, 7040, 7041, 7074, 7077, 7080, 7081, 7083, 7085, 7142, 7142, 7144, 7145, 7149, 7149, 7151, 7153, 7212, 7219, 7222, 7223, 7376, 7378, 7380, 7392, 7394, 7400, 7405, 7405, 7412, 7412, 7416, 7417, 7616, 7679, 8400, 8432, 11503, 11505, 11647, 11647, 11744, 11775, 
12330, 12333, 12441, 12442, 42607, 42610, 42612, 42621, 42654, 42655, 42736, 42737, 43010, 43010, 43014, 43014, 43019, 43019, 43045, 43046, 43052, 43052, 43204, 43205, 43232, 43249, 43263, 43263, 43302, 43309, 43335, 43345, 43392, 43394, 43443, 43443, 43446, 43449, 43452, 43453, 43493, 43493, 43561, 43566, 43569, 43570, 43573, 43574, 43587, 43587, 43596, 43596, 43644, 43644, 43696, 43696, 43698, 43700, 43703, 43704, 43710, 43711, 43713, 43713, 43756, 43757, 43766, 43766, 44005, 44005, 44008, 44008, 44013, 44013, 65024, 65039, 65056, 65071, 66045, 66045, 66272, 66272, 66422, 66426, 68097, 68099, 68101, 68102, 68108, 68111, 68152, 68154, 68159, 68159, 68325, 68326, 68900, 68903, 69291, 69292, 69373, 69375, 69446, 69456, 69506, 69509, 69633, 69633, 69688, 69702, 69744, 69744, 69747, 69748, 69759, 69761, 69811, 69814, 69817, 69818, 69826, 69826, 69888, 69890, 69927, 69931, 69933, 69940, 70003, 70003, 70016, 70017, 70070, 70078, 70089, 70092, 70095, 70095, 70191, 70193, 70196, 70196, 70198, 70199, 70206, 70206, 70209, 70209, 70367, 70367, 70371, 70378, 70400, 70401, 70459, 70460, 70464, 70464, 70502, 70508, 70512, 70516, 70712, 70719, 70722, 70724, 70726, 70726, 70750, 70750, 70835, 70840, 70842, 70842, 70847, 70848, 70850, 70851, 71090, 71093, 71100, 71101, 71103, 71104, 71132, 71133, 71219, 71226, 71229, 71229, 71231, 71232, 71339, 71339, 71341, 71341, 71344, 71349, 71351, 71351, 71453, 71455, 71458, 71461, 71463, 71467, 71727, 71735, 71737, 71738, 71995, 71996, 71998, 71998, 72003, 72003, 72148, 72151, 72154, 72155, 72160, 72160, 72193, 72202, 72243, 72248, 72251, 72254, 72263, 72263, 72273, 72278, 72281, 72283, 72330, 72342, 72344, 72345, 72752, 72758, 72760, 72765, 72767, 72767, 72850, 72871, 72874, 72880, 72882, 72883, 72885, 72886, 73009, 73014, 73018, 73018, 73020, 73021, 73023, 73029, 73031, 73031, 73104, 73105, 73109, 73109, 73111, 73111, 73459, 73460, 73472, 73473, 73526, 73530, 73536, 73536, 73538, 73538, 78912, 78912, 78919, 78933, 92912, 92916, 92976, 
92982, 94031, 94031, 94095, 94098, 94180, 94180, 113821, 113822, 118528, 118573, 118576, 118598, 119143, 119145, 119163, 119170, 119173, 119179, 119210, 119213, 119362, 119364, 121344, 121398, 121403, 121452, 121461, 121461, 121476, 121476, 121499, 121503, 121505, 121519, 122880, 122886, 122888, 122904, 122907, 122913, 122915, 122916, 122918, 122922, 123023, 123023, 123184, 123190, 123566, 123566, 123628, 123631, 124140, 124143, 125136, 125142, 125252, 125258, 917760, 917999]

Unicode combining mark ranges for categories Mn (Nonspacing_Mark) and Me (Enclosing_Mark). Sorted by start codepoint for binary search. Derived from Unicode 15.0 character database.

Stored as a flat array of [start, end] pairs (inclusive on both sides). For range i, index i * 2 holds the range start and index i * 2 + 1 holds the range end. Use COMBINING_MARK_COUNT for the number of ranges.
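The flat pair layout lends itself to a binary search over range indices. The following is a minimal sketch, not the module's actual lookup code: SAMPLE_RANGES is a four-range excerpt of the table above, and in_sample_ranges? is a hypothetical helper name chosen for illustration.

```crystal
# Excerpt of the table: [start, end, start, end, ...], sorted by start.
SAMPLE_RANGES = [768, 879, 1155, 1161, 1552, 1562, 1611, 1631]
SAMPLE_COUNT  = SAMPLE_RANGES.size // 2

# Hypothetical helper: true when cp lies inside any inclusive range.
def in_sample_ranges?(cp : Int32) : Bool
  low = 0
  high = SAMPLE_COUNT - 1
  while low <= high
    mid = (low + high) // 2
    if cp < SAMPLE_RANGES[mid * 2]        # before the start of range mid
      high = mid - 1
    elsif cp > SAMPLE_RANGES[mid * 2 + 1] # past the end of range mid
      low = mid + 1
    else
      return true
    end
  end
  false
end

puts in_sample_ranges?(0x0301) # => true  (769 lies in 768..879)
puts in_sample_ranges?(0x0041) # => false ('A' is not a combining mark)
```

Searching over range indices rather than array indices keeps each comparison aligned to a start/end pair, which is presumably why the range count is exposed as its own constant.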

Class Method Detail

def self.codepoint_width(cp : Int32) : UInt8

Returns the display width of a single Unicode codepoint.

Parameters:

  • cp: Unicode codepoint as Int32

Returns 0, 1, or 2 for display columns.

UnicodeWidth.codepoint_width('A'.ord) # => 1
UnicodeWidth.codepoint_width('中'.ord) # => 2
UnicodeWidth.codepoint_width(0x0301)  # => 0 (combining acute)

def self.grapheme_width(grapheme : String) : UInt8

Returns the display width of a grapheme cluster (String).

Uses Crystal's built-in grapheme segmentation to handle combining sequences, ZWJ sequences, and emoji correctly.

Parameters:

  • grapheme: A String representing a single grapheme cluster

Returns 0, 1, or 2 for display columns.

UnicodeWidth.grapheme_width("e\u{301}")         # => 1 (é as combining sequence)
UnicodeWidth.grapheme_width("\u{26A0}\u{FE0E}") # => 1 (⚠︎ text presentation)
UnicodeWidth.grapheme_width("\u{26A0}\u{FE0F}") # => 2 (⚠️ emoji presentation)
UnicodeWidth.grapheme_width("👨‍👩‍👧‍👦")          # => 2 (family emoji ZWJ sequence)
UnicodeWidth.grapheme_width("🇺🇸")               # => 2 (regional indicator flag)
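The two ⚠ examples differ only in their trailing variation selector, which suggests one way a cluster's width could be resolved. The sketch below is an assumption about the general technique, not the module's source: toy_presentation_width and its base_width parameter are hypothetical, and the caller is expected to supply the width of the base character.

```crystal
VS15 = 0xFE0E # VARIATION SELECTOR-15, requests text presentation
VS16 = 0xFE0F # VARIATION SELECTOR-16, requests emoji presentation

# Hypothetical helper: lets a variation selector override the base width.
def toy_presentation_width(grapheme : String, base_width : Int32) : Int32
  grapheme.each_char do |ch|
    return 1 if ch.ord == VS15 # text presentation: narrow
    return 2 if ch.ord == VS16 # emoji presentation: wide
  end
  base_width # no selector present, keep the base character's width
end

puts toy_presentation_width("\u{26A0}\u{FE0E}", 1) # => 1
puts toy_presentation_width("\u{26A0}\u{FE0F}", 1) # => 2
puts toy_presentation_width("A", 1)                # => 1
```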

def self.string_width(text : String) : Int32

Returns the display width of a string (multiple grapheme clusters).

Uses Crystal's grapheme segmentation and sums grapheme widths.

Parameters:

  • text: Any String

Returns total column width.

UnicodeWidth.string_width("Hello") # => 5
UnicodeWidth.string_width("你好")    # => 4
UnicodeWidth.string_width("café")  # => 4
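Summing per-cluster widths can be sketched as follows. This is an illustration under stated assumptions rather than the module's implementation: toy_grapheme_width is a hypothetical stand-in that only knows the CJK Unified Ideographs block, toy_string_width is likewise a made-up name, and String#each_grapheme is assumed to be available in the Crystal version in use.

```crystal
# Hypothetical stand-in for grapheme_width: 2 columns when the cluster
# starts in the CJK Unified Ideographs block, otherwise 1.
def toy_grapheme_width(grapheme : String) : Int32
  return 0 if grapheme.empty?
  first = grapheme[0].ord
  (0x4E00..0x9FFF).includes?(first) ? 2 : 1
end

# Sum widths per grapheme cluster, not per codepoint.
def toy_string_width(text : String) : Int32
  total = 0
  text.each_grapheme do |cluster|
    total += toy_grapheme_width(cluster.to_s)
  end
  total
end

puts toy_string_width("Hello")       # => 5
puts toy_string_width("你好")          # => 4
puts toy_string_width("cafe\u{301}") # => 4 (5 codepoints, 4 clusters)
```

The last call shows why segmentation matters: counting codepoints would report 5 columns for a string that occupies 4.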
