Unicode diacritics using combining characters fail in Numbers
Unicode characters normally work very well in Numbers, but I think I found a bug in how some text functions, such as RIGHT(), behave when Unicode combining characters are used.
There are several ways to write a Greek small iota with perispomeni (ῖ)
- build it from scratch: "ι"&CHAR("0x342")
- use the existing Unicode character: CHAR("0x1FD6")
Similarly, Greek iota with macron and acute accent (ῑ́) can be
- built from scratch: "ι"&CHAR("0x304")&CHAR("0x301"), or
- built from the existing Unicode Greek small iota with macron: CHAR("0x1FD1")&CHAR("0x301").
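For reference, the relationship between these constructions can be checked outside Numbers. Below is a minimal sketch in Python (my choice for illustration, nothing to do with how Numbers works internally) using the standard unicodedata module. The variable names and the nfc helper are made up for this example. It suggests that the two builds of each character are canonically equivalent under Unicode normalization, while still being different code point sequences.

```python
import unicodedata

IOTA = "\u03b9"  # GREEK SMALL LETTER IOTA

# Iota with perispomeni: combining sequence vs. precomposed character
peri_built = IOTA + "\u0342"        # "ι" + COMBINING GREEK PERISPOMENI
peri_precomposed = "\u1fd6"         # GREEK SMALL LETTER IOTA WITH PERISPOMENI

# Iota with macron and acute, built both ways as in the lists above
macron_acute_built = IOTA + "\u0304" + "\u0301"   # "ι" + macron + acute
macron_acute_mixed = "\u1fd1" + "\u0301"          # iota-with-macron + acute

def nfc(s):
    """Normalize to NFC (canonical composition)."""
    return unicodedata.normalize("NFC", s)

print(peri_built == peri_precomposed)                     # False: raw sequences differ
print(nfc(peri_built) == peri_precomposed)                # True: equivalent after NFC
print(nfc(macron_acute_built) == macron_acute_mixed)      # True
print(len(macron_acute_built), len(macron_acute_mixed))   # 3 2 (code points)
```

If Numbers compares text after some form of normalization, that would be consistent with = treating the constructions as identical, but that is an assumption on my part.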
In Numbers, the two constructions are in both cases identical under =, though they have different values under CODE(): for ῑ́, 4093103833860 (built from scratch) and 34982508626689 (built from Greek small iota with macron).
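One observation about those two CODE() values, offered as a guess rather than as documented Numbers behaviour: each splits cleanly into two 32-bit halves that match the first two code points of the corresponding construction. A small Python check (the helper split_halves is hypothetical, written only for this illustration):

```python
def split_halves(value):
    """Split an integer into its upper and lower 32-bit halves."""
    return value >> 32, value & 0xFFFFFFFF

built = split_halves(4093103833860)     # from "ι" + U+0304 + U+0301
mixed = split_halves(34982508626689)    # from U+1FD1 + U+0301

print([hex(h) for h in built])   # ['0x3b9', '0x304']
print([hex(h) for h in mixed])   # ['0x1fd1', '0x301']
```

Note that the trailing U+0301 of the from-scratch version does not show up in its value, so CODE() appears to look at no more than the first two code points here. I have not found this documented anywhere.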
In general this all works well in Numbers, at least once a suitable font is used (here Arial).
Indeed, if I use only single, ready-made Unicode characters (no combining characters), the result is right: for instance, the last 5 characters of αβγῑκλμῖπρσ, taken with RIGHT(*, 5), are correctly μῖπρσ, five in total.
Problem: if I create any of the diacritics with Unicode combining characters and CHAR(), RIGHT(*, 5) no longer returns the last FIVE (5) characters:
- αβγικλμῖπρσ gives ῖπρσ, which is 4 characters - WRONG
- αβγῑ́κλμιπρσ gives κλμιπρσ or λμιπρσ (depending on construction), which is 7 or 6 characters respectively - WRONG
- αβγῑ́κλμῖπρσ gives κλμῖπρσ or λμῖπρσ (depending on construction), which is 7 or 6 characters respectively - WRONG
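To make the expected and observed counts concrete, here is a rough Python sketch of two ways to count "characters" in the first test string: by code point, and by visible character. The visible count is approximated by attaching combining marks to the preceding base character, which is a simplification and not full grapheme cluster segmentation. The helper names are made up for this example, and I do not know which unit Numbers uses internally.

```python
import unicodedata

def visible_length(s):
    """Approximate count of visible characters: skip combining marks."""
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)

def right_visible(s, n):
    """Last n visible characters, keeping marks attached to their base."""
    clusters = []
    for ch in s:
        if unicodedata.combining(ch) and clusters:
            clusters[-1] += ch      # attach the mark to the preceding base
        else:
            clusters.append(ch)
    return "".join(clusters[-n:])

# First test string, with the iota-perispomeni built as "ι" + U+0342
text = "αβγικλμ" + "\u03b9\u0342" + "πρσ"

print(len(text), visible_length(text))                # 12 11
print(text[-5:] == "\u03b9\u0342πρσ")                 # True: 5 code points, 4 visible
print(right_visible(text, 5) == "μ\u03b9\u0342πρσ")   # True: the result I expected
```

Counting by code point reproduces the 4-character result in the first bullet, but not the 6- and 7-character results in the other two, so this is at best part of the explanation.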
So there is a problem with both of the diacritics I tested, both when used together and in isolation. Below is a screenshot of the worksheet I used to show the problem:
Tried so far:
- checking that the diacritics are from the right Unicode blocks
- removing any non-printing characters with CLEAN() and TRIM(), and checking the result with LEN().
Questions:
- How can I construct Unicode characters with combining characters that are correctly understood by Numbers?
- What's the underlying problem here?
If there are Numbers and/or Unicode resources on this topic, I'd be grateful.
Mac Studio, macOS 15.1