Unmapped XCCS and Unicode characters #1971

@rmkaplan

I originally designed the XCCS-Unicode tables to create on-the-fly mappings for codes that were not defined in the tables.

The goal was to do something reasonable when you read a character from a UTF-8/Unicode file into Medley's internal XCCS encoding, even if the mapping tables do not provide a proper corresponding XCCS code for that particular Unicode code. In that case a unique XCCS code is assigned from a reserved/unused part of the XCCS space. Internal manipulations can then be carried out on that code even though it can't be interpreted, and the original Unicode character shows up when that XCCS code is written back to a Unicode file.
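
To make the round-trip idea concrete, here is a minimal Python sketch of the on-the-fly assignment. It is not the Medley code: the class name `OnTheFlyMapper`, the one-entry table, and the reserved starting code `0xF000` are all hypothetical stand-ins chosen for illustration.

```python
# Hypothetical sketch, not Medley's implementation.
# The table contents and the reserved range are invented for illustration.
TABLE_UNICODE_TO_XCCS = {0x41: 0x41}  # stand-in for the real mapping tables
RESERVED_START = 0xF000               # assumed start of an unused XCCS region


class OnTheFlyMapper:
    def __init__(self):
        # Start from the table-defined mappings, in both directions.
        self.u2x = dict(TABLE_UNICODE_TO_XCCS)
        self.x2u = {x: u for u, x in self.u2x.items()}
        self.next_free = RESERVED_START

    def unicode_to_xccs(self, ucode):
        """Return the XCCS code for ucode, assigning a reserved code if unmapped."""
        xcode = self.u2x.get(ucode)
        if xcode is None:
            xcode = self.next_free
            self.next_free += 1
            self.u2x[ucode] = xcode
            self.x2u[xcode] = ucode
        return xcode

    def xccs_to_unicode(self, xcode):
        """Return the Unicode code for xcode, whether table-defined or assigned."""
        return self.x2u[xcode]
```

Reading an unmapped Unicode code twice yields the same reserved XCCS code, and writing that code back recovers the original Unicode code, which is the round-trip behavior described above.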

As I understand it, @hjellinek has now asked for a function that fetches the Unicode corresponding to an arbitrary XCCS code, and returns NIL if such a Unicode does not exist. A corresponding Unicode would be missing for all of the reserved/unused parts of the XCCS space, and perhaps also for codes that fall into holes in our tables.

But the strategy for assigning temporary mappings introduces a confusion. An initially reserved/unused code may have been filled in with an arbitrary mapping to preserve the round-trip behavior for an otherwise unmapped Unicode.

So I want to confirm: an XCCS code whose Unicode did not come from one of our (reliable) mapping tables should be treated as if it did not have a mapping at all, and NIL should be returned for that XCCS code.
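
If the answer is yes, one way to get that behavior is to keep the table-defined mappings separate from the on-the-fly ones, so that the new lookup consults only the tables while file output consults both. A rough Python sketch follows; the function names, dictionaries, and code values are hypothetical and do not correspond to existing Medley functions.

```python
from typing import Optional

# Hypothetical data: one table-defined mapping and one on-the-fly mapping.
TABLE_XCCS_TO_UNICODE = {0x41: 0x41}          # from the (reliable) mapping tables
PROVISIONAL_XCCS_TO_UNICODE = {0xF000: 0x1F600}  # assigned while reading a file


def table_unicode_for_xccs(xcode: int) -> Optional[int]:
    """Lookup for callers: None (NIL) unless the tables define the mapping."""
    return TABLE_XCCS_TO_UNICODE.get(xcode)


def output_unicode_for_xccs(xcode: int) -> Optional[int]:
    """Lookup for writing files: also honors on-the-fly mappings."""
    ucode = TABLE_XCCS_TO_UNICODE.get(xcode)
    if ucode is None:
        ucode = PROVISIONAL_XCCS_TO_UNICODE.get(xcode)
    return ucode
```

With this split, a reserved code that was filled in on the fly still round-trips on output, but it looks unmapped to anyone asking for its table-defined Unicode.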
