Description
This is a proposal for cleaning up the confusion between the internal MCCS character encoding and external format, the Xerox XCCS encoding/standard, Unicode, etc.
To start, we clarify that MCCS is the character coding standard used internally. For example, the mappings provided by CHARACTERSETNAMES and CHARACTERNAMES go from names to MCCS values.
MCCS is a derivative of, but not identical to, the Xerox character encoding standard (XCCS). The well-known differences are the codes for dollar, underscore, and caret. To make this explicit, the functions (MTOXCODE MEDLEYCODE) and (XTOMCODE XCODE) will map between the two encodings. MCODE x24 corresponds to XCODE xA4, and vice versa; likewise, underscore interchanges with left arrow, and caret with uparrow.
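As a rough illustration of the mapping described above, the following Python sketch models MTOXCODE and XTOMCODE as a symmetric swap over a small set of code pairs. It is not Medley code; the lowercase function names are stand-ins. The x24/xA4 pair comes from this paragraph and the x5E/xAD pair from the FONTS section below. The underscore/left-arrow codes (x5F/xAC) are an assumption, since the proposal does not state them.

```python
# Illustrative model only; the real functions would be written in Interlisp.
# Each pair lists two codes that trade places between MCCS and XCCS.
SWAPPED_PAIRS = [
    (0x24, 0xA4),  # dollar / currency sign (codes stated in the proposal)
    (0x5E, 0xAD),  # caret / uparrow (codes stated in the FONTS section)
    (0x5F, 0xAC),  # underscore / left arrow (ASSUMED codes, not stated in the proposal)
]

# Build a lookup that sends each member of a pair to its partner.
_SWAP = {}
for a, b in SWAPPED_PAIRS:
    _SWAP[a] = b
    _SWAP[b] = a


def mtoxcode(mcode):
    """Map an MCCS code to the corresponding XCCS code."""
    return _SWAP.get(mcode, mcode)


def xtomcode(xcode):
    """Map an XCCS code to the corresponding MCCS code."""
    return _SWAP.get(xcode, xcode)
```

Because every difference is modeled as a pairwise interchange, the two functions are the same permutation in this sketch, and each one undoes the other. All codes outside the swapped pairs pass through unchanged.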
:MEDLEY will be the default external format if another format isn't specified for reading or writing. (:XCCS must be specified to read a Xerox-encoded file, if we ever want to do that.)
The :MEDLEY external format represents character codes using the shifting stringlets of XCCS, but reading and writing are completely transparent: no codes are translated. In contrast, for a :XCCS file, XTOMCODE is applied to every code decoded on reading and MTOXCODE to every code on writing.
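The difference between the two external formats can be sketched as follows. This Python model works on codes that have already been decoded from the byte stream, so it leaves out the stringlet (run-coding) layer entirely. The function names `translate_on_read` and `translate_on_write` are hypothetical, and the swap table holds only the two pairs whose codes the proposal states.

```python
# Illustrative model only; not the Medley stream implementation.
# Only the code pairs whose values are stated in the proposal are included.
_SWAP = {0x24: 0xA4, 0xA4: 0x24, 0x5E: 0xAD, 0xAD: 0x5E}


def xtomcode(xcode):
    return _SWAP.get(xcode, xcode)


def mtoxcode(mcode):
    return _SWAP.get(mcode, mcode)


def translate_on_read(codes, external_format=":MEDLEY"):
    """Turn codes decoded from a file into in-memory MCCS codes."""
    if external_format == ":MEDLEY":
        return list(codes)                    # transparent: nothing is translated
    if external_format == ":XCCS":
        return [xtomcode(c) for c in codes]   # every code goes through XTOMCODE
    raise ValueError("unsupported external format: " + external_format)


def translate_on_write(codes, external_format=":MEDLEY"):
    """Turn in-memory MCCS codes into the codes to be written to a file."""
    if external_format == ":MEDLEY":
        return list(codes)
    if external_format == ":XCCS":
        return [mtoxcode(c) for c in codes]   # every code goes through MTOXCODE
    raise ValueError("unsupported external format: " + external_format)
```

In this model a dollar sign stored as x24 in memory is written as x24 to a :MEDLEY file and as xA4 to a :XCCS file, and reading either file back yields x24 again.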
(Note that the key-action tables will produce the MCCS codes for the keys labeled $, ^, _. We'll need some other keyboard convention to type in currency, caret, or underscore.)
UNICODE:
With respect to Unicode, functions MTOUCODE and UTOMCODE will map between Medley codes and Unicode codes (in a UTF-8 external format). Those mappings will be derived from our XCCS-to-Unicode tables in the Unicode/xerox directory, essentially by executing either (XTOMCODE (UTOX CODE ...)) or (XTOUCODE (MTOXCODE ...)). These compositions will be applied when the mapping vectors are compiled from the data files, to produce resource files with the explicit names UNICODE-TO-MEDLEY.TXT and MEDLEY-TO-UNICODE.TXT. These will replace the current UNICODE-MAPPINGS.TXT and INVERTED-UNICODE-MAPPINGS.TXT, which are implicitly defined on XCCS.
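A minimal sketch of that compile-time composition, in Python rather than Interlisp. The sample XCCS-to-Unicode entries are illustrative assumptions and are not copied from the data files in Unicode/xerox. The builder function names are hypothetical, and the swap table holds only the pairs whose codes the proposal states.

```python
# Illustrative model only; the real tables are compiled from the data files.
_SWAP = {0x24: 0xA4, 0xA4: 0x24, 0x5E: 0xAD, 0xAD: 0x5E}


def xtomcode(xcode):
    return _SWAP.get(xcode, xcode)


# ASSUMED sample entries (XCCS code -> Unicode code point), for illustration.
XCCS_TO_UNICODE_SAMPLE = {
    0x24: 0x00A4,  # XCCS currency sign
    0xA4: 0x0024,  # XCCS dollar sign
    0x5E: 0x005E,  # XCCS caret (circumflex)
    0xAD: 0x2191,  # XCCS uparrow
    0x41: 0x0041,  # letter A, same in both encodings
}


def build_medley_to_unicode(xccs_to_unicode):
    """Re-key an XCCS-to-Unicode table by MCCS code."""
    return {xtomcode(x): u for x, u in xccs_to_unicode.items()}


def build_unicode_to_medley(xccs_to_unicode):
    """Invert the table and translate the XCCS side to MCCS."""
    return {u: xtomcode(x) for x, u in xccs_to_unicode.items()}


MEDLEY_TO_UNICODE = build_medley_to_unicode(XCCS_TO_UNICODE_SAMPLE)
UNICODE_TO_MEDLEY = build_unicode_to_medley(XCCS_TO_UNICODE_SAMPLE)
```

Composing once at build time means the run-time lookups never mention XCCS at all. Under the sample entries, MCCS x24 maps directly to U+0024 and MCCS x5E to U+2191.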
FONTS:
In-memory fonts will have glyphs in positions that correspond to the MCCS codes. If those are read from XCCS font files (Classic, Terminal...), the glyphs will be moved on reading according to the XTOMCODE mappings. Thus, the uparrow glyph will end up at x5E and the caret glyph at xAD.
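The load-time relocation could look roughly like this. The sketch models a font as a plain Python dictionary from code to glyph, which is an assumption made for illustration. Real font structures also carry widths and other spacing data that would have to move with the glyphs. `relocate_glyphs` is a hypothetical name.

```python
# Illustrative model only; a font is reduced to a dict of code -> glyph.
# Only the code pairs whose values are stated in the proposal are included.
_SWAP = {0x24: 0xA4, 0xA4: 0x24, 0x5E: 0xAD, 0xAD: 0x5E}


def xtomcode(xcode):
    return _SWAP.get(xcode, xcode)


def relocate_glyphs(xccs_glyphs):
    """Return a new glyph table keyed by MCCS code instead of XCCS code."""
    return {xtomcode(code): glyph for code, glyph in xccs_glyphs.items()}


# Glyph names stand in for real bitmaps and metrics.
xccs_font = {0x41: "A", 0x5E: "caret", 0xAD: "uparrow"}
mccs_font = relocate_glyphs(xccs_font)
```

Building a new table rather than swapping entries in place avoids overwriting one glyph of a pair before its partner has been moved.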
Alternatively, we could run an offline process to transform our XCCS fonts to MCCS fonts, but that runs up against the problem that Matt has pointed out, that we currently don't have a way of writing font files that include all of the character spacing information (kerning etc.). So we may be stuck with load-time fiddling.
Fonts that are constructed from BDF files will have the glyphs properly situated according to MCCS, by translating with UTOMCODE.
Alto text-fonts (Gacha, Helvetica, Timesroman) that already have the glyphs in the right place might not need to be further transformed.
HARDCOPY:
The mappings for hardcopy files may also require some adjustment. For example, MTOXCODE should be applied in the creation of Interpress files (if we ever want to do that). The PostScript stream may also need to be modified if it doesn't already have the Medley mappings built into its substitutions (perhaps by moving metrics around).
The goal of all of this is to move XCCS out to the periphery, like any other code mapping or format that we don't want to pay much attention to but which we might want to interact with (e.g. the various ISO 8859 code sets). It's just an accident that we inherited a set of XCCS-coded fonts and that we were able to create/find the XCCS-to-Unicode mapping tables.