vfont2 Proposal
Introduction
vfont2 is an attempt to overcome the limitations of the venerable Berkeley Unix vfont format, in the same way that H. Peter Anvin's original PC Screen Font format was expanded to the more capable PSF2 format under Linux.
Like the original PSF, the original vfont is limited in how many characters it can contain (256) and how large they can be (127 × 127 pixels), and it has no Unicode support. vfont2 attempts to overcome these limitations, so that a PSF2 file can be losslessly converted to vfont2 and back.
File format
All 2-byte and 4-byte integers are stored in little-endian form. This differs from the original vfont, which could use either big- or little-endian storage, distinguished by the magic number.
Header
The header is very similar to a PSF2 header:
    #define VFONT2_MAGIC0 0x27
    #define VFONT2_MAGIC1 0x5b
    #define VFONT2_MAGIC2 0xA4
    #define VFONT2_MAGIC3 0x68

    /* bits used in flags */
    #define VFONT2_HAS_UNICODE_TABLE 0x01

    /* max version recognized so far */
    #define VFONT2_MAXVERSION 0

    /* UTF8 separators */
    #define VFONT2_SEPARATOR 0xFF
    #define VFONT2_STARTSEQ  0xFE

    struct vfont2_header {
        unsigned char magic[4];
        unsigned int version;
        unsigned int headersize;  /* offset of dispatch table */
        unsigned int flags;
        unsigned int length;      /* number of glyphs */
        unsigned int bitmap_size; /* total number of bytes for bitmaps */
        unsigned int max_height;  /* max size of a character bitmap */
        unsigned int max_width;
    };
Compared to an original vfont file, the "unused" xtend field has no counterpart. If an extension to the header is required, it would be straightforward to increase headersize and add the extra field(s).
Compared to a PSF2 file, the only fields that differ in meaning are:
magic: Based on the PSF2 magic number, only with high and low nibbles swapped in each byte. A nod to the fact that the original vfont used the same magic number as PSF1, but in a different base: 0436 octal rather than 0x436 hexadecimal.
bitmap_size: Since characters in a vfont can each have their own size, the header gives the total size of all bitmaps, not the size of the bitmap for a single character.
max_height: The maximum vertical size of any glyph in the font, in pixels.
max_width: The maximum horizontal size of any glyph in the font, in pixels.
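As a rough illustration of the header layout, the following sketch reads the header out of a byte buffer rather than overlaying the struct on the file, so that it behaves the same on big-endian hosts. The names vfont2_header_le, vfont2_rd32, vfont2_swap_nibbles and vfont2_parse_header are hypothetical and not part of the proposal. The 32-byte minimum is derived from the struct above (4 magic bytes plus seven 4-byte fields), and rejecting a version above VFONT2_MAXVERSION is an assumption about how a reader might behave.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VFONT2_MAXVERSION 0
#define VFONT2_MIN_HEADER 32   /* 4 magic bytes + 7 fields of 4 bytes */

/* Fixed-width mirror of struct vfont2_header (hypothetical name). */
struct vfont2_header_le {
    uint8_t  magic[4];
    uint32_t version;
    uint32_t headersize;
    uint32_t flags;
    uint32_t length;
    uint32_t bitmap_size;
    uint32_t max_height;
    uint32_t max_width;
};

/* Read a 32-bit little-endian value, independent of host byte order. */
static uint32_t vfont2_rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Swap the high and low nibbles of a byte, e.g. 0x72 (PSF2) -> 0x27. */
static uint8_t vfont2_swap_nibbles(uint8_t b)
{
    return (uint8_t)((b << 4) | (b >> 4));
}

/* Returns 0 on success, -1 if the buffer does not look like vfont2. */
static int vfont2_parse_header(const uint8_t *buf, size_t len,
                               struct vfont2_header_le *h)
{
    static const uint8_t magic[4] = { 0x27, 0x5b, 0xA4, 0x68 };

    if (len < VFONT2_MIN_HEADER || memcmp(buf, magic, 4) != 0)
        return -1;

    memcpy(h->magic, buf, 4);
    h->version     = vfont2_rd32(buf + 4);
    h->headersize  = vfont2_rd32(buf + 8);
    h->flags       = vfont2_rd32(buf + 12);
    h->length      = vfont2_rd32(buf + 16);
    h->bitmap_size = vfont2_rd32(buf + 20);
    h->max_height  = vfont2_rd32(buf + 24);
    h->max_width   = vfont2_rd32(buf + 28);

    /* Assumed policy: refuse unknown versions and truncated headers. */
    if (h->version > VFONT2_MAXVERSION || h->headersize < VFONT2_MIN_HEADER)
        return -1;
    return 0;
}
```

Applying vfont2_swap_nibbles to the PSF2 magic bytes 0x72, 0xb5, 0x4a, 0x86 gives the four vfont2 magic bytes listed in the header.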
Dispatch table
At offset headersize in the file is the dispatch table. It contains length entries (one per glyph), each 18 bytes long on disk, with no padding between fields.
    struct vfont2_dispatch {
        unsigned int addr;  /* offset of glyph from start of bitmaps */
        unsigned int size;  /* number of bytes (0 if no glyph) */
        signed short up;    /* number of rows above baseline point */
        signed short down;  /* number of rows below baseline point */
        signed short left;  /* number of columns left of baseline point */
        signed short right; /* number of columns right of baseline point */
        signed short width; /* logical width, used by troff */
    };
The number of lines present in the bitmap is up+down. Similarly, the number of columns is left+right.
Provided the total height and width remain positive, any of up, down, left or right can be negative. This corresponds to a baseline point outside the character bitmap.
The logical width is used when rendering the characters; it gives the number of pixels between one character's baseline point and the next.
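The following sketch decodes one 18-byte dispatch entry field by field. On many ABIs a compiler pads the struct above to 20 bytes, so reading an array of structs directly from the file could misalign every entry after the first; decoding from bytes avoids that. The names vfont2_glyph_metrics, vfont2_decode_dispatch, vfont2_glyph_rows and vfont2_glyph_cols are hypothetical.

```c
#include <stdint.h>

#define VFONT2_DISPATCH_BYTES 18   /* on-disk size of one entry */

/* In-memory copy of one dispatch entry (hypothetical name). */
struct vfont2_glyph_metrics {
    uint32_t addr;   /* offset of glyph from start of bitmaps */
    uint32_t size;   /* number of bytes (0 if no glyph) */
    int16_t  up, down, left, right;
    int16_t  width;  /* logical width */
};

static uint32_t vfont2_rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read a signed 16-bit little-endian value without relying on
 * implementation-defined narrowing conversions. */
static int16_t vfont2_rd16s(const uint8_t *p)
{
    int v = p[0] | (p[1] << 8);
    return (int16_t)((v & 0x8000) ? v - 0x10000 : v);
}

/* Decode the entry starting at p (VFONT2_DISPATCH_BYTES bytes). */
static void vfont2_decode_dispatch(const uint8_t *p,
                                   struct vfont2_glyph_metrics *d)
{
    d->addr  = vfont2_rd32(p);
    d->size  = vfont2_rd32(p + 4);
    d->up    = vfont2_rd16s(p + 8);
    d->down  = vfont2_rd16s(p + 10);
    d->left  = vfont2_rd16s(p + 12);
    d->right = vfont2_rd16s(p + 14);
    d->width = vfont2_rd16s(p + 16);
}

/* Bitmap dimensions; individual terms may be negative, the sums not. */
static int vfont2_glyph_rows(const struct vfont2_glyph_metrics *d)
{
    return d->up + d->down;
}

static int vfont2_glyph_cols(const struct vfont2_glyph_metrics *d)
{
    return d->left + d->right;
}
```

For example, an entry with up = 10 and down = -2 describes an 8-row bitmap whose baseline point lies two rows below the bottom of the bitmap.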
Bitmaps
The bitmaps immediately follow the dispatch table. For each character, there are up+down rows, each ((left+right) + 7) / 8 bytes.
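The row-size arithmetic can be sketched as below. The helper names are hypothetical. The proposal does not state the bit order within a byte; vfont2_pixel assumes the leftmost pixel is the most significant bit, as in PSF2, and that assumption would need checking against real files.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes per bitmap row: columns rounded up to a whole byte. */
static size_t vfont2_row_bytes(int cols)
{
    return ((size_t)cols + 7) / 8;
}

/* Total bytes for one glyph bitmap of rows x cols pixels. */
static size_t vfont2_glyph_bytes(int rows, int cols)
{
    return (size_t)rows * vfont2_row_bytes(cols);
}

/* Test one pixel. ASSUMPTION: leftmost pixel is bit 7 of each byte. */
static int vfont2_pixel(const uint8_t *glyph, int cols, int row, int col)
{
    const uint8_t *line = glyph + (size_t)row * vfont2_row_bytes(cols);
    return (line[col / 8] & (0x80 >> (col % 8))) != 0;
}
```

A glyph with up+down = 8 and left+right = 12 therefore occupies 8 rows of 2 bytes, or 16 bytes, with the last 4 bits of each row unused.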
Unicode table
If a Unicode table is present, it will immediately follow the bitmaps. It is in the same format as PSF2:
    <unicodedescription> := <uc>*<seq>*<term>
    <seq>                := <ss><uc><uc>*
    <ss>                 := 0xFE
    <term>               := 0xFF
where <uc> is a Unicode value coded in UTF-8, and * denotes zero or more occurrences of the preceding item.
The leading <uc>* part gives Unicode symbols that are all represented by this font position. The following sequences are sequences of Unicode symbols — probably a symbol together with combining accents — also represented by this font position.
For example, at the font position for a capital A-ring glyph, the Unicode sequence may be:
c3 85 e2 84 ab fe 41 cc 8a ff
The first five bytes correspond to two Unicode characters (U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE and U+212B ANGSTROM SIGN). Then 0xfe introduces a sequence, and the next three bytes correspond to the two Unicode characters U+0041 LATIN CAPITAL LETTER A and U+030A COMBINING RING ABOVE. The final 0xff terminates the description for this font position.
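One way to walk a single glyph's description is sketched below. The names utf8_decode, vfont2_uni_entry and vfont2_parse_unicode are hypothetical. The UTF-8 decoder is deliberately minimal (it does not reject overlong encodings or surrogates), and the parser is more lenient than the grammar in that it tolerates an empty sequence after 0xFE.

```c
#include <stddef.h>
#include <stdint.h>

#define VFONT2_SEPARATOR 0xFF   /* terminates one glyph's description */
#define VFONT2_STARTSEQ  0xFE   /* introduces a multi-character sequence */

/* Decode one UTF-8 value; returns bytes consumed, or 0 if malformed. */
static size_t utf8_decode(const uint8_t *p, size_t n, uint32_t *cp)
{
    size_t need, i;

    if (n == 0) return 0;
    if (p[0] < 0x80) { *cp = p[0]; return 1; }
    else if ((p[0] & 0xE0) == 0xC0) { *cp = p[0] & 0x1F; need = 1; }
    else if ((p[0] & 0xF0) == 0xE0) { *cp = p[0] & 0x0F; need = 2; }
    else if ((p[0] & 0xF8) == 0xF0) { *cp = p[0] & 0x07; need = 3; }
    else return 0;

    if (n < need + 1) return 0;
    for (i = 1; i <= need; i++) {
        if ((p[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (p[i] & 0x3F);
    }
    return need + 1;
}

/* seq == 0: standalone code point; seq == k: member of the k-th sequence. */
struct vfont2_uni_entry {
    uint32_t cp;
    int      seq;
};

/* Parse one description. Returns bytes consumed including the 0xFF
 * terminator, or 0 on malformed input, overflow of out[], or a
 * missing terminator. */
static size_t vfont2_parse_unicode(const uint8_t *p, size_t n,
                                   struct vfont2_uni_entry *out,
                                   size_t max, size_t *count)
{
    size_t pos = 0, used;
    int seq = 0;

    *count = 0;
    while (pos < n) {
        if (p[pos] == VFONT2_SEPARATOR) return pos + 1;
        if (p[pos] == VFONT2_STARTSEQ) { seq++; pos++; continue; }
        if (*count == max) return 0;
        used = utf8_decode(p + pos, n - pos, &out[*count].cp);
        if (used == 0) return 0;
        out[*count].seq = seq;
        (*count)++;
        pos += used;
    }
    return 0;
}
```

Run on the ten bytes of the A-ring example, this yields U+00C5 and U+212B as standalone values, followed by U+0041 and U+030A as members of sequence 1.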
File extension
The original vfont did not use a specific file extension, preferring instead to end with a point size (e.g. .12, .14). For vfont2, I suggest the obvious .vfont2, or .vfont2u if you want to indicate that it contains a Unicode table.
John Elliott 2021-01-22