How to make pinyin-font

Requirements

Base fonts

han-serif

The han-serif style is based on Source-Han-TrueType, a TTF version of Source Han Sans/Source Han Serif with a reduced file size. It includes all required Chinese characters.
mplus-1m-medium.ttf from M+ M Type-1 is used for the pinyin part of this font.

handwritten

The handwritten style is based on 小赖字体/Xiaolai Font.
Hangul characters (U+A960 ꥠ to U+D7FB ퟻ) are removed from this font to reduce the number of glyphs.
SetoFontSP is used for the pinyin part of this font.
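
The Hangul removal described above could be sketched as a filter over the character map of a font dumped to JSON. This is only an illustration: `remove_hangul` and the sample glyph names are hypothetical, and the assumption that cmap keys are decimal code point strings should be checked against the actual dump. Glyphs that are no longer referenced would still have to be removed from the glyf table separately.

```python
# Hypothetical helper, not part of the project: drop Hangul entries from a cmap.
HANGUL_FIRST = 0xA960  # ꥠ, first code point of the removed range
HANGUL_LAST = 0xD7FB   # ퟻ, last code point of the removed range


def remove_hangul(cmap):
    """Return a copy of cmap without entries in U+A960..U+D7FB.

    Assumption: keys are decimal code point strings and values are glyph names.
    """
    return {
        codepoint: glyph_name
        for codepoint, glyph_name in cmap.items()
        if not HANGUL_FIRST <= int(codepoint) <= HANGUL_LAST
    }


# Sample data with made-up glyph names.
sample_cmap = {
    str(0x5F3A): "glyph_qiang",   # 强, kept
    str(0xA960): "glyph_hangul_a",  # first code point in the range, removed
    str(0xAC00): "glyph_hangul_b",  # 가, removed
}
filtered_cmap = remove_hangul(sample_cmap)
print(filtered_cmap)
```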

Dependencies

  • macOS 10.15 (Catalina)
  • Python 3.7
  • otfcc

python

$ pyenv global 3.7.2
$ pip install -r requirements.txt

otfcc

otfcc is lightweight and supports IVS (Ideographic Variation Sequences).

jq

jq can mangle the data format that you have into the one that you want with very little effort.

macOS only

# Install Xcode with mas-cli
$ mas install 497799835
# Note: Xcode initially reports an error because the [Command Line Tools:] list box is blank.
# The solution in the following article fixes this problem.
# Refer to [Error: xcode-select: error: tool 'xcodebuild' requires Xcode, but active developer directory '/Library/Developer/CommandLineTools' is a command line tools instance](https://qiita.com/eytyet/items/59c5bad1c167d5addc68)

# Install otfcc
$ brew tap caryll/tap 
$ brew install otfcc-mac64

Generation procedure

  1. Make a homograph dictionary (optional)
    See details
$ cd <PROJECT-ROOT>/res/phonics/duo_yin_zi/scripts/
$ python make_pattern_table.py
  2. Make a Unicode table of the target Chinese characters (optional)
    See details
$ cd <PROJECT-ROOT>/res/phonics/unicode_mapping_table/
$ python make_unicode_pinyin_map_table.py
  3. Build the font
$ cd <PROJECT-ROOT>
$ time python src/main.py --style han_serif

or

$ time python src/main.py --style handwritten

Technical Notes

How to set the canvas size of the pinyin display area

Outline

    METADATA_FOR_PINYIN = {
        "pinyin_canvas":{
            "width"    : 850,   # Width of the pinyin canvas.
            "height"   : 283.3, # Height of the pinyin canvas.
            "base_line": 935,   # Height from the bottom of the Chinese character canvas to the pinyin canvas.
            "tracking" : 22.145 # Uniform spacing between characters in the pinyin display area.
        },
        "expected_hanzi_canvas":{
            "width" : 1000, # Expected width of the Chinese character canvas.
            "height": 1000, # Expected height of the Chinese character canvas.
        }
    }

Refer to pinyin_glyph.py and config.py.
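
The metadata is expressed relative to an expected 1000 x 1000 Chinese character canvas, so it presumably has to be scaled when the base font uses a different size. The sketch below shows one way to do that. `scale_canvas` is a hypothetical helper, and the formula for the combined advance height (Chinese character height plus pinyin canvas height) is an assumption; it happens to reproduce the `advanceHeight` of 2628.2 shown for a 2048-unit-wide glyph in the reference example later in this document, but the actual logic in pinyin_glyph.py may differ.

```python
METADATA_FOR_PINYIN = {
    "pinyin_canvas": {
        "width": 850,
        "height": 283.3,
        "base_line": 935,
        "tracking": 22.145,
    },
    "expected_hanzi_canvas": {
        "width": 1000,
        "height": 1000,
    },
}


def scale_canvas(metadata, actual_hanzi_width):
    """Scale the pinyin canvas to a font whose hanzi advance width differs from the expected one.

    Hypothetical helper; the advance_height formula is an assumption.
    """
    expected = metadata["expected_hanzi_canvas"]
    canvas = metadata["pinyin_canvas"]
    scale = actual_hanzi_width / expected["width"]
    return {
        "scale": scale,
        "width": canvas["width"] * scale,
        "height": canvas["height"] * scale,
        "base_line": canvas["base_line"] * scale,
        "tracking": canvas["tracking"] * scale,
        # Assumed: total height = hanzi canvas height + pinyin canvas height.
        "advance_height": (expected["height"] + canvas["height"]) * scale,
    }


scaled = scale_canvas(METADATA_FOR_PINYIN, actual_hanzi_width=2048)
for key, value in scaled.items():
    print(key, round(value, 3))
```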

Componentization of glyphs

Glyphs in the glyf table can be componentized and referenced by other glyphs. Reusing components reduces the file size, and because components are placed by an affine transformation, their size and position are easy to set.

Reference usage examples:

"cid48219": {
  "advanceWidth": 2048,
  "advanceHeight": 2628.2,
  "verticalOrigin": 1803,
  "references": [
    {
      "glyph": "arranged_ji1", "x": 0, "y": 0, "a": 1, "b": 0, "c": 0, "d": 1
    },
    {
      "glyph": "cid48219.ss00", "x": 0, "y": 0, "a": 1, "b": 0, "c": 0, "d": 1
    }
  ]
},

Apple, The 'glyf' table:

The transformation entries determine the values of an affine transformation applied to the component prior to its being incorporated into the parent glyph. Given the component matrix [a b c d e f], the transformation applied to the component is:

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a & c & e \\ b & d & f \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$

In a reference, a to d are the coefficients of the affine transformation. This tool uses only a and d (scale) and x and y (translation).

Note: For unknown reasons, otfccbuild drops glyphs when a and d have the same value. When the two values differ, the scaling is applied as expected, so set a=0.9, d=0.91 to scale to about 90%.
Refer to pinyin_glyph.py.
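
As a rough illustration of the transformation and the workaround above, the following sketch applies a reference to a point and builds a scaled reference whose a and d values are deliberately kept slightly different. Both function names are hypothetical, and treating the x and y fields of a reference as the e and f terms of the matrix is an assumption about the JSON layout.

```python
def apply_reference(point, ref):
    """Apply the affine transformation of a reference to one (x, y) point.

    Assumption: ref["x"] and ref["y"] play the role of e and f in the matrix.
    """
    x, y = point
    new_x = ref["a"] * x + ref["c"] * y + ref["x"]
    new_y = ref["b"] * x + ref["d"] * y + ref["y"]
    return (new_x, new_y)


def make_scaled_reference(glyph, scale, dx=0, dy=0, nudge=0.01):
    """Build a reference that scales a component and moves it by (dx, dy).

    d is nudged away from a so that the two values are never equal,
    following the workaround in the note above (a=0.9, d=0.91).
    """
    return {
        "glyph": glyph,
        "x": dx,
        "y": dy,
        "a": scale,
        "b": 0,
        "c": 0,
        "d": round(scale + nudge, 4),
    }


# Scale the pinyin component to about 90% and lift it by 935 units.
pinyin_ref = make_scaled_reference("arranged_ji1", 0.9, dx=0, dy=935)
moved_point = apply_reference((1000, 100), pinyin_ref)
print(pinyin_ref)
print(moved_point)
```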

Feature tags

"aalt" is set so that alternative characters can be displayed.

  • "aalt_0" is set to "gsub_single". Use case: symbol characters, and cases where the pronunciation of only one Chinese character changes.
  • "aalt_1" is set to "gsub_alternate". Use case: cases where the pronunciation of two or more Chinese characters changes.

"rclt" is used for homograph substitution. This feature performs chaining contextual substitution.

  • "pattern one" covers cases where the pronunciation of only one Chinese character changes.
  • "pattern two" covers cases where the pronunciation of two or more Chinese characters changes.
  • "exception pattern" covers duplicates that affect phrases of pattern one or two.
    See details

Specifications (constraints)

  • This font assumes horizontal writing only

  • The glyf table can store at most 65536 glyphs

  • Because the glyf table is large, it is saved as a separate JSON file

  • Chinese characters that are defined more than once refer to the same glyph to reduce the number of glyphs.
    (⺎:U+2E8E, 兀:U+5140, 兀:U+FA0C and 嗀:U+55C0, 嗀:U+FA0D )

  • Only a fixed-width Latin alphabet font can be used for the pinyin glyphs

  • The json module of the Python standard library becomes bloated and slow when converting to dict, so orjson is used instead
    Refer to Choosing a faster JSON library for Python,
    Comparing the memory usage and processing time of Python JSON parsers (PythonのJSONパーサのメモリ使用量と処理時間を比較してみる)

  • ssNN ranges from ss00 to ss20
    Refer to Tag: 'ss01' - 'ss20'

  • Pinyin is written in a simplified form in the glyf table (yī -> yi1)

  • Specific pronunciations (e.g. 呣 m̀, 嘸 m̄) are excluded because they are not included in Unicode

  • Phrases have been added to overwrite.txt for the following purposes:

    1. Register pinyin that cannot be obtained from pypinyin
    2. Adjust the priority of pronunciations
    3. Add the pronunciation of "儿" as "r"
    4. Add the light tone (轻声) and unify the pronunciations of duplicated Chinese characters
    5. Exclude specific pronunciations (e.g. 呣 m̀, 嘸 m̄)
  • IVS codes correspond to glyphs as follows:

| Code | Pinyin glyf |
|---|---|
| 0xE01E0 | None. Chinese character only |
| 0xE01E1 | With the standard pronunciation |
| 0xE01E2 | With the variational pronunciation |
  • The correspondence between ssNN and pinyin is as follows:

    -> If the standard pronunciation is not also stored in an ssNN glyph, GSUB immediately returns the glyph to its original state when cmap_uvs is used to revert to the standard reading.
    Therefore, a glyph for reverting to the standard pronunciation is provided as ss01.

| Naming rule | glyf type |
|---|---|
| hanzi_glyf | Chinese character glyf with the standard pronunciation |
| hanzi_glyf.ss00 | Chinese character glyf without pinyin. The pinyin can be changed simply by changing the IVS code. |
| hanzi_glyf.ss01 | (Only when the Chinese character has a variational pronunciation) Chinese character glyf with the standard pronunciation (a duplicate of hanzi_glyf, used to override GSUB replacements) |
| hanzi_glyf.ss02 and later | (Only when the Chinese character has a variational pronunciation) Chinese character glyf with a variational pronunciation |
  • Lookup table names are arbitrary, but they follow the rules below to indicate the reference source:

| Lookup table name | Reference source |
|---|---|
| lookup_pattern_0N | pattern one |
| lookup_pattern_1N | pattern two |
| lookup_pattern_2N | exception pattern |

e.g.:

U+5F3A: qiáng,qiǎng,jiàng  #强
1, 强, qiáng, [~调|~暴|~度|~占|~攻|加~|~奸|~健|~项|~行|~硬|~壮|~盗|~权|~制|~盛|~烈|~化|~大|~劲]
2, 强, qiǎng, [~求|~人|~迫|~辩|~词夺理|~颜欢笑]
3, 强, jiàng, [~嘴|倔~]
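
The constraint that pinyin is simplified in the glyf table (yī -> yi1) can be sketched with Unicode decomposition: split each syllable into base letters and combining marks, replace the tone mark with a digit, and recompose the rest. `simplify_pinyin` is a hypothetical name, and two choices here are assumptions rather than the project's confirmed behavior: a syllable without a tone mark gets no digit, and ü is kept as ü.

```python
import unicodedata

# Combining tone marks mapped to tone numbers.
TONE_MARKS = {
    "\u0304": "1",  # macron, first tone (ā)
    "\u0301": "2",  # acute, second tone (á)
    "\u030c": "3",  # caron, third tone (ǎ)
    "\u0300": "4",  # grave, fourth tone (à)
}


def simplify_pinyin(syllable):
    """Convert one tone-marked syllable to a numbered form, e.g. yī -> yi1."""
    decomposed = unicodedata.normalize("NFD", syllable)
    tone = ""
    letters = []
    for char in decomposed:
        if char in TONE_MARKS:
            tone = TONE_MARKS[char]
        else:
            letters.append(char)
    # Recompose so that u + diaeresis becomes ü again.
    base = unicodedata.normalize("NFC", "".join(letters))
    return base + tone


# The three readings of 强 from the example above.
for reading in "qiáng,qiǎng,jiàng".split(","):
    print(reading, "->", simplify_pinyin(reading))
```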

Terminology used

Collection of Chinese characters that are not found in pypinyin

FIX_PINYIN.md

References

Heteronyms (多音字)

Dictionary Sites

OpenType Specification