
pīnyīn (v3)

pinyin: a tool for converting Chinese characters to pinyin.



Web Site: 简体中文 | English | 한국어

README: 简体中文 | English | 한국어

Converts Han characters (汉字) to pinyin. Useful for phonetic notation, sorting, and searching.

Note: This module supports both Node and web browsers.

For a Python version, see mozillazg/python-pinyin.


Features

  • Word segmentation to resolve heteronyms.
  • Supports Traditional and Simplified Chinese.
  • Supports multiple pinyin styles.

Install

via npm:

npm install pinyin --save

Usage

For developers:

import pinyin from "pinyin";

console.log(pinyin("中心"));    // [ [ 'zhōng' ], [ 'xīn' ] ]

console.log(pinyin("中心", {
  heteronym: true                // Enable heteronym mode.
}));                            // [ [ 'zhōng', 'zhòng' ], [ 'xīn' ] ]

console.log(pinyin("中心", {
  heteronym: true,              // Enable heteronym mode.
  segment: true                 // Enable Chinese word segmentation; fixes most heteronym problems.
}));                            // [ [ 'zhōng' ], [ 'xīn' ] ]

console.log(pinyin("我喜欢你", {
  segment: true,                // Enable segmentation. Needed for grouping.
  group: true                   // Group pinyin segments
}));                            // [ [ 'wǒ' ], [ 'xǐhuān' ], [ 'nǐ' ] ]

console.log(pinyin("中心", {
  style: pinyin.STYLE_INITIALS, // Setting pinyin style.
  heteronym: true
}));                            // [ [ 'zh' ], [ 'x' ] ]

console.log(pinyin("华夫人", {
  mode: "surname",              // Surname mode.
}));                            // [ ['huà'], ['fū'], ['rén'] ]

From the command line:

$ pinyin 中心
zhōng xīn
$ pinyin -h

Types

IPinyinOptions

The type of the second argument of the pinyin method.

export interface IPinyinOptions {
  style?: IPinyinStyle; // Output style of the pinyin.
  mode?: IPinyinMode;   // Pinyin mode.
  segment?: IPinyinSegment | boolean;
  heteronym?: boolean;
  group?: boolean;
  compact?: boolean;
}

IPinyinStyle

The output style of pinyin.

export type IPinyinStyle =
  "normal" | "tone" | "tone2" | "to3ne" | "initials" | "first_letter" | // Recommended.
  "NORMAL" | "TONE" | "TONE2" | "TO3NE" | "INITIALS" | "FIRST_LETTER" |
  0        | 1      | 2       | 5       | 3          | 4;               // For compatibility.
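
Because a style can be written as a lowercase name, an uppercase name, or a number, it can help to reduce all three spellings to one form before comparing them. The sketch below is only an illustration: normalizeStyle and STYLE_BY_CODE are hypothetical names, not part of the package, and the numeric codes are an assumption read off the column alignment of the type above.

```javascript
// Hypothetical helper, not part of the pinyin package.
// Assumption: each numeric code belongs to the name printed in the same column
// of the IPinyinStyle type.
const STYLE_BY_CODE = {
  0: "normal",
  1: "tone",
  2: "tone2",
  3: "initials",
  4: "first_letter",
  5: "to3ne",
};

// Reduce any accepted spelling of a style to its lowercase name.
function normalizeStyle(style) {
  if (typeof style === "number") {
    const name = STYLE_BY_CODE[style];
    if (name === undefined) throw new RangeError(`Unknown style code: ${style}`);
    return name;
  }
  const name = String(style).toLowerCase();
  if (!Object.values(STYLE_BY_CODE).includes(name)) {
    throw new RangeError(`Unknown style name: ${style}`);
  }
  return name;
}

console.log(normalizeStyle("TONE2")); // tone2
console.log(normalizeStyle(5));       // to3ne
```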

IPinyinMode

The mode of pinyin.

// - NORMAL: normal mode (the default).
// - SURNAME: surname mode, for Chinese surnames.
export type IPinyinMode =
  "normal" | "surname" |
  "NORMAL" | "SURNAME";

IPinyinSegment

The segment method.

  • Default is false, which disables segmentation.
  • If set to true, the "Intl.Segmenter" module is used by default for segmentation on both Web and Node.
  • You can also specify one of the following strings (but only "Intl.Segmenter" and "segmentit" are supported on the Web):

    export type IPinyinSegment = "Intl.Segmenter" | "nodejieba" | "segmentit" | "@node-rs/jieba";
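
Since "Intl.Segmenter" is a standard JavaScript API that older runtimes may lack, a caller might want to check for it before choosing a segment value. The following is a minimal sketch under that assumption; chooseSegmentOption and segmentWords are hypothetical helpers and do not reflect how the package itself selects a segmenter.

```javascript
// Hypothetical helper: pick a value for options.segment based on the runtime.
function chooseSegmentOption() {
  const hasSegmenter =
    typeof Intl !== "undefined" && typeof Intl.Segmenter === "function";
  return hasSegmenter ? "Intl.Segmenter" : false;
}

// Split text into word-like segments with the standard Intl.Segmenter API.
// Falls back to one segment per code point when Intl.Segmenter is missing.
function segmentWords(text) {
  if (chooseSegmentOption() === false) return Array.from(text);
  const segmenter = new Intl.Segmenter("zh", { granularity: "word" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

console.log(chooseSegmentOption());
// The exact word boundaries depend on the runtime's segmentation data.
console.log(segmentWords("我喜欢你"));
```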
    

API

<Array> pinyin(words[, options])

Convert Han (汉字) to pinyin.

The options argument is optional. Use it to specify the heteronym mode and the pinyin style.

Returns an Array<Array<String>>. If a Han character is a heteronym, its inner array may contain multiple pinyin readings.
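
The nested return value is often flattened into a single string for display. As a rough sketch, the hypothetical helper joinFirstReadings below keeps the first reading of each character; the sample value is copied from the heteronym example in Usage, so the snippet runs without the package.

```javascript
// Hypothetical helper: join the first reading of every character.
function joinFirstReadings(result, separator = " ") {
  return result.map((readings) => readings[0]).join(separator);
}

// Sample value copied from the heteronym example in Usage.
const sample = [["zhōng", "zhòng"], ["xīn"]];

console.log(joinFirstReadings(sample));     // zhōng xīn
console.log(joinFirstReadings(sample, "")); // zhōngxīn
```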

<Number> pinyin.compare(a, b)

The default comparison implementation for sorting by pinyin.

Options

<Boolean> options.segment

Enable Chinese word segmentation. Segmentation helps resolve heteronyms, but it is slower and uses more CPU and memory.

Default is false.

<Boolean> options.heteronym

Enable or disable heteronym mode. Default is false (disabled).
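
With heteronym mode enabled, each character can carry several readings, and a search index may want every full reading of the phrase. One way to enumerate them is a Cartesian product over the inner arrays. This is a sketch only: allReadings is a hypothetical helper, and the input is the sample output shown in Usage.

```javascript
// Hypothetical helper: Cartesian product of the per-character readings.
// The number of combinations is the product of the inner array lengths,
// so it can grow quickly for long phrases.
function allReadings(result) {
  return result.reduce(
    (combos, readings) =>
      combos.flatMap((combo) => readings.map((r) => [...combo, r])),
    [[]]
  );
}

// Sample value copied from the heteronym example in Usage.
const heteronymResult = [["zhōng", "zhòng"], ["xīn"]];

console.log(allReadings(heteronymResult).map((combo) => combo.join(" ")));
// [ 'zhōng xīn', 'zhòng xīn' ]
```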

<Boolean> options.group

Group pinyin by phrase. For example:

我喜欢你
wǒ xǐhuān nǐ

<Object> options.style

Specify the pinyin style. Please use the static properties named STYLE_*. Default is .STYLE_TONE. See Static Property for more.

options.mode

Pinyin mode. Default is pinyin.MODE_NORMAL. If you know the input is a surname, pinyin.MODE_SURNAME may give better results.

Static Property

.STYLE_NORMAL

Normal style, without tones.

Example: pin yin

.STYLE_TONE

Tone style. This is the default.

Example: pīn yīn

.STYLE_TONE2

Tone style, with the tone written as a number [0-4] at the end of each syllable.

Example: pin1 yin1

.STYLE_TO3NE

Tone style, with the tone written as a number [0-4] directly after the phonetic notation character.

Example: pi1n yi1n
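
The two numeric styles differ only in where the tone digit goes: at the end of the syllable for TONE2, and directly after the toned letter for TO3NE. The sketch below illustrates that difference using Unicode decomposition. It is an independent illustration, not the package's implementation; toTone2, toTo3ne, and the lookup table are hypothetical names, and syllables without a tone mark are simply returned unchanged.

```javascript
// Combining tone marks as they appear after Unicode NFD decomposition.
const TONE_DIGITS = {
  "\u0304": "1", // macron: first tone
  "\u0301": "2", // acute: second tone
  "\u030C": "3", // caron: third tone
  "\u0300": "4", // grave: fourth tone
};
const TONE_MARK = /[\u0304\u0301\u030C\u0300]/;

// Hypothetical helper: strip the tone mark and append its digit to the syllable.
function toTone2(syllable) {
  const decomposed = syllable.normalize("NFD");
  const match = decomposed.match(TONE_MARK);
  if (!match) return syllable; // no tone mark: returned unchanged in this sketch
  return decomposed.replace(TONE_MARK, "").normalize("NFC") + TONE_DIGITS[match[0]];
}

// Hypothetical helper: replace the tone mark with its digit, in place.
function toTo3ne(syllable) {
  return syllable
    .normalize("NFD")
    .replace(TONE_MARK, (mark) => TONE_DIGITS[mark])
    .normalize("NFC");
}

console.log(["pīn", "yīn"].map((s) => toTone2(s)).join(" ")); // pin1 yin1
console.log(["pīn", "yīn"].map((s) => toTo3ne(s)).join(" ")); // pi1n yi1n
```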

.STYLE_INITIALS

Initial consonant (of a Chinese syllable).

Example: the pinyin of 中国 is zh g

Note: when a Han character (汉字) has no initial consonant, it is converted to an empty string.

.STYLE_FIRST_LETTER

First letter style.

Example: p y

pinyin.MODE_NORMAL

Normal mode. This is the default mode.

pinyin.MODE_SURNAME

Surname mode. If the Chinese word is a surname, the surname reading is given priority.

Test

npm test

Q&A

How to sort by pinyin?

This module provides a default compare implementation:

const pinyin = require('pinyin');

const data = '我要排序'.split('');
const sortedData = data.sort(pinyin.compare);

If you need a different implementation, you can do it like this:

const pinyin = require('pinyin');

const data = '我要排序'.split('');

// Consider persisting the pinyin results so they are not recomputed on every sort.
const pinyinData = data.map(han => ({
  han: han,
  pinyin: pinyin(han)[0][0], // Choose your own options and styles.
}));
}));
const sortedData = pinyinData.sort((a, b) => {
  return a.pinyin.localeCompare(b.pinyin);
}).map(d => d.han);
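
If persisting the results is more than you need, an in-memory cache gives part of the same benefit. The following is a minimal sketch: sortByPinyin is a hypothetical helper that takes the conversion function as a parameter, and the stub lookup table stands in for the package so the snippet runs on its own. In real code you might pass something like (han) => pinyin(han)[0][0].

```javascript
// Hypothetical helper: sort Han strings by a pinyin key, computing each key once.
// `toPinyin` is any function that maps a string to its sort key.
function sortByPinyin(items, toPinyin) {
  const cache = new Map();
  const keyOf = (item) => {
    if (!cache.has(item)) cache.set(item, toPinyin(item));
    return cache.get(item);
  };
  // Copy first so the caller's array is not mutated.
  return [...items].sort((a, b) => keyOf(a).localeCompare(keyOf(b)));
}

// Stand-in lookup table so the example runs without the package installed.
const READINGS = { "我": "wǒ", "要": "yào", "排": "pái", "序": "xù" };
let calls = 0;
const stubToPinyin = (han) => {
  calls += 1;
  return READINGS[han];
};

const sorted = sortByPinyin(["我", "要", "排", "序", "我"], stubToPinyin);
console.log(sorted.join("")); // 排我我序要
console.log(calls);           // 4: the repeated character is converted only once
```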

If this module is helpful to you, please star this repository.

You can also choose to donate to me via Alipay or WeChat.

License

MIT