The ultimate solution for converting Chinese characters to Pinyin in JavaScript, with a detailed introduction to the JS Pinyin input method

黄舟

Mar 03, 2017, 03:22 PM

javascript

Preface

There are many articles online about converting between Chinese characters and Pinyin in JS, but they are quite messy and largely copied from one another. Some do not support polyphonic characters, some do not support tones, and some ship dictionary files that are too large. For example, sometimes I only need the first letters of the pinyin, yet I have to import a 200kb dictionary file, which does not meet practical needs.

So I carefully organized and revised several common dictionary files found online and wrapped them in a simple utility library that can be used directly.

Code and demo

GitHub project: https://github.com/liuxianan/pinyinjs

Full demo: http://demo.liuxianan.com/pinyinjs/

The demo covers both directions: converting Chinese characters to Pinyin, and converting Pinyin to Chinese characters.

Background on Chinese characters and Pinyin

Range of Chinese characters

It is generally said that the range of Chinese characters in Unicode is /^[\u2E80-\u9FFF]+$/ (11904-40959). However, many code points in that range are not Chinese characters, or are Chinese characters with no usable reading. The dictionary files used in this article cover /^[\u4E00-\u9FA5]+$/ (19968-40869), plus one separate Chinese character, 〇, whose Unicode position is 12295.
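As a minimal sketch of the range described above, the check below accepts only strings made up of U+4E00 to U+9FA5 plus 〇 (U+3007). The names `HANZI_PATTERN` and `isHanzi` are illustrative and are not taken from the library.

```javascript
// Characters covered by the dictionaries in this article:
// U+4E00..U+9FA5 (19968-40869) plus the separate character 〇 (U+3007, 12295).
var HANZI_PATTERN = /^[\u4E00-\u9FA5\u3007]+$/;

// Hypothetical helper: true only if every character is in the covered range.
function isHanzi(str) {
    return HANZI_PATTERN.test(str);
}

console.log(isHanzi('小茗同学'));
console.log(isHanzi('abc'));
```

A check like this can decide whether a character should be looked up in a dictionary or returned unchanged.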

Pinyin combination

Pinyin has 21 initials: b, p, m, f, d, t, n, l, g, k, h, j, q, x, zh, ch, sh, r, z, c, s. It has 24 finals: 6 simple finals (a, o, e, i, u, v) and 18 compound finals (ai, ei, ui, ao, ou, iu, ie, ve, er, an, en, in, un, vn, ang, eng, ing, ong). If every initial were paired with every final, there would be 24 × 21 = 504 combinations. In practice some combinations are meaningless, such as bv, gie, and ve. After removing them, 412 remain.
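The arithmetic above can be checked with a short sketch that pairs every initial with every final. The variable names are illustrative. The list it produces is the naive 504-entry product, not the list of valid syllables.

```javascript
var initials = ['b', 'p', 'm', 'f', 'd', 't', 'n', 'l', 'g', 'k', 'h',
                'j', 'q', 'x', 'zh', 'ch', 'sh', 'r', 'z', 'c', 's'];
var finals = ['a', 'o', 'e', 'i', 'u', 'v',
              'ai', 'ei', 'ui', 'ao', 'ou', 'iu', 'ie', 've', 'er',
              'an', 'en', 'in', 'un', 'vn', 'ang', 'eng', 'ing', 'ong'];

// Pair every initial with every final.
var combos = [];
initials.forEach(function (initial) {
    finals.forEach(function (fin) {
        combos.push(initial + fin);
    });
});

console.log(initials.length, finals.length, combos.length);
```

Meaningless pairs such as bv and gie are part of this product, which is why the real number of syllables is smaller.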

Pinyin dictionary files

The dictionaries are introduced in ascending order of file size.

Dictionary 1: Pinyin first letters

The contents of this dictionary file are roughly as follows:

/**
 * Pinyin first-letter dictionary file
 */
var pinyin_dict_firstletter = {};
pinyin_dict_firstletter.all = "YDYQSXMWZSSXJBYMGCCZQPSSQBYCDSCDQLDYLYBSSJG...";
pinyin_dict_firstletter.polyphone = {"19969":"DZ","19975":"WM","19988":"QJ","20048":"YL",...};

This dictionary takes the 20,902 Chinese characters in the Unicode range 4E00 (19968) to 9FA5 (40869), concatenates their pinyin first letters into one very long string, and lists the polyphonic characters (370 in total) separately. The dictionary file size is 25kb.

The advantage of this dictionary file is that it is small and supports polyphonic characters. The disadvantage is that it can only provide the first letter of the pinyin.
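A minimal sketch of how such a structure could be queried, assuming the string is indexed by the character's offset from U+4E00 and that `polyphone` is keyed by the decimal code point. The helper `firstLettersOf` and the truncated `sampleDict` are hypothetical, and the library's own lookup may differ.

```javascript
// Truncated sample: only the first 8 letters of the excerpt shown above,
// so it covers U+4E00..U+4E07 only.
var sampleDict = {
    all: "YDYQSXMW",
    polyphone: {"19969": "DZ", "19975": "WM"}
};

// Hypothetical helper: returns the candidate first letters for one character.
function firstLettersOf(ch, dict) {
    var code = ch.charCodeAt(0);
    var index = code - 19968; // offset from U+4E00
    if (index < 0 || index >= dict.all.length) {
        return [ch]; // outside the covered range: return the character unchanged
    }
    var poly = dict.polyphone[String(code)];
    return poly ? poly.split('') : [dict.all.charAt(index)];
}

console.log(firstLettersOf('一', sampleDict));
console.log(firstLettersOf('丁', sampleDict));
```

Keeping the polyphonic characters in a separate map means the main string needs only one letter per character, which is what keeps the file small.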

Dictionary 2: Commonly used Chinese characters

This dictionary file groups Chinese characters by pinyin, with 401 combinations and 6763 commonly used Chinese characters in total. Polyphonic characters are not supported. Since it was collected from the Internet and contains relatively few characters, the file is only 24kb. I will see whether I can expand it when I have time.

The approximate contents of the dictionary file are as follows (this is just an example, so only a small part is shown):

/**
 * Regular pinyin dictionary with 6763 common Chinese characters; polyphonic characters are not supported
 */
var pinyin_dict_notone = 
{
    "a":"啊阿锕",
    "ai":"埃挨哎唉哀皑癌蔼矮艾碍爱隘诶捱嗳嗌嫒瑷暧砹锿霭",
    "an":"鞍氨安俺按暗岸胺案谙埯揞犴庵桉铵鹌顸黯",
    "ang":"肮昂盎",
    "ao":"凹敖熬翱袄傲奥懊澳坳拗嗷噢岙廒遨媪骜聱螯鏊鳌鏖",
    "ba":"芭捌扒叭吧笆八疤巴拔跋靶把耙坝霸罢爸茇菝萆捭岜灞杷钯粑鲅魃",
    "bai":"白柏百摆佰败拜稗薜掰鞴",
    "ban":"斑班搬扳般颁板版扮拌伴瓣半办绊阪坂豳钣瘢癍舨",
    "bang":"邦帮梆榜膀绑棒磅蚌镑傍谤蒡螃",
    "bao":"苞胞包褒雹保堡饱宝抱报暴豹鲍爆勹葆宀孢煲鸨褓趵龅",
    "bo":"剥薄玻菠播拨钵波博勃搏铂箔伯帛舶脖膊渤泊驳亳蕃啵饽檗擘礴钹鹁簸跛",
    "bei":"杯碑悲卑北辈背贝钡倍狈备惫焙被孛陂邶埤蓓呗怫悖碚鹎褙鐾",
    "ben":"奔苯本笨畚坌锛"
    // the rest is omitted
};
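Because this dictionary is keyed by pinyin, converting a character to pinyin needs the reverse mapping. The sketch below builds one from a three-entry sample of the data above. The function `buildReverseIndex` is illustrative and is not the library's code.

```javascript
// Small sample in the same shape as pinyin_dict_notone.
var sampleNotone = {
    "a": "啊阿锕",
    "ang": "肮昂盎",
    "ben": "奔苯本笨畚坌锛"
};

// Hypothetical helper: map each character to the list of pinyin keys it appears under.
function buildReverseIndex(dict) {
    var index = {};
    Object.keys(dict).forEach(function (pinyin) {
        dict[pinyin].split('').forEach(function (ch) {
            if (!index[ch]) {
                index[ch] = [];
            }
            index[ch].push(pinyin);
        });
    });
    return index;
}

var reverse = buildReverseIndex(sampleNotone);
console.log(reverse['本']);
```

Storing a list per character also leaves room for polyphonic characters that appear under more than one pinyin key.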

Later, I gradually discovered that this dictionary file contained many errors. For example, the pinyin of 虐 ("abuse") is written as nue (the correct spelling should be nve), and that of 躺 ("lie down") is written as thang. It also does not support polyphonic characters. So I later regenerated a dictionary file in this format myself, based on other dictionary files:

Has 404 pinyin combinations
Contains 6763 commonly used Chinese characters
Supports polyphonic characters
Does not support tones
File size is 27kb

At the same time, I sorted these 6763 Chinese characters by frequency of use, based on a frequency table of the 6763 commonly used Chinese characters found online, so that a reasonably satisfactory JS input method can be built on top of it.

In addition, comparison with another, more complete dictionary file shows that there are actually 412 pinyin combinations. The 8 pronunciations that do not appear in the dictionary file above are:

chua,den,eng,fiao,m,kei,nun,shei

Dictionary 3: Ultimate Dictionary

First, I found the following structured dictionary file online (hereinafter dictionary A); I no longer remember exactly where. It supports tones and polyphonic characters, and lists the pinyin of the 20902 Chinese characters in the Unicode range 4E00 (19968) to 9FA5 (40869), or 20903 if 〇 is included. The dictionary file size is 280kb:

3007 (ling2)
4E00 (yi1)
4E01 (ding1,zheng1)
4E02 (kao3)
4E03 (qi1)
4E04 (shang4,shang3)
4E05 (xia4)
4E06 (none0)
4E07 (wan4,mo4)
4E08 (zhang4)
4E09 (san1)
4E0A (shang4,shang3)
4E0B (xia4)
4E0C (ji1)
4E0D (bu4,bu2,fou3)
4E0E (yu3,yu4,yu2)
4E0F (mian3)
4E10 (gai4)
4E11 (chou3)
4E12 (chou3)
4E13 (zhuan1)
4E14 (qie3,ju1)
...

Chinese characters that have no pronunciation, or whose pronunciation is unknown, are marked as none0. I counted a total of 525 such characters.

To keep the dictionary file as small as possible, I noticed that apart from the first entry, 〇 (3007), all the code points in the file above are consecutive. So I changed it to the following structure, which reduced the file size from 280kb to 117kb:

var pinyin_dict_withtone = "yi1,ding1 zheng1,kao3,qi1,shang4 shang3,xia4,none0,wan4 mo4,zhang4,san1,shang4 shang3,xia4,ji1,bu4 bu2 fou3,yu3 yu4 yu2,mian3,gai4,chou3,chou3,zhuan1,qie3 ju1...";
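Because the code points are consecutive, a character's position in the comma-separated string follows directly from its code point. The sketch below shows this with the 21 entries from the excerpt above. The names `sampleWithTone` and `readingsOf` are illustrative.

```javascript
// The 21 entries shown above, covering U+4E00..U+4E14.
var sampleWithTone = "yi1,ding1 zheng1,kao3,qi1,shang4 shang3,xia4,none0,wan4 mo4," +
    "zhang4,san1,shang4 shang3,xia4,ji1,bu4 bu2 fou3,yu3 yu4 yu2,mian3,gai4," +
    "chou3,chou3,zhuan1,qie3 ju1";
var entries = sampleWithTone.split(',');

// Hypothetical helper: list the readings of one character, or [] if there are none.
function readingsOf(ch) {
    var index = ch.charCodeAt(0) - 0x4E00; // position in the comma-separated list
    var entry = entries[index];
    if (entry === undefined || entry === 'none0') {
        return [];
    }
    return entry.split(' '); // multiple readings are separated by spaces
}

console.log(readingsOf('不'));
```

Dropping the explicit code point from every entry is what accounts for most of the size reduction.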

The disadvantage of this dictionary file is that the tones are marked with numbers. If you want pinyin like xiǎo míng tóng xué, an algorithm is needed to convert the letter in the appropriate position into one of āáǎàōóǒòēéěèīíǐìūúǔùüǖǘǚǜńň.
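One possible conversion is sketched below, using the conventional placement rule: mark a or e if present, otherwise the o in ou, otherwise the last vowel. This is not the author's method, and the names `TONE_MARKS` and `numberToMark` are hypothetical. It assumes lowercase input with v standing for ü, and it does not handle the vowel-less syllables ń and ň.

```javascript
var TONE_MARKS = {
    'a': 'āáǎà', 'o': 'ōóǒò', 'e': 'ēéěè',
    'i': 'īíǐì', 'u': 'ūúǔù', 'ü': 'ǖǘǚǜ'
};

// Hypothetical helper: convert e.g. "xiao3" to "xiǎo".
function numberToMark(syllable) {
    var match = /^([a-zü]+)([0-5])$/.exec(syllable.replace(/v/g, 'ü'));
    if (!match) {
        return syllable; // not in letters+digit form: leave unchanged
    }
    var letters = match[1];
    var tone = Number(match[2]);
    if (tone === 0 || tone === 5) {
        return letters; // neutral tone carries no mark
    }
    var pos = letters.search(/[ae]/); // rule 1: a or e takes the mark
    if (pos === -1) {
        pos = letters.indexOf('ou'); // rule 2: in "ou" the o takes the mark
    }
    if (pos === -1) {
        // rule 3: otherwise the last vowel takes the mark
        for (var i = letters.length - 1; i >= 0; i--) {
            if ('aoeiuü'.indexOf(letters.charAt(i)) !== -1) {
                pos = i;
                break;
            }
        }
    }
    if (pos === -1) {
        return letters; // no vowel to mark
    }
    var marked = TONE_MARKS[letters.charAt(pos)].charAt(tone - 1);
    return letters.slice(0, pos) + marked + letters.slice(pos + 1);
}

console.log(['xiao3', 'ming2', 'tong2', 'xue2'].map(numberToMark).join(' '));
```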

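I had originally planned to try writing a conversion method myself, but then I found the following dictionary file (hereinafter dictionary B). It contains 20867 Chinese characters and also supports tones and polyphonic characters, with the tone marks placed directly above the letters. Because it also lists the Chinese characters themselves, the file is relatively large at 327kb. Its contents are roughly as follows: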

{
    "吖": "yā,ā",
    "阿": "ā,ē",
    "呵": "hē,a,kē",
    "嗄": "shà,á",
    "啊": "ā,á,ǎ,à,a",
    "腌": "ā,yān",
    "锕": "ā",
    "錒": "ā",
    "矮": "ǎi",
    "爱": "ài",
    "挨": "āi,ái",
    "哎": "āi",
    "碍": "ài",
    "癌": "ái",
    "艾": "ài",
    "唉": "āi,ài",
    "蔼": "ǎi"
    /* the rest is omitted */
}

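However, after comparing the two, I found 502 characters whose reading is none in dictionary A but which have a reading in dictionary B, and 21 characters that are in dictionary A but not in B: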

{
    "兙": "shí kè",
    "兛": "qiān",
    "兝": "fēn",
    "兞": "máo",
    "兡": "bǎi kè",
    "兣": "lǐ",
    "唞": "dǒu",
    "嗧": "jiā lún",
    "囍": "xǐ",
    "堎": "lèng líng",
    "猤": "hú",
    "瓩": "qián wǎ",
    "礽": "réng",
    "膶": "rùn",
    "芿": "rèng",
    "蟘": "tè",
    "貣": "tè",
    "酿": "niàng niàn niáng",
    "醸": "niàng",
    "鋱": "tè",
    "铽": "tè"
}

There are also 7 characters that are in B but not in A:

{
    "㘄": "lēng",
    "䉄": "léng",
    "䬋": "léng",
    "䮚": "lèng",
    "䚏": "lèng,lì,lìn",
    "㭁": "réng",
    "䖆": "niàng"
}
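A comparison like the one described above can be sketched as follows, assuming both dictionaries have been loaded as character-to-reading maps and that a missing reading in A is stored as the string none. The function name, the result field names, and the tiny sample maps are all illustrative.

```javascript
// Hypothetical helper: classify the differences between two character-to-reading maps.
function compareDictionaries(dictA, dictB) {
    var has = function (dict, ch) {
        return Object.prototype.hasOwnProperty.call(dict, ch);
    };
    var result = {filledByB: [], onlyInA: [], onlyInB: []};
    Object.keys(dictA).forEach(function (ch) {
        if (!has(dictB, ch)) {
            result.onlyInA.push(ch);   // in A but not in B
        } else if (dictA[ch] === 'none') {
            result.filledByB.push(ch); // no reading in A, but B has one
        }
    });
    Object.keys(dictB).forEach(function (ch) {
        if (!has(dictA, ch)) {
            result.onlyInB.push(ch);   // in B but not in A
        }
    });
    return result;
}

// Tiny illustrative sample, not the real data.
var sampleA = {'一': 'yī', '丆': 'none', '囍': 'xǐ'};
var sampleB = {'一': 'yī', '丆': 'hǎn', '㘄': 'lēng'};
console.log(compareDictionaries(sampleA, sampleB));
```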

So I merged the two, using dictionary A as the base, to get the final dictionary file pinyin_dict_withtone.js, which is 122kb:

var pinyin_dict_withtone = "yī,dīng zhēng,kǎo qiǎo yú,qī,shàng,xià,hǎn,wàn mò,zhàng,sān,shàng shǎng,xià,qí jī...";

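How to use

I put these dictionary files together and wrapped the parsing logic in a few simple methods. You can import whichever dictionary file fits your actual needs.

The 3 wrapped methods: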

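/**
 * Get the pinyin first letters of Chinese characters
 * @param str string of Chinese characters; non-Chinese characters are returned as-is
 * @param polyphone whether to support polyphonic characters, default false; if true, an array of all possible combinations is returned
 */
pinyinUtil.getFirstLetter(str, polyphone);
/**
 * Get the pinyin of Chinese characters; non-Chinese characters are returned as-is
 * @param str the Chinese characters to convert
 * @param splitter separator, a space by default
 * @param withtone whether the result includes tones, default true
 * @param polyphone whether to support polyphonic characters, default false
 */
pinyinUtil.getPinyin(str, splitter, withtone, polyphone);
/**
 * Convert pinyin to Chinese characters; only a single character is supported, and all matching characters are returned
 * @param pinyin the pinyin of a single Chinese character, without tones
 */
pinyinUtil.getHanzi(pinyin);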

The following sections describe how to use them in different scenarios.

If you only need the pinyin first letters

<script type="text/javascript" src="pinyin_dict_firstletter.js"></script>
<script type="text/javascript" src="pinyinUtil.js"></script>

<script type="text/javascript">
pinyinUtil.getFirstLetter('小茗同学'); // outputs XMTX
pinyinUtil.getFirstLetter('大中国', true); // outputs ['DZG', 'TZG']
</script>

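Note that if you import either of the other 2 dictionary files, you can still get the pinyin first letters; this dictionary file is simply the better fit for that purpose.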
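When polyphone is true, the result lists every combination of the candidate letters for each character. A minimal sketch of that expansion is shown below. The function `combine` is hypothetical and is not taken from pinyinUtil.js.

```javascript
// Hypothetical helper: expand per-character candidates into all combined strings.
function combine(candidates) {
    return candidates.reduce(function (prefixes, options) {
        var next = [];
        prefixes.forEach(function (prefix) {
            options.forEach(function (option) {
                next.push(prefix + option);
            });
        });
        return next;
    }, ['']);
}

// 大 has two candidate letters (D, T); 中 and 国 have one each.
console.log(combine([['D', 'T'], ['Z'], ['G']])); // [ 'DZG', 'TZG' ]
```

The number of results is the product of the candidate counts, so long strings with many polyphonic characters can produce large arrays.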

If the pinyin does not need tones

<script type="text/javascript" src="pinyin_dict_notone.js"></script>
<script type="text/javascript" src="pinyinUtil.js"></script>

<script type="text/javascript">
pinyinUtil.getPinyin('小茗同学'); // outputs 'xiao ming tong xue'
pinyinUtil.getHanzi('ming'); // outputs '明名命鸣铭冥茗溟酩瞑螟暝'
</script>

If you need tones or need to handle rare characters

<script type="text/javascript" src="pinyin_dict_withtone.js"></script>
<script type="text/javascript" src="pinyinUtil.js"></script>

<script type="text/javascript">
pinyinUtil.getPinyin('小茗同学'); // outputs 'xiǎo míng tóng xué'
pinyinUtil.getPinyin('小茗同学', '-', true, true); // outputs ['xiǎo-míng-tóng-xué', 'xiǎo-míng-tòng-xué']
</script>
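If only the toned dictionary is loaded, toneless output can be derived by mapping each marked vowel back to its base letter. The sketch below is one way to do this and is not necessarily how pinyinUtil.js implements it. The names `TONED`, `PLAIN` and `removeTone` are illustrative, and the vowel-less syllables ń and ň are not handled.

```javascript
// Each marked vowel in TONED maps to the letter at the same position in PLAIN.
var TONED = 'āáǎàōóǒòēéěèīíǐìūúǔùǖǘǚǜ';
var PLAIN = 'aaaaooooeeeeiiiiuuuuüüüü';

// Hypothetical helper: strip tone marks from a pinyin string.
function removeTone(pinyin) {
    return pinyin.replace(/[āáǎàōóǒòēéěèīíǐìūúǔùǖǘǚǜ]/g, function (ch) {
        return PLAIN.charAt(TONED.indexOf(ch));
    });
}

console.log(removeTone('xiǎo míng tóng xué'));
```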

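About the simple Pinyin input method

A real input method has far too many things to consider, such as a word lexicon and the user's personal input habits. This is only the simplest possible input method, without any lexicon (one could be added, but the web environment is not suited to loading very large files).

The second dictionary file, pinyin_dict_notone.js, is recommended. Although dictionary 3 contains more characters, it cannot be sorted by character frequency, so some rare characters end up near the front.

<link rel="stylesheet" type="text/css" href="simple-input-method/simple-input-method.css">
<input type="text" class="test-input-method"/>
<script type="text/javascript" src="pinyin_dict_notone.js"></script>
<script type="text/javascript" src="pinyinUtil.js"></script>
<script type="text/javascript" src="simple-input-method/simple-input-method.js"></script>
<script type="text/javascript">
    SimpleInputMethod.init('.test-input-method');
</script>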

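Conclusion

Because the target environment of this utility is the web, and the web means files cannot be too large, a large lexicon cannot be included. Without a lexicon, polyphonic characters cannot be disambiguated, and the pinyin input method cannot intelligently match suitable words. If you need lexicon support, you can refer to this project for the Node.js environment: http://www.php.cn/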
