Detailed explanation of examples of using JavaScript to convert Chinese characters to Pinyin-JS Tutorial-php.cn

Home

Web Front-end

JS Tutorial

Detailed explanation of examples of using JavaScript to convert Chinese characters to Pinyin

Y2J

May 22, 2017 am 11:54 AM

1. The current situation of converting Chinese characters to Pinyin

First of all, it should be said that there is a strong demand for converting Chinese characters to Pinyin, such as sorting/filtering contacts by Pinyin letters; such as destinations ( Typical such as ticket purchase)
Classified by pinyin first letter, etc. But the solution to this requirement, but I haven’t heard of any clever implementation (especially on the browser side), probably requires a huge dictionary.
Specific to JavaScript, check github and npm. The better libraries for converting Chinese characters to pinyin include pinyin
and pinyinjs. You can see that both of them come with huge dictionaries.
These dictionaries often weigh tens or hundreds of KB (some even several MB). It still takes some courage to use them on the browser side. So when we encounter the need to convert Chinese characters to Pinyin, it is not surprising that our first reaction is to reject the request (or implement it on the server side).
Now, if I tell you that you can convert Chinese characters to Pinyin in 300 lines of code on the browser side, is it unbelievable?

2. Starting from the Android 4.2.2 contact code

Again emphasize this blog - using the Android source code, Easily convert Chinese characters to Pinyin.
Today I would like to share with you a solution for converting Chinese characters into Pinyin extracted from the Android system source code. With just one class and more than 560 lines of code, you can easily realize the function of converting Chinese characters into Pinyin without any other third party. rely.
Has this broken your thinking pattern: Is there any powerful algorithm that can abandon the dictionary?
After reading the blog for the first time, I was a little disappointed. There was no algorithm analysis. It just introduced the hundreds of lines of code discovered from the Android code. The second time I read the code with the idea of porting it to JavaScript, I finally understood the principle, so I started the journey of porting.

3. Teach you step by step how to convert Chinese characters to Pinyin with 300 lines of JavaScript code

First, let’s get straight to the core: why converting Chinese characters to Pinyin requires a huge dictionary of thinking Settlement?
Because the arrangement of Chinese characters has nothing to do with pinyin, for example, in the Chinese character range \u4E00-\u9FFF, the former may be ha, and the latter may be ze. There is no way to associate the unicode of Chinese characters with pinyin, so we can only There is a huge dictionary that records the pinyin of every Chinese character (or commonly used Chinese character).
However, suppose we can sort all Chinese characters by pinyin, such as 'A','AI','AN','ANG','AO','BA',...,'ZUI',' ZUN','ZUO' sorting, then we only need to remember the first Chinese character of each Chinese character queue with the same pinyin. Then, the required dictionary will be very small (just cover all pinyin, the number of pinyin itself is not large).
Now, the difficulty is to sort the Chinese characters by pinyin. Fortunately, the ICU/localization related API provides this sorting API (if there were no convenient sorting/comparison methods, this article may not appear).

So, this is why 300 lines can be used to convert Chinese characters to Pinyin: Intl.CollatorAPI: Intl.Collator internally implements localization-related string sorting. We can basically sort all Chinese characters according to Pinyin through Intl.Collator.prototype.compare.
Boundary Chinese character table: records the sorted boundary points. Each Chinese character in this Chinese character table is the first Chinese character in a set of Chinese characters with the same pinyin after sorting (Eachunihansisthefirstonewithinsamepinyinwhencollatoriszh_CN).
Speaking of this, there may still be something that is not clearly explained, so I will directly upload a piece of code:

For interested students You can execute the script.js above node--icu-data-dir=node_modules/full-icu to have a look, and then see if you get a Chinese character table basically sorted by pinyin.

Here are a few points to note:

I bolded "Basic" again because the list of Chinese characters we got is not completely sorted according to Pinyin. There are occasionally some Chinese characters with other Pinyin inserted in the middle. This should be paid extra attention to when making the boundary table.
The table obtained in the above script is the sorting of all Chinese characters. Some of them are different from the table of HanziToPinyin.java in the Android code, so the table of HanziToPinyin.java needs to be updated. (The biggest pitfall and workload in switching from Java to JavaScript: correcting the boundary table)
I believe everyone has seen the core code: constCOLLATOR=newIntl.Collator(['zh-Hans-CN']), Intl.Collator
(The locale specified here is China zh-Hans-CN) is the key to sorting Chinese characters by pinyin. It is an Internationalization API that sorts strings in locale-specific order.
Please first npmifull-icu when executing the script. This dependency will automatically install the missing Chinese support and prompt how to specify the ICU data file to execute the script.
1.ICUICU stands for InternationalComponentsforUnicode, which provides Unicode and internationalization support for applications.
ICUisamature,widelyusedsetofC/C++andJavalibrariesprovidingUnicodeandGlobalizationsupportforsoftwareapplications.ICUiswidelyportableandgivesapplicationsthesameresultsonallplatformsandbetweenC/C++andJavasoftware.
And ICU provides localized string comparison services (UnicodeCollationAlgorithm+locally specific comparison rules):
Collation:Comparestringsaccordingtotheconvention sandstandardsofaparticularlanguage,regionorcountry. ICU'scollationisbasedontheUnicodeCollationAlgorithmpluslocale-specificcomparisonrulesfromtheCommonLocaleDataRepository,acomprehensivesourceforthistypeofdata.
On modern browsers, generally ICU has built-in support for the user's local language, and we can use it directly.
But for node.js, usually, ICU only contains a subset (usually English), so we need to add support for Chinese ourselves. Generally speaking, you can install full-icu
through npminstallfull-icu to install missing Chinese support. (See node--icu-data-dir=node_modules/full-icu above).
2.IntlAPI The previous section should basically explain the knowledge related to internationalization/localization. Here we will add the use of built-in API. How to check whether the user language and Runtime support this language? Intl.Collator.supportedLocalesOf(array|string)
Returns an array containing locales that are supported (without falling back to the default locale). The parameter can be an array or a string, which is the locales you want to test ( That is BCP47languagetag).

Construct the Collator object and sort the string

through Intl.Collator.prototype. compare, we can sort strings in the order specified by the language. In Chinese, this sorting happens to be mostly in pinyin order, 'A', 'AI', 'AN', 'ANG', 'AO', 'BA', 'BAI', 'BAN' ,'BANG','BAO','BEI','BEN','BENG','BI','BIAN','BIAO','BIE','BIN','BING','BO',' BU','CA','CAI','CAN',...
, this is the key to converting Chinese characters to Pinyin as we mentioned above.

4. Boundary table correction

Obviously, there is a problem with this boundary table and needs to be corrected.
We can see that most of the Chinese characters have been converted into qing. It can be seen that there is a problem with the Chinese character corresponding to the pinyin of qing.
Found this Chinese character, it is '\u72c5'/'狅', plus one character before and after, ['\u4eb2','\u72c5','\u828e']/["情","狅", "芎"]
.
Search, '\u72c5'/'狅' can be read as qing, but now it is read as kuang, which should be the cause of the error.
According to the initial sorting list of all Chinese characters, the first Chinese character of qing is '\u9751'/'靑'.
After the change, only 104 failed conversions.

【Related recommendations】

1. Javascript Free Video Tutorial

2. Share 15 A commonly used js regular expression

3. Detailed example of implementing search toolbar through javascript

4. About async and await in Javascript Detailed introduction to the usage

5. Sharing the 12 most practical skills in JavaScript

The above is the detailed content of Detailed explanation of examples of using JavaScript to convert Chinese characters to Pinyin. For more information, please follow other related articles on the PHP Chinese website!

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Is JavaScript Written in C? Examining the EvidenceApr 25, 2025 am 12:15 AM

Yes, the engine core of JavaScript is written in C. 1) The C language provides efficient performance and underlying control, which is suitable for the development of JavaScript engine. 2) Taking the V8 engine as an example, its core is written in C, combining the efficiency and object-oriented characteristics of C. 3) The working principle of the JavaScript engine includes parsing, compiling and execution, and the C language plays a key role in these processes.

JavaScript's Role: Making the Web Interactive and DynamicApr 24, 2025 am 12:12 AM

JavaScript is at the heart of modern websites because it enhances the interactivity and dynamicity of web pages. 1) It allows to change content without refreshing the page, 2) manipulate web pages through DOMAPI, 3) support complex interactive effects such as animation and drag-and-drop, 4) optimize performance and best practices to improve user experience.

C and JavaScript: The Connection ExplainedApr 23, 2025 am 12:07 AM

C and JavaScript achieve interoperability through WebAssembly. 1) C code is compiled into WebAssembly module and introduced into JavaScript environment to enhance computing power. 2) In game development, C handles physics engines and graphics rendering, and JavaScript is responsible for game logic and user interface.

From Websites to Apps: The Diverse Applications of JavaScriptApr 22, 2025 am 12:02 AM

JavaScript is widely used in websites, mobile applications, desktop applications and server-side programming. 1) In website development, JavaScript operates DOM together with HTML and CSS to achieve dynamic effects and supports frameworks such as jQuery and React. 2) Through ReactNative and Ionic, JavaScript is used to develop cross-platform mobile applications. 3) The Electron framework enables JavaScript to build desktop applications. 4) Node.js allows JavaScript to run on the server side and supports high concurrent requests.

Python vs. JavaScript: Use Cases and Applications ComparedApr 21, 2025 am 12:01 AM

Python is more suitable for data science and automation, while JavaScript is more suitable for front-end and full-stack development. 1. Python performs well in data science and machine learning, using libraries such as NumPy and Pandas for data processing and modeling. 2. Python is concise and efficient in automation and scripting. 3. JavaScript is indispensable in front-end development and is used to build dynamic web pages and single-page applications. 4. JavaScript plays a role in back-end development through Node.js and supports full-stack development.

The Role of C/C in JavaScript Interpreters and CompilersApr 20, 2025 am 12:01 AM

C and C play a vital role in the JavaScript engine, mainly used to implement interpreters and JIT compilers. 1) C is used to parse JavaScript source code and generate an abstract syntax tree. 2) C is responsible for generating and executing bytecode. 3) C implements the JIT compiler, optimizes and compiles hot-spot code at runtime, and significantly improves the execution efficiency of JavaScript.

JavaScript in Action: Real-World Examples and ProjectsApr 19, 2025 am 12:13 AM

JavaScript's application in the real world includes front-end and back-end development. 1) Display front-end applications by building a TODO list application, involving DOM operations and event processing. 2) Build RESTfulAPI through Node.js and Express to demonstrate back-end applications.

JavaScript and the Web: Core Functionality and Use CasesApr 18, 2025 am 12:19 AM

The main uses of JavaScript in web development include client interaction, form verification and asynchronous communication. 1) Dynamic content update and user interaction through DOM operations; 2) Client verification is carried out before the user submits data to improve the user experience; 3) Refreshless communication with the server is achieved through AJAX technology.

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Assassin's Creed Shadows: Seashell Riddle Solution

4 weeks agoByDDD

What's New in Windows 11 KB5054979 & How to Fix Update Issues

3 weeks agoByDDD

Where to find the Crane Control Keycard in Atomfall

4 weeks agoByDDD

Roblox: Dead Rails - How To Complete Every Challenge

1 months agoByDDD

Atomfall guide: item locations, quest guides, and tips

1 months agoByDDD

Hot Tools

SAP NetWeaver Server Adapter for Eclipse

Integrate Eclipse with SAP NetWeaver application server.

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software