
C# Convert Chinese characters to Pinyin (supports multi-phonetic characters)

黄舟 (original), 2017-02-06 16:46:28

A while back, a project needed to convert Chinese characters to full pinyin and to pinyin initials (the first letter of each syllable) for use in queries. I assumed this was a well-solved problem, so I searched for existing code. The first two articles that caught my eye were these:

C# Convert Chinese characters to Pinyin (supports all Chinese characters in the GB2312 character set) (http://www.cnblogs.com/cxd4321/p/4203383.html)

[Practical] The ultimate JS solution for converting between Chinese characters and pinyin, with a simple JS pinyin input method (http://www.cnblogs.com/liuxianan/p/pinyinjs.html)

Thanks to both bloggers for their thorough and detailed write-ups. Both provide source code that you can refer to.

Given what our interface needed, I started from the first article. The author's source code basically covers converting Chinese characters to pinyin, and other special characters can be added as needed. Its one shortcoming is that it does not support polyphonic characters (characters with more than one reading). Since our queries had to support them, I looked through other articles but found nothing ready-made (maybe my search skills are poor).

Later I found that Microsoft already provides the Microsoft Visual Studio International Pack for converting Chinese characters to pinyin, and it is very capable, so I gave it a try.

First, add a reference to the corresponding package through NuGet.

Search for PinYinConverter.


Simple demo

Trying it out is very simple: just use the ChineseChar class directly for the conversion.

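using System;
using System.Linq;
using Microsoft.International.Converters.PinYinConverter;
// Inside Main: print every reading of the first character typed
string ch = Console.ReadLine();
ChineseChar cc = new ChineseChar(ch[0]);
var pinyins = cc.Pinyins.ToList();
pinyins.ForEach(Console.WriteLine);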

The output shows that 行 is a polyphonic character with three readings: hang, heng, and xing. The tone is included as well, which is really convenient. The function I need takes the input 银行 ("bank") and converts it to the full pinyin "yinhang, yinheng, yinxing" and the initials "yh, yx". With the ChineseChar class, the approach is simple.
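To make the target output concrete, here is a minimal sketch of the combination step on its own. It assumes the per-character readings are already available as plain lowercase strings; the names PinyinCombiner, Combine, and CombineInitials are hypothetical and are not part of the International Pack or of the helper shown later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class PinyinCombiner
{
    // Builds every concatenation that takes one reading per character, in order.
    // Assumes every reading is a non-empty string.
    public static List<string> Combine(IEnumerable<IEnumerable<string>> readingsPerChar)
    {
        IEnumerable<string> seed = new[] { "" };
        return readingsPerChar
            .Aggregate(seed, (prefixes, readings) =>
                prefixes.SelectMany(prefix => readings.Select(r => prefix + r)))
            .Distinct()
            .ToList();
    }

    // Same combination, but keeping only the first letter of each reading.
    public static List<string> CombineInitials(IEnumerable<IEnumerable<string>> readingsPerChar)
    {
        return Combine(readingsPerChar.Select(rs => rs.Select(r => r.Substring(0, 1))));
    }
}
```

Called with the readings { "yin" } and { "hang", "heng", "xing" }, Combine yields yinhang, yinheng, yinxing, and CombineInitials yields yh, yx, because the duplicate "yh" is dropped by Distinct.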

Wrapping the Chinese-to-pinyin conversion

1. Split the input into individual characters.

2. Use ChineseChar to get all the pinyin readings of each character.

3. Strip the tone digits, remove duplicates, extract the first letter of each reading, and then combine the results.
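Step 3 can be sketched in isolation as follows. This is only an illustration: the class ReadingNormalizer and its methods are hypothetical names, and the sample input format (uppercase syllable plus a trailing tone digit, possibly with empty entries) is an assumption based on the demo output described earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class ReadingNormalizer
{
    // Strips tone digits, lowercases, and drops blanks and duplicates.
    public static List<string> Normalize(IEnumerable<string> rawReadings)
    {
        return rawReadings
            .Where(r => !string.IsNullOrWhiteSpace(r))
            .Select(r => Regex.Replace(r, @"\d", "").ToLowerInvariant())
            .Where(r => r.Length > 0)
            .Distinct()
            .ToList();
    }

    // Takes the first letter of each normalized reading, without duplicates.
    public static List<string> Initials(IEnumerable<string> normalized)
    {
        return normalized.Select(r => r.Substring(0, 1)).Distinct().ToList();
    }
}
```

For the assumed input "HANG2", "HANG4", "HENG2", "XING2", Normalize returns hang, heng, xing, and Initials then returns h, x.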

So I wrote a helper class for the conversion. The code is as follows:

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Microsoft.International.Converters.PinYinConverter;

public class PinYinConverterHelp
{
    public static PingYinModel GetTotalPingYin(string str)
    {
        var chs = str.ToCharArray();
        // Full pinyin readings of each character, keyed by its position in the input
        Dictionary<int, List<string>> totalPingYins = new Dictionary<int, List<string>>();
        for (int i = 0; i < chs.Length; i++)
        {
            var pinyins = new List<string>();
            var ch = chs[i];
            // Is this a valid Chinese character?
            if (ChineseChar.IsValidChar(ch))
            {
                ChineseChar cc = new ChineseChar(ch);
                pinyins = cc.Pinyins.Where(p => !string.IsNullOrWhiteSpace(p)).ToList();
            }
            else
            {
                // Non-Chinese characters are passed through unchanged
                pinyins.Add(ch.ToString());
            }
            // Strip the tone digits and convert to lowercase
            pinyins = pinyins.ConvertAll(p => Regex.Replace(p, @"\d", "").ToLower());
            // Remove blanks and duplicates
            pinyins = pinyins.Where(p => !string.IsNullOrWhiteSpace(p)).Distinct().ToList();
            if (pinyins.Any())
            {
                totalPingYins[i] = pinyins;
            }
        }
        PingYinModel result = new PingYinModel();
        foreach (var pinyins in totalPingYins)
        {
            var items = pinyins.Value;
            if (result.TotalPingYin.Count <= 0)
            {
                result.TotalPingYin = items;
                result.FirstPingYin = items.ConvertAll(p => p.Substring(0, 1)).Distinct().ToList();
            }
            else
            {
                // Combine every full pinyin so far with each reading of this character
                var newTotalPingYins = new List<string>();
                foreach (var totalPingYin in result.TotalPingYin)
                {
                    newTotalPingYins.AddRange(items.Select(item => totalPingYin + item));
                }
                newTotalPingYins = newTotalPingYins.Distinct().ToList();
                result.TotalPingYin = newTotalPingYins;

                // Combine every initial string so far with the first letter of each reading
                var newFirstPingYins = new List<string>();
                foreach (var firstPingYin in result.FirstPingYin)
                {
                    newFirstPingYins.AddRange(items.Select(item => firstPingYin + item.Substring(0, 1)));
                }
                newFirstPingYins = newFirstPingYins.Distinct().ToList();
                result.FirstPingYin = newFirstPingYins;
            }
        }
        return result;
    }
}
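The helper returns a PingYinModel, whose definition is not shown in this article. A minimal sketch of what it might look like follows. This is an assumption, inferred only from how the helper uses it: both properties are read with .Count immediately after construction, so they need to start out as empty lists rather than null.

```csharp
using System.Collections.Generic;

// Hypothetical shape of PingYinModel; the article does not show the real definition.
public class PingYinModel
{
    public PingYinModel()
    {
        // The helper checks TotalPingYin.Count on a fresh instance,
        // so both lists must be initialized here.
        TotalPingYin = new List<string>();
        FirstPingYin = new List<string>();
    }

    // All full-pinyin combinations, e.g. "yinhang".
    public List<string> TotalPingYin { get; set; }

    // All initial-letter combinations, e.g. "yh".
    public List<string> FirstPingYin { get; set; }
}
```

With such a model in place, a call like PinYinConverterHelp.GetTotalPingYin("银行") would fill TotalPingYin with the full spellings and FirstPingYin with the initials.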

Result: for the input 银行, the helper produces the full pinyin and initials described above, that is "yinhang, yinheng, yinxing" and "yh, yx".

So far I have tried some rare characters and they are supported, though I have not tested the really obscure ones. For converting ordinary Chinese characters to pinyin, the polyphonic-character support here is sufficient.


This only uses the Chinese-to-pinyin feature of the Microsoft Visual Studio International Pack. The pack also includes components for Chinese, Japanese, Korean, English, and other languages, and provides methods for conversion between them, lookups, character counts, and even stroke counts. Interested readers can check its API themselves.


Source code sharing


Sharing is a virtue. Impressive, in-depth articles can raise our technical level, but often the real need is at the business level, and sharing small, practical pieces of knowledge helps us solve business problems. As long as what is shared is useful and does not mislead others, it is worth learning from no matter how big or small, so I hope everyone will be brave enough to share.

Address: https://github.com/qq1206676756/PinYinParse


