This article explains how to traverse strings that contain Chinese characters in Python, with detailed examples, for readers who need a reference.
Traversing strings (including Chinese characters) in Python: detailed examples
s = "中国china"
for j in s:
    print j
First of all, what encoding is your string a actually in? It may not be GBK, as you might assume.
>>> a = '中国'
>>> a
Try this in the interpreter. If the result shows 6 bytes, the string is UTF-8; if it shows 4 bytes, it is GBK.
In addition, whether the encoding is UTF-8 or GBK, the string cannot be traversed character by character with a plain for loop, because in Python 2 the loop takes out one byte at a time. The interpreter treats a as a sequence of len(a) bytes.
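The byte counts described here can be checked directly. The following is a minimal sketch rather than the author's code: it starts from a Unicode literal and encodes it explicitly, so the result does not depend on the terminal's encoding. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch: encode the same text explicitly in both encodings.
text = u"中国"

utf8_bytes = text.encode("utf-8")  # 3 bytes per Chinese character
gbk_bytes = text.encode("gbk")     # 2 bytes per Chinese character

utf8_len = len(utf8_bytes)
gbk_len = len(gbk_bytes)

print("utf-8: %d bytes" % utf8_len)  # utf-8: 6 bytes
print("gbk: %d bytes" % gbk_len)     # gbk: 4 bytes
```

A loop over either byte string runs once per byte, which is why the characters come out broken.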
The next step is the traversal problem.
Most Linux shells default to UTF-8, in which one Chinese character occupies three bytes, so you have to read the string three bytes at a time. You can try:
>>> print a[:3]
On a UTF-8 terminal, the output is just the single character "中".
The default code page of the command prompt on Chinese-language Windows is cp936, i.e. GBK, in which one Chinese character occupies two bytes, so the string is read two bytes at a time (a[:2]).
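As a rough illustration of reading by fixed byte widths, the sketch below slices explicitly encoded byte strings and decodes each slice. It assumes the data contains only Chinese characters; once ASCII letters are mixed in, a fixed step no longer lines up with character boundaries. All names are hypothetical.

```python
# -*- coding: utf-8 -*-
text = u"中国"
utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

# The first character is 3 bytes in UTF-8 and 2 bytes in GBK.
first_utf8 = utf8_bytes[:3].decode("utf-8")
first_gbk = gbk_bytes[:2].decode("gbk")

# Step through the UTF-8 bytes three at a time (Chinese-only input assumed).
chars = [utf8_bytes[i:i + 3].decode("utf-8")
         for i in range(0, len(utf8_bytes), 3)]
```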
There is another way to traverse: convert the string to unicode, so that each Chinese character and each English letter is a single element, and you can traverse it with the usual for i in a approach. The advantage is that Chinese and English characters each count as one unit, whereas in UTF-8 and GBK an English letter occupies only one byte while a Chinese character occupies several.
s = u"中国china"
for j in s:
    print j
The output is as follows:
中
国
c
h
i
n
a
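When the data arrives as an encoded byte string rather than a u"" literal, decoding it first should give the same per-character traversal. The helper below is a hypothetical sketch (iter_chars is not a standard function), and it assumes the caller knows which encoding the bytes are in.

```python
# -*- coding: utf-8 -*-
def iter_chars(data, encoding="utf-8"):
    """Yield one Unicode character at a time from an encoded byte string."""
    for ch in data.decode(encoding):
        yield ch

# Hypothetical input: the same text as above, encoded as UTF-8 bytes.
raw = u"中国china".encode("utf-8")
result = list(iter_chars(raw))
print(len(result))  # 7
```

Passing the wrong encoding either raises UnicodeDecodeError or yields wrong characters, so the encoding should be known rather than guessed.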