
Python example: truncating strings that contain Chinese characters

黄舟
2017-09-23 11:05:00

This article shows how to truncate byte strings that contain Chinese characters in Python without splitting a multi-byte character, with one worked example each for the UTF-8 and GB18030 encodings. The details are as follows:

When truncating a string that contains multi-byte characters, you must work out how many bytes the character at the truncation point occupies and cut only on a character boundary. Splitting a multi-byte character leaves a partial byte sequence behind, which shows up as garbled text.
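To make the problem concrete, here is a minimal sketch of what goes wrong with a naive byte slice. The helper name `is_valid_utf8` and the sample text are illustrative assumptions, not part of the original article.

```python
# -*- coding: utf-8 -*-

def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# "中文abc": each Chinese character takes 3 bytes in UTF-8, so 3 + 3 + 3 = 9 bytes.
data = u"\u4e2d\u6587abc".encode("utf-8")

print(is_valid_utf8(data[:3]))  # True: the cut lands on a character boundary
print(is_valid_utf8(data[:4]))  # False: the cut lands one byte into the second character
```

A slice at byte 4 keeps the first byte of the second character, and that leftover byte is what a terminal or browser renders as garbage.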

Implementations for UTF-8 and GB18030 are given below. Either one will do: if your data is in the other encoding, transcode it first with decode and encode.
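The transcoding step mentioned above could look roughly like this. The function name `transcode` is an assumption made for illustration; only the built-in `decode` and `encode` methods are used.

```python
# -*- coding: utf-8 -*-

def transcode(data, source_encoding, target_encoding):
    """Decode bytes from one encoding and re-encode them in another (hypothetical helper)."""
    return data.decode(source_encoding).encode(target_encoding)

# "中文" takes 2 bytes per character in GB18030 and 3 bytes per character in UTF-8.
gb_bytes = u"\u4e2d\u6587".encode("gb18030")
utf8_bytes = transcode(gb_bytes, "gb18030", "utf-8")
```

Because the byte length changes during transcoding, the length limit should be applied to the bytes in the encoding you actually store or send.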

Method 1: UTF-8 encoding


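def subString(string, length):
    # string: UTF-8 encoded byte string (str on Python 2, bytes on Python 3)
    # length: maximum number of bytes to keep
    if length >= len(string):
        return string
    data = bytearray(string)  # indexing yields ints on Python 2 and 3
    i = 0  # end of the last complete character that still fits
    while i < len(data):
        ch = data[i]
        # 1111110x (6-byte form, obsolete since RFC 3629)
        if ch >= 252:
            width = 6
        # 111110xx (5-byte form, obsolete since RFC 3629)
        elif ch >= 248:
            width = 5
        # 11110xxx
        elif ch >= 240:
            width = 4
        # 1110xxxx
        elif ch >= 224:
            width = 3
        # 110xxxxx
        elif ch >= 192:
            width = 2
        else:
            width = 1
        # Stop before a character that would cross the byte limit
        if i + width > length:
            break
        i += width
    return string[0:i]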
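The thresholds 192, 224 and 240 used above are the lead-byte patterns 110xxxxx, 1110xxxx and 11110xxx. As a quick sanity check, the sketch below compares the width implied by the lead byte with the real encoded length of a few sample characters. The helper name `utf8_char_width` and the sample characters are assumptions chosen for illustration, and the obsolete 5- and 6-byte forms are left out.

```python
# -*- coding: utf-8 -*-

def utf8_char_width(lead):
    """Return the sequence length implied by a UTF-8 lead byte (an int 0-255)."""
    if lead >= 0xF0:   # 11110xxx
        return 4
    if lead >= 0xE0:   # 1110xxxx
        return 3
    if lead >= 0xC0:   # 110xxxxx
        return 2
    return 1           # 0xxxxxxx (ASCII); continuation bytes also land here

# ASCII letter, accented Latin letter, Chinese character, emoji
for ch in [u"a", u"\u00e9", u"\u4e2d", u"\U0001F600"]:
    encoded = bytearray(ch.encode("utf-8"))
    assert utf8_char_width(encoded[0]) == len(encoded)
```

Note that a continuation byte (10xxxxxx) is reported as width 1, so the scan only gives correct results when it starts on a character boundary, as it does when it begins at offset 0.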

Method 2: GB18030 encoding



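def cut_string_off(string, s_len):
    # string: GB18030 encoded byte string (str on Python 2, bytes on Python 3)
    # s_len: maximum number of bytes to keep
    # An empty string, a non-positive s_len, or a limit that already
    # covers the whole string returns the input unchanged.
    if len(string) == 0 or s_len <= 0 or s_len >= len(string):
        return string
    data = bytearray(string)  # indexing yields ints on Python 2 and 3
    len_num = 0  # end of the last complete character that still fits
    while len_num < s_len:
        tmp_c = data[len_num]
        # Safe to read: len_num < s_len < len(data)
        tmp_nextc = data[len_num + 1]
        if 0x81 <= tmp_c <= 0xFE and 0x30 <= tmp_nextc <= 0x39:
            width = 4  # four-byte character
        elif 0x81 <= tmp_c <= 0xFE and 0x40 <= tmp_nextc <= 0xFE:
            width = 2  # two-byte character
        else:
            width = 1  # ASCII, or a byte that does not start a valid sequence
        # Stop before a character that would cross the byte limit
        if len_num + width > s_len:
            break
        len_num += width
    return string[0:len_num]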
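As an alternative to scanning bytes by hand, one could let the codec find the boundary: slice the bytes, decode with the "ignore" error handler so that a trailing partial character is dropped, then encode again. This is a different technique from the two methods above, sketched here under the assumption that the input is otherwise valid in the given encoding. The name `truncate_bytes` is hypothetical.

```python
# -*- coding: utf-8 -*-

def truncate_bytes(data, max_bytes, encoding="utf-8"):
    """Cut data to at most max_bytes, dropping a trailing partial character."""
    if max_bytes <= 0:
        return data[:0]
    # "ignore" silently discards bytes that cannot be decoded, which
    # includes the incomplete sequence left at the end of the slice.
    text = data[:max_bytes].decode(encoding, "ignore")
    return text.encode(encoding)

utf8_data = u"\u4e2d\u6587abc".encode("utf-8")   # "中文abc", 9 bytes
gb_data = u"\u4e2d\u6587".encode("gb18030")      # "中文", 4 bytes

short_utf8 = truncate_bytes(utf8_data, 4)            # keeps only the first character
short_gb = truncate_bytes(gb_data, 3, "gb18030")     # keeps only the first character
```

The trade-off is that "ignore" also discards invalid bytes anywhere else in the slice, so this variant is only appropriate when the data is known to be well formed.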
