Home > Article > Backend Development > Summary of string search operations in Python
Basic string position search method
Python uses the variable .find("what to find" [, start position, end position]) to find a string. The start position and end position represent the range to be searched. If it is empty, it means to search all. After the search is found, the position will be returned. The position is calculated from 0, and -1 will be returned every time it is found.
str = 'a,hello' print str.find('hello') # 在字符串str里查找字符串hello >> 2 # 输出结果
Naive Matching Algorithm
The naive matching algorithm is a one-to-one match between the target string and the template string. If there is a match, move the subscript one position to the right, otherwise clear it and start matching again.
target = 'abb aba' pattern = 'aba' def match(target, pattern): i = j = 0 n, m = len(target), len(pattern) while i < n and j < m: # 如果字符相等则目标和模板的下标都向右移 if target[i] == pattern[j]: i, j = i+1, j+1 else: # 如果字符不相等则目标下标切换到不相等的下标 # 模板下标移动到初始下标 i = i - j + 1 j = 0 if j == m: return i - j return -1
Add print to the above and print it again
#修改的地方 else: i = i -j + 1 j = 0 print(target[i], pattern[j], i, j) # 打印结果 b a 1 0 b a 2 0 a 3 0 a a 4 0
The loop will continue until equal matching values. This method is inefficient, mainly because the template characters will be looped again when there is no match. It may occur up to m * (n-m 1) times. m is the length of template characters, n-m 1 is the number of times to exclude unequal characters.
KMP algorithm
kmp is an algorithm for shifting by known matching characters. For example, if abb is compared with abc above, ab is known.
def match(target, pattern): i = j = 0 n, m = len(target), len(pattern) while i < n and j < m: # 如果字符相等则目标和模板的下标都向右移 if if j == -1 and target[i] == pattern[j]: i, j = i+1, j+1 else: # 这里通过next 函数来判断位移个数 i = i - j + pattern_next(pattern[:j]) j = 0 if j == m: return i - j return -1 def pattern_next(s): prefix = [s[:i+1] for i in range(len(s)-1)] suffix = [s[i+1:] for i in range(len(s)-1)] l = list(set(prefix) & set(suffix)) return len(l)