Home > Backend Development > Python Tutorial > Summary of string search operations in Python

Summary of string search operations in Python

WBOY
Release: 2016-07-06 13:29:49
Original
2536 people have browsed it

Basic string position search method
Python uses the variable .find("what to find" [, start position, end position]) to find a string. The start position and end position represent the range to be searched. If it is empty, it means to search all. After the search is found, the position will be returned. The position is calculated from 0, and -1 will be returned every time it is found.

1

2

3

str = 'a,hello'

print str.find('hello') # 在字符串str里查找字符串hello

>> 2     # 输出结果

Copy after login

Naive Matching Algorithm

The naive matching algorithm is a one-to-one match between the target string and the template string. If there is a match, move the subscript one position to the right, otherwise clear it and start matching again.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

target = 'abb aba'

pattern = 'aba'

 

def match(target, pattern):

  i = j = 0

  n, m = len(target), len(pattern)

  while i < n and j < m:

    # 如果字符相等则目标和模板的下标都向右移

    if target[i] == pattern[j]:

      i, j = i+1, j+1

    else:

      # 如果字符不相等则目标下标切换到不相等的下标

      # 模板下标移动到初始下标

      i = i - j + 1

      j = 0

  if j == m:

    return i - j

  return -1

Copy after login

Add print to the above and print it again

1

2

3

4

5

6

7

8

9

10

11

#修改的地方

else:

  i = i -j + 1

  j = 0

  print(target[i], pattern[j], i, j)

 

# 打印结果

b a 1 0

b a 2 0

 a 3 0

a a 4 0

Copy after login

The loop will continue until equal matching values. This method is inefficient, mainly because the template characters will be looped again when there is no match. It may occur up to m * (n-m 1) times. m is the length of template characters, n-m 1 is the number of times to exclude unequal characters.

KMP algorithm

kmp is an algorithm for shifting by known matching characters. For example, if abb is compared with abc above, ab is known.

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

def match(target, pattern):

  i = j = 0

  n, m = len(target), len(pattern)

  while i < n and j < m:

    # 如果字符相等则目标和模板的下标都向右移

    if if j == -1 and target[i] == pattern[j]:

      i, j = i+1, j+1

    else:

      # 这里通过next 函数来判断位移个数

      i = i - j + pattern_next(pattern[:j])

      j = 0

  if j == m:

    return i - j

  return -1

 

 

def pattern_next(s): 

  prefix = [s[:i+1] for i in range(len(s)-1)]

  suffix = [s[i+1:] for i in range(len(s)-1)]

  l = list(set(prefix) & set(suffix))

  return len(l)

Copy after login

Related labels:
source:php.cn
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Popular Tutorials
More>
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template