search

Home  >  Q&A  >  body text

python - 求优化一个查找有序list的相同子串的算法

输入:有序的list
要求:查找相同的子串,且相同子串必须是连续的,例如[1,2,1,2]就是1,2子串为相同的,但[1,2,3,1,2]就不是。
输出:[(字串,次数), ...]

现在的问题是这两重循环导致了算法为O(n^2),效率过于低下了,求优化~~~

from collections import defaultdict

def get_patterns(values):
    start = 0
    total_length = len(values)
    patterns = defaultdict(lambda : 1)
    while start < total_length:
        next_start = start + 2
        while next_start < total_length:
            ori_values = values[start:next_start]
            if ori_values == values[next_start:(next_start+len(ori_values))]:
                patterns[tuple(ori_values)] += 1
                next_start += len(ori_values)
            else:
                next_start += 1
        start += 1
    return list(patterns.iteritems())

if __name__ == '__main__':
    print get_patterns([1,2,3,5,1,2])
    print get_patterns([1,2,1,2,3,5])
    print get_patterns([1,2,1,2,1,3,5])

运行结果:

[]
[((1, 2), 2)]
[((1, 2), 2), ((2, 1), 2)]
PHP中文网PHP中文网2888 days ago430

reply all(2)I'll reply

  • 迷茫

    迷茫2017-04-18 09:14:51

    It’s done. After using numpy, the performance is so good. Now there are no performance problems...

    reply
    0
  • 巴扎黑

    巴扎黑2017-04-18 09:14:51

    If you want to output all specific substrings, the efficiency is very slow.

    reply
    0
  • Cancelreply