在两个 python 列表开头查找公共元素的最快方法？

小能豆

在两个 python 列表开头查找公共元素的最快方法？

在两个 Python 列表的开头找到公共元素的最快方法是什么？我使用 for 循环对其进行了编码，但我认为使用列表推导式编写它会更快……不幸的是，我不知道如何在列表推导式中中断。这是我写的代码：

import datetime

list1=[1,2,3,4,5,6]
list2=[1,2,4,3,5,6]

#This is the "for loop" version, and takes about 60 ms on my machine
start=datetime.datetime.now()
out=[]
    for (e1, e2) in zip(list1, list2):
    if e1 == e2:
        out.append(e1)
    else:
        break
end=datetime.datetime.now()
print out
print "Execution time: %s ms" % (float((end - start).microseconds) / 1000)

#This is the list-comprehension version, it takes about 15 ms to run,
#but unfortunately returns the wrong result because I can't break the loop.
start=datetime.datetime.now()
out = [ e1 for (e1, e2) in zip(list1, list2) if e1 == e2 ]
end=datetime.datetime.now()
print out
print "Execution time: %s ms" % (float((end - start).microseconds) / 1000)

有没有不使用列表理解的好的解决方案？

阅读 9

2024-11-19

共1个答案

小能豆

你的问题在于试图在列表推导式中实现短路行为（即在不满足条件时终止迭代）。遗憾的是，Python 的列表推导式并不支持中途 break。不过，你可以使用更高效的方法解决这个问题。

下面是一些改进或替代解决方案：

1. 使用 `itertools.takewhile`

itertools.takewhile 是一个很好的工具，它可以从迭代器中获取满足特定条件的连续元素，一旦条件失败，就会停止处理。

from itertools import takewhile
import datetime

list1 = [1, 2, 3, 4, 5, 6]
list2 = [1, 2, 4, 3, 5, 6]

start = datetime.datetime.now()
out = [e1 for e1, e2 in takewhile(lambda pair: pair[0] == pair[1], zip(list1, list2))]
end = datetime.datetime.now()

print(out)
print("Execution time: %s ms" % (float((end - start).microseconds) / 1000))

这段代码会在两个列表中遇到第一个不相等的元素时停止。

2. 使用生成器函数

编写一个生成器函数手动实现短路逻辑：

def common_prefix(list1, list2):
    for e1, e2 in zip(list1, list2):
        if e1 != e2:
            break
        yield e1

list1 = [1, 2, 3, 4, 5, 6]
list2 = [1, 2, 4, 3, 5, 6]

start = datetime.datetime.now()
out = list(common_prefix(list1, list2))
end = datetime.datetime.now()

print(out)
print("Execution time: %s ms" % (float((end - start).microseconds) / 1000))

生成器在逻辑上等效于你的 for 循环版本，但更加优雅。

3. 原地优化的 `for` 循环

虽然这是你当前的实现，但优化其性能和简洁性可能是一个实际的选择。直接添加条件逻辑以避免额外的处理：

list1 = [1, 2, 3, 4, 5, 6]
list2 = [1, 2, 4, 3, 5, 6]

start = datetime.datetime.now()
out = []
for e1, e2 in zip(list1, list2):
    if e1 == e2:
        out.append(e1)
    else:
        break
end = datetime.datetime.now()

print(out)
print("Execution time: %s ms" % (float((end - start).microseconds) / 1000))

4. 使用 NumPy（适用于长列表）

如果你处理非常大的列表并希望进一步提升性能，NumPy 提供了一种高效的方法：

import numpy as np
import datetime

list1 = np.array([1, 2, 3, 4, 5, 6])
list2 = np.array([1, 2, 4, 3, 5, 6])

start = datetime.datetime.now()
mask = list1 == list2
out = list1[:np.argmax(~mask)] if not mask.all() else list1
end = datetime.datetime.now()

print(out.tolist())
print("Execution time: %s ms" % (float((end - start).microseconds) / 1000))

此方法使用布尔掩码来高效地找到第一个不匹配的元素。