我有两个numpy数组,它们定义了网格的x和y轴。例如:
numpy
x
y
x = numpy.array([1,2,3]) y = numpy.array([4,5])
我想生成这些数组的笛卡尔积以生成:
array([[1,4],[2,4],[3,4],[1,5],[2,5],[3,5]])
在某种程度上,这并不是非常低效的,因为我需要循环执行多次。我假设将它们转换为Python列表并使用itertools.product并返回到numpy数组不是最有效的形式。
Python
itertools.product
>>> numpy.transpose([numpy.tile(x, len(y)), numpy.repeat(y, len(x))]) array([[1, 4], [2, 4], [3, 4], [1, 5], [2, 5], [3, 5]])
规范的cartesian_product(几乎) 对于具有不同属性的此问题,有许多方法。有些比其他的更快,有些则更通用。经过大量的测试和调整,我发现cartesian_product对于许多输入而言,以下函数(计算n维)比大多数其他函数要快。有关稍微复杂一些但在许多情况下甚至更快的一对方法,请参阅Paul Panzer的答案。
cartesian_product
Paul Panzer
有了这个答案,就我所知,这不再是笛卡尔积的最快实现numpy。但是,我认为它的简单性将继续使其成为将来改进的有用基准:
def cartesian_product(*arrays): la = len(arrays) dtype = numpy.result_type(*arrays) arr = numpy.empty([len(a) for a in arrays] + [la], dtype=dtype) for i, a in enumerate(numpy.ix_(*arrays)): arr[...,i] = a return arr.reshape(-1, la)
值得一提的是,此功能的使用ix_方式很不常见。记录的用法ix_是将索引生成 到数组中,而恰恰是具有相同形状的数组可用于广播分配。非常感谢mgilson(启发了我尝试使用ix_这种方法),以及unutbu(对这个答案提供了一些非常有用的反馈,包括使用建议)numpy.result_type。
替代品
有时以Fortran顺序写入连续的内存块会更快。这是替代方法的基础,该方法cartesian_product_transpose在某些硬件上已被证明比cartesian_product(见下文)更快。但是,使用相同原理的Paul Panzer的答案甚至更快。尽管如此,我还是在这里为感兴趣的读者提供这些:
def cartesian_product_transpose(*arrays): broadcastable = numpy.ix_(*arrays) broadcasted = numpy.broadcast_arrays(*broadcastable) rows, cols = numpy.prod(broadcasted[0].shape), len(broadcasted) dtype = numpy.result_type(*arrays) out = numpy.empty(rows * cols, dtype=dtype) start, end = 0, rows for a in broadcasted: out[start:end] = a.reshape(-1) start, end = end, end + rows return out.reshape(cols, rows).T
在了解了Panzer的方法之后,我写了一个新版本,该版本几乎和Panzer一样快,并且几乎很简单cartesian_product:
def cartesian_product_simple_transpose(arrays): la = len(arrays) dtype = numpy.result_type(*arrays) arr = numpy.empty([la] + [len(a) for a in arrays], dtype=dtype) for i, a in enumerate(numpy.ix_(*arrays)): arr[i, ...] = a return arr.reshape(la, -1).T
这似乎具有恒定的时间开销,这使得它在小输入情况下的运行速度比Panzer的慢。但是对于较大的输入,在我进行的所有测试中,它的性能都和他最快的实现一样好cartesian_product_transpose_pp。
在以下各节中,我将对其他替代方案进行一些测试。这些现在有些过时了,但是为了避免重复,我决定出于历史的考虑将它们留在这里。有关最新测试,请参阅Panzer的答案以及NicoSchlömer的答案。
替代测试
这是一系列测试,这些测试显示了其中某些功能相对于许多替代产品所提供的性能提升。此处显示的所有测试都是在运行Mac OS 10.12.5,Python 3.6.1和numpy1.12.1 的四核计算机上执行的。已知硬件和软件的变化会产生不同的结果,因此称为YMMV。确保自己运行这些测试!
定义:
import numpy import itertools from functools import reduce ### Two-dimensional products ### def repeat_product(x, y): return numpy.transpose([numpy.tile(x, len(y)), numpy.repeat(y, len(x))]) def dstack_product(x, y): return numpy.dstack(numpy.meshgrid(x, y)).reshape(-1, 2) ### Generalized N-dimensional products ### def cartesian_product(*arrays): la = len(arrays) dtype = numpy.result_type(*arrays) arr = numpy.empty([len(a) for a in arrays] + [la], dtype=dtype) for i, a in enumerate(numpy.ix_(*arrays)): arr[...,i] = a return arr.reshape(-1, la) def cartesian_product_transpose(*arrays): broadcastable = numpy.ix_(*arrays) broadcasted = numpy.broadcast_arrays(*broadcastable) rows, cols = numpy.prod(broadcasted[0].shape), len(broadcasted) dtype = numpy.result_type(*arrays) out = numpy.empty(rows * cols, dtype=dtype) start, end = 0, rows for a in broadcasted: out[start:end] = a.reshape(-1) start, end = end, end + rows return out.reshape(cols, rows).T # from https://stackoverflow.com/a/1235363/577088 def cartesian_product_recursive(*arrays, out=None): arrays = [numpy.asarray(x) for x in arrays] dtype = arrays[0].dtype n = numpy.prod([x.size for x in arrays]) if out is None: out = numpy.zeros([n, len(arrays)], dtype=dtype) m = n // arrays[0].size out[:,0] = numpy.repeat(arrays[0], m) if arrays[1:]: cartesian_product_recursive(arrays[1:], out=out[0:m,1:]) for j in range(1, arrays[0].size): out[j*m:(j+1)*m,1:] = out[0:m,1:] return out def cartesian_product_itertools(*arrays): return numpy.array(list(itertools.product(*arrays))) ### Test code ### name_func = [('repeat_product', repeat_product), ('dstack_product', dstack_product), ('cartesian_product', cartesian_product), ('cartesian_product_transpose', cartesian_product_transpose), ('cartesian_product_recursive', cartesian_product_recursive), ('cartesian_product_itertools', cartesian_product_itertools)] def test(in_arrays, test_funcs): global func global arrays arrays = in_arrays for name, func in test_funcs: print('{}:'.format(name)) %timeit func(*arrays) def test_all(*in_arrays): test(in_arrays, name_func) # `cartesian_product_recursive` throws an # unexpected error when used on more than # two input arrays, so for now I've removed # it from these tests. def test_cartesian(*in_arrays): test(in_arrays, name_func[2:4] + name_func[-1:]) x10 = [numpy.arange(10)] x50 = [numpy.arange(50)] x100 = [numpy.arange(100)] x500 = [numpy.arange(500)] x1000 = [numpy.arange(1000)]
检测结果:
In [2]: test_all(*(x100 * 2)) repeat_product: 67.5 µs ± 633 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) dstack_product: 67.7 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) cartesian_product: 33.4 µs ± 558 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cartesian_product_transpose: 67.7 µs ± 932 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cartesian_product_recursive: 215 µs ± 6.01 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) cartesian_product_itertools: 3.65 ms ± 38.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) In [3]: test_all(*(x500 * 2)) repeat_product: 1.31 ms ± 9.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) dstack_product: 1.27 ms ± 7.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) cartesian_product: 375 µs ± 4.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) cartesian_product_transpose: 488 µs ± 8.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) cartesian_product_recursive: 2.21 ms ± 38.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) cartesian_product_itertools: 105 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) In [4]: test_all(*(x1000 * 2)) repeat_product: 10.2 ms ± 132 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) dstack_product: 12 ms ± 120 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) cartesian_product: 4.75 ms ± 57.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) cartesian_product_transpose: 7.76 ms ± 52.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) cartesian_product_recursive: 13 ms ± 209 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) cartesian_product_itertools: 422 ms ± 7.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
在所有情况下,cartesian_product如本答案开头所定义的,最快。
对于那些接受任意数量的输入数组的函数,同样值得检查性能len(arrays) > 2。(直到我可以确定为什么cartesian_product_recursive在这种情况下会引发错误,我才从这些测试中将其删除。)
len(arrays) > 2
cartesian_product_recursive
In [5]: test_cartesian(*(x100 * 3)) cartesian_product: 8.8 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) cartesian_product_transpose: 7.87 ms ± 91.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) cartesian_product_itertools: 518 ms ± 5.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In [6]: test_cartesian(*(x50 * 4)) cartesian_product: 169 ms ± 5.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) cartesian_product_transpose: 184 ms ± 4.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) cartesian_product_itertools: 3.69 s ± 73.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In [7]: test_cartesian(*(x10 * 6)) cartesian_product: 26.5 ms ± 449 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) cartesian_product_transpose: 16 ms ± 133 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) cartesian_product_itertools: 728 ms ± 16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In [8]: test_cartesian(*(x10 * 7)) cartesian_product: 650 ms ± 8.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) cartesian_product_transpose: 518 ms ± 7.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) cartesian_product_itertools: 8.13 s ± 122 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
如这些测试所示,cartesian_product在输入阵列的数量增加到(大约)四个以上之前,它仍然具有竞争力。在那之后,cartesian_product_transpose确实有一点优势。
cartesian_product_transpose
值得重申的是,使用其他硬件和操作系统的用户可能会看到不同的结果。例如,unutbu报告在使用Ubuntu 14.04,Python 3.4.3和numpy1.14.0.dev0 + b7050a9的这些测试中看到以下结果:
Ubuntu 14.04,Python 3.4.3
numpy1.14.0.dev0 + b7050a9
>>> %timeit cartesian_product_transpose(x500, y500) 1000 loops, best of 3: 682 µs per loop >>> %timeit cartesian_product(x500, y500) 1000 loops, best of 3: 1.55 ms per loop
下面,我将详细介绍我按照这些思路进行的早期测试。对于不同的硬件和不同版本的Python和,这些方法的相对性能已经随着时间而改变numpy。尽管它对于使用的最新版本的人们不是立即有用的numpy,但它说明了自该答案的第一个版本以来情况已发生了变化。
一个简单的选择:meshgrid+dstack
meshgrid+dstack
当前接受的答案使用tile和repeat一起广播两个阵列。但是该meshgrid功能实际上是相同的。这是tile和repeat传递给转置之前的输出:
In [1]: import numpy In [2]: x = numpy.array([1,2,3]) ...: y = numpy.array([4,5]) ...: In [3]: [numpy.tile(x, len(y)), numpy.repeat(y, len(x))] Out[3]: [array([1, 2, 3, 1, 2, 3]), array([4, 4, 4, 5, 5, 5])]
这是输出meshgrid:
In [4]: numpy.meshgrid(x, y) Out[4]: [array([[1, 2, 3], [1, 2, 3]]), array([[4, 4, 4], [5, 5, 5]])]
如你所见,它几乎是相同的。我们只需要重整结果就可以得到完全相同的结果。
In [5]: xt, xr = numpy.meshgrid(x, y) ...: [xt.ravel(), xr.ravel()] Out[5]: [array([1, 2, 3, 1, 2, 3]), array([4, 4, 4, 5, 5, 5])]
不过,我们无需重新塑形,而可以传递meshgridto 的输出dstack并在之后进行塑形,从而节省了一些工作:
In [6]: numpy.dstack(numpy.meshgrid(x, y)).reshape(-1, 2) Out[6]: array([[1, 4], [2, 4], [3, 4], [1, 5], [2, 5], [3, 5]])
与本评论中的主张相反,我没有看到任何证据表明不同的输入会产生不同形状的输出,并且如上面所展示的,它们做的事情非常相似,因此如果这样做会很奇怪。如果你找到反例,请告诉我。
测试meshgrid+ dstack与repeat+transpose
meshgrid+ dstack与repeat+transpose
这两种方法的相对性能已随时间变化。在早期版本的Python(2.7)中,使用meshgrid+ 的结果dstack对于较小的输入而言明显更快。(请注意,这些测试来自该答案的旧版本。)定义:
>>> def repeat_product(x, y): ... return numpy.transpose([numpy.tile(x, len(y)), numpy.repeat(y, len(x))]) ... >>> def dstack_product(x, y): ... return numpy.dstack(numpy.meshgrid(x, y)).reshape(-1, 2) ...
对于中等大小的输入,我看到了明显的加速。但我numpy在较新的计算机上使用Python(3.6.1)和(1.12.1)的最新版本重试了这些测试。两种方法现在几乎相同。
旧测试
>>> x, y = numpy.arange(500), numpy.arange(500) >>> %timeit repeat_product(x, y) 10 loops, best of 3: 62 ms per loop >>> %timeit dstack_product(x, y) 100 loops, best of 3: 12.2 ms per loop
新测试
In [7]: x, y = numpy.arange(500), numpy.arange(500) In [8]: %timeit repeat_product(x, y) 1.32 ms ± 24.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) In [9]: %timeit dstack_product(x, y) 1.26 ms ± 8.47 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
和往常一样,YMMV,但这表明在Python和numpy的最新版本中,它们是可互换的。
广义产品功能
通常,我们可能希望对于较小的输入使用内置函数会更快,而对于较大的输入,使用专用函数可能会更快。此外,对于广义n维的产品,tile而repeat不会帮助,因为他们没有明确的高维类似物。因此,值得研究专用功能的行为。
大多数相关测试都出现在此答案的开头,但是这里有一些在较早版本的Python上执行的测试,numpy用于比较。
另一个答案中cartesian定义的函数用于较大的输入时表现很好。(这是一样调用该函数上面)。为了比较到,我们只使用两个维度。cartesian_product_recursivecartesiandstack_prodct
cartesian
cartesian_product_recursivecartesiandstack_prodct
同样,旧测试显示出显着差异,而新测试显示几乎没有。
>>> x, y = numpy.arange(1000), numpy.arange(1000) >>> %timeit cartesian([x, y]) 10 loops, best of 3: 25.4 ms per loop >>> %timeit dstack_product(x, y) 10 loops, best of 3: 66.6 ms per loop
In [10]: x, y = numpy.arange(1000), numpy.arange(1000) In [11]: %timeit cartesian([x, y]) 12.1 ms ± 199 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) In [12]: %timeit dstack_product(x, y) 12.7 ms ± 334 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
和以前一样,dstack_product仍然cartesian以较小的比例跳动。
新测试(未显示冗余旧测试)
In [13]: x, y = numpy.arange(100), numpy.arange(100) In [14]: %timeit cartesian([x, y]) 215 µs ± 4.75 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) In [15]: %timeit dstack_product(x, y) 65.7 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
我认为这些区别很有趣,值得记录。但他们最终还是学术的。正如该答案开头的测试所示,所有这些版本的速度几乎总是比cartesian_product该答案开头定义的慢-它本身比该问题答案中最快的实现要慢一些。