我想在共享内存中使用一个numpy数组,以便与多处理模块一起使用。困难是像numpy数组一样使用它,而不仅仅是ctypes数组。
from multiprocessing import Process, Array import scipy def f(a): a[0] = -a[0] if __name__ == '__main__': # Create the array N = int(10) unshared_arr = scipy.rand(N) arr = Array('d', unshared_arr) print "Originally, the first two elements of arr = %s"%(arr[:2]) # Create, start, and finish the child processes p = Process(target=f, args=(arr,)) p.start() p.join() # Printing out the changed values print "Now, the first two elements of arr = %s"%arr[:2]
这将产生如下输出:
Originally, the first two elements of arr = [0.3518653236697369, 0.517794725524976] Now, the first two elements of arr = [-0.3518653236697369, 0.517794725524976]
可以ctypes方式访问该数组,例如arr[i]说得通。但是,它不是一个numpy数组,因此我无法执行-1*arr,或arr.sum()。我想一个解决方案是将ctypes数组转换为numpy数组。但是(除了无法完成这项工作外),我不相信会再共享它。
-1*arr
arr.sum()
ctypes
numpy
对于必须解决的常见问题,似乎将有一个标准解决方案。
添加到@unutbu(不再可用)和@Henry Gomersall的答案中。你可以shared_arr.get_lock()在需要时使用来同步访问:
@unutbu
@Henry Gomersall
shared_arr.get_lock()
shared_arr = mp.Array(ctypes.c_double, N) # ... def f(i): # could be anything numpy accepts as an index such another numpy array with shared_arr.get_lock(): # synchronize access arr = np.frombuffer(shared_arr.get_obj()) # no data copying arr[i] = -arr[i]
例
import ctypes import logging import multiprocessing as mp from contextlib import closing import numpy as np info = mp.get_logger().info def main(): logger = mp.log_to_stderr() logger.setLevel(logging.INFO) # create shared array N, M = 100, 11 shared_arr = mp.Array(ctypes.c_double, N) arr = tonumpyarray(shared_arr) # fill with random values arr[:] = np.random.uniform(size=N) arr_orig = arr.copy() # write to arr from different processes with closing(mp.Pool(initializer=init, initargs=(shared_arr,))) as p: # many processes access the same slice stop_f = N // 10 p.map_async(f, [slice(stop_f)]*M) # many processes access different slices of the same array assert M % 2 # odd step = N // 10 p.map_async(g, [slice(i, i + step) for i in range(stop_f, N, step)]) p.join() assert np.allclose(((-1)**M)*tonumpyarray(shared_arr), arr_orig) def init(shared_arr_): global shared_arr shared_arr = shared_arr_ # must be inherited, not passed as an argument def tonumpyarray(mp_arr): return np.frombuffer(mp_arr.get_obj()) def f(i): """synchronized.""" with shared_arr.get_lock(): # synchronize access g(i) def g(i): """no synchronization.""" info("start %s" % (i,)) arr = tonumpyarray(shared_arr) arr[i] = -1 * arr[i] info("end %s" % (i,)) if __name__ == '__main__': mp.freeze_support() main()
如果不需要同步访问或创建自己的锁,则mp.Array()没有必要。mp.sharedctypes.RawArray在这种情况下,你可以使用。