The code I have looks something like this:
glbl_array = # a 3 Gb array

def my_func(args, def_param=glbl_array):
    # do stuff on args and def_param

if __name__ == '__main__':
    pool = Pool(processes=4)
    pool.map(my_func, range(1000))
Is there a way to make sure (or encourage) that the different processes do not get a copy of glbl_array but share it instead? If there is no way to stop the copy, I will go with a memmapped array, but my access patterns are not very regular, so I expect memmapped arrays to be slower. The above seemed like the first thing to try. This is on Linux. I just wanted some advice from Stack Overflow and do not want to annoy the sysadmin. Do you think it will help if the second parameter is a genuinely immutable object, like glbl_array.tostring()?
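For reference, the memmapped fallback mentioned above would look roughly like this; a minimal sketch, assuming the array is backed by a hypothetical file array.dat:

import numpy as np

# mode='w+' creates (or overwrites) the backing file; other processes can
# open the same file with mode='r+' and share it through the OS page cache.
glbl_array = np.memmap('array.dat', dtype='float64', mode='w+',
                       shape=(1000, 1000))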
You can use shared memory together with Numpy quite easily with multiprocessing:
import multiprocessing
import ctypes
import numpy as np

shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
shared_array = shared_array.reshape(10, 10)

# -- edited 2015-05-01: the assert check below checks the wrong thing
#    with recent versions of Numpy/multiprocessing. That no copy is made
#    is indicated by the fact that the program prints the output shown below.
## No copy was made
##assert shared_array.base.base is shared_array_base.get_obj()

# Parallel processing
def my_func(i, def_param=shared_array):
    shared_array[i, :] = i

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    pool.map(my_func, range(10))
    print(shared_array)
which prints
[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]
 [ 3.  3.  3.  3.  3.  3.  3.  3.  3.  3.]
 [ 4.  4.  4.  4.  4.  4.  4.  4.  4.  4.]
 [ 5.  5.  5.  5.  5.  5.  5.  5.  5.  5.]
 [ 6.  6.  6.  6.  6.  6.  6.  6.  6.  6.]
 [ 7.  7.  7.  7.  7.  7.  7.  7.  7.  7.]
 [ 8.  8.  8.  8.  8.  8.  8.  8.  8.  8.]
 [ 9.  9.  9.  9.  9.  9.  9.  9.  9.  9.]]
However, Linux has copy-on-write semantics on fork(), so even without using multiprocessing.Array, the data will not be copied unless it is written to.
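A minimal sketch of relying on that copy-on-write behavior (with the array shrunk here for illustration): on Linux the default start method is fork, so a global array defined before the Pool is created is visible to the workers, and as long as they only read it, the array's data pages are never duplicated:

import multiprocessing
import numpy as np

# Created before the Pool, so it already exists in the parent when fork() happens.
glbl_array = np.zeros((1000, 1000)) + 5.0

def my_func(i):
    # Read-only access: under copy-on-write, the data pages are not duplicated.
    return glbl_array[i, :].sum()

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    print(pool.map(my_func, range(10)))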