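I have two arrays, A and B. A holds multiple values (these can be strings, integers, or floats), and B holds only 0s and 1s. For each unique value in A, I need the number of positions that coincide with a 1 in B and the number that coincide with a 0 in B. Both counts need to be stored as separate variables. For example: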
    A = [1, 1, 3, 2, 2, 1, 1, 3, 3]  # input multivalue array; it has three unique values: 1, 2, 3
    B = [0, 0, 0, 1, 1, 1, 0, 1, 0]  # input binary array

    # Desired result:
    countA1_B1 = 1    # for unique value '1' in A: count of places where there is '1' in B
    countA1_B0 = 3    # for unique value '1' in A: count of places where there is '0' in B
    countAno1_B1 = 3  # for unique value '1' in A: count of places where there is no '1' in A but there is '1' in B
    countAno1_B0 = 2  # for unique value '1' in A: count of places where there is no '1' in A and there is '0' in B
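I need this for every unique value in A. The A array/list will be a raster, so the unique values will not be known in advance. The code therefore has to extract the unique values of A first and then do the rest of the computation.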
    import numpy as np

    A = [1, 1, 3, 2, 2, 1, 1, 3, 3]  # input array
    B = [0, 0, 0, 1, 1, 1, 0, 1, 0]  # input binary array

    A_arr = np.array(A)
    A_unq = np.unique(A_arr)

    # code 1
    A_masked_arrays = np.array((A_arr[None, :] == A_unq[:, None]).astype(int))
    # code 2
    # A_masked_arrays = [(A==unique_val).astype(int) for unique_val in np.unique(A)]
    print(A_masked_arrays)

    out = {val: arr for val, arr in zip(list(A_unq), list(A_arr))}
    # zip() throws error
    # TypeError: 'zip' object is not callable.

    dict = {}
    for i in A_unq:
        for j in A_masked_arrays:
            dict = i, j
            print(dict)
This gives the following result:
    # from code 1
    [[1 1 0 0 0 1 1 0 0]
     [0 0 0 1 1 0 0 0 0]
     [0 0 1 0 0 0 0 1 1]]

    # from code 2
    [array([1, 1, 0, 0, 0, 1, 1, 0, 0]), array([0, 0, 0, 1, 1, 0, 0, 0, 0]), array([0, 0, 1, 0, 0, 0, 0, 1, 1])]
With the dictionary-creation loop I got this result:
    (1, array([1, 1, 0, 0, 0, 1, 1, 0, 0]))
    (1, array([0, 0, 0, 1, 1, 0, 0, 0, 0]))
    (1, array([0, 0, 1, 0, 0, 0, 0, 1, 1]))
    (2, array([1, 1, 0, 0, 0, 1, 1, 0, 0]))
    (2, array([0, 0, 0, 1, 1, 0, 0, 0, 0]))
    (2, array([0, 0, 1, 0, 0, 0, 0, 1, 1]))
    (3, array([1, 1, 0, 0, 0, 1, 1, 0, 0]))
    (3, array([0, 0, 0, 1, 1, 0, 0, 0, 0]))
    (3, array([0, 0, 1, 0, 0, 0, 0, 1, 1]))
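This is where I am stuck. From here, how do I get the final counts for each unique value in A, such as countA1_B1, countA1_B0, countAno1_B1, countAno1_B0, and so on? Any help is appreciated. Thanks in advance.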
This kind of groupby operation is much easier to do with pandas:
    In [11]: import pandas as pd

    In [12]: df = pd.DataFrame({"A": A, "B": B})

    In [13]: df
    Out[13]:
       A  B
    0  1  0
    1  1  0
    2  3  0
    3  2  1
    4  2  1
    5  1  1
    6  1  0
    7  3  1
    8  3  0
Now you can use groupby:
    In [14]: gb = df.groupby("A")["B"]

    In [15]: gb.count()  # number of As
    Out[15]:
    A
    1    4
    2    2
    3    3
    Name: B, dtype: int64

    In [16]: gb.sum()  # number of As where B == 1
    Out[16]:
    A
    1    1
    2    2
    3    1
    Name: B, dtype: int64

    In [17]: gb.count() - gb.sum()  # number of As where B == 0
    Out[17]:
    A
    1    3
    2    0
    3    2
    Name: B, dtype: int64
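The question also asks for the complement counts (countAno1_B1 and countAno1_B0). One possible sketch, assuming B contains only 0s and 1s, subtracts each group's counts from the overall totals. The names `ones`, `zeros`, and `counts`, and the dictionary keys, are illustrative and not part of the original post; a dictionary keyed by the unique value stands in for dynamically named variables.

```python
import pandas as pd

A = [1, 1, 3, 2, 2, 1, 1, 3, 3]
B = [0, 0, 0, 1, 1, 1, 0, 1, 0]

df = pd.DataFrame({"A": A, "B": B})
gb = df.groupby("A")["B"]

ones = gb.sum()            # per value of A: positions where B == 1
zeros = gb.count() - ones  # per value of A: positions where B == 0

# Totals over the whole array (assumes B is strictly 0/1).
total_ones = int(df["B"].sum())
total_zeros = len(df) - total_ones

# One small dict per unique value of A.
counts = {
    val: {
        "B1": int(ones.loc[val]),                    # A == val and B == 1
        "B0": int(zeros.loc[val]),                   # A == val and B == 0
        "no_B1": total_ones - int(ones.loc[val]),    # A != val and B == 1
        "no_B0": total_zeros - int(zeros.loc[val]),  # A != val and B == 0
    }
    for val in ones.index.tolist()
}

print(counts[1])
```

For the example data, `counts[1]` holds the four values listed as the desired result in the question (1, 3, 3, and 2).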
You can also do this more explicitly, and more generally, with apply (for example, if B is not just 0s and 1s):
    In [18]: gb.apply(lambda x: (x == 1).sum())
    Out[18]:
    A
    1    1
    2    2
    3    1
    Name: B, dtype: int64
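If B can take more than two values, a cross-tabulation may be a convenient alternative to calling apply once per value. The sketch below uses `pd.crosstab`, which the original answer does not mention; the names `table` and `complement` are illustrative.

```python
import pandas as pd

A = [1, 1, 3, 2, 2, 1, 1, 3, 3]
B = [0, 0, 0, 1, 1, 1, 0, 1, 0]

df = pd.DataFrame({"A": A, "B": B})

# Rows are the unique values of A, columns are the unique values of B,
# and each cell counts the positions where that (A, B) pair occurs.
table = pd.crosstab(df["A"], df["B"])

# Complement counts: column totals minus each row, i.e. the positions
# where A is NOT that value, split by the value of B.
complement = table.sum(axis=0) - table

print(table)
print(complement)
```

Here `table.loc[1, 1]` corresponds to countA1_B1 and `complement.loc[1, 0]` to countAno1_B0, so every count is available by lookup without knowing the unique values in advance.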