小能豆

Pandas change values in pandas dataframe based on a slice with selected indexes and shift()

py

Let’s say I have a dataframe df. First I filter it, then I need to find all rows that satisfy a criteria and then mark all these rows plus 2 previous rows for each in a specific bool column in the original dataframe:

df = pd.DataFrame({{'A': {0: nan, 1: nan, 2: 1944.09, 3: nan, 4: nan, 5: 1926.0, 6: nan, 7: 1930.31, 8: nan, 9: nan, 10: nan, 11: nan, 12: nan, 13: nan, 14: nan, 15: 1917.66, 16: 1920.43, 17: nan, 18: 1909.04, 19: nan, 20: nan, 21: nan, 22: nan, 23: nan, 24: 1920.05, 25: nan, 26: 1915.4, 27: 1921.87, 28: nan, 29: nan, 30: nan, 31: 1912.42, 32: 1920.08, 33: 1915.8, 34: nan, 35: nan, 36: nan, 37: nan, 38: 1919.71, 39: 1916.2, 40: nan, 41: 1926.79, 42: nan, 43: 1918.66, 44: nan, 45: 1925.5, 46: 1922.22, 47: nan, 48: nan, 49: 1927.87, 50: 1923.24, 51: nan, 52: 1929.53, 53: nan, 54: nan, 55: nan, 56: nan, 57: nan, 58: nan, 59: nan, 60: nan, 61: 1918.37, 62: nan, 63: nan, 64: 1923.61, 65: nan, 66: 1917.1, 67: nan, 68: nan, 69: nan, 70: nan, 71: nan, 72: nan, 73: nan, 74: nan, 75: nan, 76: nan, 77: nan, 78: nan, 79: nan, 80: nan, 81: 1924.48, 82: nan, 83: nan, 84: 1923.03, 85: nan, 86: nan, 87: nan, 88: nan, 89: 1926.87, 90: nan, 91: nan, 92: nan, 93: 1921.79, 94: nan, 95: 1925.27, 96: nan, 97: 1919.0, 98: nan, 99: nan, 100: 1923.74, 101: nan, 102: nan, 103: nan, 104: nan, 105: 1911.61, 106: nan, 107: 1923.33, 108: nan, 109: nan, 110: nan, 111: 1912.0, 112: nan, 113: 1915.8, 114: nan, 115: 1913.05, 116: nan, 117: nan, 118: nan, 119: nan, 120: nan, 121: nan, 122: 1916.93, 123: nan, 124: 1913.69, 125: nan, 126: nan, 127: nan, 128: nan, 129: 1918.38, 130: 1913.7, 131: nan, 132: nan, 133: nan, 134: nan, 135: nan, 136: 1919.5, 137: nan, 138: 1916.14, 139: nan, 140: nan, 141: nan, 142: nan, 143: nan, 144: 1921.28, 145: nan, 146: nan, 147: nan, 148: nan, 149: nan, 150: 1915.0, 151: nan, 152: nan, 153: nan, 154: nan, 155: nan, 156: 1927.48, 157: 1889.17, 158: nan, 159: 1921.91, 160: 1917.67, 161: 1923.23, 162: nan, 163: nan, 164: nan, 165: 1909.88, 166: nan, 167: 1913.82, 168: 1902.51, 169: nan, 170: nan, 171: nan, 172: nan, 173: nan, 174: nan, 175: nan, 176: nan, 177: nan, 178: nan, 179: 1920.15}, 'C': {0: False, 1: False, 2: True, 3: False, 4: False, 5: False, 6: False, 7: False, 8: False, 9: False, 10: False, 11: False, 12: False, 13: False, 14: False, 15: False, 16: False, 17: False, 18: False, 19: False, 20: False, 21: False, 22: False, 23: False, 24: False, 25: False, 26: False, 27: True, 28: False, 29: False, 30: False, 31: False, 32: True, 33: False, 34: False, 35: False, 36: False, 37: False, 38: False, 39: False, 40: False, 41: True, 42: False, 43: False, 44: False, 45: False, 46: False, 47: False, 48: False, 49: False, 50: False, 51: False, 52: True, 53: False, 54: False, 55: False, 56: False, 57: False, 58: False, 59: False, 60: False, 61: False, 62: False, 63: False, 64: False, 65: False, 66: False, 67: False, 68: False, 69: False, 70: False, 71: False, 72: False, 73: False, 74: False, 75: False, 76: False, 77: False, 78: False, 79: False, 80: False, 81: False, 82: False, 83: False, 84: False, 85: False, 86: False, 87: False, 88: False, 89: True, 90: False, 91: False, 92: False, 93: False, 94: False, 95: False, 96: False, 97: False, 98: False, 99: False, 100: False, 101: False, 102: False, 103: False, 104: False, 105: False, 106: False, 107: True, 108: False, 109: False, 110: False, 111: False, 112: False, 113: False, 114: False, 115: False, 116: False, 117: False, 118: False, 119: False, 120: False, 121: False, 122: False, 123: False, 124: False, 125: False, 126: False, 127: False, 128: False, 129: False, 130: False, 131: False, 132: False, 133: False, 134: False, 135: False, 136: False, 137: False, 138: False, 139: False, 140: False, 141: False, 142: False, 143: False, 144: False, 145: False, 146: False, 147: False, 148: False, 149: False, 150: False, 151: False, 152: False, 153: False, 154: False, 155: False, 156: True, 157: False, 158: False, 159: False, 160: False, 161: True, 162: False, 163: False, 164: False, 165: False, 166: False, 167: False, 168: False, 169: False, 170: False, 171: False, 172: False, 173: False, 174: False, 175: False, 176: False, 177: False, 178: False, 179: False}})

df2 = df[df.C]
print(df2)

2      1944.09
27     1921.87
32     1920.08
41     1926.79
52     1929.53
89     1926.87
107    1923.33
156    1927.48
161    1923.23
Name: A, dtype: float64

df3 = df2[(df2.A > df2.A.shift(1)) & (df2.A.shift(1) > df2.A.shift(2))]
print(df3)

52    1929.53
Name: A, dtype: float64

df[FROM_2nd_PREVIOUS_TO_EVERY_ROW_OF_df3_IN_df2:TO_EVERY_ROW_IN_df3, 'B'] = True
print(df)
           A        B
0        NaN    False
..       ...      ...
31   1234.56    False
32   1920.08    True
33   1234.56    True
..       ...      ...
41   1926.79    True
40   1234.56    True
..       ...      ...
51   1234.56    True
52   1929.53    True
52   1234.56    False
..       ...      ...
176      NaN    False
177      NaN    False
178      NaN    False
179  1920.15    False

What is the correct way to do it?


阅读 59

收藏
2023-12-15

共1个答案

小能豆

You can achieve this by using the following steps:

  1. Identify the indices of the rows in df2 that satisfy the condition.
  2. Create a boolean mask for the rows in df that need to be marked as True.
  3. Assign True to the specified rows in the new column.

Here’s the modified code:

import pandas as pd
import numpy as np

# Assuming df is defined as provided in the question

df2 = df[df['C']]
print(df2)

df3 = df2[(df2['A'] > df2['A'].shift(1)) & (df2['A'].shift(1) > df2['A'].shift(2))]
print(df3)

# Get the indices of rows in df2 that satisfy the condition
indices = df3.index

# Create a boolean mask for the rows in df
mask = np.zeros(len(df), dtype=bool)

# Set the corresponding values in the mask to True
mask[indices - 2] = True  # Subtract 2 to mark 2 previous rows

# Add a new column 'B' to df and set values based on the mask
df['B'] = mask

print(df)

This will correctly mark the rows in df according to your specified conditions. Note that the indices - 2 part ensures that the two previous rows are marked

To mark the specified rows in the original DataFrame (df) based on the conditions, you can use the following code:

import pandas as pd
import numpy as np

# Assuming df is defined as provided in the question

df2 = df[df['C']]
print(df2)

df3 = df2[(df2['A'] > df2['A'].shift(1)) & (df2['A'].shift(1) > df2['A'].shift(2))]
print(df3)

# Get the indices of rows in df that need to be marked as True
indices_to_mark = np.where(df['C'])[0][df3.index - 2]

# Add a new column 'B' to df and set values based on the indices
df['B'] = False
df.loc[indices_to_mark, 'B'] = True

print(df)

This code first identifies the indices of rows in df that satisfy the condition and then marks those rows as True in the new column ‘B’.

2023-12-15