Let’s say I have a dataframe df
. First I filter it, then I need to find all rows that satisfy a criteria and then mark all these rows plus 2 previous rows for each in a specific bool column in the original dataframe:
df = pd.DataFrame({{'A': {0: nan, 1: nan, 2: 1944.09, 3: nan, 4: nan, 5: 1926.0, 6: nan, 7: 1930.31, 8: nan, 9: nan, 10: nan, 11: nan, 12: nan, 13: nan, 14: nan, 15: 1917.66, 16: 1920.43, 17: nan, 18: 1909.04, 19: nan, 20: nan, 21: nan, 22: nan, 23: nan, 24: 1920.05, 25: nan, 26: 1915.4, 27: 1921.87, 28: nan, 29: nan, 30: nan, 31: 1912.42, 32: 1920.08, 33: 1915.8, 34: nan, 35: nan, 36: nan, 37: nan, 38: 1919.71, 39: 1916.2, 40: nan, 41: 1926.79, 42: nan, 43: 1918.66, 44: nan, 45: 1925.5, 46: 1922.22, 47: nan, 48: nan, 49: 1927.87, 50: 1923.24, 51: nan, 52: 1929.53, 53: nan, 54: nan, 55: nan, 56: nan, 57: nan, 58: nan, 59: nan, 60: nan, 61: 1918.37, 62: nan, 63: nan, 64: 1923.61, 65: nan, 66: 1917.1, 67: nan, 68: nan, 69: nan, 70: nan, 71: nan, 72: nan, 73: nan, 74: nan, 75: nan, 76: nan, 77: nan, 78: nan, 79: nan, 80: nan, 81: 1924.48, 82: nan, 83: nan, 84: 1923.03, 85: nan, 86: nan, 87: nan, 88: nan, 89: 1926.87, 90: nan, 91: nan, 92: nan, 93: 1921.79, 94: nan, 95: 1925.27, 96: nan, 97: 1919.0, 98: nan, 99: nan, 100: 1923.74, 101: nan, 102: nan, 103: nan, 104: nan, 105: 1911.61, 106: nan, 107: 1923.33, 108: nan, 109: nan, 110: nan, 111: 1912.0, 112: nan, 113: 1915.8, 114: nan, 115: 1913.05, 116: nan, 117: nan, 118: nan, 119: nan, 120: nan, 121: nan, 122: 1916.93, 123: nan, 124: 1913.69, 125: nan, 126: nan, 127: nan, 128: nan, 129: 1918.38, 130: 1913.7, 131: nan, 132: nan, 133: nan, 134: nan, 135: nan, 136: 1919.5, 137: nan, 138: 1916.14, 139: nan, 140: nan, 141: nan, 142: nan, 143: nan, 144: 1921.28, 145: nan, 146: nan, 147: nan, 148: nan, 149: nan, 150: 1915.0, 151: nan, 152: nan, 153: nan, 154: nan, 155: nan, 156: 1927.48, 157: 1889.17, 158: nan, 159: 1921.91, 160: 1917.67, 161: 1923.23, 162: nan, 163: nan, 164: nan, 165: 1909.88, 166: nan, 167: 1913.82, 168: 1902.51, 169: nan, 170: nan, 171: nan, 172: nan, 173: nan, 174: nan, 175: nan, 176: nan, 177: nan, 178: nan, 179: 1920.15}, 'C': {0: False, 1: False, 2: True, 3: False, 4: False, 5: False, 6: False, 7: False, 8: False, 9: False, 10: False, 11: False, 12: False, 13: False, 14: False, 15: False, 16: False, 17: False, 18: False, 19: False, 20: False, 21: False, 22: False, 23: False, 24: False, 25: False, 26: False, 27: True, 28: False, 29: False, 30: False, 31: False, 32: True, 33: False, 34: False, 35: False, 36: False, 37: False, 38: False, 39: False, 40: False, 41: True, 42: False, 43: False, 44: False, 45: False, 46: False, 47: False, 48: False, 49: False, 50: False, 51: False, 52: True, 53: False, 54: False, 55: False, 56: False, 57: False, 58: False, 59: False, 60: False, 61: False, 62: False, 63: False, 64: False, 65: False, 66: False, 67: False, 68: False, 69: False, 70: False, 71: False, 72: False, 73: False, 74: False, 75: False, 76: False, 77: False, 78: False, 79: False, 80: False, 81: False, 82: False, 83: False, 84: False, 85: False, 86: False, 87: False, 88: False, 89: True, 90: False, 91: False, 92: False, 93: False, 94: False, 95: False, 96: False, 97: False, 98: False, 99: False, 100: False, 101: False, 102: False, 103: False, 104: False, 105: False, 106: False, 107: True, 108: False, 109: False, 110: False, 111: False, 112: False, 113: False, 114: False, 115: False, 116: False, 117: False, 118: False, 119: False, 120: False, 121: False, 122: False, 123: False, 124: False, 125: False, 126: False, 127: False, 128: False, 129: False, 130: False, 131: False, 132: False, 133: False, 134: False, 135: False, 136: False, 137: False, 138: False, 139: False, 140: False, 141: False, 142: False, 143: False, 144: False, 145: False, 146: False, 147: False, 148: False, 149: False, 150: False, 151: False, 152: False, 153: False, 154: False, 155: False, 156: True, 157: False, 158: False, 159: False, 160: False, 161: True, 162: False, 163: False, 164: False, 165: False, 166: False, 167: False, 168: False, 169: False, 170: False, 171: False, 172: False, 173: False, 174: False, 175: False, 176: False, 177: False, 178: False, 179: False}})
df2 = df[df.C]
print(df2)
2 1944.09
27 1921.87
32 1920.08
41 1926.79
52 1929.53
89 1926.87
107 1923.33
156 1927.48
161 1923.23
Name: A, dtype: float64
df3 = df2[(df2.A > df2.A.shift(1)) & (df2.A.shift(1) > df2.A.shift(2))]
print(df3)
52 1929.53
Name: A, dtype: float64
df[FROM_2nd_PREVIOUS_TO_EVERY_ROW_OF_df3_IN_df2:TO_EVERY_ROW_IN_df3, 'B'] = True
print(df)
A B
0 NaN False
.. ... ...
31 1234.56 False
32 1920.08 True
33 1234.56 True
.. ... ...
41 1926.79 True
40 1234.56 True
.. ... ...
51 1234.56 True
52 1929.53 True
52 1234.56 False
.. ... ...
176 NaN False
177 NaN False
178 NaN False
179 1920.15 False
What is the correct way to do it?
You can achieve this by using the following steps:
df2
that satisfy the condition.df
that need to be marked as True
.True
to the specified rows in the new column.Here’s the modified code:
import pandas as pd
import numpy as np
# Assuming df is defined as provided in the question
df2 = df[df['C']]
print(df2)
df3 = df2[(df2['A'] > df2['A'].shift(1)) & (df2['A'].shift(1) > df2['A'].shift(2))]
print(df3)
# Get the indices of rows in df2 that satisfy the condition
indices = df3.index
# Create a boolean mask for the rows in df
mask = np.zeros(len(df), dtype=bool)
# Set the corresponding values in the mask to True
mask[indices - 2] = True # Subtract 2 to mark 2 previous rows
# Add a new column 'B' to df and set values based on the mask
df['B'] = mask
print(df)
This will correctly mark the rows in df
according to your specified conditions. Note that the indices - 2
part ensures that the two previous rows are marked
To mark the specified rows in the original DataFrame (df
) based on the conditions, you can use the following code:
import pandas as pd
import numpy as np
# Assuming df is defined as provided in the question
df2 = df[df['C']]
print(df2)
df3 = df2[(df2['A'] > df2['A'].shift(1)) & (df2['A'].shift(1) > df2['A'].shift(2))]
print(df3)
# Get the indices of rows in df that need to be marked as True
indices_to_mark = np.where(df['C'])[0][df3.index - 2]
# Add a new column 'B' to df and set values based on the indices
df['B'] = False
df.loc[indices_to_mark, 'B'] = True
print(df)
This code first identifies the indices of rows in df
that satisfy the condition and then marks those rows as True
in the new column ‘B’.