This is an extension to this post.
This is my dataframe:
    import pandas as pd

    df = pd.DataFrame(
        {
            'a': [ 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0],
            'b': [-1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, -1]
        }
    )
My desired outcome, after grouping the rows, is:
        a  b
    4   1  1
    5   0 -1
    10  1  1
    11  1  1
    12  0 -1
Basically, I want to group the rows by each streak of 1s in column a, together with the one row right after the streak ends. This answer does that:
    g = df.loc[::-1, 'a'].eq(0).cumsum()
    out = [g for _,g in df.groupby(g, sort=False) if len(g)>1]
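To see what the grouping key g actually contains, here is a small sketch that attaches the labels to the frame. The column name grp and the variable labelled are assumptions used only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': [1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0],
        'b': [-1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, -1]
    }
)

# Cumulative count of zeros, computed from the bottom of column 'a' upwards,
# so every 0 gets the same label as the streak of 1s directly before it.
g = df.loc[::-1, 'a'].eq(0).cumsum()

# assign() aligns g on the index, so the reversed order of g does not matter.
labelled = df.assign(grp=g)
print(labelled)
# grp labels: rows 0-3 -> 6, 4-5 -> 5, 6-8 -> 4, 9 -> 3, 10-12 -> 2, 13 -> 1
```

Each label covers one streak of 1s plus the 0 that closes it. A 0 with no streak before it (rows 9 and 13 here) ends up in a group of its own, which is why the list comprehension drops groups of length 1.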
But now I also want to check whether the first value in b for each group is 1.
I don’t know the best way to check the first value of b. This is what I have tried, but I am not sure it works in every case.
    groups = df.groupby(g).filter(lambda x: x.b.iloc[0] == 1)
I have had code that works on one example but fails on data with different conditions, so I want to double-check this.
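One way to double-check an expression like this is to compare it with a deliberately simple loop on many random inputs. The sketch below does that. The helper names filter_first_b and reference, the random sizes, and the seed are all assumptions made for illustration, and the loop encodes my reading of the rule (a group ends at each 0 in a, and a trailing streak with no closing 0 is also a group).

```python
import numpy as np
import pandas as pd


def filter_first_b(df):
    """The groupby/filter approach from the question."""
    g = df.loc[::-1, 'a'].eq(0).cumsum()
    return df.groupby(g).filter(lambda x: x.b.iloc[0] == 1)


def reference(df):
    """Plain loop: close a group at every 0 in 'a', keep it if its first 'b' is 1."""
    keep, current = [], []
    for idx, a in df['a'].items():
        current.append(idx)
        if a == 0:
            if df.at[current[0], 'b'] == 1:
                keep.extend(current)
            current = []
    # A trailing streak of 1s with no closing 0 still forms a group.
    if current and df.at[current[0], 'b'] == 1:
        keep.extend(current)
    return df.loc[keep]


rng = np.random.default_rng(0)
mismatches = 0
for _ in range(200):
    n = int(rng.integers(1, 30))
    test = pd.DataFrame({'a': rng.integers(0, 2, n), 'b': rng.choice([-1, 1], n)})
    if filter_first_b(test).index.tolist() != reference(test).index.tolist():
        mismatches += 1

print('mismatches:', mismatches)
```

Note that both versions keep a one-row group whose only b value is 1; if such groups should be excluded, the length condition has to be added to both.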
Your approach using filter is reasonable. Here’s a complete version of your code:
    import pandas as pd

    df = pd.DataFrame(
        {
            'a': [1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0],
            'b': [-1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, -1]
        }
    )

    # Label each streak of 1s in column 'a' together with the 0 row that ends it
    g = df.loc[::-1, 'a'].eq(0).cumsum()

    # Keep only the groups whose first value in column 'b' is 1
    groups = df.groupby(g).filter(lambda x: x['b'].iloc[0] == 1)
    print(groups)

Here groups contains only the rows of groups whose first value in column ‘b’ is 1. The [::-1] reversal is used only to build the labels in g. groupby aligns g with df by index, so the rows inside each group keep their original order and .iloc[0] really is the first row of the group, while .iloc[-1] would be the last one (the 0 row that closes the streak).
Remember that the condition x['b'].iloc[0] == 1 looks only at that first value. A group made of a single row whose ‘b’ is 1 would also pass, so if such groups should be dropped, as in the earlier answer, use len(x) > 1 and x['b'].iloc[0] == 1 instead.
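If calling a Python lambda per group becomes slow on a large frame, one possible alternative is to broadcast each group's first b value and size back to the rows with transform and build a boolean mask. This is a sketch under the assumption that b has no missing values (the 'first' aggregation skips NaN); the names first_b, size, and out are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': [1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0],
        'b': [-1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, -1]
    }
)

g = df.loc[::-1, 'a'].eq(0).cumsum()
grouped = df.groupby(g)['b']

# Broadcast per-group values back onto every row of the group.
first_b = grouped.transform('first')  # first 'b' of the row's group
size = grouped.transform('size')      # number of rows in the row's group

# Keep rows from groups that start with b == 1 and have more than one row.
out = df[first_b.eq(1) & size.gt(1)]
print(out)
# keeps index 4, 5, 10, 11, 12
```

On this data the result matches the desired outcome from the question. The size condition is the len > 1 check from the original grouping answer; drop it if single-row groups should be kept.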