小能豆

Group by a streak of numbers plus the row after it, then check the first value of a column in each group

py

This is an extension to this post.

This is my dataframe:

import pandas as pd
df = pd.DataFrame(
    {
        'a': [ 1, 1, 1,  0, 1,  0,  1, 1,  0,  0, 1, 1,  0,  0],
        'b': [-1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, -1]
    }
)

My desired outcome, with the rows grouped, is:

    a  b
4   1  1
5   0 -1

10  1  1
11  1  1
12  0 -1

Basically, I want to group the rows by each streak of 1s in column a, plus the one row after the streak ends. This answer does that:

g = df.loc[::-1, 'a'].eq(0).cumsum()

out = [g for _,g in df.groupby(g, sort=False) if len(g)>1]

Now I want to check whether the first value in b for each group is 1.

I don't know the best approach for checking the first value of b. This is what I have tried, but I am not sure it works in every case.

groups = df.groupby(g).filter(lambda x: x.b.iloc[0] == 1)

I have had cases where code works on an example but fails under different conditions, so I want to double-check my code.


2023-12-12

1 answer

小能豆

Your approach using filter is correct. groupby aligns g with df by index label, so the rows inside each group keep their original order and x['b'].iloc[0] is the first value of b in that group. The reversal with [::-1] only affects how the group labels are computed, not the order of rows within a group. The one case worth guarding against is a single-row group (a 0 in a that does not follow a streak of 1s): it would pass the filter if its b happened to be 1. Here is your code with that length check added:

import pandas as pd

df = pd.DataFrame(
    {
        'a': [1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0],
        'b': [-1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, -1]
    }
)

# Label the groups: each streak of 1s in column 'a' plus the 0 row that ends it.
# The cumulative sum runs from the bottom up, so every 0 starts a new label.
g = df.loc[::-1, 'a'].eq(0).cumsum()

# Keep groups with more than one row whose first value in column 'b' is 1
groups = df.groupby(g).filter(lambda x: len(x) > 1 and x['b'].iloc[0] == 1)
print(groups)

For the sample data, groups contains rows 4, 5, 10, 11 and 12, which matches your desired outcome.

Note that x['b'].iloc[-1] would check the last value of b in each group instead (the row where the streak ends), so use iloc[0] for the first value.
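If the groups should stay separate, as in the desired outcome, the list comprehension from the question can carry the check on the first value of b. The following is a minimal sketch rather than a definitive implementation: the helper name select_groups and the small edge frame are assumptions made up for illustration, not part of the question. It prints each selected group's index labels so the row order inside a group can be verified, and it shows why a length check matters when a lone 0 row has b equal to 1.

```python
import pandas as pd


def select_groups(df):
    """Return streak groups (more than one row) whose first 'b' value is 1.

    Hypothetical helper, named here only for illustration.
    """
    # Label rows bottom-up so that each 0 in 'a' closes the streak above it.
    g = df.loc[::-1, 'a'].eq(0).cumsum()
    return [
        grp
        for _, grp in df.groupby(g, sort=False)
        # Rows inside grp are in original order, so iloc[0] is the first row.
        if len(grp) > 1 and grp['b'].iloc[0] == 1
    ]


df = pd.DataFrame(
    {
        'a': [1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0],
        'b': [-1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, -1],
    }
)

selected = select_groups(df)
for grp in selected:
    print(grp.index.tolist(), grp['b'].iloc[0])
# [4, 5] 1
# [10, 11, 12] 1

# Edge case (made-up data): row 0 is a lone 0 in 'a' whose 'b' is 1.
# Without the len(grp) > 1 check it would be selected as its own group.
edge = pd.DataFrame({'a': [0, 1, 0], 'b': [1, 1, -1]})
edge_selected = [grp.index.tolist() for grp in select_groups(edge)]
print(edge_selected)
# [[1, 2]]
```

Returning a list keeps each group as its own DataFrame; pd.concat(selected) would give a single frame if that is more convenient.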

2023-12-12