一尘不染

How to filter a Pandas DataFrame using 'in' and 'not in' as in SQL

python pandas

How can I achieve the equivalents of SQL's IN and NOT IN?

I have a list with the required values. Here's the scenario:

import pandas as pd

df = pd.DataFrame({'countries':['US','UK','Germany','China']})
countries = ['UK','China']

# pseudo-code:
df[df['countries'] not in countries]

My current way of doing this is as follows:

df = pd.DataFrame({'countries':['US','UK','Germany','China']})
countries = pd.DataFrame({'countries':['UK','China'], 'matched':True})

# IN
df.merge(countries,how='inner',on='countries')

# NOT IN
not_in = df.merge(countries,how='left',on='countries')
not_in = not_in[pd.isnull(not_in['matched'])]

But this seems like a horrible kludge. Can anyone improve on it?


2020-02-04

1 answer

一尘不染

You can use pd.Series.isin.

For "IN", use: something.isin(somewhere)

Or for "NOT IN": ~something.isin(somewhere)

As a worked example:

>>> df
  countries
0        US
1        UK
2   Germany
3     China
>>> countries
['UK', 'China']
>>> df.countries.isin(countries)
0    False
1     True
2    False
3     True
Name: countries, dtype: bool
>>> df[df.countries.isin(countries)]
  countries
1        UK
3     China
>>> df[~df.countries.isin(countries)]
  countries
0        US
2   Germany
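
For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch of the same two filters. It reuses the `df` and `countries` names from the question; the result names (`in_rows`, `not_in_rows`, and the `_q` variants) are only illustrative. The last two lines show what should be an equivalent spelling with `DataFrame.query`, which accepts `in` and `not in` and uses `@` to refer to a local Python variable. That part is an optional alternative, not something the answer above relies on.

```python
import pandas as pd

df = pd.DataFrame({'countries': ['US', 'UK', 'Germany', 'China']})
countries = ['UK', 'China']

# IN: keep rows whose value appears in the list
in_rows = df[df['countries'].isin(countries)]

# NOT IN: invert the boolean mask with ~
not_in_rows = df[~df['countries'].isin(countries)]

# Alternative spelling with DataFrame.query;
# "countries" is the column, "@countries" is the local list
in_rows_q = df.query('countries in @countries')
not_in_rows_q = df.query('countries not in @countries')

print(in_rows)
print(not_in_rows)
```

Both forms keep the original index labels, so call `reset_index(drop=True)` on the result if you need a fresh 0-based index.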
2020-02-04