Python Pandas - Remove rows from a DataFrame based on a previously obtained subset

Z时代
2024-01-10
Category: Q&A

I'm running Python 2.7 with the Pandas 0.11.0 library installed.

I've searched around but haven't found an answer to this, so I'm hoping someone with more experience than me has a solution.

Let's say my data in df1 looks like this:

df1=

zip  x  y  access
123  1  1       4
123  1  1       6
133  1  2       3
145  2  2       3
167  3  1       1
167  3  1       2
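To experiment with the answers below, the sample df1 can be rebuilt as a DataFrame. This is a minimal sketch that assumes every column holds plain integers and that df1 has the default 0..5 index.

```python
import pandas as pd

# Sample data from the question; every column is assumed to hold integers
df1 = pd.DataFrame(
    {
        'zip':    [123, 123, 133, 145, 167, 167],
        'x':      [1, 1, 1, 2, 3, 3],
        'y':      [1, 1, 2, 2, 1, 1],
        'access': [4, 6, 3, 3, 1, 2],
    },
    columns=['zip', 'x', 'y', 'access'],  # fix the column order explicitly
)
print(df1)
```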

For example, using df2 = df1[df1['zip'] == 123] and then df2 = df2.join(df1[df1['zip'] == 133]), I obtain the following subset of the data:

df2=

zip  x  y  access
123  1  1       4
123  1  1       6
133  1  2       3
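Note that DataFrame.join aligns on the index and adds columns side by side (it raises a ValueError when column names overlap and no lsuffix/rsuffix is given), so stacking two row subsets like this is normally done with pd.concat, or with a single combined mask. A minimal sketch, rebuilding df1 from the sample data:

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        'zip':    [123, 123, 133, 145, 167, 167],
        'x':      [1, 1, 1, 2, 3, 3],
        'y':      [1, 1, 2, 2, 1, 1],
        'access': [4, 6, 3, 3, 1, 2],
    },
    columns=['zip', 'x', 'y', 'access'],
)

# Stack the two row subsets vertically; df1's index labels are kept
df2 = pd.concat([df1[df1['zip'] == 123], df1[df1['zip'] == 133]])

# Equivalent single selection using one combined boolean mask
df2_alt = df1[(df1['zip'] == 123) | (df1['zip'] == 133)]
print(df2)
```

Keeping df1's index labels in df2 matters later: it is what lets you identify which rows of df1 the subset came from.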

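What I want to do is either:

1) delete from df1 the rows that were selected/joined into df2

or

2) after df2 has been created, delete from df1 the rows (the difference?) that df2 is made of.

Hopefully all of this makes sense. Please let me know if more information is needed.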

EDIT:

Ideally, a third dataframe would be created that looks like this:

df3=

zip  x  y  access
145  2  2       3
167  3  1       1
167  3  1       2

That is, everything in df1 that is not in df2. Thanks!
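Because df2 is selected from df1, its rows keep df1's index labels, so option 2 can be sketched by dropping those labels from df1. This assumes df1's index is unique; with duplicate labels, drop would remove every row that shares a label. Index.isin is used as in current pandas versions.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        'zip':    [123, 123, 133, 145, 167, 167],
        'x':      [1, 1, 1, 2, 3, 3],
        'y':      [1, 1, 2, 2, 1, 1],
        'access': [4, 6, 3, 3, 1, 2],
    },
    columns=['zip', 'x', 'y', 'access'],
)

# Subset taken from df1, so it carries df1's index labels (0, 1, 2 here)
df2 = df1[df1['zip'].isin([123, 133])]

# Drop every row of df1 whose index label appears in df2
df3 = df1.drop(df2.index)

# Equivalent mask form: keep rows whose label is NOT in df2's index
df3_mask = df1[~df1.index.isin(df2.index)]
print(df3)
```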

Answer:

Two options come to mind. First, use isin and a boolean mask:

>>> df
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
>>> keep = [123, 133]
>>> df_yes = df[df['zip'].isin(keep)]
>>> df_no = df[~df['zip'].isin(keep)]
>>> df_yes
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
>>> df_no
   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
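The isin mask filters on a single key column. If the rows to remove are only known by their values (for example, they come from another frame that does not share df's index), one common alternative is a left merge with indicator=True, keeping only the rows marked 'left_only'. This is a sketch assuming pandas 0.17 or later (where the indicator argument exists); to_remove is a hypothetical frame built for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'zip':    [123, 123, 133, 145, 167, 167],
        'x':      [1, 1, 1, 2, 3, 3],
        'y':      [1, 1, 2, 2, 1, 1],
        'access': [4, 6, 3, 3, 1, 2],
    },
    columns=['zip', 'x', 'y', 'access'],
)

# Hypothetical rows to remove, known only by value; its index labels
# (10, 11, 12) deliberately do not match df's labels
to_remove = pd.DataFrame(
    {'zip': [123, 123, 133], 'x': [1, 1, 1], 'y': [1, 1, 2], 'access': [4, 6, 3]},
    columns=['zip', 'x', 'y', 'access'],
    index=[10, 11, 12],
)

# Left-merge on all shared columns; drop_duplicates keeps the merge from
# multiplying rows, and '_merge' is 'left_only' where df has no match
merged = df.merge(to_remove.drop_duplicates(), how='left', indicator=True)
df_no = merged[merged['_merge'] == 'left_only'].drop('_merge', axis=1)
print(df_no)
```

Keep in mind that merge returns a new default index, so df's original index labels are not preserved in general.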

Second, use groupby:

>>> grouped = df.groupby(df['zip'].isin(keep))

and then any of the following:

>>> grouped.get_group(True)
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
>>> grouped.get_group(False)
   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
>>> [g for k,g in list(grouped)]
[   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2,    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3]
>>> dict(list(grouped))
{False:    zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2, True:    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3}
>>> dict(list(grouped)).values()
[   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2,    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3]
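The session above is Python 2.7. On Python 3, dict.values() returns a view rather than a list, so looking the groups up by key is clearer. A minimal sketch under that assumption, reusing the same keep list; parts is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame(
    {
        'zip':    [123, 123, 133, 145, 167, 167],
        'x':      [1, 1, 1, 2, 3, 3],
        'y':      [1, 1, 2, 2, 1, 1],
        'access': [4, 6, 3, 3, 1, 2],
    },
    columns=['zip', 'x', 'y', 'access'],
)
keep = [123, 133]

# Group on a boolean key: True for zips in keep, False for all others
grouped = df.groupby(df['zip'].isin(keep))

# Collect the groups by key instead of relying on dict.values() order
parts = {key: frame for key, frame in grouped}
df_yes = parts[True]
df_no = parts[False]

# Together the two parts cover every row of df exactly once
assert len(df_yes) + len(df_no) == len(df)
print(df_no)
```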