Comparing two columns using pandas

Z时代
2024-01-10
Category: Q&A

Using this as a starting point:

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
Out[8]: 
  one  two three
0  10  1.2   4.2
1  15   70  0.03
2   8    5     0

I want to use something like an if statement within pandas.

if df['one'] >= df['two'] and df['one'] <= df['three']:
    df['que'] = df['one']

Basically, check each row with the if statement and create a new column.

The docs say to use .all, but there is no example...

Answer:

You can use np.where. If cond is a boolean array, and A and B are arrays, then

C = np.where(cond, A, B)

defines C to be equal to A where cond is True, and to B where cond is False.
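As a quick illustration of that rule, here is a minimal sketch with three-element arrays. The values in cond, A, and B are made up for the example and do not come from the question.

```python
import numpy as np

# Made-up inputs, purely for illustration
cond = np.array([True, False, True])
A = np.array([1, 2, 3])
B = np.array([10, 20, 30])

# Take the element from A where cond is True, otherwise from B
C = np.where(cond, A, B)
print(C.tolist())  # [1, 20, 3]
```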

import numpy as np
import pandas as pd
a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
df['que'] = np.where((df['one'] >= df['two']) & (df['one'] <= df['three'])
                     , df['one'], np.nan)

yields

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03  NaN
2   8    5     0  NaN

If you have more than one condition, then you could use np.select instead. For example, if you wish df['que'] to equal df['two'] when df['one'] < df['two'], then

conditions = [
    (df['one'] >= df['two']) & (df['one'] <= df['three']), 
    df['one'] < df['two']]
choices = [df['one'], df['two']]
df['que'] = np.select(conditions, choices, default=np.nan)

yields

  one  two three  que
0  10  1.2   4.2   10
1  15   70  0.03   70
2   8    5     0  NaN

If we can assume that df['one'] >= df['two'] whenever df['one'] < df['two'] is False, then the conditions and choices can be simplified to

conditions = [
    df['one'] < df['two'],
    df['one'] <= df['three']]
choices = [df['two'], df['one']]

(The assumption may not hold if df['one'] or df['two'] contains NaNs.)
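The NaN caveat can be checked with a small sketch. It relies on np.select using the first condition that is True for each row, which is why the simplified form can drop the df['one'] >= df['two'] test. The data below is hypothetical (numeric, with one NaN added to 'two'), not the frame from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data; the last row has NaN in 'two'
df = pd.DataFrame({
    'one':   [3.0, 15.0, 8.0, 1.0],
    'two':   [1.2, 70.0, 5.0, np.nan],
    'three': [4.2, 0.03, 0.0, 4.2],
})

# Full conditions, as written in the first np.select example
full = np.select(
    [(df['one'] >= df['two']) & (df['one'] <= df['three']),
     df['one'] < df['two']],
    [df['one'], df['two']],
    default=np.nan)

# Simplified conditions; the first True condition wins for each row
simple = np.select(
    [df['one'] < df['two'], df['one'] <= df['three']],
    [df['two'], df['one']],
    default=np.nan)

print(full)    # rows 0-2 agree with `simple`; row 3 is NaN
print(simple)  # row 3 is 1.0: 1.0 < NaN is False, but 1.0 <= 4.2 is True
```

On the rows without NaN the two versions agree. They differ only on the last row, where both 1.0 >= NaN and 1.0 < NaN evaluate to False.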

Note that

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

defines a DataFrame with string values. Since they look numeric, you might be better off converting those strings to floats:

df2 = df.astype(float)

This changes the results, however, since strings are compared character by character, while floats are compared numerically.

In [61]: '10' <= '4.2'
Out[61]: True
In [62]: 10 <= 4.2
Out[62]: False
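To see the effect on the example itself, here is a sketch that reruns the np.where step on the converted frame. The mask variable is a name introduced here for readability. With numeric comparison no row satisfies both conditions (for instance, 10 <= 4.2 is False), so the new column should come out as NaN in every row, unlike the string-based result.

```python
import numpy as np
import pandas as pd

a = [['10', '1.2', '4.2'], ['15', '70', '0.03'], ['8', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])

# Convert the numeric-looking strings to floats
df2 = df.astype(float)

# Same test as before, now evaluated numerically
mask = (df2['one'] >= df2['two']) & (df2['one'] <= df2['three'])
df2['que'] = np.where(mask, df2['one'], np.nan)
print(df2)
```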

Source: utcz.com/qa/435141.html
