Optimization: value replacement in a data frame with multiple conditions

I have a data frame similar to this sample. Based on the information in two columns, I want to classify the items by size and color:

df <- structure(list(Ball = structure(c(5L, 3L, 2L, 4L, 1L, 3L), .Label = c("blue", "blue is my favourite", "red", "red ", "red ball"), class = "factor"), size = c(1.2, 2, 3, 10, 12, 100)), .Names = c("Ball", "size"), class = "data.frame", row.names = c(NA, -6L)) 

The output should look like this:

structure(list(Ball = structure(c(5L, 3L, 2L, 4L, 1L, 3L), .Label = c("blue", "blue is my favourite", "red", "red ", "red ball"), class = "factor"), size = c(1.2, 2, 3, 10, 12, 100), Class = c("small red ball", "small red ball", "small blue ball", "medium red ball", "medium blue ball", "big red ball")), row.names = c(NA, -6L), .Names = c("Ball", "size", "Class"), class = "data.frame") 

I have working code, but it is long and messy, and I am sure there is a more concise way to get the output I want.

So what did I do?

I started by selecting the items of the first category and setting the df$Class value of the selected rows:

df["Class"] <- NA #add new column 

df[grepl("red", df$Ball) & df$size <10, ]$Class <- "small red ball"

Because my grepl selection is sometimes empty, I added an if (length() > 0) condition:

if (length(df[grepl("red", df$Ball) & df$size <10, ]$Class) > 0) {df[grepl("red", df$Ball) & df$size <10, ]$Class <- "small red ball"} 
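The `length()` check is only needed because the assignment goes through a row subset of the whole data frame, which fails when no row is selected. Assigning into the column vector itself with a logical index simply does nothing when every element is `FALSE`, so the guard can be dropped. A minimal sketch on the sample data (the variable name `rows` and the color "green" are illustrative only):

```r
# Sample data from the question (plain character column instead of a factor)
df <- data.frame(
  Ball = c("red ball", "red", "blue is my favourite", "red ", "blue", "red"),
  size = c(1.2, 2, 3, 10, 12, 100)
)
df$Class <- NA_character_  # add the new, empty column

# No row contains "green", so this logical index is all FALSE:
# the assignment below changes nothing and raises no error
rows <- grepl("green", df$Ball) & df$size < 10
df$Class[rows] <- "small green ball"

# Normal case: only the matching rows are overwritten
rows <- grepl("red", df$Ball) & df$size < 10
df$Class[rows] <- "small red ball"
df$Class
```

The same `df$Class[rows] <- ...` form works inside the loop below, which removes every `if (length(...) > 0)` wrapper.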

Finally, I combined all my selections in a loop:

df["Class"] <- NA #add new column 

z <- c("red", "blue")

for (i in z) {
  if (length(df[grepl(i, df$Ball) & df$size < 10, ]$Class) > 0) {df[grepl(i, df$Ball) & df$size < 10, ]$Class <- paste("small", i, "ball", sep = " ")}
  if (length(df[grepl(i, df$Ball) & df$size >= 10 & df$size < 100, ]$Class) > 0) {df[grepl(i, df$Ball) & df$size >= 10 & df$size < 100, ]$Class <- paste("medium", i, "ball", sep = " ")}
  if (length(df[grepl(i, df$Ball) & df$size >= 100, ]$Class) > 0) {df[grepl(i, df$Ball) & df$size >= 100, ]$Class <- paste("big", i, "ball", sep = " ")}
}

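It works for two colors and three size categories, but my original data frame is much larger. So (because the code looks very messy) my question is: how can I simplify my code?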

Answer:

We can use cut to create the size categories and str_extract to pull the color out of Ball, then paste the pieces together:

library(stringr) 

# cut() with the default right = TRUE builds the intervals (1,9], (9,99], (99,Inf]
df$Class <- with(df, paste(as.character(cut(size, breaks = c(1, 9, 99, Inf),
                                             labels = c('small', 'medium', 'big'))),
                           str_extract(Ball, 'red|blue'), 'ball'))

df$Class

#[1] "small red ball" "small red ball" "small blue ball"

#[4] "medium red ball" "medium blue ball" "big red ball"
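With the default `right = TRUE`, these breaks produce the intervals (1, 9], (9, 99] and (99, Inf]. That agrees with the question's rules (below 10, 10 up to below 100, 100 and above) for the sample sizes, but not in general: a size of 9.5 becomes "medium" and anything at or below 1 becomes `NA`. A quick check with a few made-up boundary sizes, comparing against breaks at 10 and 100 with `right = FALSE`:

```r
sizes <- c(0.5, 9.5, 10, 99.5, 100)  # hypothetical boundary values
labs <- c("small", "medium", "big")

# Breaks from the answer above: (1,9], (9,99], (99,Inf]
a <- cut(sizes, breaks = c(1, 9, 99, Inf), labels = labs)

# Breaks matching the question's rules: [-Inf,10), [10,100), [100,Inf)
b <- cut(sizes, breaks = c(-Inf, 10, 100, Inf), labels = labs, right = FALSE)

as.character(a)  # 0.5 is NA, 9.5 is "medium", 99.5 is "big"
as.character(b)  # 9.5 is "small", 99.5 is "medium"
```

Using `-Inf` as the lowest break also avoids `NA` for very small sizes.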

Answer:

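Grouping on "size" and pasting the result together with the value extracted from "Ball" looks like a great use case for the dplyr and stringr packages: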

library(stringr) 

library(dplyr)

df <- structure(list(Ball = structure(c(5L, 3L, 2L, 4L, 1L, 3L), .Label = c("blue", "blue is my favourite", "red", "red ", "red ball"), class = "factor"), size = c(1.2, 2, 3, 10, 12, 100)), .Names = c("Ball", "size"), class = "data.frame", row.names = c(NA, -6L))

df %>%
  mutate(
    color = str_extract(`Ball`, "(red)|(blue)"),
    size_category = case_when(
      size < 10 ~ "small",
      size >= 10 & size < 100 ~ "medium",
      size >= 100 ~ "big"  # "big" matches the desired Class values
    ),
    category = str_c(size_category, color, "ball", sep = " ")
  )
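`case_when()` takes, for each row, the first condition that is `TRUE`, so the lower bounds in the later conditions are redundant and a final `TRUE ~` acts as a catch-all. A sketch of that shorter form, writing the result to the `Class` column the question asks for (an assumption; the answer above names it `category`):

```r
library(dplyr)
library(stringr)

# Sample data from the question (plain character column instead of a factor)
df <- data.frame(
  Ball = c("red ball", "red", "blue is my favourite", "red ", "blue", "red"),
  size = c(1.2, 2, 3, 10, 12, 100)
)

df <- df %>%
  mutate(
    # The first TRUE condition wins, so "medium" only sees sizes >= 10
    size_category = case_when(
      size < 10  ~ "small",
      size < 100 ~ "medium",
      TRUE       ~ "big"
    ),
    Class = str_c(size_category, str_extract(Ball, "red|blue"), "ball", sep = " ")
  )

df$Class
```

One caveat of the catch-all: a missing `size` also falls through to "big", so an `is.na(size) ~ NA_character_` line placed first would be needed if the data has gaps.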

Answer:

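This answer is very similar to @akrun's, but it can handle many more colors (here I used the colors() palette, but you could also use another one). I also changed the arguments of the cut function slightly.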

library(stringr)

# right = FALSE gives the intervals [0,10), [10,100), [100,Inf)
size <- cut(df$size, c(0, 10, 100, Inf), labels = c("small", "medium", "big"), right = FALSE)

colors <- str_extract(df$Ball, paste(colors(), collapse = "|"))

df$Class <- paste(size, colors, "ball", sep = " ")
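Because `colors()` contains short names such as "tan", a plain alternation can also match inside an unrelated word. `str_extract()` returns the leftmost match, so in "a tangy red ball" it finds "tan" before "red". Wrapping the alternation in word boundaries (`\b`, written `"\\b"` in an R string) restricts matches to whole words. A small check with made-up strings:

```r
library(stringr)

balls <- c("red ball", "blue is my favourite", "a tangy red ball")  # made-up examples

# Plain alternation: the leftmost substring match wins, so "tan" is found inside "tangy"
plain <- paste(colors(), collapse = "|")
str_extract(balls, plain)

# Word boundaries around the whole alternation only accept complete words
bounded <- paste0("\\b(", paste(colors(), collapse = "|"), ")\\b")
str_extract(balls, bounded)
```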

> df
                  Ball  size            Class
1             red ball   1.2   small red ball
2                  red   2.0   small red ball
3 blue is my favourite   3.0  small blue ball
4                 red   10.0  medium red ball
5                 blue  12.0 medium blue ball
6                  red 100.0     big red ball

To make it a bit more general, you can also allow capital letters with:

colors<- str_extract(df$Ball, regex(paste(colors(), collapse="|"), ignore_case=T)) 

So if df$Ball[1] were "Red ball", the line above would give:

colors 

#[1] "Red" "red" "blue" "red" "blue" "red"

df$Class<- paste(size, tolower(colors), "ball", sep = " ")

df$Class

#[1] "small red ball" "small red ball" "small blue ball" "medium red ball" "medium blue ball"

#[6] "big red ball"

That is everything for "Optimization: value replacement in a data frame with multiple conditions". Source: utcz.com/qa/267058.html
