Using tidyr's unite on multiple columns by referring to similar column names

library(tidyr) 

library(dplyr)

library(tidyverse)

Below is the code for a simple data frame. I have some messy data in which the categories of each exported factor are spread across separate columns.

Client<-c("Client1","Client2","Client3","Client4","Client5") 

Sex_M<-c("Male","NA","Male","NA","Male")

Sex_F<-c(" ","Female"," ","Female"," ")

Satisfaction_Satisfied<-c("Satisfied"," "," ","Satisfied","Satisfied")

Satisfaction_VerySatisfied<-c(" ","VerySatisfied","VerySatisfied"," "," ")

CommunicationType_Email<-c("Email"," "," ","Email","Email")

CommunicationType_Phone<-c(" ","Phone ","Phone "," "," ")

DF<-tibble(Client,Sex_M,Sex_F,Satisfaction_Satisfied,Satisfaction_VerySatisfied,CommunicationType_Email,CommunicationType_Phone)

I would like to use tidyr's "unite" to recombine these categories into single columns.

DF<-DF%>%unite(Sat,Satisfaction_Satisfied,Satisfaction_VerySatisfied,sep=" ")%>% 

unite(Sex,Sex_M,Sex_F,sep=" ")

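However, I have to write multiple "unite" lines, which I feel violates the rule of three, so there must be a way to make this easier, especially since my real data contains dozens of columns that need to be merged. Is there a way to call "unite" once but somehow refer to matching column names, so that all similar column names (for example, "Sex_M" and "Sex_F", which contain "Sex", and "CommunicationType_Email" and "CommunicationType_Phone", which contain "CommunicationType") are combined as in the code above?

I was also thinking of a function that would let me enter the column names, but that is too difficult for me because it involves complicated standard evaluation.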

Answer:

We can use unite together with a select helper such as matches():

library(tidyverse) 

DF %>%

unite(Sat, matches("^Sat"))


For the case of multiple column groups, perhaps:

gather(DF, Var, Val, -Client, na.rm = TRUE) %>% 

separate(Var, into = c("Var1", "Var2")) %>%

group_by(Client, Var1) %>%

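# the example data marks missing values with " " and the string "NA", so drop those too

summarise(Val = paste(trimws(Val)[!(is.na(Val) | trimws(Val) %in% c("", "NA"))], collapse="_")) %>%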

spread(Var1, Val)

# Client CommunicationType Satisfaction Sex

#* <chr> <chr> <chr> <chr>

#1 Client1 Email Satisfied Male

#2 Client2 Phone VerySatisfied Female

#3 Client3 Phone VerySatisfied Male

#4 Client4 Email Satisfied Female

#5 Client5 Email Satisfied Male
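
The same long-then-wide reshape can also be written with pivot_longer() and pivot_wider(), which newer tidyr releases recommend over gather() and spread(). The following is only a sketch under some assumptions: it needs a tidyr version that accepts a single function for values_fn, it treats blank strings and the literal string "NA" as missing (as the example data appears to do), and the helper column names group and level are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question.
DF <- tibble(
  Client = c("Client1", "Client2", "Client3", "Client4", "Client5"),
  Sex_M = c("Male", "NA", "Male", "NA", "Male"),
  Sex_F = c(" ", "Female", " ", "Female", " "),
  Satisfaction_Satisfied = c("Satisfied", " ", " ", "Satisfied", "Satisfied"),
  Satisfaction_VerySatisfied = c(" ", "VerySatisfied", "VerySatisfied", " ", " "),
  CommunicationType_Email = c("Email", " ", " ", "Email", "Email"),
  CommunicationType_Phone = c(" ", "Phone ", "Phone ", " ", " ")
)

tidy <- DF %>%
  # Split each column name at the underscore: prefix -> group, suffix -> level.
  pivot_longer(-Client, names_to = c("group", "level"), names_sep = "_",
               values_to = "value") %>%
  # Assumption: " " and the string "NA" both stand for "no value".
  mutate(value = trimws(value)) %>%
  filter(!is.na(value), !value %in% c("", "NA")) %>%
  select(-level) %>%
  # One column per prefix; paste values together if a group has several.
  pivot_wider(names_from = group, values_from = value,
              values_fn = function(v) paste(v, collapse = "_"))

print(tidy)
```

Because the prefix is taken from the column names themselves, no column group has to be listed by hand, which is what the question was asking for.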

Answer:

Something like this? It works if you have many columns.

result<-with(new.env(),{ 

Client<-c("Client1","Client2","Client3","Client4","Client5")

Sex_M<-c("Male","NA","Male","NA","Male")

Sex_F<-c(" ","Female"," ","Female"," ")

Satisfaction_Satisfied<-c("Satisfied"," "," ","Satisfied","Satisfied")

Satisfaction_VerySatisfied<-c(" ","VerySatisfied","VerySatisfied"," "," ")

CommunicationType_Email<-c("Email"," "," ","Email","Email")

CommunicationType_Phone<-c(" ","Phone ","Phone "," "," ")

x<-ls()

categories<-unique(sub("(.*)_(.*)", "\\1", x))

df<-setNames(data.frame(lapply(x, function(y) get(y))), x)

for(nm in categories){

df<-unite_(df, nm, x[contains(vars = x, match = nm)])

}

return(df)

})

Client CommunicationType Satisfaction Sex

1 Client1 Email_ Satisfied_ _Male

2 Client2 _Phone _VerySatisfied Female_NA

3 Client3 _Phone _VerySatisfied _Male

4 Client4 Email_ Satisfied_ Female_NA

5 Client5 Email_ Satisfied_ _Male
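
The output above still carries stray separators and the literal "NA". A variation on the same loop idea, written without the deprecated unite_(), might look like the sketch below. It assumes a tidyr version where unite() has the na.rm argument, that every grouped column name has the form prefix_suffix, and that blank strings and the string "NA" mean "missing"; the names grouped_cols, prefixes and united are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question.
DF <- tibble(
  Client = c("Client1", "Client2", "Client3", "Client4", "Client5"),
  Sex_M = c("Male", "NA", "Male", "NA", "Male"),
  Sex_F = c(" ", "Female", " ", "Female", " "),
  Satisfaction_Satisfied = c("Satisfied", " ", " ", "Satisfied", "Satisfied"),
  Satisfaction_VerySatisfied = c(" ", "VerySatisfied", "VerySatisfied", " ", " "),
  CommunicationType_Email = c("Email", " ", " ", "Email", "Email"),
  CommunicationType_Phone = c(" ", "Phone ", "Phone ", " ", " ")
)

# Columns that belong to a group, and the distinct prefixes before the underscore.
grouped_cols <- grep("_", names(DF), value = TRUE)
prefixes <- unique(sub("_.*$", "", grouped_cols))

# Turn the placeholders into real NA so that unite(na.rm = TRUE) can drop them.
united <- DF %>%
  mutate(across(all_of(grouped_cols),
                ~ na_if(na_if(trimws(.x), ""), "NA")))

# One unite() call per prefix; !!p unquotes the string as the new column name.
for (p in prefixes) {
  united <- unite(united, !!p, starts_with(paste0(p, "_")),
                  sep = "_", na.rm = TRUE)
}

print(united)
```

The prefixes are derived from the column names, so adding more column groups to the data does not require more unite lines.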
