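Dynamically adding columns in R
I need some help finding a good way to dynamically add columns that hold the counts of different categories, where the counts have to be extracted from a string.
My data has a column containing category names and counts. A field can be empty or contain any conceivable combination of categories. Here are some examples: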
themes:firstcategory_1;secondcategory_33;thirdcategory_5
themes:secondcategory_33;fourthcategory_2
themes:fifthcategory_1
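What I need is one column per category (named after the category), filled with the count extracted from the strings above. The list of categories is dynamic, so I don't know in advance which ones are present.
How can I solve this?
Answer:
This code produces one column per category, holding the count for each row.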
library(dplyr)
library(tidyr)
library(stringr)
# Create test dataframe
df <- data.frame(themes = c("firstcategory_1;secondcategory_33;thirdcategory_5", "secondcategory_33;fourthcategory_2","fifthcategory_1"), stringsAsFactors = FALSE)
# Get the number of columns to split values into
cols <- max(str_count(df$themes,";")) + 1
# Get vector of temporary column names
cols <- paste0("col",c(1:cols))
df <- df %>%
  # Add an ID column based on row number
  mutate(ID = row_number()) %>%
  # Separate multiple categories by semicolon
  separate(col = themes, into = cols, sep = ";", fill = "right") %>%
  # Gather categories into a single column (gather_() is deprecated, so use gather() with all_of())
  gather("Column", "Value", all_of(cols)) %>%
  # Drop temporary column
  select(-Column) %>%
  # Filter out NA values
  filter(!is.na(Value)) %>%
  # Separate categories from their counts by underscore; convert = TRUE turns the counts into integers
  separate(col = Value, into = c("Category", "Count"), sep = "_", fill = "right", convert = TRUE) %>%
  # Spread categories to create a column for each category, with the count for each ID in that category
  spread(Category, Count)
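
For comparison, here is a shorter sketch of the same reshaping using the newer tidyr verbs separate_rows() and pivot_wider(), which avoids computing temporary column names up front. It is only one possible approach. It assumes that the raw strings carry the "themes:" prefix shown in the question, that category names themselves contain no underscore, and that empty fields can simply be skipped. The name `wide` is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Test data, including the "themes:" prefix from the question and one empty field
df <- data.frame(
  themes = c("themes:firstcategory_1;secondcategory_33;thirdcategory_5",
             "themes:secondcategory_33;fourthcategory_2",
             "themes:fifthcategory_1",
             ""),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # Keep track of the original row and strip the assumed "themes:" prefix
  mutate(ID = row_number(),
         themes = sub("^themes:", "", themes)) %>%
  # One row per "category_count" pair
  separate_rows(themes, sep = ";") %>%
  # Skip empty fields (the row with ID 4 therefore drops out of the result)
  filter(themes != "") %>%
  # Split the name from the count; convert = TRUE makes the count an integer
  separate(themes, into = c("Category", "Count"), sep = "_", convert = TRUE) %>%
  # One column per category; categories missing from a row become NA
  pivot_wider(names_from = Category, values_from = Count)
```

If a missing category should read as 0 rather than NA, pivot_wider() accepts values_fill = 0. Rows whose field was empty can be restored afterwards by joining `wide` back onto the full set of IDs.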