Dynamically adding columns in R

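I need help finding a good way to dynamically add columns that hold the counts of different categories, where both the category names and the counts have to be extracted from a string.

My data has a column that contains category names and counts. A field can be empty or contain any combination of categories. Here are some examples: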

themes:firstcategory_1;secondcategory_33;thirdcategory_5 

themes:secondcategory_33;fourthcategory_2

themes:fifthcategory_1

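What I need is one column per category (named after the category), filled with the count extracted from the strings above. The list of categories is dynamic, so I don't know in advance which ones are present.

How can I solve this?

Answer:

This code creates one column per category and fills in the count for each row.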

library(dplyr)
library(tidyr)
library(stringr)

# Create test data frame
df <- data.frame(
  themes = c("firstcategory_1;secondcategory_33;thirdcategory_5",
             "secondcategory_33;fourthcategory_2",
             "fifthcategory_1"),
  stringsAsFactors = FALSE
)

# Get the number of columns to split values into
n_cols <- max(str_count(df$themes, ";")) + 1

# Get vector of temporary column names
cols <- paste0("col", seq_len(n_cols))

df <- df %>%
  # Add an ID column based on row number
  mutate(ID = row_number()) %>%
  # Separate multiple categories by semicolon
  separate(col = themes, into = cols, sep = ";", fill = "right") %>%
  # Gather categories into a single column
  # (pivot_longer() is used here in place of the deprecated gather_())
  pivot_longer(cols = all_of(cols), names_to = "Column", values_to = "Value") %>%
  # Drop temporary column
  select(-Column) %>%
  # Filter out NA values
  filter(!is.na(Value)) %>%
  # Separate categories from their counts by underscore
  separate(col = Value, into = c("Category", "Count"), sep = "_", fill = "right") %>%
  # Spread categories to create a column for each category, with the count
  # for each ID in that category (pivot_wider() is used in place of spread())
  pivot_wider(names_from = Category, values_from = Count)
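
The example strings in the question start with a themes: prefix and the fields may be empty, which the test data above does not cover, and the counts above come out as text rather than numbers. Below is a minimal sketch of a variant that handles those cases, assuming the prefix is always literally "themes:" and that category names never contain an underscore. The sample data, including the empty row, is made up for illustration. It uses separate_rows() from tidyr, so the number of temporary columns does not have to be worked out first.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical sample data: "themes:" prefix plus one empty field
df <- data.frame(
  themes = c(
    "themes:firstcategory_1;secondcategory_33;thirdcategory_5",
    "themes:secondcategory_33;fourthcategory_2",
    "",
    "themes:fifthcategory_1"
  ),
  stringsAsFactors = FALSE
)

wide <- df %>%
  # Keep track of the original row and strip the assumed "themes:" prefix
  mutate(ID = row_number(),
         themes = str_remove(themes, "^themes:")) %>%
  # Drop empty (or NA) fields before splitting
  filter(themes != "") %>%
  # One row per "category_count" pair
  separate_rows(themes, sep = ";") %>%
  # Split the name from the count; convert = TRUE turns the counts into integers
  separate(themes, into = c("Category", "Count"), sep = "_", convert = TRUE) %>%
  # One column per category, one row per original ID
  pivot_wider(names_from = Category, values_from = Count)

print(wide)
```

Rows whose field was empty do not appear in the result at all, so join it back to the original data by ID if they need to be kept. If a missing category should read as 0 rather than NA, pivot_wider() also takes a values_fill argument, for example values_fill = 0.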

Source: utcz.com/qa/264595.html
