R - How do I create a new table by adding a new column whose values are extracted from the column names?

Z时代
2024-01-10
Category: Q&A

I have a data.frame with the following column names:

Machine1.workingTime, Machine2.workingTime, Machine3.workingTime, Machine1.producedItems, Machine2.producedItems, ...

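This data frame can grow over time as more machines are added. I need an R script that produces the following layout:

workingTime, producedItems, MachineNum

Here MachineNum is the number of the column the data came from. For example, if I take the Machine2.workingTime column and append it to the newly created "workingTime" column, "MachineNum" will be 2.

So I have to go through the whole data.frame, merge the columns under the trailing part of their old name (e.g. workingTime), and extract MachineNum from the first part of the original column name.

I have tried and searched for the last few hours, but could not find any solution.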

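Answer:

I think (hope) this is roughly what you are looking for. I know my answer is not the most concise, and I look forward to seeing other, cleaner answers.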

library(data.table) # provides melt() and a merge() method; other packages have similar functions
# Dummy data.frame: 800 days, 3 machines, working time (wt) and produced items (pi).
# Values are random and dates depend on the run date, so your output will differ.
df <- data.frame(date = Sys.Date() - 800:1,
                 matrix(sample(0:10000, 2400), ncol = 6))
colnames(df)[-1] <- paste0("m", 1:3, c(rep(".wt", 3), rep(".pi", 3)))
head(df)
        date m1.wt m2.wt m3.wt m1.pi m2.pi m3.pi
1 2015-09-24  6271  2491  6525  6680  7708  2949
2 2015-09-25  1173  5794  5616  7402  3274  8997
3 2015-09-26   516  6659  2144  8739  7168  1704
4 2015-09-27   583  2499  4768  9501  2710  6800
5 2015-09-28  2433  8622  6492  7124  4127   233
6 2015-09-29  3409   662  6952  3824  5755  9479
# First bring the working-time columns (selected by regex) into long form.
# as.data.table() makes sure data.table's own melt() method is used.
df_wt <- melt(as.data.table(df[, c("date", grep("wt$", colnames(df), value = TRUE))]),
              id.vars = "date",
              value.name = "workingTime",
              variable.name = "MachineNum")
# Keep only the digits of the machine number (also works for m10, m11, ...)
df_wt$MachineNum <- sub("^m([0-9]+)\\..*$", "\\1", df_wt$MachineNum)
head(df_wt)
        date MachineNum workingTime
1 2015-09-24          1        6271
2 2015-09-25          1        1173
3 2015-09-26          1         516
4 2015-09-27          1         583
5 2015-09-28          1        2433
6 2015-09-29          1        3409
# Same for the produced-items columns
df_pi <- melt(as.data.table(df[, c("date", grep("pi$", colnames(df), value = TRUE))]),
              id.vars = "date",
              value.name = "producedItems",
              variable.name = "MachineNum")
# Keep only the digits of the machine number (also works for m10, m11, ...)
df_pi$MachineNum <- sub("^m([0-9]+)\\..*$", "\\1", df_pi$MachineNum)
head(df_pi)
        date MachineNum producedItems
1 2015-09-24          1          6680
2 2015-09-25          1          7402
3 2015-09-26          1          8739
4 2015-09-27          1          9501
5 2015-09-28          1          7124
6 2015-09-29          1          3824
# Now merge both long tables on their common columns (date, MachineNum)
df_long <- merge(df_wt, df_pi)
head(df_long)
        date MachineNum workingTime producedItems
1 2015-09-24          1        6271          6680
2 2015-09-24          2        2491          7708
3 2015-09-24          3        6525          2949
4 2015-09-25          1        1173          7402
5 2015-09-25          2        5794          3274
6 2015-09-25          3        5616          8997
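
For comparison, the same reshaping can be sketched in base R without any packages. The helper name `wide_to_long` and the toy data are made up for illustration. The sketch assumes every measure column is named `<prefix><number>.<measure>` and that every machine has the same set of measures.

```r
# Hypothetical helper: stack per-machine columns into long form.
# Assumes column names look like "<prefix><number>.<measure>",
# e.g. "Machine12.workingTime", and that all machines share the same measures.
wide_to_long <- function(df, id_col = "date") {
  measure_cols <- setdiff(names(df), id_col)
  # Digits between the alphabetic prefix and the first dot -> machine number
  machine <- as.integer(sub("^[A-Za-z]+([0-9]+)\\..*$", "\\1", measure_cols))
  # Everything after the first dot -> measure name
  measure <- sub("^[^.]+\\.", "", measure_cols)

  pieces <- lapply(sort(unique(machine)), function(m) {
    keep <- machine == m
    out <- df[, c(id_col, measure_cols[keep]), drop = FALSE]
    names(out) <- c(id_col, measure[keep])
    out$MachineNum <- m
    out
  })
  # rbind() on data frames matches columns by name
  do.call(rbind, pieces)
}

# Toy data (illustrative values only)
wide <- data.frame(
  date = as.Date("2017-01-01") + 0:1,
  Machine1.workingTime = c(1, 2),
  Machine2.workingTime = c(21, 22),
  Machine1.producedItems = c(101, 102),
  Machine2.producedItems = c(201, 202)
)
long <- wide_to_long(wide)
print(long)
```

Because the machine number is read from the name rather than from the column position, this keeps working when more machines are added later, as long as the naming scheme stays the same.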

回答：

Here is an approach using the reshape2 package.

machine1.workingTime <- 1:10
machine2.workingTime <- 21:30
machine1.producedItems <- 101:110
machine2.producedItems <- 201:210
date <- c("2017-01-01","2017-01-02","2017-01-03","2017-01-04","2017-01-05","2017-01-06",
          "2017-01-07","2017-01-08","2017-01-09","2017-01-10")
theData <- data.frame(date,
                      machine1.producedItems,
                      machine1.workingTime,
                      machine2.producedItems,
                      machine2.workingTime)
library(reshape2)
# columns 2 to 5 are the measures; date stays as the identifier
meltedData <- melt(theData, measure.vars = 2:5)
meltedData$variable <- as.character(meltedData$variable)
# now, extract machine numbers and variable names
variableNames <- strsplit(meltedData$variable, "[.]")
# the token after the "." is the variable name
meltedData$columnName <- unlist(lapply(variableNames, function(x) x[2]))
# since all variables start with the word 'machine', characters 8 onward are the ID
meltedData$machineId <- as.numeric(unlist(lapply(variableNames, function(x) substr(x[1], 8, nchar(x[1])))))
theResult <- dcast(meltedData, machineId + date ~ columnName, value.var = "value")
head(theResult)

The result is:

> head(theResult)
  machineId       date producedItems workingTime
1         1 2017-01-01           101           1
2         1 2017-01-02           102           2
3         1 2017-01-03           103           3
4         1 2017-01-04           104           4
5         1 2017-01-05           105           5
6         1 2017-01-06           106           6
>
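
The fixed offset passed to substr() relies on every column name starting with the seven-letter prefix "machine". A regex-based split avoids that assumption. The following is only a sketch, and the pattern itself is an assumption about how the columns are named:

```r
# Column names as used in this answer
cols <- c("machine1.producedItems", "machine1.workingTime",
          "machine2.producedItems", "machine2.workingTime")

# Split "<prefix><number>.<measure>" with one regular expression;
# the pattern is an assumption about the naming scheme.
pattern <- "^[A-Za-z]+([0-9]+)\\.(.+)$"
machineId  <- as.numeric(sub(pattern, "\\1", cols))
columnName <- sub(pattern, "\\2", cols)

print(data.frame(cols, machineId, columnName))
```

The same two sub() calls could be applied to meltedData$variable in place of the strsplit() and substr() steps.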

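UPDATE (02Dec2017): In response to the comments: if there is no other identifier that uniquely distinguishes the multiple rows for a machine, one can use an aggregation function to produce a single observation per machine.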

theResult <- dcast(meltedData,machineId ~ columnName, 
        fun.aggregate=mean,value.var="value") 
head(theResult)

The result is as follows.

> head(theResult)
  machineId producedItems workingTime
1         1         105.5         5.5
2         2         205.5        25.5
>
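
As a cross-check that needs no extra packages, the same per-machine means can be computed with base R's aggregate(). The long-format data frame below is rebuilt by hand from the example vectors (the name `longData` is made up here). It illustrates the aggregation idea and is not the answer's own method.

```r
# Long-format data rebuilt by hand from the example vectors above
longData <- data.frame(
  machineId  = rep(c(1, 2, 1, 2), each = 10),
  columnName = rep(c("workingTime", "workingTime",
                     "producedItems", "producedItems"), each = 10),
  value      = c(1:10, 21:30, 101:110, 201:210)
)

# One mean per combination of machine and measure
means <- aggregate(value ~ machineId + columnName, data = longData, FUN = mean)
print(means)
```

Unlike dcast(), aggregate() keeps the result in long form, with one row per machine and measure.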

UPDATE (02Dec2017): In response to the comments, a solution that uses a unique sequence number to distinguish the rows of data looks like this.

library(reshape2)
machine1.workingTime <- 1:10
machine2.workingTime <- 21:30
machine1.producedItems <- 101:110
machine2.producedItems <- 201:210
# sequence number that identifies each observation
id <- seq_along(machine1.workingTime)
theData <- data.frame(id,
                      machine1.producedItems,
                      machine1.workingTime,
                      machine2.producedItems,
                      machine2.workingTime)
meltedData <- melt(theData, measure.vars = 2:5)
head(meltedData)
meltedData$variable <- as.character(meltedData$variable)
# now, extract machine numbers and variable names
variableNames <- strsplit(meltedData$variable, "[.]")
meltedData$columnName <- unlist(lapply(variableNames, function(x) x[2]))
meltedData$machineId <- as.numeric(unlist(lapply(variableNames, function(x) substr(x[1], 8, nchar(x[1])))))
theResult <- dcast(meltedData, machineId + id ~ columnName, value.var = "value")
head(theResult)

...and the output.

> head(theResult)
  machineId id producedItems workingTime
1         1  1           101           1
2         1  2           102           2
3         1  3           103           3
4         1  4           104           4
5         1  5           105           5
6         1  6           106           6
>
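
An identifier such as id (or date) matters because each combination of identifier, machine and measure should occur only once before casting back to wide form. As far as I know, reshape2's dcast() otherwise falls back to counting rows with length unless fun.aggregate is given. Below is a base R sketch of that check, using hand-built data and a hypothetical helper name:

```r
# Hypothetical helper: TRUE if the key columns identify every row uniquely
is_unique_key <- function(df, key_cols) {
  anyDuplicated(df[, key_cols, drop = FALSE]) == 0
}

# Hand-built long data for two machines, three observations each
longData <- data.frame(
  id         = rep(1:3, times = 4),
  machineId  = rep(c(1, 1, 2, 2), each = 3),
  columnName = rep(c("workingTime", "producedItems"), each = 3, times = 2),
  value      = c(1:3, 101:103, 21:23, 201:203)
)

with_id    <- is_unique_key(longData, c("machineId", "id", "columnName"))
without_id <- is_unique_key(longData, c("machineId", "columnName"))
print(c(with_id = with_id, without_id = without_id))
```

When the check fails for the columns used in the cast formula, either add an identifier as in this update or pass an aggregation function as in the previous one.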

Source: utcz.com/qa/257316.html
