R - How do I create a new table by adding a new column extracted from the column names?
I have a data.frame with the following column names:
Machine1.workingTime, Machine2.workingTime, Machine3.workingTime, Machine1.producedItems, Machine2.producedItems, ...
This data frame can grow over time as more machines are added. I need an R script that produces this result:
workingTime, producedItems, MachineNum
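where MachineNum is the number of the machine whose column the data came from (for example, if I take the Machine2.workingTime column and append it to the newly created "workingTime" column, "MachineNum" will be 2).
I have to iterate over the whole data.frame and merge the columns into a new column named after the second part of the old name (e.g. workingTime), taking MachineNum from the first part of the original column name.
I have tried and searched for the last few hours but could not find a solution.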
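Answer:
I think (hope) this is roughly what you are looking for. I know my answer is not the most concise, and I look forward to seeing other, cleaner answers.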
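library(data.table) # for melt() and merge(); other packages have similar functions
# Note: recent data.table versions may warn when melt() is given a plain data.frame;
# converting with as.data.table() first avoids that
# Dummy data.frame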
df <- data.frame(date = Sys.Date() - 800:1,
                 matrix(sample(0:10000, 4800), ncol = 6)) # 800 rows x 6 columns
colnames(df)[-1] <- paste0("m", 1:3, c(rep(".wt", 3), rep(".pi", 3)))
head(df)
date m1.wt m2.wt m3.wt m1.pi m2.pi m3.pi
1 2015-09-24 6271 2491 6525 6680 7708 2949
2 2015-09-25 1173 5794 5616 7402 3274 8997
3 2015-09-26 516 6659 2144 8739 7168 1704
4 2015-09-27 583 2499 4768 9501 2710 6800
5 2015-09-28 2433 8622 6492 7124 4127 233
6 2015-09-29 3409 662 6952 3824 5755 9479
# Now first take working time (filter using regex) to long form
df_wt <- melt(df[, c("date", grep("wt$", colnames(df), value = TRUE))],
              id.vars = c("date"),
              value.name = "workingTime",
              variable.name = "MachineNum")
# [0-9]+ keeps every digit, so machine numbers of 10 and above are extracted intact
df_wt$MachineNum <- gsub("^m([0-9]+)\\..*$", "\\1", df_wt$MachineNum)
head(df_wt)
date MachineNum workingTime
1 2015-09-24 1 6271
2 2015-09-25 1 1173
3 2015-09-26 1 516
4 2015-09-27 1 583
5 2015-09-28 1 2433
6 2015-09-29 1 3409
# Same for produced item
df_pi <- melt(df[, c("date", grep("pi$", colnames(df), value = TRUE))],
              id.vars = c("date"),
              value.name = "producedItems",
              variable.name = "MachineNum")
# [0-9]+ keeps every digit, so machine numbers of 10 and above are extracted intact
df_pi$MachineNum <- gsub("^m([0-9]+)\\..*$", "\\1", df_pi$MachineNum)
head(df_pi)
date MachineNum producedItems
1 2015-09-24 1 6680
2 2015-09-25 1 7402
3 2015-09-26 1 8739
4 2015-09-27 1 9501
5 2015-09-28 1 7124
6 2015-09-29 1 3824
# Now merge everything
df_long <- merge(df_wt, df_pi)
head(df_long)
date MachineNum workingTime producedItems
1 2015-09-24 1 6271 6680
2 2015-09-24 2 2491 7708
3 2015-09-24 3 6525 2949
4 2015-09-25 1 1173 7402
5 2015-09-25 2 5794 3274
6 2015-09-25 3 5616 8997
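The two melt() calls plus merge() above can probably be collapsed into a single step. The following is a minimal sketch, assuming a data.table input and columns ordered by machine within each measure group; the table `wide` and its values are made up for illustration. It uses `patterns()` in `measure.vars` so that data.table melts both groups of columns at once.

```r
library(data.table)

# Small illustrative table (values are invented for the demo)
wide <- data.table(
  date  = as.Date("2017-01-01") + 0:1,
  m1.wt = c(10, 11),   m2.wt = c(20, 21),
  m1.pi = c(100, 101), m2.pi = c(200, 201)
)

# Melt both measure groups in one call; each pattern becomes one value column
long <- melt(
  wide,
  id.vars       = "date",
  measure.vars  = patterns("\\.wt$", "\\.pi$"),
  variable.name = "MachineNum",
  value.name    = c("workingTime", "producedItems")
)

# With several measure groups, the variable column holds the position of the
# column inside its group (1, 2, ...), not text parsed from the column name
long[, MachineNum := as.integer(MachineNum)]
long
```

Because MachineNum here is only the position within each group, it equals the real machine number only when the columns appear in the order m1, m2, m3, ... for every measure. If that cannot be guaranteed, the regex-based extraction above is the safer choice.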
Answer:
Here is an approach using the reshape2 library.
machine1.workingTime <- 1:10
machine2.workingTime <- 21:30
machine1.producedItems <- 101:110
machine2.producedItems <- 201:210
date <- c("2017-01-01","2017-01-02","2017-01-03","2017-01-04","2017-01-05","2017-01-06",
"2017-01-07","2017-01-08","2017-01-09","2017-01-10")
theData <- data.frame(date,
machine1.producedItems,
machine1.workingTime,
machine2.producedItems,
machine2.workingTime
)
library(reshape2)
meltedData <- melt(theData,measure.vars=2:5)
meltedData$variable <- as.character(meltedData$variable)
# now, extract machine numbers and variable names
variableNames <- strsplit(as.character(meltedData$variable),"[.]")
# token after the . is variable name
meltedData$columnName <- unlist(lapply(variableNames,function(x) x[2]))
# since all variables start with word 'machine' we can set chars 8+ as ID
meltedData$machineId <- as.numeric(sapply(variableNames, function(x) substr(x[1], 8, nchar(x[1]))))
theResult <- dcast(meltedData,machineId + date ~ columnName,value.var="value")
head(theResult)
The result is:
> head(theResult)
machineId date producedItems workingTime
1 1 2017-01-01 101 1
2 1 2017-01-02 102 2
3 1 2017-01-03 103 3
4 1 2017-01-04 104 4
5 1 2017-01-05 105 5
6 1 2017-01-06 106 6
>
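UPDATE (02Dec2017): In response to the comments: if there is no other identifier that uniquely distinguishes the multiple rows for a machine, one can use an aggregation function to produce a single observation per machine.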
theResult <- dcast(meltedData,machineId ~ columnName, fun.aggregate=mean,value.var="value")
head(theResult)
The result is as follows.
> head(theResult)
machineId producedItems workingTime
1 1 105.5 5.5
2 2 205.5 25.5
>
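For comparison, the same per-machine averaging can be sketched in base R with `aggregate()`, starting from data that is already in long form. This is only an illustration under that assumption; `longData` and `perMachine` are made-up names, and the values mirror the example above.

```r
# Long-format data shaped like the earlier dcast() output (values from the example)
longData <- data.frame(
  machineId     = rep(c(1, 2), each = 10),
  producedItems = c(101:110, 201:210),
  workingTime   = c(1:10, 21:30)
)

# One row per machine, averaging both measures at once
perMachine <- aggregate(cbind(producedItems, workingTime) ~ machineId,
                        data = longData, FUN = mean)
perMachine
```

Swapping `FUN = mean` for `sum` or `median` gives other per-machine summaries without changing the rest of the call.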
UPDATE (02Dec2017): In response to the comments: a solution that uses a unique sequence number to distinguish the rows of data looks like this.
machine1.workingTime <- 1:10
machine2.workingTime <- 21:30
machine1.producedItems <- 101:110
machine2.producedItems <- 201:210
id <- 1:length(machine1.workingTime)
theData <- data.frame(id,
machine1.producedItems,
machine1.workingTime,
machine2.producedItems,
machine2.workingTime
)
meltedData <- melt(theData,measure.vars=2:5)
head(meltedData)
meltedData$variable <- as.character(meltedData$variable)
# now, extract machine numbers and variable names
variableNames <- strsplit(as.character(meltedData$variable),"[.]")
meltedData$columnName <- unlist(lapply(variableNames,function(x) x[2]))
meltedData$machineId <- as.numeric(sapply(variableNames, function(x) substr(x[1], 8, nchar(x[1]))))
theResult <- dcast(meltedData,machineId + id ~ columnName,value.var="value")
head(theResult)
...and the output.
> head(theResult)
machineId id producedItems workingTime
1 1 1 101 1
2 1 2 102 2
3 1 3 103 3
4 1 4 104 4
5 1 5 105 5
6 1 6 106 6
>
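Since the question mentions that more machines get added over time, here is a rough base R sketch that does not hard-code the number of machines or the measure names. It assumes every measure column is named `<machine>.<measure>` with exactly one dot, and that every machine has the same set of measures; `stack_machines` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: stack "<machine>.<measure>" columns into long form
stack_machines <- function(df, id_cols) {
  measure_cols <- setdiff(names(df), id_cols)
  parts   <- strsplit(measure_cols, ".", fixed = TRUE)
  machine <- vapply(parts, `[`, character(1), 1)  # e.g. "machine2"
  measure <- vapply(parts, `[`, character(1), 2)  # e.g. "workingTime"

  # Build one block of rows per machine, then stack the blocks
  pieces <- lapply(unique(machine), function(m) {
    keep <- machine == m
    out <- df[, c(id_cols, measure_cols[keep]), drop = FALSE]
    names(out) <- c(id_cols, measure[keep])
    # Keep only the digits of the machine prefix as the machine number
    out$MachineNum <- as.integer(gsub("[^0-9]", "", m))
    out
  })
  do.call(rbind, pieces)
}

# Illustrative data in the same shape as the example above
wide <- data.frame(
  id = 1:3,
  machine1.workingTime   = 1:3,
  machine2.workingTime   = 21:23,
  machine1.producedItems = 101:103,
  machine2.producedItems = 201:203
)
long <- stack_machines(wide, id_cols = "id")
long
```

Adding a `machine3.*` pair of columns to `wide` needs no change to the helper, because the machine prefixes are read from the column names each time it runs.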