R - How to create a new table by adding a new column extracted from the column names?

I have a data.frame with these column names:

Machine1.workingTime, Machine2.workingTime, Machine3.workingTime, 

Machine1.producedItems, Machine2.producedItems, ...

Over time this data frame can grow to include more machines. I need an R script that produces this result:

workingTime, producedItems, MachineNum 

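where MachineNum is the number of the machine whose column the data came from (for example, if I take the Machine2.workingTime column and append it to the newly created "workingTime" column, "MachineNum" will be 2).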

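I have to go through the whole data.frame, merge the columns that share the same measure part of their old name (e.g. workingTime) into a single column with that name, and take MachineNum from the first part of the original column name.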

I have been trying and searching for the last few hours but could not find a solution.
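For reference, the reshape described in the question can be sketched in base R without extra packages. This is a minimal sketch, assuming every column is named `Machine<N>.<measure>` with exactly one dot and that every machine has the same set of measures; the toy data frame `df` and its values are made up for illustration.

```r
# Hypothetical toy data using the question's naming scheme: Machine<N>.<measure>
df <- data.frame(
  Machine1.workingTime   = c(5, 6),
  Machine2.workingTime   = c(7, 8),
  Machine1.producedItems = c(50, 60),
  Machine2.producedItems = c(70, 80)
)

# Split each column name at the dot: part 1 is the machine, part 2 the measure
parts <- do.call(rbind, strsplit(names(df), ".", fixed = TRUE))
machine_num <- as.integer(sub("^Machine", "", parts[, 1]))
measure <- parts[, 2]

# For each machine, take its columns, rename them to the bare measure names,
# tag them with the machine number, and stack the blocks row-wise
long <- do.call(rbind, lapply(sort(unique(machine_num)), function(m) {
  cols <- machine_num == m
  block <- df[, cols, drop = FALSE]
  names(block) <- measure[cols]
  block <- block[, sort(unique(measure)), drop = FALSE]  # same column order for every machine
  block$MachineNum <- m
  block
}))
rownames(long) <- NULL
long
```

Because the machine numbers are read from the names, additional `MachineN.*` columns are picked up without changing the code.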

Answer:

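I think (hope) this is roughly what you are looking for. I know my answer is not the most concise, and I look forward to seeing other, cleaner answers.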

library(reshape2) # for melt() on a data.frame; merge() is base R. Other packages have similar functions.
# Dummy data.frame: 800 days x 6 columns (800 * 6 values, so nothing gets recycled)
df <- data.frame(date = Sys.Date() - 800:1,
                 matrix(sample(0:10000, 800 * 6), ncol = 6))
colnames(df)[-1] <- paste0("m", 1:3, c(rep(".wt", 3), rep(".pi", 3)))
head(df)

        date m1.wt m2.wt m3.wt m1.pi m2.pi m3.pi
1 2015-09-24  6271  2491  6525  6680  7708  2949
2 2015-09-25  1173  5794  5616  7402  3274  8997
3 2015-09-26   516  6659  2144  8739  7168  1704
4 2015-09-27   583  2499  4768  9501  2710  6800
5 2015-09-28  2433  8622  6492  7124  4127   233
6 2015-09-29  3409   662  6952  3824  5755  9479

# Now first take the working-time columns (selected by regex) to long form
df_wt <- melt(df[, c("date", grep("wt$", colnames(df), value = TRUE))],
              id.vars = "date",
              value.name = "workingTime",
              variable.name = "MachineNum")
# Keep only the machine number ("m2.wt" -> 2); [0-9]+ also handles machines 10 and up
df_wt$MachineNum <- as.integer(sub("^m([0-9]+)\\..*$", "\\1", df_wt$MachineNum))
head(df_wt)

        date MachineNum workingTime
1 2015-09-24          1        6271
2 2015-09-25          1        1173
3 2015-09-26          1         516
4 2015-09-27          1         583
5 2015-09-28          1        2433
6 2015-09-29          1        3409
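The pattern used to pull out the machine number matters once there are ten or more machines, which the question expects. A small check on made-up column names in this answer's `m<N>.<measure>` style: a single-digit capture such as `m([0-9]).+` keeps only the first digit, while an anchored `[0-9]+` capture keeps all of them.

```r
# Hypothetical column names in the answer's m<N>.<measure> style
vars <- c("m1.wt", "m9.wt", "m12.wt")

# Single-digit capture: for "m12.wt", ".+" swallows the "2"
single_digit <- gsub("m([0-9]).+", "\\1", vars)

# One or more digits, up to the dot
all_digits <- as.integer(sub("^m([0-9]+)\\..*$", "\\1", vars))

single_digit
all_digits
```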

# Same for produced items
df_pi <- melt(df[, c("date", grep("pi$", colnames(df), value = TRUE))],
              id.vars = "date",
              value.name = "producedItems",
              variable.name = "MachineNum")
df_pi$MachineNum <- as.integer(sub("^m([0-9]+)\\..*$", "\\1", df_pi$MachineNum))
head(df_pi)

        date MachineNum producedItems
1 2015-09-24          1          6680
2 2015-09-25          1          7402
3 2015-09-26          1          8739
4 2015-09-27          1          9501
5 2015-09-28          1          7124
6 2015-09-29          1          3824

# Now merge both long tables on their shared columns (date and MachineNum)
df_long <- merge(df_wt, df_pi)
head(df_long)

        date MachineNum workingTime producedItems
1 2015-09-24          1        6271          6680
2 2015-09-24          2        2491          7708
3 2015-09-24          3        6525          2949
4 2015-09-25          1        1173          7402
5 2015-09-25          2        5794          3274
6 2015-09-25          3        5616          8997
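As an alternative to melting the two measures separately and merging, base R's `reshape()` can lengthen both groups of columns in one call when `varying` is a list with one element per measure. This is a sketch on a small made-up data frame in the answer's `m<N>.wt` / `m<N>.pi` naming; it assumes every `.wt` column has a matching `.pi` column for the same machine.

```r
# Small hypothetical wide table in the answer's naming scheme
wide <- data.frame(
  date  = as.Date(c("2017-01-01", "2017-01-02")),
  m1.wt = c(5, 6),   m2.wt = c(7, 8),
  m1.pi = c(50, 60), m2.pi = c(70, 80)
)

# Pair every working-time column with the produced-items column of the same machine
wt_cols <- grep("\\.wt$", names(wide), value = TRUE)
pi_cols <- sub("\\.wt$", ".pi", wt_cols)
machine <- as.integer(sub("^m([0-9]+)\\..*$", "\\1", wt_cols))

# One reshape: each element of 'varying' becomes one long column named by 'v.names'
long <- reshape(wide,
                direction = "long",
                varying   = list(wt_cols, pi_cols),
                v.names   = c("workingTime", "producedItems"),
                timevar   = "MachineNum",
                times     = machine,
                idvar     = "date")

# Same row order as the merge() result: by date, then machine
long <- long[order(long$date, long$MachineNum), ]
rownames(long) <- NULL
long
```

Pairing the columns by name, rather than by position, keeps each machine's two measures aligned even if the columns are not stored in order.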

Answer:

Here is an approach using the reshape2 package.

machine1.workingTime   <- 1:10
machine2.workingTime   <- 21:30
machine1.producedItems <- 101:110
machine2.producedItems <- 201:210
date <- c("2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05", "2017-01-06",
          "2017-01-07", "2017-01-08", "2017-01-09", "2017-01-10")
theData <- data.frame(date,
                      machine1.producedItems,
                      machine1.workingTime,
                      machine2.producedItems,
                      machine2.workingTime)

library(reshape2)
meltedData <- melt(theData, measure.vars = 2:5)
meltedData$variable <- as.character(meltedData$variable)

# now, extract machine numbers and variable names
variableNames <- strsplit(meltedData$variable, "[.]")
# token after the . is the variable name
meltedData$columnName <- unlist(lapply(variableNames, function(x) x[2]))
# since all variables start with the word 'machine' (7 characters), chars 8+ are the ID
meltedData$machineId <- as.numeric(unlist(lapply(variableNames, function(x) substr(x[1], 8, nchar(x[1])))))

theResult <- dcast(meltedData, machineId + date ~ columnName, value.var = "value")
head(theResult)

The result is:

> head(theResult)
  machineId       date producedItems workingTime
1         1 2017-01-01           101           1
2         1 2017-01-02           102           2
3         1 2017-01-03           103           3
4         1 2017-01-04           104           4
5         1 2017-01-05           105           5
6         1 2017-01-06           106           6
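The `substr(x[1], 8, ...)` step relies on every name starting with the seven-character prefix `machine`. A sketch of the same split that only assumes names of the form `<prefix><N>.<measure>` with a digit-free prefix and a single dot; the names below are made up, and `vapply()` stands in for `unlist(lapply())` so that each element is checked to yield exactly one value.

```r
# Hypothetical variable names as produced by melt(); "machine10" has two digits
vars <- c("machine1.producedItems", "machine2.workingTime", "machine10.workingTime")

# Split each name at the literal dot into "machine<N>" and the measure name
parts <- strsplit(vars, ".", fixed = TRUE)
columnName <- vapply(parts, function(p) p[2], character(1))

# Strip the leading non-digits, whatever the prefix is ("machine", "Machine", "m", ...)
machineId <- as.integer(sub("^[^0-9]*", "", vapply(parts, function(p) p[1], character(1))))

columnName
machineId
```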

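UPDATE (02 Dec 2017): In response to the comments: if there is no other identifier that uniquely distinguishes multiple rows for one machine, an aggregation function can be used so that the result has one observation per machine.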

theResult <- dcast(meltedData, machineId ~ columnName,
                   fun.aggregate = mean, value.var = "value")
head(theResult)

The result is as follows:

> head(theResult)
  machineId producedItems workingTime
1         1         105.5         5.5
2         2         205.5        25.5
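The same per-machine means can be cross-checked in base R with `aggregate()`. The sketch below rebuilds the answer's toy values directly in long form (one row per machine and day, column names as in the answer) instead of going through `melt()`.

```r
# Toy values from the answer, laid out as one row per machine and day
long <- data.frame(
  machineId     = rep(c(1, 2), each = 10),
  producedItems = c(101:110, 201:210),
  workingTime   = c(1:10, 21:30)
)

# Mean of each measure per machine (one row per machineId)
agg <- aggregate(cbind(producedItems, workingTime) ~ machineId,
                 data = long, FUN = mean)
agg
```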

UPDATE (02 Dec 2017): In response to the comments, a solution that uses a unique sequence number to distinguish the rows of data looks like this.

machine1.workingTime   <- 1:10
machine2.workingTime   <- 21:30
machine1.producedItems <- 101:110
machine2.producedItems <- 201:210
# unique sequence number for each row of the wide data
id <- seq_along(machine1.workingTime)
theData <- data.frame(id,
                      machine1.producedItems,
                      machine1.workingTime,
                      machine2.producedItems,
                      machine2.workingTime)

meltedData <- melt(theData, measure.vars = 2:5)
head(meltedData)
meltedData$variable <- as.character(meltedData$variable)

# now, extract machine numbers and variable names
variableNames <- strsplit(meltedData$variable, "[.]")
meltedData$columnName <- unlist(lapply(variableNames, function(x) x[2]))
meltedData$machineId <- as.numeric(unlist(lapply(variableNames, function(x) substr(x[1], 8, nchar(x[1])))))

theResult <- dcast(meltedData, machineId + id ~ columnName, value.var = "value")
head(theResult)

...and the output:

> head(theResult)
  machineId id producedItems workingTime
1         1  1           101           1
2         1  2           102           2
3         1  3           103           3
4         1  4           104           4
5         1  5           105           5
6         1  6           106           6
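When the data has no natural row identifier at all, a sequence number can also be generated after melting: number the rows within each machine/measure group, so that the n-th value of every measure of a machine shares the same `id`. A base-R sketch with `ave()` and `reshape(direction = "wide")` on made-up long data; it assumes the long rows keep their original order within each group.

```r
# Hypothetical melted data: 3 observations per machine and measure, no id column
melted <- data.frame(
  machineId  = rep(c(1, 2), each = 6),
  columnName = rep(rep(c("producedItems", "workingTime"), each = 3), times = 2),
  value      = c(101:103, 1:3, 201:203, 21:23)
)

# Running number within each (machineId, columnName) group: 1, 2, 3, 1, 2, 3, ...
melted$id <- ave(seq_len(nrow(melted)), melted$machineId, melted$columnName,
                 FUN = seq_along)

# (machineId, id, columnName) now identifies each row, so no aggregation is needed
stopifnot(anyDuplicated(melted[, c("machineId", "id", "columnName")]) == 0)

# Spread the measures back into columns, one row per machine and id
wide <- reshape(melted, direction = "wide",
                idvar = c("machineId", "id"), timevar = "columnName")
names(wide) <- sub("^value\\.", "", names(wide))  # "value.workingTime" -> "workingTime"
rownames(wide) <- NULL
wide
```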

Source: utcz.com/qa/257316.html
