How do I transform data.table columns indexed by position, by reference?
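I have a data.table that contains several factor columns. I want to convert 2 columns that were originally read in as factors back to their original numeric values. Here is what I have tried: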
data[, c(4,5):=c(as.numeric(as.character(4)), as.numeric(as.character(5))), with=FALSE]
This gives me the following warnings:
Warning messages:
1: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)), :
Supplied 2 items to be assigned to 7 items of column 'Bentley (R)' (recycled leaving remainder of 1 items).
2: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)), :
Supplied 2 items to be assigned to 7 items of column 'Sparks (D)' (recycled leaving remainder of 1 items).
3: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)), :
Coerced 'double' RHS to 'integer' to match the factor column's underlying type. Character columns are now recommended (can be in keys), or coerce RHS to integer or character first.
4: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)), :
Coerced 'double' RHS to 'integer' to match the factor column's underlying type. Character columns are now recommended (can be in keys), or coerce RHS to integer or character first.
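I can also tell that the conversion did not succeed, because the 4th and 5th columns are still factors after the code has run.
As an alternative, I tried this code, which does not run at all: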
data[, ':=' (4=c(as.numeric(as.character(4)), 5 = as.numeric(as.character(5)))), with=FALSE]
Finally, I tried referring to the column by name via colnames:
data[ , (colnames(data)[4]) := as.numeric(as.character(colnames(data)[4]))]
This runs, but the result is a column of NAs, along with the following warnings:
Warning messages:
1: In eval(expr, envir, enclos) : NAs introduced by coercion
2: In `[.data.table`(data, , `:=`((colnames(data)[4]), as.numeric(as.character(colnames(data)[4])))) :
Coerced 'double' RHS to 'integer' to match the factor column's underlying type. Character columns are now recommended (can be in keys), or coerce RHS to integer or character first.
3: In `[.data.table`(data, , `:=`((colnames(data)[4]), as.numeric(as.character(colnames(data)[4])))) :
RHS contains -2147483648 which is outside the levels range ([1,6]) of column 1, NAs generated
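I need to do this by position rather than by column name, because the column names depend on the URL. What is the correct way to convert columns by position using data.table?
I also have a related question: how do I transform one numbered column relative to other numbered columns? For example, if I want to set the third column equal to 45 minus the value of the third column plus the value of the fourth column, how would I do that? Is there a way to distinguish a real number from a column number? I know that something like this is not the way to go: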
dt[ , .(4) = 45 - .(3) + .(4), with = FALSE]
So how can this be done?
Answer:
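If you want to assign by reference and by position, you need to supply either the column names as a character vector or the column numbers as an integer vector on the left-hand side, and use .SDcols to select the same columns (at least in data.table 1.9.4).
First, a reproducible example: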
library(data.table)
DT <- data.table(iris)
DT[, c("Sepal.Length", "Petal.Length") := list(factor(Sepal.Length), factor(Petal.Length))]
str(DT)
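Before converting, it may help to see why the conversion goes through as.character() rather than calling as.numeric() on the factor directly. The following is a minimal base-R sketch; the vector `f` and its values are made up for illustration and are not from the original data.

```r
# Hypothetical factor whose labels are numbers stored as text
f <- factor(c("5.1", "4.9", "5.1", "7.0"))

# as.numeric() on a factor returns the internal level codes, not the labels
codes <- as.numeric(f)

# Converting to character first recovers the labels, which then parse as numbers
values <- as.numeric(as.character(f))

# An equivalent form that converts the levels once and indexes them by code
values_alt <- as.numeric(levels(f))[f]

print(codes)
print(values)
```

Here `codes` holds positions within the sorted levels, while `values` and `values_alt` hold the original numbers.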
Now let's convert the columns:
DT[, names(DT)[c(1, 3)] := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols = c(1, 3)]
str(DT)
Or:
DT[, c(1,3) := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols=c(1,3)]
str(DT)
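Another option, not part of the original answer, is data.table's set() function, which also assigns by reference and accepts a column position in `j`. The sketch below rebuilds the iris example so that it runs on its own; the name `cols` is just an illustrative variable.

```r
library(data.table)

# Rebuild the example: two numeric columns stored as factors
DT <- data.table(iris)
DT[, c("Sepal.Length", "Petal.Length") := list(factor(Sepal.Length), factor(Petal.Length))]

cols <- c(1L, 3L)  # positions of the columns to convert (assumed for this sketch)

# Replace each column in place, addressing it by position
for (j in cols) {
  set(DT, j = j, value = as.numeric(as.character(DT[[j]])))
}

print(sapply(DT, class)[cols])
```

A loop over set() avoids the overhead of calling `[.data.table` repeatedly, which may matter when many columns are converted.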
Note that := expects a vector of column names or positions on the left-hand side and a list on the right-hand side.
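The related question (setting the third column to 45 minus the third column plus the fourth) can be handled with the same pattern. In the sketch below, the table and its column names `v1` to `v4` are hypothetical. A bare number on the right-hand side is always an ordinary numeric literal; columns are reached by position through .SD, whose columns follow the order given in .SDcols.

```r
library(data.table)

# Hypothetical table; only the column positions matter here
DT <- data.table(v1 = 1:3, v2 = 4:6, v3 = c(10, 20, 30), v4 = c(1, 2, 3))

# With .SDcols = c(3, 4), .SD[[1]] is column 3 and .SD[[2]] is column 4.
# The 45 is a plain number, not a column reference.
DT[, (names(DT)[3]) := 45 - .SD[[1]] + .SD[[2]], .SDcols = c(3, 4)]

print(DT)
```

This is why the first attempt in the question failed: as.character(4) converts the number 4 itself, not the fourth column, so the literal values 4 and 5 were recycled into the factor columns.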