蒙特卡罗模拟,引导和回归中的R

我一直在使用SAS很长一段时间,现在我想翻译我的代码在R.我需要帮助做到以下几点:蒙特卡罗模拟,引导和回归中的R

  1. 生成若干自举样品
  2. 运行在每个样品
  3. 存储在一个新的数据集的参数的线性回归模型的复制样品

我编辑的代码,更清晰。 我用了很多for循环,我知道并不总是推荐。这个过程非常缓慢

是否有代码/软件包(例如,应用家庭功能,“caret”软件包)可以使这个非常干净的高效/快,尤其是如samplesize * bootsample> 1000万

任何帮助将不胜感激。

samplesize <- 200 

bootsize<- 500

myseed <- 123

#generating a fake dataset

id=1:n

set.seed(myseed)

x <- rnorm(samplesize, 5, 5)

y <- rnorm(samplesize, 2 + 0.4*x, 0.5)

data <- data.frame(id, x, y)

head(data)

id x y

1 1 2.197622 3.978454

2 2 3.849113 4.195852

3 3 12.793542 6.984844

4 4 5.352542 4.412614

5 5 5.646439 4.051405

6 6 13.575325 7.192007

# generate bootstrap samples

bootstrap <- function(nbootsamples, data, seed) {

bootdata <- data.frame() #to initialize it

set.seed(seed)

for (i in 1:nbootsamples) {

replicate <- i

bootstrapIndex <- sample(1:nrow(data), replace = TRUE)

datatemp <- data[bootstrapIndex, ]

tempall <- cbind(replicate, datatemp)

bootdata <- rbind(bootdata, tempall)

}

return(bootdata)

}

bootdata <- bootstrap(nbootsamples=bootsize, data=data, seed=myseed)

bootdata <- dplyr::arrange(bootdata, replicate, id)

head(bootdata)

#The data should look like this

replicate id x y

1 1 1 2.197622 3.978454

2 1 3 12.793542 6.984844

3 1 5 5.646439 4.051405

4 1 9 1.565736 3.451748

5 1 10 2.771690 3.081662

6 1 10 2.771690 3.081662

#Model-fitting and saving coefficient and means

modelFitting <- function(y, x, data) {

modeltemp <- glm(y ~ x,

data = data,

family = gaussian('identity'))

Inty <- coef(modeltemp)["(Intercept)"]

betaX <- coef(modeltemp)["x"]

sdy <- sd(residuals.glm(modeltemp))

data.frame(Inty, betaX, sdy, row.names = NULL)

}

saveParameters <- function(nbootsamples, data, seed) {

parameters <- data.frame() #to initialize it

for (i in 1:length(unique(data$replicate))) {

replicate <- i

datai <- data[ which(data$replicate==i),]

datatemp <- modelFitting(y, x,data=datai)

meandata <- data.frame(Pr_X=mean(datai$x))

tempall <- cbind(replicate, datatemp, meandata)

parameters <- rbind(parameters, tempall)

}

return(parameters)

}

parameters <- saveParameters(nbootsamples=bootsize, data=bootdata, seed=myseed)

head(parameters)

#Ultimately all I want is my final dataset to look like the following

replicate Inty betaX sdy Pr_X

1 1 2.135529 0.3851757 0.5162728 4.995836

2 2 1.957152 0.4094682 0.5071635 4.835884

3 3 2.044257 0.3989742 0.4734178 5.111185

4 4 2.093452 0.3861861 0.4921470 4.741299

5 5 2.017825 0.4037699 0.5240363 4.931793

6 6 2.026952 0.3979731 0.4898346 5.502320

回答:

使用caret包轻松实现了重采样回归。根据您的示例数据,通过广义线性模型运行200个bootstrap样本的代码如下所示。从插入符号

library(caret) 

x = round(rnorm(200, 5, 5))

y= rnorm(200, 2 + 0.4*x, 0.5)

theData <- data.frame(id=1:200,x, y)

# configure caret training parameters to 200 bootstrap samples

fitControl <- trainControl(method = "boot",

number = 200)

fit <- train(y ~ x, method="glm",data=theData,

trControl = fitControl)

# print output object

fit

# print first 10 resamples

fit$resample[1:10,]

输出如下:有关如何使用插入符号,包括生成的模型对象的内容

> fit 

Generalized Linear Model

200 samples

1 predictor

No pre-processing

Resampling: Bootstrapped (200 reps)

Summary of sample sizes: 200, 200, 200, 200, 200, 200, ...

Resampling results:

RMSE Rsquared MAE

0.4739306 0.9438834 0.3772199

> fit$resample[1:10,]

RMSE Rsquared MAE Resample

1 0.5069606 0.9520896 0.3872257 Resample001

2 0.4636029 0.9460214 0.3711900 Resample002

3 0.4446103 0.9549866 0.3435148 Resample003

4 0.4464119 0.9443726 0.3636947 Resample004

5 0.5193685 0.9191259 0.4010104 Resample005

6 0.4995917 0.9451417 0.4044659 Resample006

7 0.4347831 0.9494606 0.3383224 Resample007

8 0.4725041 0.9483434 0.3716319 Resample008

9 0.5295650 0.9458453 0.4241543 Resample009

10 0.4796985 0.9514595 0.3927207 Resample010

>

细节(例如访问各个车型,因此您可以生成与predict()功能预测为模拟)可在caret GitHub site。

卡特也支持并行处理。有关如何使用脱字符并行处理的示例,请参阅Improving Performance of Random Forest with caret::train()。

此外,蒙特卡洛仿真中的R通过Monte Carlo包在R.

以上是 蒙特卡罗模拟,引导和回归中的R 的全部内容, 来源链接: utcz.com/qa/259444.html

回到顶部