This is an example taken from the tutorial article "Conducting Simulation Studies in the R Programming Environment" in Tutorials in Quantitative Methods for Psychology (2013, Vol. 9(2), p. 43-60) by Kevin A. Hallgren. While the majority of the article does a good job of explaining the mechanics of performing Monte Carlo simulations in R, the implementation unfortunately uses the for-loop strategy, and demonstrates very nicely why for loops make simulation code difficult to read, modify, and debug.
The following code, from Appendix A, evaluates how Sobel's test for mediation performs. To save space, the comments have been removed. The original code also contained typographical errors (a phenomenon that Sigal and Chalmers, 2016, argue happens quite often with the for-loop strategy), which I have flagged in comments below. For further details about the simulation, see the original article.
Simulation Using For-Loops
```r
N_list = c(100, 300)
a_list = c(-.3, 0, .3)
b_list = c(-.3, 0, .3)
cp_list = c(-.2, 0, .2)
reps = 1000
set.seed(192)

sobel_test <- function(X, M, Y){
    M_X = lm(M ~ X)
    Y_XM = lm(Y ~ X + M)
    a = coefficients(M_X)[2]
    b = coefficients(Y_XM)[3]
    stdera = summary(M_X)$coefficients[2,2]
    stderb = summary(Y_XM)$coefficients[3,2]
    sobelz = a*b / sqrt(b^2 * stdera^2 + a^2 * stderb^2)
    return(sobelz)
}

# the following line was added to determine the computation time
time0 <- Sys.time()

d = NULL
for (N in N_list){
    for (a in a_list){  # this line was fixed; originally was "for (a in 1:a_list){"
        for (b in b_list){
            for (cp in cp_list){
                for (i in 1:reps){
                    X = rnorm(N, 0, 1)
                    M = a*X + rnorm(N, 0, 1)
                    Y = cp*X + b*M + rnorm(N, 0, 1)
                    d = rbind(d, c(i, a, b, cp, N, 1, sobel_test(X, M, Y)))
                    d = rbind(d, c(i, a, b, cp, N, 2, sobel_test(X, Y, M)))
                }
            }
        }
    }
}
colnames(d) = c("iteration", "a", "b", "cp", "N", "model", "Sobel_z")
d = data.frame(d)

# the following line was added to determine the computation time
time1 <- Sys.time()
time1 - time0
```
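One structural reason this loop scales poorly is the `d = rbind(d, ...)` pattern, which copies the entire accumulated matrix on every replication. A minimal base-R sketch (hypothetical helper functions, not from the article) contrasting growth-by-`rbind()` with preallocation:

```r
# Hypothetical illustration: growing a matrix with rbind() copies all
# previous rows on each iteration (roughly quadratic cost), while writing
# into a preallocated matrix is linear in the number of rows
grow_rbind <- function(n){
    d <- NULL
    for (i in 1:n)
        d <- rbind(d, c(i, i^2))   # full copy of d each time
    d
}

grow_prealloc <- function(n){
    d <- matrix(NA_real_, nrow = n, ncol = 2)  # allocate once
    for (i in 1:n)
        d[i, ] <- c(i, i^2)                    # write in place
    d
}

# both produce the same matrix; only the cost profile differs
identical(grow_rbind(100), grow_prealloc(100))
```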
Compare the previous for-loop approach with SimDesign. Personally, I find the behaviour of the \(p\)-values more telling than that of the \(z\)-values, so the Sobel test function was modified to return both the \(z\)- and \(p\)-values.
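The conversion itself is a single line of base R: the two-sided \(p\)-value is twice the upper-tail normal probability of \(|z|\). For example (illustrative inputs, not values from the simulation):

```r
# Two-sided large-sample p-value from a Sobel z statistic
sobel_p <- function(z) 2 * pnorm(abs(z), lower.tail = FALSE)

sobel_p(1.96)   # approximately .05
sobel_p(0)      # 1; no evidence of mediation
```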
```r
# Generate then edit R template file 'Hallgren2013.R'
# SimDesign::SimFunctions('Hallgren2013')

# function returns large-sample p-values and other information
sobel_test <- function(X, M, Y){
    M_X <- lm(M ~ X)
    Y_XM <- lm(Y ~ X + M)
    a <- coefficients(M_X)[2]                  # extract from numeric vector
    b <- coefficients(Y_XM)[3]
    stdera <- summary(M_X)$coefficients[2,2]   # extract from list first, then matrix
    stderb <- summary(Y_XM)$coefficients[3,2]
    sobelz <- a*b / sqrt(b^2 * stdera^2 + a^2 * stderb^2)
    sobelp <- pnorm(abs(sobelz), lower.tail = FALSE) * 2
    ret <- data.frame(a=a, SE_a=stdera, b=b, SE_b=stderb, z=sobelz, p=sobelp)
    ret
}

#-------------------------------------------------------------------
library(SimDesign)

# fully-crossed simulation experiment
Design <- createDesign(N = c(100, 300),
                       a = c(-.3, 0, .3),
                       b = c(-.3, 0, .3),
                       cp = c(-.2, 0, .2))

#-------------------------------------------------------------------

Generate <- function(condition, fixed_objects = NULL) {
    Attach(condition)    # make N, a, b, and cp accessible
    X <- rnorm(N)
    M <- a*X + rnorm(N)
    Y <- cp*X + b*M + rnorm(N)
    dat <- data.frame(X=X, M=M, Y=Y)
    dat
}

Analyse <- function(condition, dat, fixed_objects = NULL) {
    Attach(dat)          # make objects X, M, and Y directly accessible
    sobel <- sobel_test(X=X, M=M, Y=Y)$p
    sobel_incorrect <- sobel_test(X=X, M=Y, Y=M)$p
    ret <- c(sobel=sobel, sobel_incorrect=sobel_incorrect)
    ret                  # named vector of p-values
}

Summarise <- function(condition, results, fixed_objects = NULL) {
    ret <- EDR(results, alpha = .05)   # results object is a 'data.frame'
    ret                  # empirical detection rate returned
}

#-------------------------------------------------------------------

res <- runSimulation(design=Design, replications=1000, generate=Generate,
                     analyse=Analyse, summarise=Summarise)
```
To see all the analysis results from Design[1,], for example, inspect the object returned from SimResults().
row1 <-SimResults(res, 1)
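Because Analyse() returned a named vector with elements sobel and sobel_incorrect, the replication-level results can be re-summarised by hand as a sanity check. A sketch (assuming those names carry over as columns of the returned object, and that the simulation above has been run):

```r
# Inspect the replication-level p-values for the first design condition
head(row1)

# Recompute the empirical detection rates manually; these should match
# the EDR() summary reported for Design[1, ]
mean(row1$sobel < .05)
mean(row1$sobel_incorrect < .05)
```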
Comparison
There are a few interesting properties to note about this comparison example. First, SimDesign is faster than the for-loop strategy, despite technically doing more computation in the new Sobel test function and in the summarise step. This is without running the SimDesign code in parallel, which would decrease the simulation time by a factor roughly proportional to the number of cores available (and, unlike the for-loop code, only requires adding a parallel = TRUE flag to runSimulation()). Furthermore, the for-loop code in its current state cannot be used with parallel computations without a large amount of rewriting, because the object d is grown across replications, so the computations in each replication are not strictly independent of one another.
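As noted above, distributing the replications across cores is a one-argument change. A sketch, reusing the same Design, Generate, Analyse, and Summarise objects defined earlier:

```r
# Same simulation, now distributed across the available cores;
# runSimulation() manages the worker setup internally
res <- runSimulation(design=Design, replications=1000, generate=Generate,
                     analyse=Analyse, summarise=Summarise, parallel=TRUE)
```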
Finally, the SimDesign code itself is much less error prone and much safer should errors or warnings arise: the generation and analysis components are better isolated, the output can be saved to external .rds files for inspection at a later time (thereby being more memory efficient), the code base and results are generally more readable, …, the list goes on. All this despite the fact that the working code is nearly identical.
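The .rds storage mentioned above is likewise a flag rather than hand-written I/O; if memory serves, something along these lines (per the SimDesign documentation for runSimulation()):

```r
# Save each condition's replication-level results to external .rds files
# so they can be retrieved later (e.g., via SimResults()) without keeping
# everything in memory
res <- runSimulation(design=Design, replications=1000, generate=Generate,
                     analyse=Analyse, summarise=Summarise,
                     save_results=TRUE)
```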