# if you haven't done so, install the cmdstanr package using the following:
# install.packages("cmdstanr",
# repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
#
# and follow all the necessary setup instructions to get the package working
Pre-compiled Binary Files in Simulation Studies
Reusing pre-compiled binaries in simulation studies
The following example demonstrate the logic of how to reuse pre-compiled binary files in the context of Monte Carlo simulations. The goal in this case is to:
- Pre-compile the necessary binary files before running
runSimulation()
, - Include tractable extraction information in the
Design
/fixed_objects
inputs to point to and extract the necessary binary object associated with the respective simulation conditions under investigation, respectively, and - Sit around for a while, as no doubt the use of pre-compiled binaries suggests the computational problem is already intensive!
The following code example is an adaption of Mark Lai’s example using cmdstanr
alongside SimDesign
. See https://mc-stan.org/cmdstanr/articles/cmdstanr.html for further information on using the cmdstanr
package.
Example using Stan
Step 1: Define and compile binary files
The first step in the process would be to generate all the necessary Stan objects and store them into a meaningfully named list. Below two objects are generated to perform the same estimation of a population mean, however the first uses a weak normal prior (\(\mu \sim N(0, 10)\)) while the latter uses a much stronger prior (\(\mu \sim N(0,2\))).
library(cmdstanr)
<- "
bmean_stan_weak data {
int<lower=0> N;
array[N] real x;
}
parameters {
real mu;
real<lower=0> sigma;
}
model {
target += normal_lpdf(mu | 0, 10); // weakly informative prior
target += normal_lpdf(x | mu, sigma);
}
"
<- "
bmean_stan_strong data {
int<lower=0> N;
array[N] real x;
}
parameters {
real mu;
real<lower=0> sigma;
}
model {
target += normal_lpdf(mu | 0, 2); // more informative prior
target += normal_lpdf(x | mu, sigma);
}
"
# Save files and compile
<- write_stan_file(bmean_stan_weak)
stan_path_weak <- write_stan_file(bmean_stan_strong)
stan_path_strong
# compile and store within fixed objects for use in runSimulation()
<- list(weak=cmdstan_model(stan_path_weak),
fo strong=cmdstan_model(stan_path_strong))
In file included from stan/src/stan/model/model_header.hpp:5:
stan/src/stan/model/model_base_crtp.hpp:159:20: warning: 'stan::math::var stan::model::model_base_crtp<M>::log_prob(std::vector<stan::math::var_value<double>, std::allocator<stan::math::var_value<double> > >&, std::vector<int>&, std::ostream*) const [with M = model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model; stan::math::var = stan::math::var_value<double>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
159 | inline math::var log_prob(std::vector<math::var>& theta,
| ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-11544854df1.hpp:350: note: by 'model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model::log_prob'
350 | log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
|
stan/src/stan/model/model_base_crtp.hpp:154:17: warning: 'double stan::model::model_base_crtp<M>::log_prob(std::vector<double>&, std::vector<int>&, std::ostream*) const [with M = model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
154 | inline double log_prob(std::vector<double>& theta, std::vector<int>& theta_i,
| ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-11544854df1.hpp:350: note: by 'model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model::log_prob'
350 | log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
|
stan/src/stan/model/model_base_crtp.hpp:96:20: warning: 'stan::math::var stan::model::model_base_crtp<M>::log_prob(Eigen::Matrix<stan::math::var_value<double>, -1, 1>&, std::ostream*) const [with M = model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model; stan::math::var = stan::math::var_value<double>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
96 | inline math::var log_prob(Eigen::Matrix<math::var, -1, 1>& theta,
| ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-11544854df1.hpp:350: note: by 'model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model::log_prob'
350 | log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
|
stan/src/stan/model/model_base_crtp.hpp:91:17: warning: 'double stan::model::model_base_crtp<M>::log_prob(Eigen::VectorXd&, std::ostream*) const [with M = model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model; Eigen::VectorXd = Eigen::Matrix<double, -1, 1>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
91 | inline double log_prob(Eige
n::VectorXd& theta,
| ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-11544854df1.hpp:350: note: by 'model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model::log_prob'
350 | log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
|
In file included from stan/src/stan/model/model_header.hpp:5:
stan/src/stan/model/model_base_crtp.hpp:159:20: warning: 'stan::math::var stan::model::model_base_crtp<M>::log_prob(std::vector<stan::math::var_value<double>, std::allocator<stan::math::var_value<double> > >&, std::vector<int>&, std::ostream*) const [with M = model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model; stan::math::var = stan::math::var_value<double>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
159 | inline math::var log_prob(std::vector<math::var>& theta,
| ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-1154722665dd.hpp:350: note: by 'model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model::log_prob'
350 | log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
|
stan/src/stan/model/model_base_crtp.hpp:154:17: warning: 'double stan::model::model_base_crtp<M>::log_prob(std::vector<double>&, std::vector<int>&, std::ostream*) const [with M = model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
154 | inline double log_prob(std::vector<double>& theta, std::vector<int>& theta_i,
| ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-1154722665dd.hpp:350: note: by 'model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model::log_prob'
350 | log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
|
stan/src/stan/model/model_base_crtp.hpp:96:20: warning: 'stan::math::var stan::model::model_base_crtp<M>::log_prob(Eigen::Matrix<stan::math::var_value<double>, -1, 1>&, std::ostream*) const [with M = model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model; stan::math::var = stan::math::var_value<double>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
96 | inline math::var log_prob(Eigen::Matrix<math::var, -1, 1>& theta,
| ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-1154722665dd.hpp:350: note: by 'model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model::log_prob'
350 | log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
|
stan/src/stan/model/model_base_crtp.hpp:91:17: warning: 'double stan::model::model_base_crtp<M>::log_prob(Eigen::VectorXd&, std::ostrea
m*) const [with M = model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model; Eigen::VectorXd = Eigen::Matrix<double, -1, 1>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
91 | inline double log_prob(Eigen::VectorXd& theta,
| ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-1154722665dd.hpp:350: note: by 'model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model::log_prob'
350 | log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
|
Step 2: Define suitable simulation flow
Elements in the above fo
object (passed to runSimulation()
via the fixed_objects
input) are correctly pointed to when needed.
# function to run MCMC and extract associated mean estimate
<- function(mod, dat){
stan_fit
# Stan is noisy, so tell it to be more quiet()
<- quiet(mod$sample(list(x = dat, N = length(dat)),
mcmc refresh = 0, chains = 1,
show_messages = FALSE))
if (mcmc$summary("mu")[1, "rhat"] > 1.01) {
# Gives an error when the convergence criterion is not met
stop("MCMC did not converge.")
}<- mcmc$summary("mu", mean)$mean[1]
MB
MB }
# SimDesign::SimFunctions()
library(SimDesign)
<- createDesign(sample_size = c(30, 60, 120, 240),
Design distribution = c('norm', 'chi'))
Design
# A tibble: 8 × 2
sample_size distribution
<dbl> <chr>
1 30 norm
2 60 norm
3 120 norm
4 240 norm
5 30 chi
6 60 chi
7 120 chi
8 240 chi
<- function(condition, fixed_objects = NULL) {
Generate Attach(condition)
if(distribution == 'norm'){
<- rnorm(sample_size, mean = 3)
dat else if(distribution == 'chi'){
} <- rchisq(sample_size, df = 3)
dat
}
dat
}
<- function(condition, dat, fixed_objects = NULL) {
Analyse Attach(condition)
<- mean(dat)
M0 <- mean(dat, trim = .1)
M1 <- mean(dat, trim = .2)
M2 <- median(dat)
med
# extract suitable pre-compiled Stan object
<- stan_fit(fixed_objects$weak, dat)
MB_weak <- stan_fit(fixed_objects$strong, dat)
MB_strong
<- c(mean_no_trim = M0, mean_trim.1 = M1,
ret mean_trim.2 = M2, median = med,
mean_weak_prior = MB_weak,
mean_strong_prior = MB_strong)
ret
}
<- function(condition, results, fixed_objects = NULL) {
Summarise <- bias(results, parameter = 3)
obs_bias <- RMSE(results, parameter = 3)
obs_RMSE <- c(bias = obs_bias, RMSE = obs_RMSE, RE = RE(obs_RMSE))
ret
ret }
Step 3: Run the simulation in parallel
<- runSimulation(Design, replications = 1000, generate = Generate,
res analyse = Analyse, summarise = Summarise,
fixed_objects = fo, parallel=TRUE,
packages = "cmdstanr")
Number of parallel clusters in use: 27
Design: 1/8; Replications: 1000; RAM Used: 65.8 Mb; Total Time: 0.00s
Conditions: sample_size=30, distribution=norm
Design: 2/8; Replications: 1000; RAM Used: 67.3 Mb; Total Time: 01m 52.66s
Conditions: sample_size=60, distribution=norm
Design: 3/8; Replications: 1000; RAM Used: 67.3 Mb; Total Time: 03m 45.57s
Conditions: sample_size=120, distribution=norm
Design: 4/8; Replications: 1000; RAM Used: 67.4 Mb; Total Time: 05m 34.54s
Conditions: sample_size=240, distribution=norm
Design: 5/8; Replications: 1000; RAM Used: 67.4 Mb; Total Time: 07m 27.41s
Conditions: sample_size=30, distribution=chi
Design: 6/8; Replications: 1000; RAM Used: 67.5 Mb; Total Time: 09m 19.83s
Conditions: sample_size=60, distribution=chi
Design: 7/8; Replications: 1000; RAM Used: 67.5 Mb; Total Time: 11m 13.40s
Conditions: sample_size=120, distribution=chi
Design: 8/8; Replications: 1000; RAM Used: 67.6 Mb; Total Time: 13m 9.46s
Conditions: sample_size=240, distribution=chi
Simulation complete. Total execution time: 15m 6.33s
Warning in system2("quarto", "-V", stdout = TRUE, env = paste0("TMPDIR=", :
running command '"quarto"
TMPDIR=C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/file1154453c42e6 -V' had
status 1
res
# A tibble: 8 × 26
sample_size distribution bias.mean_no_trim bias.mean_trim.1 bias.mean_trim.2
<dbl> <chr> <dbl> <dbl> <dbl>
1 30 norm -0.012206 -0.012205 -0.010733
2 60 norm -0.00066471 -0.00058635 -0.00049449
3 120 norm -0.00045496 -0.0013078 -0.0018907
4 240 norm -0.0033323 -0.0030002 -0.0034330
5 30 chi 0.0084255 -0.30383 -0.44208
6 60 chi -0.0032308 -0.33978 -0.48129
7 120 chi -0.0061516 -0.35092 -0.49200
8 240 chi -0.0055454 -0.35250 -0.49620
# ℹ 21 more variables: bias.median <dbl>, bias.mean_weak_prior <dbl>,
# bias.mean_strong_prior <dbl>, RMSE.mean_no_trim <dbl>,
# RMSE.mean_trim.1 <dbl>, RMSE.mean_trim.2 <dbl>, RMSE.median <dbl>,
# RMSE.mean_weak_prior <dbl>, RMSE.mean_strong_prior <dbl>,
# RE.mean_no_trim <dbl>, RE.mean_trim.1 <dbl>, RE.mean_trim.2 <dbl>,
# RE.median <dbl>, RE.mean_weak_prior <dbl>, RE.mean_strong_prior <dbl>,
# REPLICATIONS <dbl>, SIM_TIME <chr>, RAM_USED <chr>, SEED <int>, …
# render output better
::kable(res) knitr
sample_size | distribution | bias.mean_no_trim | bias.mean_trim.1 | bias.mean_trim.2 | bias.median | bias.mean_weak_prior | bias.mean_strong_prior | RMSE.mean_no_trim | RMSE.mean_trim.1 | RMSE.mean_trim.2 | RMSE.median | RMSE.mean_weak_prior | RMSE.mean_strong_prior | RE.mean_no_trim | RE.mean_trim.1 | RE.mean_trim.2 | RE.median | RE.mean_weak_prior | RE.mean_strong_prior | REPLICATIONS | SIM_TIME | RAM_USED | SEED | COMPLETED | ERRORS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
30 | norm | -0.012 | -0.012 | -0.011 | -0.013 | -0.014 | -0.040 | 0.177 | 0.182 | 0.189 | 0.218 | 0.177 | 0.180 | 1 | 1.05 | 1.13 | 1.52 | 1.001 | 1.031 | 1000 | 113 | 67.3 Mb | 1.94e+09 | Thu Apr 17 16:24:52 2025 | 54 |
60 | norm | -0.001 | -0.001 | 0.000 | -0.001 | -0.001 | -0.014 | 0.127 | 0.131 | 0.136 | 0.160 | 0.127 | 0.128 | 1 | 1.05 | 1.14 | 1.57 | 0.999 | 1.005 | 1000 | 113 | 67.3 Mb | 1.34e+09 | Thu Apr 17 16:26:45 2025 | 45 |
120 | norm | 0.000 | -0.001 | -0.002 | -0.002 | -0.001 | -0.007 | 0.095 | 0.097 | 0.101 | 0.116 | 0.095 | 0.095 | 1 | 1.05 | 1.13 | 1.50 | 1.000 | 1.002 | 1000 | 109 | 67.4 Mb | 6.70e+08 | Thu Apr 17 16:28:34 2025 | 43 |
240 | norm | -0.003 | -0.003 | -0.003 | -0.005 | -0.003 | -0.006 | 0.068 | 0.069 | 0.071 | 0.081 | 0.068 | 0.068 | 1 | 1.03 | 1.09 | 1.42 | 1.000 | 1.006 | 1000 | 113 | 67.4 Mb | 1.23e+09 | Thu Apr 17 16:30:27 2025 | 38 |
30 | chi | 0.008 | -0.304 | -0.442 | -0.587 | 0.002 | -0.152 | 0.433 | 0.516 | 0.612 | 0.746 | 0.430 | 0.401 | 1 | 1.42 | 2.00 | 2.97 | 0.987 | 0.860 | 1000 | 112 | 67.5 Mb | 1.52e+09 | Thu Apr 17 16:32:20 2025 | 42 |
60 | chi | -0.003 | -0.340 | -0.481 | -0.617 | -0.007 | -0.082 | 0.316 | 0.452 | 0.567 | 0.707 | 0.315 | 0.304 | 1 | 2.05 | 3.23 | 5.01 | 0.992 | 0.925 | 1000 | 114 | 67.5 Mb | 1.13e+09 | Thu Apr 17 16:34:13 2025 | 36 |
120 | chi | -0.006 | -0.351 | -0.492 | -0.628 | -0.008 | -0.044 | 0.219 | 0.406 | 0.533 | 0.670 | 0.219 | 0.215 | 1 | 3.45 | 5.95 | 9.39 | 1.004 | 0.965 | 1000 | 116 | 67.6 Mb | 1.89e+09 | Thu Apr 17 16:36:09 2025 | 42 |
240 | chi | -0.006 | -0.353 | -0.496 | -0.638 | -0.006 | -0.024 | 0.152 | 0.381 | 0.518 | 0.659 | 0.152 | 0.151 | 1 | 6.27 | 11.57 | 18.74 | 1.002 | 0.989 | 1000 | 117 | 67.6 Mb | 1.30e+09 | Thu Apr 17 16:38:06 2025 | 34 |
Notes about parallel processing
Much like the philosophy of constructing simulation code that can be executed using multiple cores in the package, it is generally better to let SimDesign
perform any necessary parallel processing distribution of the workload. Hence, while Stan supports parallel computations and the execution of multiple chains in parallel, it is generally advisable to execute each instance of Stan code on a single core so that within a given simulation replication there will not be any competition for available processor resources.
Notes on convergence of MCMC
The above example uses Stan for Markov Chain Monte Carlo (MCMC) sampling via the Hamiltonian method. While convergence is likely not an issue for this particular example, it is possible that convergence issues will appear in more complex models. One way to ensure the resulting replications have converged is to give an error when some convergence diagnostic statistic, such as \(\hat R\), is below a recommended threshold. In the stan_fit()
function above, an error is thrown whenever \(\hat R\) > 1.01 (see this paper). Another possibility is to call Stan again with more iterations if convergence failed.