Pre-compiled Binary Files in Simulation Studies

Author

Mark Lai and Phil Chalmers

Reusing pre-compiled binaries in simulation studies

The following example demonstrate the logic of how to reuse pre-compiled binary files in the context of Monte Carlo simulations. The goal in this case is to:

Pre-compile the necessary binary files before running runSimulation(),
Include tractable extraction information in the Design/fixed_objects inputs to point to and extract the necessary binary object associated with the respective simulation conditions under investigation, respectively, and
Sit around for a while, as no doubt the use of pre-compiled binaries suggests the computational problem is already intensive!

The following code example is an adaption of Mark Lai’s example using cmdstanr alongside SimDesign. See https://mc-stan.org/cmdstanr/articles/cmdstanr.html for further information on using the cmdstanr package.

Example using Stan

Step 1: Define and compile binary files

# if you haven't done so, install the cmdstanr package using the following:
# install.packages("cmdstanr", 
#   repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
#
# and follow all the necessary setup instructions to get the package working

The first step in the process would be to generate all the necessary Stan objects and store them into a meaningfully named list. Below two objects are generated to perform the same estimation of a population mean, however the first uses a weak normal prior (\(\mu \sim N(0, 10)\)) while the latter uses a much stronger prior (\(\mu \sim N(0,2\))).

library(cmdstanr)

bmean_stan_weak <- "
    data {
        int<lower=0> N;
        array[N] real x;
    }
    parameters {
        real mu;
        real<lower=0> sigma;
    }
    model {
        target += normal_lpdf(mu | 0, 10);  // weakly informative prior
        target += normal_lpdf(x | mu, sigma);
    }
"

bmean_stan_strong <- "
    data {
        int<lower=0> N;
        array[N] real x;
    }
    parameters {
        real mu;
        real<lower=0> sigma;
    }
    model {
        target += normal_lpdf(mu | 0, 2);  // more informative prior
        target += normal_lpdf(x | mu, sigma);
    }
"

# Save files and compile
stan_path_weak <- write_stan_file(bmean_stan_weak)
stan_path_strong <- write_stan_file(bmean_stan_strong)

# compile and store within fixed objects for use in runSimulation()
fo <- list(weak=cmdstan_model(stan_path_weak), 
           strong=cmdstan_model(stan_path_strong))

In file included from stan/src/stan/model/model_header.hpp:5:
stan/src/stan/model/model_base_crtp.hpp:159:20: warning: 'stan::math::var stan::model::model_base_crtp<M>::log_prob(std::vector<stan::math::var_value<double>, std::allocator<stan::math::var_value<double> > >&, std::vector<int>&, std::ostream*) const [with M = model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model; stan::math::var = stan::math::var_value<double>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
  159 |   inline math::var log_prob(std::vector<math::var>& theta,
      |                    ^~~~~~~~

C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-11544854df1.hpp:350: note:   by 'model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model::log_prob'
  350 |   log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
      |

stan/src/stan/model/model_base_crtp.hpp:154:17: warning: 'double stan::model::model_base_crtp<M>::log_prob(std::vector<double>&, std::vector<int>&, std::ostream*) const [with M = model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
  154 |   inline double log_prob(std::vector<double>& theta, std::vector<int>& theta_i,
      |                 ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-11544854df1.hpp:350: note:   by 'model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model::log_prob'
  350 |   log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
      | 
stan/src/stan/model/model_base_crtp.hpp:96:20: warning: 'stan::math::var stan::model::model_base_crtp<M>::log_prob(Eigen::Matrix<stan::math::var_value<double>, -1, 1>&, std::ostream*) const [with M = model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model; stan::math::var = stan::math::var_value<double>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
   96 |   inline math::var log_prob(Eigen::Matrix<math::var, -1, 1>& theta,
      |                    ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-11544854df1.hpp:350: note:   by 'model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model::log_prob'
  350 |   log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
      | 
stan/src/stan/model/model_base_crtp.hpp:91:17: warning: 'double stan::model::model_base_crtp<M>::log_prob(Eigen::VectorXd&, std::ostream*) const [with M = model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model; Eigen::VectorXd = Eigen::Matrix<double, -1, 1>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
   91 |   inline double log_prob(Eige

n::VectorXd& theta,
      |                 ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-11544854df1.hpp:350: note:   by 'model_1ccb90d83ecfc533ac78a7a094930339_model_namespace::model_1ccb90d83ecfc533ac78a7a094930339_model::log_prob'
  350 |   log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
      |

In file included from stan/src/stan/model/model_header.hpp:5:
stan/src/stan/model/model_base_crtp.hpp:159:20: warning: 'stan::math::var stan::model::model_base_crtp<M>::log_prob(std::vector<stan::math::var_value<double>, std::allocator<stan::math::var_value<double> > >&, std::vector<int>&, std::ostream*) const [with M = model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model; stan::math::var = stan::math::var_value<double>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
  159 |   inline math::var log_prob(std::vector<math::var>& theta,
      |                    ^~~~~~~~

C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-1154722665dd.hpp:350: note:   by 'model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model::log_prob'
  350 |   log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
      | 
stan/src/stan/model/model_base_crtp.hpp:154:17: warning: 'double stan::model::model_base_crtp<M>::log_prob(std::vector<double>&, std::vector<int>&, std::ostream*) const [with M = model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
  154 |   inline double log_prob(std::vector<double>& theta, std::vector<int>& theta_i,
      |                 ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-1154722665dd.hpp:350: note:   by 'model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model::log_prob'
  350 |   log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
      | 
stan/src/stan/model/model_base_crtp.hpp:96:20: warning: 'stan::math::var stan::model::model_base_crtp<M>::log_prob(Eigen::Matrix<stan::math::var_value<double>, -1, 1>&, std::ostream*) const [with M = model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model; stan::math::var = stan::math::var_value<double>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
   96 |   inline math::var log_prob(Eigen::Matrix<math::var, -1, 1>& theta,
      |                    ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-1154722665dd.hpp:350: note:   by 'model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model::log_prob'
  350 |   log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
      | 
stan/src/stan/model/model_base_crtp.hpp:91:17: warning: 'double stan::model::model_base_crtp<M>::log_prob(Eigen::VectorXd&, std::ostrea

m*) const [with M = model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model; Eigen::VectorXd = Eigen::Matrix<double, -1, 1>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
   91 |   inline double log_prob(Eigen::VectorXd& theta,
      |                 ^~~~~~~~
C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/model-1154722665dd.hpp:350: note:   by 'model_ce5780470cdd3998dd61add1e9ef0e81_model_namespace::model_ce5780470cdd3998dd61add1e9ef0e81_model::log_prob'
  350 |   log_prob(std::vector<T_>& params_r, std::vector<int>& params_i,
      |

Step 2: Define suitable simulation flow

Elements in the above fo object (passed to runSimulation() via the fixed_objects input) are correctly pointed to when needed.

# function to run MCMC and extract associated mean estimate
stan_fit <- function(mod, dat){
    
    # Stan is noisy, so tell it to be more quiet()
    mcmc <- quiet(mod$sample(list(x = dat, N = length(dat)),
                             refresh = 0, chains = 1, 
                             show_messages = FALSE))
    if (mcmc$summary("mu")[1, "rhat"] > 1.01) {
        # Gives an error when the convergence criterion is not met
        stop("MCMC did not converge.")
    }
    MB <- mcmc$summary("mu", mean)$mean[1]
    MB
}

# SimDesign::SimFunctions()
library(SimDesign)

Design <- createDesign(sample_size = c(30, 60, 120, 240), 
                       distribution = c('norm', 'chi'))
Design

# A tibble: 8 × 2
  sample_size distribution
        <dbl> <chr>       
1          30 norm        
2          60 norm        
3         120 norm        
4         240 norm        
5          30 chi         
6          60 chi         
7         120 chi         
8         240 chi

Generate <- function(condition, fixed_objects = NULL) {
    Attach(condition)
    if(distribution == 'norm'){
        dat <- rnorm(sample_size, mean = 3)
    } else if(distribution == 'chi'){
        dat <- rchisq(sample_size, df = 3)
    }
    dat
}

Analyse <- function(condition, dat, fixed_objects = NULL) {
    Attach(condition)
    
    M0 <- mean(dat)
    M1 <- mean(dat, trim = .1)
    M2 <- mean(dat, trim = .2)
    med <- median(dat)
    
    # extract suitable pre-compiled Stan object
    MB_weak <- stan_fit(fixed_objects$weak, dat)
    MB_strong <- stan_fit(fixed_objects$strong, dat)
    
    ret <- c(mean_no_trim = M0, mean_trim.1 = M1, 
             mean_trim.2 = M2, median = med, 
             mean_weak_prior = MB_weak, 
             mean_strong_prior = MB_strong)
    ret
}

Summarise <- function(condition, results, fixed_objects = NULL) {
    obs_bias <- bias(results, parameter = 3)
    obs_RMSE <- RMSE(results, parameter = 3)
    ret <- c(bias = obs_bias, RMSE = obs_RMSE, RE = RE(obs_RMSE))
    ret
}

Step 3: Run the simulation in parallel

res <- runSimulation(Design, replications = 1000, generate = Generate, 
                     analyse = Analyse, summarise = Summarise, 
                     fixed_objects = fo, parallel=TRUE, 
                     packages = "cmdstanr")


Number of parallel clusters in use: 27



Design: 1/8;   Replications: 1000;   RAM Used: 65.8 Mb;   Total Time: 0.00s 
 Conditions: sample_size=30, distribution=norm


Design: 2/8;   Replications: 1000;   RAM Used: 67.3 Mb;   Total Time: 01m 52.66s 
 Conditions: sample_size=60, distribution=norm


Design: 3/8;   Replications: 1000;   RAM Used: 67.3 Mb;   Total Time: 03m 45.57s 
 Conditions: sample_size=120, distribution=norm


Design: 4/8;   Replications: 1000;   RAM Used: 67.4 Mb;   Total Time: 05m 34.54s 
 Conditions: sample_size=240, distribution=norm


Design: 5/8;   Replications: 1000;   RAM Used: 67.4 Mb;   Total Time: 07m 27.41s 
 Conditions: sample_size=30, distribution=chi


Design: 6/8;   Replications: 1000;   RAM Used: 67.5 Mb;   Total Time: 09m 19.83s 
 Conditions: sample_size=60, distribution=chi


Design: 7/8;   Replications: 1000;   RAM Used: 67.5 Mb;   Total Time: 11m 13.40s 
 Conditions: sample_size=120, distribution=chi


Design: 8/8;   Replications: 1000;   RAM Used: 67.6 Mb;   Total Time: 13m 9.46s 
 Conditions: sample_size=240, distribution=chi


Simulation complete. Total execution time: 15m 6.33s

Warning in system2("quarto", "-V", stdout = TRUE, env = paste0("TMPDIR=", :
running command '"quarto"
TMPDIR=C:/Users/chalmrp/AppData/Local/Temp/RtmpwvuLia/file1154453c42e6 -V' had
status 1

res

# A tibble: 8 × 26
  sample_size distribution bias.mean_no_trim bias.mean_trim.1 bias.mean_trim.2
        <dbl> <chr>                    <dbl>            <dbl>            <dbl>
1          30 norm               -0.012206        -0.012205        -0.010733  
2          60 norm               -0.00066471      -0.00058635      -0.00049449
3         120 norm               -0.00045496      -0.0013078       -0.0018907 
4         240 norm               -0.0033323       -0.0030002       -0.0034330 
5          30 chi                 0.0084255       -0.30383         -0.44208   
6          60 chi                -0.0032308       -0.33978         -0.48129   
7         120 chi                -0.0061516       -0.35092         -0.49200   
8         240 chi                -0.0055454       -0.35250         -0.49620   
# ℹ 21 more variables: bias.median <dbl>, bias.mean_weak_prior <dbl>,
#   bias.mean_strong_prior <dbl>, RMSE.mean_no_trim <dbl>,
#   RMSE.mean_trim.1 <dbl>, RMSE.mean_trim.2 <dbl>, RMSE.median <dbl>,
#   RMSE.mean_weak_prior <dbl>, RMSE.mean_strong_prior <dbl>,
#   RE.mean_no_trim <dbl>, RE.mean_trim.1 <dbl>, RE.mean_trim.2 <dbl>,
#   RE.median <dbl>, RE.mean_weak_prior <dbl>, RE.mean_strong_prior <dbl>,
#   REPLICATIONS <dbl>, SIM_TIME <chr>, RAM_USED <chr>, SEED <int>, …

# render output better
knitr::kable(res)

sample_size	distribution	bias.mean_no_trim	bias.mean_trim.1	bias.mean_trim.2	bias.median	bias.mean_weak_prior	bias.mean_strong_prior	RMSE.mean_no_trim	RMSE.mean_trim.1	RMSE.mean_trim.2	RMSE.median	RMSE.mean_weak_prior	RMSE.mean_strong_prior	RE.mean_no_trim	RE.mean_trim.1	RE.mean_trim.2	RE.median	RE.mean_weak_prior	RE.mean_strong_prior	REPLICATIONS	SIM_TIME	RAM_USED	SEED	COMPLETED	ERRORS
30	norm	-0.012	-0.012	-0.011	-0.013	-0.014	-0.040	0.177	0.182	0.189	0.218	0.177	0.180	1	1.05	1.13	1.52	1.001	1.031	1000	113	67.3 Mb	1.94e+09	Thu Apr 17 16:24:52 2025	54
60	norm	-0.001	-0.001	0.000	-0.001	-0.001	-0.014	0.127	0.131	0.136	0.160	0.127	0.128	1	1.05	1.14	1.57	0.999	1.005	1000	113	67.3 Mb	1.34e+09	Thu Apr 17 16:26:45 2025	45
120	norm	0.000	-0.001	-0.002	-0.002	-0.001	-0.007	0.095	0.097	0.101	0.116	0.095	0.095	1	1.05	1.13	1.50	1.000	1.002	1000	109	67.4 Mb	6.70e+08	Thu Apr 17 16:28:34 2025	43
240	norm	-0.003	-0.003	-0.003	-0.005	-0.003	-0.006	0.068	0.069	0.071	0.081	0.068	0.068	1	1.03	1.09	1.42	1.000	1.006	1000	113	67.4 Mb	1.23e+09	Thu Apr 17 16:30:27 2025	38
30	chi	0.008	-0.304	-0.442	-0.587	0.002	-0.152	0.433	0.516	0.612	0.746	0.430	0.401	1	1.42	2.00	2.97	0.987	0.860	1000	112	67.5 Mb	1.52e+09	Thu Apr 17 16:32:20 2025	42
60	chi	-0.003	-0.340	-0.481	-0.617	-0.007	-0.082	0.316	0.452	0.567	0.707	0.315	0.304	1	2.05	3.23	5.01	0.992	0.925	1000	114	67.5 Mb	1.13e+09	Thu Apr 17 16:34:13 2025	36
120	chi	-0.006	-0.351	-0.492	-0.628	-0.008	-0.044	0.219	0.406	0.533	0.670	0.219	0.215	1	3.45	5.95	9.39	1.004	0.965	1000	116	67.6 Mb	1.89e+09	Thu Apr 17 16:36:09 2025	42
240	chi	-0.006	-0.353	-0.496	-0.638	-0.006	-0.024	0.152	0.381	0.518	0.659	0.152	0.151	1	6.27	11.57	18.74	1.002	0.989	1000	117	67.6 Mb	1.30e+09	Thu Apr 17 16:38:06 2025	34

Notes about parallel processing

Much like the philosophy of constructing simulation code that can be executed using multiple cores in the package, it is generally better to let SimDesign perform any necessary parallel processing distribution of the workload. Hence, while Stan supports parallel computations and the execution of multiple chains in parallel, it is generally advisable to execute each instance of Stan code on a single core so that within a given simulation replication there will not be any competition for available processor resources.

Notes on convergence of MCMC

The above example uses Stan for Markov Chain Monte Carlo (MCMC) sampling via the Hamiltonian method. While convergence is likely not an issue for this particular example, it is possible that convergence issues will appear in more complex models. One way to ensure the resulting replications have converged is to give an error when some convergence diagnostic statistic, such as \(\hat R\), is below a recommended threshold. In the stan_fit() function above, an error is thrown whenever \(\hat R\) > 1.01 (see this paper). Another possibility is to call Stan again with more iterations if convergence failed.