Simulation: Comparing two estimators and how effectively they recover population parameters

Item factor analysis (IFA) is a common method for determining the effectiveness and structure of items in psychological tests. Two approaches currently exist: full-information IFA via methods from the item response theory (IRT) framework, and limited-information IFA from the structural equation modeling framework. The question is, under which circumstances should each method be used? Here we explore a simple one-factor test structure to determine which approach better recovers the item parameters when varying test length and sample size.

The full-information approach is fit here with the mirt package and the limited-information approach with lavaan. Unfortunately, the mirt package uses a slightly different parameterization of the IFA problem, implementing a logistic model rather than a normal ogive model. Historically, the two response curves have been equated by applying a scaling correction of \(D = 1.702\) to the logistic parameters, which, as can be seen below, makes the response curves very similar (as it turns out, logistic models have slightly thicker tails than ogive models).

To give the limited-information approach the benefit of the doubt, the data will be generated from the normal ogive model (implying that lavaan is fitting the correct model), while mirt is only fitting an approximation to this model, with its logistic parameters translated to the ogive metric through \(D\). From this we will see whether full-information maximum likelihood (FIML) with a logistic model approximates the normal ogive parameters better than the limited-information diagonally weighted least squares (DWLS) approach available in lavaan.

P_logit <- function(a, d, Theta) exp(a * Theta + d) / (1 + exp(a * Theta + d))
P_ogive <- function(a, d, Theta) pnorm(a * Theta + d)

Theta <- seq(-5,5,length.out=200)
a <- 0.5
d <- -.5
D <- 1.702

example <- data.frame(Theta, logit=P_logit(a*D, d*D, Theta), ogive=P_ogive(a, d, Theta))
plot(Theta, example$logit, type = 'l', ylab = 'P', las=1)
lines(Theta, example$ogive, col = 'red')
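As a rough numerical companion to the plot, the largest vertical gap between the two curves can be computed directly. This is only a minimal sketch: it redefines the same two functions so that it runs on its own, and `max_gap` is a name introduced here for illustration.

```r
P_logit <- function(a, d, Theta) exp(a * Theta + d) / (1 + exp(a * Theta + d))
P_ogive <- function(a, d, Theta) pnorm(a * Theta + d)

Theta <- seq(-5, 5, length.out = 200)
a <- 0.5
d <- -.5
D <- 1.702

# largest absolute difference between the rescaled logistic curve
# and the normal ogive curve over the plotted range of Theta
max_gap <- max(abs(P_logit(a * D, d * D, Theta) - P_ogive(a, d, Theta)))
max_gap
```

With \(D = 1.702\) the gap should stay below roughly .01 on the probability scale, which is why the two curves are hard to tell apart in the plot.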

Define the conditions

Start by defining the conditions to be studied. Because we are interested in a number of slopes and intercepts, it is convenient to save these long vectors of numbers in a single object that is passed to the fixed_objects argument. Naturally, these could be included in the simulation source code directly, but they often take up a large amount of space (potentially hundreds of lines), so passing a list object to runSimulation() may be more convenient.

library(SimDesign)
# SimFunctions()

sample_sizes <- c(250, 500, 1000)
nitems <- c(10, 20)
Design <- createDesign(sample_size = sample_sizes, 
                       nitems = nitems)

# create list of additional parameters which are fixed across conditions
set.seed(1)
pars_10 <- rbind(a = round(rlnorm(10, .3, .5)/1.702, 2),
                 d = round(rnorm(10, 0, .5)/1.702, 2))
pars_20 <- rbind(a = round(rlnorm(20, .3, .5)/1.702, 2),
                 d = round(rnorm(20, 0, .5)/1.702, 2))
(pars <- list(ten=pars_10, twenty=pars_20))
## $ten
##   [,1] [,2]  [,3]  [,4] [,5]  [,6] [,7] [,8] [,9] [,10]
## a 0.58 0.87  0.52  1.76 0.94  0.53 1.01 1.15 1.06  0.68
## d 0.44 0.11 -0.18 -0.65 0.33 -0.01 0.00 0.28 0.24  0.17
## 
## $twenty
##    [,1]  [,2] [,3] [,4]  [,5]  [,6] [,7] [,8]  [,9] [,10] [,11] [,12] [,13]
## a  1.26  1.17 0.82 0.29  1.08  0.77 0.73 0.38  0.62  0.98  1.56  0.75  0.96
## d -0.05 -0.07 0.20 0.16 -0.20 -0.21 0.11 0.23 -0.03  0.26  0.12 -0.18  0.10
##   [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## a  0.77  0.40  0.64  0.65  0.77  1.37  1.16
## d -0.33  0.42  0.58 -0.11 -0.31  0.17 -0.04
# The above slopes, when standardized, give the following factor loadings and communalities (not run): 
a10 <- pars_10['a', ]; a20 <- pars_20['a', ]
load10 <- a10 / sqrt(1 + a10^2)
load20 <- a20 / sqrt(1 + a20^2)
rbind(F=load10, h2=load10^2)
rbind(F=load20, h2=load20^2)
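The standardization used here, \(\lambda = a / \sqrt{1 + a^2}\), has the inverse \(a = \lambda / \sqrt{1 - \lambda^2}\), which is the conversion applied later to the lavaan loadings. The sketch below checks the round trip on the ten-item slopes printed above; `a_back` is a name introduced here for illustration.

```r
# ten-item slopes copied from the printed pars$ten output
a10 <- c(0.58, 0.87, 0.52, 1.76, 0.94, 0.53, 1.01, 1.15, 1.06, 0.68)

# slopes -> standardized loadings
load10 <- a10 / sqrt(1 + a10^2)

# standardized loadings -> slopes (inverse transformation)
a_back <- load10 / sqrt(1 - load10^2)

# TRUE when the round trip recovers the original slopes
all.equal(a10, a_back)
```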

Define the functions

As usual, define the functions of interest. The lavaan and mirt packages are made available inside these functions by passing their names to the packages argument of runSimulation() (required when running simulations in parallel). Only the slope parameters are collected, mainly because they are the most interesting to study (intercepts become more important in IRT methods).

Generate <- function(condition, fixed_objects = NULL) {

    N <- condition$sample_size
    nitems <- condition$nitems
    nitems_name <- ifelse(nitems == 10, 'ten', 'twenty')

    #extract objects from fixed_objects
    a <- fixed_objects[[nitems_name]]['a', ]
    d <- fixed_objects[[nitems_name]]['d', ]

    dat <- matrix(NA, N, nitems)
    colnames(dat) <- paste0('item_', 1:nitems)
    Theta <- rnorm(N)
    for(j in 1:nitems){
        p <- P_ogive(a[j], d[j], Theta)
        for(i in 1:N)
            dat[i,j] <- sample(c(1,0), 1, prob = c(p[i], 1 - p[i]))
    }
    as.data.frame(dat) #data.frame works nicer with lavaan
}
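The double loop in Generate() draws one response at a time. The same normal ogive data could also be drawn in a vectorized way, sketched below under the assumption that only the response matrix is needed. `generate_vectorized()` is a hypothetical helper, not part of the original simulation, and it consumes random numbers in a different order than Generate(), so the two will not produce identical data for the same seed.

```r
# hypothetical vectorized alternative to the loop in Generate()
generate_vectorized <- function(N, a, d) {
    J <- length(a)
    Theta <- rnorm(N)
    # N x J matrix of normal ogive probabilities, pnorm(a_j * Theta_i + d_j)
    P <- pnorm(outer(Theta, a) + matrix(d, N, J, byrow = TRUE))
    # one Bernoulli draw per cell; rbinom() and matrix() both run column-wise
    dat <- matrix(rbinom(N * J, size = 1, prob = P), N, J)
    colnames(dat) <- paste0('item_', seq_len(J))
    as.data.frame(dat)
}

# ten-item parameters copied from the printed pars$ten output
a <- c(0.58, 0.87, 0.52, 1.76, 0.94, 0.53, 1.01, 1.15, 1.06, 0.68)
d <- c(0.44, 0.11, -0.18, -0.65, 0.33, -0.01, 0.00, 0.28, 0.24, 0.17)

set.seed(1234)
dat <- generate_vectorized(100000, a, d)

# sanity check: observed proportions against the implied marginal
# probabilities pnorm(d / sqrt(1 + a^2)) when Theta ~ N(0, 1)
round(rbind(observed = colMeans(dat), implied = pnorm(d / sqrt(1 + a^2))), 3)
```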

Analyse <- function(condition, dat, fixed_objects = NULL) {
    nitems <- condition$nitems

    # (optional) could use better starting values from fixed_objects here too
    mod <- mirt(dat, 1L, verbose=FALSE)
    if(!extract.mirt(mod, 'converged')) stop('mirt did not converge')
    cfs <- coef(mod, simplify = TRUE, digits = Inf)
    FIML_as <- cfs$items[,1L] / 1.702

    lavmod <- paste0('F =~ ', paste0('NA*', colnames(dat)[1L], ' + '),
                     paste0(colnames(dat)[-1L], collapse = ' + '),
                     '\nF ~~ 1*F')
    lmod <- sem(lavmod, dat, ordered = colnames(dat))
    if(!lavInspect(lmod, 'converged')) stop('lavaan did not converge')
    cfs2 <- coef(lmod)
    DWLS_alpha <- cfs2[1L:nitems]
    const <- sqrt(1 - DWLS_alpha^2)
    DWLS_as <- DWLS_alpha / const

    ret <- c(FIML_as=unname(FIML_as), DWLS_as=unname(DWLS_as))
    ret
}

Summarise <- function(condition, results, fixed_objects = NULL) {
    nitems <- condition$nitems
    nitems_name <- ifelse(nitems == 10, 'ten', 'twenty')

    #extract objects from fixed_objects
    a <- fixed_objects[[nitems_name]]['a', ]
    pop <- c(a, a)

    obt_bias <- bias(results, pop)
    obt_RMSE <- RMSE(results, pop)
    ret <- c(bias=obt_bias, RMSE=obt_RMSE)
    ret
}
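To make the two summary statistics concrete, the sketch below computes them in base R for a toy matrix of estimates. It assumes that bias() and RMSE(), with their default arguments, return the column-wise mean deviation and root mean squared deviation of the estimates from the population values. The toy data, `pop` values, and object names are hypothetical.

```r
# toy replications-by-parameters matrix standing in for `results`
set.seed(123)
pop <- c(0.5, 1.0, 1.5)                    # hypothetical population slopes
results <- matrix(rnorm(300, mean = rep(pop, each = 100), sd = 0.1),
                  nrow = 100, ncol = 3)    # column j is centred on pop[j]

# estimate minus parameter, column by column
deviations <- sweep(results, 2, pop)

toy_bias <- colMeans(deviations)           # average signed deviation
toy_RMSE <- sqrt(colMeans(deviations^2))   # root mean squared deviation
rbind(bias = toy_bias, RMSE = toy_RMSE)
```

Under this definition a positive bias means the parameter was over-estimated on average, and the RMSE can never be smaller than the absolute bias.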

Notice that a manual error is thrown whenever a lavaan or mirt model fails to converge, for example because it reached its maximum number of iterations. This is required because the fitted objects themselves do not throw errors in this situation, but the data should be redrawn anyway to ensure that the simulation only contains models that have successfully converged.
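The general idea of discarding failed fits and redrawing can be illustrated with a toy loop in base R. This is not SimDesign's internal code, only a sketch of the pattern: `fit_once()` is a hypothetical stand-in for a model fit that fails to converge about 30% of the time.

```r
# hypothetical stand-in for a model fit that sometimes fails to converge
fit_once <- function() {
    converged <- runif(1) > 0.3
    if(!converged) stop('model did not converge')
    rnorm(1)   # pretend parameter estimate
}

set.seed(42)
n_target <- 20          # number of successful replications wanted
estimates <- numeric(0)
n_failed <- 0

while(length(estimates) < n_target) {
    # an error is caught and converted to NULL so the loop can redraw
    out <- tryCatch(fit_once(), error = function(e) NULL)
    if(is.null(out)) {
        n_failed <- n_failed + 1
    } else {
        estimates <- c(estimates, out)
    }
}

length(estimates)   # always equals n_target
n_failed            # number of discarded draws
```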

Run the simulation

Because this simulation takes considerably longer to run, it is recommended to pass save = TRUE to temporarily save the results in case of interruptions such as power outages. An interrupted simulation can be continued by rerunning the identical simulation code; the function will automatically detect whether any temporary files are available and resume the simulation at the previously saved location.

res <- runSimulation(Design, replications=100, verbose=FALSE, save=TRUE, parallel=TRUE,
                     generate=Generate, analyse=Analyse, summarise=Summarise, 
                     packages=c('mirt', 'lavaan'), fixed_objects=pars)
## This is lavaan 0.6-9
## lavaan is FREE software! Please report any bugs.
res
## # A tibble: 6 × 86
##   sample_size nitems bias.FIML_as1 bias.FIML_as2 bias.FIML_as3 bias.FIML_as4
##         <dbl>  <dbl>         <dbl>         <dbl>         <dbl>         <dbl>
## 1         250     10      -0.00804       0.0263       -0.0139        0.161  
## 2         500     10      -0.0183        0.00165      -0.0168        0.0521 
## 3        1000     10      -0.0109       -0.0151       -0.0169        0.0856 
## 4         250     20       0.0479        0.0387        0.0112       -0.0121 
## 5         500     20       0.0413        0.0540        0.00919      -0.00589
## 6        1000     20       0.0275        0.0360       -0.00946      -0.00443
## # … with 80 more variables: bias.FIML_as5 <dbl>, bias.FIML_as6 <dbl>,
## #   bias.FIML_as7 <dbl>, bias.FIML_as8 <dbl>, bias.FIML_as9 <dbl>,
## #   bias.FIML_as10 <dbl>, bias.DWLS_as1 <dbl>, bias.DWLS_as2 <dbl>,
## #   bias.DWLS_as3 <dbl>, bias.DWLS_as4 <dbl>, bias.DWLS_as5 <dbl>,
## #   bias.DWLS_as6 <dbl>, bias.DWLS_as7 <dbl>, bias.DWLS_as8 <dbl>,
## #   bias.DWLS_as9 <dbl>, bias.DWLS_as10 <dbl>, RMSE.FIML_as1 <dbl>,
## #   RMSE.FIML_as2 <dbl>, RMSE.FIML_as3 <dbl>, RMSE.FIML_as4 <dbl>, …

Analyze the results

Sometimes, reshaping and indexing your output can be very helpful. Here we break the analysis into two parts, though other strategies are certainly possible. Because analyzing simulations is a lot like analyzing empirical data, no single strategy is necessarily the best; you have to use judgment.

For this particular analysis, we can see that the res object contains NA values for slope parameters that were not applicable (items 11 to 20 in the ten-item conditions). For ease of manipulating the results, it is often convenient to subset the results so that these NAs are no longer present.

res10 <- subset(res, nitems == 10)
res10 <- res10[,!as.vector(is.na(res10[1L, ]))]
res20 <- subset(res, nitems == 20)
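The column-dropping step can be seen in isolation on a small toy data frame. The object `toy` and its column names are hypothetical; the indexing mirrors the expression used for res10.

```r
# toy results: est2 only exists in the 20-item condition
toy <- data.frame(nitems = c(10, 10, 20),
                  est1   = c(0.1, 0.2, 0.3),
                  est2   = c(NA, NA, 0.4))

toy10 <- subset(toy, nitems == 10)

# TRUE for columns whose first row is not NA
keep <- !as.vector(is.na(toy10[1L, ]))
toy10 <- toy10[, keep]

colnames(toy10)
```

Only the first row is inspected, which works here because a column that is NA for one ten-item condition is NA for all of them.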

Ten items

# bias in slopes
names10 <- colnames(res10)
bias_as_fiml <- t(res10[,grepl('bias\\.', names10) & grepl('\\_as', names10) & 
                       grepl('FIML', names10)])
colnames(bias_as_fiml) <- sample_sizes
rownames(bias_as_fiml) <- pars_10['a', ]

bias_as_dwls <- t(res10[,grepl('bias\\.', names10) & grepl('\\_as', names10) & 
                       grepl('DWLS', names10)])
colnames(bias_as_dwls) <- sample_sizes
rownames(bias_as_dwls) <- pars_10['a', ]

(out <- list(FIML=bias_as_fiml, DWLS=bias_as_dwls))
## $FIML
##           250      500     1000
## 0.58 -0.00804 -0.01833 -0.01092
## 0.87  0.02630  0.00165 -0.01512
## 0.52 -0.01388 -0.01679 -0.01691
## 1.76  0.16054  0.05211  0.08563
## 0.94  0.02608 -0.00237  0.00710
## 0.53 -0.01736 -0.00213 -0.01969
## 1.01  0.02802 -0.01111 -0.00583
## 1.15  0.05029  0.01756  0.01935
## 1.06  0.02388 -0.00195  0.01391
## 0.68 -0.00233 -0.00263 -0.02656
## 
## $DWLS
##          250       500      1000
## 0.58 0.00480 -0.005374  6.09e-04
## 0.87 0.04113  0.014565 -1.39e-03
## 0.52 0.00814  0.003822  1.64e-03
## 1.76 0.08295 -0.000971  1.79e-02
## 0.94 0.03609  0.003868  1.19e-02
## 0.53 0.00523  0.020524  1.27e-03
## 1.01 0.03786 -0.001602 -6.03e-05
## 1.15 0.04606  0.013637  1.44e-02
## 1.06 0.02711  0.000582  1.45e-02
## 0.68 0.01862  0.015010 -9.37e-03
sapply(out, colMeans)
##        FIML    DWLS
## 250  0.0274 0.03080
## 500  0.0016 0.00641
## 1000 0.0031 0.00514
# RMSE in slopes
RMSE_as_fiml <- t(res10[,grepl('RMSE\\.', names10) & grepl('\\_as', names10) & 
                       grepl('FIML', names10)])
colnames(RMSE_as_fiml) <- sample_sizes
rownames(RMSE_as_fiml) <- pars_10['a', ]

RMSE_as_dwls <- t(res10[,grepl('RMSE\\.', names10) & grepl('\\_as', names10) & 
                       grepl('DWLS', names10)])
colnames(RMSE_as_dwls) <- sample_sizes
rownames(RMSE_as_dwls) <- pars_10['a', ]

(out <- list(FIML=RMSE_as_fiml, DWLS=RMSE_as_dwls))
## $FIML
##        250    500   1000
## 0.58 0.126 0.0794 0.0605
## 0.87 0.138 0.1071 0.0810
## 0.52 0.134 0.0817 0.0567
## 1.76 0.480 0.2795 0.2271
## 0.94 0.175 0.1052 0.0868
## 0.53 0.109 0.0810 0.0560
## 1.01 0.194 0.1237 0.0871
## 1.15 0.237 0.1411 0.1049
## 1.06 0.185 0.1364 0.0962
## 0.68 0.138 0.1012 0.0615
## 
## $DWLS
##        250    500   1000
## 0.58 0.125 0.0764 0.0589
## 0.87 0.138 0.1046 0.0769
## 0.52 0.136 0.0820 0.0546
## 1.76 0.413 0.2766 0.1955
## 0.94 0.174 0.1035 0.0855
## 0.53 0.108 0.0855 0.0523
## 1.01 0.188 0.1183 0.0834
## 1.15 0.223 0.1389 0.0983
## 1.06 0.176 0.1295 0.0902
## 0.68 0.139 0.1022 0.0553
sapply(out, colMeans)
##        FIML   DWLS
## 250  0.1914 0.1820
## 500  0.1236 0.1218
## 1000 0.0918 0.0851

Both methods recovered the slope parameters fairly well: average bias was generally smaller at the larger sample sizes, and the RMSEs decreased steadily as \(N\) increased, indicating more efficient recovery. Additionally, there appeared to be an effect relating to the size of the parameters. With FIML, the largest slope (1.76) was recovered with greater bias than with DWLS, though both estimators had more difficulty recovering the more extreme slopes (see the RMSE plot below).

library(ggplot2)
plt <- data.frame(pars = c(pars_10['a', ], pars_10['a', ]), 
                  RMSE = c(RMSE_as_fiml[,'1000'], RMSE_as_dwls[,'1000']),
                  bias = c(bias_as_fiml[,'1000'], bias_as_dwls[,'1000']),
                  estimator = rep(c('FIML', 'DWLS'), each = 10))
ggplot(plt, aes(pars, bias, colour=estimator)) + geom_point(size=2) + facet_wrap(~estimator) + 
    ggtitle('slope sizes by bias for FIML and DWLS estimators')

ggplot(plt, aes(pars, RMSE, colour=estimator)) + geom_point(size=2) + facet_wrap(~estimator) + 
    ggtitle('slope sizes by RMSE for FIML and DWLS estimators')
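As a numerical complement to the plots, the association between slope size and bias can be summarised with a simple correlation. The sketch below re-enters the ten-item slopes and the \(N = 1000\) bias columns from the tables printed above so that it runs on its own; the object names are introduced here for illustration, and with only ten items the correlations should be read as descriptive.

```r
# ten-item slopes and the N = 1000 bias columns, copied from the printed tables
a10 <- c(0.58, 0.87, 0.52, 1.76, 0.94, 0.53, 1.01, 1.15, 1.06, 0.68)
bias_fiml_1000 <- c(-0.01092, -0.01512, -0.01691, 0.08563, 0.00710,
                    -0.01969, -0.00583, 0.01935, 0.01391, -0.02656)
bias_dwls_1000 <- c(6.09e-04, -1.39e-03, 1.64e-03, 1.79e-02, 1.19e-02,
                    1.27e-03, -6.03e-05, 1.44e-02, 1.45e-02, -9.37e-03)

# correlation between slope size and bias for each estimator
c(FIML = cor(a10, bias_fiml_1000), DWLS = cor(a10, bias_dwls_1000))

# average bias, matching the N = 1000 row of sapply(out, colMeans)
c(FIML = mean(bias_fiml_1000), DWLS = mean(bias_dwls_1000))
```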

Twenty items

# bias in slopes
names20 <- colnames(res20)
bias_as_fiml <- t(res20[,grepl('bias\\.', names20) & grepl('\\_as', names20) & 
                       grepl('FIML', names20)])
colnames(bias_as_fiml) <- sample_sizes
rownames(bias_as_fiml) <- pars_20['a', ]

bias_as_dwls <- t(res20[,grepl('bias\\.', names20) & grepl('\\_as', names20) & 
                       grepl('DWLS', names20)])
colnames(bias_as_dwls) <- sample_sizes
rownames(bias_as_dwls) <- pars_20['a', ]

(out <- list(FIML=bias_as_fiml, DWLS=bias_as_dwls))
## $FIML
##           250       500     1000
## 1.26  0.04787  0.041337  0.02749
## 1.17  0.03874  0.053996  0.03596
## 0.82  0.01122  0.009190 -0.00946
## 0.29 -0.01205 -0.005892 -0.00443
## 1.08  0.01842  0.000435  0.01120
## 0.77  0.00835  0.005267 -0.01502
## 0.73 -0.01164 -0.016506  0.00194
## 0.38 -0.00914 -0.010466 -0.00913
## 0.62  0.00203 -0.012789 -0.00421
## 0.98  0.01990  0.012321  0.00926
## 1.56  0.11297  0.095413  0.04898
## 0.75  0.03109 -0.015747 -0.00892
## 0.96  0.00186 -0.007785  0.00871
## 0.77  0.00563  0.003424 -0.00323
## 0.4   0.00759  0.003174 -0.00175
## 0.64  0.02455 -0.005537  0.00708
## 0.65 -0.02924 -0.022061 -0.01312
## 0.77 -0.01248 -0.005322 -0.00813
## 1.37  0.06518  0.043794  0.04483
## 1.16  0.06326  0.022888  0.02453
## 
## $DWLS
##           250       500      1000
## 1.26  0.03294  0.025216  0.006480
## 1.17  0.03372  0.043421  0.021145
## 0.82  0.02711  0.021421 -0.001537
## 0.29  0.00505  0.009825  0.009373
## 1.08  0.01865 -0.000621  0.002970
## 0.77  0.02478  0.017953 -0.004609
## 0.73  0.00709 -0.000806  0.014526
## 0.38  0.01090  0.005563  0.007049
## 0.62  0.01947  0.005962  0.011918
## 0.98  0.02791  0.013298  0.006162
## 1.56  0.06900  0.050940 -0.000441
## 0.75  0.04705 -0.001845  0.002524
## 0.96  0.01012 -0.002236  0.010214
## 0.77  0.01393  0.011911  0.003857
## 0.4   0.02116  0.013804  0.009199
## 0.64  0.03353  0.001536  0.009720
## 0.65 -0.00905 -0.005399  0.002592
## 0.77  0.00361  0.005634  0.000998
## 1.37  0.04653  0.019473  0.010139
## 1.16  0.05524  0.015791  0.011361
sapply(out, colMeans)
##         FIML    DWLS
## 250  0.01920 0.02494
## 500  0.00946 0.01254
## 1000 0.00713 0.00668
# RMSE in slopes
RMSE_as_fiml <- t(res20[,grepl('RMSE\\.', names20) & grepl('\\_as', names20) & 
                       grepl('FIML', names20)])
colnames(RMSE_as_fiml) <- sample_sizes
rownames(RMSE_as_fiml) <- pars_20['a', ]

RMSE_as_dwls <- t(res20[,grepl('RMSE\\.', names20) & grepl('\\_as', names20) & 
                       grepl('DWLS', names20)])
colnames(RMSE_as_dwls) <- sample_sizes
rownames(RMSE_as_dwls) <- pars_20['a', ]

(out <- list(FIML=RMSE_as_fiml, DWLS=RMSE_as_dwls))
## $FIML
##         250    500   1000
## 1.26 0.2168 0.1551 0.0982
## 1.17 0.1734 0.1471 0.1051
## 0.82 0.1301 0.0952 0.0598
## 0.29 0.0838 0.0643 0.0443
## 1.08 0.1635 0.1182 0.0867
## 0.77 0.1344 0.0891 0.0570
## 0.73 0.1237 0.0889 0.0663
## 0.38 0.1069 0.0756 0.0470
## 0.62 0.1173 0.0772 0.0561
## 0.98 0.1661 0.1097 0.0761
## 1.56 0.2559 0.2244 0.1346
## 0.75 0.1440 0.0967 0.0665
## 0.96 0.1480 0.1089 0.0860
## 0.77 0.1411 0.0854 0.0649
## 0.4  0.1245 0.0864 0.0461
## 0.64 0.1278 0.0907 0.0702
## 0.65 0.1155 0.0830 0.0646
## 0.77 0.1426 0.0864 0.0630
## 1.37 0.2315 0.1713 0.1308
## 1.16 0.2109 0.1285 0.0853
## 
## $DWLS
##         250    500   1000
## 1.26 0.1970 0.1423 0.0880
## 1.17 0.1642 0.1369 0.0955
## 0.82 0.1316 0.0928 0.0558
## 0.29 0.0869 0.0672 0.0470
## 1.08 0.1515 0.1121 0.0821
## 0.77 0.1343 0.0906 0.0547
## 0.73 0.1248 0.0857 0.0667
## 0.38 0.1110 0.0761 0.0478
## 0.62 0.1175 0.0765 0.0560
## 0.98 0.1607 0.1048 0.0720
## 1.56 0.2206 0.1897 0.1190
## 0.75 0.1452 0.0956 0.0641
## 0.96 0.1412 0.1053 0.0832
## 0.77 0.1371 0.0832 0.0634
## 0.4  0.1282 0.0887 0.0474
## 0.64 0.1316 0.0891 0.0698
## 0.65 0.1119 0.0778 0.0622
## 0.77 0.1408 0.0835 0.0620
## 1.37 0.2159 0.1581 0.1139
## 1.16 0.1980 0.1202 0.0770
sapply(out, colMeans)
##        FIML   DWLS
## 250  0.1529 0.1475
## 500  0.1091 0.1038
## 1000 0.0754 0.0714

Again, the estimators recovered the parameters with similar precision and bias. However, the effect of parameter size is now more evident in the FIML estimator: larger slopes carry progressively more bias and larger RMSE values than under the DWLS estimator. In general, larger slopes tend to be over-estimated when using FIML estimation (their bias values in the table above are positive), and therefore have larger RMSEs than with the DWLS approach.

library(ggplot2)
plt <- data.frame(pars = c(pars_20['a', ], pars_20['a', ]), 
                  RMSE = c(RMSE_as_fiml[,'1000'], RMSE_as_dwls[,'1000']),
                  bias = c(bias_as_fiml[,'1000'], bias_as_dwls[,'1000']),
                  estimator = rep(c('FIML', 'DWLS'), each = 20))
ggplot(plt, aes(pars, bias, colour=estimator)) + geom_point(size=2) + facet_wrap(~estimator) + 
    ggtitle('slope sizes by bias for FIML and DWLS estimators')

ggplot(plt, aes(pars, RMSE, colour=estimator)) + geom_point(size=2) + facet_wrap(~estimator) + 
    ggtitle('slope sizes by RMSE for FIML and DWLS estimators')
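One compact way to compare the two estimators across all six conditions is the ratio of their average RMSEs. The sketch below re-enters the values printed by sapply(out, colMeans) for the ten- and twenty-item analyses; `mean_RMSE` and `ratio` are names introduced here for illustration.

```r
# average RMSEs copied from the printed sapply(out, colMeans) output
mean_RMSE <- data.frame(
    sample_size = rep(c(250, 500, 1000), times = 2),
    nitems      = rep(c(10, 20), each = 3),
    FIML        = c(0.1914, 0.1236, 0.0918, 0.1529, 0.1091, 0.0754),
    DWLS        = c(0.1820, 0.1218, 0.0851, 0.1475, 0.1038, 0.0714))

# ratios above 1 indicate a smaller average RMSE for DWLS
mean_RMSE$ratio <- mean_RMSE$FIML / mean_RMSE$DWLS
mean_RMSE
```

In these results every ratio is above 1 but below 1.1, so DWLS was slightly more efficient on average in each condition, in line with the data having been generated from the normal ogive model that lavaan fits.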