Raju, van der Linden, and Fleer (1995) simulation

Author

Phil Chalmers

Simulation from:

Raju, N. S.; van der Linden, W. J. & Fleer, P. F. (1995). IRT-based internal measures of differential functioning of items and tests. Applied Psychological Measurement, 19, 353-368.

However, this is not a perfect replication of the results presented in this study. Specifically, the authors never actually performed a replication experiment to evaluate their presented statistics, and instead analyzed the results based on single data-set generations only. The following performs a more realistic presentation of the behaviour of the DFIT statistics under the conditions studied. Additionally, estimation is performed using marginal ML estimation via mirt rather than marginal MAP via BILOG-MG with the defaults at the time of publication. This was selected since the authors were interested in the frequentest properties of their detection statistics, and therefore to remove the potential influence of prior parameter distributions it makes more sense to start with non-informative priors.

Finally, a linking method was not adopted since the parameters were already on the same metric at the time of data generation, and therefore is not required (had linking been required, suitable anchor items could have been selected so that the scale of the focal group could be freely estimated).

Simulation code

#-------------------------------------------------------------------

library(SimDesign)

Design <- createDesign(N = c(500, 1000),
                       focal = c('reference', 'focal1', 'focal2', 'focal3'),
                       direction = c('uni', 'bi'))

source('Raju1995_extras.R')  ## read-in large tables of parameter sets
source('DFIT.R')             ## read-in DFIT function for multi-group mirt object

#-------------------------------------------------------------------

Generate <- function(condition, fixed_objects = NULL) {
    Attach(condition)
    refpars <- fixed_objects[[direction]]$reference
    if(direction == 'bi' && focal == 'focal3') # for modified reference group combination
        refpars <- fixed_objects[['bi']]$reference_focal3
    focalpars <- fixed_objects[[direction]][[focal]]
    datref <- simdata(a=refpars[,"a1"], d=refpars[,"d"], N=N/2, itemtype='2PL')
    datfocal <- simdata(a=focalpars[,"a1"], d=focalpars[,"d"], N=N/2, itemtype='2PL')
    dat <- data.frame(group = rep(c('ref', 'focal'), each=N/2),
                      rbind(datref, datfocal))
    dat
}

Analyse <- function(condition, dat, fixed_objects = NULL) {
    group <- dat$group
    items <- subset(dat, select = -group)

    ##  MML estimation instead of MMAP
    ##    Following is equivalent to fitting independent single-group models
    ##    (linking not required since both groups generated from same scale)
    mod <- multipleGroup(items, 1, group=group, verbose=FALSE)
    stopifnot(extract.mirt(mod, 'converged'))

    DTF <- DFIT(mod)
    DIF <- DFIT(mod, DIF = TRUE)

    ret <- c(DTF=DTF$DTF, DTF.p=DTF$p.X2,
             CDIF=DIF$CDIF, NCDIF=DIF$NCDIF, NCDIF.p=DIF$p.X2)
    ret
}

Summarise <- function(condition, results, fixed_objects = NULL) {
    pick <- subset(results, select = c(DTF, CDIF1:CDIF40, NCDIF1:NCDIF40))
    means <- colMeans(pick)
    SDs <- apply(pick, 2, sd)
    pick.p <- subset(results, select = c(DTF.p, NCDIF.p1:NCDIF.p40))
    ps <- EDR(pick.p, .01)
    pick.NCDIF <- subset(results, select = NCDIF1:NCDIF40)
    NCDIF.006 <- colMeans(pick.NCDIF > .006)
    ret <- c(mean=means, SD=SDs, ps, NCDIF.006=unname(NCDIF.006))
    ret
}

#-------------------------------------------------------------------

res <- runSimulation(design=Design, replications=5000, generate=Generate,
                     analyse=Analyse, summarise=Summarise,
                     fixed_objects=fo, packages = 'mirt', parallel=TRUE,
                     save_results = FALSE, filename = 'Raju1995')
res

# A tibble: 16 × 250
       N focal    direction mean.DTF mean.CDIF1 mean.CDIF2 mean.CDIF3 mean.CDIF4
   <dbl> <chr>    <chr>        <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
 1   500 referen… uni        1.6496  -4.4881e-4 -9.4495e-4  6.8298e-4  8.7543e-4
 2  1000 referen… uni        0.79966  3.2853e-4 -4.0062e-4 -3.2408e-4  3.4686e-4
 3   500 focal1   uni        1.8263   3.6271e-3  2.6617e-3  2.6948e-3  2.7975e-3
 4  1000 focal1   uni        1.0031   2.7431e-3  2.7282e-3  2.9469e-3  3.5933e-3
 5   500 focal2   uni        2.4525   5.6284e-3  6.9448e-3  7.7482e-3  7.2218e-3
 6  1000 focal2   uni        1.6140   6.0639e-3  5.8814e-3  7.7320e-3  7.5962e-3
 7   500 focal3   uni        2.8337   8.5511e-3  7.7950e-3  7.5639e-3  7.6831e-3
 8  1000 focal3   uni        2.0253   8.3191e-3  6.9485e-3  7.4085e-3  8.8175e-3
 9   500 referen… bi         1.5796  -1.7019e-3  1.3896e-3  8.3165e-4 -1.1015e-3
10  1000 referen… bi         0.81751 -4.5671e-4  5.5246e-4  1.9432e-4  1.4209e-4
11   500 focal1   bi         1.6012  -8.4113e-4  6.7791e-4  2.1897e-4 -9.7950e-4
12  1000 focal1   bi         0.79338 -4.8063e-4 -1.2972e-4 -4.0402e-4 -3.3937e-4
13   500 focal2   bi         1.6176   4.8432e-5 -7.0522e-4  5.3535e-4  2.0785e-4
14  1000 focal2   bi         0.78514  7.3518e-4 -4.0095e-4 -1.1517e-3  1.5601e-4
15   500 focal3   bi         1.5887  -1.8218e-3 -1.8665e-3 -1.4468e-3 -1.3513e-3
16  1000 focal3   bi         0.82812 -1.5600e-3 -1.7591e-3 -1.5820e-3 -1.1778e-3
# ℹ 242 more variables: mean.CDIF5 <dbl>, mean.CDIF6 <dbl>, mean.CDIF7 <dbl>,
#   mean.CDIF8 <dbl>, mean.CDIF9 <dbl>, mean.CDIF10 <dbl>, mean.CDIF11 <dbl>,
#   mean.CDIF12 <dbl>, mean.CDIF13 <dbl>, mean.CDIF14 <dbl>, mean.CDIF15 <dbl>,
#   mean.CDIF16 <dbl>, mean.CDIF17 <dbl>, mean.CDIF18 <dbl>, mean.CDIF19 <dbl>,
#   mean.CDIF20 <dbl>, mean.CDIF21 <dbl>, mean.CDIF22 <dbl>, mean.CDIF23 <dbl>,
#   mean.CDIF24 <dbl>, mean.CDIF25 <dbl>, mean.CDIF26 <dbl>, mean.CDIF27 <dbl>,
#   mean.CDIF28 <dbl>, mean.CDIF29 <dbl>, mean.CDIF30 <dbl>, …

Extras

Explore the results.

library(SimDesign)

res <- readRDS('Raju1995.rds')

library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

# average close to 1 for DTF, and too powerful (should be at alpha = .01)
res %>%
    group_by(direction, focal) %>%
    summarise(mean(mean.DTF), mean(DTF.p))

`summarise()` has grouped output by 'direction'. You can override using the
`.groups` argument.

# A tibble: 8 × 4
# Groups:   direction [2]
  direction focal     `mean(mean.DTF)` `mean(DTF.p)`
  <chr>     <chr>                <dbl>         <dbl>
1 bi        focal1                1.20         0.749
2 bi        focal2                1.20         0.756
3 bi        focal3                1.21         0.720
4 bi        reference             1.20         0.751
5 uni       focal1                1.41         0.773
6 uni       focal2                2.03         0.825
7 uni       focal3                2.43         0.816
8 uni       reference             1.22         0.753

# Some Type I error examples
res %>%
    group_by(direction, focal) %>%
    summarise(CDF1=mean(mean.CDIF1), sd_CDF1=sd(mean.CDIF1),
              CDF2=mean(mean.CDIF2), sd_CDF2=sd(mean.CDIF2))

`summarise()` has grouped output by 'direction'. You can override using the
`.groups` argument.

# A tibble: 8 × 6
# Groups:   direction [2]
  direction focal           CDF1  sd_CDF1      CDF2   sd_CDF2
  <chr>     <chr>          <dbl>    <dbl>     <dbl>     <dbl>
1 bi        focal1    -0.000661  0.000255  0.000274 0.000571 
2 bi        focal2     0.000392  0.000486 -0.000553 0.000215 
3 bi        focal3    -0.00169   0.000185 -0.00181  0.0000759
4 bi        reference -0.00108   0.000881  0.000971 0.000592 
5 uni       focal1     0.00319   0.000625  0.00269  0.0000470
6 uni       focal2     0.00585   0.000308  0.00641  0.000752 
7 uni       focal3     0.00844   0.000164  0.00737  0.000599 
8 uni       reference -0.0000601 0.000550 -0.000673 0.000385

res %>%
    group_by(direction, focal) %>%
    summarise(NCDF1=mean(mean.NCDIF1), sd_CDF1=sd(mean.NCDIF1),
              NCDF2=mean(mean.NCDIF2), sd_CDF2=sd(mean.NCDIF2))

`summarise()` has grouped output by 'direction'. You can override using the
`.groups` argument.

# A tibble: 8 × 6
# Groups:   direction [2]
  direction focal       NCDF1 sd_CDF1   NCDF2 sd_CDF2
  <chr>     <chr>       <dbl>   <dbl>   <dbl>   <dbl>
1 bi        focal1    0.00314 0.00150 0.00306 0.00143
2 bi        focal2    0.00317 0.00149 0.00316 0.00152
3 bi        focal3    0.00316 0.00149 0.00315 0.00144
4 bi        reference 0.00311 0.00144 0.00308 0.00143
5 uni       focal1    0.00312 0.00147 0.00316 0.00151
6 uni       focal2    0.00316 0.00147 0.00313 0.00146
7 uni       focal3    0.00321 0.00141 0.00329 0.00145
8 uni       reference 0.00310 0.00141 0.00313 0.00148

# These should be at alpha = .01.....wayyyyy too powerful
res %>%
    group_by(direction, focal) %>%
    summarise(mean(NCDIF.p1), mean(NCDIF.p2))

`summarise()` has grouped output by 'direction'. You can override using the
`.groups` argument.

# A tibble: 8 × 4
# Groups:   direction [2]
  direction focal     `mean(NCDIF.p1)` `mean(NCDIF.p2)`
  <chr>     <chr>                <dbl>            <dbl>
1 bi        focal1               0.724            0.717
2 bi        focal2               0.72             0.730
3 bi        focal3               0.709            0.714
4 bi        reference            0.722            0.715
5 uni       focal1               0.724            0.719
6 uni       focal2               0.727            0.725
7 uni       focal3               0.705            0.712
8 uni       reference            0.718            0.726

# Ad-hoc cut-offs not much better, and don't reach alpha = .01
res %>%
    group_by(direction, focal) %>%
    summarise(mean(NCDIF.0061), mean(NCDIF.0062))

`summarise()` has grouped output by 'direction'. You can override using the
`.groups` argument.

# A tibble: 8 × 4
# Groups:   direction [2]
  direction focal     `mean(NCDIF.0061)` `mean(NCDIF.0062)`
  <chr>     <chr>                  <dbl>              <dbl>
1 bi        focal1                 0.144              0.140
2 bi        focal2                 0.149              0.144
3 bi        focal3                 0.144              0.144
4 bi        reference              0.141              0.143
5 uni       focal1                 0.145              0.151
6 uni       focal2                 0.145              0.148
7 uni       focal3                 0.149              0.153
8 uni       reference              0.143              0.142

#############################################

# Power ones
res %>%
    group_by(direction, focal) %>%
    summarise(CDF5=mean(mean.CDIF5), sd_CDF5=sd(mean.CDIF5),
              CDF10=mean(mean.CDIF10), sd_CDF10=sd(mean.CDIF10))

`summarise()` has grouped output by 'direction'. You can override using the
`.groups` argument.

# A tibble: 8 × 6
# Groups:   direction [2]
  direction focal          CDF5  sd_CDF5     CDF10  sd_CDF10
  <chr>     <chr>         <dbl>    <dbl>     <dbl>     <dbl>
1 bi        focal1     0.158    0.000100 -0.000389 0.0000203
2 bi        focal2     0.160    0.000192 -0.000432 0.000391 
3 bi        focal3    -0.00245  0.00117  -0.00189  0.0000657
4 bi        reference -0.000682 0.000844  0.000197 0.000465 
5 uni       focal1     0.163    0.000537  0.134    0.000173 
6 uni       focal2     0.167    0.000434  0.0771   0.000846 
7 uni       focal3     0.169    0.000357  0.0793   0.000476 
8 uni       reference  0.000156 0.000370 -0.00111  0.000148

res %>%
    group_by(direction, focal) %>%
    summarise(NCDF5=mean(mean.NCDIF5), sd_CDF5=sd(mean.NCDIF5),
              NCDF10=mean(mean.NCDIF10), sd_CDF10=sd(mean.NCDIF10))

`summarise()` has grouped output by 'direction'. You can override using the
`.groups` argument.

# A tibble: 8 × 6
# Groups:   direction [2]
  direction focal       NCDF5 sd_CDF5  NCDF10 sd_CDF10
  <chr>     <chr>       <dbl>   <dbl>   <dbl>    <dbl>
1 bi        focal1    0.0290  0.00149 0.00319  0.00149
2 bi        focal2    0.0296  0.00155 0.00327  0.00151
3 bi        focal3    0.0103  0.00195 0.00328  0.00156
4 bi        reference 0.00356 0.00171 0.00326  0.00155
5 uni       focal1    0.0306  0.00186 0.0222   0.00141
6 uni       focal2    0.0320  0.00181 0.00946  0.00170
7 uni       focal3    0.0330  0.00178 0.0101   0.00160
8 uni       reference 0.00360 0.00168 0.00333  0.00159

# Power still not even at 1.00 despite massive inflation
res %>%
    group_by(direction, focal) %>%
    summarise(mean(NCDIF.p5), mean(NCDIF.p10))

`summarise()` has grouped output by 'direction'. You can override using the
`.groups` argument.

# A tibble: 8 × 4
# Groups:   direction [2]
  direction focal     `mean(NCDIF.p5)` `mean(NCDIF.p10)`
  <chr>     <chr>                <dbl>             <dbl>
1 bi        focal1               0.998             0.738
2 bi        focal2               0.998             0.737
3 bi        focal3               0.422             0.730
4 bi        reference            0.740             0.736
5 uni       focal1               0.998             0.998
6 uni       focal2               1.00              0.945
7 uni       focal3               0.999             0.951
8 uni       reference            0.743             0.739

res %>%
    group_by(direction, focal) %>%
    summarise(mean(NCDIF.0065), mean(NCDIF.00610))

`summarise()` has grouped output by 'direction'. You can override using the
`.groups` argument.

# A tibble: 8 × 4
# Groups:   direction [2]
  direction focal     `mean(NCDIF.0065)` `mean(NCDIF.00610)`
  <chr>     <chr>                  <dbl>               <dbl>
1 bi        focal1                 0.980               0.146
2 bi        focal2                 0.983               0.156
3 bi        focal3                 0.691               0.155
4 bi        reference              0.172               0.150
5 uni       focal1                 0.984               0.958
6 uni       focal2                 0.988               0.602
7 uni       focal3                 0.990               0.629
8 uni       reference              0.182               0.159

Summary

The above simulation reflects my initial impression of the DFIT statistics in that they are a bit too wild to be of much practical use. This is largely due to the two-step nature of the statistics, where the model-implied components in the IRT models are treated independently for each response pattern observation. Moreover, with the MML estimation instead of the MMAP estimation from BILOG-MG, it appears even the .006 cutoff for NCDIF is not a great choice as it does not result in Type I error control close to the nominal rate of \(\alpha = .05\) or .01.

Finally, while it is also possible to estimate the bias and RMSD of these estimates this information was not presented in the original article, and therefore does not appear above either. However, I suspect that, in light of what was observed, the bias and RMSD likely will not provide convincing results at recovering the population parameters either….