Compute factor score estimates (a.k.a. ability estimates, latent trait estimates, etc.)
Source: R/fscores.R
Computes MAP, EAP, ML (Embretson & Reise, 2000), EAP for sum-scores (Thissen et al., 1995),
or WLE (Warm, 1989) factor scores with a multivariate normal
prior distribution using equally spaced quadrature. EAP scores for models with more than
three factors are generally not recommended since the integration grid becomes very large,
resulting in slower estimation and less precision if the number of quadrature points (quadpts)
is too low.
Therefore, MAP scores should be used instead of EAP scores for higher dimensional models.
Multiple imputation variants are possible for each estimator if a parameter
information matrix was computed, which are useful if the sample size or number of items is small.
As well, if the model contained latent regression predictors, this information will
be used in computing MAP and EAP estimates (for these models, full.scores=TRUE
will always be used). Finally, plausible value imputation is also available, and will also account
for latent regression predictor effects.
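As a rough illustration of what EAP scoring over an equally spaced grid involves, the following base-R sketch computes the posterior mean and standard deviation for one response pattern under a unidimensional 2PL model with a standard normal prior. The item parameters and response pattern are hypothetical, and the code is a simplified sketch of the idea rather than the routine fscores() actually runs.

```r
# Hypothetical 2PL slopes (a) and intercepts (d); not estimated from any data set
a <- c(1.2, 0.8, 1.5, 1.0)
d <- c(0.5, -0.3, 0.0, 1.0)
resp <- c(1, 0, 1, 1)   # one dichotomous response pattern (assumed for illustration)

# Equally spaced nodes over c(-6, 6); 121 nodes matches the documented
# one-factor default for quadpts
theta <- seq(-6, 6, length.out = 121)

# Likelihood of the pattern at each node, with P(correct) = plogis(a * theta + d)
lik <- sapply(theta, function(th) {
  p <- plogis(a * th + d)
  prod(p^resp * (1 - p)^(1 - resp))
})

# Posterior weights: likelihood times N(0, 1) prior, normalised over the grid
post <- lik * dnorm(theta)
post <- post / sum(post)

eap    <- sum(theta * post)                   # posterior mean (EAP estimate)
eap_se <- sqrt(sum((theta - eap)^2 * post))   # posterior standard deviation
c(EAP = eap, SE = eap_se)
```

With more than one factor the same sum runs over every combination of nodes across dimensions, which is why the grid grows quickly and MAP is suggested for higher dimensional models.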
Usage
fscores(
  object,
  method = "EAP",
  full.scores = TRUE,
  rotate = "oblimin",
  Target = NULL,
  response.pattern = NULL,
  append_response.pattern = FALSE,
  na.rm = FALSE,
  plausible.draws = 0,
  plausible.type = "normal",
  quadpts = NULL,
  item_weights = rep(1, extract.mirt(object, "nitems")),
  returnER = FALSE,
  T_as_X = FALSE,
  EAPsum.scores = FALSE,
  return.acov = FALSE,
  mean = NULL,
  cov = NULL,
  covdata = NULL,
  verbose = TRUE,
  full.scores.SE = FALSE,
  theta_lim = c(-6, 6),
  MI = 0,
  use_dentype_estimate = FALSE,
  QMC = FALSE,
  custom_den = NULL,
  custom_theta = NULL,
  min_expected = 1,
  max_theta = 20,
  start = NULL,
  ...
)
Arguments
- object
a computed model object of class SingleGroupClass, MultipleGroupClass, or DiscreteClass
- method
type of factor score estimation method. Can be:
"EAP" for the expected a-posteriori (default). For models fit using mdirt this will return the posterior classification probabilities
"MAP" for the maximum a-posteriori (i.e., Bayes modal)
"ML" for maximum likelihood
"WLE" for weighted likelihood estimation
"EAPsum" for the expected a-posteriori for each sum score
"plausible" for a single plausible value imputation for each case. This is equivalent to setting plausible.draws = 1
"classify" for the posterior classification probabilities (only applicable when the input model was of class MixtureClass)
- full.scores
if FALSE then a summary table with factor scores for each unique pattern is displayed as a formatted matrix object. Otherwise, a matrix of factor scores for each response pattern in the data is returned (default)
- rotate
prior rotation to be used when estimating the factor scores. See summary-method for details. If the object is not an exploratory model then this argument is ignored
- Target
target rotation; see summary-method for details
- response.pattern
an optional argument used to calculate the factor scores and standard errors for a given response vector or matrix/data.frame
- append_response.pattern
logical; should the inputs from response.pattern also be appended to the factor score output?
- na.rm
logical; remove rows with any missing values? This is generally not required due to the nature of computing factor scores; however, for the "EAPsum" method this may be necessary to ensure that the sum-scores correspond to the same composite score
- plausible.draws
number of plausible values to draw for future researchers to perform secondary analyses of the latent trait scores. Typically used in conjunction with latent regression predictors (see mirt for details), but can also be generated when no predictor variables were modelled. If plausible.draws is greater than 0 a list of plausible values will be returned
- plausible.type
type of plausible values to obtain. Can be either 'normal' (default) to use a normal approximation based on the ACOV matrix, or 'MH' to obtain Metropolis-Hastings samples from the posterior (silently passes object to mirt, therefore arguments like technical can be supplied to increase the number of burn-in draws and discarded samples)
- quadpts
number of quadrature nodes to use per dimension. If not specified, a suitable number will be chosen which decreases as the number of dimensions increases (and therefore estimates such as EAP will be less accurate). This is determined from the switch statement
quadpts <- switch(as.character(nfact), '1'=121, '2'=61, '3'=31, '4'=19, '5'=11, '6'=7, 5)
- item_weights
a user-defined weight vector used in the likelihood expressions to add more/less weight for a given observed response. Default is a vector of 1's, indicating that all the items receive the same weight
- returnER
logical; return empirical reliability (also known as marginal reliability) estimates as numeric values?
- T_as_X
logical; should the observed variance be equal to var(X) = var(T) + E(E^2) or var(X) = var(T) when computing empirical reliability estimates? Default (FALSE) uses the former
- EAPsum.scores
logical; include the model-implied expected values and variance for the item and total scores when using method = 'EAPsum' with full.scores=FALSE? This information is included in the hidden 'fit' attribute which can be extracted via attr(., 'fit') for later use
- return.acov
logical; return a list containing covariance matrices instead of factor scores? impute = TRUE is not supported with this option
- mean
a vector for custom latent variable means. If NULL, the default for 'group' values from the computed mirt object will be used
- cov
a custom latent variable covariance matrix. If NULL, the default for 'group' values from the computed mirt object will be used
- covdata
when a latent regression model has been fitted, and the response.pattern input is used to score individuals, then this argument is used to include the latent regression covariate terms for each row vector supplied to response.pattern
- verbose
logical; print verbose output messages?
- full.scores.SE
logical; when full.scores == TRUE, also return the standard errors associated with each respondent? Default is FALSE
- theta_lim
lower and upper range to evaluate latent trait integral for each dimension. If omitted, a range will be generated automatically based on the number of dimensions
- MI
a number indicating how many multiple imputation draws to perform. Default is 0, indicating that no MI draws will be performed
- use_dentype_estimate
logical; if the density of the latent trait was estimated in the model (e.g., via Davidian curves or empirical histograms), should this information be used to compute the latent trait estimates? Only applicable for EAP-based estimates (EAP, EAPsum, and plausible)
- QMC
logical; use quasi-Monte Carlo integration? If quadpts is omitted the default number of nodes is 5000
- custom_den
a function used to define the integration density (if required). The NULL default assumes that the multivariate normal distribution with the 'GroupPars' hyper-parameters is used. At the minimum must be of the form:
function(Theta, ...)
where Theta is a matrix of latent trait values (will be a grid of values if method == 'EAPsum' or method == 'EAP', otherwise Theta will have only 1 row). Additional arguments may be included and are caught through the fscores(...) input. The function must return a numeric vector of density weights (one for each row in Theta)
- custom_theta
a matrix of custom integration nodes to use instead of the default, where each column corresponds to the respective dimension in the model
- min_expected
when computing goodness of fit tests when method = 'EAPsum', this value is used to collapse across the conditioned total scores until the expected values are greater than this value. Note that this only affects the goodness of fit tests and not the returned EAP for sum scores table
- max_theta
the maximum/minimum value any given factor score estimate will achieve using any modal estimator method (e.g., MAP, WLE, ML)
- start
a matrix of starting values to use for iterative estimation methods. Default will start at a vector of 0's for each response pattern, or will start at the EAP estimates (unidimensional models only). Must be in the form that matches full.scores = FALSE (mostly used in the mirtCAT package)
- ...
additional arguments to be passed to nlm
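To make the trade-off in the quadpts entry concrete, the sketch below wraps the documented switch statement in a helper and tabulates the total number of integration nodes per model size. The helper name default_quadpts is hypothetical, and the totals assume a full rectangular grid (nodes per dimension raised to the number of factors).

```r
# Hypothetical helper wrapping the switch statement quoted in the quadpts entry
default_quadpts <- function(nfact) {
  switch(as.character(nfact),
         '1' = 121, '2' = 61, '3' = 31, '4' = 19, '5' = 11, '6' = 7, 5)
}

nfact   <- 1:7
per_dim <- sapply(nfact, default_quadpts)

# Assumption: a full rectangular grid, so the total is per_dim^nfact
data.frame(nfact, per_dim, total_nodes = per_dim^nfact)
```

Under that assumption the per-dimension resolution drops from 121 to 5 nodes while the total grid still runs into the tens or hundreds of thousands of points, which is consistent with the advice to prefer MAP over EAP beyond three factors.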
Details
The function will return either a table with the computed scores and standard errors,
the original data matrix with scores appended to the rightmost column, or the scores only. By
default the latent means and covariances are determined from the estimated object,
though these can be overwritten. Iterative estimation methods can be run
in parallel to decrease estimation times if a mirtCluster object is available.
If the input object is a discrete latent class object estimated from mdirt
then the returned results will be with respect to the posterior classification for each
individual. The method inputs for 'DiscreteClass' objects may only be 'EAP',
for posterior classification of each response pattern, or 'EAPsum' for posterior
classification based on the raw sum-score. For more information on these algorithms refer to
the mirtCAT package and the associated JSS paper (Chalmers, 2016).
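The returnER and T_as_X arguments refer to two definitions of the observed variance when computing empirical reliability. The base-R sketch below shows one reading of those two definitions, using hypothetical score estimates and standard errors. It is an illustration of the formulas as stated in the argument description, not a copy of the package's internal code.

```r
# Hypothetical factor score estimates and standard errors (not model output)
theta_hat <- c(-1.8, -0.9, -0.2, 0.1, 0.6, 1.4)
se        <- c(0.62, 0.58, 0.55, 0.55, 0.57, 0.64)

T_var <- var(theta_hat)   # var(T): variance of the score estimates
E_var <- mean(se^2)       # E(E^2): average squared standard error

# T_as_X = FALSE (default): var(X) = var(T) + E(E^2)
rxx_default <- T_var / (T_var + E_var)

# T_as_X = TRUE: var(X) = var(T), so the error variance is subtracted instead
rxx_T_as_X <- (T_var - E_var) / T_var

c(default = rxx_default, T_as_X = rxx_T_as_X)
```

Under this reading the T_as_X = TRUE value is always the smaller of the two, since the same error variance is removed from the numerator rather than added to the denominator.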
References
Chalmers, R. P. (2012). mirt: A Multidimensional Item Response Theory Package for the R Environment. Journal of Statistical Software, 48(6), 1-29. doi:10.18637/jss.v048.i06
Chalmers, R. P. (2016). Generating Adaptive and Non-Adaptive Test Interfaces for Multidimensional Item Response Theory Applications. Journal of Statistical Software, 71(5), 1-39. doi:10.18637/jss.v071.i05
Embretson, S. E. & Reise, S. P. (2000). Item Response Theory for Psychologists. Erlbaum.
Thissen, D., Pommerich, M., Billeaud, K., & Williams, V. S. L. (1995). Item Response Theory for Scores on Tests Including Polytomous Items with Ordered Responses. Applied Psychological Measurement, 19, 39-49.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54, 427-450.
Author
Phil Chalmers rphilip.chalmers@gmail.com
Examples
mod <- mirt(Science)
#>
#> Iteration: 1, Log-Lik: -1629.361, Max-Change: 0.50660
#> Iteration: 2, Log-Lik: -1617.374, Max-Change: 0.25442
#> Iteration: 3, Log-Lik: -1612.894, Max-Change: 0.16991
#> Iteration: 4, Log-Lik: -1610.306, Max-Change: 0.10461
#> Iteration: 5, Log-Lik: -1609.814, Max-Change: 0.09162
#> Iteration: 6, Log-Lik: -1609.534, Max-Change: 0.07363
#> Iteration: 7, Log-Lik: -1609.030, Max-Change: 0.03677
#> Iteration: 8, Log-Lik: -1608.988, Max-Change: 0.03200
#> Iteration: 9, Log-Lik: -1608.958, Max-Change: 0.02754
#> Iteration: 10, Log-Lik: -1608.878, Max-Change: 0.01443
#> Iteration: 11, Log-Lik: -1608.875, Max-Change: 0.00847
#> Iteration: 12, Log-Lik: -1608.873, Max-Change: 0.00515
#> Iteration: 13, Log-Lik: -1608.872, Max-Change: 0.00550
#> Iteration: 14, Log-Lik: -1608.872, Max-Change: 0.00318
#> Iteration: 15, Log-Lik: -1608.871, Max-Change: 0.00462
#> Iteration: 16, Log-Lik: -1608.871, Max-Change: 0.00277
#> Iteration: 17, Log-Lik: -1608.870, Max-Change: 0.00145
#> Iteration: 18, Log-Lik: -1608.870, Max-Change: 0.00175
#> Iteration: 19, Log-Lik: -1608.870, Max-Change: 0.00126
#> Iteration: 20, Log-Lik: -1608.870, Max-Change: 0.00025
#> Iteration: 21, Log-Lik: -1608.870, Max-Change: 0.00285
#> Iteration: 22, Log-Lik: -1608.870, Max-Change: 0.00108
#> Iteration: 23, Log-Lik: -1608.870, Max-Change: 0.00022
#> Iteration: 24, Log-Lik: -1608.870, Max-Change: 0.00059
#> Iteration: 25, Log-Lik: -1608.870, Max-Change: 0.00014
#> Iteration: 26, Log-Lik: -1608.870, Max-Change: 0.00068
#> Iteration: 27, Log-Lik: -1608.870, Max-Change: 0.00065
#> Iteration: 28, Log-Lik: -1608.870, Max-Change: 0.00019
#> Iteration: 29, Log-Lik: -1608.870, Max-Change: 0.00061
#> Iteration: 30, Log-Lik: -1608.870, Max-Change: 0.00012
#> Iteration: 31, Log-Lik: -1608.870, Max-Change: 0.00012
#> Iteration: 32, Log-Lik: -1608.870, Max-Change: 0.00058
#> Iteration: 33, Log-Lik: -1608.870, Max-Change: 0.00055
#> Iteration: 34, Log-Lik: -1608.870, Max-Change: 0.00015
#> Iteration: 35, Log-Lik: -1608.870, Max-Change: 0.00052
#> Iteration: 36, Log-Lik: -1608.870, Max-Change: 0.00010
tabscores <- fscores(mod, full.scores = FALSE)
#>
#> Method: EAP
#>
#> Empirical Reliability:
#>
#> F1
#> 0.6666
head(tabscores)
#> Comfort Work Future Benefit F1 SE_F1
#> [1,] 1 1 1 1 -2.7492669 0.6293525
#> [2,] 1 3 2 1 -1.4198318 0.5772364
#> [3,] 1 4 2 3 -0.7141976 0.6200139
#> [4,] 1 4 3 1 -0.4469265 0.6509531
#> [5,] 2 1 1 1 -2.5437807 0.5909114
#> [6,] 2 1 2 4 -1.2478570 0.5840105
# convert scores into expected total score information with 95% CIs
E.total <- expected.test(mod, Theta=tabscores[,'F1'])
E.total_2.5 <- expected.test(mod, Theta=tabscores[,'F1'] +
                             tabscores[,'SE_F1'] * qnorm(.05/2))
E.total_97.5 <- expected.test(mod, Theta=tabscores[,'F1'] +
                              tabscores[,'SE_F1'] * qnorm(1-.05/2))
data.frame(Total_score=rowSums(tabscores[,1:4]),
           E.total, E.total_2.5, E.total_97.5) |> head()
#> Total_score E.total E.total_2.5 E.total_97.5
#> 1 4 6.791606 5.321810 9.084082
#> 2 7 9.266018 7.128071 11.296189
#> 3 10 10.584682 8.296461 12.504975
#> 4 9 11.041648 8.691107 13.034195
#> 5 5 7.141179 5.576233 9.330947
#> 6 9 9.592533 7.415339 11.582060
if (FALSE) { # \dontrun{
fullscores <- fscores(mod)
fullscores_with_SE <- fscores(mod, full.scores.SE=TRUE)
head(fullscores)
head(fullscores_with_SE)
# convert scores into expected total score information with 95% CIs
E.total <- expected.test(mod, Theta=fullscores[,'F1'])
E.total_2.5 <- expected.test(mod, Theta=fullscores_with_SE[,'F1'] +
                             fullscores_with_SE[,'SE_F1'] * qnorm(.05/2))
E.total_97.5 <- expected.test(mod, Theta=fullscores_with_SE[,'F1'] +
                              fullscores_with_SE[,'SE_F1'] * qnorm(1-.05/2))
data.frame(Total_score=rowSums(Science),
           E.total, E.total_2.5, E.total_97.5) |> head()
# change method argument to use MAP estimates
fullscores <- fscores(mod, method='MAP')
head(fullscores)
# calculate MAP for a given response vector
fscores(mod, method='MAP', response.pattern = c(1,2,3,4))
# or matrix
fscores(mod, method='MAP', response.pattern = rbind(c(1,2,3,4), c(2,2,1,3)))
# return only the scores and their SEs (append_response.pattern = FALSE is the default)
fscores(mod, method='MAP', response.pattern = c(1,2,3,4),
        append_response.pattern = FALSE)
# use custom latent variable properties (diffuse prior for MAP is very close to ML)
fscores(mod, method='MAP', cov = matrix(1000), full.scores = FALSE)
fscores(mod, method='ML', full.scores = FALSE)
# EAPsum table of values based on total scores
(fs <- fscores(mod, method = 'EAPsum', full.scores = FALSE))
# convert expected counts back into marginal probability distribution
within(fs,
`P(y)` <- expected / sum(observed))
# list of error VCOV matrices for EAPsum (works for other estimators as well)
acovs <- fscores(mod, method = 'EAPsum', full.scores = FALSE, return.acov = TRUE)
acovs
# WLE estimation, run in parallel using available cores
if(interactive()) mirtCluster()
head(fscores(mod, method='WLE', full.scores = FALSE))
# multiple imputation using 30 draws for EAP scores. Requires information matrix
mod <- mirt(Science, 1, SE=TRUE)
fs <- fscores(mod, MI = 30)
head(fs)
# plausible values for future work
pv <- fscores(mod, plausible.draws = 5)
lapply(pv, function(x) c(mean=mean(x), var=var(x), min=min(x), max=max(x)))
## define a custom_den function (*must* return a numeric vector).
# EAP with a uniform prior between -3 and 3
fun <- function(Theta, ...) as.numeric(dunif(Theta, min = -3, max = 3))
head(fscores(mod, custom_den = fun))
# compare EAP estimators with same modified prior
fun <- function(Theta, ...) as.numeric(dnorm(Theta, mean=.5))
head(fscores(mod, custom_den = fun))
head(fscores(mod, method = 'EAP', mean=.5))
# custom MAP prior: standard truncated normal between -2 and 5
library(msm)
# need the :: scope for parallel to see the function (not required if no mirtCluster() defined)
fun <- function(Theta, ...) msm::dtnorm(Theta, mean = 0, sd = 1, lower = -2, upper = 5)
head(fscores(mod, custom_den = fun, method = 'MAP', full.scores = FALSE))
####################
# scoring via response.pattern input (with latent regression structure)
# simulate data
set.seed(1234)
N <- 1000
# covariates
X1 <- rnorm(N); X2 <- rnorm(N)
covdata <- data.frame(X1, X2)
Theta <- matrix(0.5 * X1 + -1 * X2 + rnorm(N, sd = 0.5))
# items and response data
a <- matrix(1, 20); d <- matrix(rnorm(20))
dat <- simdata(a, d, 1000, itemtype = '2PL', Theta=Theta)
# conditional model using X1 and X2 as predictors of Theta
mod <- mirt(dat, 1, 'Rasch', covdata=covdata, formula = ~ X1 + X2)
coef(mod, simplify=TRUE)
# all EAP estimates that include latent regression information
fs <- fscores(mod, full.scores.SE=TRUE)
head(fs)
# score only two response patterns
rp <- dat[1:2, ]
cd <- covdata[1:2, ]
fscores(mod, response.pattern=rp, covdata=cd)
fscores(mod, response.pattern=rp[2,], covdata=cd[2,]) # just one pattern
} # }