The utility of the MIMIC model and MCFA method when detecting DIF using Monte Carlo simulation
Although the MIMIC modeling approach and the MCFA method have been widely used during the past decade, applied measurement invariance studies using these two methods are still sometimes poorly understood and incorrectly conducted. The objective of the current study was to understand how these two methods perform in applied testing situations, expand the knowledge of measurement invariance, and contribute to the quality of research using these two methods under different scenarios. The main work of this study was a comparison of the MIMIC modeling approach and the intercept/threshold invariance tests in MCFA for detecting uniform DIF in both polytomous and dichotomous data. For researchers and practitioners, the present study revealed several critical findings: a) Congruent with previous studies, the MIMIC modeling approach performed well in detecting uniform DIF in both polytomous and dichotomous data. However, practitioners should not expect to detect nonuniform DIF with either the MIMIC model or the intercept invariance test, as neither is designed to do so. b) The MIMIC model performed better than the intercept invariance test in almost all the conditions studied, indicating that it may be the first choice when dealing with uniform DIF, especially when the sample size is small or the reference group is substantially larger than the focal group. The MCFA intercept/threshold invariance test is nevertheless also recommended, particularly for those familiar with the MCFA technique. c) Poor item discrimination, already known to lead to poor score reliability, also appeared to lower uniform DIF detection power for both the MIMIC model and the intercept/threshold invariance test. d) The average Type I error rate was low for both methods, even when the total sample size was large.
e) As expected, the power of both methods to detect uniform DIF was sensitive to the total sample size. The unequal sample size ratio, however, did not appear to affect power significantly. f) As recommended by previous research, scale-level methods such as MCFA are becoming popular in educational and psychological measurement and can perform reasonably well in DIF detection, at least under the simple factor structure with a single studied item used in this study.
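For readers less familiar with the two approaches, the comparison above can be sketched in a common textbook parameterization. The notation here is an assumption made for illustration and is not taken from the study: $y_j^{*}$ is the (latent) response to the studied item $j$, $\eta$ the latent trait, $z$ a group indicator (0 = reference, 1 = focal), and $R$, $F$ label the reference and focal groups.

```latex
\begin{align}
  % MIMIC model: one pooled model with a group covariate z
  y_j^{*} &= \lambda_j \eta + \beta_j z + \varepsilon_j, \\
  \eta    &= \gamma z + \zeta, \\
  % Uniform DIF in item j is tested through the direct effect of z
  H_0^{\text{MIMIC}} &: \beta_j = 0, \\[6pt]
  % MCFA: the same measurement model fitted separately in each group g
  y_j^{*(g)} &= \tau_j^{(g)} + \lambda_j^{(g)} \eta^{(g)} + \varepsilon_j^{(g)},
  \qquad g \in \{R, F\}, \\
  % Intercept (or, for categorical items, threshold) invariance test
  H_0^{\text{MCFA}} &: \tau_j^{(R)} = \tau_j^{(F)}.
\end{align}
```

Under this sketch, both tests target a group difference in item location ($\beta_j$ or $\tau_j$) after conditioning on $\eta$, which is what uniform DIF means. Neither test involves a group difference in the loading $\lambda_j$, which is consistent with finding a) that nonuniform DIF should not be expected to be detected by either one.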