Chapter 9: Assessing Multiple Imputation Quality

Learning Objectives

By the end of this chapter, you will be able to:

  1. Explain why multiple-imputation diagnostics matter.
  2. Inspect the main convergence and plausibility checks produced by mice.
  3. Evaluate whether pooled estimates are stable across different values of m.
  4. Report imputation diagnostics in a way that makes downstream analyses defensible.

Why Imputation Diagnostics Matter

Multiple imputation (MI) requires deliberate specification at each stage. The quality of the imputed values depends on whether the imputation models are specified sensibly, whether the chained equations have converged, whether the imputed values remain plausible relative to the observed data, and whether the pooled estimates are stable across different choices of m. If those checks are ignored, the result can be biased parameter estimates, incorrect standard errors, implausible imputations, and misplaced confidence in apparently polished output.

Diagnostic 1: Convergence Checks

The mice algorithm uses iterative chained equations: it cycles through variables, updating imputations based on the current values of other variables. Convergence occurs when these iterations stabilise (no systematic trends).

Trace Plots

Trace plots show the mean and SD of imputed values across iterations for each variable. Good convergence looks like a fuzzy caterpillar: trace lines fluctuate randomly around a stable mean, show no systematic upward or downward trend across iterations, and chains from different imputations intermingle rather than remaining separated. Figure 9.1 shows the full set of trace plots for the three simulated variables.

Figure 9.1: Trace plots for age, satisfaction, and income across imputations.

Interpretation: Figure 9.1 should resemble a fuzzy caterpillar rather than a drifting set of lines. If the traces still drift after roughly 20 iterations, increase maxit. If chains remain separated, inspect the imputation model specification rather than assuming convergence has occurred.
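In code, the trace plots come from calling plot() on the mids object returned by mice(). The sketch below uses a small simulated data set (the variable names follow the chapter's example; the coefficients and missingness rates are assumptions for illustration) and shows how to continue the same chains with mice.mids() rather than restarting if the traces still drift.

```r
library(mice)

# Illustrative simulated data (names follow the chapter; values are assumptions)
set.seed(2025)
d <- data.frame(age = rnorm(60, 40, 10), income = rnorm(60, 50, 10))
d$satisfaction <- 0.1 - 0.02 * d$age + 0.075 * d$income + rnorm(60)
d[sample(60, 12), "satisfaction"] <- NA  # ~20% missing
d[sample(60, 6), "age"] <- NA            # ~10% missing

imp <- mice(d, m = 5, maxit = 20, seed = 2025, printFlag = FALSE)
plot(imp)  # trace plots: mean and SD of imputed values per iteration

# If traces still drift, continue the existing chains instead of restarting
imp <- mice.mids(imp, maxit = 30, printFlag = FALSE)
plot(imp)
```

Continuing with mice.mids() preserves the random-number state of the chains, so the extended trace plot is a genuine extension of the original run.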

Checking Specific Variables

If you have many variables, focus on those with the most missingness. Figure 9.2 shows how that targeted check looks when attention is restricted to age.

Figure 9.2: Trace plot for age only.

If you still see clear trends after the first 10 to 20 iterations, increase maxit to 50 or even 100. In many routine MCAR or MAR settings, maxit = 20–50 is usually enough, but the diagnostics should drive that decision rather than a hard default.
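A targeted check like Figure 9.2 can be produced by passing the variable name to plot(). This sketch again simulates a small data set for illustration, with missingness only on age:

```r
library(mice)

# Illustrative data: only age is incomplete here
set.seed(2025)
d <- data.frame(age = rnorm(60, 40, 10), income = rnorm(60, 50, 10))
d$satisfaction <- 0.1 - 0.02 * d$age + 0.075 * d$income + rnorm(60)
d[sample(60, 6), "age"] <- NA

imp <- mice(d, m = 5, maxit = 20, seed = 2025, printFlag = FALSE)
plot(imp, c("age"))  # trace plot restricted to age
```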

Diagnostic 2: Imputed vs. Observed Distributions

Imputed values should resemble the observed data distribution (but not be identical). Large discrepancies suggest model misspecification. With n < 30, imputed distributions may appear jagged or narrower because predictive mean matching has few donor values. Focus on whether imputed values fall within a plausible observed range rather than demanding smooth density curves. Figure 9.3 shows the density comparison for the simulated example.

Density Plots

Figure 9.3: Observed and imputed density plots for the simulated example.

Interpretation: Figure 9.3 should show substantial overlap between observed and imputed distributions without making them identical. In very small examples the red curve can look more jagged or narrower because it is based on few imputed values. The red flags are collapse toward a single implausible value or a clear shift outside the observed range.
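The density comparison in Figure 9.3 corresponds to mice's densityplot(), which overlays the observed distribution (blue) with one curve per imputation (red). A minimal sketch with simulated data:

```r
library(mice)

# Illustrative simulated data with missing satisfaction values
set.seed(2025)
d <- data.frame(age = rnorm(60, 40, 10), income = rnorm(60, 50, 10))
d$satisfaction <- 0.1 - 0.02 * d$age + 0.075 * d$income + rnorm(60)
d[sample(60, 12), "satisfaction"] <- NA

imp <- mice(d, m = 5, maxit = 20, seed = 2025, printFlag = FALSE)
densityplot(imp)                  # all imputed variables at once
densityplot(imp, ~ satisfaction)  # a single variable, via a formula
```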

Strip Plots (Univariate)

Strip plots show individual imputed values (red) alongside observed values (blue). Figure 9.4 uses age as the example variable.

Figure 9.4: Strip plot for age across imputations.

Interpretation: Figure 9.4 should show the imputed values filling gaps in the observed data without behaving like a separate distribution. A useful question here is whether any imputed points fall implausibly far outside the observed range, because that usually signals a poor imputation model rather than legitimate uncertainty.
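Strip plots such as Figure 9.4 come from mice's stripplot(). In the formula below, .imp indexes the imputation: column 0 holds the observed values and columns 1 through m hold each imputed data set.

```r
library(mice)

# Illustrative simulated data with missing age values
set.seed(2025)
d <- data.frame(age = rnorm(60, 40, 10), income = rnorm(60, 50, 10))
d$satisfaction <- 0.1 - 0.02 * d$age + 0.075 * d$income + rnorm(60)
d[sample(60, 6), "age"] <- NA

imp <- mice(d, m = 5, maxit = 20, seed = 2025, printFlag = FALSE)
stripplot(imp, age ~ .imp, pch = 20)  # .imp = 0 observed; 1..m imputed
```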

Diagnostic 3: Sensitivity to m (Number of Imputations)

The number of imputations (m) affects the precision of pooled estimates. With more imputations, pooled estimates become more stable and standard errors more accurate. Table 9.1 compares the coefficient and standard-error estimates across three values of m.

Rule of Thumb for m

The number of imputations should increase as the fraction of missing information (FMI) increases. When FMI is below about 10%, m = 5 to 10 is often adequate. When FMI is around 10% to 30%, m = 20 to 50 is more defensible. When FMI exceeds 30%, values such as m = 50 to 100 may be needed. A useful heuristic from White, Royston, and Wood (2011) is \(m \geq 100 \times \text{FMI}\), where FMI is averaged across the parameters that matter for the analysis. For example, average FMI = 0.15 suggests at least 15 imputations, and average FMI = 0.30 suggests at least 30. Round up to convenient reporting values such as m = 20 or m = 50.
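The heuristic is simple enough to wrap in a small helper. choose_m() below is a hypothetical name, not a mice function; it implements m >= 100 x FMI with a floor so that very low FMI still yields a handful of imputations:

```r
# Hypothetical helper for the White, Royston & Wood heuristic m >= 100 * FMI.
# Not a mice function; floor_m keeps m from dropping below a sensible minimum.
choose_m <- function(avg_fmi, floor_m = 5) {
  max(floor_m, ceiling(100 * avg_fmi))
}

choose_m(0.15)  # 15
choose_m(0.30)  # 30
choose_m(0.02)  # 5 (the floor applies)
```

Rounding the result up to a convenient value such as 20 or 50 then gives the reported m.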

Testing Sensitivity

Table 9.1

Sensitivity of the age coefficient to the number of imputations

m Age coefficient Age SE
5 -0.022 0.013
20 -0.021 0.015
50 -0.019 0.014

Note. Only modest changes across m values are expected once Monte Carlo error is under control.

Interpretation: Table 9.1 should show broadly similar coefficient estimates across different values of m, with only modest differences due to Monte Carlo error, and the standard errors should stabilise as m increases. If the coefficients move substantially, for example by more than about 10%, that is a sign to increase m rather than treating the smaller-imputation result as settled.
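A sensitivity check like Table 9.1 can be run as a short loop: re-impute with each value of m, refit the substantive model with with(), pool, and compare the coefficient of interest. The data set below is simulated for illustration.

```r
library(mice)

# Illustrative simulated data
set.seed(2025)
d <- data.frame(age = rnorm(60, 40, 10), income = rnorm(60, 50, 10))
d$satisfaction <- 0.1 - 0.02 * d$age + 0.075 * d$income + rnorm(60)
d[sample(60, 12), "satisfaction"] <- NA

for (m in c(5, 20, 50)) {
  imp <- mice(d, m = m, maxit = 20, seed = 2025, printFlag = FALSE)
  fit <- with(imp, lm(satisfaction ~ age + income))
  est <- summary(pool(fit))
  print(est[est$term == "age", c("estimate", "std.error")])
}
```

Using the same seed throughout keeps the comparison about m rather than about different random draws.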

When to Use Larger m

Larger values of m are especially sensible when missingness is high, when the sample itself is small and Monte Carlo error is therefore more noticeable, or when the analysis is sensitive enough that conservative inference matters. In practice, that means using at least m = 20 once missingness moves beyond about 20%, and considering m = 50 or more for particularly consequential analyses.

Diagnostic 4: Checking Imputation Model Assumptions

Inspect Imputation Methods

Table 9.2 records which imputation method is being used for each variable.

Table 9.2

Imputation methods used for each variable

Variable Method
age pmm
satisfaction pmm
income pmm

Common methods: pmm uses predictive mean matching and is usually the safest default for continuous variables because it preserves plausible observed values. norm uses Bayesian linear regression and therefore leans harder on normality assumptions. logreg is intended for binary variables, while polyreg is used for unordered categorical variables. In most small-sample settings, pmm is the best starting point for continuous variables unless you have a strong reason to prefer a parametric normal model.
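The method assignments in Table 9.2 are stored in the method component of the mids object, and they can be inspected and overridden before re-imputing. This sketch (simulated data again) switches satisfaction from the default pmm to norm purely to show the mechanics:

```r
library(mice)

# Illustrative simulated data
set.seed(2025)
d <- data.frame(age = rnorm(60, 40, 10), income = rnorm(60, 50, 10))
d$satisfaction <- 0.1 - 0.02 * d$age + 0.075 * d$income + rnorm(60)
d[sample(60, 12), "satisfaction"] <- NA

imp <- mice(d, m = 5, printFlag = FALSE)
imp$method  # "" for complete variables, "pmm" for incomplete numeric ones

# Override a method for one variable before re-imputing (only if defensible)
meth <- imp$method
meth["satisfaction"] <- "norm"  # parametric Bayesian linear regression
imp_norm <- mice(d, m = 5, method = meth, seed = 2025, printFlag = FALSE)
```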

Check Predictor Matrix

Table 9.3 shows the predictor matrix that tells mice which variables are used to impute each target variable.

Table 9.3

Predictor matrix for the illustrative imputation model

Imputed variable age satisfaction income
age 0 1 1
satisfaction 1 0 1
income 1 1 0

Note. A value of 1 means the column variable is used to predict the row variable.

Interpretation: In the predictor matrix, rows identify the variables being imputed and columns identify the variables used to predict them. A 1 means the column variable is included as a predictor for the row variable, while a 0 means it is excluded.

Modify if needed:

# Example: Exclude a variable from predicting another
pred <- imp$predictorMatrix
pred["age", "satisfaction"] <- 0  # Don't use satisfaction to predict age

# Re-run imputation with modified predictor matrix
imp_modified <- mice(mi_data, m = 5, predictorMatrix = pred, print = FALSE)

Diagnostic 5: Fraction of Missing Information (FMI)

The FMI quantifies how much uncertainty is introduced by imputation. It is automatically reported by pool(), and Table 9.4 shows the relevant columns for the pooled regression model.

Table 9.4

Pooled regression estimates with lambda and FMI

Term Estimate Std. error Lambda FMI
(Intercept) 0.101 0.643 0.296 0.342
age -0.021 0.015 0.214 0.258
income 0.075 0.015 0.232 0.276

Columns to examine: The fmi column reports the fraction of missing information for each coefficient, while lambda shows the proportion of total variance attributable to missingness.

Interpretation: Values of fmi below about 0.10 indicate low missing-information burden and are often compatible with m = 5 to 10. Values around 0.10 to 0.30 suggest a moderate burden and support using m = 20 to 50. Once fmi exceeds about 0.30, missingness is making a large contribution to uncertainty. Consider larger m, add defensible auxiliary variables, or state that MI may not fully recover information in a very small sample.
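The fmi and lambda columns in Table 9.4 live in the pooled component of the object returned by pool(). A minimal sketch, with the same illustrative simulated data as before:

```r
library(mice)

# Illustrative simulated data
set.seed(2025)
d <- data.frame(age = rnorm(60, 40, 10), income = rnorm(60, 50, 10))
d$satisfaction <- 0.1 - 0.02 * d$age + 0.075 * d$income + rnorm(60)
d[sample(60, 12), "satisfaction"] <- NA

imp <- mice(d, m = 20, maxit = 20, seed = 2025, printFlag = FALSE)
fit <- with(imp, lm(satisfaction ~ age + income))
pooled <- pool(fit)

# One row per coefficient, including lambda and fmi
pooled$pooled[, c("term", "estimate", "lambda", "fmi")]
```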

Example: Full Diagnostic Workflow

The earlier sections show the individual diagnostics. This closing example condenses them into a reporting workflow so the end result is not a second copy of the same plots, but a compact summary of what should be written up after those checks have been completed.

Table 9.5

Workflow summary for assessing multiple-imputation quality

Step Action
Describe missingness satisfaction 20.0% missing; age 10.0% missing; income 6.0% missing
Choose imputation settings Use predictive mean matching with m = 20, maxit = 30, and random seed = 2025.
Check convergence Inspect trace plots for drift or separated chains.
Check plausibility Compare observed and imputed distributions with density and strip plots.
Pool model estimates Pool the regression of satisfaction on age and income.
Summarise FMI FMI ranges from 0.208 to 0.302.
Write the report Example write-up: We used predictive mean matching with m = 20 imputations, maxit = 30, and random seed = 2025. Diagnostic plots indicated adequate convergence and plausible imputations, and FMI values ranged from 0.208 to 0.302.
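The workflow in Table 9.5 condenses into a short script. The data set is simulated here for illustration; the settings (m = 20, maxit = 30, seed = 2025) follow the chapter's example:

```r
library(mice)

# Illustrative simulated data matching the chapter's variables
set.seed(2025)
d <- data.frame(age = rnorm(60, 40, 10), income = rnorm(60, 50, 10))
d$satisfaction <- 0.1 - 0.02 * d$age + 0.075 * d$income + rnorm(60)
d[sample(60, 12), "satisfaction"] <- NA
d[sample(60, 6), "age"] <- NA

# Step 1: describe missingness (percent missing per variable)
round(100 * colMeans(is.na(d)), 1)

# Step 2: impute with documented settings
imp <- mice(d, m = 20, maxit = 30, seed = 2025, printFlag = FALSE)

# Step 3: convergence and plausibility checks
plot(imp)
densityplot(imp)

# Steps 4-5: pool the substantive model and summarise FMI
fit <- with(imp, lm(satisfaction ~ age + income))
pooled <- pool(fit)
round(range(pooled$pooled$fmi), 3)
```

The numbers printed by the last line are what the write-up reports as the FMI range.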

Red Flags and Troubleshooting

Problem Symptom Solution
Non-convergence Trace plots show trends Increase maxit (try 50–100)
Imputed values at one value Density plot shows spike Use method = "pmm" instead of norm
Imputed values out of range Strip plot shows outliers Check variable type (e.g., use logreg for binary)
Unstable estimates across m Coefficients vary > 10% Increase m (try 50–100)
High FMI (> 0.50) Large uncertainty Consider whether MI is appropriate; may need auxiliary variables or accept wider CIs
Separation warnings (logistic regression) Model fails to converge Use penalized imputation methods or increase sample size

Reporting MI Diagnostics

When reporting MI results, include:

  1. Missingness pattern: “Three variables had missing data (satisfaction: 20%, age: 10%, income: 6%)”
  2. Imputation model: “We used predictive mean matching with m = 20 imputations, maxit = 30, and random seed = 2025”
  3. Convergence: “Trace plots showed convergence after 20 iterations (see Supplementary Figure S1)”
  4. Plausibility: “Imputed values were visually consistent with observed distributions (density plots in Supplementary Figure S2)”
  5. Sensitivity: “Results were stable across m = 5, 20, and 50 imputations, with only modest coefficient differences attributable to Monte Carlo error”
  6. FMI: “Fraction of missing information ranged from 0.208 to 0.302, indicating moderate impact of missingness”

Key Takeaways

Multiple-imputation results are only as defensible as the diagnostics behind them. Convergence checks, distributional comparisons, stability across different values of m, and inspection of FMI all help determine whether the imputation model is behaving plausibly rather than merely producing polished output. In small samples especially, report the seed, m, maxit, method, package versions, and remaining uncertainty clearly.


Self-Assessment Quiz

Question 1. What is the main purpose of a trace plot in multiple-imputation diagnostics?

Explanation.

Trace plots are a convergence diagnostic. They show whether the imputed values are fluctuating around a stable level or still drifting across iterations.

Question 2. What pattern in a trace plot would most strongly suggest that maxit should be increased?

Explanation.

If the trace continues to trend upward or downward, the chained-equations algorithm has not yet stabilised. That is the clearest signal to increase maxit.

Question 3. What is the main diagnostic question answered by a density plot of observed and imputed values?

Explanation.

Density plots compare the observed and imputed distributions. The goal is to check plausibility, not to force the two distributions to be identical.

Question 4. In a strip plot, what would count as a red flag?

Explanation.

Strip plots should show imputed values filling plausible gaps in the observed data. Values far outside the observed range usually indicate model misspecification.

Question 5. Why does the chapter compare pooled results across m = 5, 20, and 50?

Explanation.

Changing m is a sensitivity check for Monte Carlo error. If the pooled estimates move substantially, then the smaller value of m was not yet stable enough.

Question 6. What is the rule of thumb from White, Royston, and Wood (2011) for choosing m?

Explanation.

The chapter recommends the shorthand m >= 100 x FMI, which scales the number of imputations to the amount of missing information in the model. For example, an average FMI of 0.30 suggests at least 30 imputations.

Question 7. What does the predictor matrix tell you in a mice analysis?

Explanation.

Rows identify the variable being imputed and columns identify the available predictors. A 1 means the predictor is used for that row variable, and a 0 means it is excluded.

Question 8. Why is predictive mean matching (pmm) often preferred to norm for continuous variables in small samples?

Explanation.

The chapter treats pmm as the safer default because it draws on observed donor values. That makes it more robust when normal-theory assumptions are not especially convincing.

Question 9. What does a high FMI value mean for a pooled coefficient?

Explanation.

FMI quantifies how much uncertainty is coming from the missing data rather than only from observed-data sampling variation. High FMI values justify larger m and more cautious interpretation.

Question 10. What should you do if the MI results differ materially from the complete-case results?

Explanation.

Material disagreement between MI and complete-case results is itself important information. The chapter recommends reporting that sensitivity rather than hiding it.

Question 11. What information should appear in a transparent write-up of an MI analysis?

Explanation.

The final reporting section stresses that readers need to know the method, m, the main diagnostics, and the FMI burden. Reporting only the pooled coefficients would hide the quality checks that make the analysis credible.