Chapter 8: Handling Missing Data in Small Samples

Learning Objectives

By the end of this chapter, you will be able to distinguish MCAR, MAR, and MNAR mechanisms, describe missingness patterns clearly, judge when complete-case analysis or multiple imputation is defensible in a small sample, and report missing-data decisions with the transparency needed for readers to assess their consequences.

The Challenge of Missing Data in Small Samples

Missing data are common in applied research. Participants skip survey questions, drop out of longitudinal studies, or provide incomplete records. With large samples, modern methods (multiple imputation, full information maximum likelihood) can handle substantial missingness without excessive bias. With small samples, however, missing data pose severe problems. Even a few missing observations can substantially reduce effective sample size and statistical power.

Missing data methods rely on large-sample asymptotics and may be unstable or inappropriate when samples are very small (n < 30) or missingness is extensive (> 20%). In such cases, prevention (minimise missingness through careful design) and transparency (report missingness patterns and sensitivity analyses) are more important than sophisticated imputation.

Types of Missingness

The standard missingness framework distinguishes between three mechanisms (Rubin 1987; Buuren 2018). MCAR (Missing Completely At Random) means missingness is unrelated to any observed or unobserved variable, as when a survey page disappears because of a random software glitch. This condition is conceptually simple but rare in practice. MAR (Missing At Random) means missingness depends on observed variables but not on the missing values themselves once those observed variables are taken into account. For example, older participants may be more likely to skip a technology question, but conditional on age the missingness is otherwise random. MNAR (Missing Not At Random) is the hardest case because missingness depends on the unobserved values themselves, as when participants with severe depression are especially likely to drop out. That possibility cannot usually be resolved from the observed data alone and therefore requires sensitivity analysis or explicit modelling assumptions.
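The three mechanisms can be made concrete with a small base-R sketch. This is an illustrative simulation (the variable names, sample size, and probabilities are chosen for demonstration, not taken from the chapter's dataset): the same score variable loses values under MCAR, MAR, and MNAR rules.

```r
set.seed(1)
n <- 200
age   <- rnorm(n, mean = 45, sd = 10)
score <- rnorm(n, mean = 50, sd = 8)

# MCAR: every value has the same 20% chance of being missing,
# regardless of age or score
mcar <- ifelse(runif(n) < 0.20, NA, score)

# MAR: missingness depends only on the observed age
# (older participants are more likely to be missing)
p_mar <- plogis((age - 45) / 5 - 1.5)
mar   <- ifelse(runif(n) < p_mar, NA, score)

# MNAR: missingness depends on the unobserved score itself
# (higher scores are more likely to be missing)
p_mnar <- plogis((score - 50) / 5 - 1.5)
mnar   <- ifelse(runif(n) < p_mnar, NA, score)
```

Under MAR the lost information is recoverable in principle by conditioning on age; under MNAR it is not, because the deletion rule uses values we never observe.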

Describing Missingness Patterns

Before choosing a handling strategy, describe the pattern of missingness in plain terms. Report how many observations are missing on each variable, whether the missing values are concentrated in particular participants or particular variables, and whether incomplete cases differ from complete cases on observed characteristics that might make MAR more plausible than MCAR.

Example Dataset for Diagnostics

To demonstrate the handling strategies in this chapter, we simulate a small dataset with missing values on satisfaction and performance. Table 8.1 shows the first ten rows so the missing-data pattern is visible before the formal diagnostics begin.

Table 8.1

Simulated study dataset with missing values (rows 1 to 10)

participant age satisfaction performance
1 45 3 54
2 38 7 54
3 47 7 NA
4 53 5 68
5 42 3 72
6 36 5 51
7 43 3 84
8 37 5 49
9 34 4 66
10 46 4 61

Note. The full simulated dataset contains 25 participants; the excerpt is shown to illustrate the missing values before diagnosis.
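The chapter's exact generating code and seed are not reproduced here, so Table 8.1 cannot be matched row for row. A sketch with the same structure (n = 25, two missing satisfaction values, four missing performance values) might look like:

```r
set.seed(808)  # illustrative seed; not necessarily the chapter's
n <- 25
study_data <- data.frame(
  participant  = 1:n,
  age          = round(rnorm(n, mean = 42, sd = 6)),
  satisfaction = sample(1:7, n, replace = TRUE),
  performance  = round(rnorm(n, mean = 62, sd = 11))
)

# Introduce missingness: 2 values on satisfaction, 4 on performance
study_data$satisfaction[sample(1:n, 2)] <- NA
study_data$performance[sample(1:n, 4)]  <- NA
```

The later examples in this chapter refer to this `study_data` frame.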

Testing the MCAR Assumption

Little’s MCAR test evaluates whether missingness is consistent with the MCAR mechanism. The test compares observed means across missing-data patterns. A large p-value suggests MCAR is plausible, whereas a small p-value indicates that missingness likely depends on observed data (i.e., not MCAR). With small samples (n < 50), Little’s MCAR test has low power to detect meaningful departures from MCAR. A non-significant result does not confirm MCAR. Supplement it with visual inspection of missingness patterns, complete versus incomplete case comparisons, and substantive reasoning about why data might be missing (Graham 2009).
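Assuming the naniar package is available and `study_data` is the simulated dataset from above, the test can be run in one line; the ID column is excluded because it carries no substantive information.

```r
library(naniar)  # provides mcar_test()

# Little's MCAR test on the analysis variables only
mcar_test(study_data[, c("age", "satisfaction", "performance")])
```

The result reports the chi-square statistic, degrees of freedom, p-value, and the number of distinct missingness patterns, as summarised in Table 8.2.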

Table 8.2

Little's MCAR test for the simulated study dataset

Statistic df p value Missing patterns
13.309 6 0.038 3

Note. A larger p-value is more consistent with MCAR, but with small samples this test is only one piece of evidence and does not confirm MCAR.

Figure 8.1: Percentage of missing values per variable.

Table 8.2 reports the test statistic, while Figure 8.1 shows the percentage of missing values for each variable. Here the test is significant at the .05 level (p = .038), which counts against MCAR. As noted above, however, the test has low power in small samples and should be treated as one clue rather than a verdict, to be weighed alongside visual inspection, complete- versus incomplete-case comparisons, and domain knowledge about why participants might have skipped particular items or visits.

Observation-Level Missingness Pattern

Figure 8.2 complements Figure 8.1 by showing whether missing values cluster within particular cases rather than only within particular variables.

Figure 8.2: Observation-level missingness pattern.
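Both figures can be produced with naniar (which re-exports `vis_miss()` from the visdat package), again assuming `study_data` from the earlier simulation:

```r
library(naniar)

# Figure 8.1: percentage of missing values per variable
gg_miss_var(study_data, show_pct = TRUE)

# Figure 8.2: observation-level missingness pattern
# (rows are cases; shaded cells mark missing values)
vis_miss(study_data)
```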

Example: Summarising Missing Data

We continue working with the simulated dataset (study_data) created above. Tables 8.3 to 8.5 summarise how much data are missing and whether the incomplete cases differ from the complete cases on age.
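Summaries of this kind need no special packages. A base-R sketch that produces the contents of Tables 8.3 to 8.5 from `study_data`:

```r
# Table 8.3: count of missing values per variable
colSums(is.na(study_data))

# Table 8.4: proportion of missing values per variable
colMeans(is.na(study_data))

# Table 8.5: mean age and n for incomplete vs complete cases
incomplete <- !complete.cases(study_data)
data.frame(
  group    = c("Incomplete", "Complete"),
  mean_age = tapply(study_data$age, incomplete, mean)[c("TRUE", "FALSE")],
  n        = as.vector(table(incomplete)[c("TRUE", "FALSE")])
)
```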

Table 8.3

Number of missing values by variable

Variable Missing
participant 0
age 0
satisfaction 2
performance 4

Table 8.4

Proportion of missing values by variable

Variable Proportion missing
participant 0.00
age 0.00
satisfaction 0.08
performance 0.16

Table 8.5

Age comparison for complete and incomplete cases

Group Mean age n
Incomplete 39.8 6
Complete 41.9 19

Interpretation: Tables 8.3 and 8.4 show that performance carries the heaviest missingness burden, with 16% of its values missing (4 of 25). Table 8.5 then shows that the incomplete cases are younger on average than the complete cases, which makes MCAR less automatic and suggests that MAR or MNAR should remain plausible possibilities. With 6 of the 25 cases incomplete, only 19 cases remain in a complete-case analysis, so power is reduced sharply.

Complete-Case (Listwise Deletion) Analysis

The simplest approach is to analyse only cases with complete data on all variables of interest. This is valid if missingness is MCAR and the reduction in sample size is tolerable. However, it can introduce bias if missingness is MAR or MNAR, and it wastes information. Complete-case analysis is most defensible when missingness is minimal, MCAR is at least plausible, or the sample is so small that a more elaborate imputation model would be less credible than a transparent analysis of the observed cases.
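In R, listwise deletion is what `lm()` does by default; making it explicit keeps the decision visible in the script. A minimal sketch, assuming `study_data` from earlier:

```r
# Complete-case regression: any row with an NA on these
# variables is dropped before fitting
cc_fit <- lm(performance ~ age + satisfaction,
             data = study_data, na.action = na.omit)
summary(cc_fit)

# Report how many cases the model actually used
nobs(cc_fit)
```

Reporting `nobs(cc_fit)` alongside the original n makes the cost of deletion explicit to readers.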

Warning. Common Misconception: “Listwise Deletion Is Always Safe if Missingness Is Random”

Myth: “If I check for MCAR and the test is non-significant, listwise deletion is unbiased.”

Reality: Even when missingness is truly MCAR, listwise deletion is unbiased but loses power quickly, because independent missing patterns across multiple variables compound the number of dropped cases.

Demonstration:

library(tibble)  # tibble() is used to build the summary table below

set.seed(2025)

# Generate complete data: n=50, correlation between x and y = 0.6
n <- 50
x <- rnorm(n, 50, 10)
y <- 0.6 * x + rnorm(n, 0, 8)

# True correlation (no missing data)
true_cor <- cor(x, y)

# Introduce MCAR missingness (20% on x, 20% on y, independently)
x_missing <- x
y_missing <- y
x_missing[sample(1:n, 10)] <- NA  # 20% missing
y_missing[sample(1:n, 10)] <- NA  # 20% missing

# Listwise deletion: only cases with both x and y
complete_cases <- complete.cases(x_missing, y_missing)

# Correlation with listwise deletion
listwise_cor <- cor(x_missing[complete_cases], y_missing[complete_cases])

listwise_demo <- tibble(
  Metric = c(
    "True correlation (complete data)",
    "Complete cases retained",
    "Correlation after listwise deletion",
    "Cases lost",
    "Standard error inflation"
  ),
  Value = c(
    formatC(true_cor, format = "f", digits = 3),
    sprintf("%d / %d (%.1f%%)", sum(complete_cases), n, 100 * mean(complete_cases)),
    formatC(listwise_cor, format = "f", digits = 3),
    sprintf("%d (%.1f%%)", n - sum(complete_cases), 100 * (1 - mean(complete_cases))),
    sprintf("%.2f×", sqrt(1 / sum(complete_cases)) / sqrt(1 / n))
  )
)

smallsamplelab_apa_table(
  "8.6",
  "Consequences of listwise deletion under independent MCAR missingness",
  listwise_demo,
  note = "With 20% missing on x and 20% missing on y, independent MCAR retention is (1 - p1) x (1 - p2) = 0.8 x 0.8 = 0.64, so only about 64% of cases remain once both variables are required.",
  align = c("l", "l")
)

Table 8.6

Consequences of listwise deletion under independent MCAR missingness

Metric Value
True correlation (complete data) 0.549
Complete cases retained 33 / 50 (66.0%)
Correlation after listwise deletion 0.548
Cases lost 17 (34.0%)
Standard error inflation 1.23×

Note. With 20% missing on x and 20% missing on y, independent MCAR retention is (1 - p1) x (1 - p2) = 0.8 x 0.8 = 0.64, so only about 64% of cases remain once both variables are required.

Why this matters:

  1. Power loss: With 20% missing on x and 20% on y independently, the retention rate is (1 - p_x) x (1 - p_y) = 0.8 x 0.8 = 0.64, so about 36% of cases are lost.
  2. Multiple variables compound: With five variables each 15% missing, the retention rate is 0.85^5 = 0.44, so less than half the sample remains.
  3. Bias can still occur: If missingness is MAR rather than MCAR, listwise deletion can bias estimates as well as reduce precision.

Lesson:

  • MCAR does NOT mean listwise deletion is optimal: it can still waste information.
  • Consider multiple imputation when missingness exceeds about 10%, the sample is large enough for the imputation model, and MAR is plausible.
  • With small samples (n < 50), losing even 20% of cases can sharply reduce power.

When listwise deletion is actually safe:

  • Missingness < 5% on any variable
  • n is large enough that losing cases doesn’t hurt power
  • MCAR is plausible on substantive as well as statistical grounds (not just MAR) AND the power loss is documented

Mean Imputation

A related single-value approach replaces each missing value with the variable's observed mean. This preserves the nominal sample size but substitutes a constant for genuine uncertainty: it shrinks variability, attenuates correlations, and understates standard errors. Mean imputation is therefore not recommended, least of all in small samples where every observation carries weight.

Last Observation Carried Forward (LOCF)

In longitudinal studies, LOCF replaces missing follow-up values with the last observed value for that individual. This assumes no change after the last observation, which is often unrealistic. LOCF can therefore bias estimates and is not generally recommended. It is only marginally defensible when the assumption of no meaningful change is substantively plausible and no better alternative is available.
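The mechanics are simple enough to write by hand. This is an illustrative base-R helper (not part of the chapter's analysis) applied to a single participant's visit sequence:

```r
# Carry the last observed value forward within one sequence.
# A leading NA stays NA: there is nothing to carry forward.
locf <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

locf(c(10, NA, NA))  # 10 10 10: the visit-1 score fills both gaps
locf(c(7, NA, 6))    # 7 7 6: only the visit-2 gap is filled
```

The first example is exactly the problematic case: two follow-up visits are assumed identical to baseline, which is rarely what actually happened.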

Multiple Imputation (Caution with Small Samples)

Multiple imputation (MI) generates several plausible imputed datasets, analyses each separately, and pools results to account for imputation uncertainty. MI is a strong default for handling missing data in adequately sized samples when MAR is plausible (Rubin 1987; White, Royston, and Wood 2011; Buuren 2018). If MNAR is suspected, MI alone does not solve the problem. The practical response is sensitivity analysis that varies the assumed departure from MAR (Sterne et al. 2009). MI requires sufficient data to estimate imputation models reliably. With very small samples (n < 30) or extensive missingness (> 20%), imputation models may be under-identified and can yield unstable or implausible imputations. If MI is attempted in that setting, use predictive mean matching (method = "pmm"), limit the number of predictors in the imputation model, check convergence diagnostics carefully, and report complete-case results as a sensitivity check.

Example: Multiple Imputation with mice (Caution)

We apply MI to the dataset with missing satisfaction and performance values. Given the small sample (n = 25) and six incomplete cases (24% of the sample), interpret results cautiously. Table 8.7 reports the pooled regression results rather than printing the raw imputation object.
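The standard mice workflow for this analysis is impute, fit, pool. A sketch, assuming `study_data` from earlier (the seed is shown for reproducibility but is an illustrative choice, not necessarily the one behind Table 8.7):

```r
library(mice)

# Five predictive-mean-matching imputations
imp <- mice(study_data, m = 5, method = "pmm",
            seed = 2025, printFlag = FALSE)

# Fit the analysis model within each imputed dataset,
# then pool the results with Rubin's rules
fit    <- with(imp, lm(performance ~ age + satisfaction))
pooled <- pool(fit)
summary(pooled)
```

`pool()` combines the five sets of estimates so that the reported standard errors reflect both within- and between-imputation variability.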

Table 8.7

Pooled regression results from the illustrative multiple-imputation analysis

Term Estimate Std. error Statistic p value
(Intercept) 55.443 11.716 4.73 < 0.001
age 0.566 0.186 3.05 0.007
satisfaction -3.379 1.608 -2.10 0.054

Note. Five predictive-mean-matching imputations were pooled using Rubin's rules. With only 25 cases and 20% missingness, these estimates should be treated as provisional.

Interpretation: MI generates plausible values for missing data based on observed relationships. The pooled results combine estimates across imputations, with standard errors adjusted for imputation uncertainty. However, with n = 25 and roughly a quarter of the cases incomplete, the imputation model is estimated from limited data, and results may be unstable. Before reporting the pooled estimates, inspect the standard mice trace plots with plot(imp) and increase maxit or simplify the imputation model if the chains drift rather than forming a stable fuzzy pattern. Compare MI results to complete-case analysis; if they differ substantially, report both and acknowledge the uncertainty. Record the random seed, imputation method, number of imputations, and package versions whenever stochastic imputation code is adapted.

Checking Convergence of Multiple Imputation

When using mice, always check whether the imputation algorithm has converged. Poor convergence means the imputed values may not be stable, especially with small samples or complex missing-data patterns. This chapter keeps the emphasis on handling decisions rather than diagnostic graphics, so the detailed trace-plot and strip-plot workflow is taken up in Chapter 9. For the present chapter, the practical takeaway is that any MI analysis should be accompanied by those diagnostics before it is reported as credible.

Sensitivity Analyses

When missingness is substantial or MNAR is suspected, conduct sensitivity analyses rather than presenting a single imputed answer as definitive. Compare complete-case results with imputed results, vary the assumptions about the missing-data mechanism where possible, and report how much the substantive conclusions change across those scenarios.
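The simplest sensitivity check is a side-by-side comparison of the two sets of estimates. A sketch, assuming `study_data` and the `pooled` object from the MI example earlier in the chapter:

```r
# Complete-case fit for comparison with the pooled MI estimates
cc_fit <- lm(performance ~ age + satisfaction, data = study_data)

comparison <- data.frame(
  term          = names(coef(cc_fit)),
  complete_case = unname(coef(cc_fit)),
  pooled_mi     = summary(pooled)$estimate
)
num_cols <- c("complete_case", "pooled_mi")
comparison[num_cols] <- round(comparison[num_cols], 3)
comparison
```

If the two columns tell different substantive stories, report both rather than quietly choosing the more convenient one.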

Preventing Missing Data

The best approach to missing data is still prevention. Clear instruments, low respondent burden, follow-up for missed appointments or skipped questions, pilot testing of confusing procedures, and good rapport with participants all reduce the need for heroic statistical repair later.

Key Takeaways

Missing-data work in small samples begins with description, not imputation. Researchers need to know how much is missing, where it is missing, and which missingness mechanisms are plausible before choosing a handling strategy. Complete-case analysis can be acceptable in narrowly defined situations but often wastes too much information, while multiple imputation is only as credible as the sample size, missingness level, and modelling assumptions allow. That is why prevention, diagnostics, and sensitivity analysis matter as much as the final pooled estimate.

Self-Assessment Quiz

Test your understanding of missing-data decisions in Chapter 8.

Question 1. What is the key distinction between MCAR and MAR?

Explanation.

The chapter defines MCAR as missingness unrelated to any variables and MAR as missingness related to observed variables but not the missing values themselves. This distinction matters because most modern missing-data methods assume MAR, not MCAR.

Question 2. Why can listwise deletion still be a poor choice even when MCAR is plausible?

Explanation.

The chapter’s misconception box shows that independent missingness across variables compounds quickly. Even under MCAR, listwise deletion can waste a large fraction of a small dataset and make estimates much less precise.

Question 3. Why is mean imputation generally not recommended?

Explanation.

Mean imputation replaces uncertainty with a constant value. The chapter warns that this shrinks variability and biases correlations, which is especially damaging when each observation matters.

Question 4. When is multiple imputation most defensible in this chapter’s guidance?

Explanation.

The chapter describes multiple imputation as most appropriate when the sample is not extremely small, missingness is moderate, and the MAR assumption is plausible. It explicitly cautions that MI can be unstable with n < 30 or heavy missingness.

Question 5. Why is last observation carried forward (LOCF) usually a weak solution to missing follow-up data?

Explanation.

The chapter warns that LOCF assumes the participant would have stayed unchanged after the last observed value. That assumption is often unrealistic, so LOCF can bias treatment effects or longitudinal trends.

Question 6. What does Little’s MCAR test evaluate?

Explanation.

Little’s MCAR test compares observed means across missing-data patterns to assess whether the data are consistent with MCAR. The chapter also stresses that, with small samples, this test is only one clue rather than a final verdict.

Question 7. Why are sensitivity analyses important for missing-data work?

Explanation.

Because the missingness mechanism is often uncertain, the chapter recommends comparing results under different reasonable assumptions. If results change materially, that uncertainty should be reported rather than hidden.

Question 8. What is the best overall strategy for dealing with missing data in small-sample studies?

Explanation.

The chapter ends by stressing prevention: clear instruments, reduced burden, follow-up procedures, and pilot testing. No statistical fix can fully recover information that was never observed.