```{r}
#| include: false
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(knitr))
suppressPackageStartupMessages(library(htmltools))
source("R/chapter_table_helpers.R")
```
# Chapter 17: Transparent Reporting of Methods and Limitations
### Learning Objectives
By the end of this chapter, you will be able to:

- explain why transparent reporting is central to small-sample credibility
- document analytic choices in reproducible scripts
- distinguish planned from exploratory analyses
- report samples, exclusions and missing data clearly
- evaluate whether studies disclose enough information to support their claims
- use reporting guidelines such as CONSORT, STROBE and PRISMA as practical checklists rather than as afterthoughts
### The Importance of Transparency
Transparent reporting allows readers to evaluate the quality of evidence, assess the risk of bias, and replicate or build upon findings. With small samples, transparency is particularly important because results are more sensitive to analytic choices, outliers, and missing data. Readers need full information to judge whether conclusions are warranted.
Transparent reporting includes a clear description of sampling and recruitment, a summary of participant characteristics, complete reporting of variables and measures, a record of data cleaning and exclusions, and a justified statement of the statistical methods used. It also requires reporting all planned analyses and relevant sensitivity checks rather than only statistically significant findings. Limitations and plausible alternative explanations should be stated directly so readers can judge how far the evidence supports the conclusion.
### Putting the Transparency Pieces Together
Transparency is a workflow, not a paragraph added at the end of a report. The same decisions should appear in four places: the preregistration or planning document, the analysis script, the results section and the limitations section. If those four records disagree, the report should explain why.
| Stage | What the reader should be able to verify |
|---|---|
| Planning | What was primary, what was exploratory, and what decision rules were set before analysis |
| Data preparation | How exclusions, missing data, recoding and outliers were handled |
| Analysis script | Which tests or models were run, with seeds, packages and sensitivity analyses visible |
| Results report | Estimates, intervals, p-values where relevant, adjusted p-values where needed and clear effect-size language |
| Limitations | How sample size, precision, design, assumptions and generalisability constrain the conclusion |
This structure is especially important when the analysis changes after inspection. A change can be defensible, but it must be visible. A small-sample report should never leave readers guessing whether a method was planned, chosen because assumptions failed, or selected because it produced the strongest result.
### Documenting Analytic Choices
Modern quantitative research involves many decisions: how to handle outliers, which variables to include, whether to transform variables, which test to use, how to handle missing data. These decisions, if made after seeing the data, can inflate Type I error and bias estimates (researcher degrees of freedom, p-hacking).
When possible, preregister hypotheses, methods and decision rules before data collection or before the dataset is inspected. All analysis decisions should then be documented in a reproducible script, with exploratory analyses and sensitivity checks clearly labelled. Exploratory work is entirely legitimate, but it should be labelled as such and kept separate from confirmatory analyses rather than presented as if planned from the outset.
### Example: Documenting Analysis Decisions in Code Comments
A well-documented analysis script includes comments explaining each decision.
:::: {.content-visible when-format="html"}
::::: {.panel-tabset group="part-d-reporting-chapter-14-transparent-reporting-of-methods-and-limitations-cell-1"}
#### Rendered Output
```{r}
#| label: part-d-chunk-06-html
#| echo: false
library(tidyverse)
# Load cleaned data (see data_cleaning.R for details)
study_data <- read_csv("data/mini_marketing.csv", show_col_types = FALSE)
# Descriptive statistics
summary(study_data)
# Decision 1: Treat satisfaction as ordinal (1–5 scale)
# Justification: Only 5 levels; cannot assume equal intervals
# Method: Mann–Whitney U test (nonparametric)
# Decision 2: Two-sided test (no directional hypothesis preregistered)
wilcox.test(satisfaction ~ campaign, data = study_data, exact = FALSE)
# Sensitivity analysis: Also run t-test assuming equal intervals
t.test(satisfaction ~ campaign, data = study_data, var.equal = TRUE)
# Result: Both tests yield similar p-values; conclusions robust to choice of test
```
#### R Code
```{webr-r}
#| context: interactive
# Re-create the cleaned mini dataset used in this example
study_data <- data.frame(
campaign = rep(c("Control", "New campaign"), each = 8),
satisfaction = c(3, 3, 4, 2, 4, 3, 3, 4, 4, 5, 4, 5, 3, 4, 5, 4)
)
# Descriptive statistics
summary(study_data)
# Decision 1: Treat satisfaction as ordinal (1–5 scale)
# Justification: Only 5 levels; cannot assume equal intervals
# Method: Mann–Whitney U test (nonparametric)
# Decision 2: Two-sided test (no directional hypothesis preregistered)
wilcox.test(satisfaction ~ campaign, data = study_data, exact = FALSE)
# Sensitivity analysis: Also run t-test assuming equal intervals
t.test(satisfaction ~ campaign, data = study_data, var.equal = TRUE)
# Result: Both tests yield similar p-values; conclusions robust to choice of test
```
:::::
::::
:::: {.content-visible unless-format="html"}
```{r}
#| label: part-d-chunk-06
library(tidyverse)
# Load cleaned data (see data_cleaning.R for details)
study_data <- read_csv("data/mini_marketing.csv", show_col_types = FALSE)
# Descriptive statistics
summary(study_data)
# Decision 1: Treat satisfaction as ordinal (1–5 scale)
# Justification: Only 5 levels; cannot assume equal intervals
# Method: Mann–Whitney U test (nonparametric)
# Decision 2: Two-sided test (no directional hypothesis preregistered)
wilcox.test(satisfaction ~ campaign, data = study_data, exact = FALSE)
# Sensitivity analysis: Also run t-test assuming equal intervals
t.test(satisfaction ~ campaign, data = study_data, var.equal = TRUE)
# Result: Both tests yield similar p-values; conclusions robust to choice of test
```
::::
Interpretation: The script documents that satisfaction is treated as ordinal and that a nonparametric test is chosen accordingly. A sensitivity analysis using a t-test (assuming equal intervals) is also reported to show robustness. This transparency helps readers understand and trust the analysis.
### Describing the Sample
The sample description should state the target population, the accessible population, the sampling method, inclusion and exclusion criteria, recruitment procedures, response rate, final sample size after exclusions, and relevant participant characteristics such as demographics or baseline measures.
Use a table to summarise sample characteristics. For randomised trials, report characteristics separately by group so readers can assess baseline comparability.
### Example: Sample Characteristics Table
We create a descriptive table for the `mini_marketing` dataset. The table reports group size, satisfaction scores and prior purchase rates so that readers can assess baseline comparability.
:::: {.content-visible when-format="html"}
::::: {.panel-tabset group="part-d-reporting-chapter-14-transparent-reporting-of-methods-and-limitations-cell-2"}
#### Rendered Output
```{r}
#| label: part-d-chunk-07-html
#| echo: false
library(tidyverse)
# Load data
study_data <- read_csv("data/mini_marketing.csv", show_col_types = FALSE)
# Summary statistics by campaign group
summary_table <- study_data %>%
group_by(campaign) %>%
summarise(
N = n(),
`Mean Satisfaction` = round(mean(satisfaction, na.rm = TRUE), 2),
`SD Satisfaction` = round(sd(satisfaction, na.rm = TRUE), 2),
`Prior Purchase (%)` = round(100 * mean(prior_purchase == "Yes", na.rm = TRUE), 1),
.groups = "drop"
)
smallsamplelab_apa_table(
"17.1",
"Sample characteristics by campaign type",
summary_table,
note = "The table summarises the mini marketing study by campaign group. Prior purchase is reported as the percentage of participants with a previous purchase.",
align = c("l", "r", "r", "r", "r"),
col.names = c("Campaign", "n", "Satisfaction M", "Satisfaction SD", "Prior purchase (%)")
)
```
#### R Code
```{webr-r}
#| context: interactive
study_data <- data.frame(
campaign = rep(c("Control", "New campaign"), each = 8),
satisfaction = c(3, 3, 4, 2, 4, 3, 3, 4, 4, 5, 4, 5, 3, 4, 5, 4),
prior_purchase = c("No", "Yes", "No", "No", "Yes", "No", "Yes", "No",
"Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "No")
)
summary_table <- do.call(
rbind,
lapply(split(study_data, study_data$campaign), function(x) {
data.frame(
Campaign = x$campaign[1],
n = nrow(x),
Satisfaction_M = round(mean(x$satisfaction), 2),
Satisfaction_SD = round(sd(x$satisfaction), 2),
Prior_purchase_percent = round(100 * mean(x$prior_purchase == "Yes"), 1)
)
})
)
summary_table
```
:::::
::::
:::: {.content-visible unless-format="html"}
```{r}
#| label: part-d-chunk-07
library(tidyverse)
# Load data
study_data <- read_csv("data/mini_marketing.csv", show_col_types = FALSE)
# Summary statistics by campaign group
summary_table <- study_data %>%
group_by(campaign) %>%
summarise(
N = n(),
`Mean Satisfaction` = round(mean(satisfaction, na.rm = TRUE), 2),
`SD Satisfaction` = round(sd(satisfaction, na.rm = TRUE), 2),
`Prior Purchase (%)` = round(100 * mean(prior_purchase == "Yes", na.rm = TRUE), 1),
.groups = "drop"
)
smallsamplelab_apa_table(
"17.1",
"Sample characteristics by campaign type",
summary_table,
note = "The table summarises the mini marketing study by campaign group. Prior purchase is reported as the percentage of participants with a previous purchase.",
align = c("l", "r", "r", "r", "r"),
col.names = c("Campaign", "n", "Satisfaction M", "Satisfaction SD", "Prior purchase (%)")
)
```
::::
Interpretation: The table shows sample size, satisfaction scores, and prior purchase rates for each campaign group. Readers can assess whether groups are comparable at baseline. If the study were an RCT, imbalances might suggest randomisation problems or chance variation. In observational studies, imbalances indicate potential confounding.
### Reporting Missing Data
Missing-data reporting should state the number of complete observations, the number and proportion missing for each variable, visible patterns of missingness and the method used to handle missing values. If missingness clusters in certain subgroups, that pattern should be described because it may affect interpretation.
If multiple imputation was used, state the number of imputations and the imputation method.
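A per-variable missingness summary is straightforward to produce in base R. The sketch below uses a small hypothetical data frame (the variable names and values are illustrative, not from the chapter's dataset):

```r
# Hypothetical mini dataset with missing values (names are illustrative)
df <- data.frame(
  satisfaction = c(3, NA, 4, 5, NA, 3, 4, 2),
  age          = c(34, 29, NA, 41, 38, 25, NA, 30)
)

# Number of complete observations
n_complete <- sum(stats::complete.cases(df))

# Count and proportion missing for each variable
missing_summary <- data.frame(
  variable    = names(df),
  n_missing   = colSums(is.na(df)),
  pct_missing = round(100 * colMeans(is.na(df)), 1)
)
n_complete       # 4 complete observations out of 8
missing_summary  # 2 missing (25%) on each variable
```

Reporting this table alongside the method used to handle the missing values lets readers see at a glance whether missingness is concentrated in particular variables.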
### Reporting Deviations from Planned Analyses
If the analysis plan changes after seeing the data (e.g., adding a covariate, using a different test, excluding outliers), report the deviation explicitly.
Example: "We initially planned to use a t-test but observed severe skewness in the outcome. We therefore used a Mann–Whitney U test instead. Results from both tests are reported in the supplementary materials."
### Acknowledging Limitations
Every study has limitations. In small-sample studies, the common ones are limited power, wide confidence intervals, sensitivity to outliers or assumption violations, limited generalisability from narrow or non-probability samples, and inflated false-positive risk when many tests are conducted. These limitations should be connected to the interpretation: explain how they might affect the conclusion and what a future study would need to resolve.
### Handling Multiple Comparisons in Small Samples
When conducting multiple statistical tests, the probability of at least one Type I error increases. With $k$ independent tests at $\alpha = 0.05$:
- Family-wise error rate (FWER) $\approx 1 - (1 - \alpha)^k$
- For 5 tests: roughly 23% chance of at least one false positive
- For 10 tests: roughly 40% chance
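The family-wise error rate formula above can be checked directly in R:

```r
# Family-wise error rate for k independent tests at alpha = 0.05
alpha <- 0.05
k <- c(1, 5, 10, 20)
fwer <- 1 - (1 - alpha)^k
round(fwer, 3)
# 0.050 0.226 0.401 0.642
```

With 20 tests, the chance of at least one false positive is already close to two in three, which is why unplanned exploratory testing in small samples needs either correction or clear labelling.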
#### When to Correct
- Multiple outcomes or subgroups
- Post-hoc pairwise comparisons
- Exploratory analyses with many variables
#### Common Methods
1. **Bonferroni**: $\alpha_\text{adjusted} = \alpha / k$ (most conservative)
2. **Holm–Bonferroni**: Sequential step-down procedure
3. **Benjamini–Hochberg (FDR)**: Controls the false discovery rate
:::: {.content-visible when-format="html"}
::::: {.panel-tabset group="part-d-reporting-chapter-14-transparent-reporting-of-methods-and-limitations-cell-3"}
#### Rendered Output
```{r}
#| label: part-d-mc-adjustments-html
#| echo: false
# Example with multiple p-values
p_values <- c(0.01, 0.03, 0.08, 0.15, 0.25)
adjustment_table <- tibble(
Test = paste("Test", seq_along(p_values)),
`Raw p` = p_values,
Bonferroni = p.adjust(p_values, method = "bonferroni"),
Holm = p.adjust(p_values, method = "holm"),
`Benjamini–Hochberg FDR` = p.adjust(p_values, method = "fdr")
) %>%
mutate(across(where(is.numeric), ~ sprintf("%.3f", .x)))
smallsamplelab_apa_table(
"17.2",
"Adjusted p-values for five exploratory tests",
adjustment_table,
note = "Bonferroni controls the family-wise error rate most conservatively; Holm is a step-down family-wise method; Benjamini–Hochberg controls the false discovery rate.",
align = c("l", "r", "r", "r", "r")
)
```
#### R Code
```{webr-r}
#| context: interactive
# Example with multiple p-values
p_values <- c(0.01, 0.03, 0.08, 0.15, 0.25)
adjustment_table <- data.frame(
Test = paste("Test", seq_along(p_values)),
Raw_p = p_values,
Bonferroni = p.adjust(p_values, method = "bonferroni"),
Holm = p.adjust(p_values, method = "holm"),
Benjamini_Hochberg_FDR = p.adjust(p_values, method = "fdr")
)
adjustment_table[, -1] <- lapply(adjustment_table[, -1], function(x) sprintf("%.3f", x))
adjustment_table
```
:::::
::::
:::: {.content-visible unless-format="html"}
```{r}
#| label: part-d-mc-adjustments
# Example with multiple p-values
p_values <- c(0.01, 0.03, 0.08, 0.15, 0.25)
adjustment_table <- tibble(
Test = paste("Test", seq_along(p_values)),
`Raw p` = p_values,
Bonferroni = p.adjust(p_values, method = "bonferroni"),
Holm = p.adjust(p_values, method = "holm"),
`Benjamini–Hochberg FDR` = p.adjust(p_values, method = "fdr")
) %>%
mutate(across(where(is.numeric), ~ sprintf("%.3f", .x)))
smallsamplelab_apa_table(
"17.2",
"Adjusted p-values for five exploratory tests",
adjustment_table,
note = "Bonferroni controls the family-wise error rate most conservatively; Holm is a step-down family-wise method; Benjamini–Hochberg controls the false discovery rate.",
align = c("l", "r", "r", "r", "r")
)
```
::::
#### Reporting Template
"We tested effects in three subgroups. After Holm–Bonferroni correction, only Group A showed a significant difference (adjusted p = 0.03)."
#### Small Sample Considerations
With limited power, strict corrections can remove all nominally significant findings. The practical response is to pre-specify primary outcomes, label exploratory outcomes clearly, report both corrected and uncorrected p-values where informative, and place greater weight on effect sizes and confidence intervals.
#### Key Takeaways
For multiple-comparison reporting, state how many tests were conducted, describe the correction method, and distinguish confirmatory from exploratory analyses. In small-sample work, adjusted p-values should usually be interpreted alongside confidence intervals because the interval shows the direction and precision of the estimate.
### Pre-Registration for Small-Sample Studies
Pre-registration involves documenting your hypotheses, methods, and analysis plan before data collection begins or, at the latest, before the dataset is inspected. This is especially important for small samples because:
- Limited power increases temptation for p-hacking
- Results are more sensitive to analytic choices
- Multiple testing is common (searching for effects)
- Post-hoc storytelling is easier with small samples
#### What to Pre-Register
**Minimum requirements:**
1. Research questions and hypotheses (primary versus secondary)
2. Sample size with justification
3. Statistical tests planned for each hypothesis
4. Handling of outliers, missing data, and covariates
5. Multiple comparison corrections (if applicable)
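For item 2, a power analysis is a common way to justify the target sample size. The sketch below uses `power.t.test` from base R's stats package and assumes, for illustration, that the preregistered primary analysis is a two-sample t-test and the smallest effect of interest is a standardised difference of 0.8:

```r
# Sketch: sample-size justification for a preregistration, assuming a
# two-sample t-test and a smallest effect of interest of d = 0.8
plan <- power.t.test(delta = 0.8, sd = 1, sig.level = 0.05, power = 0.80,
                     type = "two.sample", alternative = "two.sided")
ceiling(plan$n)  # about 26 participants per group
```

Recording the assumed effect size, alpha and power in the preregistration makes the sample-size justification auditable rather than a bare number.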
#### Pre-Registration Template
Use the template below as a planning table rather than as code to run. A preregistration should be specific enough that another analyst could reproduce the planned analysis without asking what you meant.
| Section | What to write before analysis |
|---|---|
| Study title | Short descriptive title and date of preregistration |
| Primary question | One confirmatory question stated in testable terms |
| Secondary questions | Exploratory or supportive questions labelled as secondary |
| Hypotheses | Directional or non-directional predictions, including the expected outcome metric |
| Design and sample | Target sample size, stopping rule, recruitment source, inclusion and exclusion criteria |
| Variables | Primary outcome, predictors, covariates and scoring rules |
| Primary analysis | Statistical test or model, alpha level, effect size, confidence interval and software |
| Assumption checks | Planned diagnostics and what will be done if assumptions are not met |
| Outliers and missing data | Definitions, handling rules and sensitivity analyses |
| Multiplicity | Which outcomes are primary, which are exploratory and how p-values will be adjusted |
| Decision rule | What pattern of estimate, interval and p-value will be interpreted as support for the primary hypothesis |
#### Where to Pre-Register
The **Open Science Framework** (osf.io) provides free, time-stamped registration for study protocols, analysis plans and materials. **AsPredicted** (aspredicted.org) provides a short nine-question template that is widely used for behavioural, psychology and management studies. **Registered Reports** are a journal submission format in which the research question and methods are reviewed before results are known, with in-principle acceptance if the protocol is judged sound.
#### Handling Deviations
Deviations are acceptable if reported transparently:
```markdown
**Deviations from Pre-Registration:**
1. Sample size: Planned n = 40, achieved n = 36 due to [reason]
2. Primary test: Switched from t-test to Mann–Whitney due to severe skewness (skew = 2.4)
3. Additional analysis: Added baseline covariate per reviewer request (post-hoc, clearly labelled)
```
#### Benefits for Small Samples
- Protects against p-hacking accusations
- Separates confirmatory from exploratory analyses
- Improves study design through upfront planning
- Facilitates transparent reporting
::: {.callout-tip}
## Pre-Registration Checklist
- [ ] Hypotheses specific and testable
- [ ] Sample size justified
- [ ] All variables operationally defined
- [ ] Statistical tests specified
- [ ] Outlier/missing data plans stated
- [ ] Multiple comparison approach stated
- [ ] Time-stamped before analysis
:::
### Following Reporting Guidelines
Numerous reporting guidelines exist for different study designs:
- **CONSORT** [@schulz2010]: Randomised controlled trials.
- **STROBE** [@vonelm2007]: Observational studies (cohort, case-control, cross-sectional).
- **PRISMA** [@page2021]: Systematic reviews and meta-analyses.
- **COREQ**: Qualitative research.
These guidelines provide checklists of items to report. Following them improves transparency and comparability across studies. Even if formal adherence is not required, consult the relevant guideline as a checklist.
For example, CONSORT asks randomised trials to report participant flow. In a small pilot RCT, this can be as simple as a table that states how many participants were assessed, randomised, analysed and excluded at each stage. If 40 people were screened, 30 were enrolled, and 28 were analysed, the report should make clear where the two losses occurred and whether they were related to group assignment or outcome.
```{r}
#| label: ch17-consort-flow-table
#| echo: false
#| results: asis
consort_flow <- tibble(
Stage = c("Assessed for eligibility", "Randomised", "Allocated to intervention", "Allocated to control", "Included in analysis", "Excluded after randomisation"),
n = c(40, 30, 15, 15, 28, 2),
Note = c(
"10 did not meet inclusion criteria or declined",
"1:1 allocation",
"14 analysed; 1 withdrew before post-test",
"14 analysed; 1 missing post-test",
"Primary analysis used available paired outcomes",
"Reasons reported by group"
)
)
smallsamplelab_apa_table(
"17.3",
"Example participant-flow summary for a small pilot RCT",
consort_flow,
note = "This table illustrates the reporting logic of CONSORT Item 13a. A full trial report would usually include a flow diagram as well.",
align = c("l", "r", "l")
)
```
---
### Key Takeaways
Transparent reporting allows readers to evaluate the quality and limits of small-sample evidence. The essential tasks are to document analytic choices in reproducible scripts, report sample characteristics, missing data and exclusions explicitly, disclose deviations from planned analyses, and present sensitivity analyses where decisions could affect the result. Relevant reporting guidelines such as CONSORT and STROBE should be used as checklists, while the limitations section should state clearly how sample size, precision, assumptions and generalisability affect the conclusion.
---
### Self-Assessment Quiz
Test your understanding of transparent reporting from Chapter 17.
```{r}
#| echo: false
#| results: asis
source(normalizePath(file.path(dirname(knitr::current_input(dir = TRUE)), "..", "R", "quiz_helpers.R"), mustWork = TRUE))
smallsamplelab_render_quiz(list(
list(
prompt = "Which should be reported when documenting a small-sample study?",
options = c("Only significant results", "All analyses conducted, including non-significant findings", "Only the primary analysis", "Results can be selectively reported"),
answer = 2L,
explanation = "Transparent reporting requires documenting planned analyses, exploratory analyses and sensitivity checks, not just significant findings. Selective reporting inflates Type I error across the literature and prevents readers from evaluating the quality of the evidence."
),
list(
prompt = "A study planned to use a t-test but switched to Mann–Whitney after seeing skewed data. How should this be reported?",
options = c("Do not mention the change", "Report only the Mann–Whitney result", "State the planned test, explain the skewness, report the Mann–Whitney result and include a sensitivity analysis", "Pretend Mann–Whitney was always planned"),
answer = 3L,
explanation = "Deviations from plans should be documented with justification. Reporting both the planned and adapted analyses shows how much the conclusion depends on the analytic choice."
),
list(
prompt = "What is \"p-hacking\"?",
options = c("Illegally accessing data", "Trying multiple analyses/subgroups until finding p<0.05, then reporting only that result", "Using permutation tests", "Adjusting for multiple comparisons"),
answer = 2L,
explanation = "P-hacking involves exploring many analyses, such as different covariates, subgroups or outlier rules, until statistical significance appears and then selectively reporting that analysis. This inflates the false-positive rate."
),
list(
prompt = "Pre-registration helps prevent:",
options = c("Sample size limitations", "Researcher degrees of freedom (flexibility in analysis choices) leading to false positives", "Missing data", "Measurement error"),
answer = 2L,
explanation = "Pre-registration documents hypotheses and analysis plans before the data are inspected. This reduces post-hoc decisions that capitalise on chance and inflate Type I error."
),
list(
prompt = "A study with n=15 per group finds p=0.12. The limitation section should state:",
options = c("\"The result is not significant, proving no effect exists\"", "\"The study was underpowered to detect small-to-medium effects; findings are inconclusive\"", "\"The sample size was adequate\"", "Nothing; non-significant results need no discussion"),
answer = 2L,
explanation = "Small samples have limited power. Non-significance may reflect insufficient power rather than absence of effect, so the limitation section should discuss precision, minimum detectable effects and the uncertainty around the estimate."
)
))
```
---