AP Statistics Inference Review: Confidence Intervals vs Hypothesis Tests

Statistical inference is the backbone of the AP Statistics exam. Units 6 through 9 in the official AP Statistics curriculum all revolve around one central idea: using data from a sample to draw conclusions about a population. Whether you are constructing a confidence interval or conducting a significance test, the process demands careful attention to conditions, correct formulas, and precise language. This guide breaks down what each inference procedure tests, how confidence intervals and hypothesis tests differ conceptually, and the most common mistakes students make, then works through practice problems to sharpen your skills before exam day.

ViewMath is an independent publisher and is not affiliated with or endorsed by the College Board, Advanced Placement, or any AP exam program. Always verify current course and exam information with College Board directly.

Official source checked: College Board’s AP Statistics course page lists inference across Units 6-9, including proportions, means, chi-square procedures, and slopes, with current exam-unit weight ranges.

The Four Inference Units at a Glance

The AP Statistics curriculum organizes inference across four units:

Unit 6 — Inference for Categorical Data: Proportions. One-sample and two-sample z-procedures for proportions. You estimate p or compare p₁ and p₂.
Unit 7 — Inference for Quantitative Data: Means. One-sample and two-sample t-procedures for means. You use the t-distribution because σ (population standard deviation) is almost never known.
Unit 8 — Inference for Categorical Data: Chi-Square. Goodness-of-fit, homogeneity, and independence tests. You compare observed counts to expected counts across multiple categories.
Unit 9 — Inference for Quantitative Data: Slopes. Significance tests and confidence intervals for the slope of a least-squares regression line. You test whether the true slope β is zero (i.e., no linear relationship).

Confidence Intervals vs. Hypothesis Tests: The Core Conceptual Difference

Students often treat confidence intervals (CIs) and hypothesis tests (HTs) as separate procedures with separate purposes, but they answer related questions from opposite directions.

What a Confidence Interval Does

A confidence interval estimates the value of an unknown population parameter. The 95% confidence level describes the method: if we repeated the sampling process many times and built an interval each time, approximately 95% of those intervals would capture the true parameter. The interval itself gives you a range of plausible values.

Form: statistic ± (critical value) × (standard error)
Example: p̂ ± z* × √(p̂(1 − p̂) / n)
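The formula above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name and the sample counts (120 successes out of 300) are hypothetical and not taken from any AP problem.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_interval(successes, n, confidence=0.95):
    """Return (lower, upper) for a one-sample z-interval for a proportion."""
    p_hat = successes / n
    # Critical value z*: cuts off the central `confidence` area under N(0, 1).
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin


# Hypothetical data: 120 successes in a random sample of 300.
low, high = one_prop_z_interval(120, 300)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

Notice that the code mirrors the written form exactly: statistic, critical value, standard error. On the exam you would still show the formula with numbers substituted by hand.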

What a Hypothesis Test Does

A hypothesis test makes a decision about a specific claimed value of the parameter. You assume the null hypothesis (H₀) is true, calculate how surprising your sample result would be under that assumption (the p-value), and decide whether to reject or fail to reject H₀.

Form: test statistic = (statistic − null value) / standard error
A small p-value (typically below α = 0.05) is evidence against H₀.

The Connection Between the Two

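A two-sided hypothesis test at significance level α and a confidence interval at confidence level (1 − α) lead to the same decision whenever both use the same standard error, as the t-procedures for means do: if the null value falls outside the CI, the test rejects H₀, and if it falls inside, the test fails to reject. For proportions, the test uses p₀ in the standard error while the interval uses p̂, so the two can occasionally disagree when the null value sits very close to an endpoint of the interval. This connection can save time on free-response questions that ask for both.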
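To see why a test and an interval line up, here is a small sketch that runs both on the same estimate and the same standard error. It uses z critical values for simplicity (an assumption made to stay within Python's standard library), and the function name and numbers are hypothetical.

```python
from statistics import NormalDist


def compare_test_and_interval(estimate, standard_error, null_value, alpha=0.05):
    """Return (reject_h0, null_outside_ci) for a two-sided z procedure.

    Both pieces use the SAME standard error, which is the assumption
    under which the test and the interval must agree.
    """
    std_normal = NormalDist()
    z_star = std_normal.inv_cdf(1 - alpha / 2)

    # Hypothesis test: two-sided p-value for the standardized statistic.
    z = (estimate - null_value) / standard_error
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    reject_h0 = p_value < alpha

    # Confidence interval at level 1 - alpha.
    low = estimate - z_star * standard_error
    high = estimate + z_star * standard_error
    null_outside_ci = not (low <= null_value <= high)

    return reject_h0, null_outside_ci


# Hypothetical estimate of 10 with standard error 0.4, several null values.
for null_value in (9.0, 9.5, 10.0, 10.6, 11.2):
    print(null_value, compare_test_and_interval(10.0, 0.4, null_value))
```

Each printed pair contains two matching values: the test rejects exactly when the null value lies outside the interval.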

Conditions, Conditions, Conditions

A strong AP Statistics free-response inference solution should verify conditions before calculating. Every procedure has three standard condition checks:

Random: The data must come from a random sample or a randomized experiment. Name it: “Data were collected using an SRS” or “Subjects were randomly assigned to treatments.”
Independence (10% rule for sampling without replacement): The sample must be less than 10% of the population when sampling without replacement. For two-sample procedures, both samples must satisfy this condition independently.
Normal / Large Counts:
- For proportions: np̂ ≥ 10 and n(1 − p̂) ≥ 10 (use the null value for HT).
- For means: the population is Normal, OR the sample size is large (n ≥ 30 invokes the Central Limit Theorem), OR a graph of the data shows no strong skew or outliers for smaller samples.
- For chi-square: all expected counts ≥ 5.
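The numerical conditions above are simple enough to express as quick checks. The helpers below are a hypothetical sketch (the function names are illustrative, and the Random condition cannot be checked by code because it depends on how the data were collected).

```python
def large_counts_ok(n, p, threshold=10):
    """Large Counts check for a one-proportion procedure.

    Pass p-hat for a confidence interval or the null value p0 for a test.
    """
    return n * p >= threshold and n * (1 - p) >= threshold


def ten_percent_ok(n, population_size):
    """10% condition: the sample is less than 10% of the population."""
    return n < 0.10 * population_size


def chi_square_counts_ok(expected_counts, threshold=5):
    """Chi-square condition: every expected count is at least 5."""
    return all(count >= threshold for count in expected_counts)


# Hypothetical checks.
print(large_counts_ok(200, 0.40))               # 80 and 120 are both at least 10
print(ten_percent_ok(200, 5000))                # 200 is less than 500
print(chi_square_counts_ok([12.5, 8.0, 4.2]))   # 4.2 is below 5
```

On a free-response question, the check itself is not enough: write out the numbers (for example, np₀ = 80 ≥ 10) so the reader can see the condition was verified.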

Omitting a condition, or checking it only partially, costs points even when the calculation is correct.

The Four-Step Inference Framework

A clean inference response should make four steps explicit:

State: Define the parameter in context (e.g., “Let p = the true proportion of adults in the city who…”). State H₀ and Hₐ in symbols and words for HTs; state the confidence level for CIs.
Plan: Name the procedure (e.g., “one-sample z-interval for a proportion”) and verify all conditions.
Do: Calculate the interval or test statistic. Show the formula with numbers substituted.
Conclude: Interpret the result in context. For CIs: “We are 95% confident that the true proportion … is between … and ….” For HTs: “Because the p-value (0.03) is less than α = 0.05, we reject H₀. There is convincing evidence that…”

The Most Common AP Statistics Inference Errors

1. Misinterpreting Confidence Level

Wrong: “There is a 95% chance the true proportion falls in this interval.” (The parameter is fixed; it either is or is not in the interval.)
Correct: “We used a method that, in the long run, captures the true proportion 95% of the time.”
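A short simulation can make the "long run" interpretation concrete. The sketch below assumes a known true proportion of 0.5 and samples of size 100 (both values are hypothetical), builds a 95% z-interval from each simulated sample, and reports the fraction of intervals that capture the true proportion.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_coverage(true_p=0.5, n=100, reps=2000, confidence=0.95, seed=1):
    """Fraction of simulated z-intervals that capture the true proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n and count the successes.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / reps


print(f"Share of intervals capturing p: {simulate_coverage():.3f}")
```

The share should land close to 0.95, though not exactly on it. Any single interval either contains 0.5 or does not; the 95% describes the method across many samples.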

2. Wrong Standard Error for Hypothesis Tests

For a one-proportion z-test, use p₀ (the null value) in the standard error formula, not p̂. Many students mistakenly use the sample proportion, which gives the wrong test statistic.
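To see how much the choice matters, the sketch below computes the test statistic both ways for the voter example worked later in this guide (p̂ = 0.47, p₀ = 0.40, n = 200). The function name and the `use_null_in_se` flag are hypothetical, added only for the comparison.

```python
from math import sqrt


def z_statistic(p_hat, p0, n, use_null_in_se=True):
    """One-proportion z statistic; the flag toggles which SE is used."""
    p_for_se = p0 if use_null_in_se else p_hat
    standard_error = sqrt(p_for_se * (1 - p_for_se) / n)
    return (p_hat - p0) / standard_error


correct = z_statistic(0.47, 0.40, 200, use_null_in_se=True)    # SE built from p0
mistaken = z_statistic(0.47, 0.40, 200, use_null_in_se=False)  # SE built from p-hat
print(f"z with p0 in SE:    {correct:.2f}")
print(f"z with p-hat in SE: {mistaken:.2f}")
```

The two values differ in the second decimal place here (about 2.02 versus 1.98), which is enough to change a p-value and, near the cutoff, a conclusion.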

3. Confusing Statistical Significance with Practical Importance

A p-value near zero is strong evidence that an effect exists, not that the effect is large. A sample size of 10,000 can produce a statistically significant result for a trivially small difference. Always interpret the size of the effect (e.g., with a confidence interval) alongside the p-value.
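Here is a hypothetical illustration of that point: 5,120 successes in a sample of 10,000, tested against p₀ = 0.50. The data and function name are invented for the example.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_prop_test_with_interval(successes, n, p0, confidence=0.95):
    """Return (p_value, (low, high)) for a one-proportion z-test and z-interval."""
    std_normal = NormalDist()
    p_hat = successes / n

    # Test: standard error uses the null value p0.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))

    # Interval: standard error uses the sample proportion p-hat.
    z_star = std_normal.inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_value, (p_hat - margin, p_hat + margin)


p_value, (low, high) = two_sided_prop_test_with_interval(5120, 10_000, 0.50)
print(f"p-value = {p_value:.4f}")
print(f"95% CI  = ({low:.4f}, {high:.4f})")
```

The p-value falls below 0.05, yet the whole interval sits within about two percentage points of 0.50. Whether a difference that small matters is a question about context, not about the test.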

4. Wrong Degrees of Freedom for t-Procedures

One-sample t: df = n − 1. Two-sample t (without pooling): use technology or the conservative df = min(n₁ − 1, n₂ − 1). Never pool unless told the population variances are equal, which you almost never are.
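The two df options can be compared directly. Calculators and software typically report the Welch-Satterthwaite approximation for the unpooled two-sample t procedure; the sketch below assumes that formula and uses the summary statistics from the two-sample example later in this guide. The function names are hypothetical.

```python
def conservative_df(n1, n2):
    """Conservative degrees of freedom: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate degrees of freedom (unpooled)."""
    v1 = s1 ** 2 / n1
    v2 = s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Group A: n = 25, s = 9.2; Group B: n = 30, s = 11.5.
print("conservative df:", conservative_df(25, 30))
print(f"technology df:   {welch_df(9.2, 25, 11.5, 30):.2f}")
```

The technology df is never smaller than the conservative df, so the conservative choice gives a slightly larger t* and a slightly wider interval. Either is acceptable as long as you state which one you used.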

5. Paired vs. Two-Sample Confusion

If each observation in one group is naturally matched to an observation in the other (e.g., before-and-after on the same subject), use a paired t-test on the differences. Treating paired data as two independent samples leaves subject-to-subject variation in the standard error, which typically inflates it and weakens the test.
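A small made-up data set shows the effect. The five before-and-after scores below are hypothetical, chosen so that subjects differ a lot from one another but each improves by a similar amount.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """t statistic for the mean of the paired differences (after - before)."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def two_sample_t_statistic(before, after):
    """Unpooled two-sample t statistic, ignoring the pairing (the mistake)."""
    se = sqrt(stdev(before) ** 2 / len(before) + stdev(after) ** 2 / len(after))
    return (mean(after) - mean(before)) / se


# Hypothetical scores for the same five subjects.
before = [70, 75, 80, 85, 90]
after = [73, 77, 84, 88, 93]

print(f"paired t:     {paired_t_statistic(before, after):.2f}")
print(f"two-sample t: {two_sample_t_statistic(before, after):.2f}")
```

Both statistics have the same numerator, a mean gain of 3 points. The paired version divides by a much smaller standard error because the differences remove the spread between subjects.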

Worked Example: One-Proportion Z-Test

A random sample of 200 registered voters found that 94 plan to vote for a ballot measure. A researcher claims more than 40% of all registered voters support the measure. Test this claim at α = 0.05.

State: Let p = true proportion of all registered voters who support the measure. H₀: p = 0.40; Hₐ: p > 0.40.

Plan: One-proportion z-test. Random ✓ (random sample). Independence ✓ (200 is less than 10% of all registered voters). Normal ✓ (np₀ = 200(0.40) = 80 ≥ 10; n(1 − p₀) = 120 ≥ 10).

Do: p̂ = 94/200 = 0.47. z = (0.47 − 0.40) / √(0.40 × 0.60 / 200) = 0.07 / 0.0346 ≈ 2.02. p-value = P(Z > 2.02) ≈ 0.022.

Conclude: Because p-value (0.022) < α (0.05), we reject H₀. There is convincing evidence that more than 40% of all registered voters support the ballot measure.
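The Do step above can be checked with a few lines of Python. This is a sketch using only the standard library; the function name is hypothetical, and on the exam you would still show the substituted formula by hand.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test_greater(successes, n, p0):
    """Return (z, p_value) for H0: p = p0 versus Ha: p > p0."""
    p_hat = successes / n
    # The standard error uses the null value p0, not p-hat.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    # Right-tail area, because the alternative is "greater than".
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


z, p_value = one_prop_z_test_greater(94, 200, 0.40)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

The result agrees with the hand calculation to the precision shown there (z ≈ 2.02, p-value ≈ 0.022).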

Worked Example: Two-Sample T-Interval for Means

Two independent random samples of student test scores yielded: Group A: n = 25, x̄ = 82.4, s = 9.2; Group B: n = 30, x̄ = 77.1, s = 11.5. Construct a 95% confidence interval for the true difference in mean scores (μ_A − μ_B).

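Plan: Two-sample t-interval for a difference in means. Random ✓ (both groups were randomly sampled, independently of each other). Independence ✓ (each sample is assumed to be less than 10% of its population). Normal/Large Sample: Group B has n = 30; Group A has n = 25 < 30, so we assume graphs of the data show no strong skew or outliers.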

Do: Point estimate = 82.4 − 77.1 = 5.3. SE = √(9.2²/25 + 11.5²/30) = √(3.386 + 4.408) = √7.794 ≈ 2.79. Conservative df = min(24, 29) = 24; t* ≈ 2.064 for 95%. Interval: 5.3 ± 2.064(2.79) = 5.3 ± 5.76 = (−0.46, 11.06).

Conclude: We are 95% confident that the true difference in mean scores between Group A and Group B is between −0.46 and 11.06 points. Because 0 is in the interval, there is not convincing evidence of a difference in population means.
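The interval arithmetic can be verified with a short sketch. Python's standard library has no t-distribution quantile function, so the critical value t* = 2.064 (df = 24) is passed in from a table, exactly as in the hand calculation. The function name is hypothetical.

```python
from math import sqrt


def two_sample_t_interval(mean1, s1, n1, mean2, s2, n2, t_star):
    """Unpooled two-sample t-interval for mu1 - mu2, given a critical value."""
    point_estimate = mean1 - mean2
    standard_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    margin = t_star * standard_error
    return point_estimate - margin, point_estimate + margin


# Group A: n = 25, mean 82.4, s = 9.2; Group B: n = 30, mean 77.1, s = 11.5.
low, high = two_sample_t_interval(82.4, 9.2, 25, 77.1, 11.5, 30, t_star=2.064)
print(f"95% CI for mu_A - mu_B: ({low:.2f}, {high:.2f})")
print("0 inside interval:", low <= 0 <= high)
```

Using the technology df instead of the conservative df would give a smaller t* and a somewhat narrower interval, so state which df you used.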

Quick-Reference: Choosing the Right Procedure

Data Type	One Group	Two Groups / Paired	Three+ Categories
Proportions (categorical)	1-prop z	2-prop z (independent samples)	Chi-square GOF or homogeneity/independence
Means (quantitative)	1-sample t	2-sample t (independent) or paired t	ANOVA (not on AP Stats exam)
Regression slope	t-test for slope β	—	—

ViewMath Resources for AP Statistics

Consistent practice with full-length problems is the most effective way to build inference fluency. ViewMath AP Statistics resources cover all nine units, with worked solutions and step-by-step guidance. Use AP Statistics Made Easy for guided review, AP Statistics Step by Step for worked examples, and the AP Statistics Formula Sheet and Key Points for final-week recall.

Build inference fluency one procedure at a time, verify conditions every time, and always write conclusions in context. Those habits protect points on problems where the calculation is only one part of the score.