Non Iterative Confidence Interval Methods for the Difference between Two Proportions
J Reed III
Citation
J Reed III. Non Iterative Confidence Interval Methods for the Difference between Two Proportions. The Internet Journal of Epidemiology. 2007 Volume 5 Number 2.
Abstract
The construction of a confidence interval for the difference between two independent proportions is one of the most basic analyses in statistical inference. The Wald asymptotic methods, with and without a continuity correction, have poor coverage probability characteristics but continue to be used in spite of this known behavior. For the equal sample size case, the Wald-z coverage probability is subnominal overall. The Wald-c coverage probability always exceeds the nominal level and has interval width larger than Wald-z. For the unequal sample size case, Wald-z is always subnominal while Wald-c exceeds the nominal level. Newcombe's hybrid method and the Agresti-Caffo method have coverage probabilities that are near nominal for both equal and unequal sample sizes. Judged by the coverage probability criterion, Newcombe's hybrid method and the Agresti-Caffo method demonstrate superior coverage properties.
Introduction
In analyzing the results of studies we often encounter the problem of comparing two binomial success probabilities p1 and p2, p1 > 0 and p2 > 0. Implicit in this comparison are the independent observations X1 ~ B(n1, p1) and X2 ~ B(n2, p2). The most common comparison is the hypothesis H0: p1 = p2 versus Ha: p1 ≠ p2. Accompanying this test is the construction of a confidence interval for the difference between p1 and p2. The most commonly described textbook method is the Wald-z. Occasionally, a continuity-corrected version is given (Wald-c). Nearly all introductory statistics textbooks include the Wald-z confidence interval and issue a warning, usually in a footnote, about when not to use it.
The general problems associated with constructing a confidence interval for the difference between two independent proportions are similar to those associated with constructing a confidence interval for a single proportion - overshoot and poor coverage probability. Despite these shortcomings, the Wald-z and Wald-c methods continue to dominate in textbooks. The purpose of this paper is to review the coverage probability functions of the Wald methods and a set of alternative non-iterative methods for computing a confidence interval for the difference between two independent proportions.
Methods
The primary criterion for evaluating a confidence interval method is the coverage probability function. This coverage probability for the difference between two independent proportions, C(π1,π2|n1,n2,α), is found by fixing n1, n2, π1, and π2, then computing the confidence interval for each xi = 0, ..., ni for i = 1, 2. The coverage probability is then defined by:
C(π1,π2|n1,n2,α) = Σ(x1=0..n1) Σ(x2=0..n2) Pr(X1=x1|n1,π1) Pr(X2=x2|n2,π2) δ(π1,π2|x1,x2,n1,n2,α),
where δ(π1,π2|x1,x2,n1,n2,α) = 1 if (π1 − π2) ∈ [LB(x1,x2,n1,n2,α), UB(x1,x2,n1,n2,α)], and 0 otherwise.
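The double sum above can be evaluated exactly by enumerating every (x1, x2) pair. The following is a minimal sketch, not the code used for this paper; the names `binom_pmf`, `coverage` and the `ci` callback (any function returning a lower and upper bound) are illustrative assumptions.

```python
from math import comb


def binom_pmf(x, n, p):
    """Pr(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def coverage(ci, pi1, pi2, n1, n2):
    """Exact coverage probability of an interval method.

    `ci(x1, n1, x2, n2)` is any function returning (LB, UB); it is a
    hypothetical callback introduced for this sketch.
    """
    diff = pi1 - pi2
    total = 0.0
    for x1 in range(n1 + 1):
        w1 = binom_pmf(x1, n1, pi1)
        for x2 in range(n2 + 1):
            lb, ub = ci(x1, n1, x2, n2)
            # delta = 1 when the interval contains pi1 - pi2, else 0
            if lb <= diff <= ub:
                total += w1 * binom_pmf(x2, n2, pi2)
    return total
```

An interval that always spans [-1, 1] has coverage 1, and one that never contains the true difference has coverage 0, which gives a quick sanity check before plugging in a real method.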
The Wald-z and Wald-c confidence interval lower and upper bounds for the difference between two independent proportions are defined as (see Table 1 for a typical data structure):
Figure 2
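For readers without access to the typeset formulas, the sketch below uses the usual textbook forms of the two Wald intervals: the estimate p1 − p2 plus or minus z times the unpooled standard error, with (1/n1 + 1/n2)/2 added to the half-width for the continuity-corrected version. These forms are assumed here rather than copied from this paper, and the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wald_z(x1, n1, x2, n2, alpha=0.05):
    """Wald interval without continuity correction (assumed textbook form)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p1, p2 = x1 / n1, x2 / n2
    half = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - half, p1 - p2 + half


def wald_c(x1, n1, x2, n2, alpha=0.05):
    """Wald interval with continuity correction (assumed textbook form)."""
    lb, ub = wald_z(x1, n1, x2, n2, alpha)
    cc = 0.5 * (1 / n1 + 1 / n2)
    # No truncation to [-1, 1]: overshoot is left visible on purpose.
    return lb - cc, ub + cc
```

For 56/70 versus 48/80, this gives roughly (0.058, 0.343) for Wald-z and (0.044, 0.356) for Wald-c. With x1 = x2 = 0, Wald-z collapses to the zero-width interval [0, 0], which is the degenerate behavior discussed in the text.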
Figure 1 shows the 95% confidence interval coverage probability function for the Wald-z and Wald-c methods as a function of π1, π1 ∈ [0,1], for n1 = n2 = 20 and p2 = 0.3. The sawtooth appearance of the coverage functions is due to discontinuities at values of p1 – p2 that coincide with a lower or upper limit of one of the confidence intervals. Like its one-sample counterpart, the Wald-z coverage probability curve is subnominal, falling below 0.95 overall. The Wald-c coverage probability always exceeds 0.95, with interval widths larger than those of Wald-z.
Figure 3
Figure 2 shows the 95% confidence interval coverage probability function for the Wald-z and Wald-c methods as a function of π1, π1 ∈ [0,1] for n1 = 20, n2 = 10 and p2 = 0.2. The Wald-z coverage probability curve is subnominal for differences in proportions near 0 and 1 and less than 0.95 overall.
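Coverage curves of this kind can be approximated by evaluating the exact coverage on a grid of π1 values and summarizing the result. The sketch below is a self-contained illustration under assumed textbook forms of the Wald intervals; the grid spacing and the function names are choices made for this example, not those used to produce the figures.

```python
from math import comb, sqrt
from statistics import NormalDist, mean, pstdev

Z95 = NormalDist().inv_cdf(0.975)


def wald(x1, n1, x2, n2, corrected=False):
    """Wald-z, or Wald-c when corrected=True (assumed textbook forms)."""
    p1, p2 = x1 / n1, x2 / n2
    half = Z95 * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    if corrected:
        half += 0.5 * (1 / n1 + 1 / n2)
    return p1 - p2 - half, p1 - p2 + half


def coverage(pi1, pi2, n1, n2, corrected=False):
    """Exact coverage probability at (pi1, pi2) by full enumeration."""
    def pmf(x, n, p):
        return comb(n, x) * p**x * (1 - p) ** (n - x)

    total = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            lb, ub = wald(x1, n1, x2, n2, corrected)
            if lb <= pi1 - pi2 <= ub:
                total += pmf(x1, n1, pi1) * pmf(x2, n2, pi2)
    return total


def summarize(n1, n2, pi2, corrected=False, grid=99):
    """Min, max, mean and SD of coverage over pi1 = 1/(grid+1), ..., grid/(grid+1)."""
    values = [coverage(k / (grid + 1), pi2, n1, n2, corrected)
              for k in range(1, grid + 1)]
    return min(values), max(values), mean(values), pstdev(values)


if __name__ == "__main__":
    print(summarize(20, 20, 0.3))                  # Wald-z, equal n
    print(summarize(20, 10, 0.2, corrected=True))  # Wald-c, unequal n
```

Because each Wald-c interval contains the corresponding Wald-z interval, the Wald-c coverage at any (π1, π2) can never be smaller than the Wald-z coverage.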
Figure 4
Beal (1987) evaluated asymptotic methods for computing a confidence interval for the difference of two independent proportions. All involve identifying the interval within which (θ − θ')² ≤ z² V(ψ, θ), where θ' = p1 − p2, ψ = (π1 + π2)/2 is replaced by an estimate, and V(ψ, θ) = u{4ψ(1 − ψ) − θ²} + 2v(1 − 2ψ)θ = π1(1 − π1)/m + π2(1 − π2)/n, with u = (1/m + 1/n)/4 and v = (1/m − 1/n)/4. Unlike Wald-z and Wald-c, the Haldane (H) and Jeffreys-Perks (JP) methods provide non-degenerate confidence intervals for all values of p1 and p2. H and JP generally performed better than Wald-z and Wald-c, and of the two, JP was preferred (Beal, 1987; Radhakrishna et al., 1992). The Haldane and Jeffreys-Perks lower and upper limits are defined by:
Newcombe compared eleven methods for estimating the difference between independent proportions (Newcombe, 1998). As in the single-proportion case, the virtue of the Wald-z and Wald-c methods lies in their simplicity, but overshoot and inappropriate intervals are still common. The Haldane and Jeffreys-Perks methods attempt to overcome the overshoot and inappropriate intervals while maintaining closed-form tractability. Newcombe concluded that both H and JP were improvements over the Wald-z and Wald-c methods, but that both were still inadequate. Newcombe recommended a hybrid method based on Wilson's score method for a single proportion without continuity correction (NS). The LB and UB for the NS method are:
where l1 and u1 are the lower and upper bounds for p1, and l2 and u2 are those for p2, each computed with Wilson's score method.
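As an illustration of how the hybrid interval is assembled from the two single-sample Wilson intervals, the sketch below combines them in the square-and-add form usually given for Newcombe's method: LB = (p1 − p2) − √[(p1 − l1)² + (u2 − p2)²] and UB = (p1 − p2) + √[(u1 − p1)² + (p2 − l2)²]. This form is assumed here rather than taken from this paper's typeset equations, and the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wilson(x, n, z):
    """Wilson score interval for a single proportion, no continuity correction."""
    centre = (2 * x + z**2) / (2 * (n + z**2))
    half = z * sqrt(z**2 + 4 * x * (1 - x / n)) / (2 * (n + z**2))
    return centre - half, centre + half


def newcombe_ns(x1, n1, x2, n2, alpha=0.05):
    """Newcombe hybrid score interval for p1 - p2 (assumed form)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson(x1, n1, z)
    l2, u2 = wilson(x2, n2, z)
    lb = (p1 - p2) - sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    ub = (p1 - p2) + sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lb, ub
```

For 56/70 versus 48/80 this yields about (0.0524, 0.3339). Because the Wilson limits always lie in [0, 1], the resulting bounds cannot overshoot [-1, 1].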
Agresti and Coull's adjustment to the Wald method for a single proportion adds t/2 successes and t/2 failures (Agresti and Coull, 1998). Agresti and Caffo later suggested that adding two successes and two failures in total would improve the simple Wald method in the two-sample case (Agresti and Caffo, 2000). This adjustment adds one pseudo-observation of each type to each sample. For instance, for sample
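A minimal sketch of this adjustment follows: one success and one failure are added to each sample, and the ordinary Wald formula is then applied to the augmented counts. The function name is illustrative, and the form p̃i = (xi + 1)/(ni + 2) is the reading of "one pseudo-observation of each type" assumed here.

```python
from math import sqrt
from statistics import NormalDist


def agresti_caffo(x1, n1, x2, n2, alpha=0.05):
    """Wald interval applied after adding 1 success and 1 failure per sample."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    m1, m2 = n1 + 2, n2 + 2                  # augmented sample sizes
    p1, p2 = (x1 + 1) / m1, (x2 + 1) / m2    # augmented proportions
    half = z * sqrt(p1 * (1 - p1) / m1 + p2 * (1 - p2) / m2)
    return p1 - p2 - half, p1 - p2 + half
```

For 56/70 versus 48/80 the interval is about (0.052, 0.336). Unlike the unadjusted Wald-z, the interval has positive width even when x1 = x2 = 0.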
For p1 − p2, Carlin and Lewis (1996) considered independent uniform prior distributions for p1 and p2. The posterior distribution of pi is then beta with mean p̃i = (xi + 1)/(ni + 2) and variance p̃i(1 − p̃i)/(ni + 3). A crude normal approximation to the distribution of the difference of the two posterior beta variates leads to a closed-form confidence interval.
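The normal approximation described above can be sketched as follows: the difference of the posterior means, plus or minus z times the square root of the summed posterior variances. The explicit interval formula is not printed in this text, so this form is an assumption derived from the stated mean and variance, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def carlin_lewis(x1, n1, x2, n2, alpha=0.05):
    """Normal approximation to the difference of two Beta posteriors
    under independent uniform priors (assumed form)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    m1 = (x1 + 1) / (n1 + 2)            # posterior mean, sample 1
    m2 = (x2 + 1) / (n2 + 2)            # posterior mean, sample 2
    v1 = m1 * (1 - m1) / (n1 + 3)       # posterior variance, sample 1
    v2 = m2 * (1 - m2) / (n2 + 3)       # posterior variance, sample 2
    half = z * sqrt(v1 + v2)
    return m1 - m2 - half, m1 - m2 + half
```

The point estimate matches the Agresti-Caffo adjusted difference, but the variance divisor ni + 3 (instead of ni + 2) makes the interval slightly narrower.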
Zou and Donner (2004) proposed applying Fisher's z transformation to the estimator Δ' = p1 − p2. Letting σ' = √(ac/m³ + bd/n³), the resulting interval for Δ' is given by:
where l and u are given by: ½ log[(1 + Δ')/(1 − Δ')] ± z(α/2) σ'/(1 − Δ'²)
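The sketch below computes l and u as defined above and maps them back to the difference scale with the inverse of Fisher's transformation, tanh. The back-transformation step is an assumption, since the interval formula itself is not printed in this text. The sketch also assumes a and b are the success counts out of m and n, with c = m − a and d = n − b the failure counts. The function name is illustrative.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def fisher_z_interval(a, m, b, n, alpha=0.05):
    """Interval for a/m - b/n via Fisher's z transformation (assumed form)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    c, d = m - a, n - b                  # failure counts (assumed meaning of c, d)
    delta = a / m - b / n
    if abs(delta) == 1:
        # The transformation is undefined at +/-1; this sketch refuses the case.
        raise ValueError("Fisher's z transformation is undefined for |delta| = 1")
    sigma = sqrt(a * c / m**3 + b * d / n**3)
    half = z * sigma / (1 - delta**2)
    centre = atanh(delta)                # equals 0.5 * log((1 + delta) / (1 - delta))
    return tanh(centre - half), tanh(centre + half)
```

For 56/70 versus 48/80 this returns roughly (0.054, 0.337). Because tanh maps into (-1, 1), the bounds cannot overshoot.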
Results
Figure 3 shows the 95% confidence interval coverage probability function for the Newcombe NS, Haldane, Jeffreys-Perks, Agresti-Caffo, Carlin-Lewis and Fisher's-Z methods as a function of π1, π1 ∈ [0,1], for n1 = n2 = 20 and p2 = 0.3. The NS, Agresti-Caffo and Fisher's-Z methods demonstrate coverage probabilities that are near nominal over π1 ∈ [0, 1]. Table 2 details the corresponding minimum, maximum, mean, and standard deviation of the coverage probability for these confidence interval methods. The Newcombe NS, Fisher's-Z, Agresti-Caffo and Jeffreys-Perks methods are near nominal while the Wald-z is subnominal.
Figure 9
Figure 12
Figure 4 shows the 95% confidence interval coverage probability function for the Newcombe NS, Haldane, Jeffreys-Perks, Agresti-Caffo, Carlin-Lewis and Fisher's-Z methods as a function of π1, π1 ∈ [0,1], for n1 = 20, n2 = 10 and p2 = 0.2. In the unequal sample size situation, the Newcombe NS and Agresti-Caffo coverage probability functions are near nominal over π1 ∈ [0, 1]. The other methods are subnominal. Table 3 details the corresponding minimum, maximum, mean, and standard deviation of the coverage probability for these confidence interval methods. The Newcombe NS and Agresti-Caffo methods are nominal. The Fisher's-Z and Jeffreys-Perks methods perform equivalently, while the Wald-z and Carlin-Lewis methods are subnominal.
Figure 13
Discussion
The popularity of the Wald-z closed-form confidence interval is due in part to its overwhelming use in elementary statistics textbooks. The similarity of its computational formula to that of the confidence interval for the difference between two sample means is simplicity in action. Rarely do these elementary textbooks warn of the inadequacy of the Wald-z, Wald-c or even of a Wald-t. For differences between two independent proportions, the Wald-z confidence interval behaves so poorly, with coverage probabilities below nominal values, that authors of these texts should at least consider other closed-form methods that perform better. Judged by the coverage probability criterion, three alternative methods demonstrate superior coverage properties, and all are easily programmable. Our recommendation is, in the equal n or nearly equal n case, to use the Newcombe NS, Agresti-Caffo or Fisher's-Z methods. In the unequal n situation, use either the Newcombe NS or Agresti-Caffo methods.