Patuxent Wildlife Research Center
NAAMP III Archive - statistical issues
Philip M. Dixon and Joseph H.K. Pechmann
Savannah River Ecology Lab.
University of Georgia
Drawer E, Aiken SC 29802
For more information, contact:
dixon@srel.edu (statistics), pechmann@srel.edu (amphibians)
Non-significant results from a traditional test may result from:
Previous approaches have focused on power analysis and survey design. A typical calculation is to find a sample size (number of years) for which a given trend is likely to produce a statistically significant result. Such calculations require a priori specification of the true trend and the magnitude of random variation.
If observed estimates of trend and variation are used in power calculations, the estimated power is simply a function of the p-value (Mead 1988, Dixon and Pechmann 1996). Such calculations provide no additional insight into the nature of non-significant results.
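The dependence of "observed" power on the p-value can be sketched numerically. The snippet below is an illustration only: it assumes a two-sided z-test (a normal approximation, not the t-tests used later in this poster), and the function name observed_power is ours. It plugs the observed effect back in as if it were the true effect, which is what a retrospective power calculation does.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used as an approximation to the t-distribution


def observed_power(p_value, alpha=0.05):
    """Power of a two-sided z-test when the true effect is set equal to the observed effect.

    The only input taken from the data is the p-value, so the result
    cannot carry any information beyond the p-value itself.
    """
    z_obs = _STD_NORMAL.inv_cdf(1.0 - p_value / 2.0)   # |z| implied by the two-sided p-value
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)    # two-sided critical value
    # Probability of landing in either tail of the rejection region.
    return _STD_NORMAL.cdf(z_obs - z_crit) + _STD_NORMAL.cdf(-z_obs - z_crit)


for p in (0.01, 0.05, 0.20, 0.50):
    print(f"p = {p:.2f} -> observed power = {observed_power(p):.3f}")
```

Under this approximation a result with p equal to alpha always has observed power of about one half, and any non-significant result has observed power below that, whatever the data were.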
Limitations of power analysis - 2
A second problem with traditional tests is that a test may be too powerful. If the sample size is large or the residual variation small, then a biologically insignificant trend (e.g., numerically close to zero) can be statistically significant. The test of no trend is not a test of whether the trend is biologically important.
Finally, it is extremely unlikely that a trend will truly be exactly zero.
Hence, the null hypothesis of zero trend is probably false and the p-value
is as much a measure of sample size as of trend.
Example: amphibian populations
Trends in abundance of four amphibian populations (Ambystoma talpoideum and A. tigrinum at Rainbow Bay, SC and Desmognathus monticola and D. ochrophaeus at Coweeta Hydrological Lab, NC) are shown in Figure 1.
These four populations include examples of populations with no apparent / small trend (A. talpoideum and D. monticola), strong declines (A. tigrinum) and strong increases (D. ochrophaeus), and populations with large (Ambystoma spp) and small (Desmognathus spp) annual variation.
Further information on these populations and the data collection is given in
Semlitsch et al. (1996) and Hairston (1996).
Model for trend
We will use a simple log-linear model for trend in population size:
log(N_t + 1) = A + B t + e_t
The slope, B, describes the linear trend; it is estimated by b, the least-squares regression slope.
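A minimal sketch of the least-squares fit is given below. It assumes natural logarithms (consistent with the slopes quoted later) and ordinary least squares with independent errors; the function name fit_log_linear_trend and the counts in the usage lines are hypothetical and are not the salamander data.

```python
import math


def fit_log_linear_trend(years, counts):
    """Least-squares fit of log(N_t + 1) = A + B*t.

    Returns (a, b, sb, df): intercept, slope, standard error of the
    slope, and residual degrees of freedom (n - 2).
    """
    n = len(years)
    y = [math.log(c + 1.0) for c in counts]           # natural log of (count + 1)
    t_mean = sum(years) / n
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in years)       # sum of squares of time
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(years, y))
    b = sxy / sxx                                     # estimated slope
    a = y_mean - b * t_mean                           # estimated intercept
    rss = sum((yi - (a + b * t)) ** 2 for t, yi in zip(years, y))
    df = n - 2
    sb = math.sqrt(rss / df / sxx)                    # standard error of the slope
    return a, b, sb, df


# Hypothetical counts, for illustration only.
example_years = list(range(1980, 1990))
example_counts = [120, 95, 140, 110, 160, 90, 130, 150, 105, 125]
a, b, sb, df = fit_log_linear_trend(example_years, example_counts)
print(f"b = {b:.4f}, sb = {sb:.4f}, df = {df}")
```

The slope b and its standard error sb are the only two sample statistics needed by the t-tests discussed below.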
Estimated trends are superimposed on the data in Figure 1. Slopes for A. tigrinum and D. ochrophaeus are significantly different from zero, but those for A. talpoideum and D. monticola are not (Table 1).
Are the non-significant results from A. talpoideum and D. monticola
proof that the populations are stationary? NO, for the reasons listed above.
What we need are different statistical methods.
Equivalence tests
We illustrate statistical methods to test whether a population is stationary. Stationary here means that the trend is ecologically close to zero, i.e., within some region that is considered equivalent to no trend. Although the discussion and example will focus on trends, the method of equivalence testing can be applied to other statistical problems, such as comparing two means.
The key to an equivalence test is to define an ecologically relevant equivalence region. Values of the trend inside this region are considered equivalent to zero.
Equivalence hypotheses for a trend:
The null hypothesis is that the trend is outside the equivalence region
(Bl, Bu), i.e., H0: B <= Bl or B >= Bu.
The alternative is that the trend is inside the region,
Ha: Bl < B < Bu.
Many statistical tests have been proposed, but there is no optimal one. Instead there is a trade-off between test size, power, and shape of the rejection region. The currently favored approach to test the equivalence null hypothesis is the two 1-sided tests method (Schuirmann 1987).
Two 1-sided tests method:
Reject H0: B <= Bl or B >= Bu if both subhypotheses, B <= Bl and B >= Bu, are rejected.
That is to say, conclude that the trend is equivalent to zero only if it is significantly more than the lower bound and significantly less than the upper bound.
Advantages:
But other tests may be more powerful. More information and examples of equivalence tests for differences are given in Dixon (1996).
Equivalence regions for trends:
What are reasonable upper and lower bounds for the equivalence region? These must be specified before an equivalence test can be used.
Ideally, these represent biological/ecological knowledge and judgment: what sorts of short-term trends are small for a particular population.
Suggested rule-of-thumb
A trend is small if the half-life/doubling time of a population
is 10 years or longer when that population declines/increases
by exactly the specified trend every year. This translates into log-linear slopes
of -0.0693 and 0.0693. If population size declined with a slope of -0.0693,
it would reach 1% of the starting size (pseudo-extinction) in 66 years.
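The arithmetic behind the rule of thumb can be checked with a few lines. This sketch assumes natural logarithms and ignores the "+ 1" in log(Nt + 1), which matters little when populations are large; the function names are ours.

```python
import math


def slope_for_doubling_time(years):
    """Log-linear slope at which a population doubles (or halves, if negated) in `years`."""
    return math.log(2.0) / years


def years_to_fraction(slope, fraction):
    """Years for a population changing at `slope` per year (log scale)
    to reach `fraction` of its starting size."""
    return math.log(fraction) / slope


bound = slope_for_doubling_time(10)               # upper bound of the equivalence region
years_to_1pct = years_to_fraction(-bound, 0.01)   # decline at the lower bound
print(f"equivalence bounds: ({-bound:.4f}, {bound:.4f})")
print(f"years to reach 1% of starting size: {years_to_1pct:.1f}")
```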
Equivalence tests - 2:
If variation around the trend line is assumed to be independent, normally-distributed, and equi-variant, then each 1-sided test is a 1-sided t-test.
The subhypothesis H0a: B <= Bl is rejected if the t-statistic Tl = (b - Bl)/sb is larger than the 1-sided critical value for a t-distribution with the appropriate degrees of freedom. The other subhypothesis, H0b: B >= Bu, is rejected if the t-statistic Tu = (Bu - b)/sb is larger than the same t critical value. Alternatively, one can compare each one-sided p-value to 0.05.
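The two 1-sided t-tests described above can be sketched as follows. This is an illustration under the stated normal-error assumptions, not the code used for the analyses reported here. To stay within the Python standard library, the upper-tail probability of the t-distribution is obtained by numerical integration (Simpson's rule); the function names, and the slope, standard error, and degrees of freedom in the usage lines, are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2.0) / (math.sqrt(df * math.pi) * math.gamma(df / 2.0))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2.0)


def t_upper_tail(t, df, steps=2000):
    """P(T > t), by Simpson's rule on [0, |t|]; `steps` must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3.0                      # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def tost_trend(b, sb, lower, upper, df, alpha=0.05):
    """Two 1-sided t-tests of H0: B <= lower or B >= upper."""
    t_lower = (b - lower) / sb                  # tests H0a: B <= lower
    t_upper = (upper - b) / sb                  # tests H0b: B >= upper
    p_lower = t_upper_tail(t_lower, df)
    p_upper = t_upper_tail(t_upper, df)
    return {
        "t_lower": t_lower,
        "t_upper": t_upper,
        "p_lower": p_lower,
        "p_upper": p_upper,
        # Equivalence is concluded only if BOTH subhypotheses are rejected.
        "equivalent": max(p_lower, p_upper) < alpha,
    }


# Hypothetical estimates: a small slope, and a slope near the lower bound.
print(tost_trend(0.01, 0.02, -0.0693, 0.0693, df=12)["equivalent"])
print(tost_trend(-0.06, 0.02, -0.0693, 0.0693, df=12)["equivalent"])
```

Because the overall decision uses the larger of the two one-sided p-values, that maximum can be reported as the p-value of the equivalence test.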
Alternatives to the t-test include bootstrapping, randomization tests, and
nonparametric tests for trend. The appropriate statistical theory for most of
these has been developed (Dixon 1996, Dixon 1997), but the tests may not be
easy to implement.
Example - equivalence tests:
We will use 1-sided t-tests for linear trends in log-transformed abundance. Diagnostic tests and plots suggest that the test's assumptions are reasonable (Dixon and Pechmann 1996).
For D. monticola, the t-values are:
Tl = (0.0077 - -0.0693)/0.0130 = 5.92, p < 0.0001
Tu = (0.0693 - 0.0077)/0.0130 = 4.74, p < 0.0001
Both subhypotheses are rejected, so we reject the null hypothesis of "non-equivalence." There is strong evidence that the trend in D. monticola is significantly inside the equivalence region.
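The two t-statistics for D. monticola can be reproduced directly from the reported slope, standard error, and equivalence bounds. The sketch below only repeats that arithmetic; it does not compute p-values, because the degrees of freedom are not restated here.

```python
# Values reported above for D. monticola.
b, sb = 0.0077, 0.0130            # estimated slope and its standard error
lower, upper = -0.0693, 0.0693    # equivalence region (10-year halving/doubling time)

t_lower = (b - lower) / sb        # tests H0a: B <= lower
t_upper = (upper - b) / sb        # tests H0b: B >= upper

print(f"Tl = {t_lower:.2f}, Tu = {t_upper:.2f}")
```

Both statistics are well above 2.353, the one-sided 5% critical value of a t-distribution with only 3 degrees of freedom, so both subhypotheses are rejected at the 5% level for any longer series as well.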
Example - equivalence tests - 2:
For A. tigrinum, one of the two subhypotheses is rejected, but not the other, so one cannot conclude that the trend is within the equivalence region (Table 2).
For A. talpoideum, neither subhypothesis is rejected.
For D. ochrophaeus, both subhypotheses are rejected, so the trend
is significantly inside the equivalence region.
Relationship with the "usual" test:
Results from the equivalence test are not necessarily opposite those of the "usual" test of no difference because the rejection regions for the two tests are quite different.
The rejection region of a statistical test is the set of sample statistics
which lead to rejecting the null hypothesis. For t-tests of trend, the relevant
sample statistics are the estimated slope and its estimated standard error.
The rejection regions for the usual test (H0: B = 0) are to the right and left of the dotted line in Figure 2. The rejection region for the equivalence test is inside the dashed triangle in Figure 2.
Relationship between tests - 2:
If the two tests are considered together, there are four possible outcomes (Figure 2). Two are consistent:
Relationships between tests - 3:
Two outcomes seem to be inconsistent:
Other points:
The two 1-sided tests approach has an associated confidence interval. The null hypothesis of non-equivalence will be rejected at alpha = 5% if and only if a 90% confidence interval for B is entirely within the equivalence region. Note that the confidence level of the interval is 100% - 2 alpha, not the usual 100% - alpha, because each of the two 1-sided tests is carried out at level alpha.
Results from equivalence tests depend critically on the equivalence region. If the region is narrowed to (-0.0346, 0.0346), i.e., a halving/doubling time of 20 years, D. monticola is still significantly inside the equivalence region, but D. ochrophaeus is not.
Power for an equivalence test can be calculated. It depends on the size of the equivalence region, the true trend, sample size, error variance, and type of statistical test.
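The confidence-interval form of the two 1-sided tests, and the effect of narrowing the equivalence region, can be sketched as below. The slope and standard error are those reported above for D. monticola. The critical value 1.725 is an assumption made only for illustration (it is the one-sided 5% value for 20 degrees of freedom); the appropriate value depends on the actual length of the series. The function names are ours.

```python
def tost_by_interval(b, sb, lower, upper, t_crit):
    """Reject non-equivalence if the 100% - 2*alpha interval lies inside (lower, upper).

    `t_crit` is the ONE-sided critical value at level alpha.
    """
    half_width = t_crit * sb
    ci = (b - half_width, b + half_width)
    return ci, (lower < ci[0] and ci[1] < upper)


def tost_by_t_statistics(b, sb, lower, upper, t_crit):
    """The same decision, written as two 1-sided t-tests."""
    return (b - lower) / sb > t_crit and (upper - b) / sb > t_crit


b, sb = 0.0077, 0.0130   # D. monticola, as reported above
t_crit = 1.725           # ASSUMED: one-sided 5% critical value for 20 df

for lower, upper in [(-0.0693, 0.0693), (-0.0346, 0.0346)]:
    ci, inside = tost_by_interval(b, sb, lower, upper, t_crit)
    same = inside == tost_by_t_statistics(b, sb, lower, upper, t_crit)
    print(f"region ({lower}, {upper}): 90% CI = ({ci[0]:.4f}, {ci[1]:.4f}), "
          f"inside = {inside}, agrees with t-tests = {same}")
```

The two functions always agree because b - t_crit*sb > Bl is the same inequality as (b - Bl)/sb > t_crit, and likewise for the upper bound.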
Conclusions:
Literature Cited
Dixon, P.M. 1996. Assessing effect and no effect with equivalence tests. To
appear in Newman, M. and Strojan, C. (eds.) Quantitative Risk Assessment: Concepts
and Methodologies. Lewis Publ.
Dixon, P.M. 1997. Nonparametric tests of no trend in population size. In preparation.
Dixon, P.M. and Pechmann, J.H. 1996. Pitfalls of power analysis: the case of declining amphibian populations. In review.
Hairston, N.G. Sr. 1996. Predation and competition in salamander communities. In Cody, M.L. and Smallwood, J. (eds) Long-term Studies of Vertebrate Communities. Academic Press, New York NY.
Mead, R. 1988. The Design of Experiments. Cambridge Univ. Press, Cambridge, UK.
Schuirmann, D.J. 1987. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics 15:657-680.
Semlitsch, R.D., Scott, D.E., Pechmann, J.H.K., and Gibbons, J.W. 1996. Structure and dynamics of an amphibian community: evidence from a 16-year study of a natural pond. In: Cody, M.L. and Smallwood, J. (eds) Long-term Studies of Vertebrate Communities. Academic Press, New York NY.
Acknowledgments
This research was supported by Financial Assistance Award Number DE-FC09-96SR18546
from the U.S. Department of Energy to the University of Georgia Research Foundation.
U.S. Department of the Interior
U.S. Geological Survey
Patuxent Wildlife Research Center
Laurel, MD, USA 20708-4038
http://www.pwrc.usgs.gov/naamp3/naamp3.html
Contact: Sam Droege, email: Sam_Droege@usgs.gov
Last Modified: June 2002