Patuxent Wildlife Research Center

NAAMP III Archive - statistical issues

Testing For No Trend

Philip M. Dixon and Joseph H.K. Pechmann
Savannah River Ecology Lab.
University of Georgia
Drawer E, Aiken SC 29802

For more information, contact:
dixon@srel.edu (statistics), pechmann@srel.edu (amphibians)

Introduction
Traditional statistical tests of no trend are useful to evaluate whether population size is declining or increasing. They are less useful when the intent is to evaluate whether or not population size is stationary. A non-significant result is not evidence that the population is stationary.

Non-significant results from a traditional test may result from:

  1. small sample sizes,
  2. large random fluctuations in abundance,
  3. poor choice of test, or
  4. a stationary population.



Limitations of power analysis - 1

Previous approaches have focused on power analysis and survey design. A typical calculation is to find a sample size (number of years) for which a given trend is likely to produce a statistically significant result. Such calculations require a priori specification of the true trend and the magnitude of random variation.

If observed estimates of trend and variation are used in power calculations, the estimated power is simply a function of the p-value (Mead 1988, Dixon and Pechmann 1996). Such calculations provide no additional insight into the nature of non-significant results.
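The dependence of "observed power" on the p-value can be sketched numerically. The example below is only an illustration under a simplifying assumption: it uses a two-sided z-test (normal approximation) rather than the t-test, and the function name observed_power is hypothetical. The p-value is the only input, so the computed power adds nothing that the p-value did not already say.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def observed_power(p_value: float, alpha: float = 0.05) -> float:
    """Power of a two-sided z-test evaluated at the observed effect size.

    Assumption: normal approximation. The p-value is the only data input.
    """
    z_obs = _STD_NORMAL.inv_cdf(1.0 - p_value / 2.0)  # |z| implied by the p-value
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)   # two-sided critical value
    # Chance of falling in either tail of the rejection region when the
    # true standardized effect equals the observed one.
    return _STD_NORMAL.cdf(z_obs - z_crit) + _STD_NORMAL.cdf(-z_obs - z_crit)


for p in (0.01, 0.05, 0.20, 0.50):
    print(f"p = {p:.2f}  ->  observed power = {observed_power(p):.3f}")
```

Under this approximation a result with p equal to alpha has observed power of about one half, and larger p-values always give lower observed power.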


Limitations of power analysis - 2

A second problem with traditional tests is that a test may be too powerful. If the sample size is large or the residual variation small, then a biologically insignificant trend (e.g., numerically close to zero) can be statistically significant. The test of no trend is not a test of whether the trend is biologically important.

Finally, it is extremely unlikely that a trend will truly be exactly zero. Hence, the null hypothesis of zero trend is probably false and the p-value is as much a measure of sample size as of trend.


Example: amphibian populations

Trends in abundance of four amphibian populations (Ambystoma talpoideum and A. tigrinum at Rainbow Bay, SC and Desmognathus monticola and D. ochrophaeus at Coweeta Hydrological Lab, NC) are shown in Figure 1.

These four populations include examples of populations with no apparent / small trend (A. talpoideum and D. monticola), strong declines (A. tigrinum) and strong increases (D. ochrophaeus), and populations with large (Ambystoma spp) and small (Desmognathus spp) annual variation.

Further information on these populations and the data collection is given in Semlitsch et al. (1996) and Hairston (1996).


Model for trend

We will use a simple log-linear model for trend in population size:

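log(Nt + 1) = A + B t + et

where Nt is the population size in year t, A is the intercept, and et is the random deviation from the trend line in year t. The slope, B, describes the linear trend; it is estimated by b, the least-squares regression slope.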
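A minimal sketch of fitting this model by ordinary least squares follows. The function name log_linear_trend and the counts are hypothetical and are not data from the study; natural logarithms are assumed.

```python
import math


def log_linear_trend(years, counts):
    """Least-squares fit of log(Nt + 1) = A + B t + et.

    Returns (a, b, sb, df): intercept, slope, standard error of the
    slope, and residual degrees of freedom (n - 2).
    """
    y = [math.log(n + 1) for n in counts]
    n = len(years)
    t_bar = sum(years) / n
    y_bar = sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in years)
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in zip(years, y))
    b = sxy / sxx                      # estimated slope
    a = y_bar - b * t_bar              # estimated intercept
    rss = sum((v - (a + b * t)) ** 2 for t, v in zip(years, y))
    df = n - 2
    sb = math.sqrt(rss / df / sxx)     # standard error of the slope
    return a, b, sb, df


# Hypothetical counts for illustration only.
years = list(range(1, 11))
counts = [120, 95, 140, 110, 160, 90, 130, 150, 105, 125]
a, b, sb, df = log_linear_trend(years, counts)
print(f"b = {b:.4f}, sb = {sb:.4f}, df = {df}")
```

The slope and its standard error are the two sample statistics that any t-test of trend is built from.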

Estimated trends are superimposed on the data in Figure 1. Slopes for A. tigrinum and D. ochrophaeus are significantly different from zero, but those for A. talpoideum and D. monticola are not (Table 1).

Are the non-significant results from A. talpoideum and D. monticola proof that the populations are stationary? NO, for the reasons listed above.

What we need are different statistical methods.


Equivalence tests

We illustrate statistical methods to test whether a population is stationary. Stationary here means that the trend is ecologically close to zero, i.e., within some region that is considered equivalent to no trend. Although the discussion and example will focus on trends, the method of equivalence testing can be applied to other statistical problems, such as comparing two means.

The key to an equivalence test is to define an ecologically relevant equivalence region. Values of the trend inside this region are considered equivalent to zero.


Equivalence hypotheses for a trend:

The null hypothesis is that the trend is outside the equivalence region (Bl, Bu), i.e., H0: B <= Bl or B >= Bu.

The alternative is that the trend is inside the region, i.e., Ha: Bl < B < Bu.

Many statistical tests have been proposed, but there is no optimal one. Instead there is a trade-off between test size, power, and shape of the rejection region. The currently favored approach to test the equivalence null hypothesis is the two 1-sided tests method (Schuirmann 1987).




Two 1-sided tests method:

Reject H0: B <= Bl or B >= Bu only if both 1-sided subhypotheses, B <= Bl and B >= Bu, are rejected.

That is to say, conclude that the trend is equivalent to zero only if it is significantly more than the lower bound and significantly less than the upper bound.

Advantages:

But other tests may be more powerful. More information and examples of equivalence tests for differences are given in Dixon (1996).




Equivalence regions for trends:

What are reasonable upper and lower bounds for the equivalence region? These must be specified before an equivalence test can be used.

Ideally, these represent biological/ecological knowledge and judgment: what sorts of short-term trends are small for a particular population.

Suggested rule-of-thumb
A trend is small if the half-life/doubling time of a population is 10 years or longer when that population declines/increases by exactly the specified trend every year. This translates into log-linear slopes of -0.0693 and 0.0693. If population size declined with a slope of -0.0693, it would reach 1% of the starting size (pseudo-extinction) in 66 years.
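The arithmetic behind this rule of thumb can be checked directly. The sketch below assumes natural logarithms and ignores the +1 in log(Nt + 1), which matters little for large populations; the function names are hypothetical.

```python
import math


def slope_for_doubling_time(years: float) -> float:
    """Log-linear slope at which a population doubles in `years` years."""
    return math.log(2) / years


def years_to_fraction(slope: float, fraction: float) -> float:
    """Years for a population to fall to `fraction` of its starting size
    when it declines with a log-linear slope of magnitude |slope|."""
    return math.log(1.0 / fraction) / abs(slope)


bound = slope_for_doubling_time(10)
print(f"equivalence bounds: ({-bound:.4f}, {bound:.4f})")
print(f"years to reach 1% of starting size: {years_to_fraction(-0.0693, 0.01):.1f}")
```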


Equivalence tests - 2:

If variation around the trend line is assumed to be independent, normally-distributed, and equi-variant, then each 1-sided test is a 1-sided t-test.

The subhypothesis H0a: B <= Bl is rejected if the t-statistic Tl = (b - Bl)/sb is larger than the 1-sided critical value for a t-distribution with the appropriate degrees of freedom, where sb is the estimated standard error of b. The other subhypothesis, H0b: B >= Bu, is rejected if the t-statistic Tu = (Bu - b)/sb is larger than the same critical value. Equivalently, one can compare each 1-sided p-value to 0.05.
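The two 1-sided t-tests can be sketched with the standard library alone. This is an illustration, not the authors' implementation: the names t_sf and tost_trend are hypothetical, the upper-tail probability of the t-distribution is obtained by numerical integration (Simpson's rule) rather than from a statistics package, and the input values are invented.

```python
import math


def t_sf(t: float, df: int, steps: int = 2000) -> float:
    """Upper-tail probability P(T > t) for Student's t, by Simpson's rule."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    const /= math.sqrt(df * math.pi)

    def pdf(x: float) -> float:
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps                     # steps must be even
    area = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3                          # integral of the density over [0, |t|]
    return 0.5 - area if t >= 0 else 0.5 + area


def tost_trend(b, sb, df, lower, upper, alpha=0.05):
    """Two 1-sided t-tests of H0: B <= lower or B >= upper."""
    t_lower = (b - lower) / sb             # tests H0a: B <= lower
    t_upper = (upper - b) / sb             # tests H0b: B >= upper
    p_lower = t_sf(t_lower, df)
    p_upper = t_sf(t_upper, df)
    return {
        "t_lower": t_lower, "p_lower": p_lower,
        "t_upper": t_upper, "p_upper": p_upper,
        # Equivalence is concluded only if BOTH subhypotheses are rejected.
        "equivalent": max(p_lower, p_upper) < alpha,
    }


# Hypothetical slope, standard error, and degrees of freedom.
print(tost_trend(b=0.010, sb=0.020, df=14, lower=-0.0693, upper=0.0693))
```

Taking the larger of the two p-values as the overall p-value is what makes the procedure reject only when both 1-sided tests reject.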

Alternatives to the t-test include bootstrapping, randomization tests, and nonparametric tests for trend. The appropriate statistical theory for most of these has been developed (Dixon 1996, Dixon 1997), but the tests may not be easy to implement.




Example - equivalence tests:

We will use 1-sided t-tests for linear trends in log-transformed abundance. Diagnostic tests and plots suggest that the test's assumptions are reasonable (Dixon and Pechmann 1996).

For D. monticola, the t-values are:

Tl = (0.0077 - (-0.0693))/0.0130 = 5.92, p < 0.0001

Tu = (0.0693 - 0.0077)/0.0130 = 4.74, p < 0.0001

Both subhypotheses are rejected, so we reject the null hypothesis of "non-equivalence." There is strong evidence that the trend in D. monticola is inside the equivalence region.
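The two t-statistics for D. monticola can be reproduced from the reported slope, standard error, and equivalence bounds. The p-values are not recomputed here because they depend on the degrees of freedom, which are not stated in this section.

```python
# Values reported in the text for D. monticola.
b, sb = 0.0077, 0.0130            # estimated slope and its standard error
lower, upper = -0.0693, 0.0693    # equivalence region (10-year rule of thumb)

t_lower = (b - lower) / sb        # tests H0a: B <= lower bound
t_upper = (upper - b) / sb        # tests H0b: B >= upper bound

print(f"Tl = {t_lower:.2f}")      # prints Tl = 5.92
print(f"Tu = {t_upper:.2f}")      # prints Tu = 4.74
```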




Example - equivalence tests - 2:

For A. tigrinum, one of the two subhypotheses is rejected, but not the other, so one cannot conclude that the trend is within the equivalence region (Table 2).

For A. talpoideum, neither subhypothesis is rejected.

For D. ochrophaeus, both subhypotheses are rejected, so the trend is significantly inside the equivalence region.




Relationship with the "usual" test:

Results from the equivalence test are not necessarily opposite those of the "usual" test of no difference because the rejection regions for the two tests are quite different.

The rejection region of a statistical test is the set of sample statistics which lead to rejecting the null hypothesis. For t-tests of trend, the relevant sample statistics are the estimated slope and its estimated standard error.

The rejection regions for the usual test (H0: B = 0) are to the right and left of the dotted line in Figure 2. The rejection region for the equivalence test is inside the dashed triangle in Figure 2.
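The triangular shape follows from the two 1-sided tests: for a given estimated slope b, both tests reject only when the standard error is smaller than the distance from b to the nearer bound, divided by the critical value. A rough sketch follows; the function name is hypothetical, and the critical value of 1.8 is an assumed placeholder because the actual value depends on the degrees of freedom.

```python
def max_se_for_equivalence(b, lower, upper, t_crit):
    """Largest standard error at which both 1-sided tests still reject.

    Equivalence is concluded only for standard errors strictly below
    this value; it is zero when b is on or outside the bounds.
    """
    margin = min(b - lower, upper - b)   # distance to the nearer bound
    return max(margin, 0.0) / t_crit


T_CRIT = 1.8                             # assumed 1-sided critical value
LOWER, UPPER = -0.0693, 0.0693

for slope in (-0.06, -0.03, 0.0, 0.03, 0.06):
    limit = max_se_for_equivalence(slope, LOWER, UPPER, T_CRIT)
    print(f"b = {slope:+.2f}: reject non-equivalence if sb < {limit:.4f}")
```

The limit is largest when b is at the center of the region and falls linearly to zero at each bound, which traces out a triangle in the plane of estimated slope and standard error.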


Relationship between tests - 2:

If the two tests are considered together, there are four possible outcomes (Figure 2). Two are consistent:

  1. The trend is significantly different from zero and not significantly inside the equivalence region. This is strong evidence for an ecologically significant trend. As always, interpretation of that trend and its cause is a separate issue. An example is A. tigrinum.
  2. The trend is not significantly different from zero and significantly inside the equivalence region. This is strong evidence of no ecologically significant trend. An example is D. monticola.




Relationship between tests - 3:

Two outcomes seem to be inconsistent:

  1. The trend is significantly different from zero and also significantly inside the equivalence region. An example is D. ochrophaeus. This indicates that the trend is small and precisely known (due to small residual variation or large sample size). The trend is not zero, but it is considered to be ecologically equivalent to zero, based on the a priori specification of the equivalence region.
  2. The trend is not significantly different from zero and also not significantly inside the equivalence region. This region is the large region in the upper center of Figure 2. An example is A. talpoideum. This indicates that the trend is not estimated well enough to make strong conclusions. The sample size is insufficient relative to the residual variation.
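The four combined outcomes can be summarized as a small lookup. The function name and the wording of the returned labels are illustrative paraphrases; the test results listed for the four populations are those reported in the text.

```python
def interpret_outcomes(differs_from_zero: bool, inside_region: bool) -> str:
    """Combine the usual test of no trend with the equivalence test."""
    if differs_from_zero and not inside_region:
        return "ecologically significant trend"
    if not differs_from_zero and inside_region:
        return "no ecologically significant trend"
    if differs_from_zero and inside_region:
        return "small, precisely estimated trend (equivalent to zero)"
    return "inconclusive: trend not estimated well enough"


# (significantly different from zero, significantly inside the region)
reported = {
    "A. tigrinum": (True, False),
    "D. monticola": (False, True),
    "D. ochrophaeus": (True, True),
    "A. talpoideum": (False, False),
}

for species, (usual, equivalence) in reported.items():
    print(f"{species}: {interpret_outcomes(usual, equivalence)}")
```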



Other points:

The two 1-sided tests approach has an associated confidence interval. The null hypothesis of non-equivalence will be rejected at alpha = 5% if and only if a 90% confidence interval for B is entirely within the equivalence region. Note that the confidence level of the interval is 100(1 - 2 alpha)%, not the usual 100(1 - alpha)%, because there are two 1-sided tests, each at level alpha.
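The agreement between the two 1-sided tests and the confidence interval is a matter of rearranging the same inequalities, as the sketch below illustrates. The function names are hypothetical and the critical value of 1.8 is an assumed placeholder for the 1-sided 5% point of the appropriate t-distribution.

```python
def tost_rejects(b, sb, t_crit, lower, upper):
    """True if both 1-sided t-statistics exceed the critical value."""
    return (b - lower) / sb > t_crit and (upper - b) / sb > t_crit


def ci_inside_region(b, sb, t_crit, lower, upper):
    """True if the interval b +/- t_crit * sb lies strictly inside the region.

    With a 1-sided 5% critical value this is a 90% confidence interval.
    """
    ci_low, ci_high = b - t_crit * sb, b + t_crit * sb
    return lower < ci_low and ci_high < upper


T_CRIT = 1.8                       # assumed critical value
LOWER, UPPER = -0.0693, 0.0693

for b, sb in [(0.0077, 0.0130), (0.0077, 0.0500), (0.0500, 0.0130)]:
    print(b, sb,
          tost_rejects(b, sb, T_CRIT, LOWER, UPPER),
          ci_inside_region(b, sb, T_CRIT, LOWER, UPPER))
```

In each row the two answers coincide, since (b - Bl)/sb > t is the same condition as b - t sb > Bl, and likewise for the upper bound.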

Results from equivalence tests depend critically on the equivalence region. If the region is narrowed to (-0.0346, 0.0346), i.e., a halving/doubling time of 20 years, D. monticola is still significantly inside the equivalence region, but D. ochrophaeus is not.

Power for an equivalence test can be calculated. It depends on the size of the equivalence region, the true trend, sample size, error variance, and type of statistical test.
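As a rough sketch of such a calculation, the example below approximates the power of the two 1-sided tests by treating the standard error of the slope as known, so that normal rather than t critical values apply. That simplification is an assumption made here for illustration, and the function name tost_power_z is hypothetical.

```python
from statistics import NormalDist


def tost_power_z(true_slope, sb, lower, upper, alpha=0.05):
    """Approximate probability of concluding equivalence.

    Assumes b is normally distributed around true_slope with known
    standard deviation sb (a z-approximation to the t-based test).
    """
    z = NormalDist().inv_cdf(1.0 - alpha)
    # Both 1-sided tests reject when b falls between these two limits.
    accept_low = lower + z * sb
    accept_high = upper - z * sb
    if accept_high <= accept_low:
        return 0.0                 # region too narrow relative to sb
    estimate = NormalDist(mu=true_slope, sigma=sb)
    return estimate.cdf(accept_high) - estimate.cdf(accept_low)


for sb in (0.010, 0.020, 0.030, 0.040):
    power = tost_power_z(0.0, sb, -0.0693, 0.0693)
    print(f"sb = {sb:.3f}: power at true trend 0 = {power:.3f}")
```

The power falls as the standard error grows and reaches zero once the standard error is large enough that no estimated slope could pass both 1-sided tests.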


Conclusions:


Literature Cited
Dixon, P.M. 1996. Assessing effect and no effect with equivalence tests. To appear in Newman, M. and Strojan, C. (eds.) Quantitative Risk Assessment: Concepts and Methodologies. Lewis Publ.

Dixon, P.M. 1997. Nonparametric tests of no trend in population size. In preparation.

Dixon, P.M. and Pechmann, J.H.K. 1996. Pitfalls of power analysis: the case of declining amphibian populations. In review.

Hairston, N.G. Sr. 1996. Predation and competition in salamander communities. In Cody, M.L. and Smallwood, J. (eds.) Long-term Studies of Vertebrate Communities. Academic Press, New York NY.

Mead, R. 1988. The Design of Experiments. Cambridge Univ. Press, Cambridge, UK.

Schuirmann, D.J. 1987. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics 15:657-680.

Semlitsch, R.D., Scott, D.E., Pechmann, J.H.K., and Gibbons, J.W. 1996. Structure and dynamics of an amphibian community: evidence from a 16-year study of a natural pond. In Cody, M.L. and Smallwood, J. (eds.) Long-term Studies of Vertebrate Communities. Academic Press, New York NY.

Acknowledgments
This research was supported by Financial Assistance Award Number DE-FC09-96SR18546 from the U.S. Department of Energy to the University of Georgia Research Foundation.


U.S. Department of the Interior
U.S. Geological Survey
Patuxent Wildlife Research Center
Laurel, MD, USA 20708-4038
http://www.pwrc.usgs.gov/naamp3/naamp3.html
Contact: Sam Droege, email: Sam_Droege@usgs.gov
Last Modified: June 2002