Patuxent Wildlife Research Center

NAAMP III Archive - calling surveys

A Heuristic Approach to Validating Monitoring Programs Based on Count Indices (Damn the Statisticians, Full Speed Ahead)

Homines id quod volunt credunt. (People believe what they want to believe.)

Sam Droege

US Geological Survey/Biological Resources Division
Patuxent Wildlife Research Center
Laurel, MD 20708-4038 USA

Validation is a process in which you, the person who wishes to start a new monitoring program, investigate the relationship between your proposed count index (e.g., number of salamanders hiding under boards, counts of calling frogs in wetlands, number of tadpoles caught in a sweep of a net) and the real number of critters in the area. If your index (or count) behaves properly (a graph of count vs. true population is a straight line; Figure 1), then your count would appear to be an unbiased index of population size. Congratulations. If your index behaves improperly (the graph does not yield a straight line; Figure 2), then you need to make some corrections to your index.

Sounds simple, really, but the devil is in the details. How, for example, would you validate an index that functions at the level of a state or province, the very level that AMP groups primarily target? Well, it can't be done. But you could conceivably validate your index using local populations and assume that the relationship holds for larger units of land (i.e., your sampling scheme isn't biased and the situation is stable over large geographic areas). For each species (or a representative collection of species) one would have to determine the true population size (through a total count or an estimate based on a mark-recapture study) and then compare that to the proposed index, measured at the same site. In the literature, there are only a handful of examples where researchers have attempted to validate animal indices. (Aside: I am collecting examples of validation surveys; if you know of any, please email me. Thanks.)

A number of factors can prevent these studies or make them prohibitively expensive:

A daunting task.

Over the past few years of working with the North American Amphibian Monitoring Program (AMP) group, I have wrestled many times with the divine notion of validation...and lost, or at least had my hip put out of joint. In this presentation, I would like to explore a heuristic alternative to the experimental validation process.

If you query statisticians about how one might go about interpreting the data generated by a monitoring program whose index has not been validated, that is, one where you don't explicitly know the relationship between your count and the total population, be sure to turn on your force field, because their phasers are on full.

Witness Burnham's (1981) summarizing remarks for a session at a monitoring symposium.

"Some general points concerning the use of bird counts to estimate bird abundance that I want to emphasize are:"

"(1) Using just the count of birds detected (per unit effort) as an index of abundance is neither scientifically sound nor reliable. Many papers in this symposium illustrate this fact, in effect, whether the authors so intended or not."

Nichols and Conroy (/statistics/statist.html), in their well-thought-out dichotomous key for choosing a monitoring technique, write the following in the one puny branch of their key that has to do with indices (the remaining branches use some form of density estimator, not indices).

"Situation (common): Experience suggests that the index and population density are related, perhaps monotonically. Examples: track and pellet surveys for deer; coyote vocal responses to distress calls, sirens; scent station surveys of foxes, bobcats."

"Is it possible to census or estimate abundance for use in index calibration?"

[If yes] "Method to Use: Calibrate index with censuses of population or estimation methods having limited or known bias (e.g. mark-recapture), then use a double-sampling design to estimate population density from the index."

[If no] "Method: Use the index (with proper replication, stratification, and recording of nuisance variables as in A) recognizing that it may or may not be correlated with actual abundance. (OPTION OF LAST RESORT - RELIES ON UNTESTED ASSUMPTION!)"

So, the take-home message here is to never bring up religion, politics, sex, or unvalidated indices at the next cocktail party you attend unless you are very, very sure no statisticians are present. Biologists, though, are hardly exemplary when it comes to confronting the rogue index. If they admit at all (unlikely) that their program is biased (which it most certainly is), they tend to glibly acknowledge the existence of biases, but then proceed to analyze and interpret the data as if those biases had been statistically expunged through public confession. Witness what Droege (1990) wrote summarizing the impact of these unknown biases on Breeding Bird Survey data (an index based on the number of birds detected from a single point).

"Statistical analyses of these data and their subsequent interpretations should dwell on the patterns of population change rather than on the magnitudes of calculated trend and variances."

This author then proceeded to spend the next few years publishing estimates of population trend using this data set sans further caveats (statistically, this situation is known as caveat lector, but I digress).

Here, then, lies the problem. Traditional statisticians argue for statistically validating indices before developing a new monitoring program or interpreting the data of existing ones. But biologists fear statisticians and their numanistic black magic and, seeing no reasonable way of validating these large-scale surveys, pretend that their data mirror reality, wave off the statisticians, and play phrenologist with the data - blindly interpreting every bump and wiggle in their trend lines.

I believe that there is a middle ground. It should be possible to use known relationships between a species (or group of species), population size, and changes in their detectability to establish a general notion of whether an index will yield, or is yielding, interpretable information and what types of conclusions may be drawn.

So, here goes: Figure 1 shows the relationship that one would like to have exist between a count and the true population size.

However, in Figure 2 you can see that all sorts of interesting relationships could exist between population size and count. For most surveys this is most likely the situation.

In Figure 2, line type 1, there is a decrease in relative detectability (per bird) as population increases. Under this situation the slope of a trend line based on data with this form of bias would be underestimated (both for negative and positive trends) and the shape deformed, but the sign of the trend would remain the same as that of the true population change.

In Figure 2, line type 2, there is an increase in relative detectability (per bird) as population increases. Under this situation the slope of the trend would be overestimated (both for negative and positive trends) and the shape deformed, but the sign of the trend would remain the same as that of the true population change.

In Figure 2, line type 3, depending on the range over which the true population changes, a trend line based on data of this type would not only be deformed in shape, but could also change sign; a disastrous circumstance.

In Figure 2, line type 4, everything is cool and there is no bias in shape or slope of a trend line based on this type of data, even though the relationship between population size and count is not 1 to 1.
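The four line types can be made concrete with a small numerical sketch. The functional forms below (a saturating curve for type 1, a squared curve for type 2, a humped curve for type 3, and a straight line through the origin for type 4) and every constant in them are assumptions chosen only for illustration; they are not read off Figure 2 or fitted to any survey. Trend is expressed here as the proportional change in the index, which is also an assumption about how the trend would be calculated.

```python
import math

# Hypothetical count-versus-population curves, one per line type in Figure 2.
def type1_saturating(n, half_sat=50.0):
    """Per-animal detectability falls as the population grows."""
    return n / (n + half_sat)

def type2_accelerating(n):
    """Per-animal detectability rises as the population grows."""
    return n ** 2 / 1000.0

def type3_humped(n, peak=100.0):
    """Count rises with population up to `peak`, then falls."""
    return n * math.exp(-n / peak)

def type4_proportional(n, fraction_detected=0.25):
    """A constant fraction of animals is counted (not 1 to 1, but unbiased)."""
    return fraction_detected * n

def proportional_change(index, n_start, n_end):
    """Proportional change in the index as the true population moves."""
    return index(n_end) / index(n_start) - 1.0

# Illustrative scenario: the true population halves from 200 to 100 animals.
N_START, N_END = 200.0, 100.0
TRUE_CHANGE = N_END / N_START - 1.0

CURVES = {
    "type 1": type1_saturating,
    "type 2": type2_accelerating,
    "type 3": type3_humped,
    "type 4": type4_proportional,
}
changes = {name: proportional_change(f, N_START, N_END) for name, f in CURVES.items()}

print(f"true change: {TRUE_CHANGE:+.1%}")
for name, change in changes.items():
    print(f"{name}: index change {change:+.1%}")
```

Under these made-up curves the type 1 index understates the decline, type 2 overstates it, type 3 reports an increase while the population is actually falling, and type 4 reproduces the true proportional change.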

The Case For Calling Anuran Surveys

So, let's explore the case of calling frog surveys. I would like to use the calling frog system partly because calling frog monitoring programs are being implemented left and right and partly to illustrate a format that others could use to think about the effects of bias on their own survey programs.

Describe the Index: Here the index is the number of calling frogs and toads counted at wetland stops along roadsides at night.

Describe the Biases that are Density and Time (over several years) Independent:

These are biases whose impacts are acutely felt over the short run but wash out over extended periods of time (10+ years). Some of these biases are relatively easy to correct for. For example, weather effects can be minimized by standardizing when and under what conditions amphibian surveys are run, as has been done by the group that developed the calling survey protocols for NAAMP. But at some level shifting seasons, droughts, storms, and the vagaries of weather cannot all be accounted for, and the result will be that a given year's data may be biased high or low. Those biases make short-term comparisons (2-5 years) of populations difficult to interpret, but over longer runs of years these effects (because they are random) become extra noise in the system, not a bias.

It should be noted that the general variability of amphibian calling surveys is so great that it is difficult to determine trends over periods of time shorter than 5-10 years, under any circumstance.
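The claim that random year effects swamp short comparisons but fade into noise over longer runs can be explored with a small simulation. This is only a sketch: the true trend (a 2 per cent annual decline on the log scale), the size of the year-to-year noise, the window lengths, and the number of replicates are all hypothetical values, not estimates from any calling survey.

```python
import random
import statistics

def ols_slope(ys):
    """Least-squares slope of ys regressed on year number 0, 1, 2, ..."""
    n = len(ys)
    t_mean = (n - 1) / 2.0
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    return sxy / sxx

def simulate_slopes(n_years, true_slope=-0.02, year_sd=0.3, n_reps=500, seed=1):
    """Estimated trends from repeated surveys with random year effects.

    All parameter defaults are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_reps):
        # Log index = true trend plus a random (weather-like) effect each year.
        log_index = [true_slope * t + rng.gauss(0.0, year_sd) for t in range(n_years)]
        slopes.append(ols_slope(log_index))
    return slopes

short_run = simulate_slopes(n_years=3)
long_run = simulate_slopes(n_years=15)

short_spread = statistics.pstdev(short_run)
long_spread = statistics.pstdev(long_run)
long_mean = statistics.mean(long_run)

print(f"spread of 3-year trend estimates:  {short_spread:.3f}")
print(f"spread of 15-year trend estimates: {long_spread:.3f}")
print(f"mean 15-year trend estimate:       {long_mean:+.3f}")
```

In this sketch the year effects are purely random, so the long-run estimates scatter around the true trend rather than away from it; the short-run estimates are centered on the same value but are far too variable to interpret individually.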

Describe the Biases that are Density Dependent:

Calling saturation is the only bias I can think of that is likely linked to changes in the density of calling amphibians. The general situation is that as the number (and clumping) of calling frogs and toads increases, the ability of the human observer to differentiate or count them rapidly declines. The Wisconsin calling amphibian scale (also adopted within the NAAMP protocols) uses only 3 call levels.

The bias in this situation is severe, even though I know of no validation studies that have directly explored the relationship between count and true population. As the population increases the index quickly saturates and nearly flattens out (line type 1 in Figure 2). Note, though, that in no case does the index act like line type 3 in Figure 2, where an increase in population size results in a decrease in the index.

The result of this bias is that increases and decreases are disguised (appearing to be nearly zero) when amphibian populations are high and the index is saturated. At lower population levels, the index (which includes more sites with calling index values of 1 and 2) tracks the population change in a more realistic manner.

Despite the severity of this bias (creating a great disjunct between the true population size and the index), it presents a conservative estimate of frog population trend; that is, it causes trends to be underestimated rather than overestimated.
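A toy version of a 3-level calling index shows how saturation hides change at high densities. The numeric thresholds that separate the levels, the use of 0 for a silent stop, and the site counts are all hypothetical; the survey protocols define the levels by what the observer hears, not by a head count, so this is only a sketch of the saturating behavior.

```python
def calling_index(n_callers, overlap_at=5, chorus_at=15):
    """Map the number of calling animals at a stop to a 0-3 call index.

    The thresholds `overlap_at` and `chorus_at` are hypothetical stand-ins
    for the point at which an observer can no longer count individuals.
    """
    if n_callers <= 0:
        return 0
    if n_callers < overlap_at:
        return 1
    if n_callers < chorus_at:
        return 2
    return 3

def mean_index(callers_per_site):
    """Average call index across a set of survey stops."""
    return sum(calling_index(n) for n in callers_per_site) / len(callers_per_site)

def index_change_after_halving(callers_per_site):
    """Proportional change in the mean index when every site loses half its callers."""
    before = mean_index(callers_per_site)
    after = mean_index([n // 2 for n in callers_per_site])
    return after / before - 1.0

# Made-up numbers of callers per stop for a dense and a sparse region.
dense_sites = [40, 60, 80, 120, 200]
sparse_sites = [2, 4, 6, 10, 16]

dense_change = index_change_after_halving(dense_sites)
sparse_change = index_change_after_halving(sparse_sites)

print(f"dense region:  true change -50%, index change {dense_change:+.0%}")
print(f"sparse region: true change -50%, index change {sparse_change:+.0%}")
```

With these invented numbers the dense region shows no change in the index at all, while the sparse region shows a decline that has the right sign but is smaller than the true one, which is the conservative behavior described above.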

Describe the Biases that are Time (over several years) Dependent:

Throughout North America the numbers, coverage, and average age of trees are increasing. As tree coverage changes so does the mix of amphibian populations. Some species are favored and others not. These are the types of changes we would like a monitoring program to track. However, one consequence of having more trees around is that sound doesn't carry as far. Frogs in a pond blocked by a forest are more difficult to hear than those in a pond surrounded by corn fields.

Because we are in an increasing forest situation (in most parts of the continent), this decreases our ability to detect calling animals over time (a negative bias). However, it works a bit differently than a bias that changes with the density of animals (as illustrated by calling saturation). With calling saturation, the bias tends to adjust the trend line toward zero, or no population change (but never causes the trend to reverse from negative to positive or from positive to negative). With an increase in tree cover, the bias causes both positive and negative trends to appear more negative. In this situation a stable population (no trend) can register as a decline in your monitoring program solely because the detectability of the animals has declined; likewise, declining populations appear to be declining faster than they really are, inducing you to cry wolf.
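The arithmetic behind a time-dependent bias can be sketched by treating the count as population multiplied by detectability, so that the annual rates of change compound. Both the multiplicative model and the 1 per cent annual loss of detectability are assumptions made for illustration; no such rate has been measured here.

```python
def observed_trend(true_trend, detectability_trend):
    """Annual proportional change in the index when count = population * detectability.

    Both arguments are annual proportional changes, e.g. -0.02 for a
    2 per cent decline per year.
    """
    return (1.0 + true_trend) * (1.0 + detectability_trend) - 1.0

# Hypothetical: growing forest cover costs 1 per cent of detectability each year.
DETECTABILITY_TREND = -0.01

for true_trend in (-0.02, 0.0, 0.02):
    seen = observed_trend(true_trend, DETECTABILITY_TREND)
    print(f"true trend {true_trend:+.2%} per year -> index trend {seen:+.2%} per year")
```

Under these assumptions every trend is shifted in the negative direction: the stable population appears to decline by about 1 per cent per year, the declining population appears to decline faster than it does, and the increasing population appears to be growing more slowly than it is.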

Observers with deteriorating hearing also create a negative bias. Now, it could be argued that if you have many observers participating, relative hearing level would be randomly distributed throughout that population, yielding extra noise in the system but not bias. However, many researchers now statistically factor out the differences among observers in their analyses of trend (e.g., they control for the folks who are overestimators and those who are underestimators) by making observers co-variables. This reduces the noise in the data caused by differences in how people count, but assumes that an observer's hearing doesn't change over time (which it can and ultimately does) and thus builds a negative bias into the system (because observers' hearing deteriorates as they age, resulting in them recording relatively fewer frogs and toads).

Hearing loss in observers can be dealt with in several ways. The best way would be to have strict guidelines for participation based on the ability to hear at the frequencies of the highest pitched amphibian species. It may be possible to develop a tape on an answering machine with a set of test tones that an observer would have to call each year to make sure their hearing is still good. People who cannot pass the test would have their results rejected or dropped for some species.
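One way such a screening rule might be applied to the data is sketched below. The species labels and call frequencies are placeholders, not measurements, and the single "highest tone heard" result is an assumed simplification of whatever a real hearing test would report.

```python
# Placeholder call frequencies in Hz; real values would come from recordings.
SPECIES_CALL_HZ = {
    "species_low": 500,
    "species_mid": 2000,
    "species_high": 5000,
}

def usable_species(highest_tone_heard_hz, species_call_hz=SPECIES_CALL_HZ):
    """Species whose data would be kept for an observer this year.

    An observer's records are kept only for species calling at or below
    the highest test tone the observer could hear (hypothetical rule).
    """
    return sorted(
        name for name, hz in species_call_hz.items()
        if hz <= highest_tone_heard_hz
    )

print(usable_species(8000))  # observer heard every test tone
print(usable_species(3000))  # observer missed the highest tones
```

Dropping records species by species, rather than rejecting an observer outright, keeps the data for the lower-pitched species that the observer can still hear.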


Calling frog surveys, as a technique, appear rife with bias; fortunately, most sources of bias can be partially controlled for or disappear with time into general system variability. Unfortunately, calling saturation prevents us from interpreting the calculated slope values literally. For example, if we calculate that Spring Peepers (Pseudacris crucifer) are declining by 2 per cent per year in a region, what we actually know is that the rate of decline for Peepers is very likely greater than 2 per cent per year (estimates of trend lines have variances too; consequently, we can't say that it is certainly greater than 2 per cent, only that it probably is). So, calling surveys permit us to talk about whether a population is increasing or decreasing and give us a minimum estimate of how fast that population is changing; if the calculated trend is significant, then there is strong cause to believe that the population is changing quickly.

Because changes in hearing can potentially confound our ability to look at trends, it is important to develop an even-handed test for hearing loss among observers, test them each year, and eliminate data that do not conform. Changes in vegetation are something that will affect calling frog and toad data over spans of decades. Because there are good ways to use satellite and aerial maps to look at changes, it is something that can be developed and corrected for as programs proceed.

By stepping through a process similar to the one outlined here (a multi-step process, not without parallels in the substance abuse world), it should be possible to explore the consequences and effects of bias on any proposed new survey or count. The end result should be a clearer understanding of what factors need further testing, the type of information such a survey will produce (which could be quite different from what was expected), and an idea of whether such a survey will yield misleading results and should not be attempted (or at minimum should be modified).

True statistical validation is, in many ways, unnecessary if it appears that the general behavior of biasing factors can be outlined and the consequences are acceptable. If the ecological and monitoring system is a simple one (few biases are present) then it could be possible to validate a survey and develop a set of density correcting factors for the index. However, I believe that in most cases, a survey's constraints are so numerous as to defy modeling and that indices that are developed should be ones that talk about trends in relative or qualified terms.

Vaguely scratching
Numbers wild
Never matching
Counting crows
Or eels electric
Badly biased
Or vaguely valid
Cucumber Salad

Acknowledgments: Thanks go to all my statistical heroes at Patuxent for giving me feedback on these ideas.


Burnham, Kenneth P., 1981, Summarizing Remarks: Environmental Influences, In Ralph, C. John, and Scott, J. Michael, Editors, Estimating Numbers of Terrestrial Birds: Lawrence, KS, Allen Press, Inc., p. 324-325.

Droege, Sam, 1990, The North American Breeding Bird Survey, In Sauer, J. R., and Droege, S., Editors, Survey Designs and Statistical Methods for the Estimation of Avian Population Trends, USFWS, p. 1-4.

Contact: Sam Droege, email:
Last Modified: June 2002