Survey Design for Calling Amphibians
by Paul Geissler, October 13, 1995
Importance of a Random Sample
Unemployment Survey Example
Literary Digest Polls Example
Mourning Dove Survey Example
Implications for Amphibian Monitoring
The selection of sites for surveying amphibians is critically important. For example, Mossman et al. (1995) reported that 10 of 11 Wisconsin amphibian species had negative trends (3 "significant") on the surveyed sites. However, that result does *NOT* imply a decline in the Wisconsin populations unless it can be demonstrated that the sites are representative of amphibian habitat in Wisconsin. Observers subjectively selected the wetlands to be sampled, and they are likely to select good habitat where amphibians are present. With normal variation, good sites are expected to decrease and fair sites are expected to increase (regression to the mean). Thus the Wisconsin results are expected because of the method of site selection and should not be used as evidence of declines in amphibian populations.
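The expectation that subjectively chosen "good" sites will appear to decline can be illustrated with a small simulation (all numbers below are hypothetical, chosen only to show the effect):

```python
import random

def two_year_counts(n_sites=10_000, n_selected=500, noise_sd=5.0, seed=1):
    """Two years of counts at sites whose TRUE abundance never changes.

    Selecting sites because they looked good in year 1 produces an
    apparent "decline" in year 2: the favorable year-1 sampling noise
    does not repeat (regression to the mean).
    """
    rng = random.Random(seed)
    true_abundance = [rng.uniform(0, 50) for _ in range(n_sites)]
    year1 = [a + rng.gauss(0, noise_sd) for a in true_abundance]
    year2 = [a + rng.gauss(0, noise_sd) for a in true_abundance]

    # Judgment selection: keep only the sites with the highest year-1 counts.
    best = sorted(range(n_sites), key=lambda i: year1[i], reverse=True)[:n_selected]
    y1_mean = sum(year1[i] for i in best) / n_selected
    y2_mean = sum(year2[i] for i in best) / n_selected
    return y1_mean, y2_mean

y1, y2 = two_year_counts()
print(f"year-1 mean on the selected 'good' sites: {y1:.1f}")
print(f"year-2 mean on the same sites:            {y2:.1f}")  # lower on average, with no real decline
```

The year-2 mean falls below the year-1 mean even though no site's true abundance changed, which is exactly the pattern the Wisconsin site-selection method would be expected to produce.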
Importance of a Random Sample
The American Heritage Dictionary (Soukhanov 1995) defines "random" as "1. Having no specific
pattern, purpose, or objective 2. Statistics. Of or relating to the same or equal chances or
probability of occurrence for each member of a group. - idiom. at random: Without a governing
design, method, or purpose; unsystematically." "Synonyms: chance, random, casual, haphazard,
desultory. These adjectives apply to what is determined not by deliberation or method but by
accident. Random implies the absence of a specific pattern or objective and suggests a lack of
direction that might or could profitably be imposed: struck by a random shot; took a random
guess." In the context of surveys, we mean the statistical definition, not the common conception
of an unplanned event. The selection of a sample must be determined by careful deliberation and
planning, using a table of random numbers to select specific sites. Sites must not be substituted
without a substantial reason.
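As a minimal sketch of such careful deliberation (the wetland names and sample size here are hypothetical), a seeded pseudo-random generator can stand in for the table of random numbers:

```python
import random

# Hypothetical sampling frame: every candidate wetland in the region,
# enumerated beforehand from maps or a wetlands inventory.
frame = [f"wetland_{i:03d}" for i in range(1, 201)]

# A seeded pseudo-random generator plays the role of the table of random
# numbers: the selection is reproducible and auditable, not haphazard.
rng = random.Random(19951013)
sites = rng.sample(frame, k=12)  # simple random sample, without replacement

# These 12 sites must then be surveyed as drawn; no substitutions
# without a substantial reason.
print(sorted(sites))
```

The essential points are that the frame is enumerated before selection, every wetland has a known chance of inclusion, and the draw itself involves no observer judgment.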
Cochran and Cox (1957:6-7) explained that "as would be expected, the type of statistical
inference that can be made from a body of data depends on the nature of the data. It is easy to
conduct an experiment in such a way that no useful inferences can be made. . . . In order to avoid
. . . biases we need some means of insuring that a treatment will not be continually favored or
handicapped in successive replications by some extraneous source of variation, known or
unknown. This is done by the device known as randomization, due to Fisher . . . . Tests of
significance and confidence limits can be constructed, using only the fact that randomization has
been properly applied in the experiment."
Deming (1950:9-14) provides an excellent discussion of this issue as it relates to surveys. Amphibian surveys in which the sites are selected by the observers are referred to as "judgment-samples."
In his daily practice the statistician must constantly be aware of two different types of
samples, probability-samples and judgment-samples.
Probability-samples, for which the sampling errors can be
calculated, and for which the biases of selection, nonresponse, and estimation
are virtually eliminated or contained within known limits.
Judgment-samples, for which the biases and sampling errors
can not be calculated from the sample but instead must be settled by judgment.
The two types of surveys are not distinguished by the questionnaire and instructions,
but by the procedures for selecting the sample, for calculating the estimates,
and for appraising the precisions of these estimates. A probability-survey is
carried out according to a statistical plan embodying automatic selection of
the elements (people, farms, manufactured material) concerning which information
is to be obtained. In a probability-sample neither the interviewer nor the elements
of the sample have any choice about who is in the sample. If a sample of individuals
is desired, the design of a probability-sample must give rules for finding these
individuals; it is not sufficient that it give rules that lead to a random selection
of households, leaving the selection of the individuals in these households
to the judgment of the interviewer. A probability-sample demands a competent
field-force and careful execution of the instructions at all stages of the work.
It is also to be noted that in a probability-sample the procedure for forming
the estimates is automatic, being laid down beforehand as part of the sampling
design. Unless these conditions are met, probability theory can not be used
to appraise the precision of the results, and a survey can not be characterized
as a probability-sample.
A probability-sample will send the interviewer through mud and cold, over long
distances, up decrepit stairs, to people who do not welcome an interviewer; but such
cases occur only in their correct proportions. Substitutions are not permitted: the rules . . . .
Actually, a pure probability-sample with complete response is a rarity. In practice
there will usually be some nonresponse and some departure from instructions. An upper
limit to the biases so created may often be assigned, nevertheless, through knowledge of
the subject matter, in which case the survey will still satisfy the definition of a
probability-sample, viz., a calculable error. Thus, suppose that in a survey of 1000
households, 500 are found to be users of a certain product, 450 are found to be nonusers,
and 50 were never found at home. By assigning the 50 nonresponses first to the users
and then to the nonusers, upper and lower limits to the mean square error of the results
may be calculated. . . . In contrast, the results from a judgment-sample are obtained by
procedures which depend to some appreciable part on (i) a judgment selection of
"typical" or "representative" counties, cities, road-segments, blocks, individual people,
households, firms, farms, articles, or packages concerning which information is to be
obtained; or on (ii) weighting factors that are prescribed arbitrarily or by expert judgment
to make allowances for certain sizable segments of the population whose magnitudes and
characteristics are unknown and not determined by the sample. The following examples
may be noted in this respect: the assumption that nonresponding groups are similar to
responding groups; that homes without telephones are similar to homes with telephones;
that packages that are difficult to get at are similar to packages on the outside of a pile.
There are many problems in which the survey itself, through (e.g.) failure of proper
design, failure of the questionnaire, or for lack of sufficient response, fails to elicit
certain information that is needed in calculating the final estimates: in such cases the
survey is of the judgment type, whether originally intended thus or not. The "quota"
method is one type of judgment-sample. In this method an interviewer is assigned to
procure (e.g.) 10 interviews with people conforming to certain sociological and economic
characteristics within a prescribed area, such as housewives who do not work full time
for pay, who own their homes, who belong in a certain economic level, a particular age-class, and live in a particular block, tract, or precinct. The quota method is subject to
the biases of selectivity and availability, besides the errors of incorrect assignment of
weights to the various classes of the population. This assertion, however, is not intended
to cast doubts on the quota method, but to acquaint the reader with some of the
. . . .
This book will deal entirely with probability-samples; in other words, this is a book on
statistical theory, not subject-matter or manipulation of data. Judgment-samples, so far
as I know, are not amenable to statistical analysis. I know of no way to remove the
biases of selectivity, availability, nonresponse, and incorrect assignment of weights.
Moreover, I know of no way in which to calculate the standard errors of data from a
quota sample, the reason being that a particular man or house has no assignable
probability of coming into the sample; hence probability does not apply. It is more
important to learn something about the biases of a judgment sample than about its
sampling errors. The usefulness of data from judgment-samples is judged by expert
knowledge of the subject-matter and comparisons with the results of previous surveys, not
from knowledge of probability. A skilled statistical theorist would be helpless in the
analysis of a judgment-sample if he were to depend on his knowledge of theory. It is a
fact, though, that some of the lessons regarding economy in the design (not analysis) of
probability-samples are equally applicable to judgment-samples. For example, theory
can assist judgment-samples in the choice of sampling unit, allocation of the sample to
economic levels and to urban and rural areas, and in the number of survey points.
Such remarks are not meant to imply that judgment-samples can not and do not
deliver useful results, but rather that the reasons why they do when they do are not well
understood. Indeed, quota and other types of judgment-samples will undoubtedly
continue to play an important role in research, and they will become more and more
useful as their strong points and weak points are more generally understood. Pilot
surveys are usually judgment-samples. In trying out a questionnaire or set of
instructions, or for getting a rough idea of how much a certain operation is going to cost,
or what the refusal rate is likely to be, it may not be necessary or desirable to carry out a
probability-survey; it will often be sufficient to conduct a trial in a particular county or
city or even in a few blocks, chosen by judgment. Examples abound. The proposed
instructions and questionnaire for the decennial census of population in 1940 were put to
a test in St. Joseph and Marshall Counties in Indiana in August 1939. These counties
were not selected as a probability-sample, but because they contained an abundance of
"typical" situations. They served the purpose well, as they focused attention on weak
points of the instructions and the questionnaire. Moreover, a large operation in two
adjoining counties provided a dress-rehearsal for the big census eight months later, as a
widely dispersed probability-sample would not have done. Much of the experimental
work in the planning of the 1960 censuses of population and agriculture is being
conducted in areas chosen by judgment.
As for comparisons of costs between probability- and judgment-samples, no
satisfactory basis for comparison is possible because the two types of survey are
different commodities and are not interchangeable. Price without knowledge of quality is
meaningless, and it is impossible to compare the costs of two proposed methods of
conducting a study unless the precision and biases of the results of both methods are known
and controllable. In many of the surveys on characteristics of the population, of farms,
of agricultural production that are carried out by the government, a controllable and
measurable error of sampling and freedom from the biases of selection and nonresponse
are considered indispensable and cheaper than a wrong decision based on biased results.
Moreover, business, industry, and private research demand quality in government
statistics. For similar reasons there is a decided trend in private research in marketing
toward the use of probability-samples.
A relatively inefficient but unbiased design for a single (nonrecurring) probability-sample need not be costly to lay out. An inexpensive map and a visit to the library to
look at Census figures will often provide sufficient information for the delineation of
large roughly equal sampling units for single- or double-stage sampling. The
inefficiency of the design is then to be counterbalanced by taking a sufficiently large
sample. On the other hand, for a recurring survey, it usually pays to make more
elaborate preparations by providing several years' supply of small efficient sampling
units and listings so that smaller samples may be used month after month.
Either way, a probability-sample demands careful field-work, constantly reviewed by
a competent statistician, with records and call-backs, proper training and supervision.
These safeguards cost money, but there is no alternative if demonstrable precision is
required. To say that the job can be done cheaper without them is to confuse the issue,
as there can be no talk of price without a simultaneous measure of quality.
A judgment-sample can often be devised quickly without benefit of skilled statistical
assistance, which is sometimes very hard to find.
Remark 1. As already stated, strictly, there is hardly ever a pure probability-sample. The purest examples are the simple ones in which the universe to be sampled is by definition a file of cards: there are then no refusals or nonresponses unless some entries are illegible. However, as there were refusals, nonresponses, and inevitable errors of response in the original collection of the information on the cards, these imperfections will be carried over into any sample, even 100 percent, that is drawn from the cards.
. . . .
Remark 2. Statistical research has disclosed and explained several amazing facts
about sampling. It is entirely possible to build up a "sample" of people by adding a few
names here and subtracting a few there, so that the list finally agrees almost perfectly
with the last census and any additional information in regard to the proper proportions
by area, age-groups, sex, color, education, economic level, ownership of home,
telephone, and in fact with respect to almost any conceivably complex pattern. This is
what in lay language is sometimes described as "a perfect cross-section." In fact,
however, this kind of "sample" is extremely dangerous, as it may fail miserably to
correspond with the population of the country, city, or county that it was intended to
represent in regard to the characteristics that the survey is expected to measure (e.g., the
number of people intending to buy certain books or holding certain political opinions).
Such hazards are avoided in probability-samples.
Remark 3. Judgment is indispensable in any survey. It would be decidedly incorrect
to say that knowledge of the universe is not utilized in a probability-sample, and blind
chance substituted. In modern sampling, judgment and all possible knowledge of the
subject-matter under study are put to the best possible use. Knowledge and judgment
come into play in many ways in the design of probability-samples; for instance, in
defining the kind and size of sampling units, in delineating homogeneous or
heterogeneous areas, and in classifying the households into strata in ways that will be
contributory toward reduction of sampling error. There is no limitation to the amount of
judgment or knowledge of the subject that can be used, but this kind of knowledge is not
allowed to influence the final selection of the particular cities, counties, blocks, roads,
households, or business establishments that are to be in the sample; this final selection
must be automatic, for it is only then that the bias of selection is eliminated, and the
sampling tolerance is measurable and controllable.
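Deming's device of assigning the 50 nonresponses first to one class and then to the other, to bound the survey result, amounts to a few lines of arithmetic:

```python
def nonresponse_bounds(users, nonusers, not_found):
    """Bound the proportion of users by assigning every nonrespondent
    first to the nonusers (lower limit) and then to the users (upper
    limit), as in Deming's 1000-household example quoted above."""
    total = users + nonusers + not_found
    return users / total, (users + not_found) / total

low, high = nonresponse_bounds(users=500, nonusers=450, not_found=50)
print(f"proportion of users lies between {low:.2f} and {high:.2f}")  # 0.50 and 0.55
```

The survey still qualifies as a probability-sample because its error, including the nonresponse, remains calculable; a judgment-sample offers no comparable bound.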
Snedecor and Cochran (1980:438) express their opinion as follows:
Probability sampling has some important advantages. By probability theory it is
possible to study the biases and the standard errors of the estimates from different
sampling plans. In this way much has been learned about the scope, advantages, and
limitations of each plan. This information helps greatly in selecting a suitable plan for a
particular sampling job. As will be seen later, most probability sampling plans also
enable the standard error of the estimate and confidence limits for the true population
value to be computed from the results of the sample. Thus, when a probability sample
has been taken, we have some idea as to how accurate the estimates are.
Probability sampling is by no means the only way of selecting a sample. One alternative
method is to ask someone who has studied the population to point out average or typical
members and then confine the sample to these members. When the population is highly
variable and the sample is small, this method often gives more accurate estimates than
probability sampling. Another method is to restrict the sampling to those members that
are conveniently accessible. If bales of goods are stacked tightly in a warehouse, it is
difficult to get at the inside bales of the pile and one is tempted to confine attention to the
outside bales. In many biological problems it is hard to see how a workable probability
sample can be devised, for instance, as in estimating the number of houseflies in a town,
field mice in a wood, or plankton in the ocean. [I strongly disagree - Paul]
One drawback of these alternative methods is that when the sample has been obtained,
there is no way to determine how accurate the estimate is. Members of the population
picked as typical by an expert may be more or less atypical. Outside bales may or may
not be similar to interior bales. Probability sampling formulas for the standard error of
the estimate or for confidence limits do not apply to these methods. Consequently, it is
wise to use probability sampling unless it is clearly not feasible or prohibitively expensive.
Jongman et al. (1995:24) suggest that "by not using random sampling, one can obtain a 'distorted
picture' (Snedecor & Cochran 1980) of a population. Biased sampling, such as choosing sites
from a vegetation that are considered to be typical (key sites) or that have as many species as
possible, introduces an observer-tied bias. Other types of bias may be caused because certain
measurements have been performed with a frequency that depends on the season or the location
(e.g. near the laboratory vs. far from the laboratory). Then the results of any statistical analysis
may be different from the results that would have been obtained if random sampling had been
applied. It is worthwhile to go to quite some effort to avoid this kind of bias."
Manly (1992: 4-5) points out that "truly random sampling is often difficult to carry out, and there
is a temptation to assume that a sample that is obtained in some convenient way is equivalent to a
random sample. Unfortunately, however, it is then very easy for a systematic bias in the sampling
procedure to distort estimates of key parameters to such an extent that a study becomes quite . . . ."
The discussion of probability and judgement sampling in the statistical literature occurred during the 1920s and 1930s (Stephan 1948, Rossi 1983). Some statisticians supported judgement or purposive sampling instead of random sampling (Jensen 1928), but a consensus developed in favor of random sampling. I will discuss two papers that demonstrate the problems resulting from judgment sampling, in a social rather than a biological context. However, the statistical principles are the same, and the results apply to amphibian surveys.
Unemployment Survey Example
Hogg (1930) discussed many of the same sampling problems we currently face in amphibian monitoring. She was considering unemployment surveys, but the sampling issues are the same. She noted that the problem of selecting persons or households to be studied is of paramount importance, because the "resulting aggregate of persons or households must be demonstrably representative of the population for which an estimate is desired, otherwise the survey will have little or no value." In an earlier survey in Buffalo, the principles of random sampling were not followed. Instead, nine considerable areas, varied in character and scattered in position, were selected, and the report was based on their aggregate. There was no reason to suppose that the aggregate of the nine corresponded at all with the whole city's occupational and racial composition, and therefore the unemployment estimates would not accurately estimate unemployment in Buffalo. The authors of the Buffalo study stated that the results do "not purpose to show employment and unemployment for the whole city of Buffalo but merely for the persons enumerated." One must wonder why they did the survey, if the results could not be applied to the city. It is likely that any public report would be erroneously extrapolated to the whole city, regardless of the disclaimer. This early account of the problems resulting from non-random sampling has clear implications for amphibian surveys that also use the observer's judgement to select sample sites.
Literary Digest Polls Example
Willcox (1931) examined in detail two national surveys conducted by the Literary Digest on the
public's opinion about repealing prohibition (the eighteenth amendment and associated state
laws). In 1922, 10,108,437 ballots were mailed and 922,383 were returned. In 1930, 20,227,370
ballots were mailed and 4,806,464 were returned. The Literary Digest's mailing list was built up
for advertising purposes based on users of telephones and automobiles. In spite of the enormous
sample, there were concerns about whether it adequately represented American public opinion.
1. The sample was 95% male, compared to 60% of the voters. It was believed that a larger
proportion of women favored retaining prohibition. In an effort to answer this objection, the
Literary Digest in 1922 also sent ballots to 2,268,101 women and received replies from 120,050.
Nineteen percent of the women and 21% of the men favored repealing prohibition. The Literary
Digest did not repeat this survey in 1930, but Scripps-Howard newspapers found that 82% of the
men and 70% of the women in their sample favored repealing prohibition.
2. It was claimed that country folk are more generally dry than city folk. In 1930, the Literary
Digest sample included 4.8% of the city folk, compared to 3.1% of the country folk.
Forty-five percent of the city folk and 33% of the country folk favored repealing prohibition.
3. The proportion of wets among wage-earners is thought to be greater than the proportion
among those classes with larger incomes from which the Literary Digest has drawn most of the
names. However, it was difficult to address this bias.
Willcox concluded that "In my opinion this [wage-earner] bias of the Literary Digest samples in
favor of the drys outweighs the other biases in favor of the wets. . . . But if others think
differently I can see no way to prove that my opinion is better than theirs." The take-home
message is that even with enormous samples, valid conclusions cannot be made unless probability
sampling is used. For example, it could be argued that it is not important to consider women's
opinions because their opinions are probably similar to men's opinions. But if women are not
asked, one can not be sure of the magnitude of the bias. Trying to correct the bias with another,
even more biased sample, does not help. Similar arguments could be made about the areas that
were not surveyed for amphibians, but if they are not surveyed, one can not be sure of the
magnitude of the bias.
There was a very public test of the alternative survey methods in predicting the 1936 Roosevelt-Landon presidential election (Rossi et al. 1983:5). Literary Digest mail straw ballots with millions of respondents were pitted against small scientific surveys conducted by Gallup and Crossley, with about 1,500 interviews each. The scientific surveys clearly won, demonstrating that "small but carefully drawn samples could do better than huge numbers picked from a partial sample frame with little or no effort to achieve reasonable response rates." Manly (1991:10-11) described that survey:
The Literary Digest poll of 1936 was carried out in the United States to determine in
advance what was to be the outcome of the presidential election to choose between the
Republican Landon and the Democrat Roosevelt. It is a classic example of a sample
survey that went wrong.
A total of 10 million survey forms were sent out to people on lists of telephone
subscribers, car owners, etc. About two and a third million responded and they were
strongly in favor of Landon for President rather than Roosevelt, with a ratio of three
supporting Landon for every two supporting Roosevelt. The election result was quite the
reverse, with Roosevelt winning 62% of the popular vote and carrying 46 out of 48 states.
There have been various explanations of why this survey gave a result so far from the
truth. Two obvious possibilities are:
(a) That economic status was strongly associated with voting preferences, and also with
being on the lists that were used for mailing;
(b) That the voting preferences were different for respondents and non-respondents.
One thing to note from this example is that an unrepresentative sample cannot necessarily be improved by making it bigger. The Literary Digest sample of over two million got a very precise estimate of the percentage support for Landon, but this was a percentage relating only to a small and apparently unrepresentative part of the entire body of voters.
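Manly's point that an unrepresentative sample cannot be rescued by size can be demonstrated with a simulation (the population sizes and support rates below are hypothetical, chosen only to mimic a biased sampling frame):

```python
import random

def survey(population, sample_size, frame=None, seed=7):
    """Estimate support for a candidate from a sample.

    `frame` restricts selection to a subset of the population, mimicking
    a mailing list built from telephone and automobile owners.
    """
    rng = random.Random(seed)
    pool = frame if frame is not None else population
    sample = rng.sample(pool, sample_size)
    return sum(sample) / sample_size

# Hypothetical electorate: 1 = supports candidate A, 0 = opposes.
# "Listed" voters (on the mailing list) lean toward A; "unlisted" lean away.
rng = random.Random(42)
listed = [1 if rng.random() < 0.60 else 0 for _ in range(40_000)]
unlisted = [1 if rng.random() < 0.35 else 0 for _ in range(60_000)]
everyone = listed + unlisted
true_support = sum(everyone) / len(everyone)  # close to 0.45

big_biased = survey(everyone, 20_000, frame=listed)  # huge sample, biased frame
small_random = survey(everyone, 1_500)               # small probability sample

print(f"true support:          {true_support:.3f}")
print(f"20,000 from the list:  {big_biased:.3f}")
print(f"1,500 drawn at random: {small_random:.3f}")
```

The 20,000-response estimate is very precise about the wrong population, while the 1,500-response probability sample lands close to the truth, which is the Literary Digest story in miniature.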
Mourning Dove Survey Example
A national Mourning Dove Call-Count Survey was initiated in 1950 using routes along rural roads
selected by the observers (Nelson et al. 1951). Several abundance indices were evaluated,
including 1) call-count routes, 2) records of doves seen by biologists on their normal travels, 3)
one week's records of doves seen by mail carriers while delivering mail, and 4) area counts by
game biologists, Four-H Club members, etc. The call-count route was selected as the most
suitable index.
Foote et al. (1958) reported on a test of the efficiency of mourning dove call counts in seven southeastern states, with particular reference to tests of a sampling design that will yield more reliable data.
The call count has been used as an index to the population of calling mourning doves
(Zenaidura macroura) since 1951. Fundamental research on this technique was
conducted from 1950 to 1956 as a part of the Cooperative Dove Study. . . . Analysis of
call-count data from the original management routes by life zones, biotic provinces,
soils, and political boundaries has suggested some fundamental relations between
populations and ecology. More conclusive evidence of these relations, however, has
awaited collection of call-count population data from a statistically appropriate
sample.
Although some of the original call-count routes were selected on an ecological basis, at
least in the Southeast, many of the routes may have been chosen because they were
known to sustain moderate to good dove populations or because they were convenient to
an observer. The degree of randomness in selection and of representativeness of area
and population of the original management routes was unknown. Data from these routes
have been used chiefly to denote changes in the population of calling doves on the same
routes from year to year. Attempts have been made to weight calling populations by
land areas for hunting-regulation information and for design of a nation-wide
nestling-banding program, using data from original management routes, because these
were the best data available on dove populations.
To permit area-to-area comparisons and proper weighting of call-count data from
geographic areas, the population data so used either should be obtained under principles
of randomization, or the sampling biases related to nonrandomness should be estimated
and used as correction factors. A test to compare data resulting from random sampling
with that obtained from the present system of sampling has been of high priority in the
over-all dove-management program. . . .
The call-count technique is limited in application to sections of the dove range traversed
by roads because it consists of 20 systematic auditory plots with about 3/8-mile radii, at
1-mile intervals, on which all doves heard are counted. It is limited to nonurban areas
where noise does not interfere with one's hearing calling birds, and is best adapted to
lightly traveled roads. Its adaptability in areas having high dove-calling populations is
unknown, but it is very efficient in areas having low calling populations. Weather, time
of the morning, season, and other factors that influence the counts are generally
understood, and the nation-wide counts have been rigidly standardized to eliminate much
of this innate variation. . . .
Throughout this paper, "original management" routes are regularly censused routes [selected by the observers] on which mourning doves have been counted for several years, while "random" routes are those especially selected for this study [using a stratified probability sample].
. . . .
As compared to randomly selected routes, the presently employed dove call-count routes
in seven of the Southeastern States are positively biased and, therefore,
higher-than-average dove population areas are being sampled. The original
management routes in these seven states may have been selected purposefully and are not
representative of the dove population in this area. The differences between the random
sample and the original management sample were not great, which should be reassuring
to the administrator considering these data for hunting regulations.
Unless it can be shown that the bias in the mean is constant from year to year and from
area to area, detection of population change from original management-sample data has
an error potential. Because the bias in original management-route data tends to be
positive over the different population densities in the state strata, it might be inferred that
the original route bias would also be positive over year-to-year changes in density. This
could be tested only by continuing to sample both random and original management
routes for several years to determine if both sets of population data fluctuate
similarly.
While the management-route data reflect calling-population changes on those routes
from year to year, these data cannot be shown to reflect actual population change from
year to year within the 7-state sampling area because of the relatively large bias and
because the year-to-year harmony between original management-route data and those
from a random sampling is unknown. In the state strata, the biases, while generally
positive, differ considerably in magnitude, so the original management data cannot be
used for area-to-area comparison with the accuracy necessary for management.
Comparisons of variance of sampling by state and by ecological-zone stratification indicate that
the latter is approximately 17 per cent more efficient than the former. This confirms information
from analyses of 1953 and 1954 call-count data. . . . Use of the data for population-weight
factors for banding analyses will be more precise if the data are stratified first by ecological
zones, then by states or areas of interest. This also applies to the original management data.
1) The presently employed dove call-count routes in seven of the Southeastern States are
positively biased, and higher than average dove population areas are being sampled by
the original management routes.
2) The difference between mean numbers of doves heard calling on original
management routes and on randomly selected routes was small, but approached
significance at the 95 per cent probability level. Data from the original management
routes have been used since about 1951 in planning hunting regulations. It should be
reassuring to the wildlife administrator that differences between the means of the two sets
of routes were not of greater magnitude than they were.
3) The nation-wide call-count sampling should be revised so that censusing is conducted
on routes selected by a statistically appropriate system.
4) Selection of routes should be on the basis of stratified random sampling by ecological
zones. For administrative purposes, data from ecological-zone strata can be combined
by states or by other political groupings to relate areas of harvest and production for
hunting-regulation purposes. To establish banding quotas or for other management or
research programs, weighting by ecological zones is more appropriate than weighting by
any other presently known system.
The number of routes necessary for sampling by ecological zones can be approximated
from the call-count route data gathered from 1951 to 1957 and from the random data
gathered in this study. The sample size required will depend upon the expected variance
in each ecological zone, and on the precision required from the estimates within each
area of interest. Use of the finite population-correction factors is recommended if the
sampling density exceeds five per cent of the cells.
5) Revision of the nation-wide sampling should be undertaken as soon as possible. To
permit use of the call-count data for management purposes, especially for hunting
regulations, in conjunction with randomization of the sampling, there are two
alternatives. (a) A completely new randomly selected series of routes can be designed
and both random and original management sets of routes censused the first year. The
original routes would be dropped thereafter. (b) A second alternative would be to select
20 per cent of the new routes at random during each of the next five years, after which
the entire nation-wide sampling would be on a statistically acceptable basis.
6) After the nation-wide sampling has been randomized, it may be necessary to select a
few new routes each year to replace those routes on which the census technique cannot
be applied. Suburban development, new main highways, airfields, and growth of
population centers will make annual replacement of a few routes desirable.
The dove "management" routes were replaced with stratified random routes, providing an overlap so that trend information could be maintained. The transition to random routes was completed in 1966, and this is the first year reported in current mourning dove status reports (Dolton 1995).
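The sample-size rule in recommendation 4) above, with its finite population correction when sampling density exceeds five per cent of the cells, can be sketched in a few lines. This is only an illustration: the function name and the example figures are mine, not values from the dove survey.

```python
import math

def sample_size(s, e, N, z=1.96):
    """Approximate number of routes needed in one ecological zone.

    s: expected standard deviation of route counts in the zone
    e: desired half-width of the confidence interval
    N: total number of cells (possible routes) in the zone
    z: normal deviate for the confidence level (1.96 for 95%)
    """
    n0 = (z * s / e) ** 2               # sample size ignoring population size
    if n0 / N > 0.05:                   # apply the finite population correction
        return math.ceil(n0 / (1 + n0 / N))
    return math.ceil(n0)

# Hypothetical zone: s = 10 doves, desired precision e = 3, N = 200 cells.
# The correction matters here because the sampling density exceeds 5%.
print(sample_size(10, 3, 200))
```

With a much larger frame (say N = 10,000 cells) the correction is skipped and the uncorrected value is returned, which is why the recommendation ties its use to sampling density.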
Implications for Amphibian Monitoring
I have reviewed some of the literature on survey design and on the problems associated with judgement samples. It is clear that the results of any survey in which the observers select the routes, or in which the routes are selected by any other judgement method, will be questioned. It is difficult and expensive, if not impossible, to demonstrate that a judgement sample is representative of the population. Consequently, there will always be enough unanswered questions about the data that agencies would, I think, be unable to make difficult or costly decisions to protect amphibians, because it could not be conclusively demonstrated that there is a real problem.
To avoid the problems discussed above and to achieve statistically valid estimates, one must use a
formally designed sample survey. Levy and Lemeshow (1991) define a "sample survey" as a
study involving a subset (or sample) of individuals selected from a larger population. Variables or
characteristics of interest are observed or measured on each of the sampled individuals. These
measurements are then aggregated over all individuals in the sample to obtain summary statistics
(e.g. means, proportions, totals) for the sample. It is from these summary statistics that
extrapolations can be made concerning the entire population. The validity and reliability of these
extrapolations depend on how well the sample was chosen and how well the measurements were made.
The first step is to define the population of interest to which we will extrapolate the results of a
survey. At first, one thinks of the statistical population as the biological population of all calling
amphibians in a state or province. It is not practical to assign serial numbers to all the frogs and
then to select them at random, using a table of random numbers. It is more feasible to divide a
state or province into a number of sampling units (blocks of land e.g. degree blocks,
topographical maps) and to select a sample of units at random. Then the relative abundance of an
amphibian species can be measured on the selected sampling units using a calling amphibian route
or other method. In practice, we cannot sample all land areas. If we restrict measurements to
stops along rural roads, the statistical population becomes frogs breeding within hearing of rural
roads. Our estimates apply only to the frogs breeding within hearing of rural roads. However,
we can argue that this segment of the anuran population is most susceptible to human disturbance
and, therefore, is the most at risk. Consequently, it seems reasonable to use the relative
abundance indices for frogs breeding within hearing of rural roads as an early warning system
for all amphibians in a state or province.
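The unit-selection step described here (assign serial numbers to the blocks, then draw from a table of random numbers) is, in modern terms, sampling without replacement. A minimal sketch, with an invented count of 120 blocks:

```python
import random

random.seed(0)                        # reproducible illustration only

n_blocks = 120                        # hypothetical number of blocks in a state
blocks = range(1, n_blocks + 1)       # serial numbers for the sampling units
sample = random.sample(blocks, 12)    # a 10% sample, drawn without replacement
```

The computer's random number generator simply stands in for the printed table of random numbers; the essential point is that no observer judgement enters the selection.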
I will present two methods for selecting call-count routes. The first is used for the Mourning Dove Call-Count Survey (MDCCS) (Foote et al. 1958:403-404):
A map of the 7-state area was gridded into squares (cells) 20 miles on a side, each
enclosing 400 square miles. A total of 785 sampling cells thus constituted the frame,
enclosing 314,000 square miles, which approximates the estimated 313,800 square miles
of dove habitat in these states. Major metropolitan areas were omitted from the frame;
also omitted were areas containing major bodies of water. The periphery of the 7-state
area was corrected by omitting approximately one-half the incomplete grids on the boundary.
A one-in-five sampling yielded 157 cells and provided approximately the same number
of random routes as there were regular annually censused call-count routes within the
test area. County road maps enclosing the area in each randomly selected cell were
obtained. The periphery of each cell was divided into 80 1-mile intervals, starting at the
northeastern corner and proceeding clockwise. Twenty lineal miles then were mapped
from that point which had an appropriate road closest to the 1-mile interval selected at
random, proceeding generally toward the center of the cell and staying within the cell
boundaries. Selected roads were those that appeared (by the map) to be traversable.
Intermittent roads, ferries, or major unbridged streams were avoided. Main highways
were not included, except for short distances when it was not possible to select a more
appropriate route. Urban areas, railroads, airfields, and roads adjacent to major
highways were avoided, and the route did not "double back" upon itself on roads closer
than 1 mile. Acute-angle turns were avoided to prevent duplication of auditory plots at adjacent stops.
Slight adjustments of less than 2 miles occasionally were made to provide a readily recognizable starting point for the route. The starting point was determined to be that end of the route most easily reached from a main highway and population center. All routes were selected by a mapper unfamiliar with the area. When confronted with alternatives within the preceding limitations, roads were selected at random.
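The two random draws in the MDCCS procedure (a one-in-five sample of cells, then a random one-mile periphery interval per cell) can be sketched as follows. The cell counts come from the passage above; the seed and variable names are illustrative only.

```python
import random

random.seed(1958)                     # reproducible illustration only

# Frame of 785 cells, each 20 miles on a side (Foote et al. 1958)
frame = list(range(785))

# One-in-five random sample of cells: 785 // 5 = 157 cells
cells = random.sample(frame, len(frame) // 5)

# For each selected cell, choose one of the 80 one-mile periphery
# intervals, numbered clockwise from the northeastern corner; the
# route then starts on the appropriate road nearest that interval
# and runs 20 miles toward the cell's center.
start_interval = {cell: random.randrange(80) for cell in cells}
```

Only the two chance steps are automated here; mapping the 20 road miles from the selected interval remains the judgement-constrained step the authors describe.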
Another procedure is used with the Breeding Bird Survey (BBS) (Bruce Peterjohn, personal communication):
ESTABLISHING NEW BREEDING BIRD SURVEY ROUTES
Random Routes - New random routes are established only if most of the routes within a
state/province are currently being run. The local coordinator must also feel that there
are enough available people willing to run new routes and must feel capable of taking on
the burden of assigning new routes.
If new routes can be established, then one route is added to every degree block of the
state or province. For each block a random set of latitude and longitude minutes is
chosen, as is one of the cardinal directions. The random starting point will not likely
land on an appropriate road for a Breeding Bird Survey. Therefore the true starting
point of the new route will be the nearest appropriate road to the random starting point.
Appropriate roads are secondary roads that have relatively light traffic (fewer than 5
cars per stop on a weekday) and are state or county maintained. If possible the starting
point should also include some easily identifiable feature such as a road intersection,
bridge, or curve so that the observer can find it early in the morning.
Once an appropriate starting point is found, the actual route must proceed in the
randomly chosen direction. However, with special exceptions, the route cannot cross degree block,
state/province, or stratum lines, nor can it follow another route's path. When
laying out the route the path taken should follow the chosen direction as much as
possible, while staying on maintained secondary roads. If a route must traverse small
sections of heavily traveled roads, stops along these sections can be skipped. Connecting
roads between large towns should be avoided as should roads that are poorly
maintained. In all cases the observer should be asked to scout the newly laid out route for
problems. They can either be given explicit instructions for changing the route or be
asked to inform this office. The numbers given new routes should follow a regular
pattern, the best usually being some multiple of 100 of the routes already existing within
the degree block. The name of a new route should be the largest town along the route. If
no town exists then some dominant landscape feature should be used. Rivers and creeks
should be avoided as names, for obvious reasons.
Non-random Routes - These are only established at the request of an observer or a
researcher doing a special project. They are often areas of special ecological or
ornithological interest. (A section on the National Parks special project will be inserted
here once all the details have been settled). Non-random routes are not used in most
analyses of trend and are given 900 series numbers so that they can easily be identified.
Alaska and the Far Northern Provinces - These areas present special problems due to
scarcity of roads. These routes cannot be considered random in most senses. Some of
these routes are run by boat or even off-road vehicles. Because of road limitations some
are shorter than 50 complete stops. These are given 800 series numbers. These BBS's
are not used in most calculations of trend and a system of regular analysis is not yet in
use due to the relative recency of their creation.
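The BBS starting-point draw (random minutes of latitude and longitude within a degree block, plus a cardinal direction) can be sketched as below. The function name is mine; the survey itself works from maps, not code, and the true start is moved to the nearest appropriate road.

```python
import random

def bbs_random_start(rng=random):
    """Draw a random starting point within one degree block:
    a minute of latitude, a minute of longitude, and a cardinal
    direction for the route to move toward."""
    lat_minute = rng.randrange(60)
    lon_minute = rng.randrange(60)
    direction = rng.choice(["N", "E", "S", "W"])
    return lat_minute, lon_minute, direction

lat, lon, d = bbs_random_start()
```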
I prefer the method used with the dove survey (MDCCS) because all rural roads have about the
same probability of selection. With the Breeding Bird Survey (BBS), rural roads on the periphery
of roadless areas have a higher probability of being selected than roads that are near other roads.
To see this effect, take a map and draw a line around the area that is closest to each road. The
probability that a road is selected for the BBS is proportional to the area that is closest to it. On
the other hand, the BBS approach tends to spread out the routes so that the sample is not
concentrated as much where there is a higher density of roads. However, with each method, a
route with stops is used to sample the amphibians calling within an area sampling unit (20 mile
square for MDCCS and a degree block for BBS). A random selection using a table of random
numbers is used to avoid biases in the selection of the route. However, some bias associated with
restricting stops to rural roads is unavoidable. I argued (above) that this bias would favor the
early detection of problems, because areas close to roads would tend to have more human disturbance.
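The unequal selection probabilities under the BBS rule can be seen in a one-dimensional sketch. Three hypothetical "roads" sit on a 100-mile line; a random point is assigned to the nearest road, so the isolated road at mile 90 collects far more selections than the two roads crowded together near mile 10 and mile 20.

```python
import random

random.seed(0)                        # reproducible illustration only

roads = [10, 20, 90]                  # hypothetical road positions (miles)
counts = {r: 0 for r in roads}
for _ in range(10_000):
    p = random.uniform(0, 100)        # random starting point on the line
    nearest = min(roads, key=lambda r: abs(r - p))
    counts[nearest] += 1

# Each road's selection share approximates the length of the interval
# nearest to it: road 10 serves miles 0-15, road 20 serves miles 15-55,
# and road 90 serves miles 55-100.
```

Under the MDCCS rule, by contrast, the draw is over road miles themselves, so the three roads would be selected in proportion to their lengths rather than to the areas they border.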
The stops on a route within an area sampling unit are a cluster sample, because they were selected together and are contained within the unit (Levy and Lemeshow 1991:175-211). Stops on a route must be analyzed as a unit: statistical tests and confidence intervals require independent observations, and stops do not qualify. Two events (A and B) are independent if the probability of A given that B has occurred is equal to the unconditional probability of A [i.e. P(A|B)=P(A) and P(B|A)=P(B)] (Mendenhall et al. 1981:47). This technical definition is in agreement with the common definition of "not determined or influenced by someone or something else; not contingent: a decision independent of the outcome of the study" (Soukhanov 1995). Here, the probability of A is the same whether or not B occurs. If A and B are counts of amphibians at two stops on the same route, A and B are not independent because knowing B provides information about A. Points that are close together tend to be similar because:
* The mean temperature tends to decrease as one moves away from the equator and temperature affects amphibian calling.
* The number and type of ponds and other water bodies tend to be similar within a region within a state or province.
* Amphibian species occur only within their geographic ranges and may be abundant in parts of their ranges and rare in other parts.
* The counts at stops on the same route are all made on the same night and the temperature and moisture conditions on that night affect the calling.
There may be substantial differences in the habitat or amphibian populations among the stops along a single route, but they are still not independent because they were selected and observed together as a cluster. To demonstrate independence, one would have to show that amphibians are equally abundant in all areas of a state or province and that they call equally on all nights regardless of temperature and moisture conditions within the range accepted for the survey.
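The non-independence of stops can be demonstrated with a small simulation. Here each route receives a shared "night effect" (weather, regional abundance) and each stop adds independent noise; the variance components (4 and 1) are arbitrary choices of mine. Counts at two stops on the same route then show a strong correlation, which is why stops must be aggregated to route totals before analysis.

```python
import random

random.seed(0)                        # reproducible illustration only

def simulate_routes(n_routes=2000, stops=10):
    """Each route shares one night/region effect; stops add independent noise."""
    routes = []
    for _ in range(n_routes):
        shared = random.gauss(0, 2)   # variance 4, common to all stops on the route
        routes.append([shared + random.gauss(0, 1) for _ in range(stops)])
    return routes

routes = simulate_routes()

# Correlation between stop 1 and stop 2 across routes; the expected
# value is 4 / (4 + 1) = 0.8, far from the zero required by independence.
a = [r[0] for r in routes]
b = [r[1] for r in routes]
n = len(a)
ma, mb = sum(a) / n, sum(b) / n
cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
var_a = sum((x - ma) ** 2 for x in a) / n
var_b = sum((y - mb) ** 2 for y in b) / n
corr = cov / (var_a * var_b) ** 0.5
```

If the stops were truly independent, the computed correlation would hover near zero; the shared night effect alone is enough to make them strongly dependent, even when habitat differs among stops.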
Cochran, William G. and Gertrude M. Cox. 1957. Experimental designs. Wiley, New York.
Deming, William E. 1950. Some theory of sampling. Dover Publications, New York.
Dolton, David D. 1995. Mourning dove breeding population status, 1995. U.S. Fish and Wildlife Service, Office of Migratory Bird Management, Laurel, MD.
Foote, Leonard E., Harold S. Peters, and Alva L. Finkner. 1958. Design test for mourning dove call-count sampling in seven southeastern states. J. Wildlife Management 22:402-408.
Hogg, Margaret H. 1930. Sources of incomparability and error in employment-unemployment surveys. J. Amer. Statistical Assoc. 25:284-294.
Jensen, Adolf. 1928. Purposive selection. J. Royal Statistical Soc. 91:541-547.
Jongman, R.H.G., C.J.F. ter Braak, and O.F.R. van Tongeren. 1995. Data analysis in community and landscape ecology. Cambridge Univ. Press, Cambridge, UK.
Levy, Paul S. and Stanley Lemeshow. 1991. Sampling of populations: methods and applications. Wiley, New York.
Manly, Bryan F.J. 1992. The design and analysis of research studies. Cambridge Univ. Press, Cambridge, UK
Mendenhall, William, Richard L. Scheaffer, and Dennis D. Wackerly. 1981. Mathematical statistics with applications. Duxbury Press, Boston.
Mossman, Mike, Paul Rasmussen, John Sauer, Sam Droege, and Lisa Hartman. 1995. Sample size estimation for amphibian calling surveys and some surprising trends from an 11-year analysis of Wisconsin Frog and Toad Survey data. Second annual meeting of the North American Amphibian Monitoring Program in Burlington, Ontario, September 27-29, 1995.
Nelson, Dan, Leonard Foote, James E. Keeler, Harold Alexander, Frank A. Winston, Dan M. Russell, John Newsom, Henry Bobbs, Jr., Donald G. Allison, Harold B. Poole, and James W. Hammond. 1951. Statistics as a tool in measuring dove inventories. In Tim Fendley (ed.) 1985. Proceedings of the first - sixth annual conferences, Southeastern Association of Game and Fish Commissioners, 1947 - 1952. Southeastern Association of Game and Fish Agencies.
Rossi, Peter H., James D. Wright, and Andy B. Anderson. 1983. Sample surveys: history, current practice, and future prospects. in Peter H. Rossi, James D. Wright, and Andy B. Anderson (eds). Handbook of survey research, Academic Press, San Diego.
Soukhanov, Ann H. (Ed.). 1995. The American Heritage Dictionary, electronic edition. Microsoft.
Snedecor, George W. and William G. Cochran. 1980. Statistical methods. Iowa State Univ. Press, Ames.
Stephan, Frederick F. 1948. History of the uses of modern sampling procedures. J. Amer. Statistical Assoc. 43:12-39.
Willcox, Walter F. 1931. An attempt to measure public opinion about repealing the eighteenth
amendment. J. Amer. Statistical Assoc. 26:243-261.
Paul H. Geissler, National Ecological Surveys Team, Office of Inventory and Monitoring
National Biological Service, 12100 Beech Forest Road, Laurel, MD, USA 20708-4038
Tel. 301-497-5780, FAX 301-497-5784, Paul_Geissler@usgs.gov, HTTP://www.im.nbs.gov