Survey Design for Calling Amphibians

by Paul Geissler, October 13, 1995

Importance of a Random Sample

Unemployment Survey Example

Literary Digest Polls Example

Mourning Dove Survey Example

Implications for Amphibian Monitoring

Survey Sampling


The selection of sites for surveying amphibians is very important. For example, Mossman et al. (1995) reported that 10 of 11 Wisconsin amphibian species had negative trends (3 "significant") on the surveyed sites. However, that result does *NOT* imply a decline in the Wisconsin populations unless it can be demonstrated that the sites are representative of amphibian habitat in Wisconsin. Observers subjectively selected the wetlands to be sampled, and they are likely to select good habitat where amphibians are present. With normal variation, good sites are expected to decrease and fair sites are expected to increase. Thus the Wisconsin results are expected because of the method of site selection and should not be used as evidence for declines in amphibian populations.

Importance of a Random Sample

The American Heritage Dictionary (Soukhanov 1995) defines "random" as "1. Having no specific pattern, purpose, or objective. 2. Statistics. Of or relating to the same or equal chances or probability of occurrence for each member of a group. - idiom. at random: Without a governing design, method, or purpose; unsystematically." "Synonyms: chance, random, casual, haphazard, desultory. These adjectives apply to what is determined not by deliberation or method but by accident. Random implies the absence of a specific pattern or objective and suggests a lack of direction that might or could profitably be imposed: struck by a random shot; took a random guess." In the context of surveys, we mean the statistical definition, not the common conception of an unplanned event. The selection of a sample must be determined by careful deliberation and planning, using a table of random numbers to select specific sites. Sites must not be substituted without a substantial reason.
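The selection procedure described above is easy to make concrete. A minimal sketch in Python (the sampling frame and site names are hypothetical; a computer's random number generator stands in for the table of random numbers):

```python
import random

# Hypothetical sampling frame: every candidate wetland site in the region,
# not just the sites observers judge to be "good" habitat.
frame = [f"wetland_{i:03d}" for i in range(1, 201)]  # 200 candidate sites

rng = random.Random(1995)       # fixed seed so the draw can be documented
sample = rng.sample(frame, 20)  # 20 sites, each with equal probability

# Once drawn, the sample is fixed: no substitutions without a substantial,
# documented reason, or the probability basis of the design is lost.
print(sorted(sample))
```

The essential point is that the frame must be enumerated before the draw, so that every site has a known, nonzero chance of selection.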

Cochran and Cox (1957:6-7) explained that "as would be expected, the type of statistical inference that can be made from a body of data depends on the nature of the data. It is easy to conduct an experiment in such a way that no useful inferences can be made. . . . In order to avoid . . . biases we need some means of insuring that a treatment will not be continually favored or handicapped in successive replications by some extraneous source of variation, known or unknown. This is done by the device known as randomization, due to Fisher . . . . Tests of significance and confidence limits can be constructed, using only the fact that randomization has been properly applied in the experiment."

Deming (1950:9-14) provides an excellent discussion of this issue as it relates to surveys. Amphibian surveys in which the sites are selected by the observers are referred to as "judgement samples."

In his daily practice the statistician must constantly be aware of two different types of samples, probability-samples and judgment-samples.

Probability-samples, for which the sampling errors can be calculated, and for which the biases of selection, nonresponse, and estimation are virtually eliminated or contained within known limits.

Judgement samples, for which the biases and sampling errors can not be calculated from the sample but instead must be settled by judgment.

The two types of surveys are not distinguished by the questionnaire and instructions, but by the procedures for selecting the sample, for calculating the estimates, and for appraising the precisions of these estimates. A probability-survey is carried out according to a statistical plan embodying automatic selection of the elements (people, farms, manufactured material) concerning which information is to be obtained. In a probability-sample neither the interviewer nor the elements of the sample have any choice about who is in the sample. If a sample of individuals is desired, the design of a probability-sample must give rules for finding these individuals; it is not sufficient that it give rules that lead to a random selection of households, leaving the selection of the individuals in these households to the judgment of the interviewer. A probability-sample demands a competent field-force and careful execution of the instructions at all stages of the work. It is also to be noted that in a probability-sample the procedure for forming the estimates is automatic, being laid down beforehand as part of the sampling design. Unless these conditions are met, probability theory can not be used to appraise the precision of the results, and a survey can not be characterized as a probability-sample.

A probability-sample will send the interviewer through mud and cold, over long distances, up decrepit stairs, to people who do not welcome an interviewer; but such cases occur only in their correct proportions. Substitutions are not permitted: the rules are ruthless.

Actually, a pure probability-sample with complete response is a rarity. In practice there will usually be some nonresponse and some departure from instructions. An upper limit to the biases so created may often be assigned, nevertheless, through knowledge of the subject matter, in which case the survey will still satisfy the definition of a probability-sample, viz., a calculable error. Thus, suppose that in a survey of 1000 households, 500 are found to be users of a certain product, 450 are found to be nonusers, and 50 were never found at home. By assigning the 50 nonresponses first to the users and then to the nonusers, upper and lower limits to the mean square error of the results may be calculated. . . . In contrast, the results from a judgment-sample are obtained by procedures which depend to some appreciable part on (i) a judgment selection of "typical" or "representative" counties, cities, road-segments, blocks, individual people, households, firms, farms, articles, or packages concerning which information is to be obtained; or on (ii) weighting factors that are prescribed arbitrarily or by expert judgment to make allowances for certain sizable segments of the population whose magnitudes and characteristics are unknown and not determined by the sample. The following examples may be noted in this respect: the assumption that nonresponding groups are similar to responding groups; that homes without telephones are similar to homes with telephones; that packages that are difficult to get at are similar to packages on the outside of a pile. There are many problems in which the survey itself, through (e.g.) failure of proper design, failure of the questionnaire, or for lack of sufficient response, fails to elicit certain information that is needed in calculating the final estimates: in such cases the survey is of the judgment type, whether originally intended thus or not. The "quota" method is one type of judgment-sample.
In this method an interviewer is assigned to procure (e.g.) 10 interviews with people conforming to certain sociological and economic characteristics within a prescribed area, such as housewives who do not work full time for pay, who own their homes, who belong in a certain economic level, a particular age-class, and live in a particular block, tract, or precinct. The quota method is subject to the biases of selectivity and availability, besides the errors of incorrect assignment of weights to the various classes of the population. This assertion, however, is not intended to cast doubts on the quota method, but to acquaint the reader with some of the problems.

This book will deal entirely with probability-samples; in other words, this is a book on statistical theory, not subject-matter or manipulation of data. Judgment-samples, so far as I know, are not amenable to statistical analysis. I know of no way to remove the biases of selectivity, availability, nonresponse, and incorrect assignment of weights. Moreover, I know of no way in which to calculate the standard errors of data from a quota sample, the reason being that a particular man or house has no assignable probability of coming into the sample; hence probability does not apply. It is more important to learn something about the biases of a judgment sample than about its sampling errors. The usefulness of data from judgment-samples is judged by expert knowledge of the subject-matter and comparisons with the results of previous surveys, not from knowledge of probability. A skilled statistical theorist would be helpless in the analysis of a judgment-sample if he were to depend on his knowledge of theory. It is a fact, though, that some of the lessons regarding economy in the design (not analysis) of probability-samples are equally applicable to judgment-samples. For example, theory can assist judgment-samples in the choice of sampling unit, allocation of the sample to economic levels and to urban and rural areas, and in the number of survey points.

Such remarks are not meant to imply that judgement-samples can not and do not deliver useful results, but rather that the reasons why they do when they do are not well understood. Indeed, quota and other types of judgment-samples will undoubtedly continue to play an important role in research, and they will become more and more useful as their strong points and weak points are more generally understood. Pilot surveys are usually judgment-samples. In trying out a questionnaire or set of instructions, or for getting a rough idea of how much a certain operation is going to cost, or what the refusal rate is likely to be, it may not be necessary or desirable to carry out a probability-survey; it will often be sufficient to conduct a trial in a particular county or city or even in a few blocks, chosen by judgment. Examples abound. The proposed instructions and questionnaire for the decennial census of population in 1940 were put to a test in St. Joseph and Marshall Counties in Indiana in August 1939. These counties were not selected as a probability-sample, but because they contained an abundance of "typical" situations. They served the purpose well, as they focused attention on weak points of the instructions and the questionnaire. Moreover, a large operation in two adjoining counties provided a dress-rehearsal for the big census eight months later, as a widely dispersed probability-sample would not have done. Much of the experimental work in the planning of the 1960 censuses of population and agriculture is being conducted in areas chosen by judgment.

As for comparisons of costs between probability- and judgment-samples, no satisfactory basis for comparison is possible because the two types of survey are different commodities and are not interchangeable. Price without knowledge of quality is meaningless, and it is impossible to compare the costs of two proposed methods of conducting a study unless the precision and biases of the results of both methods are known and controllable. In many of the surveys on characteristics of the population, of farms, of agricultural production that are carried out by the government, a controllable and measurable error of sampling and freedom from the biases of selection and nonresponse are considered indispensable and cheaper than a wrong decision based on biased results. Moreover, business, industry, and private research demand quality in government statistics. For similar reasons there is a decided trend in private research in marketing toward the use of probability-samples.

A relatively inefficient but unbiased design for a single (nonrecurring) probability-sample need not be costly to lay out. An inexpensive map and a visit to the library to look at Census figures will often provide sufficient information for the delineation of large roughly equal sampling units for single- or double-stage sampling. The inefficiency of the design is then to be counterbalanced by taking a sufficiently large sample. On the other hand, for a recurring survey, it usually pays to make more elaborate preparations by providing several years' supply of small efficient sampling units and listings so that smaller samples may be used month after month.

Either way, a probability-sample demands careful field-work, constantly reviewed by a competent statistician, with records and call-backs, proper training and supervision. These safeguards cost money, but there is no alternative if demonstrable precision is required. To say that the job can be done cheaper without them is to confuse the issue, as there can be no talk of price without a simultaneous measure of quality.

A judgment-sample can often be devised quickly without benefit of skilled statistical assistance, which is sometimes very hard to find.

Remark 1. As already stated, strictly, there is hardly ever a pure probability-sample. The purest examples are the simple ones in which the universe to be sampled is by definition a file of cards: there are then no refusals or nonresponses unless some entries are illegible. However, as there were refusals, nonresponses, and inevitable errors of response in the original collection of the information on the cards, these imperfections will be carried over into any sample, even 100 percent, that is drawn from the cards.

. . . .

Remark 2. Statistical research has disclosed and explained several amazing facts about sampling. It is entirely possible to build up a "sample" of people by adding a few names here and subtracting a few there, so that the list finally agrees almost perfectly with the last census and any additional information in regard to the proper proportions by area, age-groups, sex, color, education, economic level, ownership of home, telephone, and in fact with respect to almost any conceivably complex pattern. This is what in lay language is sometimes described as "a perfect cross-section." In fact, however, this kind of "sample" is extremely dangerous, as it may fail miserably to correspond with the population of the country, city, or county that it was intended to represent in regard to the characteristics that the survey is expected to measure (e.g., the number of people intending to buy certain books or holding certain political opinions). Such hazards are avoided in probability-samples.

Remark 3. Judgment is indispensable in any survey. It would be decidedly incorrect to say that knowledge of the universe is not utilized in a probability-sample, and blind chance substituted. In modern sampling, judgment and all possible knowledge of the subject-matter under study are put to the best possible use. Knowledge and judgment come into play in many ways in the design of probability-samples; for instance, in defining the kind and size of sampling units, in delineating homogeneous or heterogeneous areas, and in classifying the households into strata in ways that will be contributory toward reduction of sampling error. There is no limitation to the amount of judgment or knowledge of the subject that can be used, but this kind of knowledge is not allowed to influence the final selection of the particular cities, counties, blocks, roads, households, or business establishments that are to be in the sample; this final selection must be automatic, for it is only then that the bias of selection is eliminated, and the sampling tolerance is measurable and controllable.
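Deming's household illustration above can be made concrete with a few lines of arithmetic. A minimal sketch in Python (the figures are the ones he quotes; nothing else is assumed):

```python
# Deming's worked example: bound the estimate by assigning all
# nonresponses first to one class, then to the other.
n_total = 1000
users = 500
nonusers = 450
nonresponse = n_total - users - nonusers  # 50 households never found at home

# If every nonresponder were a nonuser, the proportion of users would be:
p_lower = users / n_total                  # 0.50
# If every nonresponder were a user:
p_upper = (users + nonresponse) / n_total  # 0.55

print(f"proportion of users lies between {p_lower:.2f} and {p_upper:.2f}")
```

Because the nonresponse is small and identified, the error remains calculable, which is exactly what keeps such a survey within the definition of a probability-sample. No analogous bound is available when the sites themselves were chosen by judgment.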

Snedecor and Cochran (1980:438) express their opinion as follows:

Probability sampling has some important advantages. By probability theory it is possible to study the biases and the standard errors of the estimates from different sampling plans. In this way much has been learned about the scope, advantages, and limitations of each plan. This information helps greatly in selecting a suitable plan for a particular sampling job. As will be seen later, most probability sampling plans also enable the standard error of the estimate and confidence limits for the true population value to be computed from the results of the sample. Thus, when a probability sample has been taken, we have some idea as to how accurate the estimates are.

Probability sampling is by no means the only way of selecting a sample. One alternative method is to ask someone who has studied the population to point out average or typical members and then confine the sample to these members. When the population is highly variable and the sample is small, this method often gives more accurate estimates than probability sampling. Another method is to restrict the sampling to those members that are conveniently accessible. If bales of goods are stacked tightly in a warehouse, it is difficult to get at the inside bales of the pile and one is tempted to confine attention to the outside bales. In many biological problems it is hard to see how a workable probability sample can be devised, for instance, as in estimating the number of houseflies in a town, field mice in a wood, or plankton in the ocean. [I strongly disagree - Paul]

One drawback of these alternative methods is that when the sample has been obtained, there is no way to determine how accurate the estimate is. Members of the population picked as typical by an expert may be more or less atypical. Outside bales may or may not be similar to interior bales. Probability sampling formulas for the standard error of the estimate or for confidence limits do not apply to these methods. Consequently, it is wise to use probability sampling unless it is clearly not feasible or prohibitively expensive.

Jongman et al. (1995:24) suggest that "by not using random sampling, one can obtain a 'distorted picture' (Snedecor & Cochran 1980) of a population. Biased sampling, such as choosing sites from a vegetation that are considered to be typical (key sites) or that have as many species as possible, introduces an observer-tied bias. Other types of bias may be caused because certain measurements have been performed with a frequency that depends on the season or the location (e.g. near the laboratory vs. far from the laboratory). Then the results of any statistical analysis may be different from the results that would have been obtained if random sampling had been applied. It is worthwhile to go to quite some effort to avoid this kind of bias."

Manly (1992: 4-5) points out that "truly random sampling is often difficult to carry out, and there is a temptation to assume that a sample that is obtained in some convenient way is equivalent to a random sample. Unfortunately, however, it is then very easy for a systematic bias in the sampling procedure to distort estimates of key parameters to such an extent that a study becomes quite worthless."

The discussion of probability and judgement sampling in the statistical literature occurred during the 1920s and 1930s (Stephan 1948, Rossi 1983). Some statisticians supported judgement or purposive sampling instead of random sampling (Jensen 1928), but a consensus developed in favor of random sampling. I will discuss two papers that demonstrate the problems resulting from judgment sampling, in a social rather than a biological context. However, the statistical principles are the same, and the results apply to amphibian surveys.


Unemployment Survey Example

Hogg (1930) discussed many of the same sampling problems we currently face in amphibian monitoring. She was considering unemployment surveys, but the sampling issues are the same. She noted that the problem of selecting persons or households to be studied is of paramount importance, because the "resulting aggregate of persons or households must be demonstrably representative of the population for which an estimate is desired, otherwise the survey will have little or no value." In an earlier survey in Buffalo, the principles of random sampling were not followed. Instead, nine considerable areas, varied in character and scattered in position, were selected, and the report was based on their aggregate. There was no reason to suppose that the aggregate of the nine corresponded at all with the whole city's occupational and racial composition, and therefore the unemployment estimates would not accurately estimate unemployment in Buffalo. The authors of the Buffalo study stated that the results do "not purport to show employment and unemployment for the whole city of Buffalo but merely for the persons enumerated." One must wonder why they did the survey, if the results could not be applied to the city. It is likely that any public report would be erroneously extrapolated to the whole city, regardless of the disclaimer. This early account of the problems resulting from non-random sampling has clear implications for amphibian surveys that also use the observer's judgement to select sample sites.


Literary Digest Polls Example

Willcox (1931) examined in detail two national surveys conducted by the Literary Digest on the public's opinion about repealing prohibition (the eighteenth amendment and associated state laws). In 1922, 10,108,437 ballots were mailed and 922,383 were returned. In 1930, 20,227,370 ballots were mailed and 4,806,464 were returned. The Literary Digest's mailing list was built up for advertising purposes based on users of telephones and automobiles. In spite of the enormous sample, there were concerns about whether it adequately represented American public opinion.

1. The sample was 95% male, compared with 60% of the voters. It was believed that a larger proportion of women favored retaining prohibition. In an effort to answer this objection, the Literary Digest in 1922 also sent ballots to 2,268,101 women and received replies from 120,050. Nineteen percent of the women and 21% of the men favored repealing prohibition. The Literary Digest did not repeat this survey in 1930, but Scripps-Howard newspapers found that 82% of the men and 70% of the women in their sample favored repealing prohibition.

2. It was claimed that country folk are more generally dry than city folk. In 1930, the Literary Digest sample included 4.8% of the city folk, compared to 3.1% of the country folk. Forty-five percent of the city folk and 33% of the country folk favored repealing prohibition.

3. The proportion of wets among wage-earners is thought to be greater than the proportion among those classes with larger incomes from which the Literary Digest has drawn most of the names. However, it was difficult to address this bias.

Willcox concluded that "In my opinion this [wage-earner] bias of the Literary Digest samples in favor of the drys outweighs the other biases in favor of the wets. . . . But if others think differently I can see no way to prove that my opinion is better than theirs." The take-home message is that even with enormous samples, valid conclusions cannot be made unless probability sampling is used. For example, it could be argued that it is not important to consider women's opinions because their opinions are probably similar to men's opinions. But if women are not asked, one can not be sure of the magnitude of the bias. Trying to correct the bias with another, even more biased sample, does not help. Similar arguments could be made about the areas that were not surveyed for amphibians, but if they are not surveyed, one can not be sure of the magnitude of the bias.

There was a very public test of the alternate survey methods in predicting the 1936 Roosevelt-Landon presidential election (Rossi, et al. 1983:5). Literary Digest mail straw ballots with millions of respondents were pitted against small scientific surveys that were conducted by Gallup and Crossley, with about 1500 interviews each. The scientific surveys clearly won, demonstrating that "small but carefully drawn samples could do better than huge numbers picked from a partial sample frame with little or no effort to achieve reasonable response rates." Manly (1991:10-11) described that survey:

The Literary Digest poll of 1936 was carried out in the United States to determine in advance what was to be the outcome of the presidential election to choose between the Republican Landon and the Democrat Roosevelt. It is a classic example of a sample survey that went wrong.

A total of 10 million survey forms were sent out to people on lists of telephone subscribers, car owners, etc. About two and a third million responded and they were strongly in favor of Landon for President rather than Roosevelt, with a ratio of three supporting Landon for every two supporting Roosevelt. The election result was quite the reverse, with Roosevelt winning 62% of the popular vote and carrying 46 out of 48 states.

There have been various explanations of why this survey gave a result so far from the truth. Two obvious possibilities are:

(a) That economic status was strongly associated with voting preferences, and also with being on the lists that were used for mailing

(b) That the voting preferences were different for respondents and non-respondents

One thing to note from this example is that an unrepresentative sample cannot necessarily be improved by making it bigger. The Literary Digest sample of over two million got a very precise estimate of the percentage support for Landon, but this was a percentage relating only to a small and apparently unrepresentative part of the entire body of voters.
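Manly's point about precision versus representativeness can be illustrated numerically. A rough sketch (the 3-to-2 Landon ratio and the respondent counts are from the passages above; the 0.50 proportion for the small poll is an illustrative assumption):

```python
import math

def se_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)

# Literary Digest: roughly 2.3 million returned ballots, 3:2 for Landon.
se_digest = se_proportion(0.60, 2_300_000)
# A scientific poll of about 1,500 interviews (p = 0.50 for illustration).
se_small = se_proportion(0.50, 1_500)

print(f"Digest SE = {se_digest:.4f}; small-poll SE = {se_small:.4f}")
# The Digest's sampling error was a few hundredths of a percentage point,
# yet its estimate missed the election by roughly 20 points: the standard
# error measures the repeatability of the draw, not freedom from selection
# and nonresponse bias.
```

The formulas quantify only the chance variation of the draw; they say nothing about the gap between the sampled list and the population, which is why a huge judgment sample can be precisely wrong.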


Mourning Dove Survey Example

A national Mourning Dove Call-Count Survey was initiated in 1950 using routes along rural roads selected by the observers (Nelson et al. 1951). Several abundance indices were evaluated, including 1) call-count routes, 2) records of doves seen by biologists on their normal travels, 3) one week's records of doves seen by mail carriers while delivering mail, and 4) area counts by game biologists, Four-H Club members, etc. The call-count route was selected as the most reliable.

Foote et al. (1958) reported on a test of the efficiency of mourning dove call counts in seven southeastern states, with particular reference to tests of a sampling design that will yield more reliable data.

The call count has been used as an index to the population of calling mourning doves (Zenaidura macroura) since 1951. Fundamental research on this technique was conducted from 1950 to 1956 as a part of the Cooperative Dove Study. . . . Analysis of call-count data from the original management routes by life zones, biotic provinces, soils, and political boundaries has suggested some fundamental relations between populations and ecology. More conclusive evidence of these relations, however, has awaited collection of call-count population data from a statistically appropriate sampling.

Although some of the original call-count routes were selected on an ecological basis, at least in the Southeast, many of the routes may have been chosen because they were known to sustain moderate to good dove populations or because they were convenient to an observer. The degree of randomness in selection and of representativeness of area and population of the original management routes was unknown. Data from these routes have been used chiefly to denote changes in the population of calling doves on the same routes from year to year. Attempts have been made to weight calling populations by land areas for hunting-regulation information and for design of a nation-wide nestling-banding program, using data from original management routes, because these were the best data available on dove populations.

To permit area-to-area comparisons and proper weighting of call-count data from geographic areas, the population data so used either should be obtained under principles of randomization, or the sampling biases related to nonrandomness should be estimated and used as correction factors. A test to compare data resulting from random sampling with that obtained from the present system of sampling has been of high priority in the over-all dove-management program. . . .

The call-count technique is limited in application to sections of the dove range traversed by roads because it consists of 20 systematic auditory plots with about 3/8-mile radii, at 1-mile intervals, on which all doves heard are counted. It is limited to nonurban areas where noise does not interfere with one's hearing calling birds, and is best adapted to lightly traveled roads. Its adaptability in areas having high dove-calling populations is unknown, but it is very efficient in areas having low calling populations. Weather, time of the morning, season, and other factors that influence the counts are generally understood, and the nation-wide counts have been rigidly standardized to eliminate much of this innate variation. . . .

Throughout this paper, "original management" routes are regularly censused routes [selected by the observers] on which mourning doves have been counted for several years, while "random" routes are those especially selected for this study [using a stratified probability sample].

. . . .


As compared to randomly selected routes, the presently employed dove call-count routes in seven of the Southeastern States are positively biased and, therefore, higher-than-average dove population areas are being sampled. The original management routes in these seven states may have been selected purposefully and are not representative of the dove population in this area. The differences between the random sample and the original management sample were not great, which should be reassuring to the administrator considering these data for hunting regulations.

Unless it can be shown that the bias in the mean is constant from year to year and from area to area, detection of population change from original management-sample data has an error potential. Because the bias in original management-route data tends to be positive over the different population densities in the state strata, it might be inferred that the original route bias would also be positive over year-to-year changes in density. This could be tested only by continuing to sample both random and original management routes for several years to determine if both sets of population data fluctuate synchronously.

While the management-route data reflect calling-population changes on those routes from year to year, these data cannot be shown to reflect actual population change from year to year within the 7-state sampling area because of the relatively large bias and because the year-to-year harmony between original management-route data and those from a random sampling is unknown. In the state strata, the biases, while generally positive, differ considerably in magnitude, so the original management data cannot be used for area-to-area comparison with the accuracy necessary for management.

Comparisons of variance of sampling by state and by ecological-zone stratification indicate that the latter is approximately 17 per cent more efficient than the former. This confirms information from analyses of 1953 and 1954 call-count data. . . . Use of the data for population-weight factors for banding analyses will be more precise if the data are stratified first by ecological zones, then by states or areas of interest. This also applies to the original management data.


1) The presently employed dove call-count routes in seven of the Southeastern States are positively biased, and higher than average dove population areas are being sampled by the original management routes.

2) The difference between mean numbers of doves heard calling on original management routes and on randomly selected routes was small, but approached significance at the 95 per cent probability level. Data from the original management routes have been used since about 1951 in planning hunting regulations. It should be reassuring to the wildlife administrator that differences between the means of the two sets of routes were not of greater magnitude than they were.

3) The nation-wide call-count sampling should be revised so that censusing is conducted on routes selected by a statistically appropriate system.

4) Selection of routes should be on the basis of stratified random sampling by ecological zones. For administrative purposes, data from ecological-zone strata can be combined by states or by other political groupings to relate areas of harvest and production for hunting-regulation purposes. To establish banding quotas or for other management or research programs, weighting by ecological zones is more appropriate than weighting by any other presently known system.

The number of routes necessary for sampling by ecological zones can be approximated from the call-count route data gathered from 1951 to 1957 and from the random data gathered in this study. The sample size required will depend upon the expected variance in each ecological zone, and on the precision required from the estimates within each area of interest. Use of the finite population-correction factors is recommended if the sampling density exceeds five per cent of the cells.

5) Revision of the nation-wide sampling should be undertaken as soon as possible. To permit use of the call-count data for management purposes, especially for hunting regulations, in conjunction with randomization of the sampling, there are two alternatives. (a) A completely new randomly selected series of routes can be designed and both random and original management sets of routes censused the first year. The original routes would be dropped thereafter. (b) A second alternative would be to select 20 per cent of the new routes at random during each of the next five years, after which the entire nation-wide sampling would be on a statistically acceptable basis of randomization.

6) After the nation-wide sampling has been randomized, it may be necessary to select a few new routes each year to replace those routes on which the census technique cannot be applied. Suburban development, new main highways, airfields, and growth of population centers will make annual replacement of a few routes desirable.
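The sample-size approximation in recommendation 4 can be sketched numerically. The sketch below assumes the usual normal-approximation formula with a finite population correction; the variance, margin of error, and zone size are hypothetical illustration values, not figures from the study.

```python
import math

def routes_needed(s2, margin, N, z=1.96):
    """Approximate routes required in one ecological zone:
    n0 = z^2 * s2 / margin^2, then the finite population correction
    n = n0 / (1 + n0 / N), where N is the number of cells in the zone."""
    n0 = (z ** 2) * s2 / margin ** 2
    return math.ceil(n0 / (1 + n0 / N))

# Hypothetical zone: variance 25 (doves heard per route), desired
# margin of error of 2 doves, N = 157 cells in the zone.
print(routes_needed(s2=25.0, margin=2.0, N=157))
```

As the quotation notes, the correction matters only when the sampling density is appreciable; with a very large N the answer reduces to the uncorrected n0.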

The dove "management" routes were replaced with stratified random routes, providing an overlap so that trend information could be maintained. The transition to random routes was completed in 1966, and this is the first year reported in current mourning dove status reports (Dolton 1995).


Implications for Amphibian Monitoring

I have reviewed some of the literature on survey design and on the problems associated with judgement samples. It is clear that the results of any survey in which observers select the routes, or in which routes are selected by any judgement method, will be questioned. It is difficult and expensive, if not impossible, to demonstrate that any judgement sample is representative of the population. Consequently, there will always be enough unanswered questions about the data that, I believe, agencies would be unable to make difficult or costly decisions to protect amphibians, because it could not be conclusively demonstrated that there is a real problem.


Survey Sampling

To avoid the problems discussed above and to achieve statistically valid estimates, one must use a formally designed sample survey. Levy and Lemeshow (1991) define a "sample survey" as "a study involving a subset (or sample) of individuals selected from a larger population. Variables or characteristics of interest are observed or measured on each of the sampled individuals. These measurements are then aggregated over all individuals in the sample to obtain summary statistics (e.g., means, proportions, totals) for the sample. It is from these summary statistics that extrapolations can be made concerning the entire population. The validity and reliability of these extrapolations depend on how well the sample was chosen and how well the measurements were made."

The first step is to define the population of interest to which we will extrapolate the results of a survey. At first, one thinks of the statistical population as the biological population of all calling amphibians in a state or province. However, it is not practical to assign serial numbers to all the frogs and then to select them at random using a table of random numbers. It is more feasible to divide a state or province into a number of sampling units (blocks of land, e.g., degree blocks or topographic maps) and to select a sample of units at random. The relative abundance of an amphibian species can then be measured on the selected sampling units using a calling-amphibian route or other method. In practice, we cannot sample all land areas. If we restrict measurements to stops along rural roads, the statistical population becomes frogs breeding within hearing of rural roads, and our estimates apply only to those frogs. However, we can argue that this segment of the anuran population is the most susceptible to human disturbance and, therefore, the most at risk. Consequently, it seems reasonable to use the relative abundance indices for frogs breeding within hearing of rural roads as an early warning system for all amphibians in a state or province.
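As a minimal sketch of this selection step, the snippet below numbers a hypothetical frame of blocks and draws an equal-probability sample without replacement; the block labels, frame size, and sample size are invented for illustration.

```python
import random

def draw_units(units, n, seed=None):
    """Select n sampling units (e.g., degree blocks) with equal
    probability and without replacement, using a seeded generator
    in place of a table of random numbers."""
    rng = random.Random(seed)
    return sorted(rng.sample(units, n))

# Hypothetical frame: 120 blocks labeled B001..B120; draw 12 of them.
frame = [f"B{i:03d}" for i in range(1, 121)]
sample = draw_units(frame, 12, seed=1995)
print(sample)
```

Because every unit has the same chance of selection, summary statistics from the sampled blocks extrapolate to the whole frame without the selection bias of a judgement sample.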

I will present two methods for selecting call-count routes. The first is used for the Mourning Dove Call-Count Survey (MDCCS) (Foote et al. 1958:403-404):

A map of the 7-state area was gridded into squares (cells) 20 miles on a side, each enclosing 400 square miles. A total of 785 sampling cells thus constituted the frame, enclosing 314,000 square miles, which approximates the estimated 313,800 square miles of dove habitat in these states. Major metropolitan areas were omitted from the frame; also omitted were areas containing major bodies of water. The periphery of the 7-state area was corrected by omitting approximately one-half the incomplete grids on the boundary.

A one-in-five sampling yielded 157 cells and provided approximately the same number of random routes as there were regular annually censused call-count routes within the test area. County road maps enclosing the area in each randomly selected cell were obtained. The periphery of each cell was divided into 80 1-mile intervals, starting at the northeastern corner and proceeding clockwise. Twenty lineal miles then were mapped from that point which had an appropriate road closest to the 1-mile interval selected at random, proceeding generally toward the center of the cell and staying within the cell boundaries. Selected roads were those that appeared (by the map) to be traversable. Intermittent roads, ferries, or major unbridged streams were avoided. Main highways were not included, except for short distances when it was not possible to select a more appropriate route. Urban areas, railroads, airfields, and roads adjacent to major highways were avoided, and the route did not "double back" upon itself on roads closer than 1 mile. Acute-angle turns were avoided to prevent duplication of auditory plots at successive stations.

Slight adjustments of less than 2 miles occasionally were made to provide a readily recognizable starting point for the route. The starting point was determined to be that end of the route most easily reached from a main highway and population center. All routes were selected by a mapper unfamiliar with the area. When confronted with alternatives within the preceding limitations, roads were selected at random.
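The cell draw and perimeter-interval draw described above can be sketched as follows. The excerpt does not state whether the one-in-five sampling was systematic or simple random; this sketch assumes a systematic draw with a random start, and all names are hypothetical.

```python
import random

def select_cells_and_starts(n_cells, seed=None):
    """One-in-five systematic draw of grid cells (random start), then a
    random 1-mile interval on each selected cell's 80-mile perimeter."""
    rng = random.Random(seed)
    start = rng.randrange(5)
    cells = list(range(start, n_cells, 5))
    # For each selected cell, pick one of the 80 one-mile perimeter
    # intervals, numbered clockwise from the northeastern corner.
    return {cell: rng.randrange(80) for cell in cells}

picks = select_cells_and_starts(785, seed=1958)
print(len(picks))  # one-in-five of 785 cells is 157
```

The route itself is then mapped from the road nearest the drawn interval, so only the cell and the perimeter interval are randomized here; road choice within the cell follows the mapping rules quoted above.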

Another procedure is used with the Breeding Bird Survey (BBS) (Bruce Peterjohn, personal communication):


Random Routes - New random routes are established only if most of the routes within a state/province are currently being run. The local coordinator must also feel that there are enough available people willing to run new routes and must feel capable of taking on the burden of assigning new routes.

If new routes can be established, then one route is added to every degree block of the state or province. For each block a random set of latitude and longitude minutes is chosen, as is one of the cardinal directions. The random starting point will not likely land on an appropriate road for a Breeding Bird Survey. Therefore the true starting point of the new route will be the nearest appropriate road to the random starting point. Appropriate roads are secondary roads that have relatively light traffic (fewer than 5 cars per stop on a weekday) and are state or county maintained. If possible the starting point should also include some easily identifiable feature such as a road intersection, bridge, or curve so that the observer can find it early in the morning.

Once an appropriate starting point is found, the actual route must move toward the random direction. However, with special exceptions, the route cannot cross degree block, state/province, or stratum lines, nor can it follow another route's path. When laying out the route, the path taken should follow the chosen direction as much as possible while staying on maintained secondary roads. If a route must traverse small sections of heavily traveled roads, stops along these sections can be skipped. Connecting roads between large towns should be avoided, as should roads that are poorly maintained. In all cases the observer should be asked to scout the newly laid out route for problems. They can either be given explicit instructions for changing the route or be asked to inform this office. The numbers given to new routes should follow a regular pattern, the best usually being some multiple of 100 of the routes already existing within the degree block. The name of a new route should be the largest town along the route. If no town exists then some dominant landscape feature should be used. Rivers and creeks should be avoided as names, for obvious reasons.

Non-random Routes - These are only established at the request of an observer or a researcher doing a special project. They are often areas of special ecological or ornithological interest. (A section on the National Parks special project will be inserted here once all the details have been settled). Non-random routes are not used in most analyses of trend and are given 900 series numbers so that they can easily be identified.

Alaska and the Far Northern Provinces - These areas present special problems due to scarcity of roads. These routes cannot be considered random in most senses. Some of these routes are run by boat or even off-road vehicles. Because of road limitations some are shorter than 50 complete stops. These are given 800 series numbers. These BBS's are not used in most calculations of trend and a system of regular analysis is not yet in use due to the relative recency of their creation.
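The BBS starting-point draw described above (a random minute of latitude, a random minute of longitude, and a cardinal direction within each degree block) can be sketched as below; the function name is hypothetical.

```python
import random

def bbs_start(seed=None):
    """Draw a random starting point within a degree block: a minute of
    latitude, a minute of longitude, and a cardinal direction. The true
    route start is then the nearest appropriate secondary road."""
    rng = random.Random(seed)
    return (rng.randrange(60),            # minute of latitude, 0-59
            rng.randrange(60),            # minute of longitude, 0-59
            rng.choice(["N", "E", "S", "W"]))

lat_min, lon_min, direction = bbs_start(seed=42)
print(lat_min, lon_min, direction)
```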

I prefer the method used with the dove survey (MDCCS) because all rural roads have about the same probability of selection. With the Breeding Bird Survey (BBS), rural roads on the periphery of roadless areas have a higher probability of being selected than roads that are near other roads. To see this effect, take a map and draw a line around the area that is closest to each road. The probability that a road is selected for the BBS is proportional to the area that is closest to it. On the other hand, the BBS approach tends to spread out the routes so that the sample is not concentrated as much where there is a higher density of roads. However, with each method, a route with stops is used to sample the amphibians calling within an area sampling unit (a 20-mile square for the MDCCS and a degree block for the BBS). A random selection using a table of random numbers is used to avoid biases in the selection of the route. However, some bias associated with restricting stops to rural roads is unavoidable. I argued (above) that this bias would favor the early detection of problems, because areas close to roads would tend to have more human impacts.
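The nearest-road argument can be illustrated in one dimension with a small Monte Carlo sketch (all positions hypothetical): random points on a line are snapped to the nearest of three "roads," and the isolated road is selected far more often than the one-third share that equal-probability selection would give it.

```python
import random

def selection_freq(roads, trials=100_000, seed=0):
    """Drop random points on [0, 100] and snap each to the nearest road;
    a road's selection frequency is proportional to the span of points
    that lie closest to it (its 1-D 'nearest area')."""
    rng = random.Random(seed)
    counts = {r: 0 for r in roads}
    for _ in range(trials):
        x = rng.uniform(0, 100)
        nearest = min(roads, key=lambda r: abs(r - x))
        counts[nearest] += 1
    return {r: c / trials for r, c in counts.items()}

# Two roads close together (10, 20) and one isolated road (90):
# their nearest spans are 15%, 40%, and 45% of the line.
freqs = selection_freq([10, 20, 90])
print(freqs)
```

The isolated road at 90 draws roughly 45% of selections while the crowded road at 10 draws only about 15%, which is the unequal-probability effect described above.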

The stops on a route within an area sampling unit are a cluster sample, because they were selected together and are contained within the unit (Levy and Lemeshow 1991:175-211). Stops on a route must be analyzed as a unit, because they were selected together and are not independent. Statistical tests and confidence intervals require independent observations, and stops do not qualify. Two events (A and B) are independent if the probability of A given that B has occurred is equal to the unconditional probability of A [i.e., P(A|B)=P(A) and P(B|A)=P(B)] (Mendenhall et al. 1981:47). This technical definition is in agreement with the common definition of "not determined or influenced by someone or something else; not contingent: a decision independent of the outcome of the study" (Soukhanov 1995). Here, the probability of A is the same whether or not B occurs. If A and B are counts of amphibians at two stops on the same route, A and B are not independent because knowing B will provide one with information about A. Points that are close together tend to be similar because:

* The mean temperature tends to decrease as one moves away from the equator and temperature affects amphibian calling.

* The number and type of ponds and other water bodies tend to be similar within a region within a state or province.

* Amphibian species occur only within their geographic ranges and may be abundant in parts of their ranges and rare in other parts.

* The counts at stops on the same route are all made on the same night and the temperature and moisture conditions on that night affect the calling.

There may be substantial differences in the habitat or amphibian populations among the stops along a single route, but they are still not independent because they were selected and observed together as a cluster. To demonstrate independence, one would have to show that amphibians are equally abundant in all areas of a state or province and that they call equally on all nights regardless of temperature and moisture conditions within the range accepted for the survey.
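The shared-conditions argument above can be demonstrated with a small simulation (all parameter values hypothetical): stops on the same route share a route-and-night effect, so their counts are positively correlated, and knowing one count is informative about the other.

```python
import random

def simulate_routes(n_routes=2000, seed=0):
    """Each route gets a shared effect (weather, region, night); two
    stops on the route add independent stop-level noise to it."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_routes):
        route_effect = rng.gauss(10, 3)      # shared by both stops
        a = route_effect + rng.gauss(0, 1)   # stop 1 count index
        b = route_effect + rng.gauss(0, 1)   # stop 2 count index
        pairs.append((a, b))
    return pairs

def corr(pairs):
    """Pearson correlation of the paired stop values."""
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    cov = sum((a - ma) * (b - mb) for a, b in pairs) / n
    va = sum((a - ma) ** 2 for a, _ in pairs) / n
    vb = sum((b - mb) ** 2 for _, b in pairs) / n
    return cov / (va * vb) ** 0.5

print(round(corr(simulate_routes()), 2))  # strongly positive, near 0.9
```

With these assumed variances the within-route correlation is about 0.9; treating such stops as independent observations would badly overstate the effective sample size, which is why the stops must be analyzed as a cluster.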


Literature Cited

Cochran, William G. and Gertrude M. Cox. 1957. Experimental design. Wiley, New York.

Deming, William E. 1950. Some theory of sampling. Dover Publications, New York.

Dolton, David D. 1995. Mourning dove breeding population status, 1995. U.S. Fish and Wildlife Service, Office of Migratory Bird Management, Laurel, MD.

Foote, Leonard E., Harold S. Peters, and Alva L. Finkner. 1958. Design test for mourning dove call-count sampling in seven southeastern states. J. Wildlife Management 22:402-408.

Hogg, Margaret H. 1930. Sources of incomparability and error in employment-unemployment surveys. J. Amer. Statistical Assoc. 25:284-294.

Jensen, Adolf. 1928. Purposive selection. J. Royal Statistical Soc. 91:541-547.

Jongman, R.H.G., C.J.F. ter Braak, and O.F.R. van Tongeren. 1995. Data analysis in community and landscape ecology. Cambridge Univ. Press, Cambridge, UK.

Levy, Paul S. and Stanley Lemeshow. 1991. Sampling of populations: methods and applications. Wiley, New York.

Manly, Bryan F.J. 1992. The design and analysis of research studies. Cambridge Univ. Press, Cambridge, UK.

Mendenhall, William, Richard L. Scheaffer, and Dennis D. Wackerly. 1981. Mathematical statistics with applications. Duxbury Press, Boston.

Mossman, Mike, Paul Rasmussen, John Sauer, Sam Droege, and Lisa Hartmant. 1995. Sample size estimation for amphibian calling surveys and some surprising trends from an 11-year analysis of Wisconsin Frog and Toad Survey data. Second annual meeting of the North American Amphibian Monitoring Program in Burlington, Ontario, September 27-29, 1995.

Nelson, Dan, Leonard Foote, James E. Keeler, Harold Alexander, Frank A. Winston, Dan M. Russell, John Newsom, Henry Bobbs, Jr., Donald G. Allison, Harold B. Poole, and James W. Hammond. 1951. Statistics as a tool in measuring dove inventories. in Tim Fendley (ed.). 1985. Proceedings of the first - sixth annual conferences, Southeastern Association of Game and Fish Commissioners, 1947-1952. Southeastern Association of Game and Fish Agencies.

Rossi, Peter H., James D. Wright, and Andy B. Anderson. 1983. Sample surveys: history, current practice, and future prospects. in Peter H. Rossi, James D. Wright, and Andy B. Anderson (eds.). Handbook of survey research. Academic Press, San Diego.

Soukhanov, Ann H. (Ed.). 1995. The American Heritage Dictionary, electronic edition. Microsoft.

Snedecor, George W. and William G. Cochran. 1980. Statistical methods. Iowa State Univ. Press, Ames.

Stephan, Frederick F. 1948. History of the uses of modern sampling procedures. J. Amer. Statistical Assoc. 43:12-39.

Willcox, Walter F. 1931. An attempt to measure public opinion about repealing the eighteenth amendment. J. Amer. Statistical Assoc. 26:243-261.

Paul H. Geissler, National Ecological Surveys Team, Office of Inventory and Monitoring

National Biological Service, 12100 Beech Forest Road, Laurel, MD, USA 20708-4038

Tel. 301-497-5780, FAX 301-497-5784, HTTP://