From: cmduran@whale.st.usm.edu (mike duran)
Date: Sat, 16 Mar 1996 10:39:59 -0600
Subject: Random designs and volunteers
Folks,
As a biologist and avid amateur statistician, I have lurked with great interest in the AMP-L
discussion on the maintenance of sound statistical data and the use of volunteers (volunteer
selected routes). In order to better get my mind wrapped around the subject, I posed a question
to some of the sci.stat.* newsgroups. I hope I did a decent job of summarizing the problem.
Maybe this will interject some new life into the discussion.
Mike Duran; MS Natural Heritage Program
CSTS-DPW-E; Bldg. 6678;
Camp Shelby, MS 39407-5500
(601) 558-2894; Fax: (601) 558-2636
cmduran@whale.st.usm.edu
The Question:
===========================================================
> On the Amphibian Monitoring Protocol mailing-list we are currently having
> a very interesting discussion and debate about the best way to achieve the
> statistical randomization of calling-frog surveys. The general idea is
> that a numbered map grid would be laid over the area that is to be
> surveyed, random numbers would be chosen, by computer or random number
> table, and the wetlands inside of the grid squares chosen would be
> surveyed. (You can download the protocol from if
> you're that interested.) The surveyor drives the randomly selected route
> and listens for a predetermined amount of time.
>
> Ideally you hire qualified people to drive along these routes and record
> the data, but in the real world volunteers are used because no one has the
> money to hire all those people. It is difficult, however, to assign
> volunteers to the rigidly selected routes--you get many more volunteer
> hours by allowing them to select their own routes which means going to areas
> near their homes. This leads to oversampling near urban areas and other
> obvious biases. The question is this: Is all of the data from volunteers
> who are selecting their own routes statistically useless? We have heard
> several good suggestions on ways to use volunteers and still have
> statistically sound results. Does anyone have suggestions on how these
> non-randomized results might be randomized? It occurs to me that we might
> be able to compare subsets of non-random volunteer data with randomized
> subsets and select those that fit. (???) I thought some of you folks
> might be able to see this from a different angle.
>
> All comments welcome.
The Answers:
===================================================================
From: aacbrown@aol.com
Date: Sun, 10 Mar 1996 22:06:07 -0500
If I understand correctly you have data on a grid where some grid points are measured many
times and others not at all. For example, suppose I wanted to know the average global
temperature at some specific time and called 1,000 people at random asking them their latitude,
longitude, and the current outside temperature.
Obviously I couldn't simply average the results since people are not randomly distributed around
the globe. Most likely the hottest and coldest places are the least densely populated.
What you can do is divide the earth into (say) 72 regions (squares 30 degrees of latitude and
longitude on a side). For some squares I might get 20 or 30 responses, for others only 1 or 2.
However I could compute an average for each square and then average the averages.
There are more efficient procedures, especially if you have some knowledge of the process.
However this is simple and defensible.
Aaron C. Brown
New York, NY
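The average-of-averages idea above can be sketched in a few lines. This is only an illustration: the cell labels and values are made up, cells with no observations simply drop out, and every observed cell is assumed to count equally.

```python
from collections import defaultdict

def average_of_cell_averages(observations):
    """observations: iterable of (cell_id, value) pairs.
    Returns the unweighted mean of the per-cell means."""
    by_cell = defaultdict(list)
    for cell, value in observations:
        by_cell[cell].append(value)
    cell_means = [sum(v) / len(v) for v in by_cell.values()]
    return sum(cell_means) / len(cell_means)

# Hypothetical data: cell "A" is heavily visited, "B" and "C" get one visit each.
obs = [("A", 10), ("A", 12), ("A", 11), ("A", 11), ("B", 2), ("C", 3)]
naive = sum(v for _, v in obs) / len(obs)   # dominated by cell "A"
balanced = average_of_cell_averages(obs)    # each cell counts once
```

With these made-up numbers the naive mean is pulled toward the oversampled cell, while the balanced figure treats the three cells alike.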
==================================================
From: Magnus.Pettersson@merry.stat.gu.se (Magnus Pettersson)
Subject: Re: Randomization of calling-frog data
Date: Tue, 12 Mar 1996 11:14:01
Hello Mike!
A similar problem was studied by Harri Hogmander at the University of Jyvaskyla, Finland. He
has used Bayesian methods and spatial statistics to estimate the number of toads in Finland.
He divided Finland into a grid in which each pixel was coded 0 (no toad sighted) or 1 (at least one
toad sighted). The toad survey was carried out by volunteers. When you look at the
map, you see a good map of the population density of people, not of toads.
Hogmander, however, takes one species (I don't remember which) that is known to occur
throughout Finland with an approximately uniform distribution. Now, the map of that species can be
used to show some kind of "surveying activity". He then corrects the information from the first
map given the information from the second.
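A very crude stand-in for this correction, not Hogmander's Bayesian method, is to treat a record of the reference species as evidence that a square was actually surveyed, and to look at toad records only among those squares. The function name and the data below are hypothetical.

```python
def occupancy_among_surveyed(cells):
    """cells: list of (reference_seen, toad_seen) booleans, one per grid square.
    Returns the share of squares with toads among squares where the
    reference species was recorded (used here as a proxy for survey activity)."""
    surveyed = [toad for ref, toad in cells if ref]
    if not surveyed:
        raise ValueError("no square shows any survey activity")
    return sum(surveyed) / len(surveyed)

# Hypothetical grid: three squares were evidently surveyed, three were not.
cells = [(True, True), (True, False), (True, True),
         (False, False), (False, False), (False, False)]
naive = sum(toad for _, toad in cells) / len(cells)  # counts unsurveyed squares as empty
corrected = occupancy_among_surveyed(cells)
```

The naive figure treats every square without a toad record as toad-free; the corrected one leaves unsurveyed squares out instead of counting them as absences.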
The research report [1] describes the method and is available from the University of Jyvaskyla.
References [2]-[4] I haven't read but might contain parts of the results or developments.
[1] Hogmander, H (1995). Methods of Spatial Statistics in Monitoring of Wildlife Populations.
Research report 1995:25, University of Jyvaskyla.
[2] Hogmander, Harri and Moller, Jesper (1995). Estimating Distribution Maps From Atlas
Data Using Methods of Statistical Image Analysis. Biometrics, v 51 Page: 393-
[3] Hogmander, H. (1991). A Random Field Approach to Transect Counts of Wildlife
Populations. Biometrical journal. v 33 n 8 Page: 1013-
[4] Heikkinen, J. and Hogmander H. (1994). Fully Bayesian approach to image restoration with
an application in biogeography. Applied statistics. v 43 n 4 Page: 569-
Good luck
===================================================================
Date: Tue, 12 Mar 1996 18:30:31 +0200 (EET)
From: Harri I Hogmander <hogmande@stat.jyu.fi>
Hi Mike,
I heard about your request from a Swedish friend. Some years ago I and my colleague Juha
Heikkinen estimated the geographic range of Common Toad (Bufo bufo) in Finland. The data
were from a volunteer based atlas survey of amphibian and reptile species using a national 10 x 10
kilometer grid over Finland (about 3800 squares in total). The data (binary, species observed in a
square or not) were obviously very sparse and heterogeneous, so we developed a statistical model
to estimate the probabilities for occurrence of toads in a "white" square. The model resembles
ones applied in statistical restoration of pixel images (exploiting neighbor effects, i.e. it is assumed
that adjacent pixels tend to be similar), but it also takes into account the (estimated)
square-specific coverage of field-work. The result of the estimation is a grey-scale map where the
shade of a square corresponds to the estimated probability of occurrence.
If you think that our method could be helpful, here is the reference: J. Heikkinen and H.
Hogmander 1994: Fully Bayesian approach to image restoration with an application in
biogeography. Applied Statistics 43(4): 569-582. If it is hard to find, reply to me and I'll send you a
copy.
Could you post the summary of answers also to me, please?
Good frogging,
Harri Hogmander
University of Jyvaskyla, Finland
============================================================
Date: Tue, 12 Mar 1996 07:53:33 -0500
From: "Alan M. Zaslavsky" <zaslavsk@hustat.harvard.edu>
I don't think that there is any approach to using data such as you describe that will be free of
controversy, but I will suggest one fairly general approach that at least allows you to do inference
as if you had a real survey and then carry on the argument on the side about whether your
assumptions were reasonable in this situation. If you had a random sample selected according to
some (possibly complex) plan, and you knew the probability of selection of each unit (in this case,
each area) in the sample, you could apply the Horvitz-Thompson estimator to estimate a mean,
which is just the sum of each observed value divided by probability of selection of the unit it
corresponds to, all divided by the total sample size. (In this case, where a unit could be selected
more than once, the divisor is the expected number of times it is selected.)
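As a rough sketch of this kind of inverse-probability weighting: each recorded visit is weighted by one over the expected number of visits to its square. The sketch normalizes by the sum of the weights (the ratio form of the estimator) rather than by a fixed count; that normalization is an assumption made here so the result does not depend on how the probabilities are scaled. All numbers are hypothetical.

```python
def weighted_mean(visits):
    """visits: list of (value, expected_visits) pairs, one per recorded visit.
    Each visit gets weight 1 / expected_visits for the square it came from;
    the weighted sum is divided by the sum of the weights (ratio form)."""
    weights = [1.0 / m for _, m in visits]
    total = sum(w * y for w, (y, _) in zip(weights, visits))
    return total / sum(weights)

# Hypothetical: a square near town is expected to be visited 4 times,
# a remote square 0.5 times.
visits = [(10, 4.0), (12, 4.0), (11, 4.0), (11, 4.0), (2, 0.5)]
naive = sum(y for y, _ in visits) / len(visits)
estimate = weighted_mean(visits)
```

Here the single remote visit carries twice the combined weight of the four visits near town, which is why the estimate drops well below the naive mean.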
In your real problem, there is no explicit random sampling plan, but you can proceed with this
type of analysis if you can describe what you think were the appropriate probabilities that your
volunteers would show up in a particular area. I see no alternative to estimating these
probabilities by using a model. (A suitable model would be a Poisson regression for number of
visits per grid square, although strictly speaking the Poisson model is not precisely applicable if
different visits are dependent, e.g. because one volunteer goes several times to areas near her/his
house.)
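The simplest version of such a model has a single categorical predictor, in which case the Poisson maximum-likelihood estimate of the expected number of visits in each category is just the mean visit count of the squares in that category. The sketch below assumes that simplified setup; the category names and counts are hypothetical, and a realistic model would include more covariates.

```python
from collections import defaultdict

def fit_visit_rates(squares):
    """squares: list of (category, visit_count), one entry per grid square,
    including squares that were never visited.
    Returns {category: expected visits per square}, which is the Poisson
    maximum-likelihood fit when the category is the only predictor."""
    counts = defaultdict(list)
    for category, n_visits in squares:
        counts[category].append(n_visits)
    return {c: sum(v) / len(v) for c, v in counts.items()}

# Hypothetical grid squares classified by distance from town.
squares = [("near_town", 5), ("near_town", 3), ("near_town", 4),
           ("remote", 0), ("remote", 1), ("remote", 0), ("remote", 1)]
rates = fit_visit_rates(squares)

# Inverse expected-visit weights; categories with a zero rate get no weight
# because nothing was observed there to weight.
weights = {c: 1.0 / r for c, r in rates.items() if r > 0}
```

The resulting weights are what an inverse-probability estimator would attach to each visit from a square in that category.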
In order to be believable, this model has to be fairly rich, i.e. it should include "all" variables that
are likely to affect the behavior of the volunteers (in selection of areas where they will collect
data) and particularly those that are likely to also be related to the variable you are measuring.
Hence it is sure to be somewhat controversial, but at least the controversy is out front so you can
work to some consensus. Be aware, however, that you may find that some areas are very unlikely
to be selected (or equivalently, they end up having very high weights), and this will dramatically
inflate the contribution to variance from this type of area. E.g. if 200 volunteers walk behind their
houses and 1 goes into the depths of the swamp, then it is possible that the 1 will end up with
more weight than the 200 put together. On the other hand, you can do something about this by
doing some directed random sampling (paid or dedicated volunteers) in the most
underrepresented areas (as shown by your probability model).
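One common way to put a number on this variance inflation is Kish's approximate effective sample size, (sum of weights)^2 / (sum of squared weights). The weights below are invented to mirror the 200-versus-1 example; they do not come from any fitted model.

```python
def effective_sample_size(weights):
    """Kish's approximation: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

# Hypothetical weights: 200 backyard visits with weight 1 each,
# and one swamp visit with weight 250.
weights = [1.0] * 200 + [250.0]
swamp_share = 250.0 / sum(weights)       # fraction of total weight on one visit
n_eff = effective_sample_size(weights)   # far fewer than the 201 actual visits
```

With these numbers the one swamp visit holds more than half of the total weight, and the 201 visits behave roughly like three or so equally weighted ones.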
===============================================================
Date: Wed, 13 Mar 1996 22:14:04 --100
From: das@si.hhs.nl
Hello Mike,
I was not aware of an amphibian monitoring protocol mailing list. You would do me a favor by
telling me how to subscribe.
Your problem.
[I plan to send this answer to the list, too, to promote discussion.]
You could start solving it in a simple way. Say you have some attribute expected to be important
to calling-frogs in selecting their mating places. You could, say, rate all km squares in
attractiveness: shallow pools, foliage nearby, not isolated by roads etc. Or perhaps you could only
rate a random sample, but you could extrapolate that. Then you also rate the places visited by
volunteer observers. That gives you the expected probability of hitting on a mating pool, or
perhaps even an estimate of number of males, as a function of attractiveness. What is to stop you
from extrapolating that to all km squares?
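A minimal sketch of that extrapolation, assuming attractiveness is recorded as a simple category: estimate the proportion of visited squares with calling frogs within each rating, then average those proportions over the ratings of all squares. The ratings and records below are hypothetical, and every rating that occurs in the full grid is assumed to have at least one visited square.

```python
from collections import Counter, defaultdict

def extrapolate_occupancy(visited, all_ratings):
    """visited: list of (rating, frogs_heard) for volunteer-visited squares,
    with frogs_heard equal to 0 or 1.
    all_ratings: the rating of every square in the study area (or of a random sample).
    Returns the estimated proportion of squares with calling frogs.
    Raises KeyError if some rating in all_ratings was never visited."""
    heard = defaultdict(list)
    for rating, y in visited:
        heard[rating].append(y)
    rate = {r: sum(v) / len(v) for r, v in heard.items()}
    n = len(all_ratings)
    return sum(rate[r] * k / n for r, k in Counter(all_ratings).items())

# Hypothetical: volunteers mostly visited attractive squares,
# but most squares in the study area are rated "low".
visited = [("high", 1), ("high", 1), ("high", 0), ("high", 1), ("low", 0), ("low", 1)]
all_ratings = ["high"] * 2 + ["low"] * 8
naive = sum(y for _, y in visited) / len(visited)
estimate = extrapolate_occupancy(visited, all_ratings)
```

The same structure extends to further categorical variables by using combined categories as the rating, at the cost of needing visits in every combination.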
I see only one valid argument and that is the nearness of habitation itself. But part of the
volunteers, at least, will be in rural surroundings. So why not enter that variable (urbanity-rurality)
as an independent in a regression?
Those are my first thoughts.
Peter Das das@si.hhs.nl
Silkeborg 43
2905 AT Capelle aan den IJssel
Netherlands
tel 31 10 4510079
=======================================================
Newsgroups: sci.stat.consult
From: ewalters@idirect.com (Eric L. Walters)
Subject: Re: Calling-frog sampling design
Why re-invent the wheel? Ornithologists have been doing exactly what you propose for frogs for
at least 50 years! Consult some of the Breeding Bird Survey (BBS) literature and you will find
your answers.
Eric Walters
Dept. of Biology
University of Victoria
ewalters@idirect.com
==========================================================