gridEZ: A Robust Algorithm for Generating Gridded Sampling Frames in Data-Scarce Settings
Claire Dooley (University College London) - United Kingdom
Caitlin Clary (Biostat Global) - United States
Dale Rhoda (Biostat Global) - United States
Andy Tatem (University of Southampton) - United Kingdom
Dana Thomson (Columbia University) - United States
Keywords: sampling frames, population, survey design
Abstract
Household surveys often require a reliable sampling frame, which can be difficult to establish in regions that lack detailed geographic information such as roads and other characteristics of the built environment. To address this, we designed the gridEZ algorithm, which uses gridded population data to generate gridded enumeration zones (EZs) that can serve as a robust sampling frame for selecting Primary Sampling Units (PSUs) in household surveys.
The gridEZ algorithm partitions space into contiguous EZs that adhere to specific population and geographic criteria. These EZs are constructed entirely of full grid squares (e.g. 100 m x 100 m), ensuring that they: (1) meet a target population range; (2) do not exceed a maximum geographic size; and (3) remain compact and operationally feasible for fieldwork. The algorithm relies on gridded population data, administrative unit boundaries and (optionally) settlement classifications. These input datasets can come from any source, and the sampling frame is generated for the geographic area that is fully covered by all input datasets. As with common survey sampling frames, EZs produced using the gridEZ algorithm do not cross administrative or settlement boundaries.
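To make the criteria above concrete, the following is a minimal Python sketch that checks whether a candidate zone satisfies them. It is an illustration only, not the gridEZ code (which is written in R): the function names, the representation of cells as (row, column) tuples, the use of edge-sharing (4-neighbour) contiguity, and expressing maximum geographic size as a cell count are all assumptions made for this example. Compactness is not checked here.

```python
from collections import deque


def is_contiguous(cells):
    """Return True if all cells are connected through shared edges.

    Assumption: contiguity means 4-neighbour (edge-sharing) adjacency.
    """
    cells = set(cells)
    if not cells:
        return False
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb in cells and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == cells


def check_zone(cells, population, admin_id, pop_range, max_cells):
    """Check a candidate zone against the criteria (hypothetical helper).

    cells      : iterable of (row, col) grid cells in the zone
    population : dict mapping cell -> estimated population
    admin_id   : dict mapping cell -> administrative unit identifier
    pop_range  : (min_pop, max_pop) target population range
    max_cells  : maximum number of grid cells allowed in one zone
    """
    cells = set(cells)
    total = sum(population[cell] for cell in cells)
    return {
        "contiguous": is_contiguous(cells),
        "population_ok": pop_range[0] <= total <= pop_range[1],
        "size_ok": len(cells) <= max_cells,
        "single_admin_unit": len({admin_id[cell] for cell in cells}) == 1,
    }


# Toy data: a 2 x 2 block in unit "A" and one isolated cell in unit "B".
population = {(0, 0): 30, (0, 1): 45, (1, 0): 25, (1, 1): 20, (3, 3): 40}
admin_id = {(0, 0): "A", (0, 1): "A", (1, 0): "A", (1, 1): "A", (3, 3): "B"}

good = check_zone([(0, 0), (0, 1), (1, 0), (1, 1)], population, admin_id,
                  pop_range=(75, 150), max_cells=9)
bad = check_zone([(0, 0), (3, 3)], population, admin_id,
                 pop_range=(75, 150), max_cells=9)
```

In this toy example the 2 x 2 block passes every check, while the second candidate fails on contiguity, population and administrative consistency. The population range and cell limit used here are arbitrary illustrative values.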
The gridEZ algorithm is versatile, allowing survey practitioners to generate EZs tailored to their study needs or to select from pre-made sampling frames consisting of small, medium, or large enumeration zones. Pre-made sampling frames exist for 89 low- and middle-income countries. The algorithm is publicly available at: https://github.com/cadooley/gridEZ. Running the algorithm requires basic knowledge of R and of handling raster datasets.
In addition to the gridEZ algorithm, we have recently developed an algorithm for refining existing gridded sampling frames to meet new criteria, e.g. a reduced maximum population and/or geographic size of EZs. This allows new frames to be derived from existing frames quickly and with low computational requirements.
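As a rough illustration of why refinement can be cheaper than regenerating a frame, the Python sketch below identifies only those zones in an existing frame that violate new, stricter limits, so that any further work can be confined to them. This is an assumption about how such a step might look, not a description of the refinement algorithm itself; the function name, the frame layout (a dictionary of zone summaries) and the example limits are hypothetical.

```python
def zones_needing_refinement(frame, new_max_pop, new_max_cells):
    """Return the IDs of zones that exceed the new limits (hypothetical helper).

    frame         : dict mapping zone_id -> {"population": number, "n_cells": int}
    new_max_pop   : new maximum population per zone
    new_max_cells : new maximum number of grid cells per zone
    """
    return sorted(
        zone_id
        for zone_id, zone in frame.items()
        if zone["population"] > new_max_pop or zone["n_cells"] > new_max_cells
    )


# Toy frame with three existing zones.
existing_frame = {
    "EZ001": {"population": 480, "n_cells": 12},
    "EZ002": {"population": 910, "n_cells": 20},
    "EZ003": {"population": 350, "n_cells": 64},
}

to_refine = zones_needing_refinement(existing_frame, new_max_pop=600, new_max_cells=50)
```

Here EZ002 exceeds the new population limit and EZ003 exceeds the new size limit, so only those two zones would need to be revisited; EZ001 already meets both limits and can be carried over unchanged.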
The gridEZ framework offers a systematic and replicable approach for generating gridded enumeration zones in the absence of detailed geographic information. Its ability to produce compact, contiguous, and administratively consistent zones makes it a valuable tool for survey design, particularly in data-scarce settings. The approach helps practitioners design and implement household surveys efficiently, even in challenging contexts.
As part of the panel on “survey frames – advances and experiences”, we will discuss the pros and cons of the gridEZ approach in different contexts and provide insights on recent and upcoming developments.