Earth Observation Magazine

Collecting Control Data for Remote Sensing Applications in the Frontier Environment of the Ecuadorian Amazon

Brian G. Frizzelle,
Stephen J. Walsh,
Christine M. Erlien, and
Carlos F. Mena

The fusion of GPS technology, remote sensing methods, and social survey practices generated sufficient control data for image processing and analysis in a remote and inhospitable environment.

In many parts of the world, the environment poses significant challenges to the successful application of remote sensing technology and value-added image processing for landscape characterization. Frontier environments, such as the Northern Oriente region of Ecuador, pose considerable problems because access into and throughout the region is limited and the biophysical landscape is complex and constantly changing. These limitations constrain the traditional practice of collecting numerous widely distributed training samples per mapping class for land use and land cover (LULC) classifications and for their corresponding accuracy assessments. Scientists working in frontier environments must overcome these in situ difficulties before addressing the challenges inherent in the data processing and analysis schemes.
Our study area lies in the Amazonian lowlands of northeastern Ecuador (the Northern Oriente), east of the Andean mountains, in the provinces of Sucumbios and Orellana (Figure 1). Natural vegetation consists of vast expanses of tropical rainforest with a complex structure and the highest biodiversity in the entire Amazon Basin. The region also exhibits substantial zones of secondary forest succession, particularly along rivers altered by natural forces; land conversion by localized indigenous communities; and broad areas of LULC change by recently arriving colonists. Within this forested landscape is a large contiguous agricultural zone composed of farms containing both tree and non-tree crops, as well as areas of secondary succession at a variety of stages associated with deforestation, agricultural extensification, and community settlement patterns. The regional infrastructure consists of a growing network of mostly unpaved roads, dozens of small towns providing local services to nearby farmers, and a handful of larger towns that serve as the centers of commerce and trade within and outside the region. Small indigenous villages lie within and adjacent to national parks and reserves, primarily along rivers, but with some small agricultural plots scattered in the forest.
As part of our ongoing NASA-funded research within the region, we have assembled a deep time-series of satellite imagery ranging from 1973 through 2002. Our earliest images (1973-1986) are from the Landsat 1 and Landsat 4 Multispectral Scanner (MSS) sensors. The later images (1986-2002) are from the Landsat 5 Thematic Mapper (TM) and the Landsat 7 Enhanced Thematic Mapper Plus (ETM+) sensors. The images have been collected to identify the regional LULC at each image date and the patterns of LULC change throughout the 30-year period. To accomplish these tasks, two steps must be taken: register the images to a real-world coordinate system, in this case the Universal Transverse Mercator (UTM) system; and classify the images into discrete LULC classes for change detection and models of change. Both steps require “control” data collected both remotely and in situ. In this study, geodetic control was collected to aid in image registration, and compositional control was collected as reference data for LULC classifications.

Image Registration
The horizontal accuracy requirement for the image rectification was set at 15 meters, or half the spatial resolution of a Landsat TM pixel. Rectification, a form of georeferencing, requires the use of control reference data. Pre-existing control data, such as topographic maps, surveyed locations, and geodetic horizontal/vertical control markers, are commonly used for rectification, but our study areas lack them in sufficient quantity or quality. Topographic maps exist at a 1:50,000 scale, but the other types are not present, and the maps’ horizontal error of at least 42.3 meters, based on the USGS National Map Accuracy Standards that were used to create them, is too large to satisfy our accuracy requirements. Therefore, we created our own network of geodetic control points (GCPs) using Global Positioning System (GPS) receivers. The data were post-processed using differential correction for the highest possible accuracy.
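The accuracy figures in this paragraph can be checked with simple arithmetic. The sketch below assumes a 30-meter TM pixel and assumes that the 42.3-meter map error corresponds to a plotting tolerance of 1/30 inch at map scale. That tolerance is inferred from the number itself and is not stated in the article, and the function names are illustrative only.

```python
INCH_TO_M = 0.0254  # exact length of one inch in meters


def half_pixel_requirement(pixel_size_m):
    """Horizontal accuracy target set at half of one pixel."""
    return pixel_size_m / 2.0


def map_ground_error_m(scale_denominator, tolerance_in):
    """Ground distance represented by a plotting tolerance on a paper map."""
    return tolerance_in * scale_denominator * INCH_TO_M


requirement = half_pixel_requirement(30.0)          # Landsat TM pixel size
map_error = map_ground_error_m(50_000, 1.0 / 30.0)  # assumed tolerance (see above)

print(f"requirement: {requirement:.1f} m")
print(f"map error:   {map_error:.1f} m")
print("maps usable as control:", map_error <= requirement)
```

Under these assumptions the map error is nearly three times the half-pixel target, which is why the maps could not serve as control.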
For the best possible rectification, the GCPs should be evenly distributed over as much of the image as possible and collected at large, static locations that are visible on the imagery (e.g., 30 x 30 meters in size for Landsat TM). In a frontier region such as the Ecuadorian Amazon, road intersections and bridges are the main targets for GCPs. The GCP distribution was therefore constrained by the road network, which is dense in the center of the image and much sparser toward the edges (Figure 2). To add to the difficulty, many of the roads are of poor quality and not easily traversed, making access to the more remote regions of the image difficult if not impossible.

Land Cover Assessment
Most methods for converting a multispectral satellite image into discrete LULC classes require some knowledge of the location and composition of each land cover type of interest so that classes can be properly labeled, training areas defined, and the classification assessed for accuracy. LULC compositional control data can be used to attribute statistical output clusters from an unsupervised classification, or can be used as training data sets to generate spectral signatures for a supervised classifier. LULC samples are often collected in the field by finding locations that are representative of the classes of interest and collecting a GPS point at that site. The GPS points are then used to identify one or more pixels on the image for attribution/training purposes. As with geodetic control, it is a good practice to collect compositional control in a widely distributed manner to minimize small-area effects such as atmospheric haze and other local anomalous conditions linked to environment site and situation factors.
In our study, the sparse road network and poor quality roads affected LULC data collection just as they impacted GCP collection. The classification scheme contains 18 classes, with a mixture of natural vegetation, agriculture, water, and urban areas. All classes, other than those easily identifiable in the imagery (e.g., primary forest, urban, and water), were located in the field, with the constraint that the field site be at least 3,600 m², or four TM pixels, in size. The four-pixel size requirement was set to control for possible errors in the horizontal accuracy of the imagery and the GPS data. The sites also needed to contain a “pure” class to minimize the within-class spectral variance and to maximize the between-class spectral variance.
Given the constraints imposed on data collection, multiple field trips failed to produce a sufficient quantity and distribution of training data for all classes, leading to the collection of additional sources of compositional control. The final set of compositional control spanned only the period of 1999 to 2002, requiring the creation of a method for generating a detailed classification for image dates prior to 1999. This method is briefly discussed below.

Collecting GCPs
When collecting GCPs for image rectification, the ideal set for a rectangular image consists of nine points: one in the center, four at the corners, and four on the edges midway between the corners. If necessary, four more points could be added at the center of each quadrant, with additional points added evenly throughout the image until the rectification requirements are satisfied. Figure 3 shows the nine ideal points as blue diamonds, with the four additional points displayed as red triangles. These are ideal placements, but rarely realistic in a frontier environment.
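The ideal layout described above is easy to generate for any rectangular image. The following is a minimal sketch; the function name and the 6,000 x 6,000 pixel image size are hypothetical and chosen only for illustration.

```python
def ideal_gcp_layout(width, height):
    """Return (first_nine, extra_four) ideal GCP positions for a rectangular image.

    Coordinates are (column, row) with the origin at the upper-left corner.
    """
    xs = (0.0, width / 2.0, width)
    ys = (0.0, height / 2.0, height)
    # Center, four corners, and four edge midpoints together form a 3 x 3 grid.
    first_nine = [(x, y) for y in ys for x in xs]
    # Optional additions: the center of each quadrant.
    qx = (width / 4.0, 3.0 * width / 4.0)
    qy = (height / 4.0, 3.0 * height / 4.0)
    extra_four = [(x, y) for y in qy for x in qx]
    return first_nine, extra_four


nine, four = ideal_gcp_layout(6000, 6000)  # hypothetical image size in pixels
for point in nine + four:
    print(point)
```

Comparing collected GCP positions against such a template makes it clear which ideal locations have no nearby field point.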
The GCPs in the study area were collected using mapping-grade Trimble GeoExplorer II and GeoExplorer 3 GPS receivers. All points were differentially corrected through post-processing, using base station files from Quito. The resultant accuracy of each corrected/averaged point was better than 15 meters, within the accuracy requirements for the project.
GCP collection proceeded at intersections of primary roads and at bridges along the more heavily traveled roads, bypassing the poorer quality roads that service the periphery of the region. This resulted in a high density of GCPs in the center of the image, necessitating further excursions to the peripheral regions. Although roads along the periphery are fewer and more degraded, more points were added away from the center, but the cluster in the center remained.

GCPs and Image Rectification
The central GCP was well placed. However, only one of the eight boundary and edge “ideal” GCPs was within close proximity to any collected GCPs (Figure 3). Therefore, those GCPs that were located nearest to the edges and boundaries were selected for the rectification along with a sample of GCPs that (1) were evenly distributed throughout the image, and (2) resulted in the lowest possible root mean squared error (RMSE). A total of 15 GCPs were used to apply a second-order polynomial absolute rectification to the November 1999 image. This resulted in an RMSE of 0.3263 pixels, or 9.8 meters, which was well within the horizontal accuracy requirement of 1/2 pixel, or 15 meters.
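A second-order polynomial fit and its RMSE can be sketched in a few lines. The example below is not the software used in the study; it is a minimal least-squares illustration using the normal equations. The GCP coordinates, the polynomial that generates them, and the noise are all made up, and map positions are expressed as kilometer offsets from a local origin (an assumption made to keep the arithmetic well conditioned).

```python
import math


def terms(x, y):
    """Terms of a second-order polynomial in two variables."""
    return [1.0, x, y, x * x, x * y, y * y]


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_axis(src, values):
    """Least-squares polynomial coefficients for one output axis (normal equations)."""
    rows = [terms(x, y) for x, y in src]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(6)] for i in range(6)]
    atb = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(6)]
    return solve(ata, atb)


def predict(coef, x, y):
    """Evaluate the fitted polynomial at one map position."""
    return sum(c * t for c, t in zip(coef, terms(x, y)))


def rmse(src, dst, coef_col, coef_row):
    """Root mean squared distance between predicted and observed pixel positions."""
    total = 0.0
    for (x, y), (col, row) in zip(src, dst):
        total += (predict(coef_col, x, y) - col) ** 2
        total += (predict(coef_row, x, y) - row) ** 2
    return math.sqrt(total / len(src))


# Hypothetical data: 15 GCPs on a grid of map offsets (km from a local origin),
# with pixel positions from a made-up polynomial plus +/-0.2 pixel column noise.
src = [(float(i), float(j)) for i in range(4) for j in range(4)][:15]
dst = [
    (10 + 33.0 * x + 0.5 * y + 0.02 * x * y + 0.2 * (-1) ** k,
     20 - 0.4 * x - 33.0 * y + 0.01 * y * y)
    for k, (x, y) in enumerate(src)
]

coef_col = fit_axis(src, [d[0] for d in dst])
coef_row = fit_axis(src, [d[1] for d in dst])
error_pixels = rmse(src, dst, coef_col, coef_row)
PIXEL_SIZE_M = 30.0  # Landsat TM pixel size
print(f"Example RMSE: {error_pixels:.4f} pixels = {error_pixels * PIXEL_SIZE_M:.1f} m")
print(f"Reported RMSE: 0.3263 pixels = {0.3263 * PIXEL_SIZE_M:.1f} m")
```

A second-order polynomial has six coefficients per axis, so at least six GCPs are needed; using 15 leaves redundancy, which is what makes the RMSE a meaningful check on the fit.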
The other time-series images were rectified to the November 1999 image using the relative rectification method. This approach was used to obtain the highest level of co-registration throughout the time-series. Accurate co-registration was necessary for successful implementation of our change-detection methodologies.

Multiple Data Sources
Compositional control was collected to build a training data set for classifying the Landsat time-series. Five types of compositional control were collected: (1) land use information from a regional household survey questionnaire, (2) sketch maps created during the household survey, (3) a convenience sample of GPS points of selected LULC sites throughout the region, (4) IKONOS satellite imagery, and (5) a detailed GPS-based LULC survey of a sample of farms. Compositional control data were used with the Landsat image dates that most closely corresponded with the dates of data collection, and a database of LULC points and polygons was created.
The land use survey questions were pulled from a 1999 household socio-economic/demographic survey, in which the head of household was asked questions regarding current and past land use on the farm. Responses were used to inform analysts as to the type of LULC on particular farms on a November 1999 Landsat TM image. During the interviews, the head of household helped create a sketch map of LULC for the farm. The sketch maps were not surveyed or drawn to scale, but do contain the relative spatial composition of LULC parcels (Figure 4). The general shape of each parcel was hand-drawn, and information on the parcel’s LULC and size in hectares was recorded. Analysts used the sketch maps to identify large plots of LULC on the farms and delineate the plots on the November 1999 TM image, using farm boundary polygons as reference locations.
Selected LULC sites were collected with Trimble GeoExplorer II and GeoExplorer 3 GPS receivers during various field trips from 1999 to 2002. In most cases, the sites consist of one point representing a patch of land cover, although some sites were collected as polygons by walking the perimeter of the plot. All point and polygon data were differentially corrected and grouped into GIS data sets with information such as LULC class and size of plot.
IKONOS satellite imagery was acquired for selected sites from 2000 to 2002. IKONOS’s 1-meter panchromatic and 4-meter multispectral imagery allows for easy recognition of some LULC types, based on expert knowledge and spatial pattern recognition (Figure 5). Georeferenced images were overlaid on the corresponding Landsat image dates, allowing the analyst to visually interpret the land cover on the IKONOS image and delineate polygons on the co-registered Landsat image.
In November 2001, a GPS-based LULC survey was conducted on a sample of farms from the 1999 survey. On each farm, the researcher and farmer walked the 50 hectares of land with Trimble GPS receivers and collected points to delineate the shape of each individual parcel. The points were later connected to form polygons representing specific LULC parcels, and were attributed with the current LULC, the LULC 2 years prior, and the expected LULC 5 years in the future (Figure 6). These farm-wide detailed LULC surveys provided additional temporal LULC information for the classification of Landsat data.
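Once GPS points are connected into a parcel polygon, the parcel area follows from the shoelace formula, provided the coordinates are planar (for example, UTM meters). The sketch below uses a hypothetical four-point parcel; the function names are illustrative, and the 3,600 m² threshold is the minimum field site size given earlier in the article.

```python
def polygon_area_m2(vertices):
    """Planar area of a simple polygon from ordered (easting, northing) vertices in meters."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def to_hectares(area_m2):
    """Convert square meters to hectares."""
    return area_m2 / 10_000.0


MIN_SITE_AREA_M2 = 3600.0  # four 30 x 30 m TM pixels

# Hypothetical parcel walked as four GPS points (UTM meters, local offsets).
parcel = [(0.0, 0.0), (200.0, 0.0), (200.0, 150.0), (0.0, 150.0)]
area = polygon_area_m2(parcel)
print(f"parcel area: {to_hectares(area):.1f} ha")
print("large enough for training:", area >= MIN_SITE_AREA_M2)
```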

Land Cover Classification
The main concern for accurately classifying the Landsat time-series was the lack of compositional control prior to 1999. A methodology to resolve this issue was developed that included the normalization of the time-series and the generation of a multi-temporal spectral signature data set.
The image normalization was performed with a 5S Top of Atmosphere (ToA) reflectance and atmospheric correction model (Teillet et al. 1997), which was applied to each image in the time-series. An ARTMAP neural network model classified all clouds and cloud shadows, which were subsequently masked from the images. Clouds are a particularly severe problem in the Ecuadorian Amazon. Of the 407 Landsat 5 overpasses of the area between March 1, 1983 and December 31, 2000, only 7 scenes (1.72%) were considered sufficiently cloud-free to suit our purposes, and those scenes still contain clouds to various extents.
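The overpass count and the cloud-free share can be reproduced with date arithmetic. The sketch below assumes the count reflects one overpass per 16-day repeat cycle across the stated interval; the article does not say how the 407 figure was derived, so this is only a consistency check.

```python
from datetime import date

REPEAT_CYCLE_DAYS = 16  # Landsat 5 revisit interval

start = date(1983, 3, 1)
end = date(2000, 12, 31)

span_days = (end - start).days
overpasses = span_days // REPEAT_CYCLE_DAYS  # assumed: one pass per full cycle
usable_scenes = 7
share = 100.0 * usable_scenes / overpasses

print(f"{span_days} days, about {overpasses} overpasses")
print(f"usable scenes: {usable_scenes} ({share:.2f}%)")
```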
The compositional control was used in conjunction with the appropriate corresponding image dates to create a multi-temporal spectral signature data set. Signatures for each class were extracted using each of the five compositional control types. The signatures were then merged into one multi-temporal data set, which was then applied to each image in the time-series using a standard maximum likelihood classification algorithm. The classifications were further processed using several different change-detection methods to better grasp the dynamics of LULC change over time throughout the region. The classifications and change-detections required geodetic and compositional control to succeed.
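The merge-then-classify idea can be illustrated with a small Gaussian maximum likelihood classifier. This is a simplified sketch, not the study's implementation: it uses only two bands, assumes equal prior probabilities, pools training pixels by class as one simple way to merge signatures, and all reflectance values, class names, and source names are hypothetical.

```python
import math


def signature(samples):
    """Mean vector and 2 x 2 covariance terms of two-band training pixels."""
    n = len(samples)
    m1 = sum(s[0] for s in samples) / n
    m2 = sum(s[1] for s in samples) / n
    c11 = sum((s[0] - m1) ** 2 for s in samples) / (n - 1)
    c22 = sum((s[1] - m2) ** 2 for s in samples) / (n - 1)
    c12 = sum((s[0] - m1) * (s[1] - m2) for s in samples) / (n - 1)
    return (m1, m2), (c11, c12, c22)


def log_likelihood(pixel, sig):
    """Gaussian log-likelihood, dropping the constant shared by all classes."""
    (m1, m2), (c11, c12, c22) = sig
    det = c11 * c22 - c12 * c12
    d1, d2 = pixel[0] - m1, pixel[1] - m2
    # Mahalanobis distance using the closed-form inverse of a 2 x 2 matrix.
    maha = (c22 * d1 * d1 - 2.0 * c12 * d1 * d2 + c11 * d2 * d2) / det
    return -0.5 * (math.log(det) + maha)


def merge_sources(*sources):
    """Pool training pixels per class across several control sources."""
    merged = {}
    for source in sources:
        for label, pixels in source.items():
            merged.setdefault(label, []).extend(pixels)
    return merged


def classify(pixel, signatures):
    """Assign the class whose signature gives the highest likelihood."""
    return max(signatures, key=lambda label: log_likelihood(pixel, signatures[label]))


# Hypothetical (red, near-infrared) reflectance samples from two control sources.
gps_points = {
    "forest": [(0.030, 0.40), (0.040, 0.45), (0.035, 0.42)],
    "pasture": [(0.080, 0.30), (0.090, 0.33)],
}
sketch_maps = {
    "forest": [(0.045, 0.43), (0.030, 0.44)],
    "pasture": [(0.100, 0.31), (0.085, 0.28), (0.095, 0.34)],
}

training = merge_sources(gps_points, sketch_maps)
signatures = {label: signature(pixels) for label, pixels in training.items()}

for pixel in [(0.035, 0.43), (0.090, 0.31)]:
    print(pixel, "->", classify(pixel, signatures))
```

Because the images were normalized to reflectance first, signatures drawn from one date can in principle be applied to another, which is what allows a merged signature set to classify image dates that have no control data of their own.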

Summary
In the frontier of the Ecuadorian Amazon, isolation, inaccessibility, and a general lack of discernible landscape features make the task of image rectification and classification difficult. Geodetic and spectral control are fundamental for LULC characterization. In this setting, obtaining such data was a considerable challenge. Developing site-specific geodetic control was critical to the success of the project. GPS technology was basic to this effort, but the areal distribution and the number of control points were still constrained by road access. To augment the GPS work, alternative compositional control was acquired through field sketch maps, a longitudinal social survey, and high spatial resolution IKONOS satellite data that were acquired over targeted features. As a result, we were able to classify images from dates earlier than any of our compositional control data at a level of LULC detail otherwise unattainable. The fusion of GPS technology, remote sensing methods, and social survey practices generated sufficient control data for image processing and analysis in a remote and inhospitable environment.

Acknowledgements
This work is based on a project (R.E. Bilsborrow and S.J. Walsh, Co-PIs) of the Carolina Population Center and the Departments of Geography and Biostatistics at the University of North Carolina, Chapel Hill. Ecuadorian collaborators include individuals from EcoCiencia, a leading non-profit ecological research organization in Quito, Ecuador, and Cepar, a leading survey processing center in Quito. Funding for this research was provided by NASA (grant NCC5-295) and the Mellon Foundation.

About the Authors
Brian G. Frizzelle is a Senior Spatial Analyst, Stephen J. Walsh is a Professor in the Department of Geography and Research Fellow, Christine M. Erlien is a PhD candidate in the Department of Geography and a pre-doctoral trainee, and Carlos F. Mena is a PhD candidate in the Department of Geography and a pre-doctoral trainee at the Carolina Population Center, University of North Carolina at Chapel Hill.

References
Teillet, P.M., Staenz, K., and Williams, D.J. 1997. Effects of spectral, spatial, and radiometric characteristics on remote sensing vegetation indices of forested regions. Remote Sensing of Environment, 61(1): 139-149.
