Raster-Vector Integration: A Cost Effective Approach to Map Production?
By Janet Lefebvre

Introduction
The Geography Division of Statistics Canada produces a number of map products in support of the Census of Population and the Census of Agriculture. For urban areas, a Computer Assisted Mapping (CAM) system is used to generate maps for a variety of data collection and dissemination activities. For the vast rural areas of Canada, however, manual mapping techniques are used because digital reference or basemap data are not readily available at affordable prices. Maps of high cartographic and aesthetic quality are produced manually, but the production process is slow. Although digital technologies make possible the integration of raster and vector data to produce maps, there are still obstacles that must be overcome before a production mapping system can become operational. This article describes the research and development, undertaken between August 1994 and January 1995, which established a digital methodology for rural field map production. It reports on the technical challenges that were faced, and on the solutions that were found through strategic partnerships with both private and public sector agencies.

Background
The Federal Electoral District/Census Commissioner District/Enumeration Area¹ (FED/CCD/EA) map series is one of a number of map series produced by the Geography Division in support of the Census. The maps in this series are used as planning, management and reference tools in the data collection activity. There are three sub-series in the FED/CCD/EA map series: the overview FED/CCD maps, the FED/CCD/EA maps, and the larger scale CCD/EA maps. After the Census, a FED/EA reference map series is also produced and disseminated. Essentially, all of the sub-series are the same, though boundaries of different vintages and of varying aggregations are shown. Several copies of each map are required, but the maps are not produced in large quantities.
      In the past, this series was produced manually. Mylar base maps, derived from the Federal Electoral District maps produced by Energy, Mines and Resources for the Office of the Chief Electoral Officer (CEO), at scales ranging from 1:50,000 to 1:4,000,000, were used as the base. The FED, CCD and EA boundaries, maintained by the Geography Division on an analog base composed of National Topographic System (NTS) maps, at scales of 1:50,000 and 1:250,000, and manually drafted place maps of small urban centers, at scales ranging from 1:2,000 to 1:20,000, were then transcribed manually onto the base. Transcribing these boundaries to a different base, at a different scale, was slow and costly. The product, however, was well liked by clients.
      In addition to maintaining the rural boundaries on an analog base, the Geography Division also maintains all geostatistical boundaries in digital form. In the rural areas, these boundaries are digitized from the Mylar base maps described above, while in the urban areas they are derived from the Street Network Files (SNFs), digital data files which define the street network for large urban centers and include such features as rivers, railways and municipal boundaries.

The Challenge
Although vector boundary data are available for the FED/CCD/EA maps, there were no digital reference data for the rural areas and, despite the numerous sources of digital data on the market, finding appropriate reference data presents a significant challenge. Issues to be considered include vintage (datedness), resolution (the scale at which the data were compiled or captured; i.e., precision), content (what features are present, level of attribution, etc.), coverage (areal extent of the data), consistency among data sets, and availability. Time and cost constraints can further limit the options.
      The fact that the maps are field tools presents an additional challenge: how to produce, within a tight budget, only a few copies of a large number of unique maps (producing relatively few copies of a relatively large number of maps is much more costly than producing many copies of only a few maps). For the three sub-series, over 1,000 unique maps will be produced and about five copies of each map are required.
      Faced with these challenges and constraints, the Geography Division nevertheless undertook research to develop a digital methodology to produce the FED/CCD/EA map series. The goal was to produce a digital map series to support planning and supervision of field operations for the Census, and subsequently to serve as a spatial reference for clients conducting data analysis. For these purposes, it is important to provide sufficient background detail to enable accurate identification of the enumeration area boundaries, but not so dense a background that boundary interpretation becomes onerous. In most cases, the desired background is similar to the 1:250,000 NTS maps.
      Research for this project began in August 1994. This provided a five month window in which to research, develop and test a new mapping methodology. To succeed in this tight time frame, it was recognized that partnerships with industry and with other government departments would be critical.

Vector Data Search
The first challenge was to assess the suitability of available data sources for the rural base. Although the National Topographic Data Base (NTDB), at 1:250,000, would have provided an appropriate level of detail, the cost to purchase the necessary files would have exceeded $200,000, far too expensive for the project.
      The Digital Chart of the World (DCW) data were also examined as a possible data source. However, prototyping revealed that the feature density of the data, having been compiled at 1:1,000,000, was inadequate and that the data were too generalized to meet our requirements. Furthermore, alignment between our boundaries and the corresponding features from the DCW was unsatisfactory.
      As well, both of the above vector data sets lacked sufficient toponymic information for our purposes. Although toponymic files are available separately, the interactive text placement they would require made such an approach too costly.
      We searched for other data sets, but soon realized that appropriate, off-the-shelf vector data were either not available or not affordable. At this point we began to look at the possibility of integrating raster and vector technologies to produce maps.

Raster Data Sources
Our first attempts to integrate raster and vector data began in September 1994. A 1:50,000 coloured NTS paper mapsheet was scanned at 300 dpi in greyscale, in uncompressed Tagged Image File Format (TIFF), on a Canon BubbleJet copier/scanner. Using the Image Integration functions of ARC/INFO Version 6, the image was rectified to the FED boundaries. The file was converted to PostScript format and plotted on a Synergy electrostatic plotter at the Geological Survey of Canada (GSC). The TIFF file was 200 megabytes and the process of image integration took approximately six hours. Although the results were encouraging, three issues were identified: the resolution did not appear to be adequate, in that the legibility of some of the finer text was poor; the raster-vector integration process was slow; and, since the FED boundaries do not respect NTS mapsheet limits, the NTS bases would require mapjoining. However, the alignment of our vector boundaries to the underlying raster features was very good. The idea of raster-vector integration showed promise.
      One of the challenges in producing this map series is the tremendous range of sizes and shapes of the 295 FEDs. As previously mentioned, if NTS map sheets are used as the FED/CCD base, several mapsheets may have to be joined together to cover one FED. A program was developed, using the GRID module of ARC/INFO Version 6, to handle this and other image processing related tasks. Functionality of the program includes eliminating the "shadow" that forms around the edges of a geo-rectified image after rotation, clipping the image in preparation for appending (edgematching) to adjoining mapsheets, and appending previously clipped or partially overlapping images. Prior to appending images, each image must be georeferenced (registered in ARC); this creates a file containing the parameters that will be used to scale and rotate the image to the map coordinate system. Sections of NTS sheets were successfully appended with this program, though the processing time for two map sheets was about 12 hours. It was decided that mapjoining should be avoided if possible, not only because of the long processing time, but also because of the time required to prepare the inputs, such as selection of map sheets, determination of map extents and scanning of multiple map sheets for each FED. There was a need to reduce data preparation time to a minimum, due to the short time frame available for production.
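      The georeferencing parameters mentioned above can be illustrated with a short sketch. It assumes the common six-coefficient "world file" convention for an affine pixel-to-map transform; the article does not give the actual file layout produced by ARC/INFO, and the origin coordinates and function names below are hypothetical. The 300 dpi and 1:50,000 values are those of the first NTS scanning test.

```python
from typing import NamedTuple, Tuple


class WorldParams(NamedTuple):
    """Six affine coefficients (assumed world-file convention, not from the article)."""
    a: float  # map units per pixel along x (pixel width)
    d: float  # rotation term for y (0 for an unrotated image)
    b: float  # rotation term for x (0 for an unrotated image)
    e: float  # map units per pixel along y (negative: rows run downward)
    c: float  # map x of the centre of the upper-left pixel
    f: float  # map y of the centre of the upper-left pixel


def ground_pixel_size(dpi: float, scale_denominator: float) -> float:
    """Ground distance in metres covered by one scanned pixel."""
    metres_per_inch = 0.0254
    return metres_per_inch / dpi * scale_denominator


def pixel_to_map(p: WorldParams, col: float, row: float) -> Tuple[float, float]:
    """Apply the affine transform to a pixel (column, row) position."""
    x = p.a * col + p.b * row + p.c
    y = p.d * col + p.e * row + p.f
    return x, y


# A 1:50,000 sheet scanned at 300 dpi; the origin below is a made-up example.
size = ground_pixel_size(300, 50_000)
params = WorldParams(a=size, d=0.0, b=0.0, e=-size, c=500_000.0, f=5_000_000.0)
print(pixel_to_map(params, 0, 0))
print(pixel_to_map(params, 1000, 2000))
```

      Under these assumptions each pixel covers a little over four metres on the ground. Non-zero rotation terms would express the rotation that produces the "shadow" around a rectified image.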
      The Federal Electoral District (CEO) maps were the next possible input source that we examined. These maps, produced for each FED, range in scale from 1:50,000 to 1:4,000,000, though most are based on the NTS 1:250,000 map series. The sheets range in size from about 60 by 50 cm (24 by 20 inches) to a maximum of 162 by 112 cm (65 by 45 inches). The greatest advantage to using these maps is that there is one published map for each FED, so there is no need for mapjoining. Only four maps are too large to be scanned in one pass, based on the size of the largest format scanner commonly available. Furthermore, the maps are attractive and well liked by the client. These factors made them an appropriate base for the series.

Scanning Resolution, Data Integration, Data Output
Having selected the base, the next steps were to: 1) find a suitable scanning resolution; 2) test the integration of the raster base to our vector boundaries; and 3) locate an output device. Our search for the optimal balance between scanning resolution and file size included evaluation of several different scanners at varying resolutions.
      The first test was on a black and white Skantek SK1000 scanner. A Federal Electoral District (CEO) map was scanned at 300, 400 and 500 dpi using a Run-Length Compressed (RLC) format. In every case, the text and black line work were clear. However, the shaded areas on the original map (10 to 20 percent dot screens for rivers, lakes and municipal boundaries) appeared black or blotchy on the scanned image. The scanner was adjusted to a lighter setting in an effort to compensate for this; although the overall effect was lighter, it was still not possible to identify text labels on the shaded areas. The file sizes for a black and white scan ranged from 1 megabyte to 4 megabytes depending on the size of the original map and the scanning resolution. The cost was approximately $5 per running foot (at 300 dpi), for an average of about $15 per mapsheet. Because 1-bit scanning technology is designed for linework, good results on screened (shaded) areas are difficult to obtain; however, we did achieve entirely acceptable results with a 1-bit scanner later in our research.
      In our second test, an 8-bit greyscale Contex MultiScan 5000 scanner was tested at 500 dpi. A large format map measuring 138 by 70 cm (55 by 28 inches) was chosen in order to assess image resolution, potential file size and scanning time. The map took over six hours to scan and the resulting file, in TIFF uncompressed format, was approximately 350 megabytes. The cost of scanning one mapsheet would be approximately $200. The high cost, slow scanning time and large file size made this option unattractive. Rectifying this raster data to our vector boundaries took 27 hours (including 24 hours CPU usage) and required over a gigabyte of disk space. The output PostScript file, at 400 megabytes, was plotted on the Synergy electrostatic plotter at the Geological Survey of Canada, where it took five hours to process and about half an hour to plot. The boundary rectification worked well, but the image was very dark, resulting in an unsatisfactory product (although the contrast could likely have been corrected initially when scanned). The cost to scan 143 rural FED maps could have exceeded $20,000, and it would have taken up to five months to rectify these images to our boundaries using our existing methods and technology. Given a five month production schedule for the first series, it was clear that this approach was not feasible.
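      The figures in this test can be checked with simple arithmetic. The sketch below estimates the uncompressed size of the 138 by 70 cm greyscale scan and projects the per-map cost and rectification time to all 143 rural FED maps. It assumes that every map would cost and take as much as this large test sheet and that rectification would run on a single workstation, one map after another; neither assumption is stated in the text, and the function names are illustrative only.

```python
CM_PER_INCH = 2.54


def uncompressed_bytes(width_cm, height_cm, dpi, bits_per_pixel):
    """Approximate size of an uncompressed scan, ignoring file headers."""
    cols = round(width_cm / CM_PER_INCH * dpi)
    rows = round(height_cm / CM_PER_INCH * dpi)
    return cols * rows * bits_per_pixel / 8


def project_batch(n_maps, cost_per_map, hours_per_map):
    """Total cost and total sequential processing hours for a batch of maps."""
    return n_maps * cost_per_map, n_maps * hours_per_map


# 138 by 70 cm map, 500 dpi, 8-bit greyscale (values from the second test).
size_mb = uncompressed_bytes(138, 70, 500, 8) / 1e6

# 143 rural FED maps at about $200 and 27 hours each (worst-case assumption).
cost, hours = project_batch(143, 200, 27)

print(f"estimated scan size: {size_mb:.0f} MB")
print(f"scanning cost: ${cost:,}; rectification: {hours} h, or {hours / 24:.0f} days")
```

      The size estimate comes out at roughly 370 megabytes, the same order as the approximately 350 megabytes reported. The projection gives $28,600 for scanning and 3,861 hours of rectification, or about 161 days of continuous processing, which is consistent with the five-month estimate.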
      Discouraged by the results of the scanning tests, but still confident that success was possible, we scanned a smaller format map, measuring 57 by 62 cm (approximately 22 by 24 inches), on a Canon BubbleJet. Using a resolution of 300 dpi, we produced an 8-bit greyscale image. The resulting uncompressed TIFF file was 41 megabytes. Rectification of the image to our boundaries took about five hours. We also scanned a portion of a Mylar NTS base to produce an inset. This inset, measuring 12.5 by 12.5 cm (5 by 5 inches), resulted in a file size of 30 megabytes. For this exercise, basic cartographic elements such as title, scale bar and legend were added using ARCPLOT. The final PostScript file, also 30 megabytes, was plotted on the Canon BubbleJet, with highly satisfactory results.

Strategic Partnerships
Having made a satisfactory prototype map, the next challenge was to establish an efficient production process. To meet this challenge, a strategic partnership was initiated among Les Services CartoGraphiques 2 + 1 Inc., a local company which has developed a sophisticated cartographic editing system called ACE (Automated Cartographic Editing™), the Geological Survey of Canada, and the Geography Division of Statistics Canada.

Scanning Issues Resolved
The first task undertaken by Les Services CartoGraphiques 2 + 1 Inc. was to refine the scanning process in order to optimize the balance between image clarity and file size and to find a scanning device to accommodate large documents. Despite the good results with 8-bit greyscale scanning on the BubbleJet, it was only an option for the smaller format maps. Satisfactory results were finally achieved using a 1-bit scan, at 400 dpi, on an SG906 drum scanner. Using this scanner, the screened (shaded) areas of the original map appeared as a grey pattern in the image file so that the rivers, lakes and boundaries, with labels, were easy to discern. The uncompressed TIFF file, overlaid with FED boundaries, was plotted at NRCan on the Iris plotter. The result was a visually pleasing map. At $200 per hour to process an Iris proof, however, this would not be an option for production. Les Services CartoGraphiques eventually opted for a resolution of 800 dpi to improve ease of interpretation during onscreen cartographic editing. The average file size for this type of scan was 2.9 megabytes per map, compressed. Uncompressed, the average file size was 30 megabytes. The average time to scan one map was 25 minutes.
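      The averages quoted above imply a compression ratio and a total scanning effort that are easy to work out. The sketch below does so, assuming that the 143 rural FED maps counted in the earlier greyscale test are the maps to be scanned and that the reported averages hold for every map; the function names are illustrative only.

```python
def compression_ratio(uncompressed_mb, compressed_mb):
    """How many times smaller the compressed file is."""
    return uncompressed_mb / compressed_mb


def batch_totals(n_maps, compressed_mb, uncompressed_mb, minutes_per_scan):
    """Total storage (both forms) and total scanning time for a batch."""
    return {
        "compressed_mb": n_maps * compressed_mb,
        "uncompressed_mb": n_maps * uncompressed_mb,
        "scan_hours": n_maps * minutes_per_scan / 60,
    }


# Averages reported for the 1-bit, 800 dpi scans.
ratio = compression_ratio(30, 2.9)

# Assumed batch: the 143 rural FED maps mentioned in the greyscale test.
totals = batch_totals(143, 2.9, 30, 25)

print(f"compression ratio: about {ratio:.1f} to 1")
print(f"compressed storage: {totals['compressed_mb']:.0f} MB")
print(f"uncompressed storage: {totals['uncompressed_mb']:.0f} MB")
print(f"scanning time: {totals['scan_hours']:.1f} hours")
```

      On these assumptions the compressed scans are about ten times smaller than the uncompressed ones, and scanning the whole set would take about 60 hours, compared with more than six hours for a single sheet in the 500 dpi greyscale test.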

Plotting Issues Resolved
Both the Products and Services Division of Geomatics Canada and the Geoscience Information Division of the Geological Survey of Canada provided valuable assistance during our research into plotting technology. In addition to producing the Iris proof from the PostScript file provided by Les Services CartoGraphiques, Geomatics Canada also used the Scitex system to process a number of the files we had created using the ARC/INFO image integration functions described earlier. The large size of these files and the length of time required for processing, however, made using the Scitex system untenable for low volume plotting. At GSC, extensive testing was carried out on both the Calcomp and Synergy plotters. The Synergy plotter, a high-end one-pass colour electrostatic plotter, provided quality plots at 400 dpi; the Calcomp plotter was also tested in case a contingency plotter would be required to handle our needs (an anticipated 600 plots in the first three-month period). The BubbleJet could potentially be used for plots not exceeding 57 by 62 cm (approximately 22 by 24 inches).

Conclusion
By working in partnership with other government agencies and private industry, we developed an effective production process in a very short period. Geography Division provides the vector boundaries and image files for the insets to Les Services CartoGraphiques. There, a customized cartographic mapping system built upon the ACE software is used to graphically overlay our boundaries onto the raster base and to create the cartographic output. The resulting PostScript files are then sent to the Geological Survey of Canada for plotting on the Synergy electrostatic plotter.
      It should be remembered, however, that our solution is an overlay approach rather than true integration, the main limitation of which is that cartographic interpretation and manipulation (aligning the vectors to the raster background) is required for each map series. If the raster data had been georeferenced, the subsequent series could have been produced by, essentially, running the mapping system with a new set of boundaries. Although during preliminary research we sought to integrate the raster and vector data structures in a geographic information system (GIS), this did not turn out to be a viable option due to large disk space requirements, lengthy processing times and limited production time.
      A practical solution was found, however, by capitalizing on the expertise of private sector companies and other government departments. The success of this project attests to the state of the technology today and to the power of interagency cooperation. Finally, we are confident that the highly effective maps produced for this series will contribute appreciably to the 1996 Census.

Acknowledgements
The following people contributed invaluable technical assistance during this research and development: Peter Rushforth of Geography Division, Statistics Canada; Robert Burns of Geological Survey of Canada, NRCan; Assefa Yewondwossen and Normand Savard of Les Services CartoGraphiques 2 + 1 Inc.

1 A Federal Electoral District (FED) is any place or territory entitled to return a member to serve in the House of Commons, commonly referred to as a federal riding. A Census Commissioner District (CCD) is an administrative grouping of enumeration areas within a FED; generally, there are between five and eight CCDs in a FED. An enumeration area (EA) is a geographic area canvassed by one census representative and is the smallest geographic area for which census data are available.

About the Author:
Janet Lefebvre is a cartographer and product manager with the Geography Division at Statistics Canada, Ottawa, Ontario, Canada. She may be reached by telephone: 613-951-4987; fax: 613-951-0569; or via e-mail: [email protected]
