Earth Observation Magazine

June 1995

Database Management and GIS: What's the Connection?
By Damon Judd

A GIS is a DBMS
As many of you frequent readers may know, a geographic information system (GIS) is much more than just a software package for producing pretty maps. A GIS can be described as a computerized system for the collection, storage, manipulation, and output of spatially-referenced information.
      So what do database management systems (DBMSs) have to do with the application of GIS? A database is defined as "a collection of data logically organized and centrally controlled to satisfy the information and time requirements of a universe of users." A DBMS, then, is a computerized system (i.e., hardware, software, and some sort of procedural language) that provides the tools to organize and control a collection of data.
      The key concept here is that a GIS is actually an extended form of a DBMS based on spatial or locational descriptors. A GIS represents data graphically and provides additional tools for analyzing spatially-referenced data. A GIS is sometimes called a spatial database management system.
      The data that go into a GIS define its utility for a particular application. For example, a GIS intended for maintaining a timber inventory for the U.S. Forest Service is of little use if it can only create colored graphics of forested areas. If the colors do not correspond directly to species, tree density, crown cover, diameter at breast height (DBH), or other descriptive attributes (data fields in a DBMS), they have no real meaning.
      A GIS or DBMS is used to translate data into usable information that can be processed through cognitive reasoning into decision criteria. Therefore, the decision is only as good as the raw data that were fed into the database to begin with.

Good Data Management
For this reason, a strong emphasis on good data management practices is critical to an effective GIS implementation, and a thorough understanding of database concepts is crucial to the decision process. To illustrate this point, assume for a moment that you are an environmental consultant who has been hired to identify and locate contamination from a gasoline tank farm that is adjacent to a school in your city. Typically, you would take samples from the soil and groundwater in the immediate vicinity of the tank farm and on the land surrounding it. You would then send the samples to a laboratory for analysis of hydrocarbon compounds. After the lab provides the analytical results, you would synthesize, evaluate, and present your findings to your client, who happens to be an attorney working for the oil company that owns the property.
      For purposes of illustrating the importance of good data management, let's assume that you were trying to save your client money and used a spreadsheet to record the analysis results that came back from the lab. You spent several days entering the data into your spreadsheet and checked the numbers three times to make sure they were all correct. You then decide to produce a map using your newest version of Excellent CAD for Windows, which you purchased for the exceedingly low price of $49.95.
      But first, you decide to sort the spreadsheet results by contaminant concentration so that all the high values are at the top of the spreadsheet. However, you forget to include in the sort the columns containing the X,Y coordinates for the sample locations, which were provided to you on a disk by Surreal Surveyors Inc., much to your satisfaction. You then use Excellent CAD's capability to produce a high-quality graphical illustration of your sampling results.
      You produce and bind 25 copies of your final report and proudly present it to your client two days before the deadline for submittal, and $500 under budget. He goes to court, presents your findings, and gets fined $10,000 for uncontrolled contamination because the map you produced shows high concentrations in samples that should have been clean dirt.
      You get a nasty phone call from the attorney demanding to know how you could have produced such erroneous results. You again study the numbers in your spreadsheet, check them twice more against the original reports that the lab delivered, and against the surveyor's spreadsheet. The numbers all match. What could have gone wrong?
      The answer is that the sort reordered the concentration values but left the X,Y coordinates in their original order, so each value was plotted at the wrong sample location. Of course, a lot more could have gone wrong even in this simple example. As projects get larger, more data are collected for ongoing analyses, and multiple end users need reporting and querying support for a variety of purposes, the need for proper data management methods grows rapidly.
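      The spreadsheet mishap in this scenario can be reproduced in a few lines. The following Python sketch uses invented sample IDs, coordinates, and concentrations (none of them come from a real project), and the function names are purely illustrative. It contrasts sorting the results while leaving the coordinate columns behind with sorting whole records.

```python
# Hypothetical sample records: (sample_id, x, y, concentration).
samples = [
    ("S1", 100.0, 200.0, 5.0),
    ("S2", 110.0, 200.0, 950.0),
    ("S3", 120.0, 200.0, 40.0),
]

def sort_without_coordinates(rows):
    """Reproduce the mistake: sort IDs and concentrations together,
    but leave the X,Y columns in their original row order."""
    coords = [(x, y) for _, x, y, _ in rows]
    data = sorted(
        ((sid, conc) for sid, _, _, conc in rows),
        key=lambda d: d[1],
        reverse=True,
    )
    return [(sid, x, y, conc) for (sid, conc), (x, y) in zip(data, coords)]

def sort_whole_records(rows):
    """Sort complete records so each value stays with its location."""
    return sorted(rows, key=lambda r: r[3], reverse=True)

broken = sort_without_coordinates(samples)
correct = sort_whole_records(samples)
```

In `broken`, every individual number still matches the lab report and the survey file, which is why rechecking the numbers finds nothing wrong; only the pairing of values with locations has changed.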
      What does this have to do with GIS? Using a GIS to store and manage data such as those described in the scenario above makes data management more efficient. A GIS provides the graphical interface needed to view the data directly after they are entered into a database, as long as the key connection is made. That connection is typically based on a unique data column called a key field or index column. A key field (such as sample-id) is stored in the GIS as an attribute of a spatial location and is linked to a matching data field in the data table that contains the analysis results.
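      As a rough illustration of the key-field idea, the sketch below keeps locations and lab results in two separate Python dictionaries that share a sample-id key. The IDs, coordinates, and concentrations are invented, and a real GIS would hold the locations as spatial features rather than plain tuples.

```python
# Hypothetical GIS side: sample-id -> (x, y) location.
locations = {"S1": (100.0, 200.0), "S2": (110.0, 200.0), "S3": (120.0, 200.0)}

# Hypothetical DBMS side: sample-id -> contaminant concentration.
results = {"S2": 950.0, "S1": 5.0, "S3": 40.0}

def lookup(sample_id):
    """Return location and concentration for one sample via the shared key."""
    x, y = locations[sample_id]
    return {"sample_id": sample_id, "x": x, "y": y, "conc": results[sample_id]}

# Ranking the results cannot detach a value from its location,
# because the link is the key, not the row position.
ranked = sorted(results, key=results.get, reverse=True)
plotted = [lookup(sid) for sid in ranked]
```

Because each record is found by its key, the results can be sorted, filtered, or reloaded in any order without disturbing the match between a concentration and its location.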

Relational Database Constructs
This concept of linking graphic objects with data descriptors is based on relational database theory. A relational DBMS is a database system that uses a table-oriented storage and retrieval structure with explicitly defined relationships between data tables.
      Relational operators support algebraic (join, divide) and set (union, intersection, difference) constructs to manipulate data in tables. Relational DBMSs are especially useful for conditional retrievals ("select all wells deeper than 100 feet with 4-inch diameters"), ad hoc queries ("give me all wells installed before Jan. 1 today, then tomorrow give me all wells installed after Jan. 1 that were not included in the previous query"), and changing information needs ("give me all wells installed after Jan. 1 with 4-inch diameters that are deeper than 150 feet"). These relational constructs provide the primary methods for creating the connection between a GIS and a DBMS that stores related, non-spatial data.
      For example, in the scenario described above, assume that you decided to store all the sampling locations for the tank farm characterization effort in a GIS. The laboratory analytical results are then loaded into tables in a relational DBMS. The data field for sample-id is stored in the GIS as an attribute of the sample point locations. In one or more of the data tables stored in the DBMS, sample-id is a data field that contains the same string of characters as that stored in the GIS. In the same DBMS table are the contaminant concentrations derived from sample analyses.
      Using the relational "join" construct, the spatial locations of sampling events stored in the GIS can be dynamically linked with the contaminant concentrations stored in the DBMS table. This has the effect of adding a new set of descriptive attributes to those sample locations in the GIS, which can then be displayed as symbols with colors or patterns driven by concentration values.
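      The well queries quoted above might look something like the following in a relational DBMS. This is only a sketch using Python's built-in sqlite3 module; the wells table, its column names, the sample rows, and the year 1995 are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wells (well_id TEXT PRIMARY KEY, depth_ft REAL, "
    "diameter_in REAL, installed TEXT)"
)
# Hypothetical wells; install dates are ISO strings so they compare correctly.
conn.executemany(
    "INSERT INTO wells VALUES (?, ?, ?, ?)",
    [
        ("W1", 120.0, 4.0, "1994-11-15"),
        ("W2", 80.0, 4.0, "1995-02-01"),
        ("W3", 160.0, 4.0, "1995-03-10"),
        ("W4", 200.0, 6.0, "1994-06-30"),
    ],
)

def ids(sql, params=()):
    """Run a query and return the matching well IDs in sorted order."""
    return sorted(row[0] for row in conn.execute(sql, params))

JAN_1 = "1995-01-01"  # assumed year

# Conditional retrieval: wells deeper than 100 feet with 4-inch diameters.
deep_4in = ids(
    "SELECT well_id FROM wells WHERE depth_ft > 100 AND diameter_in = 4"
)

# Ad hoc query, day one: wells installed before Jan. 1.
before = ids("SELECT well_id FROM wells WHERE installed < ?", (JAN_1,))

# Ad hoc query, day two: wells installed after Jan. 1, minus the wells
# returned by the previous query (relational difference via EXCEPT).
after_not_before = ids(
    "SELECT well_id FROM wells WHERE installed > ? "
    "EXCEPT SELECT well_id FROM wells WHERE installed < ?",
    (JAN_1, JAN_1),
)

# Changed information need: add diameter and depth conditions.
changed = ids(
    "SELECT well_id FROM wells "
    "WHERE installed > ? AND diameter_in = 4 AND depth_ft > 150",
    (JAN_1,),
)
```

Each new question is answered by writing a different query against the same table; nothing about the stored data has to be rearranged.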
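      A join of this kind can be sketched with Python's built-in sqlite3 module. Here the sample_points table stands in for the attribute table of a GIS layer and lab_results for the DBMS table; the table names, values, color thresholds, and the symbol_color helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample_points (sample_id TEXT PRIMARY KEY, x REAL, y REAL);
    CREATE TABLE lab_results (sample_id TEXT, conc REAL);
    INSERT INTO sample_points VALUES
        ('S1', 100, 200), ('S2', 110, 200), ('S3', 120, 200);
    INSERT INTO lab_results VALUES ('S2', 950), ('S3', 40), ('S1', 5);
""")

def symbol_color(conc):
    """Map a concentration to a display color (thresholds are made up)."""
    if conc >= 500:
        return "red"
    if conc >= 25:
        return "yellow"
    return "green"

# Join locations to concentrations on the shared sample_id key.
rows = conn.execute(
    "SELECT p.sample_id, p.x, p.y, r.conc "
    "FROM sample_points AS p "
    "JOIN lab_results AS r ON p.sample_id = r.sample_id "
    "ORDER BY p.sample_id"
).fetchall()

# Each location now carries a color driven by its own concentration.
symbols = [(sid, x, y, symbol_color(conc)) for sid, x, y, conc in rows]
```

The lab results were loaded in a different order from the sample points, yet the join still pairs each concentration with the correct location because the match is made on sample_id rather than on row position.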
      This link between graphically-based locational data and the more descriptive, non-spatial data elements stored in tables is one of the most powerful concepts in applying the analytical capabilities of a GIS. Only after this concept is mastered can the true power of modern GIS capabilities be unleashed.
