Database Management and GIS: What's the Connection?
By Damon Judd
A GIS is a DBMS
As many of you frequent readers
may know, a geographic information system (GIS) is much
more than just a software package for producing pretty
maps. A GIS can be described as a computerized system for
the collection, storage, manipulation, and output of
spatially-referenced information.
So what do Database Management
Systems (DBMSs) have to do with the application of GIS? A
database is defined to be "a collection of data
logically organized and centrally controlled to satisfy
the information and time requirements of a universe of
users." A DBMS, then, is a computerized system (i.e.,
hardware, software, and some sort of procedural language)
that provides the tools to organize and control a
collection of data.
The key concept here is
that a GIS is actually an extended form of a DBMS based on
spatial or locational descriptors. A GIS represents data
graphically and provides additional tools for analyzing
spatially-referenced data. A GIS is sometimes called a
spatial database management system.
The data that go into a GIS
define its utility for a particular application. For
example, a GIS intended for maintaining a timber inventory
for the U.S. Forest Service is of little use if it can
only create colored graphics of forested areas. If the
colors do not have a direct correlation to species, tree
density, crown cover, DBH, or other descriptive attributes
(data fields in a DBMS), they have no real meaning.
A GIS or DBMS is used to
translate data into usable information that can be
processed through cognitive reasoning into decision
criteria. Therefore, the decision is only as good as the
raw data that were fed into the database to begin with.
Good Data Management
For this reason, a strong emphasis on good data management
practices is critical to an effective GIS implementation,
and a thorough understanding of database concepts is
crucial to the decision process. To illustrate this point,
assume for a moment that you are an environmental
consultant who has been hired to identify and locate
contamination from a gasoline tank farm that is adjacent
to a school in your city. Typically, you would take
samples from the soil and groundwater in the immediate
vicinity of, and on the land surrounding, the tank farm.
You would then send the samples to a laboratory for
analysis of hydrocarbon compounds. After the lab provides
the analytical results from sampling, you would then
synthesize, evaluate and present the results of your
findings to your client, who happens to be an attorney
working for the oil company that owns the property.
For purposes of
illustrating the importance of good data management, let's
assume that you were trying to save your client money and
you used a spreadsheet to record the analysis results that
came back from the lab. You spent several days entering
the data into your spreadsheet and checked the numbers
three times to make sure they were all correct. You then
decided to produce a map using your newest version of
Excellent CAD for Windows, which you purchased for the
exceedingly low price of $49.95.
But first, you decide to
sort the spreadsheet results by contaminant concentration
so that all the high values are at the top of the
spreadsheet. However, you forget to include in the sort the
columns containing the X,Y coordinates for the sample
locations, which Surreal Surveyors Inc. had provided to you
on a disk, much to your satisfaction. You then use Excellent
CAD's capability to produce a high-quality graphical
illustration of your sampling results.
You produce and bind 25
copies of your final report and proudly present it to your
client two days before the deadline for submittal, and
$500 under budget. He goes to court, presents your
findings, and gets fined $10,000 for uncontrolled
contamination because the map you produced shows high
concentrations in samples that should have been clean
dirt.
You get a nasty phone call
from the attorney demanding to know how you could have
produced such erroneous results. You again study the
numbers in your spreadsheet, check them twice more against
the original reports that the lab delivered, and against
the surveyor's spreadsheet. The numbers all match. What
could have gone wrong?
The answer, of course, is
that the sort rearranged the concentrations but not the
coordinates, so each value was plotted at the wrong sample
location. A lot more could have gone wrong even in this
simple example. As projects get larger, more data are
collected for ongoing analyses, and multiple end users need
reporting and querying support for a variety of purposes,
the need for proper data management methods grows rapidly.
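The sorting mistake in the scenario can be reproduced in a few
lines. The sketch below is only an illustration: the sample IDs,
coordinates, and concentrations are invented, and a list of Python
records stands in for the spreadsheet. It contrasts sorting the
concentration column by itself with sorting whole records.

```python
# Hypothetical sample data: IDs, surveyed coordinates, and lab results.
samples = [
    {"sample_id": "S1", "x": 100.0, "y": 200.0, "conc": 0.5},
    {"sample_id": "S2", "x": 150.0, "y": 220.0, "conc": 42.0},
    {"sample_id": "S3", "x": 300.0, "y": 400.0, "conc": 3.1},
]

# Wrong: sort only the concentration column, leaving the ID and
# coordinate columns in their original order.
sorted_conc = sorted((s["conc"] for s in samples), reverse=True)
broken = [{**s, "conc": c} for s, c in zip(samples, sorted_conc)]

# Right: sort whole records so each value stays with its location.
correct = sorted(samples, key=lambda s: s["conc"], reverse=True)


def conc_at(records, sample_id):
    """Return the concentration recorded for one sample ID."""
    return next(r["conc"] for r in records if r["sample_id"] == sample_id)


print(conc_at(broken, "S1"))   # the highest value now sits at a clean location
print(conc_at(correct, "S1"))  # the original value is preserved
```

Note that both versions contain exactly the same set of numbers,
which is why rechecking the values against the lab reports does not
reveal the problem.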
What does this have to do
with GIS? Using a GIS to store and manage data such as
those described in the above scenario makes data
management more efficient. A GIS
provides the graphical interface needed to view the data
directly after they are entered into a database, as long as
the key connection is made. That connection is typically
based on a unique data column called a key field or index
column. A key field (such as sample-id) is stored in the
GIS as an attribute of a spatial location and is linked to
a matching data field in the data table that contains the
analysis results.
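As a rough illustration of why the key field matters, the sketch
below checks that every sample-id on the map has a matching record
in the results table, and vice versa. The IDs, coordinates, and
values are hypothetical, and plain Python dictionaries stand in for
the GIS attribute table and the DBMS table.

```python
# Hypothetical GIS point attributes: sample-id mapped to (x, y).
gis_points = {
    "TF-001": (1200.5, 845.0),
    "TF-002": (1215.0, 850.2),
    "TF-003": (1230.8, 861.7),
}

# Hypothetical lab results keyed by the same sample-id string.
lab_results = {
    "TF-001": 0.8,
    "TF-002": 57.0,
    "TF-004": 2.3,  # no matching location in the GIS
}

# Keys present on both sides: these results can be shown on the map.
linked = {
    sid: (gis_points[sid], lab_results[sid])
    for sid in gis_points.keys() & lab_results.keys()
}

# Keys present on one side only: broken links to investigate.
missing_results = sorted(gis_points.keys() - lab_results.keys())
missing_locations = sorted(lab_results.keys() - gis_points.keys())

print(linked)
print(missing_results, missing_locations)
```

A check like this catches mistyped or missing sample IDs before
they show up as gaps or misplaced symbols on a map.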
Relational Database Constructs
This concept of linking graphic objects with data
descriptors is based on relational database theory. A
relational DBMS is a database system that uses a
table-oriented storage and retrieval structure with
explicitly defined relationships between data tables.
Relational operators
support algebraic (join, divide) and boolean logic (union,
intersection, difference) constructs to manipulate data in
tables. Relational DBMSs are especially useful for
conditional retrievals ("select all wells deeper than
100 feet with 4 inch diameters"), ad hoc queries
("give me all wells installed before Jan. 1 today,
then tomorrow give me all wells installed after Jan. 1
that were not included in the previous query"), and
changing information needs ("give me all wells
installed after Jan. 1 with 4 inch diameters that are
deeper than 150 feet"). Relational algebraic and
boolean constructs provide the primary methods for
creating the connection between a GIS and a DBMS that
stores related, non-spatial data.
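The three kinds of retrieval quoted above can be sketched with SQL
run through Python's built-in sqlite3 module. The wells table, its
column names, the well records, and the year attached to "Jan. 1"
are all assumptions made for illustration; a production DBMS would
differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wells (well_id TEXT PRIMARY KEY, depth_ft REAL, "
    "diameter_in REAL, installed TEXT)"
)
# Hypothetical well records; dates are ISO strings so they sort correctly.
conn.executemany(
    "INSERT INTO wells VALUES (?, ?, ?, ?)",
    [
        ("W1", 120.0, 4.0, "1994-11-15"),
        ("W2", 80.0, 4.0, "1995-02-01"),
        ("W3", 160.0, 4.0, "1995-03-10"),
        ("W4", 200.0, 6.0, "1995-06-20"),
    ],
)
JAN_1 = "1995-01-01"  # assumed year for the "Jan. 1" cutoff


def ids(sql, params=()):
    """Run a query and return the sorted well IDs it selects."""
    return sorted(row[0] for row in conn.execute(sql, params))


# Conditional retrieval: deeper than 100 feet with 4 inch diameters.
deep_4in = ids(
    "SELECT well_id FROM wells WHERE depth_ft > 100 AND diameter_in = 4"
)

# Ad hoc queries: wells installed before Jan. 1, then wells installed
# after Jan. 1 minus anything the first query returned (difference).
before = ids("SELECT well_id FROM wells WHERE installed < ?", (JAN_1,))
after_not_before = ids(
    "SELECT well_id FROM wells WHERE installed > ? "
    "EXCEPT SELECT well_id FROM wells WHERE installed < ?",
    (JAN_1, JAN_1),
)

# Changing needs: the same table answers a new combination of criteria.
recent_deep_4in = ids(
    "SELECT well_id FROM wells "
    "WHERE installed > ? AND diameter_in = 4 AND depth_ft > 150",
    (JAN_1,),
)

print(deep_4in, before, after_not_before, recent_deep_4in)
```

None of these questions required restructuring the stored data;
only the query changed.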
For example, in the
scenario described above, assume that you decided to store
all the sampling locations for the tank farm
characterization effort in a GIS. The laboratory
analytical results are then loaded into tables in a
relational DBMS. The data field for sample-id is stored in
the GIS as an attribute of the sample point locations. In
one or more of the data tables stored in the DBMS,
sample-id is a data field that contains the same string of
characters as that stored in the GIS. In the same DBMS
table are the contaminant concentrations derived from
sample analyses.
Using the relational
"join" construct, the spatial locations of sampling events
stored in the GIS can be dynamically linked with the
contaminant concentrations stored in the DBMS table. This
has the effect of adding a new set of descriptive
attributes for those sample locations in the GIS, which can
then be displayed as symbols with colors or patterns that
are driven by concentration values.
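A minimal sketch of such a join follows, again using Python's
sqlite3 module. Here a single in-memory SQLite database stands in
for both the GIS attribute table and the DBMS results table. The
table and column names, the sample values, the analyte, and the
color threshold are hypothetical and are not regulatory limits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the GIS layer: one row per sample point.
    CREATE TABLE sample_points (sample_id TEXT PRIMARY KEY, x REAL, y REAL);
    -- Stand-in for the DBMS table of laboratory results.
    CREATE TABLE lab_results (sample_id TEXT, analyte TEXT, conc_mg_kg REAL);
    INSERT INTO sample_points VALUES ('TF-001', 1200.5, 845.0),
                                     ('TF-002', 1215.0, 850.2);
    INSERT INTO lab_results VALUES ('TF-001', 'benzene', 0.8),
                                   ('TF-002', 'benzene', 57.0);
""")


def symbol_color(conc, threshold=10.0):
    """Pick a display color from a concentration (illustrative threshold)."""
    return "red" if conc >= threshold else "green"


# Join locations to results on the shared sample_id key field.
rows = conn.execute("""
    SELECT p.sample_id, p.x, p.y, r.conc_mg_kg
    FROM sample_points AS p
    JOIN lab_results AS r ON r.sample_id = p.sample_id
    WHERE r.analyte = 'benzene'
    ORDER BY p.sample_id
""").fetchall()

# Each joined row carries both a location and a value-driven symbol.
symbols = [(sid, x, y, symbol_color(conc)) for sid, x, y, conc in rows]
print(symbols)
```

Because the link is made on the key field at query time, corrected
or newly loaded lab results change the map symbols without anyone
re-entering coordinates.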
This link between
graphically-based locational data and the more
descriptive, non-spatial data elements stored in tables is
one of the most powerful concepts in applying the
analytical capabilities of a GIS. Only after this concept
is mastered can the true power of modern GIS capabilities
be unleashed.