There are five kinds of data to be represented in a GIS (see figure 1).

Point features, e.g. locations of soil samples, boreholes,
manholes, rain gauges, burst water mains, pumping stations, trees,
buildings. Point data consist of a number of nodes with no thickness
and are often referred to as zero dimensional. One method of storing
point features in a GIS is as a table in the database management
system:
| Point ID | X coordinate | Y Coordinate | Pointer to attribute data |
| 1 | 30123.6 | 19782.4 | Point 1 |
| 2 | 30167.3 | 19745.7 | Point 2 |
| 3 | 34952.2 | 19648.1 | Point 3 |
The pointer to attribute data is a link into another database table where other data about that point are kept, for example that it represents an access chamber to a sewer system and that it has properties such as date of construction, condition, size and material. Because this is a link into a full database management system, further relations are permitted from this point.
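The link between the point table and its attribute table can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table and column names (points, attributes, attr_key, feature_type, material) and the attribute values are invented here, and a production GIS would use its own schema.

```python
import sqlite3

def build_point_db():
    """Create an in-memory database holding a point table and an attribute table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE points ("
        "point_id INTEGER PRIMARY KEY, x REAL, y REAL, attr_key TEXT)"
    )
    conn.execute(
        "CREATE TABLE attributes ("
        "attr_key TEXT PRIMARY KEY, feature_type TEXT, material TEXT)"
    )
    # Coordinates and pointers are taken from the point table in the text.
    conn.executemany(
        "INSERT INTO points VALUES (?, ?, ?, ?)",
        [
            (1, 30123.6, 19782.4, "Point 1"),
            (2, 30167.3, 19745.7, "Point 2"),
            (3, 34952.2, 19648.1, "Point 3"),
        ],
    )
    # Hypothetical attribute record, invented for illustration only.
    conn.execute(
        "INSERT INTO attributes VALUES (?, ?, ?)",
        ("Point 1", "access chamber", "brick"),
    )
    return conn

def attributes_for_point(conn, point_id):
    """Follow the attribute pointer of one point; returns None for an unknown point."""
    return conn.execute(
        "SELECT p.x, p.y, a.feature_type, a.material "
        "FROM points p LEFT JOIN attributes a ON a.attr_key = p.attr_key "
        "WHERE p.point_id = ?",
        (point_id,),
    ).fetchone()
```

Calling attributes_for_point(build_point_db(), 1) returns the coordinates together with the linked attribute record. A point whose attribute record has not been entered yet still returns its coordinates, with empty attribute fields.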
Linear features, e.g. roads (on small scale maps), rivers,
pipe lines, power lines, elevation contours. The nodes are linked
by arcs, each with a number of vertices (the simplest arc is a
straight line). Between vertices the arc is usually considered
a straight line, but curved links are possible. Line data can be
non-branching lines, trees or network structures. In a network
there can be more than one route between two nodes. These data have
one dimension, that is, they have no thickness, and care must
be taken in the definition of the system that a loop is not confused
with a polygon. A simple structure for a line feature or network
is:
| line reference |
| Attribute Pointer |
| arc 1 |
| arc 2 |
| --- |
| etc |
| Arc reference | X coordinate | Y Coordinate |
| Node 1 | 30123.6 | 19782.4 |
| Node 2 | 30167.3 | 19745.7 |
| Vertex 3 | 30952.2 | 19648.1 |
| etc |
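As a minimal sketch of the structure above, a line feature can be held as a reference, an attribute pointer and a list of arcs, with each arc an ordered list of coordinates. The dictionary keys and the pointer value "Line 1" are hypothetical, and the sketch assumes the rows of the arc table are stored in the order in which they are traversed, with straight segments between them.

```python
import math

def arc_length(vertices):
    """Sum of the straight-line distances between consecutive (x, y) vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))

# One arc, using the coordinates from the arc table above.
arc_1 = [(30123.6, 19782.4), (30167.3, 19745.7), (30952.2, 19648.1)]

# A line feature: reference, attribute pointer (hypothetical) and its arcs.
line = {"reference": 1, "attribute_pointer": "Line 1", "arcs": [arc_1]}

def line_length(line_feature):
    """Total length of a line feature, summed over all of its arcs."""
    return sum(arc_length(arc) for arc in line_feature["arcs"])
```

Length is the natural derived property of one dimensional data; curved links would need a different distance calculation for each segment.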
Areas (polygons) with common properties, e.g. pressure
zones, catchments, contributing areas, soil association mapping
units, climate zones, administrative district areas, buildings
and other land cover. The polygon consists of a number of arcs
or linear features that form a closed loop without crossing over
one another. The arcs are usually straight between vertices but
may be curved.
| Polygon Reference | Attribute Pointer |
| 3 | Point 1 |
| X coordinate | Y Coordinate |
| 30123.6 | 19782.4 |
| 30167.3 | 19745.7 |
| 30952.2 | 19648.1 |
| 30123.6 | 19782.4 |
Simple polygon structure.
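One thing a simple coordinate ring does support directly is an area calculation. The sketch below applies the standard shoelace formula to the coordinate list above; it assumes straight edges and a ring that repeats its first coordinate pair at the end, and the names ring_area and polygon_3 are chosen here for illustration.

```python
def ring_area(ring):
    """Unsigned area of a closed ring of (x, y) pairs by the shoelace formula.

    The ring must repeat its first coordinate pair at the end.
    """
    if len(ring) < 4 or ring[0] != ring[-1]:
        raise ValueError("ring must be closed and have at least three vertices")
    twice_area = sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )
    return abs(twice_area) / 2.0

# Coordinates of polygon 3 from the table above.
polygon_3 = [
    (30123.6, 19782.4),
    (30167.3, 19745.7),
    (30952.2, 19648.1),
    (30123.6, 19782.4),
]
```

For example, ring_area([(0, 0), (4, 0), (4, 3), (0, 0)]) gives 6.0, the area of a right-angled triangle with sides 4 and 3.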
The simple polygon representation shown above, in which the first coordinate pair is repeated to close the loop, is the form used in CAD (or the DXF format) and is of little use in GIS. The three major problems with simple polygons are:
In GIS, therefore, area data is represented as a topological structure
in one of a number of ways. The Arc/Info method of storing this
information is shown in figure 7. A separate list is used to hold
information about islands and disjointed structures. Different
themes can be represented on the same coverage and there is no
requirement that polygons do not overlap. For example, a single
coverage may contain polygons representing landcover, whereas
other polygons may contain the contributing areas to inlet
nodes of a storm water drainage system (see figure 6). The polygons
naturally overlap, and the intersection of these polygons provides
one of the main uses of GIS. It is known as overlay, reflecting
the graphical process of overlaying one theme upon another.
Actual or potential surfaces, e.g. ground elevation, variation of mean annual temperature, spatial distributions of rainfall, population densities. These are discussed in detail in the section on the digital elevation model (DEM).
Temporal elements, e.g. changes in land use over time, changes to a pipe network, rainfall records or streamflow records. These are not well represented in current GIS technology, but newer object oriented GIS should make this more readily available.
Raster Representation
Figure 6 shows two polygons intersecting. The numerical calculation required to find either the intersection or the join of the two polygons is quite intensive. The whole process is made much simpler if the polygons are all the same shape and size, preferably rectangular. This use of identical rectangular polygons is known as a cell, grid or raster representation and provides one of the simplest representations for GIS and spatial statistical modelling. Figure 7 shows the same polygon data represented as a vector and as a raster. Note that the individual cell values can be either numbers for computation, such as elevations, or pointers to a database with further attributes.
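To illustrate why the raster form is simpler, the sketch below overlays two small grids cell by cell. The grids are invented for illustration (1 marks a cell inside the polygon, 0 a cell outside), the function names are hypothetical, and both grids are assumed to share the same origin and cell size.

```python
def _check_same_shape(grid_a, grid_b):
    """Raise ValueError unless both grids have identical dimensions."""
    if len(grid_a) != len(grid_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(grid_a, grid_b)
    ):
        raise ValueError("grids must have identical dimensions")

def intersect(grid_a, grid_b):
    """Cells that are set in both grids."""
    _check_same_shape(grid_a, grid_b)
    return [[a & b for a, b in zip(ra, rb)] for ra, rb in zip(grid_a, grid_b)]

def union(grid_a, grid_b):
    """Cells that are set in either grid (the join of the two polygons)."""
    _check_same_shape(grid_a, grid_b)
    return [[a | b for a, b in zip(ra, rb)] for ra, rb in zip(grid_a, grid_b)]

def area(grid, cell_size_m):
    """Area covered by the set cells, in square metres."""
    return sum(sum(row) for row in grid) * cell_size_m ** 2

# Two hypothetical themes rasterised onto the same 3 x 4 grid.
landcover = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
]
catchment = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
```

The overlay reduces to one logical operation per cell, so its cost depends only on the number of cells and not on the shape of the polygons.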
The ease of programming raster GIS and their low computational overheads make them very suitable for natural or environmental modelling. The size of cells used in GIS modelling requires careful thought before data entry and modelling can begin. I have used cells of 1m square for urban drainage work, where we were only interested in a small catchment, and 250m square for land evaluation, where we were studying the whole of Ghana.
There is always error in the representation of real world structures
as cells, and it is important to realise the trade-off between
small cells, which accurately represent the real world but carry
a lot of computational overhead, and large cells, which are much
more efficient but introduce large errors. Fortunately computers
are getting more powerful and disk drives much larger every year,
so these problems become less important and we can select cell
sizes to represent the natural variation we observe. For example,
a soil association boundary will never be known on the ground
to better than 50m accuracy, so using any cell size less
than 50m is pointless. My recommendations on cell size are as
follows:
| Data derived from 1:50 000 maps | 50m |
| Data derived from 1:10 000 maps | 10m |
| Data derived from 1:1250 maps | 1m |
| Any modelling with satellite remote sensing | resolution of the sensor (often 30m) |
| Nationwide land evaluation | 250m |
| Studies involving geodemographics | 200m |
| Physically based rainfall runoff modelling | 20-40m (it is debatable whether it is truly physically based at this resolution but this will allow realistic computation times) |
| Flood plain studies | 50m |
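The computational side of the cell size trade-off can be estimated before any data are entered by counting the cells a study area would need. The sketch below is a rough illustration only; the function name and the 10 km by 10 km study area in the usage note are hypothetical.

```python
import math

def cell_count(width_m, height_m, cell_size_m):
    """Number of square cells needed to cover a rectangular study area."""
    if cell_size_m <= 0:
        raise ValueError("cell size must be positive")
    cols = math.ceil(width_m / cell_size_m)   # partial cells at the edge count
    rows = math.ceil(height_m / cell_size_m)
    return rows * cols
```

For a 10 km by 10 km area, cell_count(10000, 10000, 50) gives 40000 cells, while a 10m cell gives 1000000. Halving the cell size quadruples the number of cells in every layer of the model.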
Most GIS that use raster data have some means of compressing the data, using run length encoding, quad trees or any of the lossless schemes for computer graphics. Unless you intend to write your own modules (and one of the big attractions of raster GIS is that you can), the compression technique is irrelevant to the user. However, it does mean that raster GIS databases can be as small as their vector counterparts.
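Run length encoding is the simplest of these schemes to sketch. The version below encodes one raster row as (value, run length) pairs; it is a generic illustration, not the storage format of any particular GIS package, and the function names are chosen here.

```python
from itertools import groupby

def rle_encode(row):
    """Encode a row of cell values as a list of (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(row)]

def rle_decode(pairs):
    """Rebuild the original row from its (value, run_length) pairs."""
    return [value for value, count in pairs for _ in range(count)]
```

For example, rle_encode([3, 3, 3, 0, 0, 7]) gives [(3, 3), (0, 2), (7, 1)], and rle_decode restores the original row exactly, so the scheme is lossless. The saving is greatest for thematic layers such as landcover, where long runs of identical values are common.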
With some raster GIS all overlays must be carried out with identically sized cells and all resampling must be carried out manually before the overlay modelling begins. With other GIS the resampling is carried out dynamically to either the largest grid size of all the overlays in the model or some user specified grid size.
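Resampling to the largest grid size can be sketched as below. Nearest neighbour sampling is used here as one simple choice; it is an assumption, since a given GIS may instead average or take the most common value, and the function names are hypothetical. All grids are assumed to share the same origin.

```python
import math

def common_cell_size(cell_sizes_m):
    """The largest cell size among the overlays, used as the working size."""
    return max(cell_sizes_m)

def resample_nearest(grid, src_cell_m, dst_cell_m):
    """Resample a grid (list of rows) to a new cell size by nearest neighbour."""
    src_rows, src_cols = len(grid), len(grid[0])
    ratio = dst_cell_m / src_cell_m
    dst_rows = max(1, math.ceil(src_rows / ratio))
    dst_cols = max(1, math.ceil(src_cols / ratio))
    out = []
    for r in range(dst_rows):
        # Source row containing the centre of the target cell.
        src_r = min(src_rows - 1, int((r + 0.5) * ratio))
        out_row = []
        for c in range(dst_cols):
            src_c = min(src_cols - 1, int((c + 0.5) * ratio))
            out_row.append(grid[src_r][src_c])
        out.append(out_row)
    return out
```

A 10m grid overlaid with a 20m grid would be resampled with resample_nearest(grid, 10, common_cell_size([10, 20])). Nearest neighbour sampling keeps original cell values, which matters when the values are class codes or database pointers rather than measurements.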
Raster and vector GIS are traditionally compared, with the author stating a preference for one or the other, but most modern GIS have vector and raster components which can often be interlinked seamlessly. Some tasks are easier to carry out in one form than the other; for example cadastral work requires the accuracy and precision of a vector GIS, whereas determining the water requirements of a region can best be done using a raster representation.
GIS software comes in a variety of packages. The two main types, as already described, are the vector based system and the raster based system. More modern systems permit the total integration of raster and vector data, allowing the advantages of both methods to be enjoyed, with few of the disadvantages.
Vector systems are often supported by traditional DataBase Management Systems (DBMS). The most common conform to the relational model, see Avison (1992). Arc/Info, the most widely used vector GIS package, follows this approach, Info being a relational DBMS in its own right. The relational model is the basis of most DBMS used in organisations and businesses. This underlies the vector model's principal use as an asset or resource inventory system. A DBMS should allow different types of user access to the appropriate parts of the database, and prevent unauthorised viewing or changing. It should also maintain data concurrency, provide archive facilities and present a simple interface to the user for manipulating the data.
Raster systems generally do not employ such strict data management. They have developed from image processing systems and are often used by a single user. Clearly these are generalisations, and many packages will embody aspects of both systems.
The most up-to-date systems are described as 'object oriented'. The distinction of object oriented systems is that all data items are described as being of one or more object types, e.g. a linear feature, a point, a vector polygon, a regular raster, a raster cell, a TIN, a DEM, etc. In addition to storing the description of the object, the methods of displaying, plotting and general manipulation are also carried with the object type; this is known as encapsulation.
Objects are hierarchical; rivers, roads and pipes will be objects that are descended from the linear object. Each will therefore have the properties, behaviour and methods inherited from the linear feature, such as length. However, each will also have behaviour and properties that are distinct: roads will have classes (e.g. 'A' roads and motorways), and pipes and roads will not be able to connect to form a network.
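The hierarchy described above can be sketched with ordinary Python classes. This is an illustration of inheritance and encapsulation only, not the design of any particular object oriented GIS; the class names, the network label and the diameter_mm property are hypothetical.

```python
import math

class LinearFeature:
    """Base object: holds vertices and carries the methods common to all lines."""

    network = "generic"  # features may only connect within the same network

    def __init__(self, vertices):
        self.vertices = list(vertices)

    def length(self):
        """Length along straight segments; inherited by every linear object."""
        return sum(
            math.dist(a, b) for a, b in zip(self.vertices, self.vertices[1:])
        )

    def can_connect(self, other):
        """True only if both features belong to the same kind of network."""
        return self.network == other.network

class Road(LinearFeature):
    network = "road"

    def __init__(self, vertices, road_class):
        super().__init__(vertices)
        self.road_class = road_class  # e.g. "A" or "motorway"

class Pipe(LinearFeature):
    network = "pipe"

    def __init__(self, vertices, diameter_mm):
        super().__init__(vertices)
        self.diameter_mm = diameter_mm  # hypothetical pipe-specific property
```

A Road and a Pipe both answer length() without defining it themselves, while road.can_connect(pipe) is False, so the rule that roads and pipes cannot form one network is enforced by the objects rather than by the user.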
The object oriented paradigm is currently of great interest to the computer science community. Object oriented programming languages, databases and, of course, GIS are under development (see Worboys et al, 1990). There are several advantages that are stressed by advocates of the object oriented approach:
(i) it is intuitive as people naturally think in terms of objects;
(ii) by specifying behaviour, inconsistencies in the database can be reduced, for example sewers and water mains objects exhibit different behaviour and should not be part of the same network;
(iii) developing applications is easy; by having a hierarchical structure new objects are easily created.
There are a variety of ways of storing geographical data and different ways of processing the data. The choice of data structure is largely dictated by the use to which the data is to be put, the capabilities of the GIS being used and, to a large extent, the existing data formats.

Figure 7 (a) Simple vector representation, using the topologic model presented by Dangermond (1982); more complex structures are used to improve access times. (b) Raster representation; a raster layer is required for each attribute to be represented.

References

Avison DE (1992), Information Systems Development: A Database Approach, 2nd Edition, Blackwell Scientific Publications.

Bradbury PA, Lea NJ and Bolton P (1993), Estimating Catchment Yield: Development of the GIS-based Calsite Model, Report OD125, April 1993, HR Wallingford.

Burrough PA (1986), Principles of Geographical Information Systems for Land Resources Assessment, Clarendon Press, Oxford.

Carter (1989), On Defining the Geographic Information System, Fundamentals of Geographic Information Systems: A Compendium, edited by Ripple WJ, pp3-6.

Dangermond J (1982), A Classification of Software Components Used in Geographic Information Systems, Proc. US - Australia Workshop on the Design and Implementation of Computer Based Geographic Information Systems, Honolulu, Hawaii, pp70-91.

Elgy J, Maksimovic C and Prodanovic D (1993), Using Geographical Information Systems for Urban Hydrology, International Conference on Application of Geographical Information Systems in Hydrology and Water Resources, Vienna, Austria.

Lillesand TM and Kiefer RW (1987), Remote Sensing and Image Interpretation, 2nd Edition, Wiley.

Sibson R (1978), Locally Equiangular Triangulation, The Computer Journal, v21 n3, pp243-245.

Siyyid AN (1993), The use of METEOSAT data for rainfall/runoff modelling, PhD Thesis, Aston University, May 1993.

Worboys MF, Hearnshaw HM and Maguire DJ (1990), Object-Oriented Data Modelling for Spatial Databases, International Journal of Geographical Information Systems, v4 n4, pp369-383.