Now that we understand what a GIS is and what it can do, the next step is to understand how a GIS is made. But first you need to take a crash course in GIS data.

If you think the term GIS is vague, then you haven’t seen anything yet. There is a dizzying array of formats used for storing GIS data.

Before we delve into the various formats, let’s take a look at some fundamentals. There are two main types of GIS data: vector and raster.

Vector Data

You can think of vector data as instructions for how to render data. The best way to visualize it is to think of it as a spreadsheet with columns that contain your regular data, but in addition it always has an extra column called “geometry”.

That column contains one or more coordinates that describe how to draw the point, line or polygon that represents that feature on the face of the earth.
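To make the spreadsheet analogy concrete, here is a minimal Python sketch of a tiny vector dataset held in memory as a list of rows. The column names (`name`, `population`), the place names and the coordinates are made up for illustration, and storing the geometry as a small dictionary of (longitude, latitude) values is an assumption of this sketch, not how any particular file format stores it.

```python
from typing import Any

# A vector dataset pictured as spreadsheet rows: ordinary attribute columns
# plus a "geometry" column. Names and coordinates are illustrative only.
cities: list[dict[str, Any]] = [
    {"name": "City A", "population": 120_000,
     "geometry": {"type": "Point", "coordinates": (-97.7, 30.3)}},
    {"name": "City B", "population": 45_000,
     "geometry": {"type": "Point", "coordinates": (-96.8, 32.8)}},
]

def describe(feature: dict[str, Any]) -> str:
    """Split a row into its attributes and the drawing instructions in its geometry."""
    attributes = {k: v for k, v in feature.items() if k != "geometry"}
    geometry = feature["geometry"]
    return f"{attributes} -> draw a {geometry['type']} at {geometry['coordinates']}"

for row in cities:
    print(describe(row))
```

The attribute columns are the “regular data”; only the geometry column tells the GIS where and how to draw each feature.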

Raster Data

If vector data is abstract, raster data is literal. Raster data is a bitmap image such as a TIFF or JPEG. This format is usually used for satellite imagery, aerial photography, elevation models and topographic maps.

Introducing the Shapefile

The Shapefile is the most common format in GIS. It’s a vector format that can be read by almost all GIS systems.

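The name Shapefile is a little deceptive, because a Shapefile is actually made up of several files, usually four: the .SHP, the .DBF, the .PRJ and the .SHX. (Strictly speaking, only the .SHP, .SHX and .DBF are required; the .PRJ is optional but normally included.)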

It’s not important that you remember what’s in each part of a Shapefile, but I think a brief explanation will help you better understand how GIS data is structured in general.

SHP

This file contains the geometry of each feature.

DBF

This is a dBase file which contains the attribute data for all of the features in the dataset. The dBase file is very similar to a sheet in a spreadsheet and can even be opened in Excel.

SHX

The .shx is the shape index. It records where each feature’s geometry is stored within the .SHP file, which allows GIS systems to find features more quickly.

PRJ

The .prj is the projection file. It contains information about the “projection” and “coordinate system” the data uses.
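Because a Shapefile only works when its parts travel together, it can be worth checking a download before opening it. The Python sketch below assumes you already have a list of file names (for example from a folder listing or from `zipfile.ZipFile(...).namelist()`); the function name `check_shapefile_parts` is hypothetical. Since only the .SHP, .SHX and .DBF are strictly mandatory, the sketch treats the .PRJ as recommended rather than required.

```python
from pathlib import PurePath

REQUIRED = {".shp", ".shx", ".dbf"}   # the three mandatory parts
RECOMMENDED = {".prj"}                # optional, but without it the coordinate system is unknown

def check_shapefile_parts(filenames: list[str], layer: str) -> dict[str, list[str]]:
    """Report which parts of the Shapefile named `layer` are missing from `filenames`."""
    # Collect the extensions present for this layer, ignoring case (.SHP vs .shp)
    present = {
        PurePath(name).suffix.lower()
        for name in filenames
        if PurePath(name).stem.lower() == layer.lower()
    }
    return {
        "missing_required": sorted(REQUIRED - present),
        "missing_recommended": sorted(RECOMMENDED - present),
    }

# Hypothetical contents of a downloaded zip
files = ["counties.shp", "counties.SHX", "counties.dbf", "readme.txt"]
print(check_shapefile_parts(files, "counties"))
```

Here the three required parts are present, so only the missing .prj is reported.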

JARGON ALERT: Projections & Coordinate Systems

You could write an entire book on this subject. But the short answer is that the earth is a three-dimensional, roughly spherical object, while your screen is flat and two-dimensional. So in order to display the earth on your screen, it needs to be flattened.

Map projections distort the true size of countries

The flattening process creates distortions, and this is the reason that on some maps Greenland looks the same size as the whole of South America.

Greenland isn't as big as you think it is, thanks to projection distortion

There are many different formulas for flattening the earth, each designed to minimise distortion in particular places on earth.
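The Mercator projection, used by many web maps, is the classic source of the oversized-Greenland effect and is one such formula. Here is a minimal Python sketch of its spherical form (the variant used by Web Mercator), plus a helper showing how much it inflates areas at a given latitude. The place labels and latitudes in the loop are approximate and only for illustration.

```python
import math

# WGS84 semi-major axis in metres, the sphere radius used by Web Mercator
EARTH_RADIUS_M = 6_378_137.0

def mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Project longitude/latitude (degrees) to flat x/y (metres), spherical Mercator."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

def mercator_area_inflation(lat_deg: float) -> float:
    """How many times larger areas appear at this latitude than at the equator.

    Mercator stretches east-west and north-south distances by 1/cos(latitude),
    so areas are stretched by the square of that factor."""
    return 1 / math.cos(math.radians(lat_deg)) ** 2

for place, lat in [("Equator", 0.0), ("45 degrees north", 45.0), ("Central Greenland (approx.)", 72.0)]:
    print(f"{place:28} areas inflated x{mercator_area_inflation(lat):.1f}")
```

At the equator the area factor is 1, while at roughly 72° north it is over 10, which is why high-latitude land masses look so much larger than they really are.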

You don’t need to understand how this process works, as the data you use will usually already have its coordinate system defined. And if you are making a new dataset, the default coordinate system used in most GIS systems (WGS84, which stores plain latitude and longitude) will be suitable in the vast majority of cases.

Geometry Type

Every Shapefile can contain only one geometry type. This means that all the features in a dataset are points, or all are lines, or all are polygons; you can’t have a dataset that contains a mixture of geometry types.
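If you build your own data, a simple check before saving can catch mixed geometries. This is a hedged Python sketch that assumes features are dictionaries with a GeoJSON-style `geometry` entry (an assumption of this example, not how Shapefiles store data internally); both function names are hypothetical. The second helper splits a mixed collection into one group per geometry type, mirroring the workaround of saving one Shapefile per type.

```python
from collections import defaultdict

def layer_geometry_type(features: list[dict]) -> str:
    """Return the one geometry type shared by every feature, as a Shapefile requires.

    Raises ValueError if the features mix types (e.g. points and polygons)."""
    types = {f["geometry"]["type"] for f in features}
    if len(types) != 1:
        raise ValueError(f"expected exactly one geometry type, found {sorted(types)}")
    return types.pop()

def split_by_geometry_type(features: list[dict]) -> dict[str, list[dict]]:
    """Group mixed features into one list per geometry type, i.e. one future Shapefile each."""
    layers = defaultdict(list)
    for f in features:
        layers[f["geometry"]["type"]].append(f)
    return dict(layers)

# Hypothetical mixed collection
mixed = [
    {"id": 1, "geometry": {"type": "Point", "coordinates": (0.0, 0.0)}},
    {"id": 2, "geometry": {"type": "LineString", "coordinates": [(0.0, 0.0), (1.0, 1.0)]}},
    {"id": 3, "geometry": {"type": "Point", "coordinates": (2.0, 2.0)}},
]
for geom_type, layer in split_by_geometry_type(mixed).items():
    print(geom_type, layer_geometry_type(layer), len(layer))
```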

Most beginner and intermediate level GIS users never need to look any further than the Shapefile for storing and sharing map data.

So that wraps up our introduction to the Shapefile.

Other Common GIS Data Formats

There are lots of other formats used in GIS, each with its own distinctive benefits and drawbacks. Here’s a quick list of other common formats that you might come across:

CSV - Comma Separated Value File

Although the CSV isn’t exclusively a mapping format, it is often used in mapping. The beauty of the CSV is its simplicity, which means it can be read by almost any program, including Excel and Google Sheets.

It’s literally a text file where columns are separated by commas and rows are separated by line breaks. When used in mapping, two extra columns are added to hold the x and y, or lat and lon.

For mapping purposes this format is only really used for sharing point layers. The downside of the CSV is that it is very easy to break: just one unquoted comma in the wrong place shifts the values that follow it into the wrong columns, and the file can no longer be read correctly.
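Python’s standard `csv` module shows both sides of this. In the minimal sketch below, the column names `lat` and `lon` and the sample rows are hypothetical. A name containing a comma survives because it is wrapped in double quotes, while the same line without quotes splits into too many fields.

```python
import csv
import io

# Hypothetical point data: the "lat"/"lon" column names are an assumption of this sketch.
# The first name contains a comma, so it is wrapped in double quotes.
CSV_TEXT = """name,sales,lat,lon
"Austin, TX",120,30.27,-97.74
Dallas,95,32.78,-96.80
"""

def read_points(text: str) -> list[dict]:
    """Turn each CSV row into a point feature; non-coordinate values stay as strings."""
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        lon, lat = float(row.pop("lon")), float(row.pop("lat"))
        points.append({**row, "geometry": {"type": "Point", "coordinates": (lon, lat)}})
    return points

for point in read_points(CSV_TEXT):
    print(point["name"], point["geometry"]["coordinates"])

# Without the quotes, the comma in "Austin, TX" splits the name into two fields,
# so every later value lands in the wrong column.
broken = next(csv.reader(["Austin, TX,120,30.27,-97.74"]))
print(len(broken), broken)  # 5 values for a 4-column header
```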

File GeoDatabase

A file geodatabase is a collection of files in a folder on disk that can store, query, and manage both spatial and nonspatial data.

This is a popular format amongst advanced GIS users. But despite originally being touted as the favourite to replace the old but entrenched Shapefile as the de facto standard for sharing GIS data, the FileGDB never gained the popular support that many believed it would.

The main reason was its lack of support amongst open source GIS platforms.

Tab File

This format is very similar to the Shapefile and is the default format used by the MapInfo desktop GIS system.

KML

This is the format most likely to be known by non-GIS users, as it is the default file format of Google Earth.

Unlike the other formats covered here, KML does more than just store geometry and attribute data; it also contains lots of configuration options for how features are displayed in Google Earth.

This extra information however makes KML less portable, as the additional information is only relevant to Google Earth and isn’t of any value to other GIS systems.
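To see this mix of data and display settings, here is a hedged Python sketch that writes a single KML placemark with the standard library’s `xml.etree.ElementTree`. The placemark name, the style id `bigIcon` and the icon scale are made up for illustration. The `Style` element is the kind of display configuration described above, while the `name` and `Point` carry the actual data.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)  # serialise KML as the default namespace

def q(tag: str) -> str:
    """Qualify a tag name with the KML namespace (ElementTree's {uri}tag form)."""
    return f"{{{KML_NS}}}{tag}"

def placemark_kml(name: str, lon: float, lat: float, icon_scale: float = 1.5) -> str:
    """Build a one-placemark KML document with a shared icon style."""
    kml = ET.Element(q("kml"))
    doc = ET.SubElement(kml, q("Document"))
    # Display configuration: how big the viewer should draw the icon
    style = ET.SubElement(doc, q("Style"), id="bigIcon")
    icon_style = ET.SubElement(style, q("IconStyle"))
    ET.SubElement(icon_style, q("scale")).text = str(icon_scale)
    # The feature itself: an attribute (name) and a geometry (Point)
    placemark = ET.SubElement(doc, q("Placemark"))
    ET.SubElement(placemark, q("name")).text = name
    ET.SubElement(placemark, q("styleUrl")).text = "#bigIcon"
    point = ET.SubElement(placemark, q("Point"))
    # KML writes coordinates as "longitude,latitude"
    ET.SubElement(point, q("coordinates")).text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")

print(placemark_kml("City A", -97.74, 30.27))
```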

GeoJSON

JSON, or to give it its full name, JavaScript Object Notation, is a lightweight data interchange format.

It’s primarily used by software developers because of the ease with which it can be processed by web applications.

GeoJSON is a form of JSON that also contains geometry data. It’s not often used as a format for sharing spatial data for human consumption, but it is very popular as an output for APIs (application programming interfaces).
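As a sketch of what such an API output looks like, the Python function below (a hypothetical helper using only the standard `json` module) wraps tabular rows with assumed `lon`/`lat` keys into a GeoJSON FeatureCollection. Note that GeoJSON, as specified in RFC 7946, lists coordinates longitude first, then latitude.

```python
import json

def to_geojson(rows: list[dict]) -> str:
    """Wrap rows that carry 'lon'/'lat' keys into a GeoJSON FeatureCollection."""
    features = []
    for row in rows:
        # Every column except the coordinates becomes a property of the feature
        properties = {k: v for k, v in row.items() if k not in ("lon", "lat")}
        features.append({
            "type": "Feature",
            # RFC 7946: positions are [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [row["lon"], row["lat"]]},
            "properties": properties,
        })
    return json.dumps({"type": "FeatureCollection", "features": features}, indent=2)

# Hypothetical sample row
print(to_geojson([{"name": "City A", "sales": 120, "lon": -97.74, "lat": 30.27}]))
```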

GeoTIFF

The GeoTIFF is the most widely supported raster data format. TIFF is a bitmap image format similar to GIF, PNG or JPEG.

A GeoTIFF is just a regular TIFF that also contains special metadata that allows us to know where it should be placed on a map.

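GeoTIFFs are often stored uncompressed, which can make them very large, although the TIFF format does support lossless compression such as LZW and DEFLATE. There are many other raster formats that offer stronger compression to reduce the file size, but these tend to be proprietary formats that require additional paid software to use.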
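The placement metadata boils down to a mapping from pixel positions to map coordinates. Tools such as GDAL expose it as a six-number “geotransform”, and the Python sketch below applies that convention. The origin, the 30 m pixel size, and the assumption of a north-up image in a projected coordinate system measured in metres are all hypothetical values chosen for illustration.

```python
# GDAL-style geotransform (six numbers):
#   (x_origin, pixel_width, row_rotation, y_origin, column_rotation, pixel_height)
# Hypothetical north-up image: top-left corner at (500000, 3400000), 30 m pixels.
# pixel_height is negative because rows count downwards while y increases northwards.
GEOTRANSFORM = (500_000.0, 30.0, 0.0, 3_400_000.0, 0.0, -30.0)

def pixel_to_map(col: int, row: int, gt: tuple = GEOTRANSFORM) -> tuple[float, float]:
    """Map coordinates of the top-left corner of pixel (col, row)."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y

def map_to_pixel(x: float, y: float, gt: tuple = GEOTRANSFORM) -> tuple[int, int]:
    """Pixel containing map point (x, y); assumes a north-up image (rotation terms are 0)."""
    col = int((x - gt[0]) // gt[1])
    row = int((y - gt[3]) // gt[5])
    return col, row

print(pixel_to_map(10, 20))                  # (500300.0, 3399400.0)
print(map_to_pixel(500_045.0, 3_399_965.0))  # (1, 1)
```

Without these six numbers the TIFF is just a picture; with them, every pixel has a position on the earth.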

Sourcing GIS Data

Sometimes you will be lucky and have all of the data required for your maps up front. In other cases you will need to source and prepare data from a third party.

A common task is taking data that contains location information such as an address or zip code and joining it with map ready data sourced from a third party.

For example, you may have a spreadsheet that contains the number of sales made by county. In that case you will need to source county map data and join it with your spreadsheet using a desktop GIS system, a process we will look at in more detail later.

The good news is that much of the most useful map data is freely available. The bad news is that there isn’t a single authoritative source of map data.

Usually locating data just requires a search engine and a little time and patience.

When looking for data, it’s best to start with Google. Search for the name of the dataset you want followed by the word Shapefile, e.g. “Texas Counties Shapefile”.

Searching for “texas counties shapefile”

The map data will normally be provided as a zip file containing the various files that make up the Shapefile.

Often the only way to really see what a dataset contains is to download it and open it with a desktop GIS. We will be looking at the most popular desktop GIS systems later in the book.

If you don’t have any luck with Google then the next step is to be more direct.

In North America and Europe, most national and local government websites will have a GIS section that contains GIS data available for download. These sites are often not very user-friendly, so finding and downloading things can be a challenge.

Finally, if all else fails you can try contacting people who might have access to the data and asking if you can use it.

For example if you wanted to map the forests of California and couldn’t find the relevant GIS data, you could try emailing the forestry department to see if they can share the data.

Most open data available for download online can be used freely, but check the licence first, as many open licences require you to credit the source. Even when they don’t, it’s polite to credit the source of third party data or to ask the owner for permission.

Preparing GIS Data

Later in the book I will show you the exact steps for preparing map data using a desktop GIS. But first I want to introduce the general concepts.

So let’s just jump straight in and take a look at the most common data preparation tasks.

Table Joins

Here’s the scenario. You have a spreadsheet that contains your sales data listed by zip code, and you would like to build a quantity map to visualize the distribution of the sales.

To do this you are going to need to find zip code map data and then join your spreadsheet to the map data so that it can be used in a GIS.

So first you go and find the data by Googling “Shapefile US zip codes”.

Once you have sourced the data you would open it in your desktop GIS and use the table join features to merge the datasets.

The table join works by choosing one column in the map data and one in the spreadsheet that contain the same unique values, e.g. the zip code.

The GIS system will then find all records that match and join them together by appending the columns from your spreadsheet to the end of the attributes for the map data.

Joining spreadsheet data with geospatial data
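Behind the scenes, the join is a lookup on the shared column. The Python sketch below is a minimal, hypothetical illustration: the `ZIP` and `zip` column names, the sample values and the `<polygon>` placeholders are made up. It performs a left join, keeping every map feature and giving unmatched features empty (`None`) values, which is how desktop GIS joins typically treat records with no match.

```python
# Hypothetical map data: one feature per zip code (real geometry replaced by a placeholder).
# Zip codes are kept as text so leading zeros (e.g. 02134) are not lost.
zip_features = [
    {"ZIP": "73301", "geometry": "<polygon>"},
    {"ZIP": "75201", "geometry": "<polygon>"},
    {"ZIP": "77001", "geometry": "<polygon>"},
]
# Hypothetical spreadsheet rows sharing the same zip code values.
sales_rows = [
    {"zip": "73301", "sales": 120},
    {"zip": "75201", "sales": 95},
]

def table_join(features: list[dict], rows: list[dict], feature_key: str, row_key: str) -> list[dict]:
    """Left join: append the matching spreadsheet columns to each feature's attributes."""
    extra_cols = [c for c in rows[0] if c != row_key] if rows else []
    # Index the spreadsheet by its join column; assumes the join values are unique
    by_key = {row[row_key]: row for row in rows}
    joined = []
    for feature in features:
        match = by_key.get(feature[feature_key])
        # Unmatched features keep their geometry but get empty (None) values
        extra = {c: (match[c] if match is not None else None) for c in extra_cols}
        joined.append({**feature, **extra})
    return joined

for f in table_join(zip_features, sales_rows, "ZIP", "zip"):
    print(f["ZIP"], f["sales"])
```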

Geocoding Addresses

Another common scenario is the need to map addresses.

The process of transforming an address into a coordinate is called geocoding and is offered by most desktop GIS systems.

It works by passing the address to a geocoding service that stores data about the exact location of addresses for most developed nations globally.

The geocoder will then output a Shapefile that will contain your spreadsheet data in the form of attributes and a point on the map for each record in the new dataset.
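The overall flow can be sketched in Python, with one big caveat: a real geocoder is a remote service with its own address database, matching rules and usage terms, so the sketch below replaces it with a tiny hypothetical lookup table. The addresses, coordinates and function names are invented for illustration. What carries over is the shape of the result: each matched row becomes a point that keeps its spreadsheet columns as attributes, and addresses that cannot be matched are set aside for checking.

```python
# Hypothetical stand-in for a geocoding service. A real geocoder is a remote service
# with its own address database; these addresses and coordinates are invented.
FAKE_GEOCODER = {
    "1 example street, springfield": (-89.65, 39.80),
    "2 sample road, springfield": (-89.64, 39.78),
}

def normalise(address: str) -> str:
    """Lower-case and collapse whitespace so trivial differences still match."""
    return " ".join(address.lower().split())

def geocode_rows(rows: list[dict], address_col: str = "address") -> tuple[list[dict], list[dict]]:
    """Turn spreadsheet rows into point features; return unmatched rows separately."""
    points, unmatched = [], []
    for row in rows:
        coords = FAKE_GEOCODER.get(normalise(row[address_col]))
        if coords is None:
            unmatched.append(row)
        else:
            # Spreadsheet columns become attributes; the coordinates become the geometry
            points.append({**row, "geometry": {"type": "Point", "coordinates": coords}})
    return points, unmatched

rows = [
    {"address": "1 Example Street,  Springfield", "sales": 10},
    {"address": "99 Nowhere Lane", "sales": 3},
]
points, unmatched = geocode_rows(rows)
print(len(points), "matched;", len(unmatched), "need checking")
```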

Filtering a Dataset

Often map data sourced from third parties will contain features outside of your area of interest.

For example, you only want to map counties in Texas, but the counties Shapefile provided by the US Census Bureau contains every county in the country.

The task here is to remove all of the features from the dataset that you aren’t interested in.

In a GIS this is done by selecting features using an expression or query.

For example: "STATE" IS NOT 'Texas'

In QGIS this expression would select all of the counties in the dataset that have a STATE attribute value other than Texas. Once they are selected, removing them is usually as simple as hitting the delete key.

Using an expression to remove features in QGIS
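The same select-then-delete logic can be expressed in a few lines of Python. This is only an illustration: the `NAME`/`STATE` columns and the three sample counties are hypothetical records, and the real Census Bureau file may name and code its state column differently.

```python
from typing import Callable

# Hypothetical county records; the real Census Bureau columns may differ.
counties = [
    {"NAME": "Travis", "STATE": "Texas"},
    {"NAME": "Harris", "STATE": "Texas"},
    {"NAME": "Cook", "STATE": "Illinois"},
]

def delete_selected(features: list[dict], is_selected: Callable[[dict], bool]) -> list[dict]:
    """Mimic 'select by expression, then delete': keep only the features not selected."""
    return [f for f in features if not is_selected(f)]

# Python equivalent of the selection  "STATE" IS NOT 'Texas'
texas_only = delete_selected(counties, lambda f: f["STATE"] != "Texas")
print([f["NAME"] for f in texas_only])  # ['Travis', 'Harris']
```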

The above examples are just an introduction to the concepts. Later in the book we will show exactly how to perform these actions using a free desktop GIS system called QGIS.

Read on for the fundamental knowledge you’ll need to pick the right GIS solution for you.