Welcome to Part 5 of Mango's video tutorial series on sourcing and preparing map ready data.
The series will help you to source third party geospatial data for your mapping projects, clean and prepare data for upload to an online GIS mapping platform, and visualizing your data on interactive web maps that can facilitate decision making and uncover hidden trends that can benefit you and your organization.
This video will teach you how to take your spreadsheet data that contains regions such as state names, zip codes, or county names, and join that spreadsheet data with a spatial file containing the geographic shapes of those regions to create an intereactive web map.
If this is your first time viewing this series, we recommend you start with Video 1: Why Maps are Superior to Spreadsheets.
To complete these tutorials, you'll need to download QGIS, and sign up for a free 30-day trial of Mango.
Watch the next video:
How to Map Addresses From an Excel Spreadsheet
Watch other videos in this series:
- Video 1: Why Maps are Superior to Spreadsheets
- Video 2: How to Find Data for Your Maps
- Video 3: How to Clean Up Your Map Data
- Video 4: How to Stack Layers on Your Map
- Video 5: How to Map a Spreadsheet Containing Regions or Areas [WATCHING]
- Video 6: How to Map Addresses From an Excel Spreadsheet
Exclusive freebie! The Beginners Guide to GIS eBook. 108 pages, completely free.
Grab your copy today.
VIDEO TRANSCRIPT
Hello and welcome back!
In the previous video I showed you how to search for, source, and download existing map datasets.
So what we did is we went and looked for a data set that contains a map data for counties in the US, and we sourced that data because we wanted to join it with our sales spreadsheet that we see on screen at the moment that contains our sales data for Texas.
So what we're going to show you in this video is we're going to show you how to join those two data sets together using QGIS.
So before we begin, I'm going to help you to understand how a join works. Now a join works by taking two different data sets, and matching a record, an attribute for an individual record in both datasets.
For example, we're going to be taking a look here at Potter County.
So potter county has an ID and the ID is 48375. Now if we go to QGIS and we select Potter County and we look at the attribute data, we can say it has a name, “Potter”, and it has an ID 48375.
So this means that there are two columns that we can use to join this data together, because there are two columns in both datasets - the spreadsheet and the Shapefile that contain the same data. One is the name and one is the ID.
Now, whenever we're doing joins, we always prefer to use an ID, and the reason for this is an ID isn't ambiguous.
A name can often be misspelled or there can be various letters that come before a name or after a name, and occasionally we'll have a data set where there will be multiple records for the same name - for example counties in the US there are multiple Orange Counties, so the join won't be able to complete correctly because it won't know which Orange county to join to, so we always prefer to use an ID.
So that's how a join works: QGIS is going to look through the spreadsheet and it's going to find, it's going to go through each of the records in the spreadsheet it's going to look at the ID and then it's going to find a record in the map data so that has the same ID, and when it finds it it's going to take those columns from the spreadsheet and it's going to append them to the attribute data for that specific county, or feature as we like to say in GIS, in the map data.
Okay so let's jump in and show you exactly how that works on a practical level.
So we already have QGIS it's open; we already have our County shapefile data open, so the next thing we need to do is we need to open our spreadsheet in QGIS.
Now we do this by clicking on the icon in the left toolbar the “Add Delimited Text Layer” icon, and then we need to go to our spreadsheet. So our spreadsheet has to be uploaded to QGIS in a format called CSV which stands for comma separated values.
CSV is a format that you can export from all spreadsheet programs, so whether you're using Excel or OpenOffice, just go to your spreadsheet, right click and choose save as and then choose the format CSV.
So you can have to change your spreadsheet into a CSV before you can upload it to QGIS.
Now I've already done that, so I have the sales data here as a CSV. I'm going to press open, and then what I see is this dialog box it shows me a preview of the data, so we can see it right that's the correct data; but the OK button is currently grayed out - I can't move forward.
The reason for this is QGIS is a mapping program.
It’s looking in this data and it's trying to find some geometry so it knows where to place this data on the map, but our spreadsheet doesn't contain any geometry, so we need to choose the “No Geometry” option here.
Once we've done that we can press OK, and we can now see that we have the sales data and the county data.
Now that we have both datasets open and available in QGIS, let's take a look at how we do the actual join.
We do that by right-clicking on the county data, so we always right click on the map data - not the spreadsheet, and then we click on properties and then in the left hand so while we go to Joins.
We press the Add button at the bottom and now it will give us our options.
So we're going to Join later - the county layer, to the sales data and the fields that we're going to use for the join. Now if we remember in our spreadsheet - let's go back to the spreadsheet - our ID was contained in a column called FIPS, so in QGIS, we're going to do the join using the column FIPS, and the equivalent column in the map data as you remember was GEOID.
So those are the two columns we're going to use for the join, and we're going to choose which fields are joined. Wow both datasets have the name and have the ID so we don't need to append those.
What we're interested in appending to the map data is the sales, the population, and the income.
The final option is this Custom field name prefix.
We only need a prefix if we have column names in both data sets that have exactly the same name.
That's not the case for us, so we're going to delete the prefix, and we're going to press OK, and the little ticks as it's done and it's saved in memory and we can press OK.
So, we’ve pressed OK and nothing's happened. So how can we tell whether our join has actually worked?
What we can do is we can right click on our County's data, we can say open attribute table, and if we scroll across will see that we now have three new columns in our map data: sales, population, and income.
Now a lot of them have null values because remember our County data is for the whole of the US, but we only have sales data for the counties that are in Texas.
OK, so if we order them by the number of sales all of the counties in Texas will appear at the top here, so we can see that our join between these two data sets has been successful.
Now this data set is going to need some cleaning. It has a lot of columns that we're not actually interested in, and it also has features for state features and counties outside of Texas that we're not also interested in, but we're going to look at cleaning up the data in a later video.
For the moment, I’m going to show you that we now have a working dataset. We have a data set that we can upload some Mango and do something meaningful with.
Now before we go ahead and upload that to Mango, we actually need to save the join.
When you first perform a joint in QGIS, it's only saved to memory; it's not saved the physical file so what we're going to do, is we're going to right click on the map data we're going to press Save as…, and we're going to save this as a Shapefile called “Sales Data”, and we are going to save it in the same folder as the counties.
So I'll press save, and our map data is now saved and it's now been added to the map.
Now our old County data that has the joins in memory, we don't need that anymore, so we can just go ahead and close that down, and now we have access to the sales data which is displayed on screen.
So right, let's go ahead and let's upload that data now to Mango to show you that we have a working dataset. So here we have a new map in Mango.
I'm going to go to the layers, I'm going to click the Add Layer button, and now I’m going to go to my local file system, and I’m going to get the new sales data that we made.
Now in Mango we only need to upload for parts of the Shapefile we need the SHP, the SHX, the DBF and the PRJ. Once we have those four parts I’ll press open and Mango will begin the upload.
Once the upload is complete we just hit the Save key, and then Mango is going to ask us how we would like to style this data.
So before we go ahead and style it, let's just close down this dialogue, and let's take a look at what we've got. OK, so here we have a map of the whole of the USA, and it has all of the counties all colored in blue.
Now that's not really what we're looking for. Remember our original map had each of the counties colored based on the number of sales, population, and household income - only for Texas.
So to do that in Mango, let's go back into the layer and open up the settings again, and we're going to choose a quantity map because we want to show each of the counties based on the number of sales.
So we're going to click on quantity, and then we're going to decide which column we'd like to base that quantity legend on. We're interested in sales.
We're going to have seven class breaks in the legend, and we're going to use a yellow to red color ramp.
I'm just going to change the outline to a darker color that will work better with that red color ramp, and then we go ahead and press Save.
Now as you can see with color-coded only the counties in Texas.
The reason only the counties in Texas have been color-coded is because all of the other counties has null values for their sales data - it's only the counties in Texas that contain any sales data, so that have been colored in on this map.
So as you've seen the process of downloading some map data, joining with your spreadsheet and then opening up in Mango in order to make quantity based maps, is an extremely simple process.
I recommend you to go ahead and try this yourself using QGIS and your own data so you can become familiar and comfortable with the process.