So it’s Wednesday night and I’m feeling a little left out because I seem to be the only person in web mapping yet to make a map related to the Winter Olympics in Sochi.
So I bit the bullet and here’s what I made:
I thought I would put together a quick post about how I made this map as the data preparation part was pretty interesting.
I couldn’t find a Shapefile with the data already nicely prepared for me, so I knew I would have to roll my own. I had already found a table containing the data on Wikipedia and located a Shapefile of country boundaries. The Shapefile already contained the three-letter country codes, so to get the medal data into it I needed to create a spreadsheet of the medal data along with the country code; then I could join the two datasets together using QGIS. I also wanted to show each country’s flag in the popup window on the web map, so I would also need to add a link to an image of the flag for each country.
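To make the join concrete, here is a rough Python sketch of the same idea outside QGIS: match each country feature to its medal row by the shared three-letter code. The column names (`code`, `gold`, `silver`, `bronze`, `flag_url`), the placeholder codes and the numbers are all assumptions for illustration, not the real spreadsheet or real medal counts.

```python
import csv
import io

# Hypothetical medal table keyed by three-letter country code.
# Column names and values are illustrative placeholders, not real results.
MEDALS_CSV = """code,gold,silver,bronze,flag_url
AAA,3,2,1,https://example.com/flags/aaa.png
BBB,0,1,4,https://example.com/flags/bbb.png
"""

# Stand-in for the Shapefile attribute table (one dict per country feature).
countries = [
    {"name": "Country A", "code": "AAA"},
    {"name": "Country B", "code": "BBB"},
    {"name": "Country C", "code": "CCC"},  # no medal row, so nothing to join
]


def join_by_code(features, medal_rows, key="code"):
    """Left-join medal rows onto features using the shared country code."""
    lookup = {row[key]: row for row in medal_rows}
    joined = []
    for feature in features:
        match = lookup.get(feature[key], {})
        merged = dict(feature)
        for field in ("gold", "silver", "bronze", "flag_url"):
            merged[field] = match.get(field)  # None when no medal row exists
        joined.append(merged)
    return joined


medal_rows = list(csv.DictReader(io.StringIO(MEDALS_CSV)))
joined = join_by_code(countries, medal_rows)
for row in joined:
    print(row["code"], row["gold"], row["flag_url"])
```

Countries without a medal row keep empty values rather than being dropped, which is what you want if every country should still be drawn on the map.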
Although preparing this data isn’t difficult, it certainly is tedious and time consuming, so I decided to try letting someone else do it for me.
Enter Mechanical Turk
Mechanical Turk is a beta service from Amazon, part of the Amazon Web Services suite of products which now powers half the internet. It allows you to create a web form with a series of questions and provide a spreadsheet (CSV) of the items, features, places etc. that you want the form completed for. Each form is known as a HIT (human intelligence task).
The form was pretty easy to make: it involved some very simple HTML, and a sample template was provided to copy from.
Once you’ve made the form you start a new “batch”. A batch is a spreadsheet of the records you want processed. For me the batch was a CSV containing the names of the 38 countries that are still around and have ever won a medal at the Winter Olympics. As the “requester” you can say how much you are willing to pay the “workers” for each HIT in the batch and how long you will allow for it to be completed.
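A batch file like this is simple enough to generate with a few lines of Python. The sketch below is only an illustration: the column name `country` and the helper `build_batch_csv` are my own assumptions, and only three of the 38 countries are shown. As far as I understand it, the header row supplies the variable names that the form template can refer to (for example `${country}`), and each following row becomes one HIT.

```python
import csv
import io


def build_batch_csv(countries, column="country"):
    """Return CSV text with a header row and one row per HIT."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column])  # header: assumed template variable name
    for name in countries:
        writer.writerow([name])
    return buffer.getvalue()


# A short sample; the real batch listed all 38 countries.
sample = ["Norway", "Canada", "Austria"]
batch_csv = build_batch_csv(sample)
print(batch_csv)
```

Using the `csv` module rather than joining strings by hand means country names containing commas or quotes are escaped correctly.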
First Attempt: Fail
For my first attempt I made a bid of 20¢ per HIT, and gave a three hour time limit. Almost as soon as I’d submitted it I could see people going to work on the task and I could also see the results as they became available.
Straight away it became clear that some workers had misunderstood the task: they were entering the combined medal tally for the Summer and Winter Olympics. In the results I accepted the correct ones by pressing a button and rejected the incorrect ones. Almost immediately I started getting messages through the Mechanical Turk system from workers who weren’t happy about the rejection, as they said my HIT definition wasn’t clear enough.
It all started to look like a bit of a mess so I cancelled the batch and wrote off the $10 I’d already paid.
Second Attempt: Success
Now knowing that the form needs to be extremely clear in order to get the best results, I went back, added bold font and capitals to the important parts, and started the batch again, this time raising the bid to 30¢. This time the result was amazing: within 20 minutes the batch was complete and there were only a couple of tiny errors that I could easily clean up myself.
It was very impressive. Not only was this system much cheaper in real terms ($11.40) than me doing it myself, it was also much faster.
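For anyone wanting to estimate a batch before submitting it, the $11.40 figure is just the number of HITs multiplied by the bid. A minimal sketch, assuming one worker per HIT and ignoring any service fee Amazon may add on top (the function name `batch_cost` is hypothetical):

```python
from decimal import Decimal


def batch_cost(hits, reward_per_hit):
    """Worker payments for a batch: number of HITs times the reward per HIT."""
    return Decimal(hits) * Decimal(reward_per_hit)


# 38 countries at 30 cents each, as in the second attempt.
second_attempt = batch_cost(38, "0.30")
print(second_attempt)  # 11.40
```

`Decimal` is used instead of floats so that money values come out exact.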
- Make your instructions EXTREMELY clear
- Raise your bid to get faster results
- Check and recheck the form before you start the batch
This was really just a test to check out the capabilities of this service. Next time I would like to try it with a much larger dataset to see whether these results scale.