Making Government Data More “Hack”able
At Google, we think it’s pretty awesome that the government is holding a contest to mash up government data. As a company with a lot of APIs, we love when people use them to make mashups, and as a company with a mission of making data universally accessible and useful, we love to see governments opening up their data. So we’ve arranged a couple of events in support of the contest. We held a 3-hour “MashupAustralia HackNight” on October 14th, we’re holding another one tonight, and we’re hosting the OpenAustralia HackFest from Nov 7-8. At our first hack night, we started off with talks on the contest, mashups and APIs, and putting data on maps. Then, since we conveniently had a representative from data.australia.gov.au at the event, we took the opportunity to search through their database and find useful datasets. We found a couple of really good ones (the NSW Crime set and the Victoria Internet locations set), but we also found a lot of sets that were really hard to use. Since part of the goal of this contest is to figure out what characteristics define a useful dataset, and to encourage governments to adopt those, I thought I’d take this opportunity to give a few basic tips:
- Format: It is generally not a good idea to share data in a binary format. Binary is more compact, but it is less accessible to developers. The best format is an API (REST or XML-RPC) or, more simply, an RSS feed with all the entries. The next-best format is a well-structured CSV or spreadsheet, as many database systems can easily import those. If you are going to use a more obscure format, provide tips on how to use it. (This is something that the data.australia.gov.au site could also provide.)
- Size: Some data sources provided zip files that were around 300 megabytes. Most developers aren’t going to download 300 megabytes if they don’t know what the data looks like, and what makes up that size. If you are going to provide a large file, I suggest also providing a preview file.
- Geo data: The vast majority of the data sources are related to geographic regions or points, but the vast majority also didn’t provide enough geographic data. If possible, you should provide both the address and the latitude/longitude coordinates. If the data describes a region, provide an array of coordinates. A great example of this is the NSW fire feed, which provides an address, a point, and a polygon.
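To make the size and geo tips a bit more concrete, here is a rough Python sketch of what a publisher could do. It is only an illustration under my own assumptions: the function names `write_preview` and `geo_record`, the column names, and the record layout are all hypothetical and do not come from any dataset on data.australia.gov.au.

```python
import csv
import io
import json
from itertools import islice


def write_preview(source, preview, max_rows=20):
    """Copy the header and the first `max_rows` data rows of a CSV.

    `source` and `preview` are text file objects. Returns the number
    of data rows written (the header is not counted).
    """
    reader = csv.reader(source)
    writer = csv.writer(preview)
    header = next(reader, None)
    if header is None:
        return 0  # empty source, nothing to preview
    writer.writerow(header)
    rows_written = 0
    for row in islice(reader, max_rows):
        writer.writerow(row)
        rows_written += 1
    return rows_written


def geo_record(name, address, lat, lon, region=None):
    """Build one entry carrying an address, a point, and an optional region.

    The layout is a made-up example, not a standard format.
    `region` is a sequence of (lat, lon) pairs outlining the area.
    """
    record = {
        "name": name,
        "address": address,
        "point": {"lat": lat, "lon": lon},
    }
    if region:
        record["region"] = [{"lat": p[0], "lon": p[1]} for p in region]
    return record


# Demo with invented in-memory data standing in for a large download.
full = io.StringIO(
    "name,address,lat,lon\n"
    + "".join(f"Site {i},{i} Example St,-33.8{i},151.2{i}\n" for i in range(10))
)
preview = io.StringIO()
rows = write_preview(full, preview, max_rows=3)

record = geo_record(
    "Site 0",
    "0 Example St",
    -33.80,
    151.20,
    region=[(-33.80, 151.20), (-33.81, 151.20), (-33.81, 151.21)],
)

print(preview.getvalue())
print(json.dumps(record, indent=2))
```

The preview keeps the header row so a developer can see the column names before committing to the full download, and each record carries the address, the point, and the region together so that nobody has to geocode the data themselves.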
These are simple suggestions, but they can make a world of difference in terms of making data useful. We hope to see more government agencies opening up their data for developers and evaluating how they’re doing so. But we also hope to see developers using the current data as much as possible, and coming up with more ideas. Please join us at one of our future events!