One of the ideas that we at Open Knowledge Melbourne had as an activity for our weekly meetups was to set aside some time for people to browse the various Australian open data portals and assess the quality of the data sets they hold. The purpose is to make the community aware of whether the data set they’re looking to download would be immediately machine readable or would require some cleaning up, as well as how easy it is to download the data.
This goal became a bigger deal for me when I tried to download some geospatial data from DELWP on the GovHack weekend, only to find that I had to download it via the Spatial Datamart, which was sporadically inaccessible and which delivered the link to the downloadable data to my email inbox only after a delay. (Apparently the Data Vic custodians made this clear via Twitter several days in advance, but I guess I missed the memo.) This would have been a fine platform when the data was being used almost exclusively by internal government departments who were happy to wait a couple of days for the data to appear, but for somebody starting a hackathon and needing access to the data right now to start their project, it is somewhat of a pain. To be clear, I don’t blame the data owners for not providing better access to the data; they’ve got limited resources, and anything they can give us is better than nothing. But I think it’s important to make clear to the community that they’ll have to deal with these hurdles, so they know ahead of time, and also to highlight which data owners it’s most worthwhile offering help to in order to make their data more accessible.
Anyway, with all the amazing speakers we’ve had at our meetups, as well as the preparation for GovHack, this plan kind of fell by the wayside. Now, though, we’ve got some time, and I suggested that in future we could perhaps allocate just 10 or 15 minutes of most nights to each picking a data set or two, trying to download it and having a quick look to assess its integrity. So in an attempt to get the ball rolling, I’ve created a Google Docs Spreadsheet to start listing some of the slightly more tricky data sets. Ideally the findings from this spreadsheet can, and should, be added as comments to each data set, but I figured a central location for us to contribute to was a good place to start. Currently the document requires specific permission for users to edit it; I’m not sure whether it’s worth allowing the entire world to make edits to it. In any case, if you want access, you can click the Share button in the top-right corner and press the Request Access link in the bottom-right of the pop-up to request edit access from me.
This is a best-effort work in progress, and I’ve no idea how it’ll go, but let’s give it a shot!