This site was developed to support the Government 2.0 Taskforce, which operated from June to December 2009. The Government responded to the Government 2.0 Taskforce's report on 3 May 2010. As such, comments are now closed but you are encouraged to continue the conversation at

Data, data everywhere but not a scrap of sense

2009 November 16
by Pip Marlow

It was exhilarating to see the enthusiasm around the GovHack event as hordes of developers enjoyed pulling together data sets in new and innovative ways. The results will certainly provide enthralled users with not only access to, but also insight from, the resulting information combinations.
It was also heartening to see Pamela Fox provide some best-practice tips for developers and data owners in her post stressing the value of structure and standardisation where possible. But I was reminded yesterday, in a discussion about social software, how much of our total information is now in an unstructured format, where the value lies in the ability to understand the context and meaning of data and its relationship to other information – something that is not supported in a nice neat way.
This became apparent at the Public Sphere event that Pia Waugh championed earlier this year where everyone struggled to consolidate the extremely valuable – but vast and unmanageable – variety of input in all sorts of different forms. Oral, written, blog posts, tweets, videos,… and many more.
The team did a great job of pulling together a useful summary and set of recommendations, but I was left thinking that the increasing torrent of data is leading to diminishing returns: individuals first try to monitor the real-time fire hose of information, and then, as they pause to reflect and analyse, struggle to derive value from such a range of inputs.
So, what am I saying here? Basically that the agenda of Gov 2.0, and of the whole project of providing transparency and openness in government data, cannot be met unless we deal with the challenge of finding the “jewels”, the “gems”, in the unstructured data itself. Given that we have a range of companies working with us on the Gov 2.0 project, and that we have recognised that utilising the power of semantic technologies will play a big part in addressing this issue, would it not be sensible and timely to integrate some of the processes already being developed into the way the Gov 2.0 Taskforce itself operates – the whole mantra of “eating our own dog food”? A radical thought, but perhaps one with some merit.

5 Responses
  1. 2009 November 16
    Kerry Webb permalink

    Or perhaps we could concentrate on giving the less-structured data that bit of extra polish to make it a little more of a gem.

    The Taskforce did a good job putting together the data sets, but a little more quality control and the contribution of some information specialists would have improved the situation.

  2. 2009 November 16
    simonfj permalink

    Thanks Pip, Kerry,

    Can I get a bit philosophic? Because we need to be. You state the aim of what we are attempting here quite clearly:

    the agenda of Gov 2.0, and of the whole project of providing transparency and openness in government

    I’ve left off the last word, data, as I think it throws everyone off the scent of common sense. You might believe that “utilising the power of semantic technologies is going to play a big part in allowing us to address this issue”. I don’t. To me, what Nic and Pia have done is far more important: they’ve used various technologies to improve digital engagement. (And I’ve said all I have to say by doing some introductions.)

    It’s hard, I admit, as we move from an age when we expected our .gov (and .edu) institutions to just ‘deliver a service’ to one where we can compile many of these services ourselves – if only we had the access rights into a few publicly funded databases. As I read Kevin’s latest entry, I can see it’s not just me that simply wants access, after which government can just get out of the way. I’ve tried to raise a discussion about AGOSP being a way in as well as a way out (of .gov databases), but I’m afraid all our .gov friends can see here is demands for another service to “be delivered”.

    The one thing I’m hoping for, as Nic and Pia offer their recommendations to a Minister (later this year), is that we might see the tools, which Kevin and Co will get their hands on when they’ve got their AAF credentials, used to include a few more ‘gem polishers’ – some on the domain, some in, and some in edus and govs in foreign countries. Then let them have a play without supervision. (And hopefully they might protect us from another global crisis.)

    The fun part, from what I can see, is going to happen when citizens realise that the databases are largely unnecessary, because most of the useful stuff has already been compiled by librarians in the edu space.

    BTW, you can’t

    integrate some of the processes that are already being developed into the way the Gov 2.0 Task Force itself operates

    Not until they are mature, at least, and that’s a while off. In the meantime we’ll just have to share the learning.

  3. 2009 November 16

    I’m sure there will be a wash-up (if not, I suggest we have a retrospective) now that entries to the Mashup Australia competition have closed. At the last minute a small group of my colleagues entered the competition – we always planned to, but we did our project and client work first and left it till the last minute to get a competition entry in. My reflection on the exercise is that while we have great developers, prototypes and even mature applications available to enable mashups, in the end it’s all about the data.

    You need to have people involved who are thinking about the data and what meaning they can generate from it
    I think the most powerful combination of people you can have when mashing data is a developer, to make the tools, hook into the APIs and present the results powerfully, and an analyst or statistician, who can help tell a story with the data or work out how it usefully relates to other things. I observed that the focus of the mashup comp has been towards developers, which I thought was interesting. This may have resulted from the “just make APIs, don’t make applications” school of thought. If you have APIs and everyone has to make their own applications, what do you get? 50 NSW crime maps, all using slightly different flavours of the same technology.

    Mashing up data requires something to join on
    The most common and interesting data types to join on are spatial or time based, though there are others. I wanted to work with the DEEWR university stats data but couldn’t find anything to join it to.
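    To illustrate what “something to join on” means in practice, here is a minimal sketch of joining two data sets on a shared spatial key. The postcodes and figures below are invented for the example, not real crime or population data:

```python
import csv
import io

# Two hypothetical CSV data sets that share a postcode column.
crime_csv = "postcode,offences\n2000,120\n2150,85\n"
population_csv = "postcode,population\n2000,24000\n2150,17000\n"

def rows(text):
    """Parse a CSV string into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

# Index one data set by the join key, then look each record up from the other.
pop_by_postcode = {r["postcode"]: int(r["population"]) for r in rows(population_csv)}

joined = []
for r in rows(crime_csv):
    pc = r["postcode"]
    if pc in pop_by_postcode:  # keep only postcodes present in both sets
        joined.append({
            "postcode": pc,
            "offences_per_1000": 1000 * int(r["offences"]) / pop_by_postcode[pc],
        })

print(joined)
```

    Without a column like postcode (or a date) common to both sets, there is simply nothing to line the records up on – which is exactly the problem with data sets that lack a spatial or temporal key.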

    Mappable data is more fun to work with
    Much of the data didn’t have spatial characteristics, which is a real shame because I bet that information is stored somewhere. Even postcode data can be useful (and I know the can of worms I just opened!).

    Most of the data needed a lot of work to get it into a mashable format
    I also started to work with the BITRE airline data. It required a lot of reformatting before any program would accept it. I ended up abandoning it, again because I couldn’t find anything interesting and readily available to join it with. There was data I could have used from the ABS, but it would have required too much reformatting.
    My favourite format is what the ABS call a ‘csv’ string – a CSV file with category columns followed by results. It can be easily aggregated and transformed to go into a variety of applications, and it fits the relational world I am used to. However, there are newer data-sharing formats that are also worthy of consideration, such as MDX and SDMX.
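    A quick sketch of why that categories-then-results layout is so easy to work with: aggregation reduces to grouping on the category columns. The states, years and student counts below are invented, not real ABS figures:

```python
import csv
import io
from collections import defaultdict

# Hypothetical ABS-style 'csv' string: category columns first, then the measure.
data = (
    "state,year,students\n"
    "NSW,2008,100\n"
    "NSW,2009,110\n"
    "VIC,2008,90\n"
)

# Aggregate the measure over one category (state), summing across the years.
totals = defaultdict(int)
for row in csv.DictReader(io.StringIO(data)):
    totals[row["state"]] += int(row["students"])

print(dict(totals))
```

    The same rows could just as easily be pivoted by year instead, or loaded straight into a relational table – which is what makes the layout so flexible.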

    Metadata was missing
    The dataset I worked with, the NSW crime data, had cells that had clearly been annotated at some point, but I don’t know what the annotations were. I also found an unexplained peak in the data. This is where formats like SDMX could be interesting, because they store metadata too.
    Even with the LGAs, I didn’t know which year the data related to. I used the ABS 2006 boundaries and they matched (the 2004 boundaries didn’t), so I assume it’s a valid match.

    All up, I really enjoyed participating in the mashup competition, and some of the entries are really, really good. I think ours was solid, but with more work on the data, and more time, I could have told a much more compelling story with it. And I know that any analyst or statistician already knows this!



    • 2009 November 16
      Jimi Bostock permalink

      A great set of posts, well punctuated by Kerry and Jo.

      They are right on the scent with the allusion to narrative and story. It is a line of thought that will challenge many involved in the brave new world of government. It is going to be hard for people without a narrative background and experience to start to think in the ways that Jo and Kerry are alluding to.

      I often like to think about the sorts of jobs I can imagine springing up as the digital revolution moves forward. I wonder about these ‘crafters of knowledge’, as I like to call them: people whose job it is to apply meaning to digital objects. They are going to be strange creatures, ones we have not yet met.

      In organisations, a question will start to come up about which ‘part’ of the org is going to be ‘in charge’. Is it the IT department, the library, communications, or is the answer all of the above? Anyone on this forum knows the answer: it’s all about convergence.

      Building on what Jo argues so well, these new knowledge and meaning crafters will be entirely useless without the key ingredient – the subject expert.

      I have often been accused of being just a techie (despite my zero technical skills), but you can be assured I am well aware of the crucial and symbiotic links between the mashers and the subject experts.

      So, not having been at GovHack, I would still say that if the one thing that came from it was that we need to get the people who can make a story from the ‘data’ in the same room as the wizards, then that is an outstanding, dare I say historic, result.

  4. 2009 November 18

    Yes, people have been talking about the semantic web for ten years now, but it has not yet eventuated in force because it is so difficult. However, the tools and processes are now becoming available to set up a community- and standards-driven domain.

    This has been spearheaded by Australia’s CSIRO in the geospatial domain, and our Australian company has also been involved. I’ve blogged about the process at:

    Cameron Shorter
    GeoSpatial Solutions Manager

Comments are closed.