This site was developed to support the Government 2.0 Taskforce, which operated from June to December 2009. The Government responded to the Government 2.0 Taskforce's report on 3 May 2010. As such, comments are now closed but you are encouraged to continue the conversation at agimo.govspace.gov.au.

Making more government data and information available

2009 August 21
by Ann Steward

How much support does Government need to provide when it releases government data?

This is one of the important areas for the Taskforce to consider and we would like to hear your views and ideas on this.

Metadata plays an important role in understanding the meaning of data and how it is used and managed. But are there other expectations from those who would like to see more data made available, such as:

  • retention specifications the agency will need to provide at the time of release, for example, formats;
  • details of where and when the data will be archived;
  • how long the data is likely to be captured;
  • how complete the datasets are;
  • whether it would be helpful to have a general policy, covering all government data releases, that sets out what support would be provided – for example, contact points for clarification on the data and its sources;
  • the role of disclaimers when releasing data and what they should cover;
  • and so on.

In looking beyond just text data, are support regimes considered prerequisites when, for example, images are released? And are they the same regimes, or is something new needed?

Are there issues that you have encountered, either with data or images, that the Taskforce should take into account as we form our recommendations to Government?

18 Responses
  1. 2009 August 21

    Ann, great to have you here.

    I am not sure what you mean by “support regimes”.

    Are you able to elaborate (or perhaps someone around you could)?

    • 2009 August 24
      Ann Steward permalink

      Just by way of example, would you expect to have a contact point – either online or by phone – to resolve any queries you might have about the data? Or to find out whether other related data sets might be released in the near future?

      • 2009 August 24

        Thanks for that, Ann – I understand now.

        You point to a very important question in regard to resources.

        I am making a submission today to the TF that speaks to that.

        Hopefully we can now move forward with the submissions and the AGIMO responses, etc.

        Good to be interacting, it’s a long way from London :)

      • 2009 August 24
        xtfer permalink

        I’d rather have an unsupported data set than wait for a Government Department to “support” it somehow. Additionally, there’s a lot of data in agencies which the agency probably can’t “support” in this sense.

        We should also distinguish between “understanding the data” support, and “how do I access your XML-RPC” type support. They are very different cases.
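
        A rough sketch in Python of the access side of that split – the endpoint and method name here are made up for illustration:

          import xmlrpc.client

          # Hypothetical XML-RPC endpoint; "access" support is mostly about
          # documenting this kind of mechanical detail, not explaining meaning.
          proxy = xmlrpc.client.ServerProxy("http://data.example.gov.au/rpc")
          datasets = proxy.listDatasets()  # hypothetical method on the service
          print(datasets)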

  2. 2009 August 22
    Kevin Cox permalink

    I do not think it necessary for the Taskforce to specify technical details of what support or standards are to be used; rather, it should establish guiding principles, with examples to illustrate them.

    An example set of principles is:

    Metadata will be stored with all data
    Metadata will be available electronically to all who are permitted to access the data
    Metadata definitions can be added, changed and deleted and the history is always available
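
    As a rough sketch of those principles (the field names are invented for illustration), a record could carry its own metadata, with each definition keeping a dated change history:

      import json

      # Each record travels with its metadata; definitions keep a history.
      dataset = {
          "data": [{"postcode": "2600", "population": 35000}],
          "metadata": {
              "postcode": {
                  "definition": "Australia Post four-digit postcode",
                  "history": [{"date": "2009-08-01", "change": "definition added"}],
              },
              "population": {
                  "definition": "Estimated resident population",
                  "history": [
                      {"date": "2009-08-01", "change": "definition added"},
                      {"date": "2009-08-15", "change": "clarified as an estimate"},
                  ],
              },
          },
      }

      print(json.dumps(dataset, indent=2))  # metadata is available with the data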

    What terms are used, and what will be kept for different data sets, will “evolve” through use. In other words, people will tend to use common terms, and where different terms are used, thesauri will evolve to cope with the differences. The critical thing is that it is a language that, like natural languages, is allowed to grow through use.

    What will happen is that some early adopters will set up some metadata, and then others will copy and modify it to suit their own purposes. They will tend to evolve towards standards, and where they do not agree on standards they will build bridges where necessary so that meaning can still be transferred.

    • 2009 August 22

      Excellent post Kevin – can’t do a big reply, got to get on with my submission to the TF. How is yours going? :)

  3. 2009 August 22

    Hi Ann,

    I agree with Kevin that it’s important not to be too rigid and analytical at the start of the process.

    The most successful complex systems evolve from simple systems following simple rules, rather than being devised originally as complex systems.

    Specifically addressing your points:

    # retention specifications that the agency will need to provide at the time of release of data, for example, formats;

    The specifications and structure should be built into the data. This is how machine-readable markup languages like XML work.

    Over time, if the data is extended or modified, the new and changed structures can be added within the data in such a way that they can be disregarded by systems expecting the original data, or those systems can be extended to support the changes.
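
    A small Python sketch of that property (element names invented for illustration) – a reader written against the original structure simply skips elements it does not know about:

      import xml.etree.ElementTree as ET

      original = "<record><name>Ann</name></record>"
      extended = "<record><name>Ann</name><agency>AGIMO</agency></record>"

      def read_name(xml_text):
          # Looks up only the element it knows about; anything new is ignored.
          return ET.fromstring(xml_text).findtext("name")

      print(read_name(original))  # Ann
      print(read_name(extended))  # Ann - the new <agency> element is skipped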

    # details of where and when the data will be archived;

    Data storage is cheap. Why should data ever be ‘archived’ in the sense of being removed from easy online access? I see the Google philosophy as the appropriate strategy for data storage – implement many cheap and redundant systems rather than a few high-cost (and expensive to maintain) solutions.

    If necessary (to save storage space on government-leased systems) use cloud-based technologies to host the data and keep a back-up in case of cloud failure.

    By all means archive as well – but keep the data available online. People are still meaningfully using data from the Domesday Book, and it’s not government’s place to second-guess how long data may be useful.

    # how long the data is likely to be captured;

    Speaking as someone who has reused government data for many years across a number of industries, the basic principle was always to use the most recent data available and extrapolate if necessary. Whether the data was continuing to be captured was only relevant for periodic updates or in live systems, which are generally designed to tolerate data that stops being updated as regularly as expected.

    In any case, the ability to advise how long a data set is likely to be captured is only relevant when the capture period is defined. My experience of government has been that many datasets are of variable capture duration, and decisions on how long a data set will be maintained are often made quickly and at short notice. In other words, there’s little warning when many data sets stop being collected.
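
    By way of illustration (the figures are invented), “use the most recent data and extrapolate” can be as simple as projecting the last observed trend forward:

      # year -> value; invented figures for illustration
      observations = {2006: 100, 2007: 104, 2008: 108}

      def extrapolate(series, year):
          # Project the trend of the last two observations forward.
          (y1, v1), (y2, v2) = sorted(series.items())[-2:]
          slope = (v2 - v1) / (y2 - y1)
          return v2 + slope * (year - y2)

      print(extrapolate(observations, 2010))  # 116.0, from the 2007-2008 trend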

    # how complete the datasets are;

    Frankly, this would not require any complexity to manage within the data itself. If there is missing data, leave the gaps marked with ‘Not Available’. When data is reused, the gaps will be managed by the destination system or person.
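
    A minimal sketch of what that looks like in practice (file contents and column names invented for illustration) – the consumer, not the publisher, decides how to treat the gap:

      import csv
      import io

      raw = "region,median_income\nNorth,52000\nSouth,Not Available\n"

      for row in csv.DictReader(io.StringIO(raw)):
          value = row["median_income"]
          income = None if value == "Not Available" else int(value)
          print(row["region"], income)  # the destination system handles the gap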

    For appropriate data sets, provide a mechanism for the community to complete them (and correct any incorrect data). The most complete and accurate data sets will always be those that have undergone the most scrutiny by external parties. For example, Hansard flaws are often identified first by OpenAustralia, and several state geospatial data sets have been enormously enriched by allowing users to point out errors.

    Having these errors corrected improves a data set’s completeness, thereby benefiting ALL users – and costing the government nothing.

    # would it be helpful to have a general policy, covering all government data releases that sets out what support would be provided – for example, contact points for clarification on the data and its sources;

    Only if this doesn’t take years to develop.

    The need for data is almost always RIGHT NOW, therefore any policies that require the government to spend years agreeing on them (such as metadata standards) defeat the purpose of providing the data.

    Better to release early and release often, correcting as you go, than to delay any release until an unachievable state of perfection is reached.

    The only policy that would be useful in this space would be a whole-of-government policy requiring government departments to publish their public data online, via their website and data.gov.au, in machine-readable formats (i.e. XML, GeoRSS, CSV or via an API) within X days of being collected.

    The users of the data can sort out any differences between data sets far more quickly than government could agree on standards (which might be too rigid for future data sets anyway).

    # role of disclaimers when releasing data and what should they cover;

    The role of disclaimers is to shift responsibility from the organisation providing the data to the organisations and individuals using it. This can be covered in standard terms in widespread use today.

    For example, simply rephrasing the ABS disclaimer (replacing ‘website’ with ‘data set’) pretty much covers what would be required.

    There are also very good terms at the data.gov site – yes, they reflect US law, but let’s not reinvent any more of the wheel than we have to.

    And the data can be released under one of the standard Creative Commons licenses already valid in Australia. In most cases either the Attribution-Noncommercial 2.5 Australia or the Attribution 2.5 Australia license will provide government with the appropriate protection and control over reuse.

    Overall, my view is don’t over-complicate and iterate quickly.

    I do know government is very sensitive about having data taken out of context, misrepresented or misunderstood.

    However all the evidence suggests that this occurs regardless of how many safeguards are put in place – providing more words does not increase understanding (like shouting at someone who does not speak your language).

    Greater safeguards also do not deter those who do not trust the figures or wish to misuse them.

    The only way to prevent these types of misuse is to hide the data altogether – whereupon people will either trust the government less or make up their own figures – both even worse situations.

    Cheers,

    Craig

    • 2009 September 2
      asa letourneau permalink

      “Overall, my view is don’t over-complicate and iterate quickly.”

      Great post Craig! Looking forward to data support provided by web 3.0 communities, which may be a lot more meaningful and realisable than that provided by government.

  4. 2009 August 23

    Ann, good thinking here. I agree with Kevin that, particularly given your timelines, it would be best for the Taskforce to come up with principles for data release as opposed to a refined rule set. That might be a job for later on.

    All the usual objections are likely to be raised about data misuse. They are false objections – data is misused now. Better that the license and metadata for any given data set point to a known, available and definitive set, so that if someone chooses to use data in a way that misrepresents it, you always have the option to say, “look, here’s the definitive data set; here’s what the data says in its definitive form”.

    While rights and licensing legalities obviously differ between Australia and the US, they seem to be doing an adequate job with data.gov and the way they’re managing data sets there. It’s an exemplar worth using as context for any further thinking you’re doing.

  5. 2009 August 24

    I have to again agree with SC.

    It is very important to always reflect on the real world and confirm whether any issues raised in a 2.0 context are also present in how things work now.

    Then that leaves the issue to be ‘solved’ in a total context. You can be sure that any 2.0 style of data release, based on XML, will adapt better to any changes in policy than physical methods.

    Ann and her colleagues across the APS are going to have to grapple with the central tenets of the new world: learning how to give up control, and accepting that honesty is actually the only policy that will survive the rigours of the new online world.

    People around Obama have been doing great work in this regard and have also documented well the challenges and how they have met them. As SC has suggested, the Obama folks see data.gov as a key part of the effort.

  6. 2009 August 24
    xtfer permalink

    I also do not think it’s necessary to set out too much at the beginning. That said, there are some crucial enablers that could be recommended, such as:

    1) A license which provides for reuse (such as Creative Commons)
    2) Recommendations for standards to use (such as Atom, RDF, etc)

    Other items such as retention, archiving, completeness, support and so on are going to vary greatly between data sets, or be irrelevant. For example, the concept of archiving data is spurious when that data is available through an API.

    Also, metadata is part of the dataset, if not the dataset itself.
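
    As a rough sketch of that point (titles and URLs invented for illustration), an Atom feed built with Python’s standard library carries its metadata as part of the data set itself:

      import xml.etree.ElementTree as ET

      ATOM = "http://www.w3.org/2005/Atom"
      ET.register_namespace("", ATOM)

      def tag(name):
          # Qualify an element name with the Atom namespace.
          return "{%s}%s" % (ATOM, name)

      feed = ET.Element(tag("feed"))
      ET.SubElement(feed, tag("title")).text = "Example agency data releases"
      ET.SubElement(feed, tag("updated")).text = "2009-08-24T00:00:00Z"

      entry = ET.SubElement(feed, tag("entry"))
      ET.SubElement(entry, tag("title")).text = "Postcode populations, 2008"
      ET.SubElement(entry, tag("id")).text = "http://data.example.gov.au/postcodes-2008"

      print(ET.tostring(feed, encoding="unicode"))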

    • 2009 August 24
      xtfer permalink

      While I was writing this, Craig Thomler covered most of my points in a far more concise manner. I agree with everything he said.

  7. 2009 August 24
    Mike Nelson permalink

    We don’t need to reinvent the wheel. The US and UK have already worked through these issues and both have workable (although different) solutions. The US has data.gov and the UK has the National Digital Archive of Datasets.

    I think we are also getting several issues confused. Web 2.0 is social networking. I cannot see the point of getting too hung up on transitory fads like Facebook and Twitter when research shows that most of the content is pointless babble. And, to be perfectly honest, I include some government Web 2.0 services in this.

    What we are talking about here is describing, publishing, using and linking data along with its meaning and context (rich metadata). This is what the Semantic Web (also called Web 3.0) is all about. The semantic layer is the next major step forward and this is where efforts should be focussed NOW, not 2-3 years after the horse has bolted and left us behind. data.gov is already doing just this.

    The tools are already there and have been since at least 1998. We just need to start using them.
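
    For instance (a sketch only, assuming the third-party rdflib package, with an invented dataset URI), describing a data set in RDF takes just a few statements:

      from rdflib import Graph, Literal, URIRef
      from rdflib.namespace import DC

      g = Graph()
      dataset = URIRef("http://data.example.gov.au/dataset/postcodes-2008")
      g.add((dataset, DC.title, Literal("Postcode populations, 2008")))
      g.add((dataset, DC.publisher, Literal("Example agency")))

      # Machine-readable statements about the data, ready for linking.
      print(g.serialize(format="turtle"))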

  8. 2009 August 24
    Craig thomler permalink

    Hi Ann,

    Regarding your follow-up question on support, the OpenAustralia experience is one to learn from. They have been attempting for months to clarify whether they can reuse state Hansard data sets, and the primary difficulty has been clarifying the licensing arrangements.

    If licensing is clear upfront and the data is structured using descriptive tags, many potential data queries would not need to arise.

    In other words, being clear upfront saves governments money, as public servants do not have to explain the data individually each time an explanation is requested.

    It could be seen as having the equivalent of a knowledge base rather than constantly training contact staff.

  9. 2009 August 25

    Hi Ann,

    I agree with Craig on this: the only real thing blocking the publishing of data at this time appears to be the perceived licensing issues. Many agencies have overcome this issue and have been openly publishing data sets for years.

    Some examples of these are listed here:
    Currently available spatial data sets
    ABS data sets
    Victorian Emergency Services data sets
    Climate data sets
    Environmental and heritage data sets
    Education data sets

    And that is just a few of them. I would suggest that talking to the agencies above, as well as OpenAustralia, and resolving the licensing misconception is the first big step.

    The W3C publishes and maintains the standards for the XML Schema format for data services; here is a link to those standards. The XML Schema language is well defined, clear and easy to use.
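
    As a rough sketch (assuming the third-party lxml package; both documents are inlined and invented for illustration), validating a record against a published schema takes only a few lines:

      from lxml import etree

      # A toy schema declaring a single string-typed element.
      xsd = etree.XMLSchema(etree.fromstring(b"""
      <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <xs:element name="postcode" type="xs:string"/>
      </xs:schema>
      """))

      record = etree.fromstring(b"<postcode>2600</postcode>")
      print(xsd.validate(record))  # True if the record matches the schema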

    Cheers,

    Rae

  10. 2009 August 25
    Ann Steward permalink

    Thank you all for your comments; you have again raised some very important points for the Taskforce to consider. It is particularly useful to have your ideas about what could and should happen sooner (for example, Stephen’s suggestion that Taskforce-developed principles would be of more immediate benefit than a refined rule set).

    The longer term considerations are no less critical, and these will also of course stay before the Taskforce as we review the submissions, commission some of the project ideas and work on the report to Government.

    • 2009 August 25
      Mike Nelson permalink

      One practical suggestion on something that could (and should) happen sooner rather than later is to trial a version of data.gov in Australia. There are already publicly available data sets which could be used in such a trial. Just put the data sets and metadata out there and see what happens.

  11. 2009 August 26

    Information about companies – particularly those in which superannuation funds invest – needs to be made freely available to the public in a co-ordinated way across the regulators: ASIC, APRA, ASX, etc.

    We need a continuous disclosure framework for government-funded programs and infrastructure projects.

Comments are closed.