This site was developed to support the Government 2.0 Taskforce, which operated from June to December 2009. The Government responded to the Government 2.0 Taskforce's report on 3 May 2010. As such, comments are now closed but you are encouraged to continue the conversation at agimo.govspace.gov.au.

Data.gov and lessons from the open-source world

2009 August 26
by Alan Noble

A previous blog post talked about what data government departments should be releasing. In this post I'd like to talk about how to release it.

One approach is to centralise things. For example, the US Government has established the Data.gov web site with the stated purpose of “increasing public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government [of the US]”. The UK Government is currently considering a similar approach. The goals are commendable, but in a sense, Data.gov adopts a traditional “Web 1.0” approach to the challenge of increasing access to public sector information (PSI). To use an analogy drawn from Eric Raymond’s “The cathedral and the bazaar”, Data.gov can be thought of as a data “cathedral”, which is to say a huge, ambitious, centralised undertaking.

Another approach is decentralised, and would be modelled on a “bazaar”. In this approach, government web sites scattered around the Internet would use Web 2.0 technologies to provide data and metadata in both human- and machine-readable formats.

In “The cathedral and the bazaar” Eric Raymond was of course describing software development. However, if you replace “code” with “data” and “developer” with “author”, the same principles apply, namely:

  • Users should be treated as co-authors, because having more co-authors increases the rate at which the data evolves and improves, i.e., user-generated content (UGC) plays a key role.
  • Release as early as possible, because this increases one’s chances of finding co-authors early and stimulates innovative uses of the data.
  • New data should be incorporated frequently, because, as above, this maximises the rate of innovation and avoids the cost of “big bang” style integration.
  • There should be multiple versions of each data set: a newer version that evolves quickly but may be of lower quality, and an older version that is stable and of higher quality.
  • Data sets should be highly modular, because this allows for parallel and incremental development.

The bazaar approach is flexible and economical and supports evolutionary change. It enables different government agencies to move at different speeds to open up public sector information, one data set at a time.

What about data discoverability, you may ask? Doesn’t a data portal make it easier? Well, users don’t expect to find all their news and entertainment at a single web site, so why would they expect to find all of their data at a single web site? The trick is to ensure that government web sites are discoverable and searchable, both technically (through open robots.txt files, site maps, etc) and legally (through friendly copyright provisions such as Creative Commons licences, etc).
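
For example, here is a minimal sketch (Python, standard library only, using a hypothetical agency URL) of the kind of check a crawler or aggregator might run to confirm that a data page really is technically discoverable:

  import urllib.robotparser

  SITE = "https://data.example.gov.au"  # hypothetical agency site

  rp = urllib.robotparser.RobotFileParser()
  rp.set_url(SITE + "/robots.txt")
  rp.read()  # fetch and parse the agency's robots.txt

  # A dataset page is only discoverable if crawlers are allowed to fetch it,
  # and ideally it is also listed in a site map declared in robots.txt.
  print(rp.can_fetch("*", SITE + "/datasets/rainfall-2009.csv"))
  print(rp.site_maps())  # Sitemap: URLs declared in robots.txt (Python 3.8+)

If a check like this fails, no amount of portal-building will make the data set findable by search engines or aggregators.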

Of course it’s not an either/or scenario. Data cathedrals can coexist with data bazaars, and perhaps different data sets are best served in different ways. A related question, though, is whether PSI platforms should be government operated at all, or instead left to the private sector or non-profit organisations.

What do you think? Should government departments embrace some of the principles of the open-source world in order to liberate public sector information?

71 Responses
  1. 2009 August 26

    Great to read your post Alan. When I saw you were on the Gov 2.0 Taskforce I was wondering where on Earth would you get time to blog about this stuff!

    After reading about the “Bazaar” approach – it certainly echoes the evolution of the Internet: decentralised, loosely coupled sites and services mashed up together. In contrast, you raise a good point about discoverability – and the Data.gov approach does a good job at that right now.

    As you suggest, it’s “not an either/or scenario” and in fact, I would say the best solution would take both approaches. If we go too far down the Data.gov path, we leave things solely in the hands of “official” government approaches, timings, costs and limitations. It’s not that they won’t do a good job, but rather that a much better job could be done by allowing in the principles of the open web and, as you say, of the open-source world “in order to liberate public sector information”.

    Take, for example, OpenAustralia.org. A simple service that re-publishes the Hansard in an enriched, searchable and interactive format. A classic example of “data cathedrals coexisting with data bazaars”, as you say. OpenAustralia.org now has an API, which in turn will allow an even more open approach to government.

    • 2009 August 26
      asa letourneau permalink

      Totally agree that it has to be a bit of both worlds, whereby non-gov communities generate data sharing/application opportunities in a way only they can, and, if they see a real opportunity for engagement and leverage, ‘re-release’ the data back to govt. Nurturing these informal sharing ‘loops’ alongside more traditional govt ‘initiatives’ will hopefully throw some new light on old landscapes and provide the impetus to explore new ones.

    • 2010 April 26
      Madeleine Kingston permalink

      Sherif

      I find myself on this page. Just as I was about to log out, I saw your comment to Alan’s article.

      People who are the busiest always find the time. The key seems to be motivation. Such a complex issue and so misunderstood and under-categorized.

      No wonder consumers and markets are mis-assessed. Most of us don’t quite fit into the boxes created for us.

      There is such a dearth of available economic data both in terms of behaviour on all sides of the fence and of market functioning.

      Those who do have data hide it under the proverbial bushel; or claim copyright and intellectual privileges, or don’t really want to share and collaborate. People speak of “information philanthropy.” That has a do-gooding flavour.

      I took Bunsen to task on Club Troppo for labelling people as “do-gooders”, though I must say I walked into the spider’s web without thinking, so deserved what I got, not without a small protest to clarify. Where are you Bunsen? I miss your teasing.

      The marketplace is changing daily. Consumers are becoming more discerning and sophisticated. The 21st century’s innovations and forward thinking pressures are upon us all.

      If we want to keep up we just have to find the time, opportunity and motivation, not necessarily in that order.

      Thank you Alan for speaking of how not what should be released.

      At a deeper level let us look at how collaborations should really work, not just how information exchange can occur.

      Gov2 implies embedding the deeper meaning – nay it promises to do so. Web2 is more about information availability. Though the two are closely linked they mean different things.

      Bring on the deeper meaning.

      Madeleine

      • 2010 April 28
        Madeleine Kingston permalink

        I felt so sure the blog above would be moderated, given my recent experiences of experimenting with openness, trust and moderation expectations in one blog forum or another. Don’t let me down. My standards are high. I walk away from negative relationships – always.

        As mentioned before, I have had some negative experiences lately of blogging. Does this faze me? No. What are relationships without trust and tolerance of alternative viewpoints? If they are not there, move on.

        Make word count and moderation policies clear from the outset. Don’t waste the time of committed stakeholders.

        Does it matter that I am forming relationships with “Faceless Bureaucrats.” No. (stand up designer of the graphic of that article – my feedback is really positive – just love it).

        Good to know that one can name pseudonyms, speak from the heart, make spelling mistakes, talk about the underlying meaning and purpose of Gov 2 (as opposed to Web2 – as in timely information provision in the moment).

        Is Gov2 up to a few mistakes? Is it up to being honest? Engaging with stakeholders in a meaningful way for the long haul? If so, I will still be around. If not, I will take the view that it was fun whilst it lasted, and move on to greener pastures.

        Gov 2 Taskforce if you are there, and notwithstanding my half-teasing comments about your Faceless Persona (all bets off if you misinterpret this remark)

        Hear me say this – speak up, let us know where things are at and whether the identified distinction between Web2 and Gov2 will be addressed in the interests of the improved governance and policy decisions that impact on the fundamentals of sustainability in every possible sphere.

        Look for timely identification of problems in these arenas before they become entrenched. Seek application of layered participation. Identify your target market, but don’t neglect other more incidental and short-term engagement prospects.

        I may joke, play innocent, use escapist jargon; make obscure literary references; flatter; cajole; gesticulate; seek guidance with technical gaps in both accessing and providing data.

        Bottom line:

        I am seeking a reciprocal and sustained dialogue – and adoption of improved national governance and policy. I am not concerned about how this is achieved, or what the budgetary constraints are, or what the delays may be. I just want to see this goal achieved.

        Adam Smith – I bow to you as a legend. But times move on. I will never forget you.

        Steve McShane and Tony Travaglione – your Organizational Behaviour on the Pacific Rim was and is an inspiration to me. Splendidly presented. But it is three years since it was published. We need an update.

        Already Maslow’s Hierarchy of Needs has been challenged – by McShane and Travaglione (2007), newcomers to the scene of Organizational Behaviour (on the Pacific Rim).

        What else should be questioned in terms of assessing human needs and expectations? It would be too smug to cite the favourite theorists of the past.

        Move with the times. Theorize in the here and now. Be prepared to reassess, re-evaluate, evaluate again – leave complacency behind. There is no room for it.

        Don’t be afraid to seek professional evaluative advice. This means before, not after, the event. Evaluation is a process, not an assessment of what has already occurred.

        I am looking for implementation of the deeper embedded message behind engagement and information exchange. I will be patient but also realistic – and I will be looking for deliverable and measurable outcomes that are timely and accountable.

        I am not in the right demographic category for your target market at all if you are looking for superficial “in-the-moment” information provision.

        I need and expect more – much more. Can you deliver it?

        Don’t box me or anyone else in your target audience. Each of us is unique. Acknowledge that, respect it and allow for it.

        Behavioural economics has for decades been under-evaluated and misunderstood, besides being an inexact science. That is why assessments of consumer behaviour and market performance so often go wrong.

        Have a look, for instance, at the history of poorly assessed competitive markets in the energy arena, as one example of low-involvement commodity markets in which “rapid churn” has been mistaken for competition in action. Go back four decades and more. Learn from the lesson of recurrent history.

        In 1979 Peter Applegarth, then Executive Member of the Queensland Council for Civil Liberties said:

        “The Government’s actions are motivated by fear.
        Fear that citizens will begin to tell the Government what the law should be, instead of the Government telling the citizens what the law is.”

        “All power is a trust handed to Government by the people. Any other power is usurpation.”

        Now in the year 2008, Government initiatives are seeking to receive input from stakeholders adversely affected by regulations, as evidenced by the philosophies embraced by the Productivity Commission’s Inquiry into Australia’s Consumer Policy Framework. There is a dearth of consumer input into inquiries such as this.

        There are cautions about the tactical shift by industry groups, at home and abroad, and pertinent questions as to whether such a shift is motivated by a confluence of self-interest. In the area of goods, the drivers are easy to name: growing competition from inexpensive imports that do not meet voluntary standards, and a desire to head off liability lawsuits and pre-empt tough state laws or legal actions that may have resulted from a laissez-faire response to policies in place.

        One interesting US example is the case of the Altria group, owner of the cigarette manufacturing firm Phillip Morris. The unexpected proposal was made by that group to allow the F.D.A. to regulate the manufacture and marketing of tobacco products.

        Such legislation is pending in the US. Critics are saying that this is a bid by Phillip Morris to weaken opposition to cigarettes by working with the government, and could help the company maintain its market share.

        Reducing regulatory burden is a long-time goal of the Productivity Commission in Australia as well as of other bodies. It is commendable if the outcomes for all concerned are equitable.

        The energy industry in Australia appears to be super-enthusiastic about the changes proposed, putting forward well-structured and plausible arguments in the interest of the least burdensome regulatory control. What will be the consequences for consumers?

        Rosario Palmieri, a regulatory lobbyist at the US National Association of Manufacturers, a body that has often opposed government regulations, is reported as observing the change with equanimity.

        The Director of Regulatory Policy at OMB Watch (a Washington group that tracks regulatory actions and monitors the Office of Management and Budget) has never seen so many industries joining the push for regulation. He poses a pertinent question: will this achieve a real increase in standards and public protections or simply serve corporate interests?

        On the US situation, Sarah Klein, a lawyer at the Centre for Science in the Public Interest, is seeking to examine the problems created by a failed voluntary system in the grocery store and produce grower segment.

        Ms Klein sees the situation as a strange-bedfellows one, where community organizations and watchdogs are seeking national regulatory frameworks for quite different reasons to those of industry players. Says Klein:

        “……industry officials, consumer groups and regulatory experts all agree there has been a recent surge of requests for new regulations, and one reason they give is the Bush administration’s willingness to include provisions that would block consumer lawsuits in state and federal courts.”

        It is more than interesting that some of this thinking is reflected in the conceptual model proposed by Allens Arthur Robinson in the Composite Working Paper National Framework for Distribution and Retail Regulation recommendations (proposed national template Law, energy).

        Some are saying that it is like Christmas in particular industries.

        However, many clauses are being challenged in the US courts where they block the inherent right of individuals to seek seamless redress through the courts and are not theoretically expected to rely on advocacy and alternative dispute models alone.

        In the New York Times Opinion article dated 16 September 2007, still on the subject of uniform regulation and in the case of toys, for example, mandatory testing is believed to be a good idea in principle. However, it is observed that…

        Each of us has needs and expectations that cannot just be boxed, categorized and subjected to the usual processes of statistical evaluation. Did Adam Smith know everything? How about Maslow?

        Read my tips on evaluation theory posted elsewhere on Gov2 and Club Troppo. Long-winded, I confess, but worthy of at least a second glance. Those were not my own ideas as freely confessed.

        They were the views of those whose expertise and writings have made me what I am by the mere existence of the availability of those views.

        I owe so much to so many who, by the mere existence of what they have written and made accessible, have influenced the whole course of my life. They are responsible for the passion that motivates me. There are too many to mention.

        Make no mistake. Some revered theorists are about to be challenged. How can anyone place people into categories simply to uphold outdated theories of consumer behaviour and market conduct?

        The 21st century has arrived.

        Consumers are just not what they used to be.

        Must fly. Duty calls.

        Catch me if you can. I too am a ‘Faceless Stakeholder’, the counterpart of the “Faceless Bureaucrat.”

        I don’t fit in a box and nor do any of your stakeholders.

        Forget about traditional marketing concept theories. Treat each of your stakeholders as individuals. The era of labelling people is outdated. Target, target, target is to marketing theory as location, location, location is to property.

        How about it guys? Capture the moment, but think sustainability?

        Walk in the shoes of your target audience and you will achieve.

        Make your goals sustainable, to say nothing of specific, measurable, achievable, realistic and timely – the traditional SMART principles of good marketing theory. Some of those old acronyms do survive the test of time.

        Choose carefully what stays and what goes. Not all theories and acronyms are sustainable. Toss out the textbooks that have not measured up and may need to be revised.

        Oops, my coffee break is over. I am way behind.

        Regards

        Madeleine Kingston

        mkin2711@bigpond.net.au

  2. 2009 August 26
    Bec permalink

    agree, take both approaches…

    centralised would be useful for web teams to see examples of what’s available etc,
    decentralised may be better for viewers.

  3. 2009 August 26
    Alexander Sadleir permalink

    Of course, as with open-source development, there’s the option that if you provide a cathedral model when the users want a bazaar, they’ll start their own. This is also a “risk” in the sense that if you don’t provide a way to involve users, they can’t share their refinements back, so the original data holders miss out on the added value.

    For example, the community for the OpenStreetMap project in the UK is incorporating the National Public Transport Access Node (NaPTAN) database. As the OSM community starts incorporating and checking the data, they are also improving it by finding “ghost stops” – locations that were once in existence/service but no longer are. Luckily, there is planning underway to make sure these (and any future) improvements are easily accessible to NaPTAN maintainers. This will be a huge benefit to both parties!

  4. 2009 August 26

    Alan, I agree with everything you’re saying. It’s very encouraging to see this kind of position coming from the Taskforce.

    I’m personally a fan of cathedral+bazaar=better than any one single option.

    Here in Australia, we could take an approach of offering a cathedral where datasets are aggregated in collections based on some thematic metadata (for the cathedral thinkers) as well as a bazaar where the latest and greatest, including added value from the co-creators, was available (for the bazaar thinkers).

    Offering any one option imposes a challenge for those that think the other way around. It’s much the same approach as you’re seeing at a lot of conferences now – mixed events where speakers offer punditry from the stage (cathedral) as well as unconference elements (bazaar).

    The cathedral needn’t be a huge undertaking, nor necessarily an actual repository, but merely an aggregator of locations of datasets and a place to see groupings of subject matter matches. Think of it more like a BitTorrent tracker for PSI – a valid rather than illegal use of the technology.

    What do you think?

  5. 2009 August 26

    You need an index, like Data.gov, but you also need it to be trivial to update.

    I’d suggest:

    Day 1: A wiki page, chock full of links, plus one or two interested folk to help garden it – ie: http://community.linkeddata.org/MediaWiki/index.php?ShoppingList

    * Low cost / setup effort
    * Small ongoing resource required – community participation possible

    Day 30:
    Encourage government data providers with blogs or twitter accounts to make use of a common tag: #gov20au or similar when announcing data sets.

    Day 45:
    Create a mailing list – datasets@australia.gov.au or a google group

    This works well on the public linked data mailing list (http://lists.w3.org/Archives/Public/public-lod/), as well as the sunlight labs mailing lists.

    Day 90: Encourage the use of Sitemaps (ala google’s or http://www.sitemaps.org/) ; plus where appropriate, Semantic Web descriptions – http://sw.deri.org/2007/07/sitemapextension/

    Day 365: Have a crawler which indexes all known government sites and produces the wiki link page, and diagrams, etc – ie http://linkeddata.org/ or http://pingthesemanticweb.com/

    Less tedious human work, more machines just solving problems for you.

    For datasets which are published as RDF, a lot of this already exists.
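
    A minimal sketch of what the “Day 365” crawler could start as – a plain sitemap walk over agency sites (Python, standard library only; the sitemap URL is hypothetical):

      import urllib.request
      import xml.etree.ElementTree as ET

      SM_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

      def listed_urls(sitemap_url):
          """Yield every page URL listed in a sitemap.xml."""
          with urllib.request.urlopen(sitemap_url) as resp:
              tree = ET.parse(resp)
          for loc in tree.iter(SM_NS + "loc"):
              yield loc.text.strip()

      # A real crawler would loop over many agency sitemaps and write the
      # results out as the wiki link page / diagrams mentioned above.
      for url in listed_urls("https://data.example.gov.au/sitemap.xml"):
          print(url)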

    • 2009 August 27
      Martin Stewart-Weeks permalink

      I can barely understand some of the technology in this, but it has a compelling architecture about it that seems to ‘go with the grain’. A judicious mix of clever machines and a huge, broad openness to the crowd…

  6. 2009 August 26

    @Alan – absolutely spot on!

  7. 2009 August 26

    My biggest worry is that the government’s response to this initiative will be the announcement of some $multi-million grand-gesture. Big press-conference; Minister announcing the ‘grand vision’; and the possible benefits we could see, lost in the maze that is large-scale government procurement.

    The key insight of CatB is the extent to which redundancy is a benefit in exploratory development.

    For the moment, we have no idea of the correct model for Gov2.0 – we have some understanding of what has worked outside of government, and a few promising avenues of approach, but no actual answers.

    So I think we want to recommend that different Agencies experiment with different approaches and that the OIC be tasked with:

    1. Examining the success/failure of the different attempts, and eventually start to help agencies improve the success rate.

    2. Ensuring that the legal and regulatory requirements for aggregation and interoperability of/between these different services are standardised, as these are the issues that will derail bazaar development.

    3. Acting as a central clearing house where agencies/organisations/individuals can choose to self-publish interface descriptions, custom schema, metadata element schemes, vocabularies etc.

    4. Providing a collaboration and mediation service to allow the reconciliation of conflicting interface/schema/scheme/vocab’s.

    The result would hopefully be a myriad of exploratory projects, some of which would fail, most of which would be ho-hum, but many of which would succeed.

    The OIC would act as an institutional memory, learning and recording the lessons learnt; an institutional coordinator, making sure that people wanting to aggregate/integrate the different data-sources aren’t forbidden from doing so; and an institutional mediator, assisting the different projects in finding and working together when they would like to.

    • 2009 August 29
      Matt permalink

      My biggest worry is that the government’s response to this initiative will be the announcement of some $multi-million grand-gesture. Big press-conference; Minister announcing the ‘grand vision’; and the possible benefits we could see, lost in the maze that is large-scale government procurement.

      Too right. Publish a few crap datasets and get the trumpets out. Never to be heard of again.

      I say, give money to an independent non-profit, and get the community involved. Open-source the design and mandate the data sources to publish.
      Use funding incentives if that helps: departments who embrace being open with their data get preference for funding.
      All new budgets must include an Open Data component.
      All new projects must define what data will be published before they get funding.
      etc

    • 2010 April 26
      Madeleine Kingston permalink

      Andrae

      Your concerns are valid and deserve exploration. The ‘i’s do need to be dotted and the ‘t’s crossed. The legalities need to be nutted out.

      But the prospect of using data for the purposes of “lessons learnt” is compelling and needs consideration. To say nothing of transparency.

      Gov2 is in its infancy. No-one can predict the shortcomings of innovative ideas yet. It is OK to make some mistakes, admit to them, get back on the horse and hold one’s head high. (Voice of experience here).

      Let us take a few risks. People are saying Australia is 20 years behind and risk-averse. Could they be right?

      Find a way round technicalities, legalities and blockades.

      Brick walls never faze the courageous. But honesty is a must. People will see through hype and gauze. Grand gestures won’t work or fool the discerning public.

      Who’s game amongst us? (Notice the royal or academic ‘we’ – pardon the slip, force of habit prevails.)

      Let’s do it.

      Make this happen.

      Madeleine

  8. 2009 August 26

    A data.gov.au here could act as either a data “Market Place”, a la the Android Market or the Apple App Store, or otherwise as a “data forge” like Sourceforge or Google Code does for open source applications.

    This could then point to the canonical data source from data.gov.au and also foster community development of applications or use of the data. To use the OpenAustralia example Sherif raised before, data.gov.au would point to the source data at data.aph.gov.au (when it exists) but would also list OpenAustralia.org as a user of this data and provide details about the project.

    Henare

    • 2009 August 27
      Martin Stewart-Weeks permalink

      Canonical data source? Only from a cathedral, presumably? Is it the case that even in determining, monitoring and correcting the ‘canon’, the cathedral will, at least sometimes, need the bazaar? In other words, who keeps the canonical sources reliable?

  9. 2009 August 26

    Alan

    The bazaar approach sounds very appealing. I could imagine that the creation of a central portal from the outset would suffer from fatal compromises being forced out of a gargantuan committee (not because it’s government related necessarily, just because of the number of entities with a dog in the fight). From my experience, a centrally ordained design is likely also to foster resistance out of the “not invented here” syndrome (again, a common affliction in both government and private sector entities).

    The bazaar approach would have the capacity to minimise multilateral tensions and create an informal competition between the entities involved. That way, each group is likely to use different approaches, achieve different types of accomplishments, make different mistakes and establish different learnings and relationships. Widely and constructively shared experience could hasten development and nurture imaginations.

    It seems to me that the extent to which data is adopted and the social / economic / ecological value of those applications demonstrated, could be the basis of a Prime Minister’s Cup. Entrants would need to consist jointly of the provider and the user of the data. A sense of fun, reward, competition and high profile acknowledgement would, in my view, be a real and powerful motivator to gather the best ideas and establish a parity of worth among all of those with something to contribute.

    All the best
    Tony Cutcliffe

  10. 2009 August 26

    I agree with those suggesting both approaches.

    There needs to be a central way to find any government data (and a regular audit of what data government is providing).

    This central data site can be a portal to the feeds offered by various agencies – let’s call it data.gov.au…

    The actual datafeeds would be delivered from individual departmental sites – which is appropriate as the data collected is the responsibility of the agencies, not the body managing data.gov.au.

    It should be extremely easy for agencies to add and update feeds and APIs presented via data.gov.au.

    And data.gov.au should be developed to provide analysis tools that can be reused on any of the data feeds and APIs it has collated…then represented back on agency sites.

    This would minimise the need for agencies such as the ABS to develop their own online data display and mining tools (see http://betaworks.abs.gov.au/betaworks/betaworks.nsf/index.html).

    Which agency should manage data.gov.au?

    I’d suggest there is a strong case that the ABS should, as the main repository of government data and having the expertise in data management and mining.

  11. 2009 August 26
    Dan permalink

    I actually see data.gov.au as the cathedral, warehousing the most recent “official” or “validated” datasets, and then farming out links to the more up-to-date, less accurate, perhaps more in-flux versions available at the primary bazaar sources.

  12. 2009 August 26

    The cathedral approach works when people are eager to contribute and you’re trying to make it easy for them to do so: “Just put it anywhere and we’ll find it!” But the problem that Data.gov is trying to solve (as I understand it) is different. The contributors aren’t eager to contribute. If they weren’t required to put their data out into the public, they wouldn’t, or at least they would take forever to get around to it because sharing its data is not the primary goal of any government agency. So, data.gov makes it easy for agencies to dump their data off. The site also happens to make it easy for the public to find that data, and makes it easy for mashups to be written (including the gov’t’s own “Data Dashboard”), but as you point out, we could do that just about as easily with a bazaar.

    So, while it’s impossible to oppose a “cathedral AND bazaar” approach here — more is better than less — I think in this case there’s good reason to say that the gov’t ought to be building a cathedral, and let the bazaar take care of itself.

    I think.

    • 2009 August 26

      (Ack, I meant: “The BAZAAR approach” in the first sentence of my comment two above. Sigh.)

  13. 2009 August 26

    Go for RDF and Linked Data.
    It is possible to automatically map existing relational databases to RDF using tools such as D2RQ — you get a data API and a SPARQL query endpoint.
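
    As a rough illustration, once a D2RQ-style endpoint exists, consuming it from Python is only a few lines (a sketch using the SPARQLWrapper package; the endpoint URL here is hypothetical):

      from SPARQLWrapper import SPARQLWrapper, JSON

      # Hypothetical SPARQL endpoint exposed in front of an agency database.
      sparql = SPARQLWrapper("https://data.example.gov.au/sparql")
      sparql.setReturnFormat(JSON)
      sparql.setQuery("""
          PREFIX dcterms:  <http://purl.org/dc/terms/>
          PREFIX dcmitype: <http://purl.org/dc/dcmitype/>
          SELECT ?dataset ?title WHERE {
              ?dataset a dcmitype:Dataset ;
                       dcterms:title ?title .
          } LIMIT 10
      """)

      for row in sparql.query().convert()["results"]["bindings"]:
          print(row["dataset"]["value"], "-", row["title"]["value"])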

  14. 2009 August 26
    asa letourneau permalink

    okay…finally got a look and like what I see. Check out David Osimo’s presentation.

    • 2009 August 27
      Martin Stewart-Weeks permalink

      For those who don’t click on the link – and yes, very apt and timely advice:

      “From my side, I added something new in my presentation. I added some key recommendations for government:

      1: DO NO HARM

      o don’t hyper-protect public data from re-use
      o don’t launch large scale “facade” web2.0 project
      o don’t forbid web 2.0 in the workplace
      o let bottom-up initiatives flourish as barriers to entry are very low
      2: ENABLE OTHERS TO DO

      o publish reusable and machine readable data (XML, RSS, RDFa) > see W3C work
      o adopt web-oriented architecture
      o create a public data catalogue > see Washington DC
      3: ACTIVELY PROMOTE

      o ensure pervasive broadband
      o create e-skills in and outside government: digital literacy, media literacy, web2.0 literacy, programming skills
      o fund bottom-up initiatives through public procurement, awards
      o reach out through key intermediaries trusted by the community
      o listen, experiment and learn-by-doing”

      Other parts of this blog have noted how emotionally, professionally, technically and culturally confronting phrases like “listen, experiment and learn-by-doing” and “don’t forbid web 2.0 in the workplace” can appear when they get translated into a real live government agency.

      It’s rapidly becoming clear that more lists of things to do is not the issue. The issue is how we cross the chasm, as they say, to a place where there is some chance of putting them into action.

  15. 2009 August 26

    The suggestion from @Martynas for using RDF and SPARQL is a good one, the point being that if ALL the data sources were SPARQLable, then the Cathedral would simply be a SPARQL query engine… a single API, if you like. But that’s possibly asking too much, especially to begin with.

    I agree with those advocating a Cathedral+Bazaar model. In fact, I would expect that the Cathedral model would spawn the Bazaar model anyway, but I’d be inclined to build the Cathedral just so the reticent community members came along (and I’m not mentioning any names but I already know of some departments who are having trouble with some of these ideas).

    Nobody’s discussing the AGLS Metadata elephant in the corner though…

    • 2009 August 27
      Gordon Grace permalink

      Do you mean this ‘elephant’ from the National Archives?

      <meta name="DC.Type.documentType" scheme="agls-document" content="dataset">

      Structured information encoded in lists, tables, databases, etc, which will normally be in a format available for direct machine processing (eg spreadsheets, databases, GIS data, MIDI data), data may be numeric, spatial, spectral, statistical or structured text (including bibliographic data and database reports)

      • 2009 August 27

        Yep. That’s the elephant.

      • 2009 August 27

        As someone who has actually read both AGLS and AGRkMS cover-to-cover multiple times, I’m afraid I still don’t quite see the elephant… could you please elaborate?

      • 2009 August 27
        Gordon Grace permalink

        I think xtfer might be hinting that a discovery (and subsequent aggregation) method may have existed since at least 2002.

        1. Agencies publish datasets online
        2. Agencies use DC.Type.doctype to describe the online resources as datasets (via the agls-document schema)
        3. Someone (anyone?) harvests said metadata from *.gov.au domains to automagically produce “A List of Australian Government Datasets™” (either as a feed, a standalone website, or something else equally able to feed the ‘Raw data now’ beast) – sketched below.
        4. ???
        5. Profit!
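
        A rough sketch of step 3 in Python (standard library only; the page URL is hypothetical):

          from html.parser import HTMLParser
          from urllib.request import urlopen

          class AGLSDatasetSniffer(HTMLParser):
              """Flags pages whose AGLS metadata declares them to be a dataset."""
              def __init__(self):
                  super().__init__()
                  self.is_dataset = False

              def handle_starttag(self, tag, attrs):
                  if tag == "meta":
                      a = dict(attrs)
                      if (a.get("name") == "DC.Type.documentType"
                              and a.get("content") == "dataset"):
                          self.is_dataset = True

          url = "https://www.example.gov.au/statistics/some-page"  # hypothetical
          sniffer = AGLSDatasetSniffer()
          sniffer.feed(urlopen(url).read().decode("utf-8", "replace"))
          print(url, "is tagged as a dataset" if sniffer.is_dataset else "is not")

        Point something like this at a list of *.gov.au URLs (from a crawl or from sitemaps) and you have the beginnings of the automagic dataset list.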

      • 2009 August 28

        Spot on Gordon. AGLS has been kicking around for some time and seems to have been dropped from the conversation. The big problem with AGLS is that, as a way of describing data in a machine readable form that humans can interpret, it’s not bad, but as a way of describing data in a machine readable form that’s consistent and useful for aggregating, it’s not so great (the 2008 version is a big improvement – but most agencies are still using the legacy version, if at all).

        So, as I see it, the problem arises at (3) currently. It’s not worth anyone’s time harvesting AGLS at the moment, as where it is implemented it’s often useless anyway. Try browsing a few department websites and viewing the AGLS in the page source… see if you can actually derive much useful information beyond some generic site-wide concepts.

        One major issue with AGLS currently is that it’s mostly ignored when used as HTML metadata. RDF may be AGLS’ saviour in this regard.

        The other big problem is that it’s poorly documented. The current version of AGLS is from 2008, but all of the documentation (except for the element list at agls.gov.au) is from 2006 or earlier. The National Library website’s resources are almost all out of date, and agls.gov.au is missing lots of documentation. Government agencies are currently required to jump through the AGLS hoop, for little real benefit, so why not leverage that?

    • 2009 August 27

      @xtfer I’d agree re SPARQL endpoints, but I think realistically, getting all government agencies producing RDF / hosting endpoints is a bit much to hope for.

      I think it’s a great idea for an end goal; but a more immediate target of “links to loads of CSV” that can be consumed and RDFized might be a tad more practical – much easier to demonstrate the payoff to your Average Developer ™

      • 2009 August 27

        You are, unfortunately, quite right.

        On the other hand, enabling querying of a database via an API of some sort is not particularly difficult, and if that database is updated regularly (weather information, for example), the initial investment in the API may well be worthwhile compared with the ongoing burden of managing a mounting pile of CSV files.

        I can see some technical challenges here though:

        1) How do agencies control access to data sets?

        Should authentication against public APIs be completely open, or should individuals register to access them (in the same way you must when using the Twitter or Facebook APIs)? Who controls the authentication? Is that a job for data.gov.au… distributed API authentication using OAuth, for example?

        2) Should agencies be required to standardise data formats?

        If I have to grab JSON from one location, YAML for a second, and CSV from a third, the technical bar to participation has been raised considerably. Should we require data in a certain format(s)? Even if we use RDF as that format, should it be in XML, N3, Turtle, JSON, N-Triples? AARGH!

        3) How do we help agencies who are required, or desire, to open their data sets, when that data is held in Excel spreadsheets, secure databases, or other largely inaccessible locations?

        Should data.gov.au provide guidelines for doing this or should agencies be left to their own devices?

        4) How do you reconcile the use of existing Schemas such as AGLS, AGIFT or similar?

        I don’t have answers to these problems, I’m just putting them out there…

      • 2009 August 27

        I’d vote to KISS:

        1) How do agencies control access to data sets?
        For round one, public, unfettered access to data, ‘controlled’ by the appropriate licences – http://www.gilf.gov.au/
        When the question of ‘control’ comes up, ignore it, and release as much data as you can that doesn’t need controlling.

        For round two, APIs and API keys – ala web 2.0 style – mainly for data which might be a touch less public.

        This already works reasonably well with assorted Title search services (usernames/passwords to get to data); the ABR register uses API keys; etc. It has a higher cost to implement (even if it’s just HTTP Basic authentication); so IMO, it’s the kind of thing you ignore when you can output a higher volume of valuable data that is unencumbered.

        If I were running it:
        Rule 1: Raw Data Now!
        Rule 2: Requires control? Skip it for the moment, refer to rule #1

        2) Should agencies be required to standardise data formats?
        I’d vote no, beyond a mandate of at least “CSV with Identifiers” (ie, can be RDFized easily – a rough sketch follows at the end of this comment) for bulk data.

        Also; documenting a preference for individual URIs that return a human readable description of a Thing.

        Add RDFa in XHTML or RDF/XML as appropriate on top of that, with AGLS, FOAF, vCard and Dublin Core worked in to spruce it up.

        Add a JSON representation if you like – ie, you want to see Mashups (as opposed to meshups).

        I would vote against YAML, and non RDF/XML serializations.

        If site X only provides format Y, then people will step up and produce a mapping as required.
        For example; http://triplr.org/ or http://dbtune.org/last-fm/

        3) How do we help agencies?
        Guidelines, documentation, howtos; absolute musts!
        If there are scripts, common code, etc, those should be published publicly too.

        If agencies have bulk data and no capacity/knowledge to publish it robustly (ie, RDFize it), commercial entities and community involvement can do so.

        The first steps they need to get to: Use an FTP client to publish a CSV; and put an email address in the dataset / be part of a mailing list where the community can reach them.
        The web will help take care of the rest.

        OpenAustralia is a great example; and I know there’s a wealth of commercial entities around who exist primarily to transform data from A to B.

        Another example: You look at wikipedia / freebase – loads of community involvement which results in useful data.
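
        A rough sketch of the “CSV with Identifiers, RDFized” idea from point 2 above, using Python’s rdflib (the file name, column names and URI scheme are all invented):

          import csv
          from rdflib import Graph, Literal, Namespace
          from rdflib.namespace import DCTERMS

          EX = Namespace("http://data.example.gov.au/id/school/")  # invented

          g = Graph()
          with open("schools.csv", newline="") as f:      # hypothetical bulk file
              for row in csv.DictReader(f):               # columns: id, name, suburb
                  school = EX[row["id"]]                  # the "Identifier" column
                  g.add((school, DCTERMS.title, Literal(row["name"])))
                  g.add((school, DCTERMS.spatial, Literal(row["suburb"])))

          print(g.serialize(format="turtle"))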

      • 2009 August 27

        There’s some good suggestions in there!

        On the issue of (1), I wasn’t referring to “control” as in licensing or availability – I agree with your points on that issue – but rather how to manage access from a technical perspective.

        For example, open slather on access vs. account-based API keys.

      • 2009 August 27

        As a Software Engineer who has spent the past 5 years specialising in RDF, I think it might be worthwhile to clarify a few definitions.

        RDF is a semantic data-model – not a format. The key aspects are: defining a stable one-to-one mapping between identifiers and URIs; the identification of attributes/properties by identifiers; and the mapping of data onto the XML Schema Datatypes (XSD) data-type model (although note, this does include custom datatypes defined compliant with the model).

        AGLS, AGRkMS, DCMI, FOAF, vCard, iCal, SODI, DOPE, RDFS, SKOS, etc, etc, etc, are not data formats they are “Metadata Element Sets”, and all have mappings onto the RDF data-model; but equally can be mapped onto the Relational model, or pretty much any other data-model you care to mention.

        XML, Json, YAML, N3, Turtle, N-Triples, and CSV are all syntaxes, which can all be used to encode data in a (or oftentimes several) data-model(s) for transmission. However, it is important to understand that these standards only materialise meaning denoted within a semantic data-model (RDF, Relational, Topic-Map, FoL, etc).

        So to summarise my position on this:

        – RDF is an excellent data-model for data-interchange.

        – Ensuring reuse of existing Element Sets makes the data much more useful – of course AGLS, et al. tend to be mandated by government recordkeeping regulations already, so this shouldn’t be hard.

        – We gain far more value agreeing on a standard denotational semantic model than syntax.

        Note: This was also the confusion at the Brisbane Forum, where a question was asked regarding the need to address (connotational) semantics, and the answer was given with respect to Excel/PDF (syntax). Some of the comments I have been reading seem to indicate a similar level of confusion; although between denotation and syntax (for data-interchange, connotation is by and large out-of-scope).
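
        To make the data-model versus syntax distinction concrete, here is a small sketch (Python, rdflib; the URIs are invented) in which one and the same set of triples is written out in three different syntaxes:

          from rdflib import Graph, Literal, URIRef
          from rdflib.namespace import DCTERMS

          g = Graph()
          dataset = URIRef("http://data.example.gov.au/id/dataset/rainfall")  # invented
          g.add((dataset, DCTERMS.title, Literal("Monthly rainfall observations")))
          g.add((dataset, DCTERMS.publisher, Literal("Example Agency")))

          # Same triples, three syntaxes - the meaning lives in the data-model,
          # not in the file format.
          for syntax in ("turtle", "nt", "xml"):
              print("---", syntax)
              print(g.serialize(format=syntax))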

  16. 2009 August 27
    Kevin Cox permalink

    Alan your approach is excellent.

    However, like Andrae Muys, I am worried about it being hijacked by data.gov.au. The best approach is to make data.gov.au compete with everyone else for eyeballs. If someone wants a cathedral then let them build it and compete with other places for their own congregations and tourists.

    So what is the best way to stop it being hijacked? The best way is not to put it up as an option. As soon as you suggest that there be one data.gov.au the natural tendency will be for everyone to wait for data.gov.au to do something and we will not even get alternatives arising because the controllers of data.gov.au will inevitably try to control all access under the excuse of cyber terrorism or some such scare tactic.

    It will be done for the most worthy of reasons but it will happen because that is the way organisations are structured. An organisation, be it a government organisation, a private one or a not-for-profit, has as its main objective its own survival, and it will do things that ensure that happens, compromising other objectives along the way. (Even the Taskforce – except it has a suicide pill it must take when it delivers its report.)

    So the best way of avoiding data.gov.au controlling and directing is to have every organisation and person – including the government – on an equal footing and let the best survive.

    I would recommend strongly against planning for any cathedral to be included in the master plan but allow cathedrals to arise “spontaneously” from the rubble of Web 1.0

    • 2009 August 27

      @Kevin Cox: You are assuming that Government agencies will actually engage with the ideas of open data. Experience suggests that unless agencies are given a REQUIREMENT to do something (not just the permission to do it), they won’t make any moves, for fear of making a mistake or simply because the internal bureaucracy prevents it. Under those circumstances, there won’t be a cathedral “emerging” from any government source… and then you really are talking about the Bazaar model.

      I think there are also some assumptions being made about the form that the Cathedral might take, which reflect a rigid, cataloguing approach not really relevant to open data projects. We don’t level the same accusations at Australia.gov.au’s media RSS feeds, and quite rightly. They aggregate, just as a data.gov.au portal would aggregate data sources (though in a different way). The argument that you shouldn’t engage in an aggregating function because of the risk of government stuffing it up is spurious, ESPECIALLY if agencies are distributing the data themselves anyway.

      • 2009 August 27
        Kerry Webb permalink

        Experience suggests that unless agencies are given a REQUIREMENT to do something (not just the permission to do it), they wont make any moves for fear of making a mistake, or simply because the internal bureaucracy prevents it.

        Or perhaps because, in their opinion, they have more important things to do. Like getting the cheques produced, or implementing policy or other boring BAU tasks – all without enough resources.

      • 2009 August 27
        Martin Stewart-Weeks permalink

        Media release RSS feeds work because that is information governments WANT to get out. Tends to reinforce the concern that, as David Weinberger said earlier in this stream, the cathedral is a convenient dumping spot for agencies who won’t otherwise play the game.

      • 2009 August 28

        @Kerry Webb I fail to see how that’s relevant. This is a government initiative, and we are discussing ways the government can implement its policy. Agencies don’t have “better things to do” than that.

        @Martin Stewart-Weeks If data.gov.au is the only way to get those agencies to actually do anything, then surely it’s a good option?

      • 2009 August 28
        Kerry Webb permalink

        @xtfer I fail to see how that’s relevant.

        The relevance is that you suggested that in the absence of a requirement, there were some reasons why agencies wouldn’t do the sort of things that are being proposed here. I was saying that there may be a few other reasons.

        I am a proponent of using Web 2.0 concepts and technologies to improve the operations of government (it’s part of my job), and when governments direct that we have to publish datasets for open public access I hope that they recognise the extra work involved and fund the initiatives accordingly.

      • 2009 August 28

        @Kerry Webb Then I misunderstood you – you are absolutely right, there are many reasons. I do think that “not having the resources” is sometimes an excuse for not doing anything, however, and it is often the process and bureaucracy which prevent it.

        Perhaps we also need to look at how to empower those parts of the public service which can lead these initiatives? How do we empower the APS6 or the Policy Officer or the IT support officer to champion that within their organisation?

  17. 2009 August 27
    Mia Garlick permalink

    Something that I think may support the general sentiment that seems to be expressed in these comments of a cathedral+bazaar approach is that the original data.gov site is now facing “friendly competition” from a National Data Catalog to be established by the Sunlight Foundation because “there’s only so much the government is going to be able to do. There are legal hurdles and boundaries the government can’t cross that we can. For instance: there’s no legislative or judicial branch data inside Data.gov and while Data.gov links off to state data catalogs, entries aren’t in the same place or format as the rest of the catalog. Community documentation and collaboration are virtual impossibilities because of the regulations that impact the way Government interacts with people on the web.”

  18. 2009 August 27
    ben rogers permalink

    I have to agree with @xtfer – if there is no mandate/requirement things will move very slowly indeed – those willing on the inside quickly get silenced by the resources/control arguments

  19. 2009 August 28
    Kevin Cox permalink

    @xtfer – you are right, nothing will happen unless there is a requirement.

    The point about cathedrals is not whether the government itself should build one. Of course they should and it will be probably be very good. However, the fact that there is one, should not stop others building other cathedrals. We need competition and choice otherwise there is likely to be little progress.

    My worry is that if there is an “official” church other religions will find it hard to gain any converts.

  20. 2009 August 29
    Craig thomler permalink

    Never forget the humans in this discussion. It will never be the lack of formats, or the difficulty of selecting them, that holds up the availability of open government data.

    It will be the concern felt at senior levels if the community does a better job of analysing the data – what does this mean for officials who have made it their career to analyse and draw conclusions? And what might be uncovered in the data that could affect careers?

    Data access (and access to the knowledge derived from data) is a manifestation of power and control in all organisations, hence the elaborate systems developed to control which employees or outsiders get access to which data.

    The challenge with opening up data will be distinguishing why some openness is opposed – whether there are sound governance, privacy, national security or resourcing reasons or whether it is a manifestation of past culture, fear or control.

    • 2009 August 29
      Matt permalink

      It will be the concern felt at senior levels if the community does a better job of analysing the data – what does this mean for officials who have made it their career to analyse and draw conclusions? And what might be uncovered in the data that could affect careers?

      All the more reason for transparency.

      • 2009 August 29
        Nicholas Gruen permalink

        And all the greater need for the capacity for some independent scrutiny of decisions made as to what gets released and what doesn’t.

  21. 2009 August 29
    Matt permalink

    I think the core issue here is ‘discovery’ – how do we make data discoverable?
    A whopping great catalogue isn’t the way. The analogy would be the old Yahoo Directories, which were always in a state of decay and are now dead.

    Metadata and search, supported by simple principles for developers, is surely the way to make useful data available: use simple formats, publish raw data.

    I can see the use of a central repository, but the purpose should be to guarantee availability, persistence and performance of infrastructure. Perhaps some trust services too. A warehouse, not a library.

  22. 2009 August 30

    There’s a lot of interest in the idea of a machine aggregated index (data.gov.au) and there’s a lot of data already out there in AGLS; plus there’s http://www.agls.gov.au/documents/ – which covers off some of the howtos… or would, if the HOWTOs were written.

    Since I’m very eager to see more open government, I’ve flicked off an email to see if I can’t pique the interest of the AGLS folk, offering to write a simple prototype aggregator (proof of concept) or tutorials for AGLS as RDFa and so forth.

    I’d guess quite a few of you wouldn’t mind chipping in yourselves, so if I manage to get a few draft tutorials together in a collaborative form, I’ll link to them from here and hopefully we can get something Easy and Usable(tm).

    After that, if it’s simple enough, I might start on an aggregator (PHP, HTML Tidy, w/a few PEAR components to do GRDDL) hosted via google code.

    If these things sound interesting and you want to have a chat, my contact details are available via my open id – just follow the links!

    • 2009 August 31

      That’s a great suggestion Daniel. I have emailed you separately, but I’m keen to be involved.

    • 2009 August 31
      Gordon Grace permalink

      Here’s a first stab using the Agency Search service:

      1. Visit australia.gov.au (or any Australian Government website using Agency Search)
      2. Set the scope to ‘All Australian Government Websites’
      3. Search for something very generic (e.g. ‘the’)
      4. Add the following to the URL to filter by the DC.Type.* field: ‘&meta_e=dataset’
      5. ???
      6. Profit!

      You should find that all results returned have at least:
      1. DC.Type.* = dataset; or
      2. Body text containing ‘dataset’

      • 2009 August 31

        This didn’t seem to work for me… I just kept getting lots of results with “the” in them.

      • 2009 August 31
        Gordon Grace permalink

        Try the first page:

        Inspecting the metadata of the items returned in the first page of results should reveal a few instances of:


        <meta name="DC.Type.documentType" scheme="agls-document" content="dataset" />

      • 2009 August 31

        Neat!

        I’ll have to remember that

    • 2009 August 31

      After a brief email discussion with someone at the National Archives, I’ve discovered that AGLS is currently being reviewed by Standards Australia, and they expect to republish the AGLS Metadata Standard AS5044-2009 towards the end of the year. The Draft revision is available on the Standards Australia website (DR AS 5044.1 CP) for about $30.

      Given this, it’s unlikely that we could make much headway on the tutorials; however, the proof-of-concept aggregator should theoretically be possible, as it would have to work with the legacy schema anyway.

    • 2009 August 31

      It’s worth keeping in mind that AGLS (and the 2008 version, AGRkMS) is only half the equation. Having a standard set of properties, with associated URIs, and machine readable RDFS to provide definitions is a critical first-step. The second step is ensuring the various controlled vocabularies are open, accessible, locatable, and machine interpretable.

      Open – they need to be available to the general public, without legal encumbrance. For example, “Keyword AAA” is a proprietary specification owned and licensed for fee by NSW State Records (http://www.records.nsw.gov.au/recordkeeping/keyword-products/keyword-aaa). Unfortunately it is also included by reference by the NAA and every (AFAIK) state records agency.

      Accessible – they need to be available at a stable URL (or PURL) without the need to access them via intermediary links (ie. click-license, or disclaimer) preventing direct access by computers without human intervention.

      Locatable – the stable URL needs to be included in the various descriptions of the dependent standards, so programs can automatically pull the definition file.

      Machine Interpretable – the definition needs to be available in a standard machine interpretable format. The preferred format would be an RDF serialisation of a SKOS thesaurus. RDFa is an ideal option that can permit the file to be served simultaneously as a human-interpretable XHTML file and a machine-interpretable RDF document.
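
      As a rough illustration of that last point, here are a couple of concepts from a hypothetical open controlled vocabulary, expressed as SKOS via Python’s rdflib (all URIs and labels are invented):

        from rdflib import Graph, Literal, Namespace
        from rdflib.namespace import SKOS

        VOC = Namespace("http://vocab.example.gov.au/functions/")  # invented

        g = Graph()
        g.add((VOC.Transport, SKOS.prefLabel, Literal("Transport", lang="en")))
        g.add((VOC.PublicTransport, SKOS.prefLabel, Literal("Public transport", lang="en")))
        g.add((VOC.PublicTransport, SKOS.broader, VOC.Transport))  # thesaurus hierarchy

        # Publish this file at a stable URL and it is open, accessible, locatable
        # and machine interpretable in the sense described above.
        print(g.serialize(format="turtle"))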

      • 2009 August 31
        Mike Nelson permalink

        The only sustainable solution is to mandate the use of open standards. If NSW State Records cannot be convinced to make Keyword AAA open (like Adobe did with PDF) then an alternative open, stable, machine interpretable controlled vocabulary needs to be developed.

        At the most basic level, can you even reference “Australia” as a URI in such a way that a machine can know the difference between the Australia in the physical/geographical sense and Australia in the abstract/political/legal sense?

        The traditional government method of getting a “product” like an ontology would be to pay a consultant to come up with a specification, put it out to tender, pay a contractor to develop it and get a finished product. The “product” then remains largely static, assuming it even works, and any maintenance requires paying the contractor to make changes.

        Suppose instead an ontology was developed like any other open-source or open-standards project. If it is truly open then it could be developed by volunteer experts without having to go through the mess of tenders and contracts. Open-standards projects like those of the W3C generally have much better quality control than anything proprietary, and developers will put a surprising amount of their own time into making them work as intended.

        My suggestion: pick a non-trivial ontology that needs to be developed and let it run as an open source project.
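
        Picking up the ‘Australia’ example above, the sort of distinction such an ontology would need to state explicitly might look like this rdflib sketch; every URI and class name here is invented for illustration.

        from rdflib import Graph, Literal, Namespace
        from rdflib.namespace import OWL, RDF, RDFS

        # Hypothetical ontology namespace -- purely illustrative.
        ONT = Namespace("http://ontology.example.gov.au/places/")

        g = Graph()
        landmass = ONT["australia-continent"]          # physical/geographical sense
        polity = ONT["commonwealth-of-australia"]      # abstract/political/legal sense

        g.add((landmass, RDF.type, ONT["Landmass"]))
        g.add((polity, RDF.type, ONT["SovereignState"]))
        g.add((landmass, RDFS.label, Literal("Australia (continent)", lang="en")))
        g.add((polity, RDFS.label, Literal("Australia (Commonwealth of Australia)", lang="en")))
        g.add((landmass, OWL.differentFrom, polity))   # the two URIs denote different things

        print(g.serialize(format="turtle"))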

  23. 2009 August 31
    Mike Nelson permalink

    It seems a no-brainer that “data.gov.au” would use AGLS in the same way data.gov uses plain Dublin Core. AGLS is quite a useful element set with logical and useful extensions to DC. It has a lot of advantages over DC but none of the unnecessary complexities of the UK equivalent (eGMS) or the bizarre encumbrances of US GILS (no wonder data.gov dropped GILS and opted for DC). The basic premise of AGLS is sound, but implementation seems to have been poor.

    Likewise, it is a no-brainer to use RDF (in whatever syntax) to describe data, remembering that describing and linking data is Web 3.0, not the fad of social networking collectively called Web 2.0. RDF is all well and good, but there have to be sound OWL/SKOS/etc. ontologies underlying it or the metadata quite literally has no meaning. This will mean work up front, but once the concepts and ontologies are defined, the rest starts falling into place.

    While Google is wedded to its text search algorithm, what is not yet widely known is that Bing and Yahoo are going to become semantically aware search engines supporting natural language queries. While I'm not a huge Microsoft fan and prefer to use open source where possible, well done to them for seeing where the future is.

  24. 2009 August 31
    Mike Nelson permalink

    Another point to think about on the cathedral/bazaar issue is that “data.gov.au” doesn't have to be a central data repository. Indeed, it would be counterproductive to put all the data generated by government on a single site. I'm not just talking about things like aggregated economic and health statistics; think about the massive amounts of scientific data collected by CSIRO, BOM, AIMS, AAD and others.

    A cathedral/bazaar hybrid only has to be a central metadata repository. The data itself can be anywhere, presumably with whatever department created it. Since departments seem to change names (hence acronyms and domain names) after every election, this means “data.gov.au” would also need to keep track of where the data is physically/virtually located to avoid broken links. There is open-source software to do this, but it does rely on the custodians of the data to update locations.
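
    A crude sketch of the ‘keep track of where the data lives’ idea: a registry that maps a stable identifier to the data's current home, so links survive departmental renames. Every name and URL below is invented for illustration.

    # Minimal in-memory registry: stable identifier -> current location.
    # In practice this would be a persistent store updated by data custodians.
    registry = {
        "au-gov:rainfall-daily": "https://data.bom.example.gov.au/rainfall/daily.csv",
        "au-gov:marine-surveys": "https://data.aims.example.gov.au/surveys/",
    }

    def resolve(identifier: str) -> str:
        """Return the current location of a dataset given its stable identifier."""
        return registry[identifier]

    def relocate(identifier: str, new_url: str) -> None:
        """Called by the custodian when a dataset moves (e.g. after a departmental rename)."""
        registry[identifier] = new_url

    relocate("au-gov:rainfall-daily", "https://newdept.example.gov.au/rainfall/daily.csv")
    print(resolve("au-gov:rainfall-daily"))  # old identifier, new home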

    • 2009 September 1
      Kevin Cox permalink

      Mike

      Any repository of permanent data in data.gov.au defeats the purpose of gov2.0 – the power of the internet is that we link to sources of information, not make copies. Data should be left where it was created and access provided. This would be a good “principle” for the Task Force to suggest, as it would stop the proliferation of data with the attendant security risks. Keep the source data in its place of origin and do not make copies for access.

      • 2009 September 1

        Again, you need to be careful of your terminology here, and avoid oversimplification.

        Accessible data is good – accessible data with an atom or rsync feed of updates is better.

        Federated search is good – but only if you have a standard query interface[1], and not if you need to perform joins or correlated queries.

        When data is open anyway, where is the security risk? The actual risk is not security, but currency – hence my mention of atom/rsync above.

        So the recommendation will need to be more nuanced. If we are willing to talk actual technologies[2], closer to:

        1) Structured data should be made available in an open format with a well defined mapping into RDF (using standard vocabularies and RDFS/SKOS/OWL)[3]
        2) Metadata should be made available in RDF compliant with AGRkMS, DCMI; with any extensions described in RDFS; supporting thesauri in SKOS; and ontology in OWL.
        3) In addition, both structured-data and metadata should be exposed as a SPARQL endpoint.
        4) Unstructured data should be exposed using a full-text search API. Due to its field testing, the Google search API is preferred (or, even better, a mapping of the GS-API into SPARQL).

        It's not enough to just have access to the data – we need to know what it means (hence points 1 & 2). It also isn't enough to just slap a web-form frontend on a text-search box (as is currently done) to count as “access provided”; you need real query facilities, with standard interfaces. Otherwise the transaction costs kill you.

        [1] SPARQL is a good option for structured queries; defining a mapping from the Google search API to a virtual RDF graph would also allow you to use it for feature-oriented search.
        [2] If we want to remain tech-neutral, then we need to expand each of these technologies into a description of the capability they provide, and specify that – but that’s a lot more work, and not something I’m going to attempt in a short post.
        [3] D2RQ and similar technologies are available to map SQL databases into RDF with minimal difficulty; although for really large datasets there are going to be unaddressed scalability issues.
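
        By way of illustration of point 3 above, querying such an endpoint from Python might look something like the following sketch, using the SPARQLWrapper library. The endpoint URL is hypothetical, and the choice of DCMI terms is only one possibility, not a description of any existing service.

        from SPARQLWrapper import JSON, SPARQLWrapper

        # Hypothetical endpoint -- no such service is implied to exist.
        endpoint = SPARQLWrapper("http://data.example.gov.au/sparql")
        endpoint.setReturnFormat(JSON)
        endpoint.setQuery("""
            PREFIX dcterms:  <http://purl.org/dc/terms/>
            PREFIX dcmitype: <http://purl.org/dc/dcmitype/>
            SELECT ?dataset ?title WHERE {
                ?dataset a dcmitype:Dataset ;
                         dcterms:title ?title ;
                         dcterms:subject ?subject .
                FILTER regex(str(?subject), "water", "i")
            }
            LIMIT 20
        """)

        for row in endpoint.query().convert()["results"]["bindings"]:
            print(row["dataset"]["value"], "-", row["title"]["value"])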

  25. 2009 August 31

    I think the following quote sums up an important aspect of what needs to be considered.

    Opengov means open ‘linked data standards’ – Tim Berners-Lee [including video from TED]: http://opengov.ideascale.com/akira/dtd/5489-4049

  26. 2009 August 31

    Yes to centralization of good AGLS, AGIFT and associated Dublin Core stuff. It would all be great if documents (PDFs and office-suite documents) actually had the metadata in them (how many standard PDFs of laws and statutes even have the AGLS data item “jurisdiction”?).

    This centralization is pretty much like a standard that everybody conforms to so that things are interoperable. (Think of all the rules coming out of the IETF, the W3C and the Linux Standard Base – central control of standards, with distributed production using those standards.)

    But the choice of keywords and alternate terms is important. Discrepancies between jurisdictions (each state and the federal government has its own thesaurus) make consolidation difficult.

    If the keywords were properly filled in, it would be trivial to ask “give me all the agriculture or water policies or media releases associated with the Grampians” or “give me all the current inquiries across all agencies and jurisdictions that have anything to do with pricing controls”.
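
    As a rough sketch of what the Grampians query could look like once agency metadata had been harvested into an RDF graph (rdflib here; the input file name and the use of Dublin Core subject/coverage terms are illustrative assumptions):

    from rdflib import Graph

    g = Graph()
    g.parse("harvested-agls-metadata.ttl", format="turtle")  # hypothetical harvested metadata

    # "All the agriculture or water policies or media releases associated with the Grampians"
    results = g.query("""
        PREFIX dcterms: <http://purl.org/dc/terms/>
        SELECT ?doc ?title WHERE {
            ?doc dcterms:title ?title ;
                 dcterms:subject ?subject ;
                 dcterms:coverage ?place .
            FILTER (regex(str(?subject), "agriculture|water", "i")
                    && regex(str(?place), "Grampians", "i"))
        }
    """)

    for doc, title in results:
        print(doc, "-", title)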

    The same thing goes for laws and general efficiency. How long did it take McClelland to go through all laws in Oz to discover all the things to do with secrecy? (Proper discoverability using keywords and something like Google advanced operators would also make it easier for lawyers to check up on topics across different councils, states, etc. – and thus easier on the pockets of their clients.)

    (Now, if only we could get RFC822 X-headers with AGLS/AGIFT contents put into, and displayed by, common webmail servers and email clients, then the majority of documents in government could be discovered easily by people in the same or different agencies.)
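
    If mail clients ever did grow such headers, a message might carry something like the following sketch; the X-header names are entirely hypothetical, not part of any existing standard.

    from email.message import EmailMessage

    msg = EmailMessage()
    msg["Subject"] = "Draft water allocation policy"
    msg["To"] = "records@example.gov.au"
    # Hypothetical X-headers carrying AGLS/AGIFT-style descriptive metadata.
    msg["X-AGLS-Function"] = "Water resources"
    msg["X-AGLS-Jurisdiction"] = "Victoria"
    msg.set_content("Please find the draft policy attached.")

    print(msg.as_string())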

    It’s also worth noting that the AGLS defines the authoritative source URL for the document (and there is no reason why XML datasets couldn’t include AGLS keywords at different levels of the schema, just like any other document).

    Thus, there would be core raw-data producers, readily identifiable whoever the intermediary or value-adder is, using centrally defined thesauri, and preferably a core set of centrally-defined schemata.

    So… maybe not “Cathedral”, but a decent sized central chapel that welcomes everybody to help define evolving “dogma”, then the “primary producers” creating the base data, then the “industry” (including cottages) that add value and sell it in the bazaar.

  27. 2009 September 1

    Alan, great topic.

    I agree we need both the cathedral and the bazaar to get some dynamic tension. Generally Stephen, Mike, Andrae, Daniel, Mia, Tony, Hanare, Bec, Kerry, Ben are all on the right track in my mind.

    We also need to adopt some principle-based ‘10 steps/rules’ to simplify, ensure balance and break the current ‘do nothing’ nexus [department heads, note: expect a mandate memo from your boss soon :-) about releasing metadata and most data] -

    1. clearly state what is available: metadata (structured data about data, to be released immediately on request), OR plain data (release subject to controls, see below), OR a hybrid
    2. treat ‘sets’ as ‘endless streams’, not as static point-in-time files like open-source code (the stream of web site stats from a site does not have a version, but the metadata does)
    3. treat sets as having endless variations on their metadata, because we are always seeking further improvement (the joy of the journey), so keep a metadata version number and record when it started/finished
    4. state a clear owner for each set/stream and a clear point/s for community support [department heads note]
    5. possibly distinguish between sets and streams (sets being point in time, streams being ongoing and more alive)
    6. certainly hold and debate and refine all data in a decentralised way
    7. index and create a central pointer directory with minimal barriers to listing or access or replication/federation of the directory and ideally the data
    8. avoid picking a mandated technology, let the (federated) market decide, an initial short list may simplify and speed startup but no-exclusivity, no caps on new channels or access technologies
    9. release everything unless clear legal/privacy/security reasons can be identified within (say) 14 days of a request (bear the cost of materials?), and encourage minimal duplication through good access, but don't mandate non-duplication
    10. prohibit charging for raw data; charge only for value-adds

    So a truly resilient and dynamic data.gov.au would hold -
    - policies (like the above) with a bias towards ‘always open metadata’ and ‘raw data usually open unless there are good reasons not to’
    - directory of pointers
    - showcase of best of breed voted by inside and outside the government.

    But it would not hold data, or for the foreseeable future even metadata; but as standards evolve and open up, perhaps it could hold metadata too.
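
    A sketch of what one entry in that directory of pointers might record, folding in the stream/owner/metadata-version ideas from the list above; every field and value here is illustrative.

    # One illustrative entry: nothing held centrally except enough to find the
    # stream, its owner and the current metadata version.
    pointer_entry = {
        "id": "au-gov:website-stats-stream",
        "kind": "stream",                                   # rule 5: a stream, not a point-in-time set
        "owner": "Example Agency, Web Analytics Unit",      # rule 4: a clear owner
        "support_contact": "opendata@example.gov.au",
        "access_url": "https://stats.example.gov.au/feed.atom",
        "metadata_version": {                               # rule 3: version the metadata, not the stream
            "version": "1.2",
            "valid_from": "2009-07-01",
            "valid_to": None,                               # still current
        },
        "licence": "CC-BY",
    }

    print(pointer_entry["access_url"])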

    Thoughts?

    Cheers, Pete.

    • 2009 September 1
      matt permalink

      So a truly resilient and dynamic data.gov.au would hold -
      - policies (like the above) with a bias towards ‘always open metadata’ and ‘raw data usually open unless there are good reasons not to’
      - directory of pointers
      - showcase of best of breed voted by inside and outside the government.

      But it would not hold data, or for the foreseeable future even metadata; but as standards evolve and open up, perhaps it could hold metadata too.

      I don’t have a problem with data.gov.au being a storage manager too.
      I don't think it's reasonable for every data producer or publisher to make their data available forever, and to take on those costs.

      Suppose a small workgroup is formed to do some studies and produces a quantity of data; should we expect them to host the data and care for the infrastructure? Better to have a process whereby they can ‘seal’ the data and submit it to an archive, along with whatever metadata will aid discovery.

      Perhaps we need to ask, how long will published data need to be available?
      Then we can think about how that might be achieved (assuming the metadata and discovery are thrashed out satisfactorily).

      Regards

      Matt

      • 2010 April 26
        Madeleine Kingston permalink

        Matt

        The technicalities of storage and retrieval in digital space are way beyond me. Mine is not to reason how, simply why (a parody of “The Charge of the Light Brigade”).

        But…. I just love the idea of

        “So a truly resilient and dynamic data.gov.au would hold -

        - policies (like the above) with a bias towards ‘always open metadata’ and ‘raw data usually open unless there are good reasons not to’

        - directory of pointers

        - showcase of best of breed voted by inside and outside the government.”

        Now regardless of the legalities or technicalities of holding data indefinitely, who should be responsible, how it happens –

        Please let us remember the compelling reasons for doing this

        “Learning lessons from the past”

        If Government does not take this on as a commitment, how else will policies and governance improve?

        Please, please…

        Madeleine

  28. 2009 September 1
    Mike permalink

    What do you think? Should government departments embrace some of the principles of the open-source world in order to liberate public sector information?

    God no, unless we can implement restrictions that allow verification. Otherwise, go for it!

    • 2010 April 26
      Madeleine Kingston permalink

      Well Matt

      You do talk in riddles.

      I say go for it. Take a gamble. Learn from mistakes. Is that not what we say to our children? If not, why not?

      Madeleine

  29. 2010 April 26
    Madeleine Kingston permalink

    Just saw Nicholas Gruen's tweet on Adam Smith's relevance to the here and now (Club Troppo), as conveniently accessed on TweetDeck. Unfortunately comments must be closed or else I would have more thoroughly read the article and tried to participate. Will see if the archives allow me access to comment.

    People are questioning whether Maslow had it all, and the order in which we seek to attain access to the hierarchy of needs. Boxing those seeking collaborative engagement is a mistake. This is an outdated theory sold to us by market concept theorists.

    Steven McShane and Tony Travaglione (2007), in discussing Maslow's needs hierarchy theory, seem “amazed that people had accepted his theory wholeheartedly without any critique.”

    The topic is perhaps best discussed under motivation – so it will keep till I get back to an earlier blog today on “Online engagement as a public service pathway.” That is where it really belongs.

    I will be back – on another page.

    Madeleine

    PS I feel like Alice in Wonderland, or perhaps the Cheshire Cat is more to the point, popping up here and there on this page and that – will I ever get back to the non-cyber world of reality? (Lewis Carroll, Alice in Wonderland – a study of people and organizations and their interactions).

    Cheers

    Madeleine

Comments are closed.