Getting started with open data
Last week I was asked how a public library service could get started with publishing open data, and where to look to make it good. There aren’t many library services publishing data (yet!), and it can be a daunting task for a service to get started. For libraries, it’s still leading in innovation rather than jumping on a bandwagon.
Firstly, I wouldn’t worry too much about being excellent straight away. Aside from ensuring you don’t publish personal data, which should very rarely even be in internal data reports, it’s possible to get started quickly and get feedback to develop your processes.
Here are some general notes.
Publish what you already have
The easiest data to publish is the data you already hold and report on regularly. Do you maintain a dataset of the number of loans and footfall for every library, every month? Great, put it online with a licence (which should be the Open Government Licence).
You can even make your open data the place where colleagues in your organisation need to go to get data. Set up a publishing schedule, document and explain the data. You then have a regularly maintained, long-term, openly licensed dataset, with external and internal users, metadata, and documentation! It would be a great start.
Don’t try to represent what you do
I’ve been involved in discussions on library open data for a decade. Most recently there were two workshops in 2022, run by CILIP and Libraries Connected, that brought together various library services to discuss data collection and publishing.
A frequent concern from library services is that ‘old’ data like book loans and footfall doesn’t fully represent what libraries do. Libraries have events, they give advice, provide warm spaces, and lots more. Plenty of work can be done to ensure these things are represented in data, but it’s far more pressing to do something with the data already held. You can’t publish what you don’t hold.
The problem comes from thinking open data is anything to do with communicating performance, or providing advocacy for the sector. Open data is about publishing the data you hold, for wider re-use.
Find out what people want
Do you get freedom of information requests for data? If so, note them down as potential candidates for open data. Journalists are often after a fun story on libraries, and will email every library service asking for their most overdue books, or top 100 books loaned that year, etc.
Check with local data and developer groups to hear what they’d like to see. Ask the public what they’d be interested to know more about.
There are also sector colleagues who could be interested. Ask the British Library’s LibraryOn team what they may be interested in, and what they feel could be useful to develop their website. It may be that they could start to use your data in beta versions.
Use data schemas
It’s really useful to have library data published that matches data from other library services. It means the data is both useful locally, and can be merged with other services to be used at a wider level.
That was the point of creating the public library data schemas, a cross-sector project in 2019. Those schemas describe the structure for 7 commonly-held datasets: loans, visits, membership, catalogue, events, library locations, and mobile library stops. The pages also describe potential uses both locally and nationally.
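As a rough sketch of why matching structure helps, here is how loans data from two services could be combined once both use the same columns. The column names, service names, and figures below are made up for illustration. They are not the field names from the actual schemas, so check the schema pages for those.

```python
import csv
import io

# Hypothetical column names for illustration only; the published
# schemas define their own fields.
EXPECTED_COLUMNS = ["library", "month", "item_type", "loans"]


def read_loans(csv_text, service_name):
    """Parse one service's loans CSV, checking it uses the agreed columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"{service_name}: unexpected columns {reader.fieldnames}")
    rows = []
    for row in reader:
        # Tag each row with its service and convert the count to a number.
        rows.append({**row, "service": service_name, "loans": int(row["loans"])})
    return rows


# Made-up sample data from two imaginary services.
service_a = """library,month,item_type,loans
Central,2024-01,Adult fiction,120
Central,2024-02,Adult fiction,95
"""
service_b = """library,month,item_type,loans
Riverside,2024-01,Adult fiction,40
"""

# Because both files share a structure, merging is just concatenation.
combined = read_loans(service_a, "Service A") + read_loans(service_b, "Service B")
total_loans = sum(row["loans"] for row in combined)
print(total_loans)
```

If each service used its own column names and layouts, every new dataset would need its own conversion step before it could be merged.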
Look at other services
Not many library services publish regular open data, but do take a look at Newcastle Libraries, Barnet Libraries (who use the data schemas mentioned), and Calderdale libraries datasets.
Don’t destroy your data first
There’s a strong tendency to simplify data before publishing it.
“The public don’t need book loans for every library, every month, broken down by item category - how about a nice table for them that gives a summary count of loans per year. That’ll be easy for them to understand.”
Imaginary thoughts a data publisher could have
The mindset that you need to simplify data for easier consumption is easy to fall into. The opposite is true: there is a wider range of data skills outside public library services than inside (true of any service or industry), and people often need more data to do the things they want to.
Provide as good data as you can, and document it well. Be prepared to answer questions on it, and for people to be interested in exploring more.
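To illustrate the point, here is a minimal sketch using made-up figures. Anyone with the detailed data can produce the yearly summary themselves, along with other breakdowns, but nobody can get back from a yearly total to the detail. The row layout and the summarise helper are assumptions for the example, not a recommended format.

```python
from collections import defaultdict

# Made-up example rows: (library, month, item category, loans)
detailed = [
    ("Central", "2024-01", "Adult fiction", 120),
    ("Central", "2024-01", "Children's", 80),
    ("Central", "2024-02", "Adult fiction", 95),
    ("Branch", "2024-01", "Children's", 30),
]


def summarise(rows, key):
    """Total the loans column, grouped by whatever the key function returns."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)


# The 'nice summary table' is one line away for anyone who wants it...
loans_per_year = summarise(detailed, lambda row: row[1][:4])

# ...and so are breakdowns that a yearly total could never provide.
loans_per_library = summarise(detailed, lambda row: row[0])
loans_per_category = summarise(detailed, lambda row: row[2])

print(loans_per_year)
print(loans_per_library)
print(loans_per_category)
```

Publishing only the yearly figure would leave the per-library and per-category questions unanswerable.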
It won’t make money
Income generation is an unfortunate curse for library heads of service, who are being driven to find new ways of making money, rather than being allowed to run their library service. They may hear phrases like ‘data is the new oil’ and it somehow sounds lovely to have a dirty pipeline of money coming directly from the public data they hold. Maybe data on book loans could be sold to those big tech and data companies?
It’s not going to happen. That’s not to say the data isn’t valuable. It is - but for insight, and for finding ways of using that data that will benefit libraries and the wider public. Making money from data is hard, and you need to do a LOT more than just publish what you hold: running data services, validating data, managing subscriptions and access levels, and so on.
It would also be too complex to negotiate the legal hurdles, without even getting into why public libraries would choose to sell public data, when central and local government policies are for open data.
I’ve heard a service say in the past that no-one used their data when they did publish it. This was after publishing a high level summary of the number of loans per year, and not telling anyone.
So promote it regularly. Do things like making your own visualisations from the data and putting them on social media, with a link back to the data. For people to use the data, they have to know it’s there.