A previous post discussed mobile library data, and how its use could be improved, both internally and in providing public information. An encouraging number of library services have been in touch to get involved.
The first step is to analyse existing data and organise it into a standard. That means deciding which fields are needed, what type each field should be (a number, a date, etc.), and any constraints on each field, such as being mandatory or having a minimum length. This sort of standard is generally called a schema.
It’s worth looking around for existing examples and reference points.
- The BusTrip type on schema.org uses BusStop, and properties such as arrivalTime. Many of these ideas apply to mobile libraries.
- There has been a proposal for how mobile libraries could be stored in OpenStreetMap (OSM). It suggests using the OSM opening hours schema to store the times for each stop.
- There is a Wikidata item for ‘bookmobile’. The item also references the proposed OSM tag, so that data could be linked between the two.
- IFLA publish mobile library guidelines. These detail physical characteristics of mobiles, such as engine and chassis, and the stock that should be on them. However, they say nothing about how data on mobile library routes should be compiled or presented online.
Mobile library data is of mixed quality, but there are plenty of existing mobile library timetables to research and look through. These won’t be in data formats, but they will indicate the data that library services use to make their timetables.
Creating a schema
This may seem overly technical, but there are practical benefits to having a schema in a computer-readable format. CSV Lint is a service that checks CSV (comma-separated values) files to ensure they are valid. CSVs are text files that hold table data: a line at the top for the headings, then one line per row of values, with all values separated by commas. A basic example is shown below.
MobileService,Community,Stop
Gloucestershire,Hasfield,Farm
Gloucestershire,Hucclecote,The post box
Wiltshire,Corsham,The pub
Wiltshire,Bradford-on-Avon,Bridge
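To make the heading-then-rows idea concrete, here is a minimal Python sketch that reads the example above using the standard library's csv module. The data is embedded as a string only so the snippet runs on its own; in practice it would be read from a file.

```python
import csv
import io

# The example CSV from above, embedded as a string for a self-contained demo.
DATA = """MobileService,Community,Stop
Gloucestershire,Hasfield,Farm
Gloucestershire,Hucclecote,The post box
Wiltshire,Corsham,The pub
Wiltshire,Bradford-on-Avon,Bridge
"""

# DictReader treats the first line as the headings and maps every
# following row onto them, giving one dictionary per stop.
rows = list(csv.DictReader(io.StringIO(DATA)))

for row in rows:
    print(f"{row['MobileService']}: {row['Stop']} ({row['Community']})")
```

Each row comes back keyed by its heading, which is exactly the structure a schema then describes.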
CSV Lint performs checks such as ensuring every row has the same number of commas and that each header name is unique, along with checks for other common errors.
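This post doesn't describe how CSV Lint works internally, but two of the checks mentioned (a consistent number of values per row, and unique header names) are easy to sketch in Python. The function `basic_csv_checks` is hypothetical and only illustrates the idea; it is not CSV Lint's implementation.

```python
import csv
import io


def basic_csv_checks(text: str) -> list[str]:
    """Return a list of problems; an empty list means both checks passed.

    Hypothetical helper illustrating two structural checks, not CSV Lint itself.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header, problems, seen = rows[0], [], set()

    # Check 1: every header name must be unique.
    for name in header:
        if name in seen:
            problems.append(f"duplicate header: {name!r}")
        seen.add(name)

    # Check 2: every data row should have as many values as the header.
    # Rows are numbered from 2 because the header is row 1.
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_no}: expected {len(header)} values, found {len(row)}"
            )
    return problems


bad = "MobileService,Community,Community\nWiltshire,Corsham\n"
print(basic_csv_checks(bad))
```

Running it on the deliberately broken input reports both the repeated `Community` heading and the short second row.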
Additionally, given a Table Schema file, CSV Lint will check that the CSV conforms to the schema. This goes beyond checking that the format is correct: the data itself is checked against the field definitions and constraints. The schema then becomes a powerful tool for checking that data is correct.
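As an illustration of what schema checking adds, here is a minimal Python sketch. The descriptor follows the general shape of a Table Schema (a list of `fields`, each with a `name`, a `type` and optional `constraints`), but the fields shown are an illustrative fragment, not the published schema file. The validator is hypothetical and only understands the `string`, `number` and `date` types and the `required` and `minLength` constraints; real validators such as CSV Lint cover far more.

```python
import csv
import io
from datetime import date

# Illustrative Table Schema-style descriptor (not the published schema file).
SCHEMA = {
    "fields": [
        {"name": "Stop name", "type": "string",
         "constraints": {"required": True, "minLength": 2}},
        {"name": "Geopoint X", "type": "number",
         "constraints": {"required": True}},
        {"name": "Start date", "type": "date"},
    ]
}


def type_ok(value: str, field_type: str) -> bool:
    """Check a value against a small subset of Table Schema types."""
    try:
        if field_type == "number":
            float(value)
        elif field_type == "date":
            date.fromisoformat(value)  # default date format is YYYY-MM-DD
    except ValueError:
        return False
    return True  # "string" (and anything unrecognised) is accepted as-is


def validate(text: str, schema: dict) -> list[str]:
    """Check each CSV row against the descriptor's types and constraints."""
    errors = []
    # Row 1 is the header, so data rows are numbered from 2.
    for row_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        for field in schema["fields"]:
            name = field["name"]
            field_type = field.get("type", "string")
            constraints = field.get("constraints", {})
            value = (row.get(name) or "").strip()
            if not value:
                if constraints.get("required"):
                    errors.append(f"row {row_no}: {name} is required")
                continue  # empty optional values are allowed
            if len(value) < constraints.get("minLength", 0):
                errors.append(f"row {row_no}: {name} is shorter than allowed")
            if not type_ok(value, field_type):
                errors.append(f"row {row_no}: {name} is not a valid {field_type}")
    return errors


DATA = """Stop name,Geopoint X,Start date
Eyres Drive,-1.723543,2019-04-04
,west,04/04/2019
"""
print(validate(DATA, SCHEMA))
```

The first data row passes. The second row is well-formed CSV, so a purely structural check would accept it, but the schema catches the missing stop name, the non-numeric longitude and the non-ISO date.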
Mobile library schema
A first version of a schema for mobile library stops is available below. This will be used in proof-of-concept tools such as PDF calendar generators, online maps, stop finders, etc. As changes become necessary, new versions of the schema will be released.
A human-readable description of the fields is below.
|Field|Description|Example|
|---|---|---|
|Organisation|The organisation running the mobile|Wiltshire|
|Mobile|Name for the mobile library|South Mobile Library|
|Route|Name for the route|South Thursday Week 1|
|Community|The community served by the stop|Alderbury|
|Stop name|The individual stop name|Eyres Drive|
|Address|Address for the stop|Eyres Drive, Alderbury|
|Postcode|Nearest postcode for the stop|SP5 3TD|
|Geopoint X|Longitude for stop location|-1.723543|
|Geopoint Y|Latitude for stop location|51.03884|
|Day|Day the mobile library visits this stop|Thursday|
|Arrival time|Time the mobile library arrives|10:00|
|Departure time|Time the mobile library departs|10:20|
|Frequency|Schedule for repeated visits to this stop|FREQ=WEEKLY;INTERVAL=4|
|Start date|Date the timetable starts|2019-04-04|
|End date|Date the timetable ends|2019-09-19|
|Timetable|Link to a PDF or web page|Link|
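Once a row has passed validation, tools such as maps, calendars and stop finders are easier to write if they work with numbers, times and dates rather than raw strings. The Python sketch below assumes the CSV column headings match the field names in the table above; the `MobileStop` class is hypothetical and only a subset of the fields is included.

```python
from dataclasses import dataclass
from datetime import date, time
from typing import Optional


@dataclass
class MobileStop:
    """Hypothetical typed version of one stop row (subset of fields)."""
    organisation: str
    stop_name: str
    longitude: float            # Geopoint X
    latitude: float             # Geopoint Y
    day: str
    arrival: time
    departure: time
    frequency: str              # iCalendar RRULE text, e.g. FREQ=WEEKLY;INTERVAL=4
    start: date
    end: date
    postcode: Optional[str] = None  # optional: many stops have no postcode

    @classmethod
    def from_row(cls, row: dict) -> "MobileStop":
        """Convert a dictionary of CSV strings into typed values."""
        return cls(
            organisation=row["Organisation"],
            stop_name=row["Stop name"],
            longitude=float(row["Geopoint X"]),
            latitude=float(row["Geopoint Y"]),
            day=row["Day"],
            arrival=time.fromisoformat(row["Arrival time"]),
            departure=time.fromisoformat(row["Departure time"]),
            frequency=row["Frequency"],
            start=date.fromisoformat(row["Start date"]),
            end=date.fromisoformat(row["End date"]),
            postcode=row.get("Postcode") or None,  # empty string becomes None
        )


# The example values from the table above.
example = {
    "Organisation": "Wiltshire", "Stop name": "Eyres Drive",
    "Postcode": "SP5 3TD", "Geopoint X": "-1.723543", "Geopoint Y": "51.03884",
    "Day": "Thursday", "Arrival time": "10:00", "Departure time": "10:20",
    "Frequency": "FREQ=WEEKLY;INTERVAL=4",
    "Start date": "2019-04-04", "End date": "2019-09-19",
}
stop = MobileStop.from_row(example)
print(stop.stop_name, stop.arrival, stop.latitude)
```

Keeping the frequency as RRULE text, rather than converting it here, leaves the recurrence logic to whichever tool needs it.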
Some notes about the schema:
- Stop name, community, address, postcode, and x/y coordinates all relate to the stop location. Mobile library stops often do not have their own address or postcode, but many services publish one anyway, usually for the place the stop is closest to. So these fields should be optional. The mandatory ones are stop name and coordinates.
- The most complex field is frequency. In the example above, FREQ=WEEKLY;INTERVAL=4 means the mobile library visits every 4 weeks. FREQ=WEEKLY;INTERVAL=2 would be every 2 weeks; for a weekly visit, FREQ=WEEKLY on its own is enough. This uses the recurrence rule (RRULE) format from the iCalendar specification. The majority of mobile stops will be variations on weekly intervals, but some are more complex, such as Worcestershire’s visits on the 3rd Monday of the month. That’s fine, though: it can be specified as FREQ=MONTHLY;BYDAY=3MO.
- The data is ‘flat’. There is no structure between concepts. The concepts could be said to be Library Service, Mobile Library, Route, and Stop. These should really be in a hierarchy. For example, a Library Service has many Mobile Libraries, which have many Routes, and each route has multiple stops. Holding the data in a flat structure leads to quite a bit of duplication, but is simpler to store.
- There are few identifiers in the data. For example, rather than just having an organisation name, the schema should really link to a recognised identifier for the organisation. This would make the data ‘linkable’. However, the schema is designed to be as minimal as possible; added complexity can come later if necessary.
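To show what the frequency field means in practice, here is a small Python sketch that expands the two patterns discussed above into visit dates between a timetable's start and end dates. It handles only FREQ=WEEKLY with an optional INTERVAL, and FREQ=MONTHLY;BYDAY rules for the 1st to 4th weekday of the month. The function name `visit_dates` is hypothetical; anything beyond these cases would need a full iCalendar recurrence implementation, such as the `rrule` module in the third-party python-dateutil package.

```python
from datetime import date, timedelta

WEEKDAYS = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]  # same order as date.weekday()


def visit_dates(rule: str, start: date, end: date) -> list[date]:
    """Expand a small subset of iCalendar RRULEs into dates from start to end.

    Supported: FREQ=WEEKLY (optionally with INTERVAL=n), assuming the start date
    is itself a visit day, and FREQ=MONTHLY;BYDAY=nXX for the nth (1 to 4)
    weekday XX of the month. Other rule parts are not handled.
    """
    parts = dict(part.split("=", 1) for part in rule.split(";"))
    interval = int(parts.get("INTERVAL", "1"))

    if parts["FREQ"] == "WEEKLY":
        dates, current = [], start
        while current <= end:
            dates.append(current)
            current += timedelta(weeks=interval)
        return dates

    if parts["FREQ"] == "MONTHLY" and "BYDAY" in parts:
        ordinal, day_code = parts["BYDAY"][:-2], parts["BYDAY"][-2:]
        if not ordinal.isdigit() or not 1 <= int(ordinal) <= 4:
            raise ValueError(f"unsupported BYDAY: {parts['BYDAY']}")
        weekday = WEEKDAYS.index(day_code)
        dates = []
        month_index = start.year * 12 + start.month - 1  # months counted from year 0
        while True:
            first = date(month_index // 12, month_index % 12 + 1, 1)
            if first > end:
                return dates
            # Step forward to the first matching weekday, then on (n - 1) weeks.
            offset = (weekday - first.weekday()) % 7
            candidate = first + timedelta(days=offset, weeks=int(ordinal) - 1)
            if start <= candidate <= end:
                dates.append(candidate)
            month_index += interval

    raise ValueError(f"unsupported rule: {rule}")


weekly = visit_dates("FREQ=WEEKLY;INTERVAL=4", date(2019, 4, 4), date(2019, 9, 19))
print(weekly[0], weekly[-1], len(weekly))  # 2019-04-04 2019-09-19 7
third_mondays = visit_dates("FREQ=MONTHLY;BYDAY=3MO", date(2019, 4, 1), date(2019, 6, 30))
print(third_mondays)
```

With the example row from the table, a four-weekly Thursday stop running from 4 April to 19 September 2019 comes out as seven visits, the last falling exactly on the end date.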
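Keeping the data flat doesn't lose the hierarchy described above; it can be rebuilt whenever a tool needs it. This Python sketch groups rows into organisation, then mobile, then route, then a list of stop names, using nested dictionaries. The column headings are assumed to match the field names in the schema table, `build_hierarchy` is a hypothetical name, and the second stop is a placeholder for illustration only.

```python
from collections import defaultdict


def build_hierarchy(rows):
    """Group flat stop rows into organisation -> mobile -> route -> [stop names]."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for row in rows:
        tree[row["Organisation"]][row["Mobile"]][row["Route"]].append(row["Stop name"])
    return tree


rows = [
    {"Organisation": "Wiltshire", "Mobile": "South Mobile Library",
     "Route": "South Thursday Week 1", "Stop name": "Eyres Drive"},
    # Placeholder second stop on the same route, for illustration only.
    {"Organisation": "Wiltshire", "Mobile": "South Mobile Library",
     "Route": "South Thursday Week 1", "Stop name": "Example stop 2"},
]
tree = build_hierarchy(rows)
print(tree["Wiltshire"]["South Mobile Library"]["South Thursday Week 1"])
```

The duplication in the flat rows (organisation, mobile and route repeated on every stop) is exactly what makes this regrouping possible without any extra identifiers.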
An early test with CSV Lint has proved successful. Validating Aberdeenshire mobile library data against the schema file returns no errors (in reality there were some to start off with!). So that’s good.
A follow up post will begin to describe how this data can be used in practical applications.
Getting this project right will rely on it serving both library users and library services. The process for submitting data shouldn’t be hard or confusing for libraries. It should reduce time spent in creating timetables and web information. For users it needs to meet real needs such as well-formatted timetables, data in accessible formats, and additional features like notification systems and calendar integration.
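Calendar integration is one place where the choice of the iCalendar recurrence format pays off directly: the frequency field can be copied straight into a calendar file's RRULE. Below is a minimal Python sketch that builds a one-event iCalendar file for a stop. The function names, UID and PRODID are hypothetical, times are left "floating" (no time zone), and the rule is assumed not to carry its own UNTIL or COUNT.

```python
from datetime import date, datetime, time, timezone


def escape_text(value: str) -> str:
    """Escape characters that have special meaning in iCalendar TEXT values."""
    return (value.replace("\\", "\\\\").replace(";", "\\;")
            .replace(",", "\\,").replace("\n", "\\n"))


def stop_to_ics(uid: str, summary: str, location: str, first_visit: date,
                arrival: time, departure: time, rule: str, end: date) -> str:
    """Build a minimal iCalendar (RFC 5545) file for one recurring stop.

    Assumptions: floating times (no TZID), no UNTIL/COUNT already in the rule,
    and every line short enough that no line folding is needed.
    """
    fmt = "%Y%m%dT%H%M%S"
    until = datetime.combine(end, time(23, 59, 59))
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//Example//Mobile library sketch//EN",  # hypothetical product id
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{datetime.now(timezone.utc).strftime(fmt)}Z",
        f"DTSTART:{datetime.combine(first_visit, arrival).strftime(fmt)}",
        f"DTEND:{datetime.combine(first_visit, departure).strftime(fmt)}",
        # The Frequency field maps straight onto RRULE; End date becomes UNTIL.
        f"RRULE:{rule};UNTIL={until.strftime(fmt)}",
        f"SUMMARY:{escape_text(summary)}",
        f"LOCATION:{escape_text(location)}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    return "\r\n".join(lines) + "\r\n"  # iCalendar uses CRLF line endings


ics = stop_to_ics(
    uid="eyres-drive-south-thursday-1@example.invalid",  # hypothetical UID
    summary="Mobile library: Eyres Drive",
    location="Eyres Drive, Alderbury",
    first_visit=date(2019, 4, 4),
    arrival=time(10, 0),
    departure=time(10, 20),
    rule="FREQ=WEEKLY;INTERVAL=4",
    end=date(2019, 9, 19),
)
print(ics)
```

A user could import a file like this once and have every visit appear in their calendar, instead of copying dates from a PDF timetable.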
It’s easy to get carried away. This morning, before getting up, I was imagining how an admin system that automatically validated mobile stop data could work. Not just the schema validation described in this post, but really validating it. For example, the system could calculate the route a mobile library would take between stops, and check that the departure and arrival timings are realistic. Perhaps it could give a warning if it seemed like they needed to be shifted slightly. Even more advanced, it could suggest efficiencies in terms of moving stops between routes.
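A very rough version of the timing check can be sketched without any routing service. Instead of the road route imagined above, this Python sketch uses straight-line (haversine) distance and a hypothetical average speed of 30 km/h. Straight-line distance underestimates real driving distance, so it can only flag gaps that are clearly too short. The function names, the speed and the second stop are all illustrative assumptions.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

ASSUMED_SPEED_KMH = 30.0  # hypothetical average speed between stops
EARTH_RADIUS_KM = 6371.0


def distance_km(lon1, lat1, lon2, lat2):
    """Great-circle (haversine) distance in km; road distances will be longer."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def timing_warnings(stops):
    """Flag consecutive stops where the gap between departing one and arriving
    at the next is shorter than straight-line travel at the assumed speed."""
    warnings = []
    for prev, nxt in zip(stops, stops[1:]):
        gap = (datetime.strptime(nxt["arrival"], "%H:%M")
               - datetime.strptime(prev["departure"], "%H:%M"))
        allowed = gap.total_seconds() / 60
        km = distance_km(prev["lon"], prev["lat"], nxt["lon"], nxt["lat"])
        needed = km / ASSUMED_SPEED_KMH * 60
        if allowed < needed:
            warnings.append(f"{prev['name']} -> {nxt['name']}: "
                            f"{allowed:.0f} min allowed, about {needed:.0f} min needed")
    return warnings


route = [
    {"name": "Eyres Drive", "lon": -1.723543, "lat": 51.03884,
     "arrival": "10:00", "departure": "10:20"},
    # Hypothetical next stop roughly 11 km away, for illustration only.
    {"name": "Example stop 2", "lon": -1.60, "lat": 51.10,
     "arrival": "10:25", "departure": "10:45"},
]
for warning in timing_warnings(route):
    print(warning)
```

Swapping `distance_km` for a call to a real routing service would turn this into something closer to the check described above.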
None of this is intended to remove the human factor. In many cases, expertise will have gone into mobile timetables to decide which times are appropriate in which places (e.g. schools). But technology can be a tool that aids this.