Linked data in the creases: blinkered by BIBFRAME, have we missed the real story?

I keep you in the creases / I hide you in the folds / Protect you from the sunlight / Shield you from the cold. / Everybody said they were glad to see you go / But no one ever has to know.

—Amber Rubarth, “In the Creases”

American catalogers and systems librarians can be forgiven for thinking that all the linked-data action lies with the BIBFRAME development effort. BIBFRAME certainly represents the lion’s share of what I’ve bookmarked for next semester’s XML and linked-data course. All along, I’ve wondered where the digital librarians, metadata librarians, records managers, and archivists—information professionals who describe information resources but are at best peripheral to the MARC establishment—were hiding in the linked-data ferment, as BIBFRAME isn’t paying them much attention. After attending Semantic Web in Libraries 2013 (SWIB, from the German Semantic Web in Bibliotheken, since the conference takes place in Germany), I know where they are and what they’re making: linked data that lives in the creases, building bridges across boundaries and canals through liminal spaces.

Because linked data is designed to bridge diverse communities, vocabularies, and standards, it doesn’t show to best advantage in siloed, heavily standardized arenas such as the current MARC environment. If BIBFRAME sometimes feels uncompelling on its own, this is likely why! Linked data shines most where diverse sources and types of data are forced to rub elbows, an increasing pain point for many libraries trying to make one-stop-shopping discovery layers and portals. I first noticed an implementation that spoke to that truth in 2012, when the Missouri History Museum demonstrated their use of linked data as a translation layer between disparate digital collections with differing metadata schemes. SWIB13 offered plentiful examples of similar projects, including an important one from the US side of the pond. In building the AgriVIVO disciplinary search portal, Cornell University walked away from traditional crosswalks, instead finding the pieces of information they needed from whatever metadata their partners could give them and expressing those in linked data. This just-in-time aggregation approach lets AgriVIVO welcome and enhance any available metadata while avoiding tiresome and often fruitless arguments about standards and metadata quality.

What interests me most about this design pattern is how it neatly bypasses problems that led earlier aggregation projects to fail. The ambitious National Science Digital Library project of the mid-2000s foundered on the average science project’s inability to get to grips with XML, never mind setting up as an OAI-PMH provider. (Chapter 10 of Carl Lagoze’s dissertation offers the gory details, for those interested.) AgriVIVO, instead, takes Postel’s law to heart: it accepts whatever it is given, and gives the cleanest linked data it can back to the web. As this design pattern catches on, we could see less friction and standards-squabbling among information communities, which will be free to describe their materials as they see fit while still contributing to the growing interconnection of the cultural-heritage web. Librarians, archivists, and museum and gallery curators meshing together on the web while doing their own thing—what an opportunity!
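To make that design pattern concrete, here is a minimal sketch in Python using the rdflib library of what “accept whatever you are given, emit the cleanest linked data you can” looks like in code. The partner record, field names, and base URI below are hypothetical illustrations, not AgriVIVO’s actual data or software.

```python
# A minimal sketch of just-in-time aggregation: accept whatever fields a
# partner supplies, and publish only the triples that can be stated cleanly.
# Requires rdflib (pip install rdflib); the record and URIs are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DC

# Whatever a partner happens to send; no schema is enforced, fields may be missing.
partner_record = {
    "id": "dataset-42",
    "title": "Soil moisture observations, 2010-2012",
    "creator": "A. Researcher",
    # "subject" is absent; we simply say nothing about it.
}

BASE = Namespace("http://example.org/agri/")  # hypothetical base URI

g = Graph()
g.bind("dc", DC)

resource = BASE[partner_record["id"]]

# Map only the fields that are actually present: liberal in what we accept,
# conservative (and clean) in what we emit.
for field, predicate in (("title", DC.title),
                         ("creator", DC.creator),
                         ("subject", DC.subject)):
    if field in partner_record:
        g.add((resource, predicate, Literal(partner_record[field])))

print(g.serialize(format="turtle"))
```

Nothing breaks when a partner omits a field; the graph simply says less about that resource, and richer records contribute more triples, all without a single crosswalk negotiation.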

It should surprise no one that the premier conference for semantic web technologies in libraries is held in Europe; European libraries have led actual linked-data implementation all along. If I had to guess why, I would point to their small size, small numbers, and resulting agility, as well as their clear and unchallenged technology leadership within their countries’ libraries. European national libraries, from what I can see, tend not to bog down as much as American library communities do in grindingly political, perfectionistic, top-down standards processes. Instead, they eye possibilities critically and solve problems however they think best, unconstrained by one-true-standard thinking.

This lent a delightfully grounded ambition to several of the development projects I saw at SWIB13. I was taken rather aback at first by the notion of an entire e-resource management system predicated on linked data—it struck me as frighteningly complex and fraught—but on second thought, if developer Leander Seige is solving a real data-integration problem for his library with the tools he has to hand, why not? Similarly, the ontology- and vocabulary-mapping projects at the Plattner Institute, Stuttgart Media University, and Mannheim University Library are not random pie-in-the-sky experiments, but active real-world problem-solving where linked data is the best-fit solution rather than just a trendy buzzword.
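For readers who have never watched vocabulary mapping happen, here is a hypothetical sketch of what such assertions look like, again in Python with rdflib, using SKOS mapping properties; the two vocabularies and their concept URIs are invented and are not drawn from any of the projects named above.

```python
# Hypothetical sketch of vocabulary mapping with SKOS: asserting that a concept
# in one local vocabulary corresponds to a concept in another scheme.
# Requires rdflib (pip install rdflib); the URIs below are invented examples.
from rdflib import Graph, Namespace
from rdflib.namespace import SKOS

LOCAL = Namespace("http://example.org/vocab/")        # a library's own terms
REMOTE = Namespace("http://example.com/othervocab/")  # a partner's scheme

g = Graph()
g.bind("skos", SKOS)

# An exact match: the two concepts can be used interchangeably in retrieval.
g.add((LOCAL["Bodenfeuchte"], SKOS.exactMatch, REMOTE["soil-moisture"]))

# A close match: near enough to broaden a search, not asserted as identical.
g.add((LOCAL["Niederschlag"], SKOS.closeMatch, REMOTE["precipitation"]))

print(g.serialize(format="turtle"))
```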

The presentation that most refined my thinking about linked data was Martin Malmsten’s “Decentralisation, Distribution, Disintegration—towards Linked Data as a First Class Citizen in Libraryland.” (I would link to the video if I could, as the slides capture very little of Malmsten’s compelling arguments.) Malmsten sold me at once when he related how the National Library of Sweden, sick of MARC behaving as a stumbling block in many of their projects, declared “Linked Data or die!” and audaciously set about making it happen. Along the way, the Swedish developers discovered that serialization formats like MARC and XML, as well as standards like METS, constrain innovative thinking too much and invariably involve shoehorning data into forms and formats that don’t quite fit it.

What linked data let Malmsten and his compatriots do was express their data in the manner best befitting it, while “keep[ing] formats and monsters on the outside” by automating the re-expression of the data in older, staider standards as necessary—and only as necessary. If broadly adopted, the National Library of Sweden’s approach frees us from the eternal lipstick-on-pig question of how best to present eccentric, often inadequate, almost always expensively homegrown data to patrons. Instead, we will put the patron experience first, asking “What data do patrons actually want to see or use, and before we go creating it, does it perhaps exist already in the vast web of data?”
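As a toy illustration of keeping “formats and monsters on the outside,” here is a hedged sketch, emphatically not the National Library of Sweden’s code, in which the canonical description lives as RDF and a legacy record (simple Dublin Core XML standing in for MARC or METS) is generated only at the moment an external system asks for it.

```python
# Hypothetical sketch of "formats on the outside": the canonical description
# lives as RDF, and a legacy serialization (minimal Dublin Core XML here) is
# generated only when an external system needs it.
# Requires rdflib; the URIs and data are invented examples.
import xml.etree.ElementTree as ET

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DC

BASE = Namespace("http://example.org/resource/")
DC_ELEMENTS = "http://purl.org/dc/elements/1.1/"

# The internal, format-neutral description.
g = Graph()
work = BASE["work-1"]
g.add((work, DC.title, Literal("Pippi Långstrump")))
g.add((work, DC.creator, Literal("Astrid Lindgren")))

def to_dc_xml(graph, subject):
    """Re-express one resource's triples as a minimal Dublin Core XML record."""
    ET.register_namespace("dc", DC_ELEMENTS)
    record = ET.Element("record")
    for _, predicate, obj in graph.triples((subject, None, None)):
        if str(predicate).startswith(DC_ELEMENTS):
            local_name = str(predicate)[len(DC_ELEMENTS):]
            element = ET.SubElement(record, f"{{{DC_ELEMENTS}}}{local_name}")
            element.text = str(obj)
    return ET.tostring(record, encoding="unicode")

# The older format appears only at the boundary, generated on demand.
print(to_dc_xml(g, work))
```

The point of the design is that the internal model never has to contort itself to fit the exchange format; the contortion happens once, in a small exporter at the boundary.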

Malmsten also made clear that “Linked Data + UX = actually useful data.” Linked data on the inside is a hard sell without obvious user-experience benefits on the outside for both patrons and librarians, a point my rather eccentric keynote entirely agreed with. For that reason, France’s OpenCat effort was my favorite linked-data project from SWIB13. Since the National Library of France has already done considerable linked-data authority control on names, subjects, and titles, it is now leveraging that work to build lightweight, easy-to-maintain, enriched OPACs for some of the smallest libraries in the country, libraries too small for MARC to be an easy or comfortable fit.
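I have not seen OpenCat’s code, but the general shape of this kind of enrichment is easy to sketch: the small library’s catalog holds little more than an identifier and asks an authority dataset for the rest at display time. The sketch below uses Python with the SPARQLWrapper library; the data.bnf.fr endpoint URL and the placeholder author URI are assumptions for illustration, not OpenCat’s actual configuration.

```python
# Hypothetical sketch of OPAC enrichment from a linked-data authority source:
# a small catalog stores only an identifier and fetches richer context from a
# remote SPARQL endpoint at display time.
# Requires SPARQLWrapper (pip install sparqlwrapper). The endpoint URL and the
# author URI below are assumptions for illustration only.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://data.bnf.fr/sparql"  # assumed public SPARQL endpoint
AUTHOR_URI = "http://data.bnf.fr/ark:/12148/cb000000000"  # placeholder ark, not a real record

sparql = SPARQLWrapper(ENDPOINT)
sparql.setReturnFormat(JSON)
sparql.setQuery(f"""
    SELECT ?property ?value
    WHERE {{
        <{AUTHOR_URI}> ?property ?value .
    }}
    LIMIT 25
""")

# Each binding is a property/value pair the small library never had to catalog itself.
results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["property"]["value"], "->", binding["value"]["value"])
```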

After SWIB13, I firmly believe that it isn’t the big standards-development efforts that will shape linked-data adoption in libraries. Linked data will grow in the creases, the folds, the cracks of our notoriously rickety metadata edifices. It will often grow in the dark unnoticed, shielded by its champions, as with a project I heard about informally (and won’t name) that nearly died by cold administrative fiat before its developer made it too amazing to kill off. As it quietly solves stubborn problems, empowers our smallest libraries, and connects libraries big and small with the larger web, linked data will remake more and more library data in its image—and if good interface-design practices come along for the ride, no one ever has to know!

Note: This post is copyright 2013 by Library Journal. Reposted under the terms of my author’s agreement, which permits me to “reuse the work in whole or in part in your own professional activities and subsequent writings.”