Data curation’s dirty little secret

When the Research Data Services group I helped inaugurate worked out a response process for data-management-plan assistance requests, we were careful to respect the disciplinary expertise among our members. After all, even in late 2010 it was a truism that the barrier skill for helping researchers manage data was disciplinary expertise. “In practice,” wrote Alma Swan and Sheridan Brown in 2008, “data scientists need a wide range of skills: domain expertise and computing skills are prerequisites…”

Data curation’s dirty little secret is that this isn’t always true. It isn’t even often true.

Swan and Brown’s own evidence directly contradicted their words. They wrote, quite truthfully, that aside from domain experts who teach themselves digital data management and analysis techniques, another typical data scientist or data manager “originat[ed] as a computer scientist who has acquired domain knowledge over time.” Domain knowledge, then, is not a prerequisite exactly; it can be learned on the job. This being true for computer scientists, why wouldn’t it be true for information professionals as well?

Researchers themselves are the authority for the claim that disciplinary knowledge is required for proper data management, in Swan and Brown as in many successor reports and articles. I must say, I don’t find researchers a reliable source on this point. If researchers knew what skills and techniques are necessary to manage and work with digital data, wouldn’t they be doing it better than they are? Would they even need help with data-management planning? Would they be leaving data management to wet-behind-the-ears graduate students at the very bottom of the lab hierarchy, as I have so often witnessed them doing? Would they be dumping the digital equivalent of moldy boxes from spiderwebby garages on librarians’ desks to the extent they are?

That said, some researchers do believe fiercely in the indispensability of disciplinary knowledge. The last-but-one data-management brownbag that Research Data Services sponsored prominently featured work that two groups of my digital-curation students did to help the Living Environments Laboratory (LEL) store, describe, track, and search/browse the individual images and other digital materials from which virtual-reality scenarios are built. I had to bite my tongue hard when a researcher in attendance incredulously questioned the speaker about my students’ lack of disciplinary expertise. Surely they couldn’t have done that work? The work they had demonstrably done? Myths die hard… and while they live, they cause librarians needless headaches.

I have sent a round dozen groups of students out to solve digital-data problems in the three years I’ve been teaching digital curation. In addition to the LEL researchers, my students have helped a linguist, an art historian, student artists, a demographer, a radio station with media-archiving issues, and more. I’ve also sent interns and practicum students into a campus microscopy lab, our local Forest Service research outpost, and our local Geological Survey office. I match disciplinary expertise when I can, but I usually can’t. It’s never mattered. They do fine. They’ve all done fine.

For my own part, I’ve taught basic data management to engineers, physicists, biologists, historians, clinicians, and computer scientists, and I’ve critiqued data-management plans from even more disciplines than that. My own disciplinary background is in literary analysis and historical linguistics. I can count the questions and situations I haven’t been able to resolve singlehandedly without moving from my left hand to my right. The number I failed to resolve at all? One, that I remember—a confusing workflow in instrument biology, and it was my own fault for not calling in someone else to resolve my confusion before responding.

Are disciplinary differences irrelevant to research-data management? Well, no, but the salient disciplinary differences I’ve seen cluster around idiosyncratic research processes and tools. I confess to considerable skepticism, for example, about the possibility of an electronic laboratory notebook software package that will work across the entire breadth of a campus’s research initiatives. Lab notebooks are tightly tied to idiosyncratic, ungeneralizable, often project-specific processes, and my experience with researchers suggests that they expect digital notebooks to conform to their processes equally tightly, and will brook no impedance. I hope I’m wrong (an 80/20 solution seems vaguely within the realm of possibility, perhaps), but we’ll just have to see.

For the advising and consulting around data management that libraries would like to do, of course disciplinary knowledge is useful! No question about it. If nothing else, a little disciplinary knowledge helps convince researchers that librarians are useful people to talk to. (I find that a tiny bit of research before a scheduled meeting allows me to fake it convincingly.) No matter how often researchers claim it is, however, “useful” is not the same thing as “needful.” As libraries work through how we will help researchers with data management, we can take comfort, I hope, in the mythbusting I’ve just done. We don’t have to have all the disciplinary knowledge scattered across campus within our library walls before we start to help.

I once chatted with the inimitable Diane Hillmann at ALA about scholarly communication and data curation. When the disciplinary-expertise canard came up, she said judiciously, “They all think they’re special snowflakes. They’re not.” I’ve never forgotten that. I believe my students and I have abundantly proven it, and I believe academic libraries can—and should—go right on proving it.

Note: This post is copyright 2013 by Library Journal. Reposted under the terms of my author’s agreement, which permits me to “reuse the work in whole or in part in your own professional activities and subsequent writings.”