Manufacturing Serendipity: Research Data Services at UW-Madison

Hello. My name is Dorothea Salo, and I teach at the School of Library and Information Studies at the University of Wisconsin at Madison. With my other hat on, I co-lead a small research-data consulting and training skunkworks called (big surprise here) Research Data Services. I’m here to tell you the story of how we built Research Data Services, which honestly amounts to what I’m going to call “manufacturing serendipity.”

This is a how-the-sausage-got-made presentation. I’m going to be honest about how we got where we are, and how far that is from where I think we need to be, if we can get there. I do this partly to give you hope. At a lot of events, what you see are the Purdues and the California Digital Libraries and the UC-Berkeleys, the places that are light-years ahead of the game. I’m not. I’m just barely keeping my head above water, and I think a lot more libraries and librarians are closer to where we are than to where UC-Berkeley is. So I hope that our story will be helpful and encouraging to those of you whose organizations are on similar paths.

When the NSF Data Management Plan requirement came down in late 2010, Research Data Services’s response looked like X-marks-the-spot, the more so because a lot of high-level campus administrators and major campus players in the research-computing space hadn’t even heard of us prior to that. This led to a few amusing comedies of errors, which isn’t all that surprising coming from a gigantic decentralized research university, but the point remains: we were ready, and our readiness surprised a lot of people.

Was it really just serendipity? Did we just happen to be in the right place at the right time? Either way, what does it mean for you? That’s for you to decide, but I’ll try to pull out some morals-of-the-story at the end.

When I started my new institutional-repository-manager job in the UW-Madison Libraries in 2007, almost exactly five years ago now, the very first committee I was put onto was something called the Scholarly Asset Management Initial Exploratory Group. SAMIEG was sponsored, funded, and mostly crewed by our central IT unit, the Division of Information Technology, and it took the form of a number of focus groups with faculty, where we asked them open-ended questions about their data practices and needs. As is the way of such groups, the results were written up into a report which, as best I can tell, nobody at our institution actually read, though I know people outside our institution sat up and took notice, because I’ve seen it cited in a fair few places; you can find it in our institutional repository. I hope it was helpful! But so much for that report.

The next thing that happened to us in this space was the CIC “Librarians and E-Science” conference in 2008. The libraries sent half a dozen people to this, IT people and librarians, myself included, and it was a real turning point for us; several of us came back thinking “yes, the writing is on the wall; this is going to be A Thing and we will have to come up with a response to it.” Notably, and here’s where we differ strongly from what’s happening at places like Purdue and Michigan where there have been significant organization-chart shifts with respect to research support, the people who came back thinking this were rank-and-file employees and line managers. They were not campus IT administrators, not library administrators, certainly not campus administrators, just ordinary bottom-of-the-heap schmos like me. There was no way we were going to reorganize the whole library org chart to create a separate arm and a separate dean for research services! And there are a lot of research and research-computing stakeholders on our campus, so there was no way that everybody was just going to fall in line behind the library. So if anything was going to happen, it would have to happen from the bottom up, and in at least two different campus organizational silos: the library, and campus IT. Maybe the Grad School too. Kind of a tall order.

So in 2009, some of the same people who had been on SAMIEG started what I mischievously call Son of SAMIEG, but which was properly called the Research Data Management Study Group. Instead of focus groups, this was a set of more in-depth interviews with faculty, with a more robust interview instrument. As is the way of such groups, the results were written up into a report which nobody at our institution read, though I know people outside our institution sat up and took notice, because this too has been cited a time or two.

Does all this report-writing seem pointless? Well, maybe. But in a crowded environment where everybody has too much to do, this is sometimes the only way that the rank-and-file can light a fire: by writing reports that nobody reads so that they serve as administrative cover when real opportunities come along. Because here is a thing that happens in large organizations when something difficult and messy and futuristic comes up that nobody wants to deal with: they tell you, “Scram! Go away and do some market research or user research or needs assessment or something and write us a report.” Look, we all know nine times out of ten nobody will read that report, much less act on it; it’s pure organizational theatre. But in our case, we’d done all the report-writing already, so nobody could reasonably tell us to go do it again. So writing the reports nobody read freed us up to make something happen when opportunity arose.

And arise it did. In late 2009, the new campus CIO started a campuswide IT strategic planning process designed to be very bottom-up. A lot of big open meetings were held where people could bring up issues they thought were important for campus to address. And this is where we, this little group of rank-and-file librarians and IT pros who thought research data management was important, really went to town on manufacturing some serendipity. We went to those meetings, we said our piece, we pointed to the reports from SAMIEG and Son of SAMIEG as evidence that this was important, and what do you know, we got ourselves a strategic-planning charter!

So in 2010 our charter group did some pilot projects working with faculty data. Given that research data management is a whole-lifecycle thing, there’s not much you can really have to show in less than a year, but we did our best. And we started putting together a website, and a business plan, and all that other good stuff.

From where I was sitting, the interesting thing to watch was the behavior of the charter sponsors, who were administrators from all over campus. They just didn’t really quite get what we were doing, or what problem we were trying to address, or why it was important to address it, but to their credit, they weren’t quite ready to stop us doing it. Part of this is that the research-data lifecycle and why it’s going to have to change and how huge an impact that will have on the research enterprise and how much and what kind of help researchers will need to do this right, all this is really hard to explain to people. Honestly, we still have administrators on our campus who look at us like either we or they are holding a banana to the ear! But if you want to blame us for not explaining all this well, I’m completely willing to agree with you.

They did shush us a bit, though. Well, kind of a lot, really. They didn’t want us making waves. Don’t go talk to the research-computing people; they’re really busy. Don’t go talk to deans; you’re just a pilot project. Stuff like that. So they were nervous about us. That’s what happens when these processes are bottom-up instead of top-down. The top worries, doesn’t want to commit itself—and doesn’t want you to turn into anything they might be forced to commit to.

I’ll step outside my own frame for a moment to say that last week I talked to some young librarians who have been hired into e-science and data-curation positions, and they’re telling me that they are being pretty systematically shushed. I don’t even see the point of that. This is a change-agent position. You cannot allow anyone—not your administrators, not campus administrators, not your existing staff, not nobody—to shush your change agents if you want them to, you know, actually create change! If you’re a library administrator and you’re not backing your new people good and hard, and listening to them and helping them when they run into stonewalling or shushing, shame on you.

Anyway, for us, because we’d been shushed so much, when the NSF lowered the boom and we leapt on the opportunity with a website and a consulting service, it really did feel to a lot of campus that we sprang out of nowhere, like Minerva from the head of Jove! When it was really the result of four long years of patient, opportunistic serendipity-manufacturing that we just hadn’t been allowed to tell anyone about.

“What is it that Research Data Services does?” you may well be asking. Well, notably, we don’t do storage or archival. We don’t touch storage, except to suggest existing storage services to people and provide suggestions for future storage services. We are purely an information, consultation, and training service. We do a lot of outreach and education. We don’t do storage. Frankly, storage is a political football on our campus—if we’d seriously tried to pick that football up and run with it, we’d have been tackled and stomped into the ground. It’s not that we don’t need usable working and archival storage on our campus—we absolutely do!—it’s that we knew we had no hope whatever of building it, so we didn’t kill ourselves trying.

“So what do we do?” you may be asking. We’re still doing NSF consulting, but in all honesty, it’s a lot less of our work than we originally thought it would be. We’ve just gotten hooked up with the DMPTool in California, and that is likely to reduce the direct-consulting work even further. We are getting referrals from a couple of our initial clients, though, which is nice! Are we okay with the reduction? Sure we are. We have plenty of other work to do.

Our website has gotten a fair bit of attention nationally, and requests to borrow material. Partly that’s first-mover advantage, but partly it’s that we got a few things right, and I’m proud of us for it. The site is also worth mentioning because maintaining it eats up a shocking amount of time. We’re in the middle of a redesign and re-architecture, and just don’t even get me started. Patricia Hswe of Penn State once called the NSF data-plan requirement the “mandate that launched a thousand websites.” She’s not wrong!

We’ve been doing a fair bit of consciousness-raising. There are really two parts to it: the “hey, this is important!” part, and the “hey, we can help!” part. We have a really gorgeously designed poster we can take to campus events. Others and I travel around giving talks like this so that we keep a national profile, because like it or not, that kind of thing cuts some ice with our brass. We’ve had pretty good luck with a series of videos of campus researchers talking about data, so we did a new series that we’ll launch this month alongside our website redesign, and since it includes the campus CIO, we’re hoping it will get some attention. We also launched a brownbag series that has done a lot better than I hoped. At the last one, we pulled 45 in-person attendees and 12 online. Pro tip: bring in talks about GIS!

While all the serendipity-manufacturing was happening, so was another thing: namely, I was starting to teach technology in libraries for the School of Library and Information Studies at UW-Madison. And because I didn’t embarrass myself in the classroom, SLIS and I started talking about the possibility of perhaps teaching other courses as well.

So, like Christine [Borgman], I’m now teaching a data-curation course; I’ve got the syllabus with me and am happy to share it afterwards. The key point for our purposes is that this course, like Christine’s, has a strong service-learning component, so it’s become a way to sneakily help people on campus manage their data without having to worry so much about approval from the Powers That Be. Serendipity-manufacturing in action! Last year we rescued a file drawer full of CD-ROMs containing photos of MFA art exhibitions; they now live in the library’s digital collections, and they’ll be added to every year as more exhibitions happen, because my students designed a process for that. So we didn’t just rescue a file drawer full of at-risk CD-ROMs; we kept that at-risk pile from growing. This year, I have a group of students working with perhaps the most high-profile research group on campus, the Living Environments Lab in the Wisconsin Institutes for Discovery, and they’re doing a fabulous job.

In what looks like parallel evolution, we are also getting into graduate student training, as other places are. I can’t speak for anyone else, but we’re doing it because when Jan and I went around talking to faculty and asked them about it, they all pretty much said “no one is doing this.” And then they all got wistful about all the data they’ve lost to their graduate students leaving messes behind them! So this summer I’m inaugurating a one-week, one-credit data-management bootcamp in Madison and Milwaukee. I’m glad Christine’s been getting standing-room-only, that’s great, but I haven’t. Frankly, this course may not make for lack of enrollment. I’ll let you know, and if it doesn’t go this time, I’ll try again next year.

I also have an enterprising SLIS student, also a Ph.D. in chemistry, who’s doing an info-literacy practicum this fall. All our info-lit practicums include a special-project component; she’s planning to help develop more data-curation materials. What’s not to like?

While I’m at it, I want to give a shout-out to the amazing people who have volunteered their time to Research Data Services. This is a pretty small, very brave group of people manufacturing serendipity with Jan and me. Note well, you cannot raid these people; I need them! Oh, except my students; them I cordially invite you to hire out from under me.

  • Co-leads: Jan Cheetham, Dorothea Salo
  • Librarians: Allan Barclay, Rebecca Holz (–2011), Keely Merchant, Ryan Schryver, Cindy Severt, Amanda Werhane (–2012)
  • IT professionals: Bruce Barton, Brad Leege (–2011) (Honorable mention: security consultant Allen Monette)
  • Others: Nancy Wiegand, Leah Ujda, Alan Wolf
  • Students: Kristin Briney, Andrew Johnson (–2011), David McHugh, Caroline Meikle, Jason Palmer, and all my digital-curation students

And where are we now, after all of that? Well, honestly, Research Data Services is still a nest of baby birds. We have no administrative home (though one has been kinda-sorta promised to us), the campus-IT half of the sketch is being funded one year at a time, we’re hearing disapproving rumbles from some top brass at the library, the CIO who started the strategic-planning process that led to our creation has left… so we’re hungry and we could die of neglect pretty easily. Or some big campus power or initiative could grab us out of the nest, rip us into bloody bits, and eat us, and we’ve had a few lately who are looking at us funny. So we’re trying to learn to fly. What else can we do?

You were expecting howling triumphalism? You invited the wrong speaker for that, sorry. I don’t know how this is going to turn out. “Badly” is a distinct possibility.


So what are the lessons here?

First, it takes time. Consciously and intentionally managing research data is a huge shift in mindset—for IT, for libraries, for researchers, for grant funders. Now, those of you who waited until recently to get going have a huge advantage we didn’t, namely, the NSF insisting on data-management plans; but even so, don’t expect to gin up a working, successful, respected, well-known service that bursts forth like Minerva from the head of Jove in a couple of weeks or months. It just does not happen that way!

Second, use what you’ve got. Recycle existing resources! And here I want to especially point out how important liaison librarians are to any effort like this. If you ask researchers, they say that what you need to have to work with their research data is disciplinary expertise. True or not, that’s what they think, and there’s only one place on campus with a broad pool of disciplinary experts covering most or all of campus. That place is the library. Take that expertise and use it, along with the relationships built by the liaisons who have it! As I said on Twitter earlier, the combination of a data-curationist with a liaison librarian-slash-domain expert is an incredibly powerful one.

But be aware of your limitations. I’ve been running institutional repositories my entire career in libraries, and I have to tell you, I cringe a bit when I see librarians touting institutional-repository (IR) software platforms as data-curation solutions. They’ll work for some data in some situations, sure, but if you think you can just repurpose most IR software and you’ve solved the research-data management problem, I’m sorry, you’re headed for trouble. As I’ve said and written in other places, there are severe mismatches between what IR and digital-library software can do and what research data actually need. Be aware of that, and don’t oversell what you have. Also be aware that some among us are building better, more flexible IR-like systems. If you’re on DSpace or bepress or CONTENTdm, you should probably start planning for a migration if you’re interested in data repositories.

Finally, figure out how to feed your baby birds. It’s time for a gut check in academic librarianship. Either managing and preserving research data is an important research-library role that’s likely to persist for a good long time, well beyond the minor chore of two-page data management plans, or it isn’t. If you think it isn’t, fine, outsource to DMPTool. Don’t get involved otherwise, and tell all your people not to. If you think it is, though, you’d better not be starving your baby birds! They have enough survival battles they’re fighting—don’t make them fight you too! Yet that’s what happens to so many new things in libraries; they get smothered by bureaucracy, stonewalled by librarians themselves, or starved by lack of resources, because all the resources get shoveled toward the status quo. Don’t even argue with me about this—remember, I ran IRs for six years!

And let me mention one library human-resources antipattern that I lived through with institutional repositories and that libraries evidently didn’t learn from because young librarians are telling me that it’s happening with data curation too. It’s the “we’ll hire the New MLS Messiah to do it all, glory hallelujah!” thing. And no, you are not off the hook if your New MLS Messiah is supposed to “coordinate” his or her peers. You are not off the hook until those peers have gotten the word loud and clear that pitching in on this is not optional, because if they haven’t heard that loud and clear, they will sit on the sidelines, or worse, they will try to sell out or crucify your messiah. Don’t be stupid. Don’t plan that way. It didn’t work for IRs, and it won’t work for this.

Next, you can’t wait until everybody’s ready. Nobody’s ready. Nobody’s ready because nobody wants to be ready. Researchers don’t want to manage their data responsibly! If you read the million-and-one surveys that are out there now, that’s mostly what they tell you. For that matter, many librarians do not want to take part in this; I can tell you lots of horror stories when I’m not on the record. I’m telling you to race ahead anyway. You learn by doing, in this space.

So jump off the cliff already, and shove your people off it with you! Prefer action to demanding reports that no one’s going to read, and especially prefer action to analysis paralysis. I honestly don’t think you’re going to learn anything from local focus groups or surveys at this point that other people haven’t already learned; I hate to say it, but your institution is not a special snowflake. So read some of those studies and then act. Pilot projects, NSF consulting, system building, training programs, whatever makes sense and is feasible where you are, do something tangible to address this constellation of problems, something you can assess after a while and change direction if you need to. But do something. Seriously, do something.

And with that, I invite you all to go forth and manufacture your own serendipity! It’ll be different for every single one of you, and all of your institutions, but that’s half the fun!