Avoiding the Heron’s Way: Planning for a Practical Institutional Repository

I love herons. Great blue herons, green herons, gray herons, I love watching them. This presentation is not a slur on herons! But have you ever noticed how they hunt? They just kind of stand there. Maybe they stalk about a bit, very slowly. Mostly they just stand there and wait for fish to come to them.

Do you think they catch all available fish that way? Theyʼve got an entire body of water to hunt in with lots of fish in it! Are they really going to catch them all, or even just the best ones, by standing there and waiting? I donʼt think so either.

This is The Heron’s Way.

This is also the way a lot of institutional repositories (IRs) are planned. Weʼre going to get some software running, and then weʼre going to… stand there and wait for stuff to show up in it. Maybe we’ll do a little marketing and outreach, maybe not, but mostly we’ll just stand there and wait. This passive style of IR management is what I call The Heron’s Way, and itʼs still much too common.

What The Heron’s Way usually leads to is the digital equivalent of an empty cardboard box. How many good digital materials end up in an empty cardboard box, do you think? I think it’s probably about the same number of good, useful books that spontaneously, without librarian intervention, show up on empty shelves in a library.

Speaking as a long-time IR manager, if what Iʼve got is an empty cardboard box, my instinct is to fill it, to fill it with any old thing, including dubious scraps that Iʼm not entirely sure anybody wants, but I take them because itʼs what I can get or because I think itʼs important regardless of the judgment of others. Have I caught flak for some of these decisions? Are you kidding me? Iʼve caught flak for all of them. With the lone exception of electronic theses and dissertations, anything I collect, anything at all, has somebody who thinks I shouldnʼt be collecting it, and thinks so strongly enough to tell me so.

I sometimes also end up with things that are really nice, like a gorgeous cat who just doesnʼt fit in the box I happen to have. In non-metaphorical terms, things end up in institutional repositories that IR software isnʼt very well-suited to: institutional records, or image collections, or streaming multimedia, or anything interactive.

Empty boxes and ill-fitting cats arenʼt what we have in mind when we plan for institutional repositories, are they? Of course they’re not. So we need to steer clear of The Heron’s Way. We canʼt open up an IR and just stand there.

So letʼs step back and regroup here. What is an institutional repository anyway?

The classic definition that everybody uses is Cliff Lynch’s from 2003:

“… a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members.”

Cliff, if youʼre in this room, I apologize to you, and I am your biggest fan ever, but Iʼm also going to suggest that this definition of yours is and has always been entirely incomplete from a planning and implementation perspective. Letʼs unpack it and see why I think that.

“… a set of services
What services, exactly?
that a university offers
The university or its library? Does the university even know this thing exists? Are its priorities the same as the libraryʼs? If theyʼre not, whose priorities win?
to the members of its community
Which members? Do undergraduates count? What about records managers?
for the management and dissemination
Management as well as dissemination? So an IR is a content-management system? Did anybody tell IR software developers that? Because IRs are useless as content-management systems. Also, what about materials that we canʼt disseminate, because of copyright or whatever other reason? Do they count?
of digital materials
Which materials do we actually want? We obviously donʼt want every single bit and pixel from everybody on campus!
created by the institution
What does “the institution” create? Records, certainly; do those belong in an IR?
and its community members.”
Which members, again?

Without clear and broadly agreed-upon answers to these questions, can we do anything but stand there like a heron?

Lesson One: It’s fine not to have an IR.

Cliff Lynch once said that an IR is “essential infrastructure for the digital age.” I love you again, Cliff, but you were wrong on this one. Not everyone can or should have an IR. And there are no IR police! Nobody is going to revoke your MLS or your consortium membership or anything like that because your library doesnʼt have an IR. If you canʼt come up with answers you can believe in to my questions about Cliffʼs definition of an IR, donʼt have an IR! Itʼs fine. Really.

Lesson Two: “Having an IR” is a non-goal.

If you want an IR, be clear about why. Especially donʼt open an IR just to “have an IR.” Thatʼs just silly. Do you open a library building just for the sake of a building? Do you open a library building just so you can have some nifty shelving? No, I didnʼt think so. You want the stuff thatʼs on the shelves in the building, not the shelves or the building per se.

“Having an IR” is a complete non-goal for the same reason; nobody cares about an empty digital cardboard box, and nobody should! If you want an IR, you owe it to everybody in your library, especially the poor schmucks whoʼll have to run the thing, to be clear about why you want it, what you want in that digital cardboard box.

So why might you want one, then? From where Iʼm sitting, I see three basic reasons libraries open IRs. Iʼll discuss each of them in turn, and pull out some more planning lessons.

Advocacy tool

Early on in IRsʼ history, libraries opened them because they considered IRs their contribution to the open-access movement. Weʼve opened an IR, so weʼre fixing scholarly communication! Go us! Of course, it hasnʼt turned out to be that simple.

Let me be completely clear about one thing: open-access advocacy is hard. It takes patience, guts, and political capital. Thereʼs a ton of misinformation out there that youʼll have to counteract. Itʼs risky, too; faculty and librarians are accustomed to the current system. I have scars aplenty from trying to do advocacy with faculty and librarians who feel invested in toll-access publishing and the systems and structures around it.

How good an advocacy tool is an institutional repository? Well, in the early days of IR adoption, the thinking seemed to go something like this: weʼll offer people an empty cardboard box, theyʼll put their stuff in it for some not-adequately-explored reason, and then magic will happen and the whole campus will go open-access. Really? No, seriously, really? How does that work, exactly? I wish I knew. Itʼs never happened to me.

Look, a lone IR all by its lonesome is a lousy advocacy tool. Terrible. And if open-access advocacy is the only reason you have an IR, itʼs in all likelihood a lousy reason. An IR can definitely be a backstop for a good advocacy effort, but all by itself, itʼs not advocacy.

Now, I am all for open-access advocacy! We need more of it, not less. But weʼll do advocacy better without empty cardboard boxes confusing the issue. Here are some options:

  • National-level lobbying: The NIH Public Access Policy has done pretty well for itself, but it hasnʼt expanded beyond NIHʼs borders. If we want that to happen, we will need to lobby for it, because toll-access publishers are already lobbying against it.
  • Local initiatives: OA author funds. Memberships with reputable open-access publishers. Serials licensing agreements that help local authors publish OA. Library-internal open-access mandates, for libraries whose librarians publish in the literature.

When I shared a draft of this presentation, a friend of mine said, “come on, surely nobody thinks an IR is only or mostly for advocacy?” By pure coincidence, two days later I came across a job ad for a repository manager that made it very clear that advocacy is a major part of the job:

“[T]his innovative and energetic individual… will work collaboratively with colleagues on the faculty and in the libraries to develop and implement an open access repository in support of the Open Access Policy, including policies, procedures, workflows, metadata, recruiting and harvesting content, and marketing and outreach to the University community advocating for best practices in open access.”

So no, Iʼm not making this up. Interestingly, this ad is from an institution whose faculty just put an institutional open-access mandate in place, in the footsteps of Harvard. Which is very cool, so go them! But do you notice something about that? The library didnʼt plan for, much less open, an IR until after they had the policy mandate from faculty to do so! So letʼs talk about institutional open-access mandates a moment, because theyʼre actually an excellent reason to have a repository.

Want a mandate? Find faculty champions.

If you have your eye on a Harvard-style open-access mandate, the first step is to find local faculty champions! We know that we librarians canʼt tell faculty what to do. We also know that faculty are intensely suspicious of college and university administrations, so we canʼt just go to the provost or chancellor and ask for a mandate. Itʼs not coincidence that every single institutional open-access mandate I know of comes through shared-governance arrangements like faculty senates and library councils!

If you want this to happen where you are, find, cultivate, or build an army of faculty champions who will fight for it. There is no other way. But you can do that without an IR! In fact, Harvard did. Harvard’s repository DASH didnʼt open until well after the first mandates passed their faculty-governance bodies. Frankly, I suspect itʼs easier that way.

Want a mandate? Eat your own dog food.

The other thing you can and should do if youʼre after a faculty mandate is to set the example inside the library. Several libraries, like Michigan and Oregon State, have instituted library-internal policies requiring that materials librarians present or publish end up in the IR. How much more obvious can it be? If we want faculty to use the IR, letʼs use it ourselves! If we canʼt make ourselves do it, on what ethical basis can we tell faculty that they should? If weʼre serious about mandates, itʼs hypocritical not to have one for ourselves, especially if weʼre tenure-track.

All this leads us to Lesson Three.

Lesson Three: Open-access advocacy is a separate question from IR planning. Do justice to the intricacies of both.

You donʼt get a get-out-of-advocacy-free card just because you have an IR. Frankly, if all you have is an IR, you do not actually have an open-access advocacy program. And if you call somebody a scholarly-communications or digital-repository librarian and you expect them to create this huge cultural shift on campus all by themselves, but all you give them to work with is a lousy IR, youʼre delusional. Youʼre not a great planner and leader, either, because youʼre sending that poor person into a gunfight with a butter knife. Think again. Think hard.

Collection

So letʼs say you think of your IR as a container for well-chosen digital collections, the same way your buildings and shelves are containers for your well-chosen print collection.

You have two basic questions to ask yourself:

  1. What do you want? Note well, this is not the same as “what you will accept.” “What you will accept” is The Heron’s Way.
  2. How will you get it? Note well, the answer is not “they’ll just put it in!” That is The Heron’s Way.

Hereʼs some stuff you might want:

  • Peer-reviewed literature
  • Gray literature
  • Websites
  • Conference proceedings
  • Working papers
  • Theses and dissertations
  • Research data
  • Other student research
  • Learning objects
  • Multimedia

I canʼt tell you whether you want any of that stuff or not; thatʼs for you to decide, just like any collection-development policy. But when youʼre deciding whether you want these things, you need to keep my second question, “how will you get it?”, in mind. Part of deciding whether you want something, as any good collection developer will tell you, is deciding whether itʼs worth the effort and expense to collect. Saying “I want something!” without figuring out a plan to collect it is a classic Heron’s Way error. Trust me, what you want won’t just spontaneously show up.

So how will you collect these things? Let’s first discuss how you won’t.

  • Self-archiving. Nobody self-archives in IRs. Not even librarians. We know this. If the planning for your IR assumes that people will just magically self-archive, you have just set sail for the failboat.
  • Marketing. If the IR doesn’t help anybody, marketing won’t help the IR. We know this. There have been endless webinars and conference sessions and articles about IR marketing and outreach; I havenʼt seen much of anything result from the techniques they suggest. If your fundamental stance toward the IR is The Heron’s Way, completely passive except for some marketing, youʼre sailing for the failboat again, because marketing is just not enough. If you donʼt look at an IR from a faculty memberʼs point of view and make sure they will get something tangible and useful out of participation in the IR, no marketing plan will help you. Now, if youʼre sneaky, you can engineer a success or two, then start marketing. I hear that sometimes works. So does having a library-internal mandate, sometimes. But donʼt kid yourself, engineering a visible success is the hard part here!

Your inevitable bête noire will be copyright: people will put in stuff they shouldn’t, and people are afraid to put in stuff they can. We know this; it’s a trap that caught a lot of early IRs. Nobody planned to help anybody out with copyright issues! Folks, copyright is a morass. It touches almost everything you might want to put in an IR, from dissertations and theses, to faculty who think they own copyright in articles when they donʼt, to really stupid faculty who want to put in somebody elseʼs work, to third-party copyright questions, to people who are terrified to put in a lesson plan with a two-line quotation from a book because they think they might get sued. I hate this too. I hate it a lot. But itʼs a reality. Your IR plan had better plan to deal with it!

After all that gloom and doom, hereʼs how you might catch some fish:

  • Electronic thesis and dissertation program (but beware of the MFAs)
  • Digitization
  • Crawling the local web
  • Allied services (such as copyright consulting)
  • Active collecting

These strategies often work together. You might, for example, crawl your local facultyʼs web pages and find half a working-paper series, the other half of which only exists on paper. If you can digitize the print-only half, chances are theyʼll give you the whole thing! If you canʼt, they wonʼt be interested.

Lesson Four: IRs take work. Anybody who says otherwise is selling something. Don’t buy.

You might have noticed that all those strategies involve time and effort, well beyond the technical effort involved in setting up and maintaining IR software, well beyond marketing. Quite correct. But how many IRs plan for that time and effort?

If you donʼt want to put any such effort into an IR, remember Lesson One: you donʼt have to have an IR in the first place! But wimping out on the work is The Heron’s Way. Donʼt go there. You wonʼt catch fish.

Lesson Five: A one-person IR is (or will soon be) a failing IR.

The corollary to “it takes work” is “it takes work from more than one person.” Don’t throw a colleague to the wolves! Now, obviously this depends on the size of your campus and your library. If youʼre a six-person library, feel free to ignore me. But if one person canʼt do regular collection development for your whole library, why would you think one person can “do the IR”?

In my six years doing this, Iʼve seen plenty of libraries hire a single IR librarian—often a brand-new librarian, at that—and think theyʼre done. More often than not they then completely ignore that person, except to blame them when the IR isnʼt what they think they want it to be. This is a horrible, reprehensible thing to do to a colleague, especially a colleague with a young career, as often happens. If the IR is not a whole-library priority and a whole-library initiative, why do you have one? Remember Lesson One!

Successful IRs have more than one person behind them. Ohio State knows this. So does Harvard; theyʼve got a whole office! So does Illinois. There are plenty of other examples.

A word about records in IRs

Don’t. I admit Iʼve changed my thinking on this over time. I used to think no-harm-no-foul, but honestly, I think IRs have to some extent gotten in the way of building usable e-records-management systems, and that is awful. So Iʼm serious. Donʼt use IRs for records management. I see small schools particularly doing this, and I think theyʼre making a big mistake by adopting the wrong software for the job at hand.

Now, once something has gone through the records mill and become genuinely archival, it might be worth posting to the IR. Thatʼs your call. But for day-to-day records management, seriously, don’t. IR software is hopeless at:

  • Scheduling
  • Automated workflows
  • Fine-grained access control

IRs also have bells and whistles you donʼt even want if youʼre a records manager. Do you want WorldCat harvesting your meeting minutes through OAI-PMH? Probably not. Your records donʼt need OAI-PMH, but with an IR youʼll get it anyway.
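
For anyone who hasnʼt seen OAI-PMH up close, here is a minimal sketch of the kind of request a harvester sends to a repository. The base URL and the set name are made up for illustration; the `ListRecords` verb and the `oai_dc` metadata prefix come from the OAI-PMH specification. The sketch only builds the request URL and does not contact any server.

```python
from urllib.parse import urlencode


def build_harvest_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url and set_spec are hypothetical values supplied by the caller;
    nothing here is specific to any one IR platform.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional: restrict the harvest to one set (for example, one collection).
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and set name, for illustration only.
print(build_harvest_url("https://ir.example.edu/oai/request", set_spec="meeting_minutes"))
```

Any harvester that knows the endpoint can issue a request like this, which is the point: once records sit in an IR, their metadata is typically exposed this way whether or not you meant it to be.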

Lesson Six: IRs are no substitute for proper records-management software and services.

If you want records-management software, get it. Look at Archivematica and iRODS and Kuali and so on. Donʼt use an IR. The software is just hopelessly inadequate for what you need.

Service

So you think your IR is a service, the way Cliff Lynch said. Again, I think you have two questions to ask yourself…

  1. What service(s) will you offer?
  2. Is the IR what you need to offer the service? (That is, do you really need the IR at all? Do you need some other thing entirely?)

Services you might offer include:

  • Copyright consultation
  • Research-data planning advice
  • Records management
  • Retirement archival
  • Digital preservation
  • Publishing e-journals and conference proceedings
  • Publication advice
  • “Scholar’s Lab”

So do you need an IR to do these things?

You donʼt need an IR just to give advice or consult. Whether itʼs copyright, research-data planning, publishing, or the refdesk-y kind of function youʼd find in a Scholarʼs Lab, you do not need an IR to do it! Just do it!

For some other things, maybe you want an IR, maybe you donʼt. Think hard before you assume that of course the IR is perfect for whatever you have in mind.

Notice what youʼre not seeing here? Anything for which an IR is obligatory! Itʼs amazing how often IR software is just plain not necessary to do something useful. To be clear, if youʼre still bound and determined to have an IR, some of these are services you should be offering alongside it. Never underestimate the copyright monster, or the appetite for digitization services!

Lesson Seven: An IR is not by itself a digital-preservation or e-publishing system.

As I mentioned before, libraries commonly think that with IRs theyʼre offering digital-preservation services. Well, theyʼre not, not really; they may be offering pieces of a digital preservation service, but not the whole hog. If youʼre interested in digital preservation, use the Trusted Repository Audit Checklist or DRAMBORA as planning tools, because digital preservation goes far beyond software. Just as IRs arenʼt get-out-of-open-access-advocacy-free cards, they arenʼt get-out-of-digital-preservation-free cards either.

Similarly, there are lots of services you might want to offer in the scholarly-communication and digital-preservation spaces; donʼt make the mistake of thinking that opening an IR is all you need to do to offer them.

Can one IR do it all?

So right now you might be thinking that you want it all. You want an open-access advocacy tool and a quality digital-materials collection and a quality service!

Well, gee, I want a pony. Can I have a pony?

What you need to understand is that these goals, fine as they all are, conflict with one another in practice. While youʼre planning, you need to be aware of the tradeoffs.

If youʼre doing open-access advocacy, classically what you want is the peer-reviewed journal literature, usually in preprint or postprint form. Well, Iʼm afraid thatʼs the hardest thing to pry out of people in the absence of a mandate, so what you end up with is the empty cardboard box. If youʼre also assessing the IR by the size of its collection, youʼre going to be disappointed.

The same is true if you think youʼre doing open-access advocacy alongside service. Nobody actually wants an open-access service, so you wind up right at the empty box again. This is the mistake early IR adopters made in droves, and it’s why I said earlier that advocacy needs to be a separate thing. Itʼs harder to do, harder to assess, and doesnʼt fit well with other common goals for IRs.

What happens when you want both a great collection and a great service? Letʼs assume for the moment that the service your IR claims to offer is digital preservation, because thatʼs fairly typical. Honestly, your collection will probably turn into a bit of a scrapheap. People want to preserve the weirdest and most useless things! People arenʼt good at selection and weeding the way librarians are! Are you ready to do that for them? If not, are you offering the service they need?

If you bait-and-switch, telling people youʼre a generic digital-preservation service and then telling them “no, we don’t want fifty years of your departmentʼs faculty-meeting minutes, or terabytes of badly-digitized video from your local speaker series,” they turn around and never darken your door again, even with stuff you do want, and theyʼll tell all their faculty friends to avoid you too. Another mistake a lot of early IRs made was writing these incredibly strict collection policies, which led them straight to the empty cardboard box. If you really want to be a service, then taking in stuff that makes you hold your nose is part of the cost of doing business.

Trying to be a digital-preservation service also runs into the stuff-that-doesnʼt-fit problem, largely because most IR software is overoptimized for the single-PDF article. IR software wonʼt stream audio and video. It doesnʼt have an internal pageturner for page-scanned books and articles. It doesnʼt have image galleries or nifty Flash embeds. Itʼs really a big ugly mostly-useless silo. You need to plan around these limitations, at least for now, because IR software is what it is, which isnʼt necessarily what you want it to be. IR software is not:

  • digital-library software
  • research-data management software
  • records-management software
  • digital-preservation software

Worst of all, IR software does not play well with others. It wonʼt play with your courseware, or your content-management systems, or your catalog, or really almost anything you can imagine youʼd want it to play with. IR software is profoundly antisocial. This is a huge problem, but you canʼt just wish it away—you have to plan around it!

You may have noticed how often “management” comes up in these descriptions of what IR software is not. This is something else a lot of early IR adopters didnʼt understand: most IR software assumes that what youʼre putting in is the final, immutable version of something. Funny thing about that: most people donʼt think they need a whole service just to manage the final version of something digital! To manage digital things while work is ongoing, that they need, but IRs wonʼt do it.

Interesting hybrids

That said, there are some interesting software hybrids out there now. Digital Commons is an IR/e-journal publishing system hybrid, for example. But in my book, the better hybrids are coming out of Fedora Commons. Fedora Commons is an open-source software package that provides a lot of the underlying engineering you need for a useful, reliable repository. What Fedora doesn’t provide is all the user-interface chrome on top. So that opens up the intriguing possibility of putting all kinds of different user interfaces on top, both for getting stuff into the repository and for using stuff once itʼs there.

And thatʼs what these hybrid systems are doing. Islandora, for example, marries Fedora with the open-source content-management system Drupal, which is incredibly flexible and powerful. Hydra is an attempt to build a content-management stack from the ground up atop Fedora; “Hydrangea” is its first demo system. And the University of Virginia Scholarʼs Lab is connecting up the online exhibition software Omeka to Fedora, via an Omeka plugin theyʼre calling FedoraConnector. There are lots more of these hybrids; RUCore from Rutgers is another great one to look at.

So fairly soon, I think, you might be able to have and eat a few more cakes. I hope, anyway. I do think this style of hybridization is the right way to go, whether your underlying base is Fedora Commons or curation microservices or iRODS or something homegrown. Silos are awful.

Lesson Eight: Choose your platform last.

“What software (or service) will we use?” is often the first question someone asks in an IR-planning process. This is completely counterproductive; please donʼt do it! You must know what youʼll be collecting first, and how, and how youʼre going to assess success, so that you can gauge how different platforms fit what youʼre trying to do. If you donʼt choose the platform to fit the content and the workflows, the content and the workflows will be constrained by your choice of platform! This is not what you want!

There are a lot of DSpace-based IRs in the States. Iʼve been running DSpace IRs for six years, and I honestly donʼt understand its broad adoption. I have to figure that planning processes chose software first instead of last, and I also have to guess that “nobody ever got fired for choosing DSpace.” Well, maybe somebody should have been!

Lesson Nine: Plan to assess your IR honestly and fairly, understanding the challenges inherent in your goals.

All this talk about software affordances is really about assessment, which is an important thing to build explicitly into your planning. People have asked me for years, “where are the successful IRs?” “What does a successful IR look like?” I ask back, and oddly enough, nobody ever has an answer.

You have to establish your success conditions up-front. I canʼt tell you how many IRs never do that. I can tell you what not doing that leads to: a lot of confusion among librarians and potential IR users about what the IR is there for, IRs that arenʼt as successful as they could be if it were clearer what theyʼre supposed to accomplish, and a lot of hideously painful uncertainty for IR managers. Unclear success conditions are a great way to abuse your IR manager.

So once you have your success conditions, figure out when and how youʼll assess them, and think about what youʼll do if your goals arenʼt met. And be fair about it! If you decide youʼre running a preservation service, itʼs not fair to assess on collection quality. If youʼre running an advocacy effort with an IR as backstop, itʼs not fair to assess solely on collection size (though if you have a big collection of quality stuff, clearly your advocacy is getting somewhere). If youʼre focused on building a quality collection, whatever your definition of “quality” happens to be, itʼs not fair to complain that the IR isnʼt solving everybodyʼs digital-preservation problems.

Time for a recap.

Lessons

  1. It’s fine not to have an IR.
  2. “Having an IR” is a non-goal. If you want an IR, be clear about why.
  3. Open-access advocacy is a separate question from IR planning. Do justice to the intricacies of both.
  4. IRs take work. Anybody who says otherwise is selling something. Don’t buy.
  5. A one-person IR is (or will soon be) a failing IR. Don’t throw a colleague to the wolves!
  6. IRs are no substitute for proper records-management software and services.
  7. An IR is not by itself a digital-preservation or e-publishing system.
  8. Choose your platform last. Choose what fits your goals and workflows, current and future.
  9. Plan to assess your IR honestly and fairly, understanding the challenges inherent in your goals.

All this leads to my final lesson.

Lesson Ten: Don’t make the mistakes we did.

I didnʼt come out of library school in 2005 to run my first IR knowing all this. Nobody in libraries knew all this then; we were still experimenting and learning. So these lessons and others like them werenʼt even in the library literature then! But they are now, so donʼt stick your head underwater and ignore them! Read the literature thatʼs out there now. Ask people who have been there and done that and have the scars to show for it. If youʼre not reading Carol Hixson and Denise Troll Covey, what is wrong with you?

Then go do better; my own example is a very low bar. I did just about nothing but fail for six very long and frankly agonizing years of this running-IRs business. You can beat me! All of you! So go plan your IR right, and show me up right and proper.

Let’s try the Osprey Way.

Have you ever seen an osprey hunt? Itʼs fabulous. They survey the whole lake from on high until they see something they know they want, and then they hurtle down and nab it with a huge splash. I think IRs need to be more like ospreys than herons. Decide what you want with your eye on the big picture, and seek it out, wherever itʼs hiding. Make the effort to dive for it. And grab it—donʼt wait for it to come to you.
