OSINTing a library’s privacy practices

When I’ve taught ILSes or databases in the iSchool, I’ve always been pretty clear about How Things Work vis-à-vis circulation privacy. A circulation transaction is a row in a bridge/junction/associative table (dear database community: standardize your jargon, please, love, me), connecting a patron with something they’re checking out. Circulation assessment amounts to a checkout-count column in the item table, with one added to an item’s count each time it is checked out. (More or less. In-house use is often also tabulated, to the extent possible and with no attempt to record who actually plucked the item off the shelf.) When the item comes back in decent condition and fee-less, the row in the bridge table gets deleted, while the count is left alone. Simple. Data minimized. As private as the transaction can reasonably be while still allowing the library to keep tabs on its materials and measure what’s circulating and what isn’t.
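For the database-minded, here is a minimal sketch of that idealized model, using Python's built-in sqlite3 module. Everything in it is an assumption for illustration: the table and column names (patron, item, loan, checkout_count) are hypothetical and do not come from any real ILS schema, and the happy-path check-in ignores fines and damaged items.

```python
import sqlite3

# Hypothetical, heavily simplified schema; a real ILS is far more elaborate.
SCHEMA = """
CREATE TABLE patron (
    patron_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE item (
    item_id        INTEGER PRIMARY KEY,
    title          TEXT NOT NULL,
    checkout_count INTEGER NOT NULL DEFAULT 0
);
-- The bridge table: one row per item that is currently checked out.
CREATE TABLE loan (
    patron_id INTEGER NOT NULL REFERENCES patron(patron_id),
    item_id   INTEGER NOT NULL UNIQUE REFERENCES item(item_id),
    due_date  TEXT NOT NULL
);
"""


def check_out(conn, patron_id, item_id, due_date):
    """Link patron and item in the bridge table and bump the anonymous count."""
    with conn:  # commit both statements together, or roll both back
        conn.execute(
            "INSERT INTO loan (patron_id, item_id, due_date) VALUES (?, ?, ?)",
            (patron_id, item_id, due_date),
        )
        conn.execute(
            "UPDATE item SET checkout_count = checkout_count + 1 WHERE item_id = ?",
            (item_id,),
        )


def check_in(conn, item_id):
    """Delete the patron-item link; the count on the item is left alone."""
    with conn:
        conn.execute("DELETE FROM loan WHERE item_id = ?", (item_id,))


def demo():
    """Run one checkout and return; report remaining loan rows and the count."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO patron VALUES (1, 'Example Patron')")
    conn.execute("INSERT INTO item (item_id, title) VALUES (10, 'Example Book')")
    check_out(conn, 1, 10, "2021-05-01")
    check_in(conn, 10)
    loans = conn.execute("SELECT COUNT(*) FROM loan").fetchone()[0]
    count = conn.execute(
        "SELECT checkout_count FROM item WHERE item_id = 10"
    ).fetchone()[0]
    return loans, count
```

Once the item is back, nothing in this toy database says who borrowed it; the only trace is the count on the item row.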

I’m more than a little ashamed at having taught from such a state of naïveté, honestly. The evidence that the General Library System (GLS) at my institution was not handling circulation in this way was right there if I’d cared to go looking for it. What I’ll do now—again, with thanks to my spring 2021 LIS 510 students—is give you some idea where to go looking if you’re curious about a library’s stated practices.

OSINT stands for “open source intelligence” and is a bit of information-security jargon. It refers to researching a target without actually breaking into or even interacting with the target’s systems or people. It includes (but is not limited to) clever search-engine use, fossicking around on social media, and taking advantage of web-wide scanning services like Shodan. I teach a few extremely basic OSINT techniques (mostly Google dorking) in LIS 510.

Privacy policy

I mean, yes, this is the obvious place to start! Where this exists, it should be linked from the library’s home page, from the policies page if there is one, or from the About the Library page. My favorite library privacy policy is San Francisco Public Library’s. It’s strong, non-obfuscatory, and written with admirable precision and clarity.

In my institution’s libraries’ case, there is no overarching privacy policy. None whatever. Not at the UW-Madison level, not at the Council of UW Libraries level. So. There that is.

The sole privacy policy I know of for any library unit belongs to the UW Digital Collections Center and applies only to the digitized collections, finding aids, institutional repository, and other odds and ends under their purview. It, too, is short, clear, and reasonable, though a tad bit loophole-prone. I should mention that although I worked for UWDCC for some years, I had nothing to do with that policy. I’m not praising it because I helped write or institute it—I didn’t!

Applicable law

In the US, most states have laws pertaining to the confidentiality and/or privacy of library circulation records. Depending on the state, these laws may apply to public libraries only or to both public and academic libraries; in Wisconsin, the law applies to any library accepting public (state) funding. Unfortunately, most such laws are too old to cover electronic-resource transactions. ALA has a helpful page listing these laws.

Wisconsin’s law is actually not great. If a library has “surveillance devices” anywhere, the law as I read it (I am not a lawyer and this is not legal advice!) allows law enforcement to stroll in and ask for pretty much whatever they want from them whenever they want without a court order; the only limiter is that the alleged crime has to have happened on (physical?) library territory. I really, really don’t love that! Wisconsin libraries, please don’t install surveillance devices!

As for circulation records, the law prohibits disclosure under most circumstances (there’s some stuff about minors that mostly won’t be relevant to an academic library), but there’s a loophole I could drive a sixteen-wheeler through: “persons acting within the scope of their duties in the administration of the library or library system” may disclose patron identities as they see fit. So if GLS employees take it into their heads to dump identified circulation records willy-nilly into a campus data warehouse for use by all and sundry, my reading of Wisconsin law says that I, as a patron viscerally infuriated by this notion, wouldn’t have a legal leg to stand on in opposing it.

I really, really don’t love this law. Go find yours and see if you love it any better.

Records-management schedule

Records managers (more recently, “information/data governance” folks) are the people who ensure that documents, forms, emails, and all the other bureaucratic detritus that organizations deal with are kept as long as legally and business-process-ly necessary and no longer. (Outdated records can be a legal liability; they also take up space unnecessarily. You want them gone.)

The first step in getting records under control is scheduling them—that is, figuring out how long each class of records should be kept. These decisions are documented in a records schedule, logically enough, and that’s a good search term to know.

In my case, the records schedules that industrious LIS 510 students found were absolutely the smoking gun. Here they are for your perusal, but I’ll quote the two immediately relevant ones in full. First, UWLIB122:

UWLIB122 Circulation Records

Records documenting the borrowing of circulating library materials by qualified patrons. This series may include but is not limited to: bibliographic information of item, the name and identification of the borrower; the titles of materials borrowed; the length of time borrowed; the due date; overdue and fine payment notations; and related documentation and correspondence.

Circulation records are kept for the duration of a patron’s status as an authorized borrower as a courtesy to patrons interested in their borrowing history. Circulation records are handled by the Alma integrated library system. Alma migrates circulation records into its analytics database. These records are scheduled under UWLIB147 for longer retention.

Like, when were the libraries going to tell me, in some more prominent way than this bit of buried (though necessary) bureaucratese, that they were keeping my circ records “as a courtesy” to me? As a trained and well-ethicked-up librarian myself, I can say this absolutely violates my privacy expectations, detailed at the beginning of this post! I would love to write a piece with Helen Nissenbaum about that! (I would love to write anything with Helen Nissenbaum, let’s be clear here.) If I don’t consider this record retention a courtesy (and I very much don’t), what exactly is my route to getting the records deleted? There’s nothing I can find on the library website about it!

Oh. Right. Yeah. No privacy policy. Hm.

Moving on to the abovementioned UWLIB147:

UWLIB147 New Analytics Records

Records of both incomplete and completed circulation and interlending actions created by the library automated system (Alma) that are migrated into a separate database for purpose of statistical analysis. The records include patron data as well as bibliographic. Alma Analytics records are kept 10 years to provide better collection security for special collections materials.

New schedule added due to new functionality of the new library automated system (Alma)

I will have more to say about Alma Analytics, but I’m still doing my homework on it. Suffice it to say that the idea that identified circulation records are sitting around for ten years for any reason makes my head explode. That they’re doing it in service of faddish, dubiously ethical, unproven (and where tested, largely ineffectual) Big Data practices makes my head vaporize with fury.

This is wrong and it should not be happening. Have we forgotten Henry Melnek? Have we forgotten the Connecticut Four? What is wrong with us, librarians?

Anyway. So. Let’s examine the records I released of my own borrowing behavior under these schedules. As I noted in the README, there are circ records on me all the way back to 2002. I can’t make that work under UWLIB122, because in 2005 I graduated from the UW-Madison iSchool (then SLIS) and moved away to take a job at George Mason University. From then to March 2007, I had no ties to UW-Madison except as alumna. As I read UWLIB122, my circ records should have been deleted in 2005. They weren’t. They still haven’t been. So the General Library System sure looks to me to be in violation of their own records schedule. Maybe the schedule was different in the pre-Alma Voyager days, but I don’t care. Once UWLIB122 became the working schedule, deletions should have happened in accordance with it.

I don’t appreciate that my circ records go so far back (I’m quite angry about it, actually), and I don’t appreciate that the libraries’ own records schedule is not reflected in their practices. This is not okay. (I’ll have more personal reflections on this later, but I’m trying, however unsuccessfully, to keep this post more nuts-and-bolts-ish.)

As for UWLIB147, it is extraordinarily mendacious and it infuriates me. Yes, as Lisa Hinchliffe recently brought back to my attention while we were discussing the NASIG panel we were both on, there actually are legitimate operational reasons special collections departments keep patron records a long time. People steal and deface unique, rare, and valuable materials and they need to be held to account for it. Using that as a cheap rationalization to retain all circulation records for ten whole entire years, however, is in my view the grossest of absolute garbage.

Trust me, the Alma system knows which materials belong in special collections; that’s one thing catalogers are for and the GLS has many excellent catalogers. “But we can’t limit extended records retention only to special-collections patrons, so we have to keep everything!” is therefore also garbage. Even if Alma doesn’t have this functionality built in—and I have to think it must; practically every library system has weird library- or collection-specific circulation rules somewhere or other, so Ex Libris must have built that—it would be pretty easy to dump special-collections records out of the system to retain them longer.
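To make "pretty easy" a little more concrete, here is a rough sketch of what a narrower retention rule could look like once records have been exported. It is purely illustrative: the record layout, the `collection` field, and the `special_collections` code are my assumptions, not anything taken from Alma or its export formats, and the ten-year window is approximated in days.

```python
from datetime import date

SPECIAL = "special_collections"  # hypothetical collection code


def records_to_retain(records, today, special_years=10):
    """Keep open loans, plus returned special-collections loans inside the window.

    Each record is a dict with hypothetical keys:
      "collection": str, and "returned": a date, or None if still checked out.
    Everything else would be eligible for deletion or de-identification.
    """
    window_days = special_years * 365  # rough; ignores leap days
    kept = []
    for rec in records:
        if rec["returned"] is None:
            kept.append(rec)  # still out, so the library needs the link
        elif (
            rec["collection"] == SPECIAL
            and (today - rec["returned"]).days < window_days
        ):
            kept.append(rec)  # special-collections security retention
    return kept


sample = [
    {"id": 1, "collection": "general", "returned": date(2020, 1, 15)},
    {"id": 2, "collection": SPECIAL, "returned": date(2015, 6, 1)},
    {"id": 3, "collection": SPECIAL, "returned": date(2005, 6, 1)},
    {"id": 4, "collection": "general", "returned": None},
]
retained_ids = [r["id"] for r in records_to_retain(sample, date(2021, 6, 1))]
```

In this toy data, the returned general-collection loan and the long-expired special-collections loan fall out, while the open loan and the recent special-collections loan stay.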

Campus data policies

Whoof. These are a nightmare on my campus, not least because the idea that data has to be properly managed and people have to be accountable for that is pretty new. (That’s hardly unique to UW. It’s the base state most places, I think.) Here is your starting place if you have plenty of time free.

What is there in that morass that touches library records? Pretty much nothing. If you’re interested in the learning-management system, however, as good student-privacy advocates certainly should be, there’s plenty you can learn, and it will likely not thrill you.

These kinds of policy warehouses usually have a small taxonomy of data sensitivity that they shoehorn data into. Find that first, because you’ll need it to understand the rest of the available documentation.

And that’s a wrap

That’s plenty to get you going. Happy OSINTing—I hope it’s happier than my students’ turned out to be.