Library logs and library privacy

In LIS 510 (“Human Factors in Information Security” these days), a student responding to course content about typical web surveillance and its impact on security asked whether logs were also a security and privacy issue.

I went a bit feral in my answer, and I thought you might enjoy it:

Yes, logs can be and are used in behavioral tracking. They are often overshadowed these days by web-bug-style tracking because of its near-real-time “advantages,” ease of aggregation and reidentification (which is super-big for cross-organization, cross-website tracking, of course), and configurability, but logs have gone absolutely NOWHERE and they’re dangerous as hell.

This is such a big topic that I need a minute to figure out how I’m going to approach it… okay, I’ll use a library example because I’m familiar with it and it’s Becoming A Thing in today’s environment of widespread attempts at censorship.

This article came out the other day on ed-tech surveillance outing LGBTQ+ students and putting them at tremendous risk. (“Remote learning” here is a bit of a red herring, honestly. PRETTY MUCH ALL ED TECH IS CAPABLE OF THIS KIND OF THING, as is any kind of “personalization” tech.) Notice as you read that there isn’t actually a whole lot of data aggregation across multiple tools or organizations happening here — one tool on the laptop the student is using is enough. That’s the power of logs in a nutshell.

Logs aren’t a new digital idea. Bricks-and-mortar libraries have had them for ages, in the form of records of who checked out what when—library cards. These are obviously operationally necessary for the library if we assume that “books walking out the door forever” isn’t an acceptable outcome. That said, once a book is back at the library, the checkout record is no longer operationally necessary and can be destroyed, and that’s been considered best practice for a long time (until recently; hold that thought).

Using logs to get people in trouble isn’t new either! The American Library Association committed to confidentiality of checkout records way back in 1939 because there had already been court cases where someone’s reading was used to accuse them of being a garbage human. (This article is absolutely excellent, MA/LIS folks and anyone else interested in this history.)

Is it possible to track materials use without tracking who used what when, as logs typically do? Absolutely. It’s called “counting.” Libraries, for example, can and often do count how many times a given book was checked out without the least heed to who did the checking-out. If any of you have been aides or shelvers in libraries, you may also have been asked to record which books were off the shelf (on tables or whatever) as you put them back—that’s also counting use.
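For the code-minded, here is a minimal sketch of what "counting" can look like in software. Everything in it is made up for illustration (the barcodes, the `record_checkout` function, the in-memory tally) and it is not how any particular library system works. The point is only that the tally has nowhere to put a patron or a timestamp.

```python
from collections import Counter

# Hypothetical tally: item barcode -> number of checkouts.
# There is deliberately no patron field and no timestamp field.
checkout_counts = Counter()


def record_checkout(item_barcode: str) -> None:
    """Count one checkout of an item without recording who or when."""
    checkout_counts[item_barcode] += 1


# Made-up barcodes standing in for three checkout events.
for barcode in ["31000000000017", "31000000000017", "31000000000042"]:
    record_checkout(barcode)

# Most-used items first; all we can learn is how often, not by whom.
for barcode, count in checkout_counts.most_common():
    print(barcode, count)
```

A count like this still answers "is this book getting used?", and it gives nobody a list of names to ask for.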

Is it possible to keep attributed records of materials use for a while and then de-attribute or even delete them? Sure, and it’s a good idea. Predictably, de-attribution is not as simple as “removing names”—in most digital logs, you also need to ditch at minimum IP addresses and exact timestamps, and any trails of “this person looked at THIS thing and then THAT thing and then THIS OTHER thing” should be fully broken. This kind of work is called data minimization and for libraries it ought to be table stakes.
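Here is a rough sketch of that de-attribution step, under loud assumptions: the record fields (`patron_id`, `ip`, `timestamp`, `item`) and the `minimize` function are hypothetical, not taken from any real log format or library system. It drops the identifying fields, coarsens the timestamp to a month, and shuffles the output so record order no longer traces one person's path from item to item.

```python
import random
from datetime import datetime


def minimize(records, rng=None):
    """Return de-attributed copies of hypothetical log records."""
    rng = rng or random.Random()
    kept = []
    for rec in records:
        when = datetime.fromisoformat(rec["timestamp"])
        kept.append({
            "item": rec["item"],              # what was used
            "month": when.strftime("%Y-%m"),  # coarse date, not an exact time
            # patron_id and ip are intentionally not copied over
        })
    # Shuffle so the order of records does not reveal a browsing trail.
    rng.shuffle(kept)
    return kept


# Made-up log entries for illustration only.
raw_log = [
    {"patron_id": "p123", "ip": "192.0.2.10",
     "timestamp": "2023-03-04T10:15:32", "item": "book-A"},
    {"patron_id": "p123", "ip": "192.0.2.10",
     "timestamp": "2023-03-04T10:17:05", "item": "book-B"},
    {"patron_id": "p456", "ip": "192.0.2.77",
     "timestamp": "2023-03-09T16:40:11", "item": "book-C"},
]

minimized = minimize(raw_log)
for row in minimized:
    print(row)
```

This is a sketch, not a guarantee: in a small collection or a quiet month, even item-plus-month can point at one person, so outright deletion remains the safer choice wherever the records are no longer needed.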

Yet a couple years ago, two student groups in this very course discovered that University of Wisconsin libraries have been keeping fully-identified checkout records for ten whole entire years. My jaw hit the floor, and I put together an open-records request to see how much they still had on me (since I’ve been around—off and on, one way and another—since 1993).

It’s a lot, folks. It’s way too much—not just ten years of my checkouts, but TWENTY. I’m pretty pissed off about it. And yeah, I wrote some stuff modeling the threats here.

Back to where we started: does this practice potentially put vulnerable library patrons at risk? ABSO-FREAKIN’-LUTELY it does. In the present environment, I don’t trust law enforcement not to either go straight to Ex Libris (developers of Alma, the library system in question) or simply commandeer the servers.

But Dorothea, you may well say, there aren’t many present-day stories of libraries getting used against patrons, so isn’t this a movie-plot threat? Legitimate question. Here’s what I think. In the 20th and early 21st centuries, we in libraries established ourselves as a difficult target: we were obstructionist, we said “go get a warrant,” we minimized checkout records, we refused to go to court to testify against our patrons, we even told the Patriot Act to go take a hike.

I think we are becoming softer targets, largely because of the (highly unethical, in my opinion) turn toward “assessment,” “customer-relationship management,” and various sorts of analytics, including learning analytics. (Surveillance by vendors is also a Whole Dang Thing, but it’s a little orthogonal to this discussion, so I’m bypassing it. Read this piece I wrote if you want to think about it.) I think it’s only a matter of time before law enforcement and/or hate groups try again and discover how soft we’ve gotten. At that point, all hell breaks loose.

I hope not. But that’s what I’m presently betting. DON’T KEEP LOGS IF YOU DON’T HAVE TO. IF YOU DO HAVE TO, DELETE THEM AS SOON AS YOU CAN. Thus endeth my catechism.