Patrons schooling librarians on privacy

My big-data ethics course is underway; I’m quite enjoying the changes it’s making in how I evaluate what I read. (“Deontologist,” I muttered to myself while bookmarking another ethics-in-AI piece the other day.) It also explains why Tattle Tape’s been a bit quiet lately—with that, Data Doubles work, RADD work, and revising and Canvas-ing my other summer course, I have no time to breathe.

Still, some things I can’t miss posting about and still be me, so. Yesterday a group of citizens of Santa Cruz, California put out an amazing document detailing how the Santa Cruz Public Library bought into a surveillance-as-a-service deal from Gale, and how that deal stomped all over patron privacy.

Let me say this again, a little louder: a group of public-library patrons absolutely schooled their library on privacy. What has happened to my profession. What.

Some librarians there can be proud of themselves: the ones who, in the words of the report, “voiced concerns about patron privacy.” Thank you and well done, SCPL librarians who spoke up. Those SCPL librarians and/or administrators who overrode those concerns should do some serious soul-searching. Y’all messed entirely up, and you are being called to account for it; the citizen group is a local “Grand Jury” and they have the authority to require a response from SCPL’s top brass.

To that top brass I say: admit your fault, apologize sincerely, dump Gale right back into the filthy surveillance-capitalism abyss whence it came, and copy out the ALA Library Bill of Rights one hundred times longhand in full public view. I don’t want to hear any empty platitudes or who-could-have-knowns out of you—and more importantly, that grand jury doesn’t want to hear that either.

In my time pushing privacy, I’ve seen some librarians say that we can be trusted with patron data because we care about privacy and ethics—we’re the good guys. To that I say what I said to my big-data ethics students: no one can be intrinsically good, or indeed be good at all. We can only do good—or not. We librarians only care about privacy and ethics insofar as we put that caring into action.

SCPL didn’t do that. I’m so glad SCPL’s patrons called out the problem so cogently and effectively—and I am bitterly sorry and ashamed that they had to.

Triumph, RomAn-21-style

It is pretty nice to see librarians step up. That’s what just happened with our good buddy RA21. The chariot awaits, everybody. Climb in!

But I’ll be the whisperer behind the chariot: This probably isn’t over. I mean, RA21 will certainly go back to the drawing board, and a fittingly-embarrassed NISO will be at least a tiny bit more careful not to be so obviously vendor-captured next time…

… but the STM Association is a hard nut to crack. I’ve tangled with them before, and watched others tangle with them as well. They’re persistent, they don’t subscribe to library ethics, and they use every dirty trick in the book and a few outside it.

I’m less worried than I was, since my chief fear was that the STM Association with NISO as convenient clueless patsy would sneak RA21 past libraries unopposed. I’m now confident that won’t happen. What will happen is hard to guess—the standards space is labyrinthine and the STM Association knows it pretty well, so finding or even making another, quieter patsy is hardly off the table. We’ll just have to wait and see.

Unizin Not-Common-Knowledge Data Model

I’m doing a talk this Tuesday for a campus IT conference. Should be a good time, for certain values of that phrase. I’ll post a link to the slides here afterwards.

While writing the talk—I’m one of those dorks who does script out talks word-for-word, though I do it in the slangy, choppy rhetorical style I actually talk in; academese is not my speech register and I don’t pretend it is—I ran across the Unizin Common Data Model, which if I understand correctly underlies the giant data warehouse for student data called the Unizin Data Platform. This will hold data from all Unizin member institutions.

To Unizin’s credit, they have a data dictionary publicly available, though every time I’ve tried to get just the table listing (or ERD?) it hasn’t worked. Still, the list of column/attribute names is there, and this list is a swift and daunting education in student Big Data.

See for yourself by all means, but here are some specific areas of the table I suggest looking at:

  • Everything from the Incident and IncidentPerson tables (conveniently, the table name is the first column in the data dictionary and is how the dictionary is ordered), especially the RelatedToDisabilityManifestationInd column
  • the LearnerAction and LearnerActivity tables, noting for the record that hashing the LearnerID is not anything like a sufficient privacy guarantee
  • the Person table and related tables, which are detailed to an extent that gives me nightmares
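
On the hashing point, here is a minimal Python sketch of why a hashed ID is not much of a disguise. Everything specific in it is my assumption for illustration: I don't know what hash the Unizin Data Platform uses or what a LearnerID looks like, so the unsalted SHA-256 and the six-digit ID format are hypothetical. The idea is only that when the set of possible IDs is small, anyone can hash all of them and look a "hashed" ID right back up.

```python
import hashlib


def hash_learner_id(learner_id: str) -> str:
    """Unsalted SHA-256 of an ID (an assumed scheme, for illustration only)."""
    return hashlib.sha256(learner_id.encode("utf-8")).hexdigest()


def build_lookup(candidate_ids):
    """Hash every candidate ID and map each hash back to its original ID."""
    return {hash_learner_id(cid): cid for cid in candidate_ids}


# Hypothetical ID format: six-digit numbers, so only 1,000,000 possibilities.
candidates = (f"{n:06d}" for n in range(1_000_000))
lookup = build_lookup(candidates)

# Pretend this hash turned up in a leaked or shared activity table.
leaked_hash = hash_learner_id("042317")

# One dictionary lookup undoes the "protection."
recovered = lookup.get(leaked_hash)
```

A secret key or salt would make this particular lookup harder, but a stable pseudonym still ties all of one learner's activity rows together, and that trail can be identifying all by itself.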

Have fun asking yourself why on earth a learning-management system needs to know all this… and considering the Equifax-level horror if there is ever a breach in it.

Kanopy and Elsevier: united in password mishandling?

My introductory information-security course contains both undergraduates and iSchool graduate students. Every once in a while I get to drop in a library- or archives-specific tidbit, and today (the first class meeting after Spring Break), I had two among all the other news:

Shortly after the Kanopy breach broke, Jessamyn West passed on a very important question from Dan Turkel to Kanopy on Twitter: “Are you [Kanopy] storing user passwords in plaintext?”

Let’s back up and examine that question a moment, shall we?

“Plaintext” is information-security jargon for “not encrypted.” “Encrypted,” for our purposes, means “changed such that the original data cannot easily (or ideally at all) be figured out.” (For stored passwords specifically, the change is normally a one-way hash rather than reversible encryption, but the idea is the same.) So, when Elsevier actually broadcast passwords in plaintext to all and sundry via some web dashboard, it disobeyed one of the fundamental best practices in infosec. If Kanopy was storing its passwords in plaintext, that’s just as bad.

(How do you know if a user’s password is correct, if you can’t store it figure-outably? Well, you know exactly how you changed it. When the user enters their password, you just change it the same way you originally changed the stored password, at which point you can compare the results.)
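
For the curious, that change-it-the-same-way-and-compare routine can be sketched in a few lines of Python. This is an illustration under my own assumptions, not a recommendation and certainly not what Kanopy or Elsevier runs: the choice of PBKDF2, the iteration count, and the function names `make_record` and `check_password` are all mine, and the right settings keep shifting.

```python
import hashlib
import hmac
import os

# Assumed work factor for illustration; check current guidance before relying on it.
ITERATIONS = 600_000


def make_record(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store in place of the password itself."""
    salt = os.urandom(16)  # random per-user salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(attempt: str, salt: bytes, stored: bytes) -> bool:
    """Change the attempt the same way the original was changed, then compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", attempt.encode("utf-8"), salt, ITERATIONS)
    # Constant-time comparison, so timing does not leak how close a guess was.
    return hmac.compare_digest(candidate, stored)


salt, stored = make_record("correct horse battery staple")
```

Calling `check_password("correct horse battery staple", salt, stored)` returns `True` and any other attempt returns `False`, yet nothing stored can simply be read back as the password.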

Nobody is supposed to store passwords in plaintext! Ever! Much less broadcast them in plaintext to all and sundry on a web dashboard! (What you are supposed to do with them is… complicated, and keeps changing as password-cracking software and hardware improve. Check with your favorite infosec expert, okay? And consider multi-factor authentication.) So what Turkel was asking Kanopy boils down to “okay, you were caught being careless; exactly how careless were you?”

Kanopy never answered, at least not on Twitter. This… does not exactly inspire confidence. Nor has Elsevier’s post-incident public relations on Twitter, which as best I can tell has substantially amounted to “it wasn’t that bad!” “everybody else has breaches too!” and similar sad, disingenuous deflections of responsibility. There are best practices in handling security incidents—perhaps unsurprisingly, infosec refers to them by the term “incident response.” These are not them.

I hope to have more to say about incident response in time, because it’s a thing more libraries will find themselves stuck doing—including when our vendors should but don’t—and the first step is always “have a plan for it.”