L’affaire C4LJ: what is privacy, actually?

I spent one of my summers home from college doing filing for the health center where my mother worked as a nurse. (Nepotism? Yeah. Was I being paid less than I was worth? Also yeah. Win-win, after a fashion.) Before I so much as touched a patient file, I signed a fearsome confidentiality agreement, and had it explained in detail by my supervisor (who was not my mother).

The gist of it was that neither the health center nor I could do our jobs without seeing health information, some of it information that the patients very much needed and deserved to keep to themselves. It was everyone’s absolute responsibility to use that information only as absolutely necessary to carry out health care for the patients, and otherwise, to put that information in a little box in our brains and never let it out. Not ever. Not for any reason. Not even if I could jog another carer’s memory about a patient based on seeing a file recently—you never know who’s listening. Never.

So I didn’t. Never have. (These days, I couldn’t if I tried—I honestly don’t remember a thing from those files.)

Differences in regulatory environment aside (and those differences are significant), libraries and health care share the ethical tightrope of needing to create and (sometimes) access intensely private information without divulging it unnecessarily. Libraries and health care also share the temptation, often pressed on them by third parties, to stretch the definition of “necessary,” sometimes well past reason or ethics.

That leaves the question of what is “necessary.” Now, I like heuristics, as you’ll know if you’ve read my Physical-Equivalent Privacy piece. I can’t and don’t expect librarians to walk through a complicated hair-splitting trolley-problem–style ethics exercise every single time there’s an ethical question to answer. Life, and work days, are too short for that.

So here’s my necessity heuristic vis-à-vis patron data: if the data use is not for clear, proven, and above all direct patient/patron benefit, or for an internal function without which the organization could not run, don’t touch the data. Any data use beyond that requires at minimum notification and consent—and I do not mean by that some sleazy overarching click-through; I mean actual informed, specific, refusable, revocable consent.

Or, shorter: if you don’t absolutely have to, don’t.

The code4lib journal piece at issue fails every part of my heuristic. There is no direct benefit to the patrons whose data was collected and analyzed: “better collections” is an indirect benefit at best, even were it proven (and this article is not sufficient proof). Without this analysis, collection development would continue unabated, so it is not required for the library to keep running. There was no notification to and no consent from patrons, genuine or otherwise.

But that’s my heuristic, which I can’t expect to impose on everyone everywhere. (Feel free to poke holes in it.) Let’s try a couple more privacy analyses: one based on Physical-Equivalent Privacy, and one based on Nissenbaum et al.’s Contextual Integrity Theory.

Physical-equivalent privacy analyses ask the analyst to estimate the kind and amount of surveillance required to amass and disseminate the same type and amount of data collected by online means if the information packages in question were physical rather than online. Based on how squicky (technical term!) the physical surveillance feels, we can then judge how squicky the electronic surveillance should feel. For the code4lib journal article author to pull off the same analysis on use of physical bound volumes and abstracting/indexing sources, he would have to install video cameras throughout the stacks and reference area, identify each and every patron caught on camera, and make a note of every page of every volume consulted, alongside the name of the patron who consulted it. To emulate the (appalling, Briney-excoriation-worthy) security practices in evidence, the notes and videotapes, deidentified in exactly no way at all and not even close to anonymous, should live in a flimsily locked (if locked at all) file cabinet in the author’s sometimes-unlocked office, findable by any malefactor willing to put in some effort. (I am guessing about the office, but it seems reasonable to assume the spreadsheets are on the author’s computer.)

Does this seem okay to you? Would you run to implement this in your library? It surely doesn’t to me, and I surely wouldn’t. So in my estimation, the article’s data collection and handling fails the physical-equivalent privacy heuristic.

A contextual-integrity analysis asks the analyst to evaluate the putative privacy harm of a given information flow against apropos social norms and along five axes:

  • Data sender
  • Data subject
  • Data recipient
  • Information type
  • Transmission principle

“Apropos social norms” is legitimately a tough one to call, partly because of what Nissenbaum calls in Privacy in Context “the tyranny of the normal”: the tendency for social norms (especially nowadays) to evolve in ever-more-surveillance-ridden directions as people either don’t understand enough about the surveillance or are too worn down to protest it. I’m rock-solid sure that code4lib journal could dig up plenty of patrons who would say they’re totally okay with the article methods as published! Also, of course, plenty who wouldn’t; the dicey situation of social-media research (see e.g.) is relevant here.

As I read this situation, however, we’re evaluating the information flows in this article not against patron norms, but librarian norms. (This is exactly how Nissenbaum recommends dealing with the tyranny of the normal, of course: figure out who you are and what your ethics are and abide by that.) Those norms are set out in our codes of ethics and interpretations thereof, though as I mention above, in practice they have been steadily and unethically eroding.

For me, the contextual-integrity problems here lie along all five axes, which makes for something of a tiring analysis, but oh well, here goes:

  • Data sender: Librarians are not exempt from the injunction to keep patron data private. Indeed, our ethics codes exist as a form of self-regulation! It’s not okay “because it’s us doing it.” It’s not acceptable for a librarian to abuse access to private and confidential patron data for research purposes.
  • Data subject: Patrons—all patrons—are entitled to the privacy and confidentiality of any data regarding information sources they consult. (I think they’re entitled to more, myself—ideally we would codify the duty to keep data about library and non-information-resource library-service use private and confidential as well—but at present our ethics codes don’t go there, so I won’t either.)
  • Data recipient: Again, it’s not acceptable for a librarian to use private and confidential data for research without notice to and genuine consent from all patrons involved.
  • Information type: Records of information resources that patrons use are privileged under library-ethics codes. They’re not supposed to be used except (as I suggest above) for direct benefit to patrons or for absolutely necessary organizational functions (“research” and “collection assessment” being nice, but not strictly necessary).
  • Transmission principle: Again, this is where notice and (genuine) consent come into it. Without those, this data flow isn’t okay under our ethics codes.

I can’t make this article feel okay with a contextual-integrity analysis either. If you can, I’d be interested to hear how; I’m not the world’s greatest contextual-integrity expert.

Given the analyses I’ve made here, I stand by my belief that both the article and the editorial attempting to justify its publication merit retraction on grounds of unethical data collection and use, and unacceptably careless data analysis methods, storage, and security.