L’affaire C4LJ: policies and processes

I am delighted not to have to do a technosocial rundown of everything that’s wrong with the contested code4lib journal article. Becky Yoose’s comment on it does a fine job of that (though I’d like to see the k-anonymity/l-diversity analysis, myself, because I want to learn how to do those properly). Being a bit of a process and organizational-behavior wonk, I want instead to look at the policy and process problems that brought about this unacceptable result, and how they can be improved—not just at this specific journal, but throughout the LIS literature.

It’s staringly obvious that “ask a privacy expert for a volunteer last-minute ethics review after the paper has already been accepted; then ignore their review, publish the article anyway, and write a self-justifying editorial with a lot of low-wattage rationalizations and poorly-thought-through technobabble; then have part of the editorial board creep on and tone-police the Twitter discussion while whinging about the editors being volunteers, as though practically all journal editors in LIS weren’t volunteers” was the wrong way to go about this. What the right way might be, however, is considerably less obvious. So let’s think that one through.

I think Rule One is that no journal gets to be surprised by ethics snafus more than once. Arguably after this fiasco no LIS journal gets to be surprised by this particular class of privacy-ethics snafu ever! And as Becky has noted, code4lib journal failed this rule—this is the second piece with dubiously-ethical-if-ethical-at-all data collection and handling processes to have been published there and immediately called out by library-privacy folks.

There simply has to be a research-ethics policy for any LIS journal that attracts (or may attract) material that is ethically problematic by library standards—honestly, that’s all LIS journals everywhere. In my head, such a policy explains as clearly as possible what’s a no-go (recognizing that being clear about this is hard, of course; people do come up with near-infinite ways to exploit and abuse other people in the name of research) and lays out a binding ethics-review process for at least the articles that trip editors’ or reviewers’ sensors as potentially ethically problematic.

“Binding” has to mean “an article that fails ethics review is rejected.” Ethics review isn’t the same as many of the characteristics peer reviewing classically looks for (originality, methodological correctness, importance, writing quality, and so on)—those can sometimes be ameliorated with a revise-and-resubmit. Ethical failure has to mean a rejection, though. The alternative is—well, it’s publishing something that’s ethically wrong, which is an ethically wrong (and potentially career-damaging for all concerned) thing to do! As Becky wrote this week, publishing ethically-tarnished work creates an information hazard by implying that ethics are disposable, to be broken without consequence.

But my head is not everyone’s (on balance, probably a good thing), and the journal portal: Libraries and the Academy is taking a different approach. (Bias alert: I’ve co-authored with this piece’s author, Kyle Jones, before. He is also PI on the Data Doubles research project I have been contributing to. It’s been a privilege and a pleasure working with Kyle—yet I think he would be the first to tell you that although I respect him highly, I am not his quisling; I disagree with him frequently, as he with me.) I want to reproduce one short paragraph from the piece before I discuss that approach, because the paragraph is just so good, I wish I had written it, and it is absolutely apropos to the code4lib journal situation:

Journals are not—or at least should not be—disinterested publishers. They have a responsibility in curating conversations on particular topics, and they also set expectations for structure, inclusive language, quality standards, and research ethics. The guidelines a publication establishes for potential authors signal its values and beliefs, and contributors must adhere to those principles should they choose to publish with a particular journal.

This is Rule Two, I think, this right here. The ethics buck stops with the editorial board. It’s their names on the masthead and About page, them getting the service credit and the résumé/CV line. It’s them setting journal policy. It’s them choosing reviewers and guest editors, and (for many LIS journals) making publish-or-not decisions. Pushing off responsibility with an airy Wikipedian “if you care so much, come fix it yourself!” is just not on (never mind that coming in to try to fix it because she cares is exactly what Becky Yoose did vis-à-vis the contested article, to zero effect). Your journal, your responsibility what’s in it. Not mine. Not Becky’s. Not even article authors’, really. Yours.

Right, back to portal’s policy statement. In essence, it’s all about expectations: stating what they expect from authors by way of privacy-ethics discussion inside learning-analytics pieces that authors submit there, and placing responsibility on reviewers and editors to make publishing decisions consonant with ethics. I like this a lot! It is good, and should be a clear enough signal to authors lacking in ethics that it will save journal staff from having to read (not to mention having no backing to reject) a lot of ethically-bankrupt dreck!

I’m… not thrilled that it only addresses learning analytics, though. Certainly that’s a dangerous ethics pain point in the LIS literature presently; those who know me well may correctly suspect that I am holding back a veritable flood of furious words about this. The contested code4lib journal article and my discoveries about University of Wisconsin circulation-record retention, though, say pretty clearly that LIS’s privacy-ethics problem is much larger than learning analytics. In fairness, portal is the first journal I know of to codify expectations with respect to privacy ethics at all; I don’t blame them one bit for wanting to start small, specific, and clear. I do think they’ll find themselves in more ethics quagmires, ones they might have avoided had this statement been broader.

I’m also not sure the statement is enough. If I were on portal’s editorial board, I’d want a clear internal process document to back this up. There may be one under development, of course! Nitty-gritty process stuff is insider-baseball enough that I wouldn’t expect a journal to make it public, though I think that given portal’s status as cutting-edge leader on this, doing so would be a service to all of LIS. I would want such a process document to answer the following questions:

  • How should an editor or reviewer handle a piece that may go against journal ethics policy? (Can an editor desk-reject? With or without consulting other editors, or the board?) How should authors of pieces rejected on the basis of poor ethics be notified of the reason for rejection? (I don’t think “not telling them” is an ethical option here.)
  • For pieces that need an ethics review, how does the editor go about getting one? Who’s eligible to review for this, and on what basis? (Becky certainly can’t do it all!) I don’t hate the idea of reserving one spot on a journal’s editorial board for an ethics watchdog, frankly; just be sure to set standards for the position such that foxes don’t end up watching the henhouse.
  • What is an ethics review actually supposed to accomplish? Just like peer review, there needs to be a form for this, and it needs to accord with the editorial board’s policy on go/revise/no-go. (Unlike regular peer review, “revise” should be a vanishingly rare choice, for reasons discussed above.)
  • For journals adopting a developmental-editing process (as code4lib journal does, I am given to understand), when does ethics review come into it? The earlier, the better: wasting a researcher’s effort on something that then fails ethics review is not great, and privacy-ethics harms often happen (as they did in the contested article) quite early in the research, at time of data collection. Can prospective authors request ethics review of their methods up-front, IRB-style? (I think that’d be a great idea LIS-wide, and I’d happily participate in reviewing.) How does that work, then?
  • What are the criteria for determining whether the ethics are so bad that offices of research integrity, grant funders, or other authorities need to be notified? When notification is judged necessary, what is the process for doing so? I doubt this is super-likely in LIS, but stranger things have happened, and (as I keep saying) LIS research and assessment ethics are deteriorating and likely haven’t hit bottom.
  • What’s the process if this process fails such that an unethical article slips through it? How does retraction work, and how should a retraction notice read? What’s the external communications procedure? Nobody loves “incident response” (as infosec calls it), but everybody needs a process for it, if only for peace of mind and avoiding off-the-cuff wildcat responses that make incidents even worse. An incident-response process should also include a post-mortem aimed at necessary policy and process revisions—nobody gets these completely right the first time.

I have very likely not thought of everything here! Please refine this, everyone. COPE is also likely to have useful advice I didn’t think of.

For all that code4lib journal’s editorial board bumbled this situation in almost every way they could have, I want to say (as someone who has also seriously bumbled stuff yet lived to tell the tale) that there’s an opportunity for redemption here—not only redemption, even, but helping establish ethics standards for the rest of LIS publishing, as portal is doing.

I think that opportunity looks a lot like retracting the article and editorial with a public apology (especially though not exclusively to Becky Yoose), then doing the policy and process work I have laid out here. I’m not the right person to help, given my history with code4lib—I can’t trust them and they can’t trust me—but if they do it, especially if they do it well and sincerely instead of as a mere butt-covering exercise, I will absolutely cheer them on from the sidelines. LIS needs this.

L’affaire C4LJ: what is privacy, actually?

I spent one of my summers home from college doing filing for the health center where my mother worked as a nurse. (Nepotism? Yeah. Was I being paid less than I was worth? Also yeah. Win-win, after a fashion.) Before I so much as touched a patient file, I signed a fearsome confidentiality agreement, and had it explained in detail by my supervisor (who was not my mother).

The gist of it was that neither the health center nor I could do our jobs without seeing health information, some of it information that the patients very much needed and deserved to keep to themselves. It was everyone’s absolute responsibility to use that information only as absolutely necessary to carry out health care for the patients, and otherwise, to put that information in a little box in our brains and never let it out. Not ever. Not for any reason. Not even if I could jog another carer’s memory about a patient based on seeing a file recently—you never know who’s listening. Never.

So I didn’t. Never have. (These days, I couldn’t if I tried—I honestly don’t remember a thing from those files.)

Differences in regulatory environment aside (and those differences are significant), libraries and health care share the ethical tightrope of needing to create and (sometimes) access intensely private information without divulging it unnecessarily. Libraries and health care also share the desire, often pressed on them by third parties, to stretch the definition of “necessary,” sometimes well past reason or ethics.

That leaves the question of what is “necessary.” Now, I like heuristics, as you’ll know if you’ve read my Physical-Equivalent Privacy piece. I can’t and don’t expect librarians to walk through a complicated hair-splitting trolley-problem–style ethics exercise every single time there’s an ethical question to answer. Life, and work days, are too short for that.

So here’s my necessity heuristic vis-à-vis patron data: if the data use is not for clear, proven, and above all direct patient/patron benefit, or for an internal function without which the organization could not run, don’t touch the data. Any data use beyond that requires at minimum notification and consent—and I do not mean by that some sleazy overarching click-through; I mean actual informed, specific, refusable, revocable consent.

Or, shorter: if you don’t absolutely have to, don’t.

The code4lib journal piece at issue fails every piece of my heuristic. There is no direct benefit to the patrons whose data was collected and analyzed—“better collections” is an indirect benefit at best, even were it proven (and this article is not sufficient proof). Without this analysis, collection development would continue unabated, so it is not required for the library to keep running. There was no notification to and no consent from patrons, genuine or otherwise.
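For readers who think in code, here is a minimal sketch of the heuristic as a Python check. Every name in it (`ProposedDataUse`, `passes_necessity_heuristic`, and the field names) is a hypothetical label invented for illustration, and the booleans stand in for judgments a human still has to make. The final lines encode the contested article as characterized above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedDataUse:
    """One proposed use of patron data (all field names are hypothetical)."""
    direct_patron_benefit: bool   # clear, proven, direct benefit to the patrons themselves
    operationally_required: bool  # an internal function the organization cannot run without
    notified: bool                # patrons were told about this specific use
    informed: bool                # consent was informed
    specific: bool                # consent covered this use, not a blanket click-through
    refusable: bool               # patrons could say no
    revocable: bool               # patrons can withdraw later


def has_genuine_consent(use: ProposedDataUse) -> bool:
    """Notification plus informed, specific, refusable, revocable consent."""
    return all((use.notified, use.informed, use.specific, use.refusable, use.revocable))


def passes_necessity_heuristic(use: ProposedDataUse) -> bool:
    """Return False when the heuristic says "don't touch the data"."""
    if use.direct_patron_benefit or use.operationally_required:
        return True
    # Anything beyond that needs, at minimum, notification and genuine consent.
    return has_genuine_consent(use)


# The contested article as characterized in the text: no direct benefit,
# not operationally required, no notification, no consent.
contested_article = ProposedDataUse(
    direct_patron_benefit=False,
    operationally_required=False,
    notified=False,
    informed=False,
    specific=False,
    refusable=False,
    revocable=False,
)
print(passes_necessity_heuristic(contested_article))  # prints False
```

Because consent is described as a minimum bar, a `True` result here only means the heuristic does not rule the use out; it does not certify the use as ethical.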

But that’s my heuristic, which I can’t expect to impose on everyone everywhere. (Feel free to poke holes in it.) Let’s try a couple more privacy analyses: one based on Physical-Equivalent Privacy, and one based on Nissenbaum et al.’s Contextual Integrity Theory.

Physical-equivalent privacy analyses ask the analyst to estimate the kind and amount of surveillance required to amass and disseminate the same type and amount of data collected by online means if the information packages in question were physical rather than online. Based on how squicky (technical term!) the physical surveillance feels, we can then judge how squicky the electronic surveillance should feel. For the code4lib journal article author to pull off the same analysis on use of physical bound volumes and abstracting/indexing sources, he would have to install video cameras throughout the stacks and reference area, identify each and every patron caught on camera, and make a note of every page of every volume consulted, alongside the name of the patron who consulted it. To emulate the (appalling, Briney-excoriation-worthy) security practices in evidence, the notes and videotapes, deidentified in exactly no way at all and not even close to anonymous, should live in a flimsily-locked (if locked at all) file cabinet in the author’s sometimes-unlocked office, findable by any malefactor willing to put in some effort. (I am guessing about the office, but it seems a reasonable guess that the spreadsheets are on the author’s computer.)

Does this seem okay to you? Would you run to implement this in your library? It surely doesn’t to me, and I surely wouldn’t. So in my estimation, the article’s data collection and handling fails the physical-equivalent privacy heuristic.

A contextual-integrity analysis asks the analyst to evaluate the putative privacy harm of a given information flow against apropos social norms and along five axes:

  • Data sender
  • Data subject
  • Data recipient
  • Information type
  • Transmission principle
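One way to make the five parameters concrete is to treat an information flow as a small record and compare it, field by field, against the flow a norm would allow. The sketch below is only that: `InformationFlow`, `differing_parameters`, and all of the example values are hypothetical and deliberately simplified, and plain string comparison is a stand-in for what is really a human judgment call.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class InformationFlow:
    """The five contextual-integrity parameters of a single data flow."""
    sender: str
    subject: str
    recipient: str
    information_type: str
    transmission_principle: str


def differing_parameters(flow: InformationFlow, norm: InformationFlow) -> list[str]:
    """Name the parameters on which an observed flow departs from the norm."""
    return [
        f.name
        for f in fields(flow)
        if getattr(flow, f.name) != getattr(norm, f.name)
    ]


# Hypothetical norm: usage records stay with staff providing the service.
norm = InformationFlow(
    sender="library system",
    subject="patron",
    recipient="staff providing the service",
    information_type="records of resources consulted",
    transmission_principle="confidential; service use only",
)

# Hypothetical observed flow: the same records reused for research.
observed = InformationFlow(
    sender="library system",
    subject="patron",
    recipient="librarian acting as researcher",
    information_type="records of resources consulted",
    transmission_principle="no notice, no consent",
)

print(differing_parameters(observed, norm))
# prints ['recipient', 'transmission_principle']
```

A comparison like this only flags where a flow departs from a stated norm. Deciding which norms apply, and whether a departure is a harm, remains the analyst's job.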

“Apropos social norms” is legitimately a tough one to call, partly because of what Nissenbaum calls in Privacy in Context “the tyranny of the normal”: the tendency for social norms (especially nowadays) to evolve in ever-more-surveillance-ridden directions as people either don’t understand enough about them or are too worn down to protest them. I’m rock-solid sure that code4lib journal could dig up plenty of patrons who would say they’re totally okay with the article methods as published! Also, of course, plenty who wouldn’t; the dicey situation of social-media research (see e.g.) is relevant here.

As I read this situation, however, we’re evaluating the information flows in this article not against patron norms, but librarian norms. (This is exactly how Nissenbaum recommends dealing with the tyranny of the normal, of course: figure out who you are and what your ethics are and abide by that.) Those norms are set out in our codes of ethics and interpretations thereof, though as I mention above, in practice they have been steadily and unethically eroding.

For me, the contextual-integrity problems here live in all five axes, which makes for something of a tiring analysis, but oh well, here goes:

  • Data sender: Librarians are not exempt from the injunction to keep patron data private. Indeed, our ethics codes exist as a form of self-regulation! It’s not okay “because it’s us doing it.” It’s not acceptable for a librarian to abuse access to private and confidential patron data for research purposes.
  • Data subject: Patrons—all patrons—are entitled to the privacy and confidentiality of any data regarding information sources they consult. (I think they’re entitled to more, myself—ideally we would codify the duty to keep data about library and non-information-resource library-service use private and confidential as well—but at present our ethics codes don’t go there, so I won’t either.)
  • Data recipient: Again, it’s not acceptable for a librarian to use private and confidential data for research without notice to and genuine consent from all patrons involved.
  • Information type: Records of information resources that patrons use are privileged under library-ethics codes. They’re not supposed to be used except (as I suggest above) for direct benefit to patrons or for absolutely necessary organizational functions (“research” and “collection assessment” being nice, but not strictly necessary).
  • Transmission principle: Again, this is where notice and (genuine) consent come into it. Without those, this data flow isn’t okay under our ethics codes.

I can’t make this article feel okay with a contextual-integrity analysis either. I am not the world’s greatest contextual-integrity expert, so if you can, I’d be interested to hear how.

Given the analyses I’ve made here, I stand by my belief that both the article and the editorial attempting to justify its publication merit retraction on grounds of unethical data collection and use, and unacceptably careless data analysis methods, storage, and security.

L’affaire C4LJ: Starting places

So, this week the code4lib journal published an article whose methods I believe to be unethical enough to deserve retraction, alongside an editorial rationalizing the choice to publish, which I also believe should be retracted. Because privacy ethics in LIS publishing is a hot-button issue for me (these are far, far from the only two pieces in the LIS literature I believe should be retracted on privacy-ethics grounds!), my reaction on Twitter was Twitter-style intemperate. I reproduce the text of that tweet here because I routinely delete tweets of mine older than six months, and given the gravity of this situation, this one needs a longer lifespan:

Today’s absolute BS take, from the Code4lib Journal: https://journal.code4lib.org/articles/16208

If you cannot collect data while keeping patron information and resource use anonymous (not just deidentified), YOU DO NOT COLLECT IT.

Retract this editorial and that article. Immediately. For shame.

I don’t use “for shame” often or lightly, I should perhaps mention. It’s harsh. When I do say it, I mean it. In my estimation, both article and editorial are shameful, and their authors and editors should be ashamed of them.

Additional necessary context: my colleague and friend Becky Yoose explained on Twitter her involvement in the pre-publication deliberations. C4LJ editorial-board member Peter Murray responded on his blog.

Before I discuss the article and editorial, and the process by which they came to be, I think it will help for me to lay out both the axioms I’m working from, and my history with code4lib. The former is important if the fundamental issues this case raises are ever to be resolved within the LIS literature; the latter is important because it casts some doubt on who’s making good-faith arguments here. To be clear, “who isn’t” may in fact be me. I don’t think it is, of course! But I don’t get to make that determination for all of you.

Axioms

Research that is unethical should not be published; if it is published, it should be retracted. The Committee on Publication Ethics concurs, by way of a rather handy workflow diagram explaining how to handle misconduct allegations, and the statement that editors should consider retracting a piece that “reports unethical research.”

Research on library patrons that contravenes library-specific ethics is unethical; it should not be published in the LIS literature, and when published there, should be retracted. The Belmont Report, and the Institutional Review Board infrastructure that grew out of it, are not library-specific and do not capture the entirety of library-specific ethics concerns. In the absence of a similar report and related infrastructure for library research, it is incumbent on editors and peer reviewers for the LIS literature to understand and apply library ethics to their editorial and review work, rejecting work that infringes library ethics. (It is not, however, incumbent on LIS publishing labor to manage ethics for other disciplines, just to be clear. Plenty of work to do cleaning up our own house, I think.)

Library-specific ethics codes hold patron privacy sacrosanct. For example:

  • IFLA: “Librarians and other information workers respect personal privacy, and the protection of personal data, necessarily shared between individuals and institutions. The relationship between the library and the user is one of confidentiality and librarians and other information workers will take appropriate measures to ensure that user data is not shared beyond the original transaction.”
  • CFLA-FCAB (see also): “Libraries have a core responsibility to safeguard and defend privacy in the individual’s pursuit of expressive content. To this end, libraries protect the identities and activities of library users except when required by the courts to cede them.”
  • ALA (see also): “We protect each library user’s right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted.” Note particularly that confidentiality is not a substitute for privacy; patrons are entitled to both. In the current context, last winter’s Resolution on the Misuse of Behavioral Data Surveillance in Libraries warrants a read.
  • ACRL: “The privacy of library users is and must be inviolable. Policies should be in place that maintain confidentiality of library borrowing records and of other information relating to personal use of library information and services.”

Given the nationality of the article author, I did look for relevant Canadian library-ethics codes, but did not locate one. If this represents a fumble on my part, by all means point me to the relevant documents—I want to know where they are and what they say! (I do read French, though slowly.) If they don’t exist, then my sense is that IFLA ethics codes would apply. My thanks to Becky Yoose for helping me locate the CFLA-FCAB statement.

The corollary arising from the above axioms seems clear to me: Research that violates patron privacy should not be published in the LIS literature, because it is unethical per standard library ethics; when published there, it should be retracted.

Code4Lib and me

This is a lot of long stories, but I’ll try to keep it short. I found the code4lib IRC channel early in my career as a librarian, and participated there until I could no longer abide repeated, unchecked (indeed, supported and defended) sexist expression from other participants. The planning process for their initial conference was the first last straw. Repeated attempts at silencing and tone-policing me for blogging openly about the first last straw were the next last straw, and this episode was the last last straw.

Where this ties into the current saga is this tweet from Peter Murray calling for civility. As Becky has been quite civil throughout, I suspect this tone policing to have been aimed at my original tweet. Given my history with code4lib—indeed, with librarianship—I can’t be surprised, only exasperated.

Do with this knowledge what you will.