Some tidbits about data handling in library learning analytics

So I finished building and coding up my dataset of library learning-analytics articles! That was a lot of work. I also have a data dictionary, and a methods section in the article draft! Yesterday I got to start writing queries against the database and writing them up in the article draft.

Want some tidbits about the 62 research projects I ended up studying (46 of them American)? Of course you do. Have some:

  • 35 of 62 projects, 27 of those 35 American, made no attempt whatever to deidentify data before analysis. Ahoy ahoy potential data leaks and insider threat!
  • 11 projects, 8 American, used data that revealed the subject of a patron’s inquiry. That’s a pretty bright-line no-no in libraries, folks.
  • Only 11 projects notified students about the specific research that would be taking place using their data. One more claimed that students were notified because the campus ID card terms of service told them research (unspecified) would be happening.
  • Actual informed consent? Five. Five projects sought it. Out of 62.
  • Wondering where ethics review was in all this? Yeah, me too. Of the 46 American projects, eleven passed IRB review, four were declared exempt, and a big fat nothing for the remainder. One of the 16 non-American projects received ethics review.
  • Sensitive data used in these projects included: socioeconomic status data or proxies thereof (13 projects), high-school performance data (GPAs and SAT/ACT-or-analogue scores, 13 projects), location data (7 projects), first-generation student status (6 projects), national origin or citizenship data (4 projects), military/veteran status (3 projects), and disability status (1 project).

I’ve got more; I wrote plenty of ANDed WHERE clauses yesterday (SQL is so much fun!), and more may occur to me as I continue the writing-up. But the above certainly gives you the flavor. It is not a good flavor.
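
For anyone curious what an ANDed WHERE clause of this sort might look like, here is a minimal sketch using Python's built-in sqlite3 module. Everything specific in it is an assumption for illustration: the `project` table, its column names, the `count_projects` helper, and the four toy rows are all hypothetical stand-ins, not the real schema or the real dataset.

```python
import sqlite3

# Hypothetical schema: the real table and column names are not shown in this post.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE project (
        id INTEGER PRIMARY KEY,
        country TEXT NOT NULL,
        deidentified INTEGER NOT NULL,      -- 0 = no attempt, 1 = some attempt
        informed_consent INTEGER NOT NULL   -- 0 = not sought, 1 = sought
    )
    """
)

# Made-up toy rows for illustration only; these are NOT the study's data.
conn.executemany(
    "INSERT INTO project (country, deidentified, informed_consent) VALUES (?, ?, ?)",
    [("US", 0, 0), ("US", 1, 0), ("CA", 0, 0), ("US", 0, 1)],
)

ALLOWED_COLUMNS = {"country", "deidentified", "informed_consent"}


def count_projects(conn, **conditions):
    """Count projects matching every `column = value` condition, ANDed together."""
    unknown = set(conditions) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown column(s): {sorted(unknown)}")
    # Column names come from the allow-list above; values go in as bound parameters.
    where = " AND ".join(f"{column} = ?" for column in conditions) or "1"
    sql = f"SELECT COUNT(*) FROM project WHERE {where}"
    return conn.execute(sql, tuple(conditions.values())).fetchone()[0]


# American projects that made no attempt to deidentify (in the toy data).
us_no_deid = count_projects(conn, country="US", deidentified=0)
```

Each extra keyword argument adds one more AND to the clause, which is roughly how a count like "X of Y projects, Z of them American" gets narrowed down one condition at a time.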

Please miss me with all the gaping loopholes in the rules about which projects must receive ethics review. I know. That’s part of the problem! I plan to write about it at length in the other paper! (I also want to acknowledge David Fiander for giving me lots of useful intel on Canadian ethics-review loopholes yesterday on Mastodon. Appreciate it, David, and I’ll also acknowledge your help in one or both papers.) It may seem convenient to dodge all this red tape, but in my head what it really means is that LIS is letting its researchers show their ethics underwear all over the place, unguided and (crucially) unprotected. It’s not the Value Agenda for Libraries pushers whose careers will be tarnished when (and it’ll be when, if I have anything to say about it) retractions and expressions of concern start happening; I expect they’ll claim it’s on researchers to Do Ethics Right, none of their concern. It’s pretty much academic librarians doing what VAL pushers told them was okay (not just okay, vitally important) who will be hung out to dry.

Not sure how the VAL pushers sleep at night, honestly—if my analysis holds water, which I think it does or I wouldn’t still be working on it, they’ve royally screwed students and librarians—but I suppose that’s not my problem.

Anyway, a lot of the discussion for this piece will be the first (as far as I know) attempt at examining real-world library learning-analytics practices in light of what we know from Data Doubles and similar research (which there’s rather more of now! yay!) about student preferences, the top two of which have repeatedly been shown to be notification and the chance to consent (or not). There’s an ethics-of-care argument there that I’m happy to make: if we care about students as much as we claim to, ignoring or overriding their stated preferences, especially for a research agenda that does not directly benefit them (hello beneficence! the Value Agenda for Libraries has none of you!), cannot be ethically acceptable.

I’ll publish the data, too. Zotero exports for both eligible and ineligible project citations, SQLite database, CSV database exports (though I need to think about building useful views for later-researcher convenience), basically the lot. CC0 on all of it, not that there’s much if any copyright in it to claim. You want to play in my data playground? Go for it.
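
On the views-for-convenience idea, one possible shape is a view that rolls per-project details up into a single row, which can then be dumped to CSV. The sketch below is only that: the `project` and `sensitive_data` tables, the `project_summary` view, the `view_to_csv` helper, and the two toy rows are hypothetical and made up for illustration, not the published schema or data.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables; the real schema may look nothing like this.
    CREATE TABLE project (id INTEGER PRIMARY KEY, country TEXT NOT NULL);
    CREATE TABLE sensitive_data (
        project_id INTEGER NOT NULL REFERENCES project(id),
        kind TEXT NOT NULL
    );

    -- Made-up toy rows, NOT the study's data.
    INSERT INTO project (id, country) VALUES (1, 'US'), (2, 'CA');
    INSERT INTO sensitive_data (project_id, kind)
        VALUES (1, 'location'), (1, 'disability status');

    -- One row per project, with a count of the sensitive data types it used.
    CREATE VIEW project_summary AS
    SELECT p.id AS project_id,
           p.country AS country,
           COUNT(s.kind) AS sensitive_kinds
    FROM project AS p
    LEFT JOIN sensitive_data AS s ON s.project_id = p.id
    GROUP BY p.id, p.country;
    """
)


def view_to_csv(conn, view_name):
    """Return the contents of a view as CSV text, header row first."""
    # view_name is interpolated into the SQL, so only pass trusted names.
    cursor = conn.execute(f'SELECT * FROM "{view_name}" ORDER BY 1')
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()


csv_text = view_to_csv(conn, "project_summary")
```

The LEFT JOIN keeps projects that used no sensitive data in the summary (with a count of zero) rather than silently dropping them, which seems like the safer default for later researchers.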