Shaarli: a good migration target from Pinboard

To make a long story short, Pinboard owner Maciej Ceglowski milkshake-ducked himself with bizarre and out-of-nowhere hair-splitting regarding whether JK Rowling is a TERF. I have too many trans and non-binary people in my life to give my money willingly to someone like that. So I was suddenly, unexpectedly, and unhappily in the market for a new linkspam tool.

A quick recap of my user story: I keep (not to say “hoard”) links because I build readings for course syllabi out of them. I also share tag-based URLs with students and colleagues when inspiration strikes. New links get shunted to my Mastodon account, since apparently some folks find that useful. I need my linkspam tool to handle a LOT of links, give me a reasonably fast link-add mechanism, produce an RSS feed, let me combine tags (ideally with search) into a filtered linklist and share the resulting URL with others, and let me build such URLs from memory based on my knowledge of my own tag use.

I tried LinkAce first. I can say in its favor that it can be run on cPanel-enabled shared hosting, though I wouldn’t call it exactly easy to install. (I did manage it, and I’m a terrible awful useless sysadmin, so yeah.) I can say little else in its favor—it’s painfully slow, its link-add page is almost as infuriating as Raindrop’s (the tag lookup is just deadly bad), and it doesn’t have combined-tag filters or intelligible URLs. I have hopes for it, but as-is, I can’t make it work for what I need. To add insult to injury, its HTML export (in the de facto exchange format for link tools) wouldn’t work for me.

I was scared off Shaarli at first because of the documentation’s exhaustive list of server prerequisites and incantations. I shouldn’t have been! It installed quite easily on my webhost! Let’s say you want your Shaarli to live on the web at linkspam.example.com, and in a folder named “linkspam” on your webhost.

  1. In cPanel, make the linkspam.example.com subdomain, and point it to your linkspam folder. (Currently this is done via cPanel’s “Subdomains” menu item, but apparently this functionality is being moved to “Domains” shortly.)
  2. Go to cPanel’s “Domains” menu item and toggle “Force HTTPS Redirect” on for linkspam.example.com. (Do it for all your other domains and subdomains while you’re at it. I’d missed a couple of mine!)
  3. Download the .zip file for the latest Shaarli release.
  4. Use cPanel’s File Manager (or SFTP, if you’re so inclined) to upload the .zip file to the folder one level up from your linkspam folder. (Trust me, okay?)
  5. In File Manager, delete your empty linkspam folder. (No, really, trust me!)
  6. Choose the Terminal menu item in cPanel. You’re in your home directory; if you need to, cd to the folder you put the .zip file in. Now type unzip sh and hit your tab key, which should autocomplete the filename for the Shaarli .zip file. Hit return. (See the terminal sketch just after this list.)
  7. Go back to File Manager. Reload it. Find a folder named Shaarli, and rename it to linkspam. (Now you see where I was going with this! You could also do this in the Terminal with mv Shaarli linkspam.)
  8. Go to linkspam.example.com in your browser, and finish setup.
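
For the record, the Terminal part of steps 6 and 7 amounts to something like this. (A sketch, not gospel: the release filename below is illustrative, so substitute whichever version you actually downloaded, and I’m assuming the .zip landed in your home directory.)

    cd ~                              # or wherever you uploaded the .zip
    unzip shaarli-v0.12.2-full.zip    # "unzip sh" plus tab completion gets you here
    mv Shaarli linkspam               # step 7, if you'd rather do it here than in File Manager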

And that should be it. (On my to-do list: setting up an automatic backup for Shaarli’s “data” directory. Pretty sure I can do this directly in cPanel, via cron if necessary; see the sketch below.)
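
If cron it is, a minimal sketch of a daily crontab line (assuming the linkspam folder from above, and a ~/backups folder that already exists) might look like this:

    # 3 a.m. daily; note that cron wants % escaped as \%
    0 3 * * * tar -czf ~/backups/shaarli-data-$(date +\%F).tar.gz -C ~/linkspam data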

I don’t have time for a full Shaarli report card, but here are a few things I’ve noticed in the couple of hours I’ve had it running:

  • Text search can be combined with tag filtering, which Raindrop can do but Pinboard can’t. Nifty, though it’d be more elegant with just one search bar and a parseable text trigger (probably #) for tags, as Raindrop does.
  • I don’t love Shaarli’s URLs—everything is query parameters—but I can live with them. They definitely copy-and-paste cleanly, unlike Raindrop’s or LinkAce’s, and the components are memorizable.
  • The bookmarklet is a tiny bit slow to load, but so was Pinboard’s sometimes; I can live with it. Bookmark entry is a breeze; Shaarli does not do LinkAce’s horrible, horrible real-time(-ish) tag lookup.
  • OH MY GOSH, SEARCH OPERATORS! Phrase searching, minus-ing, wildcards! I will enjoy getting to know these.
  • The Wayback Machine integration is clever and useful, and will save me some time during syllabus construction. (Sometimes irreplaceable links 404.)
  • Shaarli could really use some CSS love. Maybe if I locate some spare time. One thing I would immediately do is get rid of the little tag icon next to tags. It’s purest visual clutter, but a simple display:none will take care of it.
  • What the hell is that QR code thing doing there, and why can’t I get rid of it?! Minor nit, it’s not all that obtrusive, but ugh.
  • I haven’t checked into its add-ons yet, but there seems to be a flourishing community, so I will.

So yeah, Shaarli is solid and useful and does what I need it to, and is not profiting any milkshake ducks. Fancy and pretty I don’t actually need. As for my Pinboard, I’m leaving it up for a bit until I’ve weaned my various syllabi and course assignments off it, and then it will go away.

Half a filk

It took me a lot longer to get to Hamilton than most. I dearly want to send Michael Gorman a ticket, after his racist diss of hip-hop that probably only I remember by now. Ah, well.

I made half a filk after watching the Disney production a while back. The rest is still to be written, I think; I hope it has a better ending for librarianship than for Hamilton. Grateful to the Genius website for making the lyrics available (and copiously annotated).

[Salo:]
There’s nothing like tenure-driven research
Data in the ILS meets data from the teachers
There’s value in the air, you can smell it
And a researcher’s by himself. I’ll let him tell it

[Researcher:]
I hadn’t slept in a week
I was weak, I was awake
You never seen a librarian
More in need of a break
Longing for significance
Missing ROI
That’s when Ms. Value Agenda walked into my life. She said:

[VAL:]
I know you are a professional
I’m so sorry to bother you at home
But I don’t know where to go, and I came here all alone…

[Researcher:]
She said:

[VAL:]
My admin’s doin’ me wrong
Puntin’ me, huntin’ me, defundin’ me…
Suddenly budget’s up and gone
I don’t have the means to go on

[Researcher:]
So I offered her a meal, I offered to break her Big Deal, she said

[VAL:]
You’re too kind, sir

[Researcher:]
I gave her some spreadsheets that I had socked away
She worked a block away, she said:

[VAL:]
This one’s mine, sir

[Researcher:]
Then I said, “Well, I should head back home,”
She turned foxy, she led me to EZProxy
Opened her data boxie and said:

[VAL:]
Stay?

[Researcher:]
Hey…

[VAL:]
Hey…

[Researcher:]
That’s when I began to pray:
ALA, show me how to
Say no to this
I don’t know how to
Say no to this
But my God, the data’s so fresh
And the journal’s saying, “Hell, yes.”

[VAL:]
Whoa…

[Researcher:]
No, show me how to

[Researcher/Ensemble:]
Say no to this

[Researcher:]
I don’t know how to

[Researcher/Ensemble:]
Say no to this

[Researcher:]
In my mind, I’m tryin’ to go

[Ensemble:]
Go! Go! Go!

[Researcher:]
Then the data lake’s online, and I don’t say…

[Ensemble:]
No! No!
Say no to this!
No! No!
Say no to this!
No! No!
Say no to this!
No! No!
Say no to this!

[Researcher:]
I wish I could say that was the last time
I said that last time. It became a pastime…

Off it goes…

Well, the article I started last spring is finished and off to a journal. I expect to have some trouble placing this one, because I’m just stubborn enough to send it to outlets whose practices it calls into question. (They are absolutely appropriate outlets for the piece, I hasten to say—I’m not wasting anybody’s time, that would be wrong of me.)

So I’ll track the rejections, and post ’em here once it’s finally accepted somewhere. Which it will be, I’m confident. I did good and useful work on this one, if I do say so myself.

Some tidbits about data handling in library learning analytics

So I finished building and coding up my dataset of library learning-analytics articles! That was a lot of work. I also have a data dictionary, and a methods section in the article draft! Yesterday I got to start writing queries against the database and writing them up in the article draft.

Want some tidbits about the 62 research projects I ended up studying (46 of them American)? Of course you do. Have some:

  • 35 of 62 projects, 27 of those 35 American, made no attempt whatever to deidentify data before analysis. Ahoy ahoy potential data leaks and insider threat!
  • 11 projects, 8 American, used data that revealed the subject of a patron’s inquiry. That’s a pretty bright-line no-no in libraries, folks.
  • Only 11 projects notified students about the specific research that would be taking place using their data. One more claimed that students were notified because the campus ID card terms of service told them research (unspecified) would be happening.
  • Actual informed consent? Five. Five projects sought it. Out of 62.
  • Wondering where ethics review was in all this? Yeah, me too. Of the 46 American projects, 11 passed IRB review, 4 were declared exempt, and a big fat nothing for the remainder. One of the 16 non-American projects received ethics review.
  • Sensitive data used in these projects included: socioeconomic status data or proxies thereof (13 projects), high-school performance data (GPAs and SAT/ACT-or-analogue scores, 13 projects), location data (7 projects), first-generation student status (6 projects), national origin or citizenship data (4 projects), military/veteran status (3 projects), and disability status (1 project).

I’ve got more; I wrote plenty of ANDed WHERE clauses yesterday (SQL is so much fun!), and more may occur to me as I continue the writing-up. But the above certainly gives you the flavor. It is not a good flavor.
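
For flavor, here’s roughly what one of those queries looks like. (The table and column names here are invented for illustration; my actual data dictionary differs.)

    # the first tidbit above: projects that didn't deidentify, American subset
    sqlite3 val.db "SELECT COUNT(*)
                    FROM projects
                    WHERE deidentified = 0
                      AND country = 'US';"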

Please miss me with all the gaping loopholes in the rules about which projects must receive ethics review. I know. That’s part of the problem! I plan to write about it at length in the other paper! (I also want to acknowledge David Fiander for giving me lots of useful intel on Canadian ethics-review loopholes yesterday on Mastodon. Appreciate it, David, and I’ll also acknowledge your help in one or both papers.) It may seem convenient to dodge all this red tape, but in my head what it really means is that LIS is letting its researchers show their ethics underwear all over the place, unguided and (crucially) unprotected. It’s not the Value Agenda for Libraries pushers whose careers will be tarnished when (and it’ll be when if I have anything to say about it) retractions and expressions of concern start happening; I expect they’ll claim it’s on researchers to Do Ethics Right, none of their concern. It’s pretty much academic librarians doing what VAL pushers told them was okay—not just okay, vitally important—who will be hung out to dry.

Not sure how the VAL pushers sleep at night, honestly—if my analysis holds water, which I think it does or I wouldn’t still be working on it, they’ve royally screwed students and librarians—but I suppose that’s not my problem.

Anyway, a lot of the discussion for this piece will be the first (as far as I know) attempt at examining real-world library learning-analytics practices in light of what we know from Data Doubles and similar research (which there’s rather more of now! yay!) about student preferences, the top two of which have repeatedly been shown to be notification and the chance to consent (or not). There’s an ethics-of-care argument there that I’m happy to make: if we care about students as much as we claim to, ignoring or overriding their stated preferences, especially for a research agenda that does not directly benefit them (hello benevolence! the Value Agenda for Libraries has none of you!), cannot be ethically acceptable.

I’ll publish the data, too. Zotero exports for both eligible and ineligible project citations, SQLite database, CSV database exports (though I need to think about building useful views for later-researcher convenience), basically the lot. CC0 on all of it, not that there’s much if any copyright in it to claim. You want to play in my data playground? Go for it.
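
By “useful views” I mean canned filters along these lines (same invented names as in the sketch above), so later researchers don’t have to reconstruct common WHERE clauses themselves:

    # a convenience view: American projects that didn't deidentify
    sqlite3 val.db "CREATE VIEW us_no_deid AS
                    SELECT project_id, title, pub_year
                    FROM projects
                    WHERE country = 'US' AND deidentified = 0;"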

It ain’t necessarily so

The publication Book Riot has been doing incredibly necessary journalism, and doing it skillfully, around the latest rash of attempted and successful censorship of library materials and librarian voices. If you’re in the States and it’s not in your daily round of book news, whyever not?

They put out a great piece today on the mess in the Oklahoma City library system regarding abortion information. It’s really, really good, and probably headed for my fall intro syllabus.

But one sentence gave me pause: “[S]ome information privacy practices in public libraries emerged following the Patriot Act, which is why, for example, records of materials checked out by individuals are not saved and why it is shared computers are wiped of their histories between sessions.”

Oh. Oh, no. Oh, dear. Neither of these assertions is universally true, and the assertion about circulation records appears to be becoming steadily less true as CRM systems and assessment/analytics take firm hold. As for shared computers, whether they actually get wiped usually comes down to available IT staff and budget rather than any intentional privacy plan, and a whole lot of libraries are strapped for both, so.

As for circulation records, I have direct proof of one academic-library system not deleting them! But long story short, libraries (both public and academic) that retain identified circulation records past materials return typically do so for one or more of the following reasons:

  • Patron convenience, commoner in public libraries than academic, but the academic-library consortium that serves my workplace explicitly names this as its retention rationale in its records schedule
  • “Assessment” and/or “analytics,” which is where CRM systems come into it
  • Academic-style research, which overlaps with assessment/analytics in tangled ways, as with “value of academic libraries” research
  • ILS settings that aren’t twiddled in favor of privacy
  • Isolated edge cases, such as special collections (where defacement and theft of materials by patrons are extra-serious issues)

In some libraries, it’s more than physical-materials circulation records—I don’t want anyone thinking “well, I never check out books, so I’m safe!” For academic-library-purchased ebooks and ejournals, identified or reidentifiable traces of patron information use can be left in proxy-server logs, which (I hear from e-resources librarians of my acquaintance) can definitely stick around a lot longer than they should. There’s also the whole question of what data the vendors are keeping, but that’s tangential to the Book Riot question, so I’ll let it go for this post, noting only that I wrote a big long thing about it that people can read, and they should also pay attention to Sarah Lamdan on the subject.

In response to a media query last Friday that went “Should people be concerned about their data privacy when it comes to searching for abortion-related resources?” I wrote the following paragraph about libraries:

If we are talking about libraries: yes, and as a librarian who strongly values information privacy I hate this answer, but it is the only honest answer I have. Too many libraries are retaining identified or reidentifiable search logs and identified circulation records far longer than they should. Too many vendors who sell online content access to libraries for use by patrons are using the same trackers and surveillance adtech as the rest of the web. I’m fighting my own profession to make it live up to its stated privacy ethics with everything I have in me—but it’s an uphill battle. Folks need to be aware that libraries, whatever our rhetoric as librarians, are not necessarily keeping them safe.

But I feel a bit of a filk coming on, so…

It ain’t necessarily so
It ain’t necessarily so
The privacy mottos
In library grottoes
It ain’t necessarily so

And I’ll leave it at that, before I get myself in trouble yet again calling out specific people, libraries, and practices.

I broke my Twitter leave of absence to ping Book Riot’s Twitter about this. I repeat here what I said there: there’s lots of skeevy stuff happening, and I’m as good an option as most to talk knowledgeably about it. Give me a shout, Book Riot, if you would.

Who’d like to be the Dr. Latanya Sweeney of library patron data?

Dr. Latanya Sweeney, for those who haven’t encountered her work before, is one of the titans of data privacy and reidentification research. Her work is the source of the oft-quoted factoid about nearly nine in ten Americans being uniquely identifiable with a combination of birth date, gender (binary assumed), and ZIP code. She’s deliberately reidentified politicians who blathered ignorantly about data privacy. She assembled evidence to call out search engine ad targeting for racism based on naming practices specific to African-American communities. Basically, she’s badass and a hero and I admire her exceedingly.

And I think there’s a tremendous LIS research agenda that grows out of her work (and the work of others, Arvind Narayanan not least) going begging: assessing the reidentifiability and risk profile of common sources of library patron data, and quantifying how possible it is to (re)associate a patron with evidence of (as ALA privacy guidance puts it) the subject(s) of their interest.

Such sources should probably include:

  • Retained circulation records, identified and various flavors of de-. It may seem obvious that identified circ records pose a hazard to patrons, but it’s really, really not obvious to many librarians and most patrons.
  • Proxy-server logs, identified and de-.
  • CRM system records; these are identified (or what would be the point?) — the research questions have to do with the use of these records to reidentify patrons in other data sources.
  • Chat reference logs, since they get retained for internal analysis and research a lot.
  • Website and web-service logs from all sources (local logging, SaaS-tool logging, web-tracker logging, emphatically including Google Analytics, and usability-tool logging).
  • The usual-suspect computer-use logs and caches: browser caches, software-use caches, desktop-search caches, and the like.
  • Single-sign-on data, especially but not only when it’s pseudonymous or limited to entitlement data. This isn’t strictly speaking library data, but it’s getting baked into library authentication processes deeply enough that I consider it a valuable arena of LIS inquiry.

Example research questions include but are emphatically not limited to:

  • How easy is it to associate a given patron with the subject(s) of their inquiry based on these data sources? Combinations of these data sources? Combinations of one or more of these data sources with common library-external data (e.g. for library learning analytics projects, GPA and major and demographic data and whatnot)? For academic libraries, combinations of one or more of these data sources with institution-external data (e.g. LinkedIn and alumni databases)? From the published LIS literature (because oh, are there ever skeletons in this closet and where is the Narayanan who will proof-of-concept them)?
  • Quantification of some standard measures of reidentifiability potential — k-anonymity and l-diversity and stuff like that (see the sketch just after this list).
  • Feasibility of reidentification-by-behavior-trail. For example, if an attacker (probably a library insider) with access to library data comes in with knowledge of a specific person’s likely interests, can they pick that person out of (just as one example) proxy-server logs? What data-retention time horizons enable/prevent such reidentification? Put another way… how unique do behavior trails tend to be, and which patron populations are most at risk of behavior-trail reidentification? (Like, some freshman in a large-lecture course with a canned research project likely isn’t super-reidentifiable from deidentified proxy logs… but I suspect pretty strongly that I, a longtime staff member with fairly outré intellectual interests, would be.)
  • What’s the potential of assessing group membership, particularly for groups targeted by law enforcement? Basically, if The Man drops by wanting to know who’s been searching up abortion or immigration or critical race theory or LGBTQ+ issues, can library data (alone or in combination with other data) rat patrons out to The Man as possible uterus possessors, or Dreamers, or people of color, or queer folks?
  • What actually are libraries’ present data-retention and data-handling practices? Records schedules? Privacy policies? Governance processes? Data-handling processes during internal assessment as well as research for publication? We just don’t know enough about this, and the work I’m doing barely scratches the surface of the work that’s possible. (Pour one out for ARL SPEC Kits; this would actually be a good use for them.)
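
To make the k-anonymity item above concrete, here’s the core computation in the SQLite terms I’ve been using (an invented patron table, with Dr. Sweeney’s three classic quasi-identifiers): group the records by quasi-identifier combination and find the smallest group. That minimum is the dataset’s k, and k = 1 means at least one patron is unique on those fields, hence potentially reidentifiable.

    # size of the smallest equivalence class over birth date + gender + ZIP
    sqlite3 patrons.db "SELECT MIN(n) AS k FROM (
                          SELECT COUNT(*) AS n
                          FROM patrons
                          GROUP BY birth_date, gender, zip
                        );"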

Here’s the kicker. A lot of this work pretty much has to be done by working library practitioners, because they’re (quite properly, to be sure) the only folks who can actually get at library-internal data. I’ve wanted to do some of the above work for literal actual years, but as yet I haven’t located a library IT person willing to go in on it with me. There are additional wrinkles with library-external IT, too—I really want those single-sign-on reidentifiability studies to happen, but most librarians can’t unilaterally do them because (again) they can’t get at the institutional data without IT cooperation.

Like, I would actually prefer to be less paranoid than I am about library-patron data. It’d help my blood pressure, if nothing else. But without answers to the research questions I just posed, I… kind of have to assume the worst, based on the anecdata I have and on data-privacy solecisms evident in the LIS literature.

So. I’d be absolutely delighted to see a journal or conference or two—emphatically including code4lib journal—create some incentives for this work (and for doing it carefully and ethically, natch). Hey, editors and editorial boards, how about a themed issue? Even if it has to be guest-edited (and yes, I would absolutely serve in that capacity, pace my well-known unwillingness to donate labor to grossly exploitative commercial publishers). Hey, LIS conference organizers, can we get a track please? Hey, folks mentoring new academic librarians in need of research agendas, how about suggesting this one?

Let’s do this. It sure does need doing.

L’affaire c4lj: ma fin est mon commencement

So, one piece of the code4lib journal saga has reached its end: the article in question has been retracted at author request. According to a letter I received from the president of the university where the article author works, this was done to bring an end to the ethics investigation I requested, with no one admitting any fault. Which, fine. I think the retraction cures the major harms, and I was never in this out of any animus against the author.

This should not be the end for the journal, however, which continues its silence on what, if anything, it plans to do to keep unethical research from appearing in its pixels. The best time to have done something would have been a long time ago; the second-best time is now.

How do I feel? Sad and tired, mostly. Mildly vindicated, yes, I’ll cop to that. But as Machaut wrote, ma fin est mon commencement: there’s a lot more work to do about all this, and I seem to have elected myself to the position of Something-Doer In Chief.

I’m seeing the light at the end of the tunnel for data collection on the evaluate-VAL-data-practices piece; I need to find one review piece in print to check its citations (not a problem; the iSchool library has print of the issue I need) and finish going through Library Assessment Conference proceedings and the entire run of EBLIP for eligible studies before the analysis can start.

One thing I noticed while I was piling up material, though, is that VAL is not the only locus of library-privacy-ethics despair in the current LIS research environment. That’s not surprising, I suppose, but it sure is dispiriting. There’s plenty of dubious use being made of proxy-server logs, since we’re on the topic—both the journal and the author can be forgiven for thinking this was okay, since it’s been done and published before. I’ve not seen much on circulation records beyond circulation counts, which is interesting; it implies that some academic-library researchers understand that identified circ records are off-limits, but haven’t managed to make the logical leap that any identified record of reading needs to be kept private and confidential, including proxy-server logs.

(Also, the ALA Code of Ethics uses Boolean AND, not Boolean OR, between “privacy” and “confidentiality.” Patrons are entitled to both, including from nosy-parker library researchers. “We’re keeping it confidential!” is not adequate under the Code as I read it. Besides, a lot of VAL work doesn’t even do that much.)

The larger problem is that both ethical guidance and ethics-related processes around the use of library patron data are sparse, gappy, and outdated. This lets nosy-parker researchers ethicswash, compliance-hunt, and IRB-dodge their way into data uses and abuses that are not ethical and shouldn’t be allowed. I tried like heck (as de facto project manager) to get this point made clearly in this DLF explainer, but four years later, I have to admit that one sank like a stone and has not achieved my aim. (I can’t speak for the other contributors on that, of course, only for myself.)

So I’m going to try again, louder and clearer this time, naming names, calling out dubious practices directly (with nods to Briney and Asher/Robertshaw among others), and demanding change. I’d originally thought of this as the lit review for the VAL research piece, but it just kept growing and complexifying in my head and my article notes, so I’m now thinking it’s too big and diffuse for that.

And after that, and after the VAL piece? I think it may be time to start making a whole whackton of retraction and institutional research-integrity investigation requests. Let me tell you, that’s not how I imagined my mid-career research and praxis agenda going, but I don’t know how else to get this unethical data abuse—and unethical data abuse is exactly what it is—to stop.

Libraries: the next academic bossware?

OCLC Research is going all-in on “research analytics” (see also). Rah-rah libraries, or something.

I chuckled ruefully at the discussion of libraries whose constituencies can’t see them as anything other than book pushers. Been there, done that, burned a whole stack of T-shirts—dealing with it right now at the day job, actually, and it’s as demoralizing and demotivating as ever.

But no, my much more visceral reaction was to something far more TattleTape-relevant. I’ll be blunt: the point of all this bibliometrics stuff at the institution level is judging and punishing people. Oh, sure, it’s never couched in those terms, but is anybody at all fooled? Really? It’s about denying tenure and promotion, axing “unproductive” departments, and making sure every single researcher knows that the bosses are breathing down their neck.

It’s bossware. It’s all bossware. Academic bossware. And libraries—some of them, anyway—are piling right in, unthinking and uncaring. Parallels with learning analytics will be left as an exercise for the reader… but I’ll spoil it: the main commonality is the so-far ubiquitous and inevitable use of the analysis techniques to judge and punish. Knowing what the “successful” student or researcher does just leads to judging and punishing the “unsuccessful” one. Understanding success does not come from data; it comes from understanding, which is not a quantitative enterprise.

Down with learning analytics. Down with research analytics. They’re too easy to wield as bossware, as weapons.

So that happened

I appeared in local news yesterday to talk about data privacy. Good conversation leading to a nice piece, really like the photo and quote they chose to highlight (I may be ugly as a mud fence but I do have nice hands), 10/10 would pundit again.

As they were packing up their gear, the interviewer asked whether I was surprised at having to use my knowledge of privacy technology this way. “Well, I’m a librarian,” I said, “and we have a pretty long history of navigating the authorities wanting to know what people read and watch to get them in trouble for it, so no, I’m not surprised exactly—just appalled.”

“So you think people are still safe in libraries?” was the (perfectly natural!) follow-up question.

I winced. Like, actual physical flinch. “I’m actually fighting my own profession on this one,” I said ruefully. “There’s a lot of data FOMO happening, and I don’t think it’s right, so actually that’s my research and publication focus right now.” Then I told them about UW-Madison’s twenty years of my circulation records.

I hate so much that this is the kind of answer I have to give when asked about privacy in libraries. Until librarianship cleans its house, though, it’s the only honest answer I have.

Foxes and henhouses

Now that Data Doubles is winding down and the linked-data-ethics piece I worked on with Ruth Kitchin-Tillman is heading for publication, I find myself a tiny bit freer to pursue my own projects. (That said, it’s an accreditation cycle for my shop, RADD needs a rebranding and a lot of repair work before I can reopen it, I’m reviewing another round of grant applications, and I’m on the usual raft of task forces, so “freer” is… relative.) What I’m starting to tackle is a review of data used and ethics-related practices implemented (or not) in the portion of “Value Agenda for Libraries” research that specifically addresses individual student achievement vis-à-vis library use.

(VAL is a beast, if you look at the whole thing. The chunk of VAL I’m looking at is far from all of VAL. Figuring out inclusion criteria for the studies I’m studying got to be an adventure, I can tell you. I’m not sure I got it right, but I am sure I can clearly explain what I did, so that will have to do. I will of course post citation info for included/excluded studies in public so others can try slicing them different ways if they want.)

It probably won’t surprise anyone that I’m doing this because I think this particular chunk of VAL as it is actually implemented is lots of different kinds of unethical—library-privacy unethical, human-subjects research unethical, contextual-integrity unethical, duty-of-care unethical, exploiting-power-relations unethical. (I also want to talk to somebody more knowledgeable than I am about FERPA, especially FERPA-as-implemented-in-practice. If that’s you, I’d appreciate an email.) I am absolutely, positively one of the Big Meanies that Kirsten Kinsley complains about, and I am exactly zero percent sorry about that. Of course I will elaborate on my reasoning in the study, once I’ve done it and start writing it up; it’s a bit lengthy and involuted for a blog post.

The thing is, though, it’s hard to have productive arguments about this without a reasonably clear idea of what’s actually happening with data about individual students when libraries do this flavor of VAL research. So that’s what I’m trying to build, with nods to Briney and Asher/Robertshaw for preceding me.

So yeah, that’s where I’m coming from. I bring it up here because I saw the other day that ACRL is doing a webinar or something about doing student-centric VAL research ethically… in part run by the very people who invented VAL originally, building what I strongly believe to be extremely poor ethics into its foundations. I strongly suspect this learning opportunity will be full of motivated reasoning from people invested in not giving ground on the purported necessity of VAL-style research, not to mention as long on empty performative nods to the necessity of ethics as it will be short on actually actionable advice.

I’d love to be wrong, but I doubt I will be. Do not try to learn ethics from foxes defending their access to the student-data warehouse henhouse, is my feeling on this one.

I can’t even guess at this point when I’ll have a finished piece to send around to journals, though I’d be pleasantly shocked if it happened before 2023. At present I’m still building the eligible-study corpus, whose Zotero folder at this point includes roughly a hundred articles, theses, and conference presentations. (“Project” rather than “publication” has to be my unit of study here, because a number of projects have generated multiple published/presented outputs.) Wish me luck!