Never trust vendor defaults, librarians on power trips, or data marts

So, I’m teaching a new data-management course this fall, crosslisted between the MA/LIS and our new MS in Information. It’s not my old research-data management course (though I do plan to bring RDM into it), but a much more enterprise-y thing based on the DAMA Data Management Body of Knowledge book.

This book. Y’all. If you follow me on Twitter you might have seen me howling about it. It is so bad. So, so bad. Outright falsehoods, misinformation, decontextualized bafflegab, horrifically poor editing. I’m still going to use it—I’m just putting a bug bounty on it, in the form of extra credit when a student finds a howler and can back up their correction with evidence. I’m guessing they’ll all earn A-pluses. That’s how bad this book is.

Still. It did teach me about data marts and star schemas, enough to know that that’s what I was looking at when I read some Alma Analytics documentation. For those who haven’t had the dubious pleasure of reading the DAMA DMBOK book, a “data mart” is a reformulation (along with a process for data updating) of an existing database for faster, more convenient, less friction-filled querying by more people at a lower level of tech savviness. Properly implemented, a data mart can also help enforce confidentiality and privacy controls; sensitive data doesn’t have to be copied over to the data mart, after all, and often isn’t because data marts are generally intended to fuel aggregate (rather than individual) analytics.
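For the code-minded, here is a minimal sketch of that idea, assuming a toy SQLite circulation database. Every table and column name (`patrons`, `items`, `loans`, `loans_by_month`) and the `build_mart` helper are invented for illustration; none of this is Alma's actual schema or anybody's real data-mart design. The point is only that the mart gets monthly loan counts per collection and never receives a patron identifier.

```python
import sqlite3

# Hypothetical operational (source) database. All names are made up.
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE patrons (patron_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE items   (item_id INTEGER PRIMARY KEY, collection TEXT);
    CREATE TABLE loans   (loan_id INTEGER PRIMARY KEY, patron_id INTEGER,
                          item_id INTEGER, loaned_on TEXT);
    INSERT INTO patrons VALUES (1, 'A. Reader', 'a@example.edu'),
                               (2, 'B. Borrower', 'b@example.edu');
    INSERT INTO items VALUES (10, 'general'), (11, 'special');
    INSERT INTO loans VALUES (1, 1, 10, '2019-09-03'),
                             (2, 2, 10, '2019-09-17'),
                             (3, 1, 11, '2019-10-01');
""")


def build_mart(src):
    """Copy aggregate loan counts into a separate mart database.

    Only month, collection, and a count cross over; the patrons table
    and loans.patron_id are never read into the mart.
    """
    mart = sqlite3.connect(":memory:")
    mart.execute(
        "CREATE TABLE loans_by_month "
        "(month TEXT, collection TEXT, loan_count INTEGER)"
    )
    rows = src.execute("""
        SELECT strftime('%Y-%m', l.loaned_on), i.collection, COUNT(*)
        FROM loans AS l
        JOIN items AS i ON i.item_id = l.item_id
        GROUP BY strftime('%Y-%m', l.loaned_on), i.collection
    """).fetchall()
    mart.executemany("INSERT INTO loans_by_month VALUES (?, ?, ?)", rows)
    mart.commit()
    return mart


mart = build_mart(source)
mart_rows = mart.execute(
    "SELECT month, collection, loan_count FROM loans_by_month "
    "ORDER BY month, collection"
).fetchall()
mart_columns = [row[1] for row in mart.execute("PRAGMA table_info(loans_by_month)")]
print(mart_rows)
print(mart_columns)
```

One caveat: aggregation alone is not a privacy guarantee. A count of 1 for a tiny collection in a quiet month can still point at a single person, so a real design would want to think about small cells too.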

So Ex Libris, the company that sells the Alma ILS and its Alma Analytics data mart, could have aligned with library privacy ethics while still making a large swathe of library-internal reporting easier by simply not copying patron-specific data into their data mart. Did they? Did they hell. Straight off their website, with apologies for the total lack of accessibility (I did put the money quote in the alt text):

[Image: List of queryable Alma Analytics entities, including "Users: Provides detailed information on users (for example, details, contact information, and roles)"]

Let’s review! Data marts, like their larger relatives data warehouses, are intended to give lots more people an easier way to access and query data! The practical upshot here is that unless a library is super-extra-careful about data-mart access (which kicks against the whole point of data marts, especially in the Assessment Age), lots more people can compromise patrons’ information privacy. How awesome is that?

(Not awesome. It is not awesome at all.)

Do I trust UW-Madison GLS to have determined and implemented appropriate access controls on the Alma Analytics data mart? Hollow laugh. This is the same GLS that doesn’t even have a privacy policy, has analytics in its strategic plan, and kept nearly 20 years of my circulation records, contravening their very own records schedule—a schedule that added a whole new retention period just to play with Alma Analytics! No, I don’t trust them.

More specifically, I don’t trust the (presumably) upper-level administrators who built that records schedule and contracted to buy Alma Analytics. (AUL for IT Lee Konrad would likely be where the buck stops around this in GLS. I don’t know about CUWL.) I do have a fair amount of trust in the Library Technology Group that directly administers Alma, however. Why? Because of the proxy-server records from my data dump, which contained only six months’ worth of data. These records are not represented directly in the records schedule, and many library administrators don’t know much about them, so my bet (which may be wrong) is that LTG itself decided the retention period. Six months wouldn’t be my preference—shorter is better!—but it’s certainly reasonable. This hints pretty strongly that LTG left to its own devices is likely to do the right thing by patrons.

Do I trust Ex Libris? Hollow laugh the second. Patrons aren’t their customers; libraries are, and this is the Assessment Age. Ex Libris built a data mart because they thought libraries would want it. It’s likely some librarians asked for it, even! (Probably not by name, more by functionality.) Do I trust every librarian ever? Hollow laugh the third. I’ve heard worse stories than this from friends of mine who program ILSes, such as a development request to let the ILS store images of patrons’ identification documents such as driver’s licenses and passports. (Do not do this, for pity’s sake. It is a standing invitation to identity theft, including by rogue library insiders.) Be ye wary of librarians who have replaced their ethics with power trips.

Does Alma have to be this level of privacy disaster? Absolutely not, if libraries change the utterly disastrous defaults! The secret (well, it’s not really secret; Library Twitter found it for me readily enough) is that the library has to change its user-management defaults to anonymize loans once they are complete. This also prevents Alma Analytics from getting its grubby paws on identified patron information-use data, a consummation devoutly to be wished.
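To make "anonymize loans once they are complete" concrete, here is a rough sketch of what such a job amounts to, again against a toy SQLite table. This is emphatically not Alma's implementation or API; in Alma it is a configuration setting, and the `loans` table, its columns, and the `anonymize_completed_loans` function below are all assumptions made up for illustration.

```python
import sqlite3


def anonymize_completed_loans(conn):
    """Blank the patron link on every returned loan; return rows changed.

    Assumes a hypothetical loans table where returned_on is NULL while
    the item is still checked out.
    """
    cur = conn.execute(
        "UPDATE loans SET patron_id = NULL "
        "WHERE returned_on IS NOT NULL AND patron_id IS NOT NULL"
    )
    conn.commit()
    return cur.rowcount


# Toy data: one returned loan and one still open, same (made-up) patron.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE loans (loan_id INTEGER PRIMARY KEY, patron_id INTEGER,
                        item_id INTEGER, returned_on TEXT);
    INSERT INTO loans VALUES (1, 42, 10, '2019-09-20'),
                             (2, 42, 11, NULL);
""")

changed = anonymize_completed_loans(db)
remaining = db.execute(
    "SELECT loan_id, patron_id FROM loans ORDER BY loan_id"
).fetchall()
print(changed, remaining)
```

The open loan keeps its patron link, because the library still needs to know who has the book. The completed loan survives as a countable event with nobody attached to it, so anything downstream that copies the table has no identified history to copy.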

Never trust vendor-set defaults, basically, but hold out hope that changing them is worthwhile!

Oh, and the special-collections special case for circulation-record retention that I mentioned earlier in this series? Per Alma’s documentation, “The [patron anonymization] rules are defineable per library/location.” So GLS absolutely does not need to retain identified patron data from non-special-collections transactions just to protect special collections. The rationalization in the records schedule is purest mendacity.