semantic web/ part 4

October 2, 2017 / pataforest

Is Library ready for a solution? Library is not (yet).

I want you to think about your library’s ILS (Integrated Library System: if you work in a library, an ILS is the thing you minimized to read this blog on a work computer. If you’ve never worked in a library, it’s the software platform a library uses to do almost everything). Regardless of which one your library uses, odds are you aren’t particularly fond of it. ILSes were designed to simulate the helpless disorientation of a Kafka novel while also honoring the Rube Goldberg ideal that complexity is far more important than functionality. The ILS has also been adopted as the official guide to existential despair.

The reason they’re like this basically comes down to incest. As I mentioned in the previous post, librarians have been trying to do this data management thing all on their own. The MARC formats might have been revolutionary in the 1960s, but we’re still using them! It’s like we’re plotting a course to Mars on an abacus: even if it’s possible, why the hell would you want to do it?

As it stands, library data can only be used by libraries. In order to manage all that data, companies develop technologies that can only be used by libraries to manage library data. And since these ILS platforms are enterprise data management systems, each library can really only access and use its own data. This is bonkers! It’s going to take more than a flashy quote from Neil Gaiman to remind people why libraries exist post-Google.

Yeah Neil, but what if Google could BRING YOU A LIBRARIAN??

Now, thanks to one apoplectic battle cry and some real talk from fed-up libraries, the library is finally deciding to trade in the abacus… kind of. In 2012 the Library of Congress teamed up with Zepheira (a legit player from the web/data world with serious linked-data expertise) to start developing BIBFRAME (I’m not being cheeky this time: the word actually is in all caps, which is a bit arrogant if you ask me). BIBFRAME is what we’ll (allegedly) be using instead of MARC21 in the near future.

Without getting too technical, BIBFRAME is going to usher the library into the Golden Age of Linked Data… we’re just hoping that age isn’t over before the library makes it to the ball. BIBFRAME is certainly a step in the right direction. It incorporates a lot of linked data elements that will make library data discoverable and usable to the outside world, but at the same time, developers are trying to make sure that the transition from MARC to BIBFRAME isn’t too hard on our delicate, dust-jacketed sensibilities. I truly do have high hopes for this project, especially given some of the more recent, practical steps being taken. But honestly, if we’re still using clunky ILSes ten years down the road, and not managing our data the way all other data is managed, our data incest problem will go on for generations… which is gross and frowned upon in most cultures.
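For the curious, here’s a rough idea of what “linked data elements” means in practice. Where MARC keeps one flat record, BIBFRAME 2.0 splits a description into linked resources, most notably a Work (the abstract intellectual content) and an Instance (a particular published form of it). The sketch below is a minimal illustration under loud assumptions: the record is a hypothetical, heavily simplified dict keyed by MARC tag (not real MARC encoding), the example.org URIs and the `marc_to_bibframe` function are made up, and only the `bf:` class and property names come from the BIBFRAME 2.0 vocabulary. The Library of Congress’s actual conversion tooling handles vastly more fields and edge cases.

```python
# Minimal sketch: turn a simplified MARC-style record into BIBFRAME-style triples.
# The record layout, example.org URIs and function name are hypothetical.

BF = "http://id.loc.gov/ontologies/bibframe/"  # BIBFRAME 2.0 namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# tag -> {subfield code: value}. Real MARC 21 also has indicators,
# repeatable fields and a binary (ISO 2709) or XML encoding.
record = {
    "245": {"a": "Semantic web basics"},  # Title Statement, $a = title
}

def marc_to_bibframe(rec, base="http://example.org/"):
    """Return (subject, predicate, object) triples for one record."""
    work = base + "work/1"          # the abstract intellectual work
    instance = base + "instance/1"  # one published embodiment of that work
    title = "_:title1"              # blank node standing in for a bf:Title
    return [
        (work, RDF_TYPE, BF + "Work"),
        (instance, RDF_TYPE, BF + "Instance"),
        (instance, BF + "instanceOf", work),  # the link MARC never made explicit
        (work, BF + "title", title),
        (title, RDF_TYPE, BF + "Title"),
        (title, BF + "mainTitle", rec["245"]["a"]),
    ]

for triple in marc_to_bibframe(record):
    print(triple)
```

The point isn’t the code so much as the shape of the output: every fact becomes a small statement that points at other things by URI, which is what lets data outside the library link to it.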

At the end of the day, there’s only one thing that will determine how useful the library will be to its users: you! If we, as library professionals, are engaged enough to learn about, evaluate and advocate for new opportunities on behalf of our patrons, developments like this Semantic Web thing will gain the internal momentum they need to actually work. If you’re tired of muttering profanities at your ILS, if you’re feeling the frustration of your patrons as they slog through OPAC search results, or if you think it might be neat to search for the latest bestseller on Google and find the nearest library that has it in stock, then go learn more! Talk to your coworkers and your patrons! Decide on a future, then bug the hell out of the people who make decisions until it happens!

If you want to see your library begin taking part in some Semantic Web-ery right away, the easiest first step is to register your library with the Library.link Network.

semantic web/ part 3

October 1, 2017 (updated October 2, 2017) / pataforest

The problematic solution to the solution: catalogers

If you happen to strike up a cocktail conversation with a cataloging librarian, it’s going to feel a lot like talking about music with that indie kid from high school. “Aw man, can you believe they used [obscure band] for the [indie film] soundtrack? I was listening to those guys YEARS before anyone had heard of them! Now Jeff is saying the football team is gonna use it at the pep rally! I hate everything, darkness, darkness etc.”

But here’s the thing: that kid didn’t make any of the music he’s claiming credit for. Catalogers, though? They’re the real deal. Actual rock stars, but the kind who won’t leave a couple of goats with red racing stripes painted on their sides lapping up whiskey from the hotel bathtub.

You remember how the W3C started trying to figure out how to get machines to read data waaaay back in 1999? Librarians beat them there by over THIRTY YEARS.

I’m sure there was a librarian waiting to welcome Neil Armstrong to the moon

All the records in your library’s catalog are encoded in the MARC21 format, right? (right) And what does MARC stand for? MAchine Readable Cataloging! It was developed by the brilliant Henriette Avram in 1966, and while this kind of “machine reading” isn’t quite the same as what we need for the Semantic Web, it’s clear proof that librarians were gnawing on this pickle loooong before anyone else knew the Snack Shack around the corner was even selling pickles.
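What made MARC “machine readable” was that every piece of a record sits in a numbered field with lettered subfields, so a program knows that field 245 is the title without having to understand English. The sketch below parses one field written in the line-based “mnemonic” text form that tools like MarcEdit display (for example `=245  10$aTitle /$cAuthor.`). It’s an illustration only: real records are stored as ISO 2709 or MARCXML, this handles data fields but not control fields (001–009), and it ignores escaped `$` characters. The function name and the sample title are hypothetical; the three tag names are the MARC 21 field names.

```python
# Minimal sketch: parse one MARC data field written in mnemonic text form,
# e.g. "=245  10$aTitle :$bsubtitle /$cAuthor." (function name is hypothetical).

# A few real MARC 21 bibliographic tags and their names.
FIELD_NAMES = {
    "020": "International Standard Book Number",
    "100": "Main Entry--Personal Name",
    "245": "Title Statement",
}

def parse_mnemonic_field(line):
    """Split a mnemonic line into (tag, indicators, [(code, value), ...])."""
    if not line.startswith("="):
        raise ValueError("expected a line starting with '='")
    tag = line[1:4]          # three-digit field tag, e.g. "245"
    body = line[6:]          # skip "=TAG" and the two separator spaces
    indicators = body[:2]    # two one-character indicators
    # The rest is a run of "$<code><value>" subfields.
    parts = body[2:].split("$")[1:]
    subfields = [(part[0], part[1:]) for part in parts if part]
    return tag, indicators, subfields

tag, indicators, subfields = parse_mnemonic_field(
    "=245  10$aSemantic web basics /$cby A. Librarian."
)
print(FIELD_NAMES[tag], dict(subfields)["a"])
```

That numeric tagging is exactly the kind of “machine reading” that was revolutionary in 1966. The catch is that only software built to speak MARC can make sense of it.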

Look at it this way: for as long as people have been recording information, our proto-librarian ancestors have been classifying, cataloging and archiving that information for other people to use. Trust me, we know what we’re doing.

And what is the Internet but a feral wilderness of undomesticated information? Who better to tame these stampeding herds than a librarian? Alas, the guys in IT were too busy looking for mugwumps to bother with a librarian, and librarians weren’t about to let a bunch of rowdy youngsters come stomping into the hallowed ground of their sacrosanct catalogs, tracking grease from their parents’ garage and still reeking of pizza and RC Cola.

[side rant/ we don’t like to talk about it much, but we all know the library spent a dark few decades in a Medieval period of witches-won’t-sink-but-they’ll-sure-burn thinking. Somehow we lost sight of the ENTIRE POINT OF LIBRARIES and thought it was our job to scare the shit out of anyone brave enough to set foot in a library.

Things are better now: most libraries are amazing, but it took us a while to come around.]

So what should have been a natural partnership never worked out. All the skill and experience of catalogers were effectively wasted without the practical expertise of the computer gurus. Even now, the two fields are far too separate from one another, but there are a few important overlaps. As luck would have it, one of those cooperative overlaps is a shared vision for the Semantic Web. Next up: BIBFRAME, Zepheira and other tales of not-quite-linked-data.

semantic web/ part 2

October 1, 2017 (updated October 2, 2017) / pataforest

The solution: teach the internet to use the internet (RDF and linked data)

So, if a computer can’t do any semanticry with HTML, how can we package information in a way that allows for more complex data manipulation? The ominously named W3C (World Wide Web Consortium: the shadowy arbiters of the web, founded by Tim Berners-Lee), along with some other folks, has been hard at work on this problem for some time.

The solution?

Smash it all to smithereens!

Then painstakingly stitch all the trillions of trillions of pieces together into one massive and grotesque glob!

Problem solved!

As uninspiring as this may seem, it’s really the only way to get all the crap out of the silos and make it usable. And unlike smashing a watermelon, there’s some serious ingenuity and elegance behind this deconstruction. After all, this is coming from people who make Internet.

Step one: the pieces

Way back in 1999, when the phrase “you’ve got mail” could still be heard without evoking ironic nostalgia, the W3C proposed a new vision for information on the web: RDF. The Resource Description Framework is a conceptual model (that is, it’s not a programming language, a type of software, a file format or a platform; it’s more of a “hey, let’s try to make everything look like this”) that expresses internet stuff in relation to other internet stuff.

Each little piece of information is given a URI (Uniform Resource Identifier), which is kind of like your Social Security number in that it’s unique to you and it’s how the IRS knows to come after you, instead of some other chump with your same name. Without getting too far into the weeds, we can say that these URIs are the words that make up sentences that a machine can read. Then we give all of the sentences the exact same structure with “triples”: three words (usually URIs, though the object can also be a plain value like a name or a date) ordered as subject–predicate–object to form a meaningful statement. For example, “Mary eats mutton” identifies the subject (Mary) and the object (mutton, formerly her little lamb), while the predicate (eats) describes how they relate to each other (lunch). Because the meaning of the statement is built into its structure, every sentence built that way can be read the same way. If you combine enough of these statements, you get the whole story.
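To see what a triple actually looks like to a machine, here’s a minimal Python sketch of the Mary example. Since RDF is a model rather than a file format, the same statements can be written out in several W3C syntaxes (Turtle, RDF/XML, JSON-LD, N-Triples); this sketch prints them as N-Triples, the simplest one, where each line is `<subject> <predicate> <object> .`. Assumptions, loudly: the example.org URIs and the `eats` predicate are made up for illustration, the `URI` and `Triple` classes and `to_ntriples` function are hypothetical helpers rather than any library’s API, and the literal escaping only covers backslashes, quotes and newlines. The one real vocabulary term is FOAF’s `name` property.

```python
from typing import NamedTuple, Union

class URI(str):
    """A string that names a resource (as opposed to a plain text value)."""

class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, str]  # either another resource or a literal value

EX = "http://example.org/"                          # hypothetical namespace
FOAF_NAME = URI("http://xmlns.com/foaf/0.1/name")   # real FOAF "name" property

mary = URI(EX + "people/mary")
mutton = URI(EX + "food/mutton")

statements = [
    Triple(mary, URI(EX + "vocab/eats"), mutton),   # "Mary eats mutton"
    Triple(mary, FOAF_NAME, "Mary"),                # object is a literal, not a URI
]

def to_ntriples(triples):
    """Serialize triples as N-Triples, one '<s> <p> <o> .' statement per line."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, URI):
            obj = f"<{o}>"
        else:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

print(to_ntriples(statements))
```

Notice that “Mary” the person and “Mary” the name are different things: the person gets a URI that anyone can point to, while the name is just a label attached to it.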

Step two: The glob

Putting all those pieces together is tricky. Ideally, everyone would just stop building silos and adopt an RDF-based metadata schema (this fancy-sounding phrase is what they call the set of rules for organizing and encoding data). Unfortunately, it’s never that straightforward.

First off, there’s too much money tied up in each of these silos, and people flip out when you start screwing with their money. Even setting aside the open data problem, it’s really hard to get people to change the way they do things for the sake of some nebulous greater good, especially when it would take a ton of people a ton of time and money to do it.

But for the sake of explanation, let’s say everyone decided to buy in. This new web of linked data would make it possible to search for a thing and find that thing. I can’t even begin to express how amazing this would be.
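Here’s a toy version of why buy-in matters so much. If two silos describe the same book using the same URI, stitching them into one glob is just throwing all the triples into one pile, and a question that neither silo could answer alone (“which libraries have a book with this title?”) becomes a simple lookup. Everything here is hypothetical: the example.org URIs, the `title` and `holds` predicates, the library names and the helper functions are made up for illustration and aren’t any real catalog or query language.

```python
# Minimal sketch: two "silos" that share URIs can be merged by set union,
# then queried across the combined graph. All URIs and data are hypothetical.

EX = "http://example.org/"
BOOK = EX + "book/moby-dick"

# Silo 1: a publisher's catalog of titles.
publisher_data = {
    (BOOK, EX + "title", "Moby-Dick"),
    (BOOK, EX + "author", EX + "person/herman-melville"),
}

# Silo 2: libraries recording what they hold, using the *same* book URI.
library_data = {
    (EX + "library/main-street", EX + "holds", BOOK),
    (EX + "library/riverside", EX + "holds", BOOK),
}

# Stitching the pieces into one glob: triples are just members of a set.
graph = publisher_data | library_data

def subjects(graph, predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for (s, p, o) in graph if p == predicate and o == obj}

def libraries_holding_title(graph, title):
    """Find libraries that hold any book with the given title."""
    books = subjects(graph, EX + "title", title)
    return {lib for book in books for lib in subjects(graph, EX + "holds", book)}

print(sorted(libraries_holding_title(graph, "Moby-Dick")))
```

The whole trick rests on both silos agreeing on that one book URI. If the library had minted its own identifier instead, the join would find nothing until someone added a statement saying the two identifiers mean the same thing.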

A really great article, “Ending the Invisible Library | Linked Data,” written by Matt Enis for Library Journal in 2015, opens with this example:

“To explain the utility of semantic search and linked data, Jeff Penka, director of channel and product development for information management solutions provider Zepheira, uses a simple exercise. Type “Chevy Chase” into Google’s search box, and in addition to a list of links, a panel appears on the right of the screen, displaying photos of the actor, a short bio, date of birth, height, full name, spouses and children, and a short list of movies and TV shows in which he has starred. Continue typing the letters “ma” into the search box, and the panel instantly changes, showing images, maps, current weather, and other basic information regarding the town of Chevy Chase, MD.”

Imagine if a patron could get library/holdings information this easily! But also, think of the benefit to academic libraries and to research in general! Linked data has huge potential to revolutionize every field of research. I mean, wouldn’t it be nice to have all cancer research at the fingertips of all cancer researchers? Yes. Yes, it would be nice.

The 'pataForest

the librarian's guide to the 'pataphorical wilderness of the internets
