The Semantic Web Rabbit Hole

08 Sep 2022

resources structured data semantic web schema.org Jekyll themes

So, I was working on implementing the Book schema and had gotten pages and posts working with test data I made up off the top of my head. I had put in a hook for book lists and decided that this would be a good place to test out Jekyll collections.

Now, I could have fabricated more data, but since the inspiration for wanting book lists in the first place was sitting right there, there was no need to imagine a list. I'd take a few books from that page, create them as collection records, test out the whole collections setup, decide how to template and style a list entry (I was already thinking about how to activate and deactivate fields for a nice listing format), and Bob's your uncle! Of course, life is not so simple.

The second book (which, in retrospect, I maybe should have skipped?) was one of the more complicated types you're likely to encounter:

there are multiple editions
it is the translation of another book
many elements have multilingual versions
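To make those three complications concrete, here is a minimal sketch in Python that builds schema.org JSON-LD for such a book. It assumes, purely for illustration, a Chinese original and an English translation with two editions; every title, name, and ISBN is a hypothetical placeholder, and this is not necessarily how the theme models it. The schema.org properties used (`translationOfWork`, `workExample`, `bookEdition`, `isbn`, `alternateName`, `inLanguage`) are real, and the multilingual strings use JSON-LD language-tagged value objects (`@value` plus `@language`).

```python
import json

# All titles, names, and ISBNs below are hypothetical placeholders, not real records.


def tagged(value: str, lang: str) -> dict:
    """Return a JSON-LD value object that tags a string with a language code."""
    return {"@value": value, "@language": lang}


def build_book_graph() -> dict:
    """Describe an original work, its translation, and the translation's editions."""
    author = {
        "@type": "Person",
        # The same name in two languages; a consumer can choose by @language.
        "name": [tagged("Author Name (zh placeholder)", "zh"),
                 tagged("Author Name (en placeholder)", "en")],
    }
    original = {
        "@type": "Book",
        "@id": "#original-work",
        "name": tagged("Original Title (zh placeholder)", "zh"),
        "alternateName": tagged("Translated Title (en placeholder)", "en"),
        "inLanguage": "zh",
        "author": author,
    }
    translation = {
        "@type": "Book",
        "@id": "#english-translation",
        "name": tagged("Translated Title (en placeholder)", "en"),
        "alternateName": tagged("Original Title (zh placeholder)", "zh"),
        "inLanguage": "en",
        "author": author,
        # Points back at the work this one was translated from.
        "translationOfWork": {"@id": "#original-work"},
        # Each concrete edition is listed as an example of this work.
        "workExample": [
            {"@type": "Book", "bookEdition": "First edition", "isbn": "placeholder-isbn-1"},
            {"@type": "Book", "bookEdition": "Second edition", "isbn": "placeholder-isbn-2"},
        ],
    }
    return {"@context": "https://schema.org", "@graph": [original, translation]}


if __name__ == "__main__":
    print(json.dumps(build_book_graph(), ensure_ascii=False, indent=2))
```

Putting the other-language title in `alternateName` on both records is one way to show title and translated title on each listing without privileging either language; a Jekyll template could then emit the serialized result inside a `<script type="application/ld+json">` tag.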

But, you might say, good test case! Nothing like "real world" data to test your code! So I started making adjustments. I wanted both the title (and author name) and the translated title to show up in the canonical listing of the English edition as well as in that of the Chinese edition, without tainting things with more anglophone chauvinism than necessary just because the theme's author is an anglophone.

Add more fields. Define them inside the schema and just for the page. That took care of the book part, per se, but then I had all this publication and publisher information that I wanted to retain. You can encode it: you can define a publisher as an Organization, and then its address can be a PostalAddress, and, and, and... but do you need to? Examples on schema.org use WorldCat and VIAF as resources. If I could point to an official record somewhere with all the publisher information, or even all the author information, then I wouldn't have to store or code it myself: go there if you're curious or need to know for whatever reason.
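The trade-off can be sketched as two ways of filling the schema.org `publisher` property. Option A nests an `Organization` with a `PostalAddress` (the "and, and, and" route); option B keeps only the name and uses `sameAs` to point at an authority record, the kind of stable URL WorldCat or VIAF provides. This is a minimal sketch: the publisher name, address, and `example.org` URLs are hypothetical placeholders and do not reflect the real record URL formats of either service.

```python
import json

# Names, places, and URLs below are hypothetical placeholders.


def publisher_inline() -> dict:
    """Option A: encode the publisher's details yourself."""
    return {
        "@type": "Organization",
        "name": "Example Publishing House",
        "address": {
            "@type": "PostalAddress",
            "addressLocality": "Example City",
            "addressCountry": "Example Country",
        },
    }


def publisher_by_reference(authority_url: str) -> dict:
    """Option B: name the publisher and defer the details to an authority record."""
    return {
        "@type": "Organization",
        "name": "Example Publishing House",
        "sameAs": authority_url,
    }


edition = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "Example Title",
    # sameAs is defined on Thing, so the Book itself can point at a catalogue record too.
    "sameAs": "https://example.org/catalogue/book-record",
    "publisher": publisher_by_reference("https://example.org/authority/publisher-record"),
}

if __name__ == "__main__":
    print(json.dumps(edition, indent=2))
```

Option B only pays off if the referenced record actually holds the details you would otherwise encode, which is exactly the question the authority files raise.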

VIAF is hideously confusing at first glance. It was quick to narrow down from "author with this name" to "author of this book with this name". The record had tonnes of other entries that I believe come from librarians at various institutions doing data entry ever so slightly differently, but not so differently that the authority's software can't collate them together (without winnowing them down to a canonical one). But it only had one of her five books.

WorldCat, on the other hand, looked to have all of her books, but very few of the records had cover images, particularly for the non-English editions, which made me less certain of my interpretation. Also, in WorldCat I was unable to winnow out the other authors with the same name, so... what to do???

I had hoped that the "official" sources of the world would save me from formally formatting a bunch of data and let me just point to their references, but I have more data than they do, and I am begrudgingly willing to encode it. If I go to the trouble, can I contribute it to the world's formatted data? VIAF and WorldCat are library-based places. Of course, you don't want random mischievous users polluting your rigorously formatted formal data pool, but you still need help, no? I found a document that mentions that VIAF takes "contributions" from Wikidata, which does accept user contributions... should I?

The lazy (correct??) voice in my soul says that I could retain this extraneous information, unstructured, in the description of my book data and call it a day (or even not bother retaining it, who cares?). Does the publisher even exist anymore? Would/will anyone ever need to make use of the information? These are some of the thoughts that plague an info-pack rat.
