Federating Annotations Using Digital Object Identifiers (DOIs)

By judell | 22 June, 2017

Scientific journals come and go, but the scientific record is permanent, and its annotation layer should be too. New Hypothesis support for DOIs (digital object identifiers) helps ensure a robust connection between articles and annotations. Let’s explore how that works.

First, here’s a magic trick you might not realize Hypothesis has up its sleeve. Consider this PLOS One article. Annotate it in one tab, then open a second tab and annotate the PDF version there. You’ll see both annotations in both tabs. How is that possible?

The answer is that when scholarly publishers provide HTML versions of articles, they typically include metadata that points to PDF versions of the same articles. Here’s one way that happens:

<meta name="citation_pdf_url" content="http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0168597&type=printable">

Hypothesis remembers the correspondence between the HTML and PDF versions, and coalesces annotations across them.
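To make the metadata lookup concrete, here is a minimal Python sketch of how a tool might read the PDF URL out of a page's head. It is an illustration only, not the Hypothesis client's actual code; the names MetaTagCollector and find_pdf_url are invented for this example.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects <meta name=... content=...> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name, content = attributes.get("name"), attributes.get("content")
        if name and content:
            # Keep the first value seen for each meta name.
            self.meta.setdefault(name.lower(), content)


def find_pdf_url(html):
    """Return the citation_pdf_url value, or None if the page has none."""
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.meta.get("citation_pdf_url")


page = (
    '<html><head><meta name="citation_pdf_url" '
    'content="http://journals.plos.org/plosone/article/file'
    '?id=10.1371/journal.pone.0168597&type=printable"></head></html>'
)
print(find_pdf_url(page))
```

A page without the tag simply yields None, in which case there is no HTML-to-PDF link to record.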

Here’s an even more magical trick. Download that PDF to your file system, load it into a third tab, and annotate again. Now you’ll see all three annotations in all three tabs!

Since Hypothesis doesn’t know that the local copy of the PDF came from http://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0168597&type=printable, or that it’s related to http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0168597, how is that possible?

The answer is that the PDF standard defines a unique identifier, or “fingerprint,” that authoring tools encode into the PDFs they create. When you use the Hypothesis client to annotate a web-hosted PDF, it captures the fingerprint and sends it to the server. When the client then loads a local copy of the PDF, it asks the server for all annotations bound to the fingerprint, no matter where the file was hosted. Or wasn’t hosted, actually, since you can also annotate another local copy!
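As a rough illustration of what "capturing the fingerprint" could look like, the sketch below pulls the first element of a PDF's /ID array and formats it as a urn:x-pdf: identifier. It rests on two assumptions: that the ID is written as a hex string, and that it sits in uncompressed bytes where a regular expression can find it. A real PDF library parses the file properly and has a fallback for files with no ID, so treat the function name fingerprint_urn and the regex as hypothetical.

```python
import re

# Matches the first element of an /ID array written as a hex string,
# for example: /ID [<a08d...> <ffff...>]
ID_PATTERN = re.compile(rb"/ID\s*\[\s*<([0-9A-Fa-f\s]+)>")


def fingerprint_urn(pdf_bytes):
    """Return a urn:x-pdf: identifier for the PDF, or None if no ID is found."""
    match = ID_PATTERN.search(pdf_bytes)
    if match is None:
        return None
    # Hex strings may contain whitespace; drop it and normalize the case.
    hex_id = b"".join(match.group(1).split()).decode("ascii").lower()
    return "urn:x-pdf:" + hex_id


# A made-up trailer fragment, not taken from a real article.
sample = b"trailer\n<< /Size 10 /ID [<0123456789ABCDEF0123456789ABCDEF> <FFFF>] >>"
print(fingerprint_urn(sample))
```

Because the identifier travels inside the file, the result is the same whether the bytes came from a publisher's server or from a local disk.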

How do annotations made on the HTML version coalesce with those made on any members of a family of PDF copies? We’ve already seen that answer: The HTML version provides a PDF URL in its metadata. That leads Hypothesis to a PDF file, and thence to the fingerprint within it, which unites all the PDF copies and bridges that set to the HTML page.
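One way to picture this coalescing is as a set of equivalence classes over URIs: every link the system learns (HTML page to PDF URL, PDF URL to fingerprint, fingerprint to local file) merges two classes. The sketch below models that with a small union-find structure. It is a toy model under that assumption, not a description of how the Hypothesis server stores documents; the class DocumentIndex, the local file path, and the fingerprint value are all hypothetical.

```python
class DocumentIndex:
    """Groups equivalent URIs with a small union-find structure."""

    def __init__(self):
        self._parent = {}
        self._annotations = {}

    def _find(self, uri):
        """Return the representative URI of the group containing uri."""
        self._parent.setdefault(uri, uri)
        while self._parent[uri] != uri:
            # Path halving keeps later lookups short.
            self._parent[uri] = self._parent[self._parent[uri]]
            uri = self._parent[uri]
        return uri

    def link(self, uri_a, uri_b):
        """Record that two URIs identify the same document."""
        root_a, root_b = self._find(uri_a), self._find(uri_b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def annotate(self, uri, text):
        self._find(uri)
        self._annotations.setdefault(uri, []).append(text)

    def annotations_for(self, uri):
        """Return annotations made on any URI equivalent to uri."""
        root = self._find(uri)
        return [
            text
            for known_uri, texts in self._annotations.items()
            if self._find(known_uri) == root
            for text in texts
        ]


HTML_URL = "http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0168597"
PDF_URL = ("http://journals.plos.org/plosone/article/file"
           "?id=10.1371/journal.pone.0168597&type=printable")
FINGERPRINT = "urn:x-pdf:0123456789abcdef0123456789abcdef"  # hypothetical value
LOCAL_COPY = "file:///home/me/article.pdf"  # hypothetical path

index = DocumentIndex()
index.link(HTML_URL, PDF_URL)        # learned from citation_pdf_url
index.link(PDF_URL, FINGERPRINT)     # learned when the hosted PDF is opened
index.link(LOCAL_COPY, FINGERPRINT)  # learned when the local copy is opened

index.annotate(HTML_URL, "note on the HTML page")
index.annotate(LOCAL_COPY, "note on the local copy")
print(index.annotations_for(PDF_URL))
```

Asking for the annotations on any one member of the family returns the notes made on all of them, which is the behavior the three-tab trick demonstrates.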

Whew! This is clever, and you might think we’ve got all the bases covered. But in the scholarly realm there’s more to the story. That same article also lives at another URL, http://europepmc.org/articles/PMC5179025. Until recently, annotations made there would not coalesce with the other three versions we’ve seen. But now they do.

How? Hypothesis now uses DOIs to join variants of the same document in the same way it uses PDF fingerprints. Both identifiers, the DOI and the PDF URL, are typically included in HTML metadata. The DOI takes one of two forms:

<meta name="citation_doi" content="10.1371/journal.pone.0168597"/> (Highwire Press)

<meta name="dc:identifier" content="10.1371/journal.pone.0168597"> (Dublin Core)
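Here is a short Python sketch of reading the DOI from either form and turning it into a doi: URI. The names DoiFinder and find_doi_uri are invented for the example, and the check that a DOI starts with "10." is a simplifying assumption rather than full DOI validation.

```python
from html.parser import HTMLParser

# The two meta names shown above (Highwire Press and Dublin Core).
DOI_META_NAMES = ("citation_doi", "dc:identifier")


class DoiFinder(HTMLParser):
    """Remembers the first DOI found in a recognized <meta> tag."""

    def __init__(self):
        super().__init__()
        self.doi = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.doi is not None:
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").strip()
        if name not in DOI_META_NAMES:
            return
        # Some pages already include the "doi:" prefix; avoid doubling it.
        if content.lower().startswith("doi:"):
            content = content[4:]
        if content.startswith("10."):
            self.doi = "doi:" + content


def find_doi_uri(html):
    """Return a doi: URI for the page, or None if no DOI metadata is present."""
    finder = DoiFinder()
    finder.feed(html)
    return finder.doi


highwire = '<meta name="citation_doi" content="10.1371/journal.pone.0168597"/>'
dublin_core = '<meta name="dc:identifier" content="10.1371/journal.pone.0168597">'
print(find_doi_uri(highwire))
print(find_doi_uri(dublin_core))
```

Both forms reduce to the same doi: URI, which is what lets pages on different domains be recognized as the same article.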

It was already the case that you could search Hypothesis for the DOI, like so:

https://hypothes.is/search?q=uri:doi:10.1523/JNEUROSCI.5212-13.2014 (interactive)

https://hypothes.is/api/search?uri=doi:10.1523/JNEUROSCI.5212-13.2014 (API)
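The API form of that search is easy to script. The sketch below builds the same query with Python's standard library and, if you call fetch_annotations, requests it and parses the JSON reply. The helper names search_url and fetch_annotations are made up for this example, and the sketch assumes the endpoint answers unauthenticated requests for public annotations. Note that urlencode percent-encodes the colon and slash in the DOI, which is equivalent to the literal URL shown above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_SEARCH = "https://hypothes.is/api/search"


def search_url(doi):
    """Build the API search URL for annotations bound to a DOI."""
    return API_SEARCH + "?" + urlencode({"uri": "doi:" + doi})


def fetch_annotations(doi):
    """Request the search URL and return the parsed JSON response."""
    with urlopen(search_url(doi)) as response:
        return json.load(response)


print(search_url("10.1523/JNEUROSCI.5212-13.2014"))
```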

But the Hypothesis client formerly did not include the DOI when searching for annotations. Now it does, and you’ll see all four of the annotations shown in the screenshot on all four of the variants we’ve discussed.

The W3C’s web annotation standard defines an annotation’s target as a web resource identified with an IRI, which is “an extension to the URI specification to allow characters from Unicode.” We usually associate web resources with domains like journals.plos.org, and URLs like http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0168597, but there are more abstract identifiers. Here are the two at the core of the examples shown here:

urn:x-pdf:a08d875ad57045edf70d283087a0e339 (PDF fingerprint)

doi:10.1523/JNEUROSCI.5212-13.2014 (DOI)

The scientific literature needs to be permanent. Journals may come and go, but articles will live on, and so should annotations that refer to them. The DOI enables a domain-independent connection between content and annotations. By taking better advantage of the DOI, Hypothesis now reinforces that connection.
