
How to establish (or avoid) document equivalence in the Hypothesis system

Hypothesis uses common metadata conventions to identify and alias documents (a process called “establishing document equivalence”). Publishers and website owners can use these conventions to make sure annotations remain visible on documents, even if they are moved, syndicated, or hosted at multiple locations. This help article contains recommendations for establishing or troubleshooting document equivalence. For a more detailed look at the concepts discussed here, see our help article on how Hypothesis interacts with document metadata.

Resilience to URL change

There are three change-resilient ways for Hypothesis to associate annotations with documents:

  1. For DOI-equipped HTML documents: If an annotation’s target document declares a DOI, Hypothesis will bind the annotation to all documents that declare the same DOI. An example URL-independent identifier: doi:10.1126/science.51.1305.8
  2. For PDF files: If an annotation’s target document is a PDF, Hypothesis will bind it to all copies of that PDF. An example URL-independent identifier: urn:x-pdf:db49e0a7b073bbadeb889a910835b716
  3. For other HTML documents: If the target is neither an HTML page with a DOI nor a PDF, Hypothesis can use the dc.identifier / dc.relation.ispartof pair of Dublin Core tags. An example URL-independent identifier: urn:x-dc:elifesciences.org/blog-article/e3d858b3

See a full example here.
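The three schemes above can be sketched as a function that picks one URL-independent key per document. This is a minimal illustration, not Hypothesis’s code: the `DocumentMetadata` record and `equivalence_key` function are hypothetical, and the `urn:x-dc:` branch simply mirrors the shape of the example identifier above (the dc.relation.ispartof value, a slash, then the dc.identifier value). Hypothesis’s own construction, for example how it escapes special characters, may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentMetadata:
    """Hypothetical summary of what a page or file declares about itself."""
    is_pdf: bool = False
    pdf_fingerprint: Optional[str] = None  # e.g. "db49e0a7b073bbadeb889a910835b716"
    doi: Optional[str] = None              # e.g. "10.1126/science.51.1305.8"
    dc_is_part_of: Optional[str] = None    # value of dc.relation.ispartof
    dc_identifier: Optional[str] = None    # value of dc.identifier


def equivalence_key(meta: DocumentMetadata) -> Optional[str]:
    """Return a URL-independent identifier, or None if only the URL is available."""
    if meta.is_pdf:
        # Every copy of a PDF with the same fingerprint shares this key.
        return f"urn:x-pdf:{meta.pdf_fingerprint}" if meta.pdf_fingerprint else None
    if meta.doi:
        # Every HTML page declaring the same DOI shares this key.
        return f"doi:{meta.doi}"
    if meta.dc_is_part_of and meta.dc_identifier:
        # Assumed shape "<ispartof>/<identifier>", with no escaping applied.
        return f"urn:x-dc:{meta.dc_is_part_of}/{meta.dc_identifier}"
    # Nothing URL-independent: matching falls back to URLs (e.g. canonical links).
    return None


if __name__ == "__main__":
    print(equivalence_key(DocumentMetadata(doi="10.1126/science.51.1305.8")))
```

The checks run in the same order as the list: a PDF is keyed by its fingerprint, an HTML page by its DOI if it has one, and otherwise by its Dublin Core pair.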

Keep reading for more details on how these methods work.

HTML pages

Scholarly publishers: use Highwire Press tags or Dublin Core metadata to establish document equivalence.

The Hypothesis system can map a DOI to a URL or multiple URLs. Including a document’s DOI in the metadata of a web page will ensure that annotations appear on that document regardless of where it’s hosted.

For example, an article published by Cell Press includes the tag <meta name="citation_doi" content="10.1016/j.ajhg.2017.02.007">. The same article at PubMed Central includes the same tag, so annotations made on the Cell Press copy of the article show when you view it at PubMed Central, and vice versa.

There are two ways for a scholarly web page to declare its URL as an alias for a DOI. The most common method is a de facto standard known as the Highwire Press tag set, popular because it’s supported by Google Scholar. Another way comes from the Dublin Core Metadata Initiative. It’s common for both of these to be included in scholarly metadata. For the Hypothesis system, either is sufficient to establish a DOI/URL mapping for the purpose of annotation.
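To check which DOI a page actually declares, you could scan its <meta> tags for either convention. The sketch below uses only Python’s standard `html.parser`; the class and function names are hypothetical, and treating a “doi:” prefix as optional and keeping only values that start with “10.” are assumptions of this sketch, not documented Hypothesis rules.

```python
from html.parser import HTMLParser
from typing import Set

# Highwire Press and Dublin Core tag names that can carry a DOI.
DOI_META_NAMES = {"citation_doi", "dc.identifier"}


class DoiMetaParser(HTMLParser):
    """Collects DOI values declared in <meta> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.dois: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip().lower()
        content = (attributes.get("content") or "").strip()
        if name not in DOI_META_NAMES or not content:
            return
        # Accept either a bare DOI or one written with a "doi:" prefix.
        if content.lower().startswith("doi:"):
            content = content[4:].strip()
        # dc.identifier can hold non-DOI identifiers; keep DOI-shaped values only.
        if content.startswith("10."):
            self.dois.add(content)


def declared_dois(html: str) -> Set[str]:
    """Return the set of DOIs declared by either tag convention."""
    parser = DoiMetaParser()
    parser.feed(html)
    parser.close()
    return parser.dois


if __name__ == "__main__":
    highwire_page = '<head><meta name="citation_doi" content="10.1016/j.ajhg.2017.02.007"></head>'
    dublin_core_page = '<head><meta name="dc.identifier" content="doi:10.1016/j.ajhg.2017.02.007"></head>'
    print(declared_dois(highwire_page) == declared_dois(dublin_core_page))  # True
```

If two copies of an article return the same DOI here, either tag set should be enough for them to share annotations.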

If you’re using Dublin Core metadata to establish URL aliases for DOIs and the annotations are not syncing as expected, check the formatting of your tags to ensure that http://doi.org/ or https://doi.org/ is not included in the content attribute of the dc.identifier tag.
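The check described above can be automated with a small linter. This is a hypothetical helper built on Python’s standard library: it flags dc.identifier values that begin with either resolver prefix and suggests the same value with the prefix removed.

```python
import re
from html.parser import HTMLParser
from typing import List, Tuple

# Resolver prefixes that should not appear in a dc.identifier value.
RESOLVER_PREFIX = re.compile(r"^https?://doi\.org/", re.IGNORECASE)


class DcIdentifierLinter(HTMLParser):
    """Finds dc.identifier tags whose content starts with a doi.org URL."""

    def __init__(self) -> None:
        super().__init__()
        # Each problem is (original content, suggested content).
        self.problems: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").strip().lower() != "dc.identifier":
            return
        content = (attributes.get("content") or "").strip()
        if RESOLVER_PREFIX.match(content):
            self.problems.append((content, RESOLVER_PREFIX.sub("", content, count=1)))


def lint_dc_identifiers(html: str) -> List[Tuple[str, str]]:
    """Return (bad value, suggested value) pairs for a page's dc.identifier tags."""
    linter = DcIdentifierLinter()
    linter.feed(html)
    linter.close()
    return linter.problems


if __name__ == "__main__":
    page = '<meta name="dc.identifier" content="https://doi.org/10.1016/j.ajhg.2017.02.007">'
    for bad, fixed in lint_dc_identifiers(page):
        print(f"change {bad!r} to {fixed!r}")
```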

Hypothesis unifies documents that declare the same <link rel="canonical"> URL. It does not unify documents that share a <link rel="alternate"> URL, except when the alternate link connects the mobile and desktop versions of the same page; in that case, Hypothesis unifies the two documents.

Note: There is currently no way to remove aliases interactively or through the Hypothesis API. Once a document has been associated with a canonical URL, there isn’t a way to “un-associate” it.

If annotations made on one page are showing on another page where you would not expect them to, check the <head> of each page for a <link rel="canonical"> tag or a <link rel="alternate"> tag pointing to a desktop or mobile version of a web page.
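A quick way to do this check is to list every canonical and alternate link a page declares. The sketch below is a hypothetical diagnostic using Python’s `html.parser`; it scans all <link> tags rather than only those in <head>. It assumes, for illustration only, that an alternate link carrying a `media` attribute is a mobile/desktop pairing (a common way sites mark a separate mobile URL); this article does not describe how Hypothesis itself recognises such pairs.

```python
from html.parser import HTMLParser
from typing import List, NamedTuple, Optional


class LinkHint(NamedTuple):
    rel: str
    href: str
    media: Optional[str]


class EquivalenceLinkParser(HTMLParser):
    """Collects <link> tags whose rel includes canonical or alternate."""

    def __init__(self) -> None:
        super().__init__()
        self.hints: List[LinkHint] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens.
        rels = (attributes.get("rel") or "").lower().split()
        for rel in ("canonical", "alternate"):
            if rel in rels:
                self.hints.append(LinkHint(rel, href, attributes.get("media")))


def describe_links(html: str) -> List[str]:
    """Explain how each canonical/alternate link might affect equivalence."""
    parser = EquivalenceLinkParser()
    parser.feed(html)
    parser.close()
    notes = []
    for hint in parser.hints:
        if hint.rel == "canonical":
            notes.append(f"canonical -> {hint.href}: pages sharing this URL are unified")
        elif hint.media:
            # Assumption: a media query marks a mobile/desktop pair.
            notes.append(f"alternate ({hint.media}) -> {hint.href}: possible mobile/desktop pair")
        else:
            notes.append(f"alternate -> {hint.href}: not used for unification")
    return notes


if __name__ == "__main__":
    page = ('<head><link rel="canonical" href="https://example.com/a">'
            '<link rel="alternate" media="only screen and (max-width: 640px)" '
            'href="https://m.example.com/a"></head>')
    print("\n".join(describe_links(page)))
```

Running it on both pages that unexpectedly share annotations should show which link they have in common.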

If you would like annotations to “follow” a page to a new URL, you could use <link rel="canonical"> tags, but this is not recommended because it is still URL-dependent. Instead, we recommend using Dublin Core metadata to create future-proof, URL-independent identifiers.
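If your pages come from a template, a helper like the one below could emit the two Dublin Core tags that a urn:x-dc: identifier relies on. It is a hypothetical sketch: the function name and example values are made up, and attribute values are escaped with the standard library’s `html.escape`. Consult the metadata help article for the exact conventions Hypothesis expects.

```python
from html import escape


def dublin_core_tags(is_part_of: str, identifier: str) -> str:
    """Render dc.relation.ispartof and dc.identifier <meta> tags for a page template."""
    if not is_part_of.strip() or not identifier.strip():
        raise ValueError("both values are needed to form a URL-independent identifier")
    return "\n".join([
        f'<meta name="dc.relation.ispartof" content="{escape(is_part_of, quote=True)}">',
        f'<meta name="dc.identifier" content="{escape(identifier, quote=True)}">',
    ])


if __name__ == "__main__":
    print(dublin_core_tags("example.org/blog", "post-1234"))
```

The point of the design is that neither value is derived from the page’s URL path: if the identifier comes from a stable ID in your content system, the tags stay the same when the page moves, so annotations can follow it.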

See our help article on how Hypothesis interacts with metadata for information on using Dublin Core metadata on documents that do not have DOIs.

PDFs

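Hypothesis identifies a PDF by a fingerprint read from the file itself (in the urn:x-pdf: identifier shown above, it is the hex string after the prefix). If you want annotations to sync across multiple copies of a PDF, check the fingerprint of each copy to ensure they’re all the same. If you want to prevent annotations from syncing across copies of a PDF, save each new copy with its own unique fingerprint.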
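To compare copies without opening them in a browser, you could approximate the fingerprint from the raw file. This sketch assumes the fingerprint follows PDF.js’s usual rule: the first entry of the trailer’s /ID array as lowercase hex, or, when there is no usable ID, the MD5 of the first 1024 bytes. It is a rough heuristic, not a PDF parser: it only recognises hex-string IDs found by a regular expression, so for anything important read the fingerprint from PDF.js or a proper PDF library.

```python
import hashlib
import re
import sys

# First hex string of an /ID array, e.g. /ID [<DB49...><DB49...>]
ID_ARRAY = re.compile(rb"/ID\s*\[\s*<([0-9A-Fa-f\s]+)>")
EMPTY_ID = "00" * 16  # an all-zero 16-byte ID is treated as missing


def approximate_pdf_fingerprint(data: bytes) -> str:
    """Approximate a PDF.js-style fingerprint from the raw bytes of a PDF."""
    matches = ID_ARRAY.findall(data)
    if matches:
        # Use the last /ID found, since incremental updates append newer trailers.
        hex_digits = re.sub(rb"\s", b"", matches[-1]).decode("ascii").lower()
        if len(hex_digits) % 2:
            hex_digits += "0"  # PDF pads odd-length hex strings with a trailing 0
        if hex_digits and hex_digits != EMPTY_ID:
            return hex_digits
    # Fallback: MD5 of the first 1024 bytes of the file.
    return hashlib.md5(data[:1024]).hexdigest()


if __name__ == "__main__":
    # Usage: python pdf_fingerprint.py copy1.pdf copy2.pdf
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(f"urn:x-pdf:{approximate_pdf_fingerprint(f.read())}  {path}")
```

Copies that print the same urn:x-pdf: value are expected to share annotations; copies that print different values are not.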

HTML and PDF versions of the same document

If you want to sync annotations between HTML and PDF versions of a document, use <meta name="citation_pdf_url"> and make sure the value of the tag’s content attribute is a URL that points to a PDF directly, not to an HTML page that embeds the PDF. For guidance on using <meta name="citation_pdf_url">, see the section of our metadata help article on PDF-related HTML metadata.
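One way to confirm that citation_pdf_url points at a PDF rather than a viewer page is to fetch the start of the target and look for the PDF header. This is a hypothetical sketch using only the standard library; `check_pdf_url` needs network access, and treating “%PDF-” within the first 1024 bytes as proof of a PDF is a heuristic, not an official validation rule.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.request import urlopen


class PdfUrlParser(HTMLParser):
    """Finds the first <meta name="citation_pdf_url"> value on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.pdf_url: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip().lower()
        if tag == "meta" and name == "citation_pdf_url" and self.pdf_url is None:
            self.pdf_url = (attributes.get("content") or "").strip() or None


def pdf_url_from_html(html: str) -> Optional[str]:
    parser = PdfUrlParser()
    parser.feed(html)
    parser.close()
    return parser.pdf_url


def looks_like_pdf(first_bytes: bytes) -> bool:
    """A PDF file starts with a %PDF- header; an HTML viewer page does not."""
    return b"%PDF-" in first_bytes[:1024]


def check_pdf_url(url: str, timeout: float = 10.0) -> bool:
    """Fetch the first 1024 bytes of url (following redirects) and test them."""
    with urlopen(url, timeout=timeout) as response:
        return looks_like_pdf(response.read(1024))


if __name__ == "__main__":
    page = '<meta name="citation_pdf_url" content="https://example.org/article.pdf">'
    print(pdf_url_from_html(page))
```

If `check_pdf_url` returns False for the URL in your tag, the link most likely leads to an HTML page that embeds the PDF.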
