Let’s Use the Annotated Web to Coordinate the Struggle Against Fake News
This post by Jon Udell was originally published on the MisinfoCon blog on 18 Oct 2017 and is reposted here with permission.
In February of 2017, the W3C approved standards [1, 2] for annotations to text, images, audio, or video. They matter to the struggle against misinformation in the very particular way I’ll discuss here.
But first, a quick primer on web annotation. In text, annotations attach to paragraphs, sentences, phrases, numbers, or anything you can select. The annotated web uses enhanced URLs to connect annotations to such selections. An annotation is a package of data, plus a description that connects the data to a selection in a page.
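As a concrete illustration, here is a minimal sketch, in Python, of the shape the W3C data model gives such a package (the URL and text values are invented): a body that carries the annotation's content, and a target whose selector anchors it to an exact selection in the page.

```python
import json

# Illustrative W3C Web Annotation (all values are made up).
# The "target" names the page and pins the annotation to an exact
# text selection; the "body" carries the annotation's content.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "body": {
        "type": "TextualBody",
        "value": "Several recent studies support this claim.",
        "format": "text/plain",
    },
    "target": {
        "source": "https://example.com/news/article",
        "selector": {
            "type": "TextQuoteSelector",
            "exact": "an upward trend in terms of heavy rainfall events",
            "prefix": "We have been on ",
            "suffix": " over the past two decades",
        },
    },
}

print(json.dumps(annotation, indent=2))
```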
The web we know is an information fabric woven of linked resources. By increasing the thread count of that fabric, the annotated web enables a new class of application for which selections in documents are first-class resources. Consider one statement in a news article:
“We have been on an upward trend in terms of heavy rainfall events over the past two decades, which is likely related to the amount of water vapor going up in the atmosphere,” said Dr. Kenneth Kunkel, of the Cooperative Institute for Climate and Satellites.
Kenneth Kunkel isn’t the only climate scientist backing the linked statement. The link points to this confirmation by Climate Feedback’s Emmanuel Vincent:
Dr. Vincent’s note includes a comment, a chart, and a link, all written in the freeform text format supported by the Hypothesis annotation client. His note can serve as the root of a threaded conversation conducted openly, or in private group spaces.
Now let’s add a second annotation that refers to the same selection. It’s created by a human using an enhanced annotation client, or by a robot recognizer/classifier. Either can deliver a payload of structured, machine-readable data, governed by a standard vocabulary or schema, perhaps including a rating. This second annotation enables a reader to view the claim through lenses provided by human curators, artificial intelligences, and — most powerfully — people and machines working together.
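One way such a payload might travel, sketched here under the same W3C model with illustrative values, is as an annotation on the same selection whose body embeds structured data rather than freeform prose.

```python
import json

# Hypothetical second annotation on the same selection: instead of a
# freeform note, its body embeds structured data (here, a rating
# expressed with schema.org terms) that robots can ingest directly.
structured_annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "motivation": "assessing",
    "body": {
        "type": "TextualBody",
        "format": "application/ld+json",
        "value": json.dumps({
            "@context": "http://schema.org",
            "@type": "Rating",
            "ratingValue": 5,
            "bestRating": 5,
            "alternateName": "True",
        }),
    },
    "target": {
        "source": "https://www.theguardian.com/environment/2016/aug/16/louisiana-flooding-natural-disaster-weather-climate-change",
        "selector": {
            "type": "TextQuoteSelector",
            "exact": "We have been on an upward trend in terms of heavy rainfall events over the past two decades",
        },
    },
}

print(json.dumps(structured_annotation, indent=2))
```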
Here’s a mockup of one such enhanced client:
The idea here is that the annotation tool knows the URL of the article being annotated, the sentence that is the subject of the claim, and the identity of the fact-checker. So it need only gather a few more bits of information — the rating, the name and job title of the reporter — in order to produce a chunk of ClaimReview markup like this:
{ “@context”: “http://schema.org", “@type”: [“Review”, “ClaimReview”], “datePublished”: “2016–10–06”, “url”: “https://www.theguardian.com/environment/2016/aug/16/louisiana-flooding-natural-disaster-weather-climate-change", “author”: { “@type”: “Organization”, “url”: “http://climatefeedback.org/" }, “accountablePerson”: { “@type”: “Person”, “name”: “Emmanuel Vincent”, “url”: “http: //snri.ucmerced.edu/emmanuel-vincent” }, “claimReviewed”: “We have been on an upward trend in terms of heavy rainfall events over the past two decades.”, “reviewRating”: { “@type”: “Rating”, “ratingValue”: 5, “bestRating”: 5, “alternateName”: “True” }, “itemReviewed”: { “@type”: “CreativeWork”, “author”: { “@type”: “Person”, “name”: “Oliver Milman”, “jobTitle”: “environment reporter for the Guardian US” }, “datePublished”: “2016–10–06”, “name”: “Disasters like Louisiana floods will worsen as planet warms, scientists warn” } }
This review, which attaches to the same sentence that Emmanuel Vincent annotated, will be found by a search for the annotation’s URL, its author, its group, or its tag. Readers can watch activity connected to the target sentence, robots can ingest and analyze that activity, and publishers may or may not choose to display it.
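For annotations that live in Hypothesis, for example, such searches are already possible through its public search API. Here is a minimal sketch that pulls everything attached to the article’s URL; the ClaimReview tag used for filtering is an assumed convention, not an established one.

```python
import requests

# Query the public Hypothesis search API for annotations on one article.
# Filtering by a tag (here an assumed "ClaimReview" tag) is one way a
# robot could pick out machine-readable fact checks from ordinary notes.
ARTICLE_URL = ("https://www.theguardian.com/environment/2016/aug/16/"
               "louisiana-flooding-natural-disaster-weather-climate-change")

resp = requests.get(
    "https://api.hypothes.is/api/search",
    params={"uri": ARTICLE_URL, "tag": "ClaimReview", "limit": 50},
)
resp.raise_for_status()

for row in resp.json().get("rows", []):
    print(row["user"], row.get("text", "")[:80])
```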
The combination of human expertise and automated analysis can exist in multiple overlays. Climate scientists, economists, political analysts, and automated fact checkers might converge on a single sentence in a story on climate change. Nothing depends on any domain-specific vocabulary or schema. Annotation is simply the connective tissue that makes statements in web pages addressable, and binds those addresses to conversations, supporting documents, source data, or truth claims that bear on annotated statements.
As we battle misinformation, we are creating many different systems that people and robots will use to check facts and classify statements. They all share a common pattern: reference to selections in web documents, and attachment of data to those selections.
The annotated web embodies that pattern. Systems that embrace it will tend to work well with one another. Their outputs will be available to mine, crosslink, and remix, and those activities will drive collective improvement.
So here’s the call to action, to builders and funders of such systems in particular. Build on a common foundation: the annotated web.
Jon Udell (judell@hypothes.is) is Director of Integration at Hypothesis.
He is grateful to Dan Froomkin, Joe Germuska, and Dan Gillmor for feedback on an earlier draft of this article.