Why We No Longer Run an Open Proxy

Hypothesis no longer maintains an open web proxy. That’s a good thing.

Early on, we had a chicken-and-egg problem. We wanted to bring open, interoperable web annotation to everyone, but we knew that only a tiny number of people would have our browser extension installed at first. If people could only share their annotations with others who had the extension installed, that would create substantial friction.

We came up with a clever workaround: run a “proxy” that would add the Hypothesis sidebar client to any page on the web simply by prepending “via.hypothes.is/” to its URL. For six years, it has been a highly successful solution for nearly any website or PDF on the web, solving a fundamental problem crucial to the development of open annotation as a paradigm for general users and our educational service alike. For most of that time our usage was relatively modest, and there were no issues.
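To make the prepending step concrete, here is a minimal sketch in Python. The function name `via_url` and the `https://` scheme on the prefix are assumptions made for illustration; the only detail taken from the description above is that the proxy address goes in front of the page's URL.

```python
# Assumed prefix: the proxy host named above, with an https:// scheme added.
VIA_PREFIX = "https://via.hypothes.is/"


def via_url(target_url: str) -> str:
    """Return the proxied form of target_url by prepending the proxy prefix."""
    return VIA_PREFIX + target_url


if __name__ == "__main__":
    # example.com is a placeholder page, not a statement about any real site.
    print(via_url("https://example.com/article.html"))
```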

In the last year, however, usage of our service has expanded dramatically. In the last few months we became aware of reports of individuals attempting to use our proxy for malicious purposes. These people were prepending “via” to the URLs of websites designed to capture sensitive personal details in such a way as to evade the protections built into web browsers and other services. We don’t know that they were ever successful in those efforts, but the mere suggestion that they could have been is bad enough.

We took rapid action to address these concerns and ensure that our proxy cannot be misused in this way.

The primary change we’ve made is that we no longer run an open proxy. An open proxy is one where any website can be proxied without restriction. The utility of this is obvious: since one doesn’t know beforehand which site any given user might want to annotate, running an open proxy ensures that any site will work. The problem is that this openness also extends to sites hosting pages that could be used maliciously. As of now, only sites that we have approved can be annotated using the “via” proxy.

Most people will never notice this change. Of the nearly 400 million domains on the internet, most people use only a small handful. Across the 20 million annotations made so far, there are somewhere in the neighborhood of 100,000 unique domains. Further, the large majority of those exist in a “long tail” where only one or two annotations may ever have been made. Fewer than a thousand domains represent the substantial corpus where most annotations happen.

We’ve carefully selected a set of domains that we’re confident are safe and included them on our initial allow list. All other domains are now explicitly disallowed from the proxy. We are also continuously scanning even this restricted set of domains against a set of widely used and maintained blocklists, to ensure that sites we think are safe now continue to be so in the future.
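The allow-list approach described above can be sketched roughly as follows. This is an illustrative sketch only, not our actual implementation: the domain names, the function names (`is_allowed`, `rescan`), and the choice to match hostnames exactly rather than by subdomain are all assumptions.

```python
from urllib.parse import urlsplit

# Hypothetical data: placeholder domains, not the real allow list or blocklists.
ALLOWED_DOMAINS = {"example.com", "example.org"}
BLOCKLISTED_DOMAINS = {"example.org"}


def hostname_of(url: str):
    """Return the lowercased hostname of an absolute URL, or None if it has none."""
    return urlsplit(url).hostname


def is_allowed(url: str, allowed: set, blocked: set) -> bool:
    """Proxy a URL only if its host is on the allow list and on no blocklist.

    Anything not explicitly allowed is refused (deny by default).
    """
    host = hostname_of(url)
    if host is None:
        return False
    return host in allowed and host not in blocked


def rescan(allowed: set, blocked: set) -> set:
    """Return the allow list with any domains that now appear on a blocklist removed."""
    return allowed - blocked


if __name__ == "__main__":
    print(is_allowed("https://example.com/page", ALLOWED_DOMAINS, BLOCKLISTED_DOMAINS))
    print(is_allowed("https://unknown.test/", ALLOWED_DOMAINS, BLOCKLISTED_DOMAINS))
    print(sorted(rescan(ALLOWED_DOMAINS, BLOCKLISTED_DOMAINS)))
```

Checking the blocklist at request time as well as during the periodic rescan is a design choice in this sketch: it means a newly flagged domain is refused even before the allow list itself has been updated.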

It’s important to note that all domains can still be annotated using the extension or bookmarklet, or when a site owner purposefully embeds the Hypothesis client on the page. This change only affects the proxying of pages for annotators who don’t have the extension installed or who aren’t using a bookmarklet or embedded solution.

We understand that this may occasionally create friction for some of our users. If you would like us to consider adding a domain that is not currently included, please let us know. Over time we will implement strategies to make extending the list of allowed sites more seamless.

