Preprint Services Gather to Explore an Annotated Future

On Thursday, 25 January, 2018, representatives from many of the major preprint servers and other related organizations gathered in person and remotely to discuss annotation use cases around preprint content. The Alfred P. Sloan Foundation generously hosted this session in their offices at Rockefeller Center in New York City and also provided valuable context to the discussion from the funder perspective. Before the gathering, attendees submitted responses to a brief survey intended to identify topics for deeper exploration and to help inform discussion about annotation with preprints.

Submissions to preprint servers are meant to spark conversations and facilitate the improvement of research prior to its submission for formal peer review and publication. Hypothesis has long observed annotation activity on top of preprints, ranging from private notes to collaboration groups to public annotations. Recently, bioRxiv announced that later in 2018 they will launch a dedicated, moderated annotation layer on content they host. Following the announcement, we discussed with bioRxiv’s Richard Sever the idea of bringing a wider group of preprint services together, and we later coordinated with Oya Rieger at arXiv to host a summit just prior to their preprint convening, planned for January 2018 in New York City.

Attendees included representatives from ACS, AGU, arXiv, ASAPbio, bioRxiv, ChemRxiv, COS, CUNY, Earth ArXiv, ECS, ECSarXiv, engrXiv, ESSOAr, figshare, LISSA, PaleorXiv, PeerJ, PLOS, PsyArXiv, MDPI, SocRxiv, and SSRN, representing a broad range of disciplines from mathematics and computer science to the physical, life, and social sciences. We have also been in contact with preprint services in the humanities, which we expect to join the group as it develops.

After a brief round of introductions, Hypothesis CEO Dan Whaley kicked off the day with an update on Hypothesis capabilities and near-future plans. The impending launch of the eLife integration (now live) offered attendees a glimpse of functionality that would let them make multi-purpose annotation layers visible by default to readers on their platforms. These authoritative annotation layers will offer third-party authentication, moderation capabilities, and UX customization. They enable inline annotation to connect conversations to precise portions of manuscripts and, with direct linking, enable annotators to connect resources across the web. Publishers can control who can see annotations and who can create them, with different permissions possible in different layers. The roadmap prompted lively discussion that flowed into a lengthy Q&A with in-person and remote participants across the United States and Europe.

Survey responses from attendees identified three areas for further discussion: peer review, identity, and versioning. These discussions were led by Jessica Polka of ASAPbio, Richard Sever of bioRxiv, and Oya Rieger of arXiv respectively.

Preprint servers and the “burden of moderation”

The conversation kicked off with a foundational discussion of the intent behind facilitating annotation on preprint content. Would a preprint service host annotation as peer review, or as a form of lightly moderated post-publication peer review? In the first case, the service wields the power of a journal editor. In the second case, the service supports moderation of site discussion, for which, as one participant noted, current solutions such as Disqus comments are not well suited.

For the services that support commenting now, spam is of course always a concern, but more generally, keeping up with moderation of content in comments is a real burden to staff. Several attendees noted that private collaboration groups are possible now, but the notion of making annotations visible by default would raise additional questions.

Some participants pointed toward annotation playing a more visible role as a part of peer review processes in preprints, as in recent initiatives around overlay journals and ASAPbio’s ideas about the creation of an independent peer review service that might function like an overlay journal. Each preprint server has its own ideas about the current and future role its service might play in this space, with some feeling strongly that getting involved in more formalized peer review is out of scope.

Current moderation efforts for existing comment systems range from none to community moderation after publication to pre-moderation by either preprint staff or designated moderators. Expectation management is key: contributors should be aware of how long it might take for their comments to appear, as well as the community codes of conduct that apply to submissions. Wikipedia provides an example of community moderation in the form of “upvoting,” but one attendee particularly familiar with the process noted that it is “a messy space” and the process must be continually revisited. Communities must be transparent about moderation policies, including who is moderating, as well as community guidelines, linking to codes of conduct. (Examples: Climate Feedback, Hypothesis, LIS Scholarship Archive, PaleorXiv.) An organization’s location can add complexity, with UK libel laws cited as one example. The EU’s “right to be forgotten” might also challenge notions around the immutability of comments and annotations that mention individuals directly or indirectly.

The conversation then moved to the persistence of annotations as content: What license would they receive? Where would they live? What would happen to them if the original content disappears? Who could remove them and why? Not all of the answers are simple. Current Hypothesis public annotations are published with CC0 declarations to the public domain, but Hypothesis feels strongly that different publishers or platforms should be able to designate their own preferred copyright statuses. Annotations live on Hypothesis servers and remain accessible there to their creators and readers even if their target documents are no longer accessible on the web. Users can modify or delete their own annotations, but any replies by other users remain. Attendees felt that some types of annotations, which are of course content themselves, might after some period of time or some designated phase of the process be “frozen” into the permanent scholarly record, receiving DOIs for citation purposes and archived accordingly. Others wondered whether DOIs could simply be requested by authors or readers for any annotation for the purposes of citation.


Attendees next moved to the topic of identity. Who is allowed to annotate where, what mechanisms are involved in verifying the identity of those contributing annotations, and will anonymous annotations be allowed? The conversation started by noting that anyone can already annotate in the Hypothesis public channel and that private collaboration groups have control over their own members. Along the same lines, individual preprint servers should be able to make these determinations for their content and deal with annotators accordingly.

Many attendees assumed that the identity of annotators should be connected to their ORCIDs. While anyone can apply for and receive an ORCID, requiring one raises the bar for participation for those who may have ill intent.

Attendees appreciated the difficulty of balancing these choices. Some authors contributing articles might be hesitant to receive annotations from anonymous readers or they might not want to receive annotations at all. Attendees noted that there could be valid reasons for not annotating under a real name, perhaps for junior researchers fearing reprisal from more powerful peers or researchers from countries with limited freedom of speech. Still, there was the sense from direct experience that higher quality contributions result from those using their real names. Some felt that authors will want to know who is making the annotations, so that they can properly evaluate their worth. Others noted that blind peer review already obscures the identity of reviewers, thus authors are already familiar with this paradigm (although blind peer review is mediated by a trusted editor). There was a suggestion that pre-moderation of comments would put the burden on the moderator to ensure the value of anonymous contributions. Moderators could also act as trusted third parties to post annotations from those in countries where researchers felt less free to annotate under their real names. Preprint servers should be able to establish their own policies and revise as needed, taking into consideration the purpose of the annotations — whether for suggested improvements or community engagement — and community standards and culture. Technically, Hypothesis already makes it possible for people to annotate using an alias, but, for example, should annotators need to prove their identity when setting up such pseudonymous accounts?


Preprint servers and their communities use versioning in different ways. Most allow submitters to post later revised versions of articles, and these are tracked in multiple ways — version groupings or numbers, sometimes as DOI suffixes. Other servers enable post-prints or accepted articles to be uploaded. These might include official Open Access published versions or post-embargo versions. Participants had a lengthy discussion about these topics and how the taxonomy and practice of versioning preprints is often confusing and misunderstood, reinforcing the need for better practices and transparency around versioning and annotation.

Participants agreed that metadata around an article — and around an annotation — will be key. Versions often show the date submitted and the last date revised. Annotations are also time-stamped, and this offers one way to connect them with article versions. Varying policies around when DOIs are assigned to papers, and the possibility of assigning DOIs to annotations, raise important questions about discoverability. Search engines sometimes drive readers to paywalled versions even when an open copy is available. Recent funding policies allow for citation of preprint articles, but those citing may be inadvertently misidentifying the version they are citing or journal citation formatting policies may make the process unclear.
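
As a rough illustration of the timestamp idea above, the following Python sketch maps an annotation's creation time to the article version that was current at that moment. The revision history, the version labels, and the `version_at` helper are all hypothetical and do not reflect any preprint server's or Hypothesis's actual data model.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Hypothetical revision history for one preprint: (posting time, version label),
# sorted by posting time. Real servers track versions in their own ways.
VERSIONS = [
    (datetime(2018, 1, 5, tzinfo=timezone.utc), "v1"),
    (datetime(2018, 2, 10, tzinfo=timezone.utc), "v2"),
    (datetime(2018, 3, 20, tzinfo=timezone.utc), "v3"),
]


def version_at(annotation_time, versions=VERSIONS):
    """Return the label of the latest version posted at or before
    annotation_time, or None if the annotation predates every version."""
    posted_times = [posted for posted, _ in versions]
    # bisect_right counts how many versions were posted at or before the timestamp.
    index = bisect_right(posted_times, annotation_time)
    if index == 0:
        return None
    return versions[index - 1][1]
```

A timestamp only shows which version was the latest when the annotation was made, not which version the annotator was actually reading, so a real system would likely also want to record the version identifier on the annotation itself.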

It was clear from this discussion that work needs to be done around versioning for preprints themselves, including best practices and taxonomies. More work needs to be done around DOI identification and citation frameworks. Then there are questions around how or if DOIs for annotations might be assigned as well. Participants only had time to scratch the surface of these essential discussions.

Where do we go from here?

Preprint services are bringing scholarly communities together to make research findings more accessible, more timely and more responsive to feedback. Hypothesis’ commitment is to identify the key innovations the annotation community can offer, especially through open software and services, that can complement those goals. As we all consider these various issues, including moderation, identity, versioning and the overall role for annotation in the review process, we’ll rely principally on the feedback and input from the participants at this workshop and the larger preprint community as our guide.

As we explore annotations on preprints, Dawa Riley, our UX Lead, is keen to conduct in-depth interviews about specific use cases, so feel free to reach out to her directly if you’d like to participate.

