New Search API Parameter: search_after

November 8th, 2018

Occasionally, there is a need to search for and return a large number of annotations. Previously, the Hypothesis API relied on a parameter called offset, which lets the user skip a number of initial annotations. By using offset, users could search all annotations and retrieve a specific subset of them at a time. The time such a request takes is proportional to the offset plus the number of annotations returned. So, while this method works well when offset is only a couple thousand, it becomes very slow for larger offsets, and if offset is large enough the request can fail completely. To address this problem, we have introduced a new way to page through large numbers of annotations: the search_after parameter. Hypothesis recommends that any requests to the /api/search endpoint that currently use offset to page through thousands of annotations be changed to use search_after instead.

Why We Made the Change

Previously, Hypothesis paged through large numbers of annotations using a sliding window: the /api/search endpoint would return up to limit annotations, starting at offset:

offset integer [ 0 .. 9800 ]
Default: 0
The number of initial annotations to skip. This is used for pagination.


limit integer [ 0 .. 200 ]
Default: 20
The maximum number of annotations to return.

For example, if there were a total of 100 annotations, offset=10, and limit=20, the search endpoint would skip the first 10 annotations and return annotations 11 through 30.
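The windowing behaviour can be sketched with a plain Python list standing in for the sorted search results. This is only an illustration: offset_window is a hypothetical helper, not part of the Hypothesis API, and the annotations are simply numbered 1 to 100.

```python
def offset_window(annotations, offset=0, limit=20):
    """Skip the first `offset` items, then return at most `limit` items."""
    return annotations[offset:offset + limit]


# Hypothetical data: 100 annotations, numbered 1..100 in sort order.
annotations = list(range(1, 101))

page = offset_window(annotations, offset=10, limit=20)

# First and last annotation on the page, and the page size.
print(page[0], page[-1], len(page))
```

Note that the slice still has to walk past the skipped items, which is the cost the rest of this post is concerned with.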

Newer versions of Elasticsearch impose a restriction on offset and limit such that offset + limit cannot be greater than 10,000. This means that the /api/search endpoint will not return any annotations beyond the 10,000th annotation when using offset and limit. Regardless of the value passed, offset is capped at 9,800. For this reason, search_after became the new standard way in Hypothesis to page through large numbers of annotations:

search_after string
Returns results after the annotation whose sort field has this value. If specifying a date use the format
yyyy-MM-dd'T'HH:mm:ss.SSX or time in milliseconds since the epoch.
This is used for iteration through large collections of results.
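As a rough illustration of the "milliseconds since the epoch" form, the sketch below converts a timezone-aware Python datetime into an integer that could be passed as search_after. The helper name to_epoch_millis is hypothetical, and the choice of UTC midnight on 2018-10-05 is just an example value.

```python
from datetime import datetime, timezone


def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    return int(dt.timestamp() * 1000)


# Example cursor value: midnight UTC on 5 October 2018 (an arbitrary choice).
cursor = to_epoch_millis(datetime(2018, 10, 5, tzinfo=timezone.utc))
print(cursor)
```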

How it Works

search_after works together with sort and order. sort defaults to the updated field (the last time the annotation was updated), and order defaults to desc, so the most recently updated annotations are found and returned first. search_after returns the annotations that come after, in the chosen sort order, the annotation whose sort field has the value passed to search_after.

Examples:

  • If there are 31 annotations, one for each day in October, the search parameter combination of sort=updated, order=asc, limit=10, and search_after set to the updated value of the October 5th annotation will retrieve the 10 annotations made from the 6th of October through the 15th of October. With order=desc, the same search_after value would instead return the annotations from the 4th back to the 1st.
  • If there are 32 annotations with IDs 0-31, the search parameter combination of sort=id, order=asc, limit=10, and search_after=5 will return the 10 annotations with IDs 6-15.
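A minimal sketch of how search_after selects results, using an in-memory list in place of the search index. The function search_after_page and the integer IDs are hypothetical and only meant to mirror the sort, order, limit, and search_after parameters described above.

```python
def search_after_page(annotations, sort_key, search_after, order="desc", limit=20):
    """Return up to `limit` items that come after `search_after` in sort order."""
    ordered = sorted(annotations, key=lambda a: a[sort_key], reverse=(order == "desc"))
    if order == "asc":
        # Ascending: "after" means a larger sort value.
        remaining = [a for a in ordered if a[sort_key] > search_after]
    else:
        # Descending: "after" means a smaller sort value.
        remaining = [a for a in ordered if a[sort_key] < search_after]
    return remaining[:limit]


# Hypothetical data: annotations with integer IDs 0..31.
annotations = [{"id": i} for i in range(32)]

asc_ids = [a["id"] for a in search_after_page(annotations, "id", 5, order="asc", limit=10)]
desc_ids = [a["id"] for a in search_after_page(annotations, "id", 5, order="desc", limit=10)]
print(asc_ids)
print(desc_ids)
```

The same search_after value yields different pages depending on order, so the two parameters should always be chosen together.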

Searching using offset and limit is inefficient because Elasticsearch must load offset + limit annotations into memory and sort them before returning the window of annotations defined by offset and limit. search_after does not require all of those annotations to be loaded and sorted, because it can be applied as a filter on the search query itself, whereas offset must be applied after the initial search. This is why search_after is more efficient. The old offset parameter remains available, but using it to page through large result sets is not recommended.
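The paging pattern this enables can be sketched as a loop that feeds the sort value of the last result back in as the next search_after. The sketch below runs against a fake in-memory search function rather than the real endpoint; fetch_all and make_fake_search are hypothetical names, and it assumes that updated values are unique (annotations sharing the cursor value could otherwise be skipped).

```python
def make_fake_search(annotations):
    """Build a stand-in for the search endpoint that filters on `search_after`."""
    def search(params):
        rows = sorted(annotations, key=lambda a: a["updated"])
        after = params.get("search_after")
        if after is not None:
            # Applied as a filter: nothing before the cursor is returned.
            rows = [a for a in rows if a["updated"] > after]
        return rows[:params["limit"]]
    return search


def fetch_all(search, limit=200):
    """Page through every result, using the last row's sort value as the cursor."""
    results = []
    search_after = None
    while True:
        params = {"sort": "updated", "order": "asc", "limit": limit}
        if search_after is not None:
            params["search_after"] = search_after
        rows = search(params)
        if not rows:
            break
        results.extend(rows)
        # Assumption: `updated` values are unique, so no row is skipped.
        search_after = rows[-1]["updated"]
    return results


# Hypothetical data: 450 annotations with increasing `updated` timestamps.
annotations = [{"id": i, "updated": 1000 + i} for i in range(450)]
fetched = fetch_all(make_fake_search(annotations), limit=200)
print(len(fetched))
```

Each request carries only the cursor value, so the cost of a page does not grow with how far into the results it is.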

Those who are interested in working with our API can learn more by reading our API documentation. For more information on how to use sort and order with search_after, see the /api/search section.
