Occasionally, there is a need to search for and return a large number of annotations. Previously, Hypothesis’ API made use of a parameter called offset that allowed the user to skip a number of initial annotations. By using offset, users could search all annotations and return a specific subset of them at a time. The time it takes to perform this request is proportional to offset plus the number of annotations returned. So, while this method works very well when the value of offset is only, say, a couple thousand, it becomes very slow for larger offsets. In some cases, if offset is large enough, the request can fail completely. To combat this problem, a new method of searching for bulk annotations has been introduced: the parameter search_after. Hypothesis recommends changing any requests to the /api/search endpoint that currently use offset to page through thousands of annotations so that they use search_after instead.
Why We Made the Change
Previously, Hypothesis searched for bulk annotations using a sliding window, in which the /api/search endpoint returned up to limit annotations starting at offset:
offset | integer [ 0 .. 9800 ], default: 0. The number of initial annotations to skip. This is used for pagination.
limit | integer [ 0 .. 200 ], default: 20. The maximum number of annotations to return.
For example, if there were a total of 100 annotations, offset=10, and limit=20, the search endpoint would skip the first 10 annotations and return the next 20 (annotations 11 through 30).
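The sliding window can be pictured as a plain list slice. The following is only an in-memory sketch of the idea, not how the service implements it; the function name offset_window is hypothetical, and annotations are numbered from 1 for readability.

```python
def offset_window(annotations, offset=0, limit=20):
    """Skip the first `offset` items, then return at most `limit` items."""
    return annotations[offset:offset + limit]


# 100 stand-in "annotations", numbered 1..100 (illustrative data only).
annotations = list(range(1, 101))

page = offset_window(annotations, offset=10, limit=20)
print(page[0], page[-1], len(page))  # 11 30 20
```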
Newer versions of Elasticsearch impose a restriction on offset and limit such that offset + limit cannot be greater than 10,000. This means that the /api/search endpoint will not return any annotations beyond the 10,000th annotation by using offset and limit. Regardless of what is passed to offset, its value is capped at 9,800 (together with the maximum limit of 200, that reaches exactly 10,000). For this reason, search_after became the new standard in Hypothesis for searching bulk annotations:
search_after | string. Returns results after the annotation whose sort field has this value. If specifying a date, use the format yyyy-MM-dd'T'HH:mm:ss.SSX or time in milliseconds since the epoch. This is used for iteration through large collections of results.
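When the sort field is a date, the milliseconds-since-the-epoch form is often the easier one to produce from code. Below is a minimal sketch using only the Python standard library; the helper name to_epoch_millis is hypothetical, and the date shown is just an example value.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment):
    """Convert a timezone-aware datetime to whole milliseconds since the epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


search_after = to_epoch_millis(datetime(2018, 10, 5, tzinfo=timezone.utc))
print(search_after)  # 1538697600000
```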
How it Works
search_after is based on sort and order. sort defaults to the updated field (the last time the annotation was updated), and order defaults to descending, so the most recently updated annotations are found and returned first. search_after returns the annotations that come after, in the chosen sort order, the annotation whose sort field has the value passed to search_after.
Examples:
- If there are 31 annotations, one for each day in October, the search parameter combination of sort=updated, order=asc, limit=10, and search_after=2018-10-05 (taken here as the updated value of the October 5th annotation) will retrieve the 10 annotations made from the 6th of October to the 15th of October. With order=desc, the same search_after would instead return the older annotations, from the 4th of October back to the 1st.
- If there are 31 annotations with IDs 1-31, the search parameter combination of sort=id, order=asc, limit=10, and search_after=5 will return the annotations with IDs 6-15.
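To make the role of order concrete, here is a small in-memory model of the behavior described above. It is a sketch for illustration only, not the service's implementation, and search_after_page is a hypothetical name. In ascending order "after" means larger values; in descending order it means smaller (older) ones.

```python
from datetime import date


def search_after_page(values, search_after, order="asc", limit=10):
    """Return up to `limit` values that come strictly after `search_after`
    in the requested sort order."""
    descending = order == "desc"
    ordered = sorted(values, reverse=descending)
    if descending:
        remaining = [v for v in ordered if v < search_after]
    else:
        remaining = [v for v in ordered if v > search_after]
    return remaining[:limit]


# One stand-in "annotation" per day in October 2018.
october = [date(2018, 10, day) for day in range(1, 32)]

asc_page = search_after_page(october, date(2018, 10, 5), order="asc")
desc_page = search_after_page(october, date(2018, 10, 5), order="desc")

print(asc_page[0], asc_page[-1], len(asc_page))     # 2018-10-06 2018-10-15 10
print(desc_page[0], desc_page[-1], len(desc_page))  # 2018-10-04 2018-10-01 4
```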
Searching using offset and limit is inefficient because Elasticsearch must load offset + limit annotations into memory and sort them before returning the window of annotations defined by offset and limit. search_after does not require all of those annotations to be loaded and sorted because it can be applied as a filter on the search query itself, as opposed to offset, which must be applied after the initial search. This is why search_after is more efficient. The old offset parameter remains available, but using it is not recommended.
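Paging with search_after usually takes the form of a loop that feeds the sort value of the last result back into the next request. The sketch below rests on several assumptions that are not stated in this post: the host in API_URL, a JSON response with a rows list, and an updated field on each row. The names fetch_page and iter_annotations are hypothetical. Annotations that share exactly the same updated value at a page boundary could be skipped by this simple version.

```python
import json
import urllib.parse
import urllib.request

# Assumed host; this post only names the /api/search endpoint.
API_URL = "https://api.hypothes.is/api/search"


def fetch_page(params):
    """Fetch one page of results (assumes the response JSON has a "rows" list)."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return json.load(response).get("rows", [])


def iter_annotations(fetch=fetch_page, limit=200, **filters):
    """Yield annotations page by page, oldest update first, using search_after."""
    params = dict(filters, sort="updated", order="asc", limit=limit)
    while True:
        rows = fetch(params)
        if not rows:
            return
        yield from rows
        # Continue after the last annotation seen (assumes an "updated" field).
        params["search_after"] = rows[-1]["updated"]
```

Because the page fetcher is passed in as an argument, the loop can be exercised against a stub before pointing it at a live service, and additional search filters can be supplied as keyword arguments.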
Those who are interested in working with our API can learn more by reading our API documentation. For more information on how to use sort and order with search_after, see the /api/search section.