Sometimes you may find yourself working with a PDF in which none of the text is selectable. This usually happens when a PDF is created from scanned images of text. You can use OCR technology to optimize these PDFs so that their text becomes selectable.

What is OCR?

OCR, or Optical Character Recognition, is the process by which software converts images of text into a machine-readable format. Once a PDF has been processed with OCR, users can select and highlight parts of the text, which is necessary for annotating PDFs with Hypothesis.

Note: OCR-optimized documents are beneficial to blind and visually impaired readers, as OCR allows screen readers and other assistive technology to interact with the text. Working with OCR-optimized documents is a best practice regardless of whether you are annotating with Hypothesis.

How do I know whether my PDF is OCR-optimized?

If you can easily select a line of text and then copy and paste it elsewhere, your PDF is OCR-optimized and you can start annotating.

You will need to apply OCR technology to your PDF if any of the following is true:

  • You are unable to select any text
  • You can select text, but it is difficult to get only the text you want
  • You can select text, but it is “garbled” or poorly formatted once you copy and paste it elsewhere
  • Someone who uses screen reader technology has indicated the PDF is difficult to read
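
If you are comfortable with a little scripting, the manual check described above can be roughly approximated in code. The following is a minimal sketch, not an official Hypothesis or Adobe tool. It assumes the third-party pypdf package is installed, and the function names, the file name, and the 20-character threshold are illustrative assumptions. It only detects whether a text layer exists; it cannot tell you whether that text is garbled, so the copy-and-paste test is still worth doing.

```python
from typing import Iterable, List


def classify_text_layer(page_texts: Iterable[str], min_chars_per_page: int = 20) -> str:
    """Classify a PDF from the text extracted from each of its pages.

    Returns "no_text" (likely a pure scan), "partial_text" (some pages
    lack selectable text), or "has_text" (every page has a text layer).
    The threshold is an arbitrary assumption chosen for illustration.
    """
    pages = list(page_texts)
    if not pages:
        return "no_text"
    # Count the pages that contain at least a small amount of real text.
    pages_with_text = sum(
        1 for text in pages if len((text or "").strip()) >= min_chars_per_page
    )
    if pages_with_text == 0:
        return "no_text"
    if pages_with_text < len(pages):
        return "partial_text"
    return "has_text"


def extract_page_texts(pdf_path: str) -> List[str]:
    """Read the text layer of each page (assumes: pip install pypdf)."""
    from pypdf import PdfReader  # imported here so the classifier works without it

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

For example, `classify_text_layer(extract_page_texts("reading.pdf"))` (with a hypothetical file name) returning "no_text" or "partial_text" would suggest the PDF needs OCR before annotating.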

How to OCR-optimize a PDF

To use the tutorials below, you will need to have Adobe Acrobat installed. If you do not have an Adobe subscription, you might consider downloading a free trial of Acrobat or checking whether your school, institution, or local library provides access to it.

Here are written instructions for using Adobe Acrobat’s OCR technology, or you can watch a short video tutorial below:

