Thursday, 19 May 2022

Installing OCRmyPDF

 https://ocrmypdf.readthedocs.io/en/latest/installation.html#installing-on-windows

Installing on Windows

Native Windows

Note

Administrator privileges will be required for some of these steps.

You must install the following for Windows:

  • Python 3.7 (64-bit) or later

  • Tesseract 4.0 or later

  • Ghostscript 9.50 or later

Using the Chocolatey package manager, install the following when running in an Administrator command prompt:

  • choco install python3

  • choco install --pre tesseract

  • choco install ghostscript

  • choco install pngquant (optional)

The commands above will install Python 3.x (latest version), Tesseract, Ghostscript and pngquant. Chocolatey may also need to install the Windows Visual C++ Runtime DLLs or other Windows patches, and may require a reboot.

You may then use pip to install ocrmypdf. (This can be performed by a regular user or an Administrator.)

  • pip install ocrmypdf

Chocolatey automatically selects appropriate versions of these applications. If you are installing them manually, please install 64-bit versions of all applications for 64-bit Windows, or 32-bit versions of all applications for 32-bit Windows. Mixing the “bitness” of these programs will lead to errors.
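As a quick check on the "bitness" point above, the following minimal sketch reports whether the running Python interpreter is 64-bit and at least version 3.7. The function names are illustrative and not part of OCRmyPDF. It only inspects Python itself and says nothing about Tesseract or Ghostscript.

```python
import platform
import struct
import sys


def interpreter_bits():
    """Return the pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def meets_python_requirement(version=sys.version_info, bits=None):
    """Return True for a 64-bit Python 3.7 or later.

    `version` and `bits` are parameters only so the check can be tried
    with other values; by default the running interpreter is inspected.
    """
    if bits is None:
        bits = interpreter_bits()
    return tuple(version[:2]) >= (3, 7) and bits == 64


if __name__ == "__main__":
    print(f"Python {platform.python_version()}, {interpreter_bits()}-bit")
    if meets_python_requirement():
        print("Python matches the 64-bit, 3.7-or-later requirement.")
    else:
        print("Python does not match the 64-bit, 3.7-or-later requirement.")
```

On 32-bit Windows the 32-bit versions are the right choice, so treat a negative result here as a prompt to check that all installed applications match, not as an error in itself.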

OCRmyPDF will check the Windows Registry and standard locations in your Program Files for third-party software it needs (specifically, Tesseract and Ghostscript). To override the versions OCRmyPDF selects, you can modify the PATH environment variable.
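To see which executables the current PATH resolves to, a small sketch along these lines may help. It assumes the usual executable names on 64-bit Windows (tesseract, gswin64c for the Ghostscript console program, and pngquant). Because OCRmyPDF also checks the Registry and Program Files, a program reported as missing here may still be found by OCRmyPDF.

```python
import shutil


def resolve_on_path(names, path=None):
    """Map each program name to the first match on PATH, or None if absent.

    `path` defaults to the PATH environment variable of the current process.
    """
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    # Executable names are assumptions for a typical 64-bit Windows setup.
    programs = ["tesseract", "gswin64c", "pngquant"]
    for name, location in resolve_on_path(programs).items():
        print(f"{name}: {location or 'not found on PATH'}")
```

Open a new command prompt after changing PATH, since already-running prompts keep the old value.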

Warning

As of early 2021, users have reported problems with the Microsoft Store version of Python and OCRmyPDF. These issues affect many other third party Python packages. Please download Python from Python.org or Chocolatey instead, and do not use the Microsoft Store version.

Windows Subsystem for Linux

  1. Install Ubuntu 20.04 for Windows Subsystem for Linux, if not already installed.

  2. Follow the procedure to install OCRmyPDF on Ubuntu 20.04.

  3. Open the Windows command prompt and create a symlink:

wsl sudo ln -s /home/$USER/.local/bin/ocrmypdf /usr/local/bin/ocrmypdf

Then confirm that the latest released version of OCRmyPDF from PyPI is installed:

wsl ocrmypdf --version

You can then run OCRmyPDF from the Windows command prompt or PowerShell by prefixing the command with wsl, and you can call it the same way from Windows programs or batch files.
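Calling the WSL installation from a Windows program might look roughly like the sketch below, which runs from a Windows-side Python. The helper names and the file names are hypothetical. It assumes the files are given as paths relative to the current directory, since Windows-style absolute paths such as C:\... are not valid inside WSL.

```python
import subprocess


def build_wsl_command(input_pdf, output_pdf):
    """Build the argument list for running ocrmypdf through the wsl launcher."""
    return ["wsl", "ocrmypdf", input_pdf, output_pdf]


def run_ocr(input_pdf, output_pdf):
    """Run OCRmyPDF inside WSL and return its exit code (0 means success)."""
    completed = subprocess.run(build_wsl_command(input_pdf, output_pdf))
    return completed.returncode


# Example call (hypothetical file names, relative to the current directory):
# exit_code = run_ocr("scan.pdf", "scan_ocr.pdf")
```

Passing the arguments as a list avoids quoting problems with file names that contain spaces.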

 
