From: Jim Garrison <jhg@jhmg.net>
To: cygwin@cygwin.com
Subject: Volunteer testers for OCRmyPDF install instructions under Cygwin?
Date: Sun, 17 May 2020 21:44:41 -0700
Message-ID: <d8f6cb30-ebeb-fcbe-d15e-b34702a65359@jhmg.net>

OCRmyPDF is a command-line utility that will take image-only PDFs,
perform OCR and add a text layer to the PDF, allowing it to be
searched.  It is written in Python and C++, and on Linux is installed
via the Python 'pip' installer.

I tried installing it under Cygwin64 but ran into a compiler error
while building a dependency, pikepdf.  This turned out to be fixable
by a single compiler-flag change (from -std=c++14 to -std=gnu++14),
which the maintainer of pikepdf (and OCRmyPDF) graciously fast-tracked.
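
For anyone curious about the difference between the two flags, here is
a rough sketch.  It relies only on documented GCC behaviour: the strict
-std=c++14 mode predefines __STRICT_ANSI__, while -std=gnu++14 does
not, and system headers may hide non-ISO declarations when that macro
is set.  Whether this is what caused the pikepdf error is an
assumption; the message above does not say.  The function name
has_strict_ansi is made up for illustration.

```shell
# Sketch: report whether a given -std= mode predefines __STRICT_ANSI__.
# CXX defaults to g++; override it to test another compiler driver.
has_strict_ansi() {
    "${CXX:-g++}" "-std=$1" -dM -E -x c++ /dev/null | grep -q '__STRICT_ANSI__'
}

# Example use (needs gcc-g++ installed):
#   has_strict_ansi c++14   && echo "c++14: strict ISO mode"
#   has_strict_ansi gnu++14 || echo "gnu++14: GNU extensions enabled"
```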

The instructions for installing under Cygwin are:

1. Install the following Cygwin packages:

        python36 (or later)
        python3?-devel
        python3?-pip
        python3?-lxml

     (where 3? means match the version of python3 you installed)

        gcc-g++
        ghostscript
        libexempi3
        libexempi-devel
        libffi6
        libffi-devel
        pngquant
        qpdf
        libqpdf-devel
        tesseract-ocr
        tesseract-ocr-devel
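
If you prefer to install these from the command line rather than
through the setup GUI, something like the sketch below should work.
It only builds the comma-separated list from step 1 for a chosen
Python version.  The function name is made up, and the location of
setup-x86_64.exe is an assumption about your system; -q (unattended
mode) and -P (package list) are standard Cygwin setup options.

```shell
# Sketch: print the step 1 package list for one Python 3 version.
# Argument: the two-digit version suffix, e.g. 36 for python36.
cygwin_ocrmypdf_packages() {
    pyver="${1:-36}"
    printf '%s' "python${pyver},python${pyver}-devel,python${pyver}-pip,python${pyver}-lxml"
    # printf reuses the format for each remaining argument
    printf ',%s' gcc-g++ ghostscript libexempi3 libexempi-devel \
        libffi6 libffi-devel pngquant qpdf libqpdf-devel \
        tesseract-ocr tesseract-ocr-devel
    printf '\n'
}

# Example, run from the directory holding the Cygwin installer:
#   ./setup-x86_64.exe -q -P "$(cygwin_ocrmypdf_packages 37)"
```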

2. In a terminal, run the following commands:

        pip3 install wheel
        pip3 install ocrmypdf

    Note: You may get a warning that the version of pip that came
    with Cygwin is out of date.  Updating is not required, but if you
    want to, you can update pip to the latest version with

        pip3 install --upgrade pip

    Be aware that after the upgrade the command name will be just
    'pip' instead of 'pip3'.
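
One way to sidestep the pip/pip3 naming question is to run pip as a
module of the interpreter, which works whatever the wrapper script is
called.  A minimal sketch, assuming python3 is on your PATH; the
function name and the PYTHON override are only there for illustration:

```shell
# Sketch: run the step 2 installs via "python3 -m pip".
# Set PYTHON to use a different interpreter name (e.g. python3.8).
install_ocrmypdf() {
    python="${PYTHON:-python3}"
    "$python" -m pip install wheel &&
    "$python" -m pip install ocrmypdf
}

# Example:
#   install_ocrmypdf && ocrmypdf --version
```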

There is one optional dependency, "unpaper", that is currently not
available under Cygwin.  Without it, certain options such as --clean
will produce an error message.  However, the OCR-to-text-layer
functionality is available.  I'll take a look at building a Cygwin
version of unpaper.
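
Until then, a script that should run both with and without unpaper
can check for it first and pass --clean only when it is found.  A
rough sketch; the function name and the input/output file names are
placeholders:

```shell
# Sketch: print "--clean" only if an unpaper executable is on PATH.
ocr_clean_flag() {
    if command -v unpaper >/dev/null 2>&1; then
        printf '%s\n' "--clean"
    fi
}

# Example (left unquoted on purpose so an empty result adds no argument):
#   ocrmypdf $(ocr_clean_flag) input.pdf output.pdf
```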

I've tried this in a clean, minimal Cygwin install but would like to
get confirmation from a few other people before submitting this to the
OCRmyPDF maintainer for inclusion in their install instructions.

Is there anyone with interest in OCRmyPDF willing to try these
instructions and report back?  Off-list is fine if that would be off-
topic here.

Thanks

--
Jim Garrison jhg@acm.org
