From: Jim Garrison
Subject: Volunteer testers for OCRmyPDF install instructions under Cygwin?
Reply-To: jhg@acm.org
To: cygwin@cygwin.com
Date: Sun, 17 May 2020 21:44:41 -0700

OCRmyPDF is a command-line utility that takes image-only PDFs,
performs OCR, and adds a text layer to the PDF so that it can be
searched. It is written in Python and C++, and on Linux it is
installed via the Python 'pip' installer.

I tried installing it under Cygwin64 but ran into a compiler error
while building a dependency, pikepdf. This turned out to be fixable
by a single CFLAGS change (from -std=c++14 to -std=gnu++14), which
the maintainer of pikepdf (and OCRmyPDF) graciously fast-tracked.

The instructions for installing under Cygwin are:

1. Install the following Cygwin packages:

     python36 (or later)
     python3?-devel
     python3?-pip
     python3?-lxml
       (where 3? means match the version of python3 you installed)
     gcc-g++
     ghostscript
     libexempi3
     libexempi-devel
     libffi6
     libffi-devel
     pngquant
     qpdf
     libqpdf-devel
     tesseract-ocr
     tesseract-ocr-devel

2. In a terminal, run the following commands:

     pip3 install wheel
     pip3 install ocrmypdf

Note: You may get a warning that the version of pip that came with
Cygwin is out of date. Updating is not required, but if you want to,
you can update pip to the latest version with

     pip3 install --upgrade pip

Note that if you do this, the command name will now be just 'pip'
instead of 'pip3'.

There is one optional dependency, "unpaper", that is currently not
available under Cygwin. Without it, certain options such as --clean
will produce an error message. However, the OCR-to-text-layer
functionality is available. I'll take a look at building a Cygwin
version of unpaper.

I've tried this in a clean, minimal Cygwin install but would like to
get confirmation from a few other people before submitting this to
the OCRmyPDF maintainer for inclusion in their install instructions.

Is there anyone with interest in OCRmyPDF willing to try these
instructions and report back? Off-list is fine if that would be
off-topic here.

Thanks

--
Jim Garrison
jhg@acm.org
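
For anyone trying the steps above, a pre-check along the following lines may help confirm that step 1 took effect before running pip. This is only a sketch: check_tools is a hypothetical helper, and the executable names passed to it (python3, pip3, g++, gs, qpdf, tesseract, pngquant) are assumptions about what the listed Cygwin packages put on the PATH, not something stated in the instructions. The -devel and library packages are not covered, since they do not install commands.

```shell
#!/bin/sh
# Sketch: report which expected executables are missing from PATH.
# The tool names used below are assumptions about what the Cygwin
# packages in step 1 install; adjust them to match your system.

# check_tools is a hypothetical helper, not part of Cygwin or OCRmyPDF.
# It prints the space-separated names of any arguments not found on PATH.
check_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space left by the concatenation above.
    printf '%s\n' "${missing# }"
}

missing_tools=$(check_tools python3 pip3 g++ gs qpdf tesseract pngquant)
if [ -n "$missing_tools" ]; then
    echo "Missing from PATH: $missing_tools"
else
    echo "All expected tools were found on PATH."
fi
```

If the script reports nothing missing, the pip3 commands in step 2 are the next thing to try; a missing name most likely points at a package from step 1 that was not selected in the Cygwin installer.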