From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jim Garrison
Reply-To: jhg@acm.org
To: cygwin@cygwin.com
Subject: Re: Trying to build OCRmyPDF under Cygwin, hit a brick wall
Date: Thu, 14 May 2020 15:50:44 -0700
Message-ID: <2afedbcf-f3d7-4e96-e196-fb3091630245@jhmg.net>
List-Id: General Cygwin discussions and problem reports

The magic incantation necessary to get strdup turns out to be
-D_GNU_SOURCE, as noted on StackOverflow.  However, now I'm
encountering a problem with Python's DLL handling code.

When attempting to run OCRmyPDF I get:

    $ ocrmypdf --help
    Traceback (most recent call last):
      File "/usr/bin/ocrmypdf", line 11, in <module>
        load_entry_point('ocrmypdf==9.8.0.post3+g5944044.d20200514', 'console_scripts', 'ocrmypdf')()
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
        return get_distribution(dist).load_entry_point(group, name)
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2852, in load_entry_point
        return ep.load()
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2443, in load
        return self.resolve()
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2449, in resolve
        module = __import__(self.module_name, fromlist=['__name__'], level=0)
      File "/usr/lib/python3.7/site-packages/ocrmypdf-9.8.0.post3+g5944044.d20200514-py3.7.egg/ocrmypdf/__init__.py", line 18, in <module>
        from . import helpers, hocrtransform, leptonica, pdfa, pdfinfo
      File "/usr/lib/python3.7/site-packages/ocrmypdf-9.8.0.post3+g5944044.d20200514-py3.7.egg/ocrmypdf/leptonica.py", line 67, in <module>
        """
    ocrmypdf.exceptions.MissingDependencyError:
    ---------------------------------------------------------------------
    This error normally occurs when ocrmypdf can't find the Leptonica
    library, which is usually installed with Tesseract OCR. It could be that
    Tesseract is not installed properly, we can't find the installation
    on your system PATH environment variable.

    The library we are looking for is usually called:
        liblept-5.dll   (Windows)
        liblept*.dylib  (macOS)
        liblept*.so     (Linux/BSD)

    Please review our installation procedures to find a solution:
        https://ocrmypdf.readthedocs.io/en/latest/installation.html
    ---------------------------------------------------------------------

In the last file of the traceback (leptonica.py) there's this:

    from ctypes.util import find_library
    ...
    if os.name == 'nt':
        libname = 'liblept-5'
        os.environ['PATH'] = shim_paths_with_program_files()
    else:
        libname = 'lept'

In Cygwin, that library is /usr/bin/cyglept-5.dll (why was the name
changed?)

First I created a symlink from cyglept-5.dll to liblept-5.dll, with no
effect.  So I added a test for Cygwin at that point, resulting in this
code:

    if os.name == 'nt':
        libname = 'liblept-5'
        os.environ['PATH'] = shim_paths_with_program_files()
    elif sys.platform == 'cygwin':
        libname = 'cyglept-5'
    else:
        libname = 'lept'

This also had no effect, so I tried playing with find_library() in the
interactive shell.  In Cygwin, it doesn't seem to find any DLLs, even
though those DLLs are actually loadable.  Viz:

    $ python3
    Python 3.7.7 (default, Apr 10 2020, 07:59:19)
    [GCC 9.3.0] on cygwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> import sys
    >>> from ctypes import *
    >>> from ctypes.util import find_library
    >>> find_library('cyglept-5') or 'Not found'
    'Not found'
    >>> find_library('cyglept-5.dll') or 'Not Found'
    'Not Found'
    >>> cdll.LoadLibrary('cyglept-5.dll') or 'Not Found'

So it appears that find_library() may be broken here: it doesn't find
the library, yet Python can load that same library by name.  What am I
missing?

-- 
Jim Garrison
jhg@acm.org
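
For reference, the lookup-then-load behavior described above can be
sketched as a small standalone script.  This is only a sketch under
stated assumptions, not OCRmyPDF's actual code: candidate_names() and
load_first() are hypothetical helpers, and the idea of falling back to
loading the DLL by its bare file name when find_library() returns None
is an assumption drawn from the interactive session, not a confirmed
fix.

```python
import ctypes
import os
import sys
from ctypes.util import find_library


def candidate_names(os_name=os.name, platform=sys.platform):
    """Return library names to try (hypothetical helper, not OCRmyPDF API)."""
    if os_name == 'nt':
        return ['liblept-5']
    if platform == 'cygwin':
        # Cygwin DLLs carry a 'cyg' prefix; the full file name is the
        # spelling that LoadLibrary accepted in the session above.
        return ['cyglept-5.dll', 'lept']
    return ['lept']


def load_first(names, finder=find_library, loader=ctypes.CDLL):
    """Try find_library() first, then fall back to loading by bare name.

    finder and loader are injectable so the logic can be exercised
    without the real library being installed.
    """
    for name in names:
        located = finder(name)
        for target in (located, name):
            if not target:
                continue  # find_library() returned None for this name
            try:
                return target, loader(target)
            except OSError:
                continue  # this spelling is not loadable; try the next
    raise OSError('none of %r could be loaded' % (names,))
```

On the Cygwin box, the intended call would be
load_first(candidate_names()); whether that is enough to get OCRmyPDF
past the MissingDependencyError is untested.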