* Trying to build OCRmyPDF under Cygwin, hit a brick wall
@ 2020-05-14 19:45 Jim Garrison
  2020-05-14 22:50 ` Jim Garrison
  0 siblings, 1 reply; 7+ messages in thread
From: Jim Garrison @ 2020-05-14 19:45 UTC (permalink / raw)
  To: cygwin

I'm trying to build OCRmyPDF under Cygwin and have run into a brick
wall. While I've been a developer my entire career, I've worked mostly
in Java and have little knowledge of Python internals and C++. The
problem might be obvious to an expert in these areas but I'm stumped.

OCRmyPDF on Linux installs as a set of "wheel" packages. I gather a
wheel is a pre-built bundle of dependencies. For some reason, under
Cygwin the pip installer believes it cannot use the wheel bundles and
wants to rebuild from source. The problem occurs when trying to rebuild
the pikepdf package.

This post is preliminary. Including the error message here would be
hard to read as it contains very long lines that will wrap and alignment
will be messed up. I've posted the question to StackOverflow at

https://stackoverflow.com/q/61803714/18157

where I can use markdown to make it much more readable.

Is it OK to ask here for any interested party to look at the SO post?
If not, I apologize, and I will post the question in its entirety here.

Thanks
-- 
Jim Garrison
jhg@acm.org

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Trying to build OCRmyPDF under Cygwin, hit a brick wall
  2020-05-14 19:45 Trying to build OCRmyPDF under Cygwin, hit a brick wall Jim Garrison
@ 2020-05-14 22:50 ` Jim Garrison
  2020-05-15  0:23   ` René Berber
  2020-05-15  1:45   ` Marco Atzeri
  0 siblings, 2 replies; 7+ messages in thread
From: Jim Garrison @ 2020-05-14 22:50 UTC (permalink / raw)
  To: cygwin

The magic incantation necessary to get strdup turns out to be
-D_GNU_SOURCE, as noted on StackOverflow.

However, now I'm encountering a problem with Python's DLL handling
code. When attempting to run OCRmyPDF I get

    $ ocrmypdf --help
    Traceback (most recent call last):
      File "/usr/bin/ocrmypdf", line 11, in <module>
        load_entry_point('ocrmypdf==9.8.0.post3+g5944044.d20200514', 'console_scripts', 'ocrmypdf')()
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
        return get_distribution(dist).load_entry_point(group, name)
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2852, in load_entry_point
        return ep.load()
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2443, in load
        return self.resolve()
      File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2449, in resolve
        module = __import__(self.module_name, fromlist=['__name__'], level=0)
      File "/usr/lib/python3.7/site-packages/ocrmypdf-9.8.0.post3+g5944044.d20200514-py3.7.egg/ocrmypdf/__init__.py", line 18, in <module>
        from . import helpers, hocrtransform, leptonica, pdfa, pdfinfo
      File "/usr/lib/python3.7/site-packages/ocrmypdf-9.8.0.post3+g5944044.d20200514-py3.7.egg/ocrmypdf/leptonica.py", line 67, in <module>
        """
    ocrmypdf.exceptions.MissingDependencyError:
    ---------------------------------------------------------------------
    This error normally occurs when ocrmypdf can't find the Leptonica
    library, which is usually installed with Tesseract OCR. It could be that
    Tesseract is not installed properly, we can't find the installation
    on your system PATH environment variable.

    The library we are looking for is usually called:
        liblept-5.dll   (Windows)
        liblept*.dylib  (macOS)
        liblept*.so     (Linux/BSD)

    Please review our installation procedures to find a solution:
        https://ocrmypdf.readthedocs.io/en/latest/installation.html
    ---------------------------------------------------------------------

In the last file of the traceback (leptonica.py) there's this:

    from ctypes.util import find_library
    ...
    if os.name == 'nt':
        libname = 'liblept-5'
        os.environ['PATH'] = shim_paths_with_program_files()
    else:
        libname = 'lept'

In Cygwin, that library is /usr/bin/cyglept-5.dll (why was the name
changed?)

First I created a symlink from cyglept-5.dll to liblept-5.dll, with no
effect. So I added a test for Cygwin at that point, resulting in this
code:

    if os.name == 'nt':
        libname = 'liblept-5'
        os.environ['PATH'] = shim_paths_with_program_files()
    elif sys.platform == 'cygwin':
        libname = 'cyglept-5'
    else:
        libname = 'lept'

This also had no effect, so I tried playing with find_library() in the
interactive shell. In Cygwin, it doesn't seem to find any DLLs even
though those DLLs are actually loadable. Viz:

    $ python3
    Python 3.7.7 (default, Apr 10 2020, 07:59:19)
    [GCC 9.3.0] on cygwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> import sys
    >>> from ctypes import *
    >>> from ctypes.util import find_library
    >>> find_library('cyglept-5') or 'Not found'
    'Not found'
    >>> find_library('cyglept-5.dll') or 'Not Found'
    'Not Found'
    >>> cdll.LoadLibrary('cyglept-5.dll') or 'Not Found'
    <CDLL 'cyglept-5.dll', handle 3f7970000 at 0x6fffffea76d0>

So it appears to me that possibly find_library() is broken because
it doesn't find the library, but yet Python can actually load the
library.

What am I missing?
-- 
Jim Garrison
jhg@acm.org

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Trying to build OCRmyPDF under Cygwin, hit a brick wall
  2020-05-14 22:50 ` Jim Garrison
@ 2020-05-15  0:23   ` René Berber
  2020-05-15 18:17     ` Jim Garrison
  2020-05-15  1:45   ` Marco Atzeri
  1 sibling, 1 reply; 7+ messages in thread
From: René Berber @ 2020-05-15 0:23 UTC (permalink / raw)
  To: cygwin

On 5/14/2020 5:50 PM, Jim Garrison via Cygwin wrote:

> The magic incantation necessary to get strdup turns out to be
> -D_GNU_SOURCE, as noted on StackOverflow.
>
> However, now I'm encountering a problem with Python's DLL handling
> code. When attempting to run OCRmyPDF I get
[snip]
> line 18, in <module>
>     from . import helpers, hocrtransform, leptonica, pdfa, pdfinfo
>   File
> "/usr/lib/python3.7/site-packages/ocrmypdf-9.8.0.post3+g5944044.d20200514-py3.7.egg/ocrmypdf/leptonica.py",
> line 67, in <module>
>     """
> ocrmypdf.exceptions.MissingDependencyError:
[snip]
> In the last file of the traceback (leptonica.py) there's this:
>
>     from ctypes.util import find_library
>     ...
>     if os.name == 'nt':
>         libname = 'liblept-5'
>         os.environ['PATH'] = shim_paths_with_program_files()
>     else:
>         libname = 'lept'
>
> In Cygwin, that library is /usr/bin/cyglept-5.dll (why was the name
> changed?)
>
> First I created a symlink from cyglept-5.dll to liblept-5.dll, with no
> effect. So I added a test for Cygwin at that point, resulting in this
> code:
>
>     if os.name == 'nt':
>         libname = 'liblept-5'
>         os.environ['PATH'] = shim_paths_with_program_files()

Notice this change in search path: DLL files in Windows are executables
and they are (must be) installed in the system PATH (or the current
directory).

>     elif sys.platform == 'cygwin':
>         libname = 'cyglept-5'

On Cygwin you can do the same as above; the PATH will contain /bin (or
/usr/bin, which are one and the same).

>     else:
>         libname = 'lept'
>
> This also had no effect, so I tried playing with find_library() in the
> interactive shell. In Cygwin, it doesn't seem to find any DLLs even
> though those DLLs are actually loadable. Viz:
[snip]

My guess is the search path is incorrect.

Either that or python needs the symbols file, like the linker, which in
this case would be /usr/lib/liblept.dll.a, which is in the -devel
package, but I doubt it.
-- 
R.Berber

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Trying to build OCRmyPDF under Cygwin, hit a brick wall
  2020-05-15  0:23   ` René Berber
@ 2020-05-15 18:17     ` Jim Garrison
  0 siblings, 0 replies; 7+ messages in thread
From: Jim Garrison @ 2020-05-15 18:17 UTC (permalink / raw)
  To: cygwin

On 5/14/2020 5:23 PM, René Berber via Cygwin wrote:
[snip]
>>     if os.name == 'nt':
>>         libname = 'liblept-5'
>>         os.environ['PATH'] = shim_paths_with_program_files()
>
> Notice this change in search path: DLL files in Windows are executables
> and they are (must be) installed in the system PATH (or the current
> directory).

$PATH contains /usr/bin

> Either that or python needs the symbols file, like the linker, which in
> this case would be /usr/lib/liblept.dll.a, which is in the -devel
> package, but I doubt it.

Installing the -devel package, and undoing my "fix" to leptonica.py,
fixed the problem. I guess I'm now going to have to learn how to
build a Cygwin package for it.

Unfortunately I didn't realize how involved building it on Cygwin was
going to be and that it would be worth documenting, so I didn't keep
track of everything I had to do (sigh!)

How would I gauge interest in having a Cygwin version of OCRmyPDF?
-- 
Jim Garrison
jhg@acm.org

^ permalink raw reply [flat|nested] 7+ messages in thread
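[Editorial note] The Python documentation says that on Linux
find_library() runs external programs (/sbin/ldconfig, gcc, objdump and
ld) to locate a library, so it behaves like a link-time lookup rather
than a run-time one. If Cygwin's Python takes a similar route (an
assumption; nothing in this thread confirms the exact mechanism), that
would fit the observation that liblept.dll.a from the -devel package
makes the lookup succeed. For code that should also work without the
-devel package, one option is to fall back to loading the DLL by name,
which the interactive session earlier in the thread showed does work.
A minimal sketch; load_library and cygwin_dll_name are hypothetical
helpers, not part of OCRmyPDF or ctypes:

```python
import ctypes
from ctypes.util import find_library


def cygwin_dll_name(base, version):
    """Build a Cygwin-style DLL name, e.g. ('lept', 5) -> 'cyglept-5.dll'.

    Hypothetical helper: it only encodes the 'cyg' prefix convention
    mentioned in the thread.
    """
    return 'cyg%s-%s.dll' % (base, version)


def load_library(libname, fallback_names=()):
    """Try find_library() first, then each fallback name in order.

    Returns a ctypes.CDLL handle, or None if nothing could be loaded.
    """
    candidates = []
    found = find_library(libname)  # may be None if the lookup fails
    if found:
        candidates.append(found)
    candidates.extend(fallback_names)

    for name in candidates:
        try:
            return ctypes.CDLL(name)
        except OSError:
            # Not loadable under this name; try the next candidate.
            continue
    return None
```

On Cygwin this could be called as
load_library('lept', [cygwin_dll_name('lept', 5)]); the caller still has
to handle a None result, for instance by raising the same
MissingDependencyError that OCRmyPDF already uses.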
* Re: Trying to build OCRmyPDF under Cygwin, hit a brick wall
  2020-05-14 22:50 ` Jim Garrison
  2020-05-15  0:23   ` René Berber
@ 2020-05-15  1:45   ` Marco Atzeri
  2020-05-15 18:19     ` Jim Garrison
  1 sibling, 1 reply; 7+ messages in thread
From: Marco Atzeri @ 2020-05-15 1:45 UTC (permalink / raw)
  To: cygwin

On 15.05.2020 at 00:50, Jim Garrison via Cygwin wrote:
> The magic incantation necessary to get strdup turns out to be
> -D_GNU_SOURCE, as noted on StackOverflow.
>
> However, now I'm encountering a problem with Python's DLL handling
> code. When attempting to run OCRmyPDF I get
>
>     $ ocrmypdf --help
>     Traceback (most recent call last):
>       File "/usr/bin/ocrmypdf", line 11, in <module>
>         load_entry_point('ocrmypdf==9.8.0.post3+g5944044.d20200514',
> 'console_scripts', 'ocrmypdf')()
>       File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py",
> line 489, in load_entry_point
>         return get_distribution(dist).load_entry_point(group, name)
>       File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py",
> line 2852, in load_entry_point
>         return ep.load()
>       File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py",
> line 2443, in load
>         return self.resolve()
>       File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py",
> line 2449, in resolve
>         module = __import__(self.module_name, fromlist=['__name__'], level=0)
>       File
> "/usr/lib/python3.7/site-packages/ocrmypdf-9.8.0.post3+g5944044.d20200514-py3.7.egg/ocrmypdf/__init__.py",
> line 18, in <module>
>         from . import helpers, hocrtransform, leptonica, pdfa, pdfinfo
>       File
> "/usr/lib/python3.7/site-packages/ocrmypdf-9.8.0.post3+g5944044.d20200514-py3.7.egg/ocrmypdf/leptonica.py",
> line 67, in <module>
>         """
>     ocrmypdf.exceptions.MissingDependencyError:
>
> ---------------------------------------------------------------------
> This error normally occurs when ocrmypdf can't find the Leptonica
> library, which is usually installed with Tesseract OCR. It could
> be that
> Tesseract is not installed properly, we can't find the installation
> on your system PATH environment variable.
>
> The library we are looking for is usually called:
>     liblept-5.dll   (Windows)
>     liblept*.dylib  (macOS)
>     liblept*.so     (Linux/BSD)
>
> Please review our installation procedures to find a solution:
>     https://ocrmypdf.readthedocs.io/en/latest/installation.html
>
> ---------------------------------------------------------------------
>
> In the last file of the traceback (leptonica.py) there's this:
>
>     from ctypes.util import find_library
>     ...
>     if os.name == 'nt':
>         libname = 'liblept-5'
>         os.environ['PATH'] = shim_paths_with_program_files()
>     else:
>         libname = 'lept'
>
> In Cygwin, that library is /usr/bin/cyglept-5.dll (why was the name
> changed?)

That is standard on Cygwin, to differentiate from a MinGW build:
https://cygwin.com/cygwin-ug-net/dll.html

>
> First I created a symlink from cyglept-5.dll to liblept-5.dll, with no
> effect. So I added a test for Cygwin at that point, resulting in this
> code:
>
>     if os.name == 'nt':
>         libname = 'liblept-5'
>         os.environ['PATH'] = shim_paths_with_program_files()
>     elif sys.platform == 'cygwin':
>         libname = 'cyglept-5'
>     else:
>         libname = 'lept'
>
> This also had no effect, so I tried playing with find_library() in the
> interactive shell. In Cygwin, it doesn't seem to find any DLLs even
> though those DLLs are actually loadable. Viz:
>
>     $ python3
>     Python 3.7.7 (default, Apr 10 2020, 07:59:19)
>     [GCC 9.3.0] on cygwin
>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> import os
>     >>> import sys
>     >>> from ctypes import *
>     >>> from ctypes.util import find_library
>     >>> find_library('cyglept-5') or 'Not found'
>     'Not found'
>     >>> find_library('cyglept-5.dll') or 'Not Found'
>     'Not Found'
>     >>> cdll.LoadLibrary('cyglept-5.dll') or 'Not Found'
>     <CDLL 'cyglept-5.dll', handle 3f7970000 at 0x6fffffea76d0>
>
> So it appears to me that possibly find_library() is broken because
> it doesn't find the library, but yet Python can actually load the
> library.
>
> What am I missing?
>

Where are you looking?
Have you installed libleptonica_5 and libleptonica-devel?

    $ cygcheck -cd |grep lept
    leptonica            1.79.0-1
    libleptonica-devel   1.79.0-1
    libleptonica_5       1.79.0-1

^ permalink raw reply [flat|nested] 7+ messages in thread
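[Editorial note] The package check above can also be scripted. The
sketch below parses `cygcheck -cd` style output and reports which
required packages are absent. It assumes package rows have exactly two
whitespace-separated columns (name and version), as in the listing in
this message; parse_cygcheck, missing_packages and cygcheck_output are
hypothetical names introduced only for illustration.

```python
import subprocess


def parse_cygcheck(output):
    """Map package name -> version from `cygcheck -cd` style text.

    Assumes package rows have exactly two columns; the 'Package Version'
    header row and any other lines are skipped.
    """
    packages = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] != 'Package':
            packages[fields[0]] = fields[1]
    return packages


def missing_packages(required, output):
    """Return the names in `required` that do not appear in the listing."""
    installed = parse_cygcheck(output)
    return [name for name in required if name not in installed]


def cygcheck_output():
    """Run `cygcheck -cd` and return its text (only works on Cygwin)."""
    result = subprocess.run(['cygcheck', '-cd'], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return result.stdout
```

On a Cygwin system,
missing_packages(['libleptonica_5', 'libleptonica-devel'], cygcheck_output())
would return an empty list once both packages are installed.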
* Re: Trying to build OCRmyPDF under Cygwin, hit a brick wall
  2020-05-15  1:45   ` Marco Atzeri
@ 2020-05-15 18:19     ` Jim Garrison
  0 siblings, 0 replies; 7+ messages in thread
From: Jim Garrison @ 2020-05-15 18:19 UTC (permalink / raw)
  To: cygwin

On 5/14/2020 6:45 PM, Marco Atzeri via Cygwin wrote:
> On 15.05.2020 at 00:50, Jim Garrison via Cygwin wrote:
>> The magic incantation necessary to get strdup turns out to be
>> -D_GNU_SOURCE, as noted on StackOverflow.
[snip]
>
> Where are you looking?
> Have you installed libleptonica_5 and libleptonica-devel?

I had not installed the -devel package, but after I installed it (and
removed my "fix" to leptonica.py) it worked.

Thanks!
-- 
Jim Garrison
jhg@acm.org

^ permalink raw reply [flat|nested] 7+ messages in thread
[parent not found: <0d9b4a1b-05ba-8ab1-3783-c3d1f04f97b7@gmail.com>]
* Re: Trying to build OCRmyPDF under Cygwin, hit a brick wall
       [not found] <0d9b4a1b-05ba-8ab1-3783-c3d1f04f97b7@gmail.com>
@ 2020-05-15 19:34 ` Marco Atzeri
  0 siblings, 0 replies; 7+ messages in thread
From: Marco Atzeri @ 2020-05-15 19:34 UTC (permalink / raw)
  To: cygwin

On 15.05.2020 20:17, Jim Garrison via Cygwin wrote:
> On 5/14/2020 5:23 PM, René Berber via Cygwin wrote:
> [snip]
>>>     if os.name == 'nt':
>>>         libname = 'liblept-5'
>>>         os.environ['PATH'] = shim_paths_with_program_files()
>>
>> Notice this change in search path: DLL files in Windows are executables
>> and they are (must be) installed in the system PATH (or the current
>> directory).
>
> $PATH contains /usr/bin
>
>> Either that or python needs the symbols file, like the linker, which in
>> this case would be /usr/lib/liblept.dll.a, which is in the -devel
>> package, but I doubt it.
>
> Installing the -devel package, and undoing my "fix" to leptonica.py,
> fixed the problem. I guess I'm now going to have to learn how to
> build a Cygwin package for it.

Using an import library to link against a library is very standard,
not really Cygwin specific.
Headers and import libraries are almost always in the libXXX-devel
package.

If in doubt, you can look at the source package info:
https://cygwin.com/packages/summary/leptonica-src.html

> Unfortunately I didn't realize how involved building it on Cygwin was
> going to be and that it would be worth documenting, so I didn't keep
> track of everything I had to do (sigh!)

Building for Cygwin is a Unix-like exercise.
Maybe you started with a Windows-like view?

>
> How would I gauge interest in having a Cygwin version of OCRmyPDF?
>

From the list at
https://ocrmypdf.readthedocs.io/en/latest/installation.html#installing-with-python-pip

    ghostscript \
    libexempi3 \
    libffi6 \
    pngquant \
    python3.6 \
    qpdf \
    tesseract-ocr \
    unpaper

the only package NOT available on Cygwin is "unpaper".

Regards
Marco

^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-05-15 19:34 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-14 19:45 Trying to build OCRmyPDF under Cygwin, hit a brick wall Jim Garrison
2020-05-14 22:50 ` Jim Garrison
2020-05-15  0:23   ` René Berber
2020-05-15 18:17     ` Jim Garrison
2020-05-15  1:45   ` Marco Atzeri
2020-05-15 18:19     ` Jim Garrison
     [not found] <0d9b4a1b-05ba-8ab1-3783-c3d1f04f97b7@gmail.com>
2020-05-15 19:34 ` Marco Atzeri