public inbox for cygwin@cygwin.com
* Re: Trying to build OCRmyPDF under Cygwin, hit a brick wall
From: Marco Atzeri @ 2020-05-15 19:34 UTC
  To: cygwin


On 15.05.2020 20:17, Jim Garrison via Cygwin wrote:
> On 5/14/2020 5:23 PM, René Berber via Cygwin wrote:
> [snip]
>>> if os.name == 'nt':
>>>       libname = 'liblept-5'
>>>       os.environ['PATH'] = shim_paths_with_program_files()
>>
>> Notice this change in the search path: DLL files on Windows are
>> executables, and they are (must be) installed in the system PATH (or
>> the current directory).
> 
> $PATH contains /usr/bin
> 
>> Either that, or Python needs the symbols file, like the linker, which in
>> this case would be /usr/lib/liblept.dll.a, which is in the -devel
>> package, but I doubt it.
> 
> Installing the -devel package, and undoing my "fix" to leptonica.py
> fixed the problem.  I guess I'm now going to have to learn how to
> build a Cygwin package for it.

Using an import library to link against a library is standard practice,
not really Cygwin-specific.

Headers and import libraries are almost always in the libXXX-devel package.
When in doubt, you can look at the source package info:

https://cygwin.com/packages/summary/leptonica-src.html

> Unfortunately I didn't realize how involved building it on Cygwin was
> going to be and that it would be worth documenting, so I didn't keep
> track of everything I had to do (sigh!)

Building for Cygwin is a Unix-like exercise.
Maybe you started with a Windows-like view?

> 
> How would I gauge interest in having a Cygwin version of OCRmyPDF?
> 

From the list at
https://ocrmypdf.readthedocs.io/en/latest/installation.html#installing-with-python-pip

     ghostscript \
     libexempi3 \
     libffi6 \
     pngquant \
     python3.6 \
     qpdf \
     tesseract-ocr \
     unpaper


The only package NOT available on Cygwin is "unpaper".
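A hedged sketch (not part of OCRmyPDF) for checking which of the command-line tools from that list are reachable on PATH. It assumes the usual executable names: ghostscript installs `gs` and tesseract-ocr installs `tesseract`. Library entries such as libexempi3 and libffi6 are not executables and are not covered; the helper name `missing_tools` is hypothetical.

```python
import shutil

# Executables assumed to be provided by the packages listed above.
TOOLS = ("gs", "qpdf", "pngquant", "tesseract", "unpaper")


def missing_tools(tools=TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH.

    `which` is injectable so the check can be exercised without the
    tools being installed.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    print(missing_tools() or "all tools found")
```

On a Cygwin install matching Marco's note, the expected result would be a list containing only `unpaper`.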

Regards
Marco




* Re: Trying to build OCRmyPDF under Cygwin, hit a brick wall
From: Jim Garrison @ 2020-05-15 18:19 UTC
  To: cygwin

On 5/14/2020 6:45 PM, Marco Atzeri via Cygwin wrote:
> On 15.05.2020 at 00:50, Jim Garrison via Cygwin wrote:
>> The magic incantation necessary to get strdup turns out to be
>> -D_GNU_SOURCE, as noted on StackOverflow.
[snip]
> 
> Where are you looking?
> Have you installed libleptonica_5 and libleptonica-devel?

I had not installed the -devel package, but after I installed
it (and removed my "fix" to leptonica.py) it worked.  Thanks!


-- 
Jim Garrison jhg@acm.org


* Re: Trying to build OCRmyPDF under Cygwin, hit a brick wall
From: Jim Garrison @ 2020-05-15 18:17 UTC
  To: cygwin

On 5/14/2020 5:23 PM, René Berber via Cygwin wrote:
[snip]
>> if os.name == 'nt':
>>      libname = 'liblept-5'
>>      os.environ['PATH'] = shim_paths_with_program_files()
> 
> Notice this change in the search path: DLL files on Windows are
> executables, and they are (must be) installed in the system PATH (or
> the current directory).

$PATH contains /usr/bin

> Either that, or Python needs the symbols file, like the linker, which in
> this case would be /usr/lib/liblept.dll.a, which is in the -devel
> package, but I doubt it.

Installing the -devel package, and undoing my "fix" to leptonica.py
fixed the problem.  I guess I'm now going to have to learn how to
build a Cygwin package for it.

Unfortunately I didn't realize how involved building it on Cygwin was
going to be and that it would be worth documenting, so I didn't keep
track of everything I had to do (sigh!)

How would I gauge interest in having a Cygwin version of OCRmyPDF?

-- 
Jim Garrison jhg@acm.org


* Re: Trying to build OCRmyPDF under Cygwin, hit a brick wall
From: Marco Atzeri @ 2020-05-15  1:45 UTC
  To: cygwin

On 15.05.2020 at 00:50, Jim Garrison via Cygwin wrote:
> The magic incantation necessary to get strdup turns out to be
> -D_GNU_SOURCE, as noted on StackOverflow.
> 
> However, now I'm encountering a problem with Python's DLL handling
> code.  When attempting to run OCRmyPDF I get
> [snip]
> 
> In the last file of the traceback (leptonica.py) there's this:
> 
> 
> from ctypes.util import find_library
> ...
> if os.name == 'nt':
>      libname = 'liblept-5'
>      os.environ['PATH'] = shim_paths_with_program_files()
> else:
>      libname = 'lept'
> 
> 
> In Cygwin, that library is /usr/bin/cyglept-5.dll (why was the name
> changed?)

It is standard on Cygwin, to differentiate it from the MinGW build:
https://cygwin.com/cygwin-ug-net/dll.html
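A small sketch of that naming convention: Cygwin runtime DLLs carry a `cyg` prefix, MinGW/native Windows builds keep `lib`, and the import library used at link time is `lib<name>.dll.a` (the thread names /usr/lib/liblept.dll.a). The function name, the dictionary keys and the Linux soname pattern are illustrative assumptions, not taken from any package tool.

```python
def library_filenames(base, abi):
    """Map a library base name and ABI number to typical file names.

    base: name without prefix or suffix, e.g. "lept"
    abi:  ABI version number, e.g. 5
    """
    return {
        # Cygwin: runtime DLL (lives in /usr/bin)
        "cygwin-dll": f"cyg{base}-{abi}.dll",
        # Cygwin: import library for the linker (lives in /usr/lib)
        "cygwin-import-lib": f"lib{base}.dll.a",
        # MinGW / native Windows builds keep the "lib" prefix
        "mingw-dll": f"lib{base}-{abi}.dll",
        # Typical ELF shared-object name on Linux/BSD
        "linux-so": f"lib{base}.so.{abi}",
    }


if __name__ == "__main__":
    for kind, name in library_filenames("lept", 5).items():
        print(f"{kind:18} {name}")
```

The distinct prefix is what lets a Cygwin DLL and a MinGW DLL of the same library coexist on one PATH.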
> 
> First I created a symlink from cyglept-5.dll to liblept-5.dll, with no
> effect. So I added a test for Cygwin at that point, resulting in this
> code:
> 
> 
> if os.name == 'nt':
>      libname = 'liblept-5'
>      os.environ['PATH'] = shim_paths_with_program_files()
> elif sys.platform == 'cygwin':
>      libname = 'cyglept-5'
> else:
>      libname = 'lept'
> 
> 
> This also had no effect, so I tried playing with find_library() in the
> interactive shell.  In Cygwin, it doesn't seem to find any DLLs even
> though those DLLs are actually loadable.  Viz:
> 
> 
> $ python3
> Python 3.7.7 (default, Apr 10 2020, 07:59:19)
> [GCC 9.3.0] on cygwin
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import os
>>>> import sys
>>>> from ctypes import *
>>>> from ctypes.util import find_library
>>>> find_library('cyglept-5') or 'Not found'
> 'Not found'
>>>> find_library('cyglept-5.dll') or 'Not Found'
> 'Not Found'
>>>> cdll.LoadLibrary('cyglept-5.dll') or 'Not Found'
> <CDLL 'cyglept-5.dll', handle 3f7970000 at 0x6fffffea76d0>
> 
> 
> So it appears to me that find_library() may be broken: it doesn't
> find the library, yet Python can actually load it.
> 
> What am I missing?
> 

Where are you looking?
Have you installed libleptonica_5 and libleptonica-devel?


$ cygcheck -cd |grep lept
leptonica                               1.79.0-1
libleptonica-devel                      1.79.0-1
libleptonica_5                          1.79.0-1
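To make the same check from a script, here is a hedged sketch that parses output in the format shown above and reports which of the two packages Marco asks about are missing. The helper names are hypothetical, and skipping a "Package Version" header line is an assumption about the unfiltered `cygcheck -cd` output.

```python
def parse_cygcheck(output):
    """Parse `cygcheck -cd`-style output into {package: version}.

    Lines that are not exactly "<name> <version>" are skipped, as is
    a "Package Version" column header if one is present.
    """
    packages = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 2 or fields[0] == "Package":
            continue
        name, version = fields
        packages[name] = version
    return packages


# The two packages asked about in this message.
REQUIRED = ("libleptonica_5", "libleptonica-devel")


def missing_packages(installed, required=REQUIRED):
    """Return required package names absent from `installed`."""
    return [name for name in required if name not in installed]


if __name__ == "__main__":
    # Sample copied from the cygcheck output quoted above.
    sample = """\
leptonica                               1.79.0-1
libleptonica-devel                      1.79.0-1
libleptonica_5                          1.79.0-1
"""
    print(missing_packages(parse_cygcheck(sample)) or "all required packages installed")
```

On a real Cygwin system the input could come from `subprocess.run(["cygcheck", "-cd"], capture_output=True, text=True).stdout`.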






* Re: Trying to build OCRmyPDF under Cygwin, hit a brick wall
From: René Berber @ 2020-05-15  0:23 UTC
  To: cygwin

On 5/14/2020 5:50 PM, Jim Garrison via Cygwin wrote:

> The magic incantation necessary to get strdup turns out to be
> -D_GNU_SOURCE, as noted on StackOverflow.
> 
> However, now I'm encountering a problem with Python's DLL handling
> code.  When attempting to run OCRmyPDF I get
[snip]
> line 18, in <module>
>      from . import helpers, hocrtransform, leptonica, pdfa, pdfinfo
>    File
> "/usr/lib/python3.7/site-packages/ocrmypdf-9.8.0.post3+g5944044.d20200514-py3.7.egg/ocrmypdf/leptonica.py",
> line 67, in <module>
>      """
> ocrmypdf.exceptions.MissingDependencyError:
[snip]

> In the last file of the traceback (leptonica.py) there's this:
> 
> 
> from ctypes.util import find_library
> ...
> if os.name == 'nt':
>      libname = 'liblept-5'
>      os.environ['PATH'] = shim_paths_with_program_files()
> else:
>      libname = 'lept'
> 
> 
> In Cygwin, that library is /usr/bin/cyglept-5.dll (why was the name
> changed?)
> 
> First I created a symlink from cyglept-5.dll to liblept-5.dll, with no
> effect. So I added a test for Cygwin at that point, resulting in this
> code:
> 
> 
> if os.name == 'nt':
>      libname = 'liblept-5'
>      os.environ['PATH'] = shim_paths_with_program_files()

Notice this change in the search path: DLL files on Windows are executables,
and they are (must be) installed in the system PATH (or the current directory).

> elif sys.platform == 'cygwin':
>      libname = 'cyglept-5'

On Cygwin you can do the same as above; PATH will contain /bin (or
/usr/bin, which are one and the same).
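A hedged diagnostic sketch for the search-path point: list every PATH directory that actually contains a given DLL. This is a simplification. The real Windows DLL search order also checks the application directory and system directories, which this helper ignores, and the name `find_on_path` is hypothetical.

```python
import os


def find_on_path(filename, path=None):
    """Return every directory on PATH that contains `filename`.

    Only the PATH entries are searched; the application and system
    directories that Windows also consults are not.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        # Empty entries (e.g. from "::") are skipped rather than
        # treated as the current directory.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    print(find_on_path("cyglept-5.dll") or "not found on PATH")
```

On Cygwin, `os.pathsep` is `:` and PATH entries are POSIX paths, so /usr/bin should be reported if the runtime package is installed.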

> else:
>      libname = 'lept'
> 
> 
> This also had no effect, so I tried playing with find_library() in the
> interactive shell.  In Cygwin, it doesn't seem to find any DLLs even
> though those DLLs are actually loadable.  Viz:
[snip]

My guess is the search path is incorrect.

Either that, or Python needs the symbols file, like the linker, which in
this case would be /usr/lib/liblept.dll.a, which is in the -devel
package, but I doubt it.
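A quick way to test this second hypothesis is to check for both files. The sketch below assumes the default Cygwin layout discussed in the thread: the runtime DLL /usr/bin/cyglept-5.dll (assumed to come from libleptonica_5) and the import library /usr/lib/liblept.dll.a (from the -devel package). The `root` parameter and the helper name are hypothetical conveniences for testing.

```python
import os


def leptonica_files(root="/", abi=5):
    """Report which of the two Cygwin Leptonica files are present.

    Assumed layout:
      <root>/usr/bin/cyglept-<abi>.dll  runtime DLL
      <root>/usr/lib/liblept.dll.a      import library (-devel package)
    """
    dll = os.path.join(root, "usr", "bin", f"cyglept-{abi}.dll")
    implib = os.path.join(root, "usr", "lib", "liblept.dll.a")
    return {"runtime_dll": os.path.isfile(dll), "import_lib": os.path.isfile(implib)}


if __name__ == "__main__":
    for kind, present in leptonica_files().items():
        print(f"{kind:12} {'present' if present else 'MISSING'}")
```

A result of runtime DLL present but import library missing matches the situation in this thread, which was resolved by installing the -devel package.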
-- 
R.Berber




* Re: Trying to build OCRmyPDF under Cygwin, hit a brick wall
From: Jim Garrison @ 2020-05-14 22:50 UTC
  To: cygwin

The magic incantation necessary to get strdup turns out to be
-D_GNU_SOURCE, as noted on StackOverflow.
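A minimal sketch, not from the thread, of passing that define through pip: when pip builds a C/C++ extension such as pikepdf from source, distutils/setuptools append the CFLAGS environment variable to the compiler command line. The helper names are hypothetical, and the pip command is only assembled here, not executed.

```python
import os
import sys


def build_env(extra_cflags="-D_GNU_SOURCE", base=None):
    """Return a copy of the environment with extra_cflags appended to CFLAGS."""
    env = dict(os.environ if base is None else base)
    # Keep any flags already set and append ours at the end.
    env["CFLAGS"] = (env.get("CFLAGS", "") + " " + extra_cflags).strip()
    return env


def pip_install_cmd(package):
    # Use the running interpreter's pip so the build targets this Python.
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    env = build_env()
    cmd = pip_install_cmd("pikepdf")
    print("CFLAGS =", env["CFLAGS"])
    print("would run:", " ".join(cmd))
    # To actually build: subprocess.run(cmd, env=env, check=True)
```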

However, now I'm encountering a problem with Python's DLL handling
code.  When attempting to run OCRmyPDF I get



$ ocrmypdf --help
Traceback (most recent call last):
  File "/usr/bin/ocrmypdf", line 11, in <module>
    load_entry_point('ocrmypdf==9.8.0.post3+g5944044.d20200514', 'console_scripts', 'ocrmypdf')()
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2852, in load_entry_point
    return ep.load()
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2443, in load
    return self.resolve()
  File "/usr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2449, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/lib/python3.7/site-packages/ocrmypdf-9.8.0.post3+g5944044.d20200514-py3.7.egg/ocrmypdf/__init__.py", line 18, in <module>
    from . import helpers, hocrtransform, leptonica, pdfa, pdfinfo
  File "/usr/lib/python3.7/site-packages/ocrmypdf-9.8.0.post3+g5944044.d20200514-py3.7.egg/ocrmypdf/leptonica.py", line 67, in <module>
    """
ocrmypdf.exceptions.MissingDependencyError:

---------------------------------------------------------------------
        This error normally occurs when ocrmypdf can't find the Leptonica
        library, which is usually installed with Tesseract OCR. It could be that
        Tesseract is not installed properly, we can't find the installation
        on your system PATH environment variable.

        The library we are looking for is usually called:
            liblept-5.dll   (Windows)
            liblept*.dylib  (macOS)
            liblept*.so     (Linux/BSD)

        Please review our installation procedures to find a solution:
            https://ocrmypdf.readthedocs.io/en/latest/installation.html

---------------------------------------------------------------------


In the last file of the traceback (leptonica.py) there's this:


from ctypes.util import find_library
...
if os.name == 'nt':
    libname = 'liblept-5'
    os.environ['PATH'] = shim_paths_with_program_files()
else:
    libname = 'lept'


In Cygwin, that library is /usr/bin/cyglept-5.dll (why was the name
changed?)

First I created a symlink from cyglept-5.dll to liblept-5.dll, with no
effect. So I added a test for Cygwin at that point, resulting in this
code:


if os.name == 'nt':
    libname = 'liblept-5'
    os.environ['PATH'] = shim_paths_with_program_files()
elif sys.platform == 'cygwin':
    libname = 'cyglept-5'
else:
    libname = 'lept'


This also had no effect, so I tried playing with find_library() in the
interactive shell.  In Cygwin, it doesn't seem to find any DLLs even
though those DLLs are actually loadable.  Viz:


$ python3
Python 3.7.7 (default, Apr 10 2020, 07:59:19)
[GCC 9.3.0] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import sys
>>> from ctypes import *
>>> from ctypes.util import find_library
>>> find_library('cyglept-5') or 'Not found'
'Not found'
>>> find_library('cyglept-5.dll') or 'Not Found'
'Not Found'
>>> cdll.LoadLibrary('cyglept-5.dll') or 'Not Found'
<CDLL 'cyglept-5.dll', handle 3f7970000 at 0x6fffffea76d0>


So it appears to me that find_library() may be broken: it doesn't
find the library, yet Python can actually load it.

What am I missing?
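One relevant detail: the Python documentation says `find_library()` takes the bare library name, without a `lib` prefix, file suffix or version number (the form used with the linker's `-l` option), so `'cyglept-5'` and `'cyglept-5.dll'` are not the names it expects. The fix reported later in this thread was to keep `libname = 'lept'` and install libleptonica-devel. As a hedged sketch (a hypothetical helper, not OCRmyPDF's code), a loader can try `find_library()` first and then fall back to loading a known DLL name directly, which is what succeeded in the session above.

```python
import ctypes
import ctypes.util
import sys


def load_library(base, fallbacks=(), find=ctypes.util.find_library, loader=ctypes.CDLL):
    """Load a shared library by base name, with explicit fallbacks.

    `base` is the bare name find_library() expects (e.g. "lept").
    If find_library() returns None, each fallback file name is passed
    straight to the loader, which relies on the OS search path.
    `find` and `loader` are injectable so the logic can be tested.
    """
    candidates = []
    found = find(base)
    if found:
        candidates.append(found)
    candidates.extend(fallbacks)
    errors = []
    for name in candidates:
        try:
            return loader(name)
        except OSError as exc:
            errors.append(f"{name}: {exc}")
    raise OSError(f"could not load {base!r}; tried: {errors or 'nothing'}")


if __name__ == "__main__":
    extra = ("cyglept-5.dll",) if sys.platform == "cygwin" else ()
    try:
        print(load_library("lept", extra))
    except OSError as exc:
        print(exc)
```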


-- 
Jim Garrison jhg@acm.org


* Trying to build OCRmyPDF under Cygwin, hit a brick wall
From: Jim Garrison @ 2020-05-14 19:45 UTC
  To: cygwin

I'm trying to build OCRmyPDF under Cygwin and have run into a brick
wall. While I've been a developer my entire career, I've worked mostly
in Java and have little knowledge of Python internals and C++. The
problem might be obvious to an expert in these areas but I'm stumped.

OCRmyPDF on Linux installs as a set of "wheel" packages. I gather a
wheel is a pre-built binary package. For some reason, under
Cygwin the pip installer believes it cannot use the wheels and
wants to rebuild from source. The problem occurs when trying to rebuild
the pikepdf package.
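A likely reason pip ignores the wheels: each wheel is tagged for a platform, and pip only installs wheels whose tag matches the running interpreter. The sketch below approximates (it does not copy) how a generic platform tag is derived from `sysconfig.get_platform()` by replacing `-` and `.` with `_`. On Cygwin that string starts with `cygwin-`, while published wheels typically target manylinux, Windows (`win_amd64`) or macOS tags, so pip would fall back to the source distribution. The function name is hypothetical.

```python
import sysconfig


def platform_tag(platform=None):
    """Approximate the generic wheel platform tag for an interpreter.

    Wheel tags replace '-' and '.' in the platform string with '_'.
    """
    if platform is None:
        platform = sysconfig.get_platform()
    return platform.replace("-", "_").replace(".", "_")


if __name__ == "__main__":
    print(sysconfig.get_platform(), "->", platform_tag())
```

For example, `platform_tag("win-amd64")` yields `win_amd64`, and a Cygwin platform string such as `cygwin-3.1.4-x86_64` would become `cygwin_3_1_4_x86_64`, a tag no published pikepdf wheel is likely to carry.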

This post is preliminary. Including the error message here would make it
hard to read, as it contains very long lines that will wrap and mess up
the alignment. I've posted the question to StackOverflow
at https://stackoverflow.com/q/61803714/18157 where I can use markdown
to make it much more readable.

Is it OK to ask here for any interested party to look at the SO post?
If not, I apologize, and I will post the question in its entirety
here.

Thanks

-- 
Jim Garrison jhg@acm.org
