public inbox for cygwin@cygwin.com
* Help debugging a dll issue
@ 2016-05-20  2:54 Eliot Moss
       [not found] ` <CABHT960Yx_bg-NaHWcxePEV+Xz74NaVtsu+NjkrSZs4-62rCOA@mail.gmail.com>
  2016-05-21 23:30 ` Eliot Moss
  0 siblings, 2 replies; 10+ messages in thread
From: Eliot Moss @ 2016-05-20  2:54 UTC (permalink / raw)
  To: cygwin

Dear Cygwin friends --

I am trying to get pypy to build under cygwin.  (It used to do so, but
has not been maintained.)  I am very close, but there is something quite
odd happening when trying to access the large dll that the system builds:
the first call into that dll goes wild and causes a segfault.  The issue
seems to lie with run-time linking, for I can use dlopen to open the dll
and then dlsym to look up the function, and I get the same bad address.
I see nothing wrong from nm and objdump.  The dll is about 70 million
bytes long, so I can't really post it, but if you want to have a crack
at this, we can find some mutually agreeable place and I can tell you
the entry point I am trying to access.

I have found that if I patch the indirection in the associated .exe file
to refer to the actual address of the function, then the program runs,
so it's just this one linkage that is not working (apparently).  Very
mysterious to me.

Regards -- Eliot Moss

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: Help debugging a dll issue
       [not found] ` <CABHT960Yx_bg-NaHWcxePEV+Xz74NaVtsu+NjkrSZs4-62rCOA@mail.gmail.com>
@ 2016-05-20 10:38   ` Eliot Moss
  2016-05-20 11:26     ` Duncan Roe
  0 siblings, 1 reply; 10+ messages in thread
From: Eliot Moss @ 2016-05-20 10:38 UTC (permalink / raw)
  To: Sam Habiel, cygwin

On 5/19/2016 11:28 PM, Sam Habiel wrote:
> I had trouble with dlopen in Cygwin, where it did not behave intuitively. In my case, I was
> dlopening libicu and friends. If you search using my name on the Cygwin mailing list, you should be
> able to find out how I resolved the issue. I don't recall exactly what I did, but I think it was
> that Cygwin put everything in a global namespace, and you need to dlsym NULL to grab the function
> addresses.

I just tried using NULL for the handle in dlsym, and I get the same result as before, and it
does not change between using RTLD_LOCAL or RTLD_GLOBAL in dlopen.

What I am seeing is that looking up one symbol is giving the value for a totally different
one -- it's not returning an error indication.

And this same wrong value is what happens if I just allow the natural linking to take place
(which is what I really want to happen -- the dl calls simply help focus the issue).
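For readers following along, the dl-call experiment described above can be sketched roughly as follows. This is a minimal sketch and not the actual test program: the helper name `lookup_symbol` is hypothetical, and only the standard `dlopen`, `dlsym` and `dlerror` calls are assumed.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: open a shared library (or the running program
 * itself when path is NULL) and return the address bound to a symbol.
 * Returns NULL and prints the dlerror() text when a step fails. */
void *lookup_symbol(const char *path, const char *name)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }

    dlerror();                        /* clear any stale error state */
    void *addr = dlsym(handle, name);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err);
        return NULL;
    }

    /* The handle is deliberately left open so the address stays valid. */
    return addr;
}
```

Printing the returned pointer with `%p` allows a comparison against the address that nm and objdump report. In the failure described here no error is reported at all: the lookup succeeds but yields the address of a different symbol, so checking `dlerror()` alone would not catch it.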

I will look up your previous issue, though, to see if there is something else there of use
in this situation.

Regards -- EM


* Re: Help debugging a dll issue
  2016-05-20 10:38   ` Eliot Moss
@ 2016-05-20 11:26     ` Duncan Roe
  2016-05-20 12:02       ` Eliot Moss
  0 siblings, 1 reply; 10+ messages in thread
From: Duncan Roe @ 2016-05-20 11:26 UTC (permalink / raw)
  To: cygwin

On Fri, May 20, 2016 at 06:37:57AM -0400, Eliot Moss wrote:
> On 5/19/2016 11:28 PM, Sam Habiel wrote:
> >I had trouble with dlopen in Cygwin, where it did not behave intuitively. In my case, I was
> >dlopening libicu and friends. If you search using my name on the Cygwin mailing list, you should be
> >able to find out how I resolved the issue. I don't recall exactly what I did, but I think it was
> >that Cygwin put everything in a global namespace, and you need to dlsym NULL to grab the function
> >addresses.
>
> I just tried using NULL for the handle in dlsym, and I get the same result as before, and it
> does not change between using RTLD_LOCAL or RTLD_GLOBAL in dlopen.
>
> What I am seeing is that looking up one symbol is giving the value for a totally different
> one -- it's not returning an error indication.
>
> And this same wrong value is what happens if I just allow the natural linking to take place
> (which is what I really want to happen -- the dl calls simply help focus the issue).
>
> I will look up your previous issue, though, to see if there is something else there of use
> in this situation.
>
> Regards -- EM
>
Hi Eliot,

Do you know what is the name of the totally different symbol? (maybe from nm -D)

I wrote a "findit" utility a while back - it would be interesting if it gave the
same answer for both symbols. If you would git clone
https://github.com/duncan-roe/command_line_tools, cd to the findit subdirectory
and enter "make" then you will have it.

Example use:
> 21:23:15$ ./findit cygwin1.dll printf
> Found printf in cygwin1.dll at 0x18012ecbe
> 21:24:37$

HTH ... Duncan.


* Re: Help debugging a dll issue
  2016-05-20 11:26     ` Duncan Roe
@ 2016-05-20 12:02       ` Eliot Moss
  2016-05-20 13:37         ` Duncan Roe
  0 siblings, 1 reply; 10+ messages in thread
From: Eliot Moss @ 2016-05-20 12:02 UTC (permalink / raw)
  To: cygwin

On 5/20/2016 7:26 AM, Duncan Roe wrote:

> Hi Eliot,
>
> Do you know what is the name of the totally different symbol? (maybe from nm -D)

Yes -- I have been using nm and objdump to examine the relevant files.  The dll
is called libpypy-c.dll.  The symbol I want to bind to is pypy_main_startup, and
its proper value (as returned by nm and objdump) is 0x6410ac60.  The result I
get is the value of symbol pypy_g_PyNumber_Negative (an automatically generated
C function), which is 0x63443f00.

I wonder if these collide in some internal hash table and the hash lookup (or
the table building) is broken in some subtle way.

Regards -- Eliot


* Re: Help debugging a dll issue
  2016-05-20 12:02       ` Eliot Moss
@ 2016-05-20 13:37         ` Duncan Roe
  2016-05-20 13:46           ` Eliot Moss
  0 siblings, 1 reply; 10+ messages in thread
From: Duncan Roe @ 2016-05-20 13:37 UTC (permalink / raw)
  To: cygwin

On Fri, May 20, 2016 at 08:02:20AM -0400, Eliot Moss wrote:
> On 5/20/2016 7:26 AM, Duncan Roe wrote:
>
> >Hi Eliot,
> >
> >Do you know what is the name of the totally different symbol? (maybe from nm -D)
>
> Yes -- I have been using nm and objdump to examine the relevant files.  The dll
> is called libpypy-c.dll.  The symbol I want to bind to is pypy_main_startup, and
> its proper value (as returned by nm and objdump) is 0x6410ac60.  The result I
> get is the value of symbol pypy_g_PyNumber_Negative (an automatically generated
> C function), which is 0x63443f00.
>
> I wonder if these collide in some internal hash table and the hash lookup (or
> the table building) is broken in some subtle way.
>
> Regards -- Eliot
>
Does findit give the same answer for both symbols?

If you could build your library and libdl.a with debug (-g) then you might be
able to see how the lookup goes wrong.

HTH ... Duncan.


* Re: Help debugging a dll issue
  2016-05-20 13:37         ` Duncan Roe
@ 2016-05-20 13:46           ` Eliot Moss
  0 siblings, 0 replies; 10+ messages in thread
From: Eliot Moss @ 2016-05-20 13:46 UTC (permalink / raw)
  To: cygwin

On 5/20/2016 9:36 AM, Duncan Roe wrote:
> On Fri, May 20, 2016 at 08:02:20AM -0400, Eliot Moss wrote:
>> On 5/20/2016 7:26 AM, Duncan Roe wrote:
>>
>>> Hi Eliot,
>>>
>>> Do you know what is the name of the totally different symbol? (maybe from nm -D)
>>
>> Yes -- I have been using nm and objdump to examine the relevant files.  The dll
>> is called libpypy-c.dll.  The symbol I want to bind to is pypy_main_startup, and
>> its proper value (as returned by nm and objdump) is 0x6410ac60.  The result I
>> get is the value of symbol pypy_g_PyNumber_Negative (an automatically generated
>> C function), which is 0x63443f00.
>>
>> I wonder if these collide in some internal hash table and the hash lookup (or
>> the table building) is broken in some subtle way.
>>
>> Regards -- Eliot
>>
> Does findit give the same answer for both symbols?
>
> If you could build your library and libdl.a with debug (-g) then you might be
> able to see how the lookup goes wrong.
>
> HTH ... Duncan.

Well, the wrong answer comes back from the Windows routine GetProcAddress.  The
bug seems to lie either in the Windows run-time code or in how the dll is being
built.  I am now trying to give one of the functions a different name to see
what happens (if it's a hash collision effect, presumably something different
will show up).

Regards -- EM


* Re: Help debugging a dll issue
  2016-05-20  2:54 Help debugging a dll issue Eliot Moss
       [not found] ` <CABHT960Yx_bg-NaHWcxePEV+Xz74NaVtsu+NjkrSZs4-62rCOA@mail.gmail.com>
@ 2016-05-21 23:30 ` Eliot Moss
  2016-05-22  1:46   ` René Berber
  2016-05-22  2:58   ` Duncan Roe
  1 sibling, 2 replies; 10+ messages in thread
From: Eliot Moss @ 2016-05-21 23:30 UTC (permalink / raw)
  To: cygwin

On 5/19/2016 10:54 PM, Eliot Moss wrote:
> Dear Cygwin friends --
>
> I am trying to get pypy to build under cygwin.  (It used to do so, but
> has not been maintained.)  I am very close, but there is something quite
> odd happening when trying to access the large dll that the system builds:
> the first call into that dll goes wild and causes a segfault.  The issue
> seems to lie with run-time linking, for I can use dlopen to open the dll
> and then dlsym to look up the function, and I get the same bad address.
> I see nothing wrong from nm and objdump.  The dll is about 70 million
> bytes long, so I can't really post it, but if you want to have a crack
> at this, we can find some mutually agreeable place and I can tell you
> the entry point I am trying to access.
>
> I have found that if I patch the indirection in the associated .exe file
> to refer to the actual address of the function, then the program runs,
> so it's just this one linkage that is not working (apparently).  Very
> mysterious to me.

I used binary search, eliminating .o files from the .dll on the thought
that it was either a particular .o file that was leading to a problem,
or possibly the overall size (this is a huge link!).  I found that a .dll
with 58725 section 1 symbols (as reported by objdump -t) works, and one
with 66675 section one symbols fails.  So it appears to be a size issue.
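The bisection described above can be sketched as a plain binary search over the number of object files included in the link. This is only an illustration under stated assumptions: the names `link_fails_fn`, `first_failing_count` and `fails_past_16_bits` are hypothetical, failure is assumed to be monotonic in the number of files, and the sample predicate simply pretends that each object file contributes 100 symbols.

```c
#include <stddef.h>

/* Hypothetical predicate: returns nonzero when a DLL linked from the
 * first n object files misbehaves.  In practice this would run the
 * link and a smoke test; here it is supplied by the caller. */
typedef int (*link_fails_fn)(size_t n_objects);

/* Binary search for the smallest n in [1, total] for which the link
 * fails, assuming failure is monotonic (once it fails, adding more
 * object files never makes it work again).  Returns 0 if even the
 * full link works. */
size_t first_failing_count(size_t total, link_fails_fn link_fails)
{
    if (total == 0 || !link_fails(total))
        return 0;

    size_t lo = 1, hi = total;       /* invariant: link_fails(hi) is true */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (link_fails(mid))
            hi = mid;                /* failure at mid: look lower */
        else
            lo = mid + 1;            /* mid works: threshold is above */
    }
    return lo;
}

/* Example predicate for illustration only: pretend each object file
 * contributes 100 symbols and the link breaks past 65535 of them. */
int fails_past_16_bits(size_t n_objects)
{
    return n_objects * 100 > 65535;
}
```

With the sample predicate, `first_failing_count(1000, fails_past_16_bits)` narrows the threshold down in about ten link attempts instead of a thousand.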

Is anyone out there skilled enough with gnu ld to guide me as to how to
keep that section from getting too big?  I tried --split-by-reloc, but
that gave no improvement (I don't think it's relocations that are the
problem, just the overall size of a section).  I'll try --split-by-file,
but I am doubting that is the right thing either.

In fact, it looks as though the solution may be to get pypy to build
its .dll with fewer symbols in the symbol table, perhaps by suitable
use of __declspec(dllexport) and __declspec(dllimport), etc.  (These
are apparently deprecated in favor of __attribute__((visibility("hidden"))),
etc., but a number of those generate warnings that the visibility
attributes are not supported in this configuration!)

Any thoughts from the populace?

Regards -- EM


* Re: Help debugging a dll issue
  2016-05-21 23:30 ` Eliot Moss
@ 2016-05-22  1:46   ` René Berber
  2016-05-22  2:53     ` Eliot Moss
  2016-05-22  2:58   ` Duncan Roe
  1 sibling, 1 reply; 10+ messages in thread
From: René Berber @ 2016-05-22  1:46 UTC (permalink / raw)
  To: cygwin

On 5/21/2016 6:30 PM, Eliot Moss wrote:

[snip]
> I used binary search, eliminating .o files from the .dll on the thought
> that it was either a particular .o file that was leading to a problem,
> or possibly the overall size (this is a huge link!).  I found that a .dll
> with 58725 section 1 symbols (as reported by objdump -t) works, and one
> with 66675 section one symbols fails.  So it appears to be a size issue.

That's telling: USHRT_MAX (65535) may be the limit, which would mean
that somewhere a variable of that type (unsigned short int, uint16_t)
is in use, possibly as part of some specification (i.e. the format of
libraries).

Supporting that is: https://ghc.haskell.org/trac/ghc/ticket/5292 which
mentions:

"65536 symbols. This is the limit that Windows DLLs can handle (the
source of the limitation is that they use 16-bit integers to represent
"ordinals")"

and also points to an interesting bug report (5 years old):

https://sourceware.org/bugzilla/show_bug.cgi?id=12969

No answers, but at least an explanation.
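To make the 16-bit explanation concrete, here is a toy model of what happens when an index larger than 65535 is stored in a 16-bit ordinal field. It is a sketch under assumptions: the function names are hypothetical, and it does not reproduce the actual code of ld or of the Windows loader; it only shows the wrap-around that would make one name resolve to an unrelated function.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model of one entry in a PE export name-ordinal table:
 * the index into the export address table is stored in 16 bits. */
uint16_t stored_ordinal(size_t function_index)
{
    return (uint16_t)function_index;  /* silently wraps modulo 65536 */
}

/* Index that a lookup would actually use for a given intended index. */
size_t resolved_index(size_t function_index)
{
    return (size_t)stored_ordinal(function_index);
}
```

In this model an intended index of 66674 comes back as 1138, that is, 66674 - 65536, so the lookup returns a valid but wrong address without reporting any error. That is at least consistent with one symbol resolving to the address of another.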
-- 
René Berber



* Re: Help debugging a dll issue
  2016-05-22  1:46   ` René Berber
@ 2016-05-22  2:53     ` Eliot Moss
  0 siblings, 0 replies; 10+ messages in thread
From: Eliot Moss @ 2016-05-22  2:53 UTC (permalink / raw)
  To: cygwin

On 5/21/2016 9:45 PM, René Berber wrote:
> On 5/21/2016 6:30 PM, Eliot Moss wrote:
>
> [snip]
>> I used binary search, eliminating .o files from the .dll on the thought
>> that it was either a particular .o file that was leading to a problem,
>> or possibly the overall size (this is a huge link!).  I found that a .dll
>> with 58725 section 1 symbols (as reported by objdump -t) works, and one
>> with 66675 section one symbols fails.  So it appears to be a size issue.
>
> That's telling: USHRT_MAX (65535) may be the limit, which would mean
> that somewhere a variable of that type (unsigned short int, uint16_t)
> is in use, possibly as part of some specification (i.e. the format of
> libraries).
>
> Supporting that is: https://ghc.haskell.org/trac/ghc/ticket/5292 which
> mentions:
>
> "65536 symbols. This is the limit that Windows DLLs can handle (the
> source of the limitation is that they use 16-bit integers to represent
> "ordinals")"
>
> and also points to an interesting bug report (5 years old):
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=12969
>
> No answers, but at least an explanation.

Why the maintainers did not fix this, I don't know -- would have saved
me a week of effort tracking things down!

The solution was to use __declspec(dllexport), sparingly, so that only
a few symbols would be exported, and to drop --export-all-symbols. (How
did that work before?  Was the system a lot smaller?)  Supposedly
__attribute__((dllexport)) also works, though I did not try it -- using
__declspec was more in line with code for Windows native C compilers.
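A minimal sketch of that approach, for anyone who lands here later. The macro name `PYPY_EXPORT` and both function names are made up for illustration and are not taken from the pypy sources; the macro expands to nothing outside Windows and Cygwin so the fragment compiles elsewhere too.

```c
/* Hypothetical export macro; the name PYPY_EXPORT is made up here. */
#if defined(_WIN32) || defined(__CYGWIN__)
#  define PYPY_EXPORT __declspec(dllexport)
#else
#  define PYPY_EXPORT   /* no-op elsewhere, so the sketch still compiles */
#endif

/* Not marked for export: with --export-all-symbols dropped, this is
 * expected to stay out of the DLL's export table. */
int example_internal_helper(int x)
{
    return x * 2;
}

/* Marked for export: this is the kind of entry point the .exe binds to. */
PYPY_EXPORT int example_entry_point(int x)
{
    return example_internal_helper(x) + 1;
}
```

Built with something like `gcc -shared -o libexample.dll example.c` (file names are placeholders), only the marked function should show up as an export, which keeps the export count far below the 16-bit limit.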

At least this thread may help someone in the future!

Regards -- Eliot


* Re: Help debugging a dll issue
  2016-05-21 23:30 ` Eliot Moss
  2016-05-22  1:46   ` René Berber
@ 2016-05-22  2:58   ` Duncan Roe
  1 sibling, 0 replies; 10+ messages in thread
From: Duncan Roe @ 2016-05-22  2:58 UTC (permalink / raw)
  To: cygwin

On Sat, May 21, 2016 at 07:30:37PM -0400, Eliot Moss wrote:
> On 5/19/2016 10:54 PM, Eliot Moss wrote:
> >Dear Cygwin friends --
> >
> >I am trying to get pypy to build under cygwin.  (It used to do so, but
> >has not been maintained.)  I am very close, but there is something quite
> >odd happening when trying to access the large dll that the system builds:
> >the first call into that dll goes wild and causes a segfault.  The issue
> >seems to lie with run-time linking, for I can use dlopen to open the dll
> >and then dlsym to look up the function, and I get the same bad address.
> >I see nothing wrong from nm and objdump.  The dll is about 70 million
> >bytes long, so I can't really post it, but if you want to have a crack
> >at this, we can find some mutually agreeable place and I can tell you
> >the entry point I am trying to access.
> >
> >I have found that if I patch the indirection in the associated .exe file
> >to refer to the actual address of the function, then the program runs,
> >so it's just this one linkage that is not working (apparently).  Very
> >mysterious to me.
>
> I used binary search, eliminating .o files from the .dll on the thought
> that it was either a particular .o file that was leading to a problem,
> or possibly the overall size (this is a huge link!).  I found that a .dll
> with 58725 section 1 symbols (as reported by objdump -t) works, and one
> with 66675 section one symbols fails.  So it appears to be a size issue.
>
> Is anyone out there skilled enough with gnu ld to guide me as to how to
> keep that section from getting too big?  I tried --split-by-reloc, but
> that gave no improvement (I don't think it's relocations that are the
> problem, just the overall size of a section).  I'll try --split-by-file,
> but I am doubting that is the right thing either.
>
> In fact, it looks as though the solution may be to get pypy to build
> its .dll with fewer symbols in the symbol table, perhaps by suitable
> use of __declspec(dllexport) and __declspec(dllimport), etc.  (These
> are apparently deprecated in favor of __attribute__((visibility("hidden"))),
> etc., but a number of those generate warnings that the visibility
> attributes are not supported in this configuration!)
>
> Any thoughts from the populace?
>
> Regards -- EM
>
You surely tried this already: strip --strip-unneeded or --strip-debug?

Cheers ... Duncan.


Thread overview: 10+ messages
2016-05-20  2:54 Help debugging a dll issue Eliot Moss
     [not found] ` <CABHT960Yx_bg-NaHWcxePEV+Xz74NaVtsu+NjkrSZs4-62rCOA@mail.gmail.com>
2016-05-20 10:38   ` Eliot Moss
2016-05-20 11:26     ` Duncan Roe
2016-05-20 12:02       ` Eliot Moss
2016-05-20 13:37         ` Duncan Roe
2016-05-20 13:46           ` Eliot Moss
2016-05-21 23:30 ` Eliot Moss
2016-05-22  1:46   ` René Berber
2016-05-22  2:53     ` Eliot Moss
2016-05-22  2:58   ` Duncan Roe
