public inbox for libc-alpha@sourceware.org
* LD_HWCAP_MASK failure with tst-env-setuid
@ 2017-05-19 18:04 Siddhesh Poyarekar
  2017-05-22 13:18 ` Adhemerval Zanella
  0 siblings, 1 reply; 4+ messages in thread
From: Siddhesh Poyarekar @ 2017-05-19 18:04 UTC
  To: libc-alpha; +Cc: Adhemerval Zanella, Szabolcs Nagy

Adhemerval,

I tried a bunch of things with the LD_HWCAP_MASK and tst-env-setuid and
other programs and my current conclusion is that it may be due to a
stale tst-env-setuid binary.  I've attached a long form description of
things I tried for you or others to poke holes into, but can you confirm
if a clean build and test run also fails similarly for you?

Here's what I did:

1. To begin with, I simply ran /bin/true with LD_HWCAP_MASK set:

LD_HWCAP_MASK=0xffffffff /bin/true

and sure enough, on one of my boxes it failed with the ENOMEM and on
another it took a good 5-6 seconds before finishing.  This confirmed
that the issue has been long-standing but was never really noticed.  At
this point I was going with the assumption that this was a generic bug
and did not bother testing aarch64.

2. Now I tried running elf/ld.so under a debugger and was able to see
the delay, but I was simply unable to break at the point of the delay or
failure.  I could not understand at that point what was going on, so I
moved on to something else.

3. Now I ran /bin/true with testrun.sh and the LD_HWCAP_MASK envvar set
and could see the delay.  I tried attaching to elf/ld.so during that
delay and once again it seemed to be in arbitrary places and I could not
figure out what was going on.

4. I ran perf and found the place in _dl_important_hwcaps where the
program spent the most time.  I put a bunch of _dl_debug_printf's all
over the place and oddly the printfs near the hotspot never even got
invoked, the function was returning much before that.

5. And then my Alexander Graham Bell moment happened, where I
accidentally ran elf/ld.so directly instead of from within testrun.sh
and the program succeeded immediately, no more delay.  Likewise on the
other box, running the built elf/ld.so directly no longer showed the
ENOMEM failure.

6. Then I formed the hypothesis that using the old glibc from the system
was to blame and that trunk glibc was working fine.  This fit in with
all of the failures perfectly because all of them involved execution of
a shell or another intermediary program using the system dynamic linker
and that is what was failing, not the test.  gdb could not break at that
point because the delay was in the shell it had invoked to start the
program; the program had not even started.

I decided to test this by doing a git bisect.

7. The bisect led to the fix for pr#21391 that HJ Lu pushed, which
seemed to have stopped the delays and ENOMEMs in their tracks.  This led
me to conclude that the issue is specific to x86 and does not affect
aarch64.  I tested that hypothesis using my mustang aarch64 machine and
sure enough, it passed all of the tests that x86 failed.

So to conclude, the only way that tst-env-setuid would have failed for
you in this case was if it was stale, i.e. it failed to rebuild somehow.
Hence my request to test again with a clean build.
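
The hypothesis in step 6 can be illustrated with a small Python sketch.
It shows only that a variable set for a command is already present in the
environment of the intermediary shell, so the shell's own dynamic linker
gets to see it before the target program starts.  The variable name
DEMO_HWCAP_MASK and the helper value_seen_by_shell are made up for this
illustration; a harmless name is used on purpose so that no real loader
behaviour is triggered.

```python
import os
import subprocess


def value_seen_by_shell(name, value):
    """Return the value of `name` as observed by an intermediary /bin/sh."""
    env = dict(os.environ)
    env[name] = value
    # The shell is exec'd with the variable already in its environment,
    # so anything that reads it at shell startup sees it before the
    # shell runs the command we actually care about.
    result = subprocess.run(
        ["/bin/sh", "-c", 'printf "%s" "$' + name + '"'],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(value_seen_by_shell("DEMO_HWCAP_MASK", "0xffffffff"))
```

This matches the observation that gdb could not break at the delay: the
time was spent in the process that launches the program, not in the
program itself.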

Phew.

Thanks,
Siddhesh


* Re: LD_HWCAP_MASK failure with tst-env-setuid
  2017-05-19 18:04 LD_HWCAP_MASK failure with tst-env-setuid Siddhesh Poyarekar
@ 2017-05-22 13:18 ` Adhemerval Zanella
  2017-05-22 17:17   ` Siddhesh Poyarekar
  0 siblings, 1 reply; 4+ messages in thread
From: Adhemerval Zanella @ 2017-05-22 13:18 UTC
  To: Siddhesh Poyarekar, libc-alpha; +Cc: Szabolcs Nagy

On 19/05/2017 15:04, Siddhesh Poyarekar wrote:
> Adhemerval,
> 
> I tried a bunch of things with the LD_HWCAP_MASK and tst-env-setuid and
> other programs and my current conclusion is that it may be due to a
> stale tst-env-setuid binary.  I've attached a long form description of
> things I tried for you or others to poke holes into, but can you confirm
> if a clean build and test run also fails similarly for you?

Hi Siddhesh, unfortunately the test is still failing on my x86_64 system
with LD_HWCAP_MASK=0xffffffff.  I used your latest tunable patchset [1]
applied on top of master (402bf0695218bbe290418b9486b1dd5fe284d903) and
configured with:

--host=x86_64-linux-gnu --build=x86_64-linux-gnu --enable-add-ons=libidn
--without-selinux --enable-stackguard-randomization --enable-obsolete-rpc
--enable-systemtap --enable-multi-arch --enable-lock-elision --enable-tunables

However, I am only seeing this issue on x86_64; aarch64 does not bail out
with 'cannot create capability list: Cannot allocate memory'.
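
A rough Python sketch of why a wide mask could lead to that allocation
failure.  It assumes that the loader needs one capability-list entry per
combination of the hwcap bits selected by the mask, which is a reading of
_dl_important_hwcaps and not something stated in this thread.  The
function name and the hwcap values below are illustrative only.

```python
def capability_list_entries(hwcap, mask):
    """Entries needed if every combination of selected bits gets one entry.

    Assumption: the count grows as 2**n, where n is the number of bits
    set in (hwcap & mask).
    """
    selected = hwcap & mask
    cnt = bin(selected).count("1")
    return 1 << cnt


if __name__ == "__main__":
    # A narrow mask keeps the list tiny.
    print(capability_list_entries(0b1011, 0b0001))
    # A mask of 0xffffffff selects every bit the platform reports.
    print(capability_list_entries(0xFFFFFFFF, 0xFFFFFFFF))
```

Under that assumption, the cost depends on how many bits the platform
reports in the first place, which would be consistent with the failure
showing up on one architecture and not on another.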

And following your analysis I installed the built glibc in a sysroot, and
neither 'bin/true' nor 'tst-env-setuid' failed with
LD_HWCAP_MASK=0xffffffff.  So I think BZ#21391 is indeed fixed, but your
suggestion about the installed glibc interfering with the testing still
worries me.  I think what might be happening is that statically linked
binaries still rely on ld.so.cache for some internal calculation,
which I do not think is the intended behaviour.  I will try to spend some
time figuring out why this still fails on my system.

[1] https://sourceware.org/ml/libc-alpha/2017-05/msg00570.html 



* Re: LD_HWCAP_MASK failure with tst-env-setuid
  2017-05-22 13:18 ` Adhemerval Zanella
@ 2017-05-22 17:17   ` Siddhesh Poyarekar
  2017-05-22 17:42     ` Siddhesh Poyarekar
  0 siblings, 1 reply; 4+ messages in thread
From: Siddhesh Poyarekar @ 2017-05-22 17:17 UTC
  To: Adhemerval Zanella, libc-alpha; +Cc: Szabolcs Nagy

On Monday 22 May 2017 06:48 PM, Adhemerval Zanella wrote:
> Hi Siddhesh, unfortunately the test is still failing on my x86_64 system
> with LD_HWCAP_MASK=0xffffffff.  I used your latest tunable patchset [1]
> applied on top of master (402bf0695218bbe290418b9486b1dd5fe284d903) and
> configured with:
> 
> --host=x86_64-linux-gnu --build=x86_64-linux-gnu --enable-add-ons=libidn
> --without-selinux --enable-stackguard-randomization --enable-obsolete-rpc
> --enable-systemtap --enable-multi-arch --enable-lock-elision --enable-tunables
> 
> However, I am only seeing this issue on x86_64; aarch64 does not bail out
> with 'cannot create capability list: Cannot allocate memory'.
> 
> And following your analysis I installed the built glibc in a sysroot, and
> neither 'bin/true' nor 'tst-env-setuid' failed with
> LD_HWCAP_MASK=0xffffffff.  So I think BZ#21391 is indeed fixed, but your
> suggestion about the installed glibc interfering with the testing still
> worries me.  I think what might be happening is that statically linked
> binaries still rely on ld.so.cache for some internal calculation,
> which I do not think is the intended behaviour.  I will try to spend some
> time figuring out why this still fails on my system.

OK I've found a box that exhibits this too and it is happening only with
my patchset and not without.  Let me dig as well.

Siddhesh


* Re: LD_HWCAP_MASK failure with tst-env-setuid
  2017-05-22 17:17   ` Siddhesh Poyarekar
@ 2017-05-22 17:42     ` Siddhesh Poyarekar
  0 siblings, 0 replies; 4+ messages in thread
From: Siddhesh Poyarekar @ 2017-05-22 17:42 UTC
  To: Adhemerval Zanella, libc-alpha; +Cc: Szabolcs Nagy

On Monday 22 May 2017 10:47 PM, Siddhesh Poyarekar wrote:
> OK I've found a box that exhibits this too and it is happening only with
> my patchset and not without.  Let me dig as well.

Alright I see what is going on now.

Static binaries do not process LD_HWCAP_MASK and as a result,
tst-env-setuid before my patch would not see the effect of
LD_HWCAP_MASK.  I'll need to fix up patch 4/5 to ignore
glibc.tune.hwcap_mask for static binaries.
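
The intended behaviour can be modelled with a short Python sketch.  This
is a toy model of the decision, not the actual patch: DEFAULT_MASK, the
function name and the is_static flag are hypothetical stand-ins, and the
real default mask is architecture-specific.

```python
DEFAULT_MASK = 0x3  # hypothetical stand-in for the built-in default mask


def effective_hwcap_mask(env, is_static, default=DEFAULT_MASK):
    """Return the mask a binary would end up using in this toy model."""
    if is_static:
        # Static binaries never processed LD_HWCAP_MASK, so the value
        # from the environment must not leak in through the tunable.
        return default
    raw = env.get("LD_HWCAP_MASK")
    if raw is None:
        return default
    # Base 0 accepts both "0xffffffff" and plain decimal strings.
    return int(raw, 0)


if __name__ == "__main__":
    env = {"LD_HWCAP_MASK": "0xffffffff"}
    print(hex(effective_hwcap_mask(env, is_static=False)))
    print(hex(effective_hwcap_mask(env, is_static=True)))
```

In this model a static test binary behaves the same with or without the
variable set, which is the behaviour described above for tst-env-setuid
before the patch.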

Siddhesh
