public inbox for libc-alpha@sourceware.org
* LD_HWCAP_MASK failure with tst-env-setuid
@ 2017-05-19 18:04 Siddhesh Poyarekar
  2017-05-22 13:18 ` Adhemerval Zanella
From: Siddhesh Poyarekar @ 2017-05-19 18:04 UTC
  To: libc-alpha; +Cc: Adhemerval Zanella, Szabolcs Nagy

Adhemerval,

I tried a bunch of things with LD_HWCAP_MASK, tst-env-setuid and other
programs, and my current conclusion is that the failure may be due to a
stale tst-env-setuid binary.  I've attached a long-form description of
the things I tried for you or others to poke holes in, but can you
confirm whether a clean build and test run also fails similarly for you?

Here's what I did:

1. To begin with, I simply ran /bin/true with LD_HWCAP_MASK set:

LD_HWCAP_MASK=0xffffffff /bin/true

and sure enough, on one of my boxes it failed with ENOMEM and on
another it took a good 5-6 seconds before finishing.  This confirmed
that the issue has been long-standing but was never really noticed.  At
this point I was going with the assumption that this was a generic bug
and did not bother testing aarch64.

2. Next I tried running elf/ld.so under a debugger and was able to see
the delay, but I was simply unable to break at the point of the delay or
failure.  I could not understand at that point what was going on, so I
moved on to something else.

3. Then I ran /bin/true with testrun.sh and the LD_HWCAP_MASK envvar set
and could see the delay.  I tried attaching to elf/ld.so during that
delay and once again it stopped in seemingly arbitrary places, and I
could not figure out what was going on.

4. I ran perf and found the place in _dl_important_hwcaps where the
program spent the most time.  I put a bunch of _dl_debug_printf's all
over the place and oddly the printfs near the hotspot never even got
invoked; the function was returning well before that.

5. And then my Alexander Graham Bell moment happened, where I
accidentally ran elf/ld.so directly instead of from within testrun.sh
and the program succeeded immediately, no more delay.  Likewise on the
other box, running the built elf/ld.so directly no longer showed the
ENOMEM failure.

6. Then I formed the hypothesis that using the old glibc from the system
was to blame and that trunk glibc was working fine.  This fit in with
all of the failures perfectly because all of them involved execution of
a shell or another intermediary program using the system dynamic linker
and that is what was failing, not the test.  gdb could not break at that
point because the delay was in the shell it had invoked to start the
program; the program had not even started.
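The mechanism behind this hypothesis is plain environment inheritance:
a variable set for the test is also seen by every intermediary process,
and each of those is loaded by whichever dynamic linker its own binary
names.  A minimal sketch, assuming a POSIX sh on PATH stands in for the
intermediary shell (the helper name is made up for illustration):

```shell
#!/bin/sh
# Illustration only: show_mask_in_child is a hypothetical helper.
# It starts an intermediary shell with LD_HWCAP_MASK set and has that
# shell print the value it inherited.  Because the child sees the
# variable, the dynamic linker that loaded the child saw it too.
show_mask_in_child() {
  env LD_HWCAP_MASK="$1" sh -c 'printf "%s\n" "$LD_HWCAP_MASK"'
}

show_mask_in_child 0xffffffff
```

On a system glibc with the bug, it would be this intermediary sh that
stalls or fails, before the program under test is ever started.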

I decided to test this by doing a git bisect.

7. The bisect led to the fix for pr#21391 that H.J. Lu pushed, which
seemed to have stopped the delays and ENOMEMs in their tracks.  This led
me to conclude that the issue is specific to x86 and does not affect
aarch64.  I tested that hypothesis using my mustang aarch64 machine and
sure enough, it passed all of the tests that x86 failed.
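For anyone wanting to repeat the bisect, a good/bad check along the
lines of step 1 could look like the sketch below.  The function name and
the 2-second limit are assumptions of mine, not what was actually used,
and the build of each revision is left out.

```shell
#!/bin/sh
# Illustration only: hwcap_mask_ok and the 2-second limit are
# assumptions, not something taken from the thread.
#
# hwcap_mask_ok LIMIT CMD [ARGS...]
#   Runs CMD with LD_HWCAP_MASK=0xffffffff and returns 0 ("good") when
#   it exits successfully within LIMIT seconds, 1 ("bad") otherwise.
#   git bisect run treats exit status 0 as good and 1 as bad.
hwcap_mask_ok() {
  limit=$1
  shift
  start=$(date +%s)
  env LD_HWCAP_MASK=0xffffffff "$@"
  status=$?
  end=$(date +%s)
  [ "$status" -eq 0 ] && [ $((end - start)) -le "$limit" ]
}

# Example with the command from step 1; in a real bisect CMD would be
# the freshly built dynamic linker for the revision under test.
if hwcap_mask_ok 2 /bin/true; then
  echo good
else
  echo bad
fi
```

Wrapped in a script that first builds the tree, a check like this could
be handed to git bisect run.  The timer only has one-second resolution,
so the limit should stay well below the 5-6 second delay seen in step 1
but above one second.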

So to conclude, the only way that tst-env-setuid would have failed for
you in this case is if it was stale, i.e. it somehow failed to rebuild.
Hence my request to test again with a clean build.

Phew.

Thanks,
Siddhesh


Thread overview: 4+ messages
2017-05-19 18:04 LD_HWCAP_MASK failure with tst-env-setuid Siddhesh Poyarekar
2017-05-22 13:18 ` Adhemerval Zanella
2017-05-22 17:17   ` Siddhesh Poyarekar
2017-05-22 17:42     ` Siddhesh Poyarekar
