public inbox for binutils@sourceware.org
* Question about readelf output from shared library built with lld, gold, and bfd linkers
@ 2022-11-20 16:37 Tom Kacvinsky
  2022-11-21 14:14 ` Nick Clifton
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Kacvinsky @ 2022-11-20 16:37 UTC
  To: Binutils


I see the following output from readelf for the lld, gold and bfd linkers
(binutils 2.39, lld 14.0.6); the output is below.  Notice how the ordering
is different in each case.  The reason I am looking at the differences
between object code linked by these three linkers is that I am trying to
track down why startup times are slower due to relocations (based on the
perf tool output).  Would any of these differences make, well, a
difference in startup time?

Thanks,

Tom

Output from readelf for the different linkers.  The particular function is
irrelevant; the same thing happens with other symbols as well.

lld linker:

readelf -Wa libVcAppLib.so | grep ada__strings__search__index
000000000162e508  000055c800000007 R_X86_64_JUMP_SLOT     0000000001571220 ada__strings__search__index + 0
000000000162e510  0000587700000007 R_X86_64_JUMP_SLOT     0000000001570a30 ada__strings__search__index__2 + 0
000000000162e518  0000587d00000007 R_X86_64_JUMP_SLOT     0000000001570c40 ada__strings__search__index__3 + 0
000000000162e520  0000588400000007 R_X86_64_JUMP_SLOT     00000000015715d0 ada__strings__search__index__4 + 0
000000000162e528  0000588a00000007 R_X86_64_JUMP_SLOT     0000000001570d00 ada__strings__search__index__5 + 0
000000000162e530  0000588d00000007 R_X86_64_JUMP_SLOT     0000000001570dd0 ada__strings__search__index__6 + 0
000000000162e540  000003fe00000007 R_X86_64_JUMP_SLOT     0000000001570ea0 ada__strings__search__index_non_blank + 0
000000000162e548  0000278900000007 R_X86_64_JUMP_SLOT     0000000001570f00 ada__strings__search__index_non_blank__2 + 0
  1022: 0000000001570ea0    91 FUNC    GLOBAL DEFAULT   14 ada__strings__search__index_non_blank
 10121: 0000000001570f00   167 FUNC    GLOBAL DEFAULT   14 ada__strings__search__index_non_blank__2
 21960: 0000000001571220   930 FUNC    GLOBAL DEFAULT   14 ada__strings__search__index
 22647: 0000000001570a30   523 FUNC    GLOBAL DEFAULT   14 ada__strings__search__index__2
 22653: 0000000001570c40   187 FUNC    GLOBAL DEFAULT   14 ada__strings__search__index__3
 22660: 00000000015715d0   207 FUNC    GLOBAL DEFAULT   14 ada__strings__search__index__4
 22666: 0000000001570d00   207 FUNC    GLOBAL DEFAULT   14 ada__strings__search__index__5
 22669: 0000000001570dd0   199 FUNC    GLOBAL DEFAULT   14 ada__strings__search__index__6
 36851: 0000000001571220   930 FUNC    GLOBAL DEFAULT   14 ada__strings__search__index
 36853: 0000000001570a30   523 FUNC    GLOBAL DEFAULT   14 ada__strings__search__index__2
 36854: 0000000001570c40   187 FUNC    GLOBAL DEFAULT   14 ada__strings__search__index__3
 36856: 00000000015715d0   207 FUNC    GLOBAL DEFAULT   14 ada__strings__search__index__4
 36858: 0000000001570d00   207 FUNC    GLOBAL DEFAULT   14 ada__strings__search__index__5
 36860: 0000000001570dd0   199 FUNC    GLOBAL DEFAULT   14 ada__strings__search__index__6
 36862: 0000000001570ea0    91 FUNC    GLOBAL DEFAULT   14 ada__strings__search__index_non_blank
 36864: 0000000001570f00   167 FUNC    GLOBAL DEFAULT   14 ada__strings__search__index_non_blank__2

gold linker:

readelf -Wa libVcAppLib.so | grep ada__strings__search__index
0000000001610048  00000f0600000007 R_X86_64_JUMP_SLOT     000000000115a220 ada__strings__search__index + 0
0000000001610050  00003fa300000007 R_X86_64_JUMP_SLOT     0000000001159a30 ada__strings__search__index__2 + 0
0000000001610058  00003fa600000007 R_X86_64_JUMP_SLOT     0000000001159c40 ada__strings__search__index__3 + 0
0000000001610060  00003fa800000007 R_X86_64_JUMP_SLOT     000000000115a5d0 ada__strings__search__index__4 + 0
0000000001610068  00003fa900000007 R_X86_64_JUMP_SLOT     0000000001159d00 ada__strings__search__index__5 + 0
0000000001610070  00003fad00000007 R_X86_64_JUMP_SLOT     0000000001159dd0 ada__strings__search__index__6 + 0
0000000001610078  0000075000000007 R_X86_64_JUMP_SLOT     0000000001159ea0 ada__strings__search__index_non_blank + 0
0000000001610080  000050cb00000007 R_X86_64_JUMP_SLOT     0000000001159f00 ada__strings__search__index_non_blank__2 + 0
  1872: 0000000001159ea0    91 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index_non_blank
  3846: 000000000115a220   930 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index
 16291: 0000000001159a30   523 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__2
 16294: 0000000001159c40   187 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__3
 16296: 000000000115a5d0   207 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__4
 16297: 0000000001159d00   207 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__5
 16301: 0000000001159dd0   199 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__6
 20683: 0000000001159f00   167 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index_non_blank__2
 16810: 0000000001159f00   167 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index_non_blank__2
 16812: 0000000001159ea0    91 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index_non_blank
 16814: 0000000001159dd0   199 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__6
 16816: 0000000001159d00   207 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__5
 16818: 000000000115a5d0   207 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__4
 16820: 0000000001159c40   187 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__3
 16821: 0000000001159a30   523 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__2
 16823: 000000000115a220   930 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index

bfd linker:

readelf -Wa libVcAppLib.so | grep ada__strings__search__index
00000000015db948  00003fa600000007 R_X86_64_JUMP_SLOT     000000000114fa20 ada__strings__search__index__2 + 0
00000000015de700  00003fa800000007 R_X86_64_JUMP_SLOT     000000000114fc30 ada__strings__search__index__3 + 0
00000000015e1908  0000075200000007 R_X86_64_JUMP_SLOT     000000000114fe90 ada__strings__search__index_non_blank + 0
00000000015e55c8  00000f0a00000007 R_X86_64_JUMP_SLOT     0000000001150210 ada__strings__search__index + 0
00000000015e6e40  00003fae00000007 R_X86_64_JUMP_SLOT     000000000114fdc0 ada__strings__search__index__6 + 0
00000000015ec3a0  00003faa00000007 R_X86_64_JUMP_SLOT     00000000011505c0 ada__strings__search__index__4 + 0
00000000015edb08  000050cd00000007 R_X86_64_JUMP_SLOT     000000000114fef0 ada__strings__search__index_non_blank__2 + 0
00000000015ef168  00003fad00000007 R_X86_64_JUMP_SLOT     000000000114fcf0 ada__strings__search__index__5 + 0
  1874: 000000000114fe90    91 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index_non_blank
  3850: 0000000001150210   930 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index
 16294: 000000000114fa20   523 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__2
 16296: 000000000114fc30   187 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__3
 16298: 00000000011505c0   207 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__4
 16301: 000000000114fcf0   207 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__5
 16302: 000000000114fdc0   199 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__6
 20685: 000000000114fef0   167 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index_non_blank__2
 16406: 000000000114fa20   523 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__2
 19426: 000000000114fc30   187 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__3
 22716: 000000000114fe90    91 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index_non_blank
 26768: 0000000001150210   930 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index
 28382: 000000000114fdc0   199 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__6
 34154: 00000000011505c0   207 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__4
 35651: 000000000114fef0   167 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index_non_blank__2
 37103: 000000000114fcf0   207 FUNC    GLOBAL DEFAULT   12 ada__strings__search__index__5


* Re: Question about readelf output from shared library built with lld, gold, and bfd linkers
  2022-11-20 16:37 Question about readelf output from shared library built with lld, gold, and bfd linkers Tom Kacvinsky
@ 2022-11-21 14:14 ` Nick Clifton
  2022-11-21 14:51   ` Tom Kacvinsky
  0 siblings, 1 reply; 8+ messages in thread
From: Nick Clifton @ 2022-11-21 14:14 UTC
  To: Tom Kacvinsky, Binutils

Hi Tom,

> I see the following output from readelf for the lld, gold and bfd linkers
> (binutils 2.39, lld 14.0.6)

To be clear - this is not a readelf problem.  It is showing you the correct
results.  It is the fact that the three linkers are not producing identical
output and instead showing slight variations in their layout of the linked
binary that is bothering you, yes ?

In general, variations in the layout like this should not make any difference
to the program's startup time.  There might - possibly - be variations in
performance due to effects like cache misses, but this is hard
to quantify in isolation.


> Output below.  Notice how the ordering is different in each case.  The
> interesting thing about this is, and why I am looking at various
> differences between object code linked by these three linkers, is that I am
> trying to track down why startup times are slower due to relocations (based
> on the perf tool output).  Would any of these differences make, well, a
> difference in startup time?

I don't think so.  The number of relocations is the same, so the amount
of start up time spent resolving them should effectively be the same as
well.
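
The claim that the relocation counts match can be checked rather than
assumed.  What follows is only a sketch: it assumes the text comes from
`readelf -Wr <library>` (wide output, one relocation per line), and the
function name and sample lines are illustrative, not part of any tool.

```python
import re
from collections import Counter

# Matches a relocation line in `readelf -Wr` output:
#   <offset> <info> <type> ...
# Symbol-table lines ("  1022: ...") and headers do not match.
RELOC_LINE = re.compile(r"^\s*[0-9a-fA-F]+\s+[0-9a-fA-F]+\s+(R_[A-Z0-9_]+)")


def count_relocation_types(readelf_text):
    """Return a Counter mapping relocation type -> number of entries."""
    counts = Counter()
    for line in readelf_text.splitlines():
        match = RELOC_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative input: two lines taken from the lld output in this thread,
# plus one made-up R_X86_64_RELATIVE line and one symbol-table line.
SAMPLE = """\
000000000162e508  000055c800000007 R_X86_64_JUMP_SLOT     0000000001571220 ada__strings__search__index + 0
000000000162e510  0000587700000007 R_X86_64_JUMP_SLOT     0000000001570a30 ada__strings__search__index__2 + 0
0000000001600000  0000000000000008 R_X86_64_RELATIVE                         1570a30
  1022: 0000000001570ea0    91 FUNC    GLOBAL DEFAULT   14 ada__strings__search__index_non_blank
"""

if __name__ == "__main__":
    for reloc_type, count in sorted(count_relocation_types(SAMPLE).items()):
        print(reloc_type, count)
```

Running this over the output for each of the three libraries and comparing
the counters would show whether the totals per relocation type really are
the same.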

I assume that you have compared startup times when linking with "-z now"
vs "-z lazy" ?

Cheers
   Nick



* Re: Question about readelf output from shared library built with lld, gold, and bfd linkers
  2022-11-21 14:14 ` Nick Clifton
@ 2022-11-21 14:51   ` Tom Kacvinsky
  2022-11-21 15:16     ` Nick Clifton
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Kacvinsky @ 2022-11-21 14:51 UTC
  To: Binutils


On Mon, Nov 21, 2022 at 9:14 AM Nick Clifton <nickc@redhat.com> wrote:

> Hi Tom,
>
> > I see the following output from readelf for the lld, gold and bfd linkers
> > (binutils 2.39, lld 14.0.6)
>
> To be clear - this is not a readelf problem.  It is showing you the correct
> results.  It is the fact that the three linkers are not producing identical
> output and instead showing slight variations in their layout of the linked
> binary that is bothering you, yes ?
>

It's not bothering me so much as making me wonder whether this would point to
an issue with slow startup times.  Reading your reply in its entirety, I now
see that this should not make a difference.


> In general, variations in the layout like this should not make any difference
> to the program's startup time.  There might - possibly - be variations in
> performance due to effects like cache misses, but this is hard
> to quantify in isolation.
>

I didn't stop to think about caching and the like for relocations at startup.
I was testing with the perf tool and LD_DEBUG=statistics to see how much time
was spent at startup on symbols/relocations.  Results varied from run to run,
so I can now see how caching may play into this.
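
Because the results vary from run to run, a single measurement says little.
Here is a minimal sketch of repeating a measurement and summarising the
spread.  It times whole-process wall-clock time, which is only a proxy for
dynamic-loader startup cost, and the command in the comment is a
hypothetical placeholder.

```python
import statistics
import subprocess
import time


def time_runs(cmd, runs=10):
    """Run cmd (a list of arguments) `runs` times; return wall times in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return min / median / max so outliers do not dominate the comparison."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


# Hypothetical usage, comparing one build per linker:
#   print(summarize(time_runs(["./app_linked_with_gold", "--version"])))
#   print(summarize(time_runs(["./app_linked_with_lld", "--version"])))
```

Comparing medians (or minimums) across builds is less sensitive to a cold
page cache on the first run than comparing single timings.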

> > Output below.  Notice how the ordering is different in each case.  The
> > interesting thing about this is, and why I am looking at various
> > differences between object code linked by these three linkers, is that I am
> > trying to track down why startup times are slower due to relocations (based
> > on the perf tool output).  Would any of these differences make, well, a
> > difference in startup time?
>
> I don't think so.  The number of relocations is the same, so the amount
> of start up time spent resolving them should effectively be the same as
> well.
>
> I assume that you have compared startup times when linking with "-z now"
> vs "-z lazy" ?
>

I thought -z lazy was the default, and that if you wanted the equivalent of
LD_BIND_NOW=1 on the command line, then one would use -z now.  We don't want
the latter; we already spawn enough processes that -z now would slow things
down even more.  I can try with -z lazy and report back.

Thanks for the input,

Tom


* Re: Question about readelf output from shared library built with lld, gold, and bfd linkers
  2022-11-21 14:51   ` Tom Kacvinsky
@ 2022-11-21 15:16     ` Nick Clifton
  2022-11-21 15:33       ` Tom Kacvinsky
  0 siblings, 1 reply; 8+ messages in thread
From: Nick Clifton @ 2022-11-21 15:16 UTC
  To: Tom Kacvinsky, Binutils

Hi Tom,

>> I assume that you have compared startup times when linking with "-z now"
>> vs "-z lazy" ?
>>
> 
> I thought -z lazy was the default, and that if you wanted the equivalent of
> LD_BIND_NOW=1 on the command line, then one would use -z now.

It depends upon the environment.  In Fedora (and RHEL) "-z now" is the
default.  The reason is program security - lazy binding means that an
attacker might be able to alter the relocations before they are evaluated,
allowing all kinds of unexpected things to happen.
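
Since the default depends on how the toolchain was configured, it may be
easier to check what a given library actually records.  The sketch below
assumes the text comes from `readelf -Wd <library>` and looks for the
markers readelf prints for immediate binding (`BIND_NOW` under `FLAGS`,
`NOW` under `FLAGS_1`, or a `BIND_NOW` tag).  The function name and sample
text are illustrative only.

```python
import re

# Picks out the dynamic-section tags relevant to binding mode and
# whatever readelf printed after the tag name.
TAG_LINE = re.compile(r"\((FLAGS|FLAGS_1|BIND_NOW)\)\s*(.*)")


def binds_now(readelf_dynamic_text):
    """Return True if the dynamic section requests immediate (non-lazy) binding."""
    for line in readelf_dynamic_text.splitlines():
        match = TAG_LINE.search(line)
        if not match:
            continue
        tag, value = match.groups()
        if tag == "BIND_NOW":
            return True
        if tag == "FLAGS" and "BIND_NOW" in value.split():
            return True
        if tag == "FLAGS_1" and "NOW" in value.replace("Flags:", "").split():
            return True
    return False


# Illustrative dynamic sections (addresses and library names are made up).
NOW_SAMPLE = """\
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000001e (FLAGS)              BIND_NOW
 0x000000006ffffffb (FLAGS_1)            Flags: NOW
"""

LAZY_SAMPLE = """\
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000003 (PLTGOT)             0x162e4e8
"""
```

A library for which this returns False is bound lazily unless LD_BIND_NOW
is set in the environment at run time.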

Cheers
   Nick




* Re: Question about readelf output from shared library built with lld, gold, and bfd linkers
  2022-11-21 15:16     ` Nick Clifton
@ 2022-11-21 15:33       ` Tom Kacvinsky
  2022-11-21 16:48         ` Nick Clifton
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Kacvinsky @ 2022-11-21 15:33 UTC
  To: Binutils


On Mon, Nov 21, 2022 at 10:17 AM Nick Clifton <nickc@redhat.com> wrote:

> Hi Tom,
>
> >> I assume that you have compared startup times when linking with "-z now"
> >> vs "-z lazy" ?
> >>
> >
> > I thought -z lazy was the default, and that if you wanted the equivalent of
> > LD_BIND_NOW=1 on the command line, then one would use -z now.
>
> It depends upon the environment.  In Fedora (and RHEL) "-z now" is the
> default.  The reason is program security - lazy binding means that an
> attacker might be able to alter the relocations before they are evaluated,
> allowing all kinds of unexpected things to happen.
>

Good to know.  I am using CentOS 7 with a built-from-source binutils (2.39),
as the system linker is too old to handle the DWARF5 debug symbols made
by GCC 12.1 (also built from source).  So the modification made by Fedora
and RHEL is custom to their distros, right?  If so, then I think that I am
using -z lazy.

Also, a question: would -z combreloc make a difference with relocations at
startup?  I was just perusing the ld manual and ran across that.

Tom


* Re: Question about readelf output from shared library built with lld, gold, and bfd linkers
  2022-11-21 15:33       ` Tom Kacvinsky
@ 2022-11-21 16:48         ` Nick Clifton
  2022-11-21 17:43           ` Tom Kacvinsky
  0 siblings, 1 reply; 8+ messages in thread
From: Nick Clifton @ 2022-11-21 16:48 UTC
  To: Tom Kacvinsky, Binutils

Hi Tom,

> Also, a question: would -z combreloc make a difference with relocations at
> startup?  I was just perusing the ld manual and ran across that.

It may well do.  It is certainly worth a try.
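
The ld manual describes -z combreloc as combining the dynamic relocation
sections and sorting them to improve dynamic symbol lookup caching.  One
rough way to see whether a library benefits is to count how often
consecutive relocations refer to the same symbol.  This is only a sketch
over `readelf -Wr` text; the function name and sample lines are made up
for illustration, and relocations without a symbol (such as
R_X86_64_RELATIVE) are skipped.

```python
import re

# <offset> <info> <type> <symbol value> <symbol name> ...
SYMBOLIC_RELOC = re.compile(
    r"^\s*[0-9a-fA-F]+\s+[0-9a-fA-F]+\s+R_[A-Z0-9_]+\s+[0-9a-fA-F]+\s+(\S+)"
)


def adjacent_same_symbol(readelf_text):
    """Return (symbolic relocations, how many repeat the previous symbol)."""
    symbols = []
    for line in readelf_text.splitlines():
        match = SYMBOLIC_RELOC.match(line)
        if match:
            symbols.append(match.group(1))
    repeats = sum(1 for prev, cur in zip(symbols, symbols[1:]) if prev == cur)
    return len(symbols), repeats


# Made-up relocation lines: foo, foo, bar, foo -> one adjacent repeat.
SAMPLE = """\
0000000001600000  0000000100000001 R_X86_64_64            0000000000000000 foo + 0
0000000001600008  0000000100000001 R_X86_64_64            0000000000000000 foo + 8
0000000001600010  0000000200000001 R_X86_64_64            0000000000000000 bar + 0
0000000001600018  0000000100000001 R_X86_64_64            0000000000000000 foo + 0
"""
```

A higher share of adjacent repeats means more lookups can be satisfied
from the previous result; comparing that share between the gold and lld
outputs is one concrete thing to look at.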

Cheers
   Nick




* Re: Question about readelf output from shared library built with lld, gold, and bfd linkers
  2022-11-21 16:48         ` Nick Clifton
@ 2022-11-21 17:43           ` Tom Kacvinsky
  2022-11-22 11:20             ` Nick Clifton
  0 siblings, 1 reply; 8+ messages in thread
From: Tom Kacvinsky @ 2022-11-21 17:43 UTC
  To: Binutils


On Mon, Nov 21, 2022 at 11:48 AM Nick Clifton <nickc@redhat.com> wrote:

> Hi Tom,
>
> > Also, a question: would -z combreloc make a difference with relocations at
> > startup?  I was just perusing the ld manual and ran across that.
>
> It may well do.  It is certainly worth a try.
>

Alas, no joy.  These are the types of fun problems I face at work - change
one of the tools we use to build our product and one of our performance tests
has a regression.  Then it is on me to figure out why.  But, to be honest,
this is black magic to me.  I do not know where to start looking to figure out
why lld generates the final ELF file that runs quicker than what is generated
by the gold linker.

So, I guess the next question is this - what should I look for in the files
generated by lld and gold, respectively, using readelf?

Thanks,

Tom


* Re: Question about readelf output from shared library built with lld, gold, and bfd linkers
  2022-11-21 17:43           ` Tom Kacvinsky
@ 2022-11-22 11:20             ` Nick Clifton
  0 siblings, 0 replies; 8+ messages in thread
From: Nick Clifton @ 2022-11-22 11:20 UTC
  To: Tom Kacvinsky, Binutils

Hi Tom,

> I do not know
> where to start looking to figure out why lld generates the final ELF file
> that runs quicker than what is generated by the gold linker.

Are we talking about application start-up times or application run-time
performance ?  The two are different and likely to be affected by different
changes in the linker.

[As an aside, have you also compared performance when linking with the BFD
based linker, and with mold ?]


> So, I guess the next question is this - what should I look for in the files
> generated by lld and gold, respectively, using  readelf?

You don't.  At least that is not where I would start.  Your best bet is to
use one or more profiling tools and figure out why one version of the
application is faster than the other.  If it is the case that the profiles of
the two versions are basically the same - i.e. they spend the same percentages
of their time in the same functions - then the cause is likely to be the
layout of the code: the faster performing code is getting better performance
from the cache.  If they have different profiles, then maybe one linker is
able to discard more unneeded code, or optimize the linked code more effectively.
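
The "same percentages in the same functions" comparison can be sketched
as follows.  It assumes each profile has already been reduced to a mapping
from function name to percentage of samples (for example, copied by hand
from a profiler report); the function names and numbers here are invented
for illustration.

```python
def compare_profiles(profile_a, profile_b, threshold=1.0):
    """Return (function, pct_a, pct_b) rows whose share differs by at least
    `threshold` percentage points, largest difference first."""
    rows = []
    for name in set(profile_a) | set(profile_b):
        pct_a = profile_a.get(name, 0.0)
        pct_b = profile_b.get(name, 0.0)
        if abs(pct_a - pct_b) >= threshold:
            rows.append((name, pct_a, pct_b))
    # Sort by size of the difference, then by name for a stable order.
    rows.sort(key=lambda row: (-abs(row[1] - row[2]), row[0]))
    return rows


# Invented example profiles for an lld-linked and a gold-linked build.
LLD_PROFILE = {"do_lookup_x": 5.0, "main_loop": 40.0, "parse": 10.0}
GOLD_PROFILE = {"do_lookup_x": 12.0, "main_loop": 40.5, "parse": 10.0}

if __name__ == "__main__":
    for name, lld_pct, gold_pct in compare_profiles(LLD_PROFILE, GOLD_PROFILE):
        print(f"{name}: lld {lld_pct:.1f}%  gold {gold_pct:.1f}%")
```

An empty result would correspond to the "basically the same profile" case,
which points at code layout and cache behaviour; a non-empty result shows
which functions to look at first.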

Another place to look is in the changelogs/reflogs/announcements for the
two linkers.  Likely lld has recently had some new features/improvements added
to it, and these are causing the performance improvements.

Cheers
   Nick

PS.  Waving my devil's advocate flag here: why bother with this search ?
   Isn't it enough that you have found a linker that provides better performance ?



