From: "Fāng-ruì Sòng" <maskray@google.com>
To: "H.J. Lu" <hjl.tools@gmail.com>
Cc: GNU C Library <libc-alpha@sourceware.org>,
"Carlos O'Donell" <carlos@redhat.com>,
Adhemerval Zanella <adhemerval.zanella@linaro.org>,
Florian Weimer <fweimer@redhat.com>,
Joseph Myers <joseph@codesourcery.com>
Subject: Re: [PATCH v2] elf: Avoid nested functions in the loader (all ports) [BZ #27220]
Date: Thu, 23 Sep 2021 09:58:30 -0700 [thread overview]
Message-ID: <CAFP8O3JZWqi4N9vRDkvmt6zCsU1rYNwsh8q0qmSMai0j3DDxkw@mail.gmail.com> (raw)
In-Reply-To: <CAMe9rOo6asvifRjjPcKogam-GU59PWEpQ+NdE1YosvNBOa=nww@mail.gmail.com>
On Thu, Sep 23, 2021 at 9:20 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Sep 23, 2021 at 12:23 AM Fangrui Song via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > [Alternative to https://sourceware.org/pipermail/libc-alpha/2021-August/130340.html
> > This version fixes all ports and doesn't add NESTING dispatches.]
> > [Available at
> > https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/maskray/unnest ]
> >
> > dynamic-link.h is included more than once in some elf/ files (rtld.c,
> > dl-conflict.c, dl-reloc.c, dl-reloc-static-pie.c) and uses GCC nested
> > functions. This harms readability and the nested functions usage
> > is the biggest obstacle prevents CC=clang (which doesn't support the
> > feature).
> >
> > To un-nest elf_machine_rela, the key idea is to pass the variable in
> > the containing scope as extra arguments.
> > Stan Shebs implemented ppc64/x86-64 parts in the google/grte/v5-2.27/master
> > branch by using static variables.
> > This patch is inspired by that but takes an approach avoiding static variables.
> >
> > Tested on aarch64-linux-gnu, powerpc64le-linux-gnu and x86_64-linux-gnu.
> > On x86_64, ld.so size does not change (.text is slightly larger, .rodata
> > is slightly smaller).
>
> Please disassemble and show what additional/changed instructions are.
>
> > On aarch64, .text is 64 bytes larger but .rodata is 240 bytes smaller.
> > On ppc64le, both .text and .rodata are smaller.
> > I think the performance does not matter because most cycles are spent in symbol
> > lookup.
> >
> > Tested build-many-glibcs.py compilers with
> > {alpha,arc,csky,hppa,ia64,microblaze,s390,s390x,sh,sparc64,sparcv9}-linux-gnu,
> > arm-linux-gnueabi{,hf}, mips64{,el}-linux-gnu{,-n32,-n64}, and
> > riscv64-linux-gnu-rv64imafdc-lp64d.
>
> H.J.
For elf/dl-reloc.os on x86-64, I think the nested function version can
use *(rbp - offset) to get some arguments.
The new version needs more instructions but the cost is small compared
with the expensive _dl_lookup_symbol_x.
+<L57>:
+ mov rax, qword ptr [r15 + 104]
+ mov edi, dword ptr [r14]
+ mov rcx, qword ptr [rax + 8]
+ xor eax, eax
+ test byte ptr [r15 + 798], 2
+ je <L58>
+ mov rax, qword ptr [r15]
<L58>:
- mov edi, dword ptr [r15]
push 0
- lea rdx, [rbp - 120]
- mov rsi, r14
+ add rdi, rcx
mov rcx, qword ptr [rbp - 168]
- add rdi, qword ptr [rbp - 176]
+ lea rdx, [rbp - 120]
push 9
- mov qword ptr [rbp - 184], r10
+ add rdi, rax
+ mov rsi, r15
+ mov qword ptr [rbp - 176], r10
call <L59>
- 0000000000000970: R_X86_64_PLT32 _dl_lookup_symbol_x-0x4
+ 0000000000000996: R_X86_64_PLT32 _dl_lookup_symbol_x-0x4
<L59>:
For aarch64, the nested function seems to need more setup after prologue
- ldr x22, [x2, #8]
- tbz w1, #1, 0x4d8 <_dl_relocate_object+0x230>
- ldr x28, [x26]
- add x1, x22, x28
- str x1, [x29, #168]
- cbnz w0, 0x125c <_dl_relocate_object+0xfb4>
+ cbnz w0, 0x4d0 <_dl_relocate_object+0x228>
the call site saving may be canceled, so in the end the .text size
does not change.
- ldp x3, x1, [x29, #160]
+ ldrb w3, [x26, #846]
+ mov x2, #0
+ ldr x0, [x26, #104]
+ ldr w1, [x24]
+ ldr x0, [x0, #8]
+ tbz w3, #1, 0x8cc <_dl_relocate_object+0x624>
+ ldr x2, [x26]
+ add x0, x1, x0
+ ldr x3, [x29, #168]
+ add x0, x0, x2
+ mov x1, x26
add x2, x29, #176
- ldr w0, [x23]
mov x7, #0
mov w6, #9
- add x0, x1, x0
- mov x1, x25
- bl 0x8a4 <_dl_relocate_object+0x5fc>
- 00000000000008a4: R_AARCH64_CALL26 _dl_lookup_symbol_x
+ bl 0x8e8 <_dl_relocate_object+0x640>
+ 00000000000008e8: R_AARCH64_CALL26 _dl_lookup_symbol_x
next prev parent reply other threads:[~2021-09-23 16:58 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-09-23 7:17 Fangrui Song
2021-09-23 8:07 ` Andreas Schwab
2021-09-23 8:17 ` Fangrui Song
2021-09-23 8:39 ` Andreas Schwab
2021-09-23 13:08 ` Zack Weinberg
2021-09-23 16:11 ` Fangrui Song
2021-09-23 16:19 ` H.J. Lu
2021-09-23 16:58 ` Fāng-ruì Sòng [this message]
2021-09-23 17:13 ` Florian Weimer
2021-09-23 17:34 ` Fāng-ruì Sòng
2021-09-23 17:40 ` Florian Weimer
2021-09-23 19:05 ` Fangrui Song
2021-09-23 21:48 ` Florian Weimer
2021-09-23 22:41 ` Fāng-ruì Sòng
2021-09-24 5:58 ` Florian Weimer
2021-09-24 7:23 ` Florian Weimer
2021-09-24 7:32 ` Fāng-ruì Sòng
2021-09-24 8:27 ` Florian Weimer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAFP8O3JZWqi4N9vRDkvmt6zCsU1rYNwsh8q0qmSMai0j3DDxkw@mail.gmail.com \
--to=maskray@google.com \
--cc=adhemerval.zanella@linaro.org \
--cc=carlos@redhat.com \
--cc=fweimer@redhat.com \
--cc=hjl.tools@gmail.com \
--cc=joseph@codesourcery.com \
--cc=libc-alpha@sourceware.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).