public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: "Fāng-ruì Sòng" <maskray@google.com>
To: "H.J. Lu" <hjl.tools@gmail.com>
Cc: GNU C Library <libc-alpha@sourceware.org>,
	"Carlos O'Donell" <carlos@redhat.com>,
	 Adhemerval Zanella <adhemerval.zanella@linaro.org>,
	Florian Weimer <fweimer@redhat.com>,
	 Joseph Myers <joseph@codesourcery.com>
Subject: Re: [PATCH v2] elf: Avoid nested functions in the loader (all ports) [BZ #27220]
Date: Thu, 23 Sep 2021 09:58:30 -0700	[thread overview]
Message-ID: <CAFP8O3JZWqi4N9vRDkvmt6zCsU1rYNwsh8q0qmSMai0j3DDxkw@mail.gmail.com> (raw)
In-Reply-To: <CAMe9rOo6asvifRjjPcKogam-GU59PWEpQ+NdE1YosvNBOa=nww@mail.gmail.com>

On Thu, Sep 23, 2021 at 9:20 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Sep 23, 2021 at 12:23 AM Fangrui Song via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > [Alternative to https://sourceware.org/pipermail/libc-alpha/2021-August/130340.html
> > This version fixes all ports and doesn't add NESTING dispatches.]
> > [Available at
> > https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/maskray/unnest ]
> >
> > dynamic-link.h is included more than once in some elf/ files (rtld.c,
> > dl-conflict.c, dl-reloc.c, dl-reloc-static-pie.c) and uses GCC nested
> > functions. This harms readability and the nested functions usage
> > is the biggest obstacle prevents CC=clang (which doesn't support the
> > feature).
> >
> > To un-nest elf_machine_rela, the key idea is to pass the variable in
> > the containing scope as extra arguments.
> > Stan Shebs implemented ppc64/x86-64 parts in the google/grte/v5-2.27/master
> > branch by using static variables.
> > This patch is inspired by that but takes an approach avoiding static variables.
> >
> > Tested on aarch64-linux-gnu, powerpc64le-linux-gnu and x86_64-linux-gnu.
> > On x86_64, ld.so size does not change (.text is slightly larger, .rodata
> > is slightly smaller).
>
> Please disassemble and show what additional/changed instructions are.
>
> > On aarch64, .text is 64 bytes larger but .rodata is 240 bytes smaller.
> > On ppc64le, both .text and .rodata are smaller.
> > I think the performance does not matter because most cycles are spent in symbol
> > lookup.
> >
> > Tested build-many-glibcs.py compilers with
> > {alpha,arc,csky,hppa,ia64,microblaze,s390,s390x,sh,sparc64,sparcv9}-linux-gnu,
> > arm-linux-gnueabi{,hf}, mips64{,el}-linux-gnu{,-n32,-n64}, and
> > riscv64-linux-gnu-rv64imafdc-lp64d.
>
> H.J.

For elf/dl-reloc.os on x86-64, I think the nested function version can
use *(rbp - offset) to get some arguments.
The new version needs more instructions but the cost is small compared
with the expensive _dl_lookup_symbol_x.

+<L57>:
+                       mov     rax, qword ptr [r15 + 104]
+                       mov     edi, dword ptr [r14]
+                       mov     rcx, qword ptr [rax + 8]
+                       xor     eax, eax
+                       test    byte ptr [r15 + 798], 2
+                       je       <L58>
+                       mov     rax, qword ptr [r15]
 <L58>:
-                       mov     edi, dword ptr [r15]
                        push    0
-                       lea     rdx, [rbp - 120]
-                       mov     rsi, r14
+                       add     rdi, rcx
                        mov     rcx, qword ptr [rbp - 168]
-                       add     rdi, qword ptr [rbp - 176]
+                       lea     rdx, [rbp - 120]
                        push    9
-                       mov     qword ptr [rbp - 184], r10
+                       add     rdi, rax
+                       mov     rsi, r15
+                       mov     qword ptr [rbp - 176], r10
                        call     <L59>
-               0000000000000970:  R_X86_64_PLT32       _dl_lookup_symbol_x-0x4
+               0000000000000996:  R_X86_64_PLT32       _dl_lookup_symbol_x-0x4
 <L59>:

For aarch64, the nested function seems to need more setup after prologue

-                       ldr     x22, [x2, #8]
-                       tbz     w1, #1, 0x4d8 <_dl_relocate_object+0x230>
-                       ldr     x28, [x26]
-                       add     x1, x22, x28
-                       str     x1, [x29, #168]
-                       cbnz    w0, 0x125c <_dl_relocate_object+0xfb4>
+                       cbnz    w0, 0x4d0 <_dl_relocate_object+0x228>

the call site saving may be canceled, so in the end the .text size
does not change.

-                       ldp     x3, x1, [x29, #160]
+                       ldrb    w3, [x26, #846]
+                       mov     x2, #0
+                       ldr     x0, [x26, #104]
+                       ldr     w1, [x24]
+                       ldr     x0, [x0, #8]
+                       tbz     w3, #1, 0x8cc <_dl_relocate_object+0x624>
+                       ldr     x2, [x26]
+                       add     x0, x1, x0
+                       ldr     x3, [x29, #168]
+                       add     x0, x0, x2
+                       mov     x1, x26
                        add     x2, x29, #176
-                       ldr     w0, [x23]
                        mov     x7, #0
                        mov     w6, #9
-                       add     x0, x1, x0
-                       mov     x1, x25
-                       bl      0x8a4 <_dl_relocate_object+0x5fc>
-               00000000000008a4:  R_AARCH64_CALL26     _dl_lookup_symbol_x
+                       bl      0x8e8 <_dl_relocate_object+0x640>
+               00000000000008e8:  R_AARCH64_CALL26     _dl_lookup_symbol_x

  reply	other threads:[~2021-09-23 16:58 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-23  7:17 Fangrui Song
2021-09-23  8:07 ` Andreas Schwab
2021-09-23  8:17   ` Fangrui Song
2021-09-23  8:39     ` Andreas Schwab
2021-09-23 13:08       ` Zack Weinberg
2021-09-23 16:11         ` Fangrui Song
2021-09-23 16:19 ` H.J. Lu
2021-09-23 16:58   ` Fāng-ruì Sòng [this message]
2021-09-23 17:13     ` Florian Weimer
2021-09-23 17:34       ` Fāng-ruì Sòng
2021-09-23 17:40         ` Florian Weimer
2021-09-23 19:05 ` Fangrui Song
2021-09-23 21:48   ` Florian Weimer
2021-09-23 22:41     ` Fāng-ruì Sòng
2021-09-24  5:58       ` Florian Weimer
2021-09-24  7:23         ` Florian Weimer
2021-09-24  7:32         ` Fāng-ruì Sòng
2021-09-24  8:27           ` Florian Weimer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFP8O3JZWqi4N9vRDkvmt6zCsU1rYNwsh8q0qmSMai0j3DDxkw@mail.gmail.com \
    --to=maskray@google.com \
    --cc=adhemerval.zanella@linaro.org \
    --cc=carlos@redhat.com \
    --cc=fweimer@redhat.com \
    --cc=hjl.tools@gmail.com \
    --cc=joseph@codesourcery.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).