public inbox for libc-alpha@sourceware.org
From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
To: Sergey Bugaev <bugaevc@gmail.com>
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH v2] Mark more functions as __COLD
Date: Mon, 22 May 2023 17:41:56 -0300
Message-ID: <ed5674c3-3c65-b530-d620-e35ee6b46cb8@linaro.org>
In-Reply-To: <CAN9u=HdQ=o-KUG0Wsxav4b00DmgE5bnbzVCG+oKAFOiAEMGh2g@mail.gmail.com>



On 19/05/23 07:35, Sergey Bugaev wrote:
> On Thu, May 18, 2023 at 10:43 PM Adhemerval Zanella Netto
> <adhemerval.zanella@linaro.org> wrote:
>> The rationale seems ok, some comments below.
> 
> Thanks. Any thoughts on the .text.{startup,exit} part?
> 
>>> -void
>>> +void __COLD
>>>  __libc_fatal (const char *message)
>>>  {
>>>    _dl_fatal_printf ("%s", message);
>>>  }
>>>  rtld_hidden_def (__libc_fatal)
>>>
>>
>> Can't you just add on the function prototype at include/stdio.h? Same
>> question for the __assert_fail and __assert_perror_fail below.
> 
> But I did just that (added __COLD to the prototypes in include/stdio.h
> and include/assert.h), didn't I?
> 
> If you're saying that it's not worth repeating __COLD on the
> definition, then sure, I could remove that if you prefer.

The latter, especially because for __chk_fail you do add a specific comment.
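
As a minimal sketch of what I mean (not glibc code: MY_COLD, my_fatal and
my_checked_copy are hypothetical names, and MY_COLD is only assumed to expand
the same way __COLD does), the attribute on the prototype is enough, since GCC
merges attributes from an earlier declaration into the definition:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for __COLD; assumed to expand to GCC's cold
   attribute.  */
#define MY_COLD __attribute__ ((__cold__))

/* The attribute is given once, on the prototype (as a header would).  */
extern void my_fatal (const char *message)
  MY_COLD __attribute__ ((__noreturn__));

/* The definition does not repeat MY_COLD; it inherits the attribute from
   the declaration above.  */
void
my_fatal (const char *message)
{
  (void) message;
  abort ();
}

/* Hypothetical fortify-style wrapper: the failure branch calls the cold
   function, so the compiler treats that branch as unlikely.  */
size_t
my_checked_copy (char *dst, size_t dstlen, const char *src, size_t n)
{
  if (n > dstlen)
    my_fatal ("buffer overflow detected");
  memcpy (dst, src, n);
  return n;
}
```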

> 
>>> +/* Intentionally not marked __COLD in the header, since this only causes GCC
>>> +   to create a bunch of useless __foo_chk.cold symbols containing only a call
>>> +   to this function; better just keep calling it directly.  */
>>>  extern void __chk_fail (void) __attribute__ ((__noreturn__));
>>>  libc_hidden_proto (__chk_fail)
>>>  rtld_hidden_proto (__chk_fail)
>>
>> Why exactly gcc generates the useless __foo_chk.cold for this case? Is this a
>> bug or a limitation?
> 
> I don't know; your guess is as good as mine (actually yours would be
> better than mine). But my guess would be that they just didn't think
> to add a check that whatever code size savings they're getting by
> moving the cold path into a separate section outweigh the jump
> instruction to get there.
> 
> Here's what I'm getting specifically, on i686-gnu:
> 
> Dump of assembler code for function __ppoll_chk:
> Address range 0x198760 to 0x19879e:
>    0x00198760 <+0>: 56                 push   %esi
>    0x00198761 <+1>: 53                 push   %ebx
>    0x00198762 <+2>: 83 ec 04           sub    $0x4,%esp
>    0x00198765 <+5>: 8b 44 24 20         mov    0x20(%esp),%eax
>    0x00198769 <+9>: 8b 54 24 14         mov    0x14(%esp),%edx
>    0x0019876d <+13>: 8b 4c 24 10         mov    0x10(%esp),%ecx
>    0x00198771 <+17>: 8b 5c 24 18         mov    0x18(%esp),%ebx
>    0x00198775 <+21>: c1 e8 03           shr    $0x3,%eax
>    0x00198778 <+24>: 8b 74 24 1c         mov    0x1c(%esp),%esi
>    0x0019877c <+28>: 39 d0               cmp    %edx,%eax
>    0x0019877e <+30>: 0f 82 9d bb e8 ff   jb     0x24321 <__ppoll_chk.cold>
>    0x00198784 <+36>: 89 74 24 1c         mov    %esi,0x1c(%esp)
>    0x00198788 <+40>: 89 5c 24 18         mov    %ebx,0x18(%esp)
>    0x0019878c <+44>: 89 54 24 14         mov    %edx,0x14(%esp)
>    0x00198790 <+48>: 89 4c 24 10         mov    %ecx,0x10(%esp)
>    0x00198794 <+52>: 83 c4 04           add    $0x4,%esp
>    0x00198797 <+55>: 5b                 pop    %ebx
>    0x00198798 <+56>: 5e                 pop    %esi
>    0x00198799 <+57>: e9 b2 b9 fb ff     jmp    0x154150 <__GI_ppoll>
> Address range 0x24321 to 0x24326:
>    0x00024321 <-1524799>: e8 5c ff ff ff     call   0x24282 <__GI___chk_fail>
> End of assembler dump.
> 
> It's spending 6 bytes for the 'jb __ppoll_chk.cold', only to jump to
> 'call __GI___chk_fail' which takes 5 bytes. That's negative space
> savings, both overall and inside .text.

My guess is that this is arch-specific, since for aarch64-linux I am not
seeing any '.cold' sections being generated:

00000000000f4950 <__ppoll_chk>:
   f4950:       eb440c3f        cmp     x1, x4, lsr #3
   f4954:       54000048        b.hi    f495c <__ppoll_chk+0xc>  // b.pmore
   f4958:       17ffa1a6        b       dcff0 <ppoll>
   f495c:       a9bf7bfd        stp     x29, x30, [sp, #-16]!
   f4960:       910003fd        mov     x29, sp
   f4964:       97fcdae1        bl      2b4e8 <__chk_fail>
   f4968:       d503201f        nop
   f496c:       d503201f        nop

So I don't have a strong opinion about it.  Leaving __chk_fail unmarked does
seem to generate smaller code for x86, although not so much for aarch64:

With this patch:

   text    data     bss     dec     hex filename
1867381  411832   55080 2334293  239e55 x86_64-linux-gnu-patch/libc.so
2147360  129084   39524 2315968  2356c0 i686-linux-gnu-patch/libc.so
1574355  410624   51704 2036683  1f13cb aarch64-linux-gnu-patch/libc.so

With this patch with __COLD for __chk_fail prototype:

   text    data     bss     dec     hex filename
1868824  411832   55080 2335736  23a3f8 x86_64-linux-gnu/libc.so
2149056  129084   39524 2317664  235d60 i686-linux-gnu/libc.so
1574256  410624   51704 2036584  1f1368 aarch64-linux-gnu/libc.so
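
To make the comparison explicit, here is a small sketch of the arithmetic on
the text columns above (text_delta is just a hypothetical helper name; the
inputs are the numbers from the two tables):

```c
/* Text-size difference in bytes between the two builds.  A positive
   result means that marking the __chk_fail prototype as __COLD made
   .text larger.  */
long
text_delta (long text_without_cold, long text_with_cold)
{
  return text_with_cold - text_without_cold;
}

/* x86_64:  text_delta (1867381, 1868824) ==  1443
   i686:    text_delta (2147360, 2149056) ==  1696
   aarch64: text_delta (1574355, 1574256) ==   -99  */
```

So the x86 targets grow by about 1.4-1.7 KiB of text with __COLD on the
prototype, while aarch64 shrinks by 99 bytes.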

> 
> And actually frankly that's bad codegen altogether, unless I'm missing
> something. Why not
> 
> mov 20(%esp), %eax
> shr $3, %eax
> cmp 8(%esp), %eax
> jnb __GI_ppoll
> push %ebp
> mov %esp, %ebp
> call __GI___chk_fail
> 
> Then maybe it'd make sense to move the "push, mov, call" into
> .text.unlikely, adding a jmp.

It might be worth opening a bug report against GCC.
