From: Adhemerval Zanella Netto
Organization: Linaro
To: Sergey Bugaev
Cc: libc-alpha@sourceware.org
Date: Mon, 22 May 2023 17:41:56 -0300
Subject: Re: [PATCH v2] Mark more functions as __COLD
References: <20230518170648.93316-1-bugaevc@gmail.com> <98620b0e-7251-3781-2935-ce058a5953dc@linaro.org>

On 19/05/23 07:35, Sergey Bugaev wrote:
> On Thu, May 18, 2023 at 10:43 PM Adhemerval Zanella Netto
> wrote:
>> The rationale seems ok, some comments below.
>
> Thanks. Any thoughts on the .text.{startup,exit} part?
>
>>> -void
>>> +void __COLD
>>>  __libc_fatal (const char *message)
>>>  {
>>>    _dl_fatal_printf ("%s", message);
>>>  }
>>>  rtld_hidden_def (__libc_fatal)
>>>
>>
>> Can't you just add on the function prototype at include/stdio.h? Same
>> question for the __assert_fail and __assert_perror_fail below.
>
> But I did just that (added __COLD to the prototypes in include/stdio.h
> and include/assert.h), didn't I?
>
> If you're saying that it's not worth repeating __COLD on the
> definition, then sure, I could remove that if you prefer.

The latter, especially because for __chk_fail you do add a specific comment.

>
>>> +/* Intentionally not marked __COLD in the header, since this only causes GCC
>>> +   to create a bunch of useless __foo_chk.cold symbols containing only a call
>>> +   to this function; better just keep calling it directly.  */
>>>  extern void __chk_fail (void) __attribute__ ((__noreturn__));
>>>  libc_hidden_proto (__chk_fail)
>>>  rtld_hidden_proto (__chk_fail)
>>
>> Why exactly gcc generates the useless __foo_chk.cold for this case? Is this a
>> bug or a limitation?
>
> I don't know; your guess is as good as mine (actually yours would be
> better than mine). But my guess would be that they just didn't think
> to add a check that whatever code size savings they're getting by
> moving the cold path into a separate section outweigh the jump
> instruction to get there.
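On the earlier point about not repeating __COLD on the definition: GCC merges function attributes from a previous declaration into the later definition, so putting the attribute on the prototype alone is enough. A minimal sketch of that convention, assuming GCC's (or Clang's) `cold` attribute; `MY_COLD`, `report_rare_error` and `checked_div` are hypothetical names for illustration, not glibc code:

```c
#include <stdio.h>

/* Assumption: modeled on how glibc spells __COLD for GCC; a separate,
   hypothetical name avoids clashing with the real macro.  */
#ifdef __GNUC__
# define MY_COLD __attribute__ ((__cold__))
#else
# define MY_COLD
#endif

/* Header-style prototype: the only place the attribute appears.  */
extern int report_rare_error (const char *msg) MY_COLD;

/* Definition without MY_COLD: GCC merges the attribute from the
   earlier declaration, so the function is still treated as cold.  */
int
report_rare_error (const char *msg)
{
  return fprintf (stderr, "error: %s\n", msg);
}

/* Paths leading to calls of a cold function are predicted as
   unlikely, so the b == 0 branch is the cold path here.  */
int
checked_div (int a, int b, int *out)
{
  if (b == 0)
    {
      report_rare_error ("division by zero");
      return -1;
    }
  *out = a / b;
  return 0;
}
```

Whether GCC then outlines the unlikely branch into a separate `checked_div.cold` block depends on the target and GCC version, which is exactly the behavior being questioned in this thread.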
>
> Here's what I'm getting specifically, on i686-gnu:
>
> Dump of assembler code for function __ppoll_chk:
> Address range 0x198760 to 0x19879e:
>    0x00198760 <+0>:     56                   push   %esi
>    0x00198761 <+1>:     53                   push   %ebx
>    0x00198762 <+2>:     83 ec 04             sub    $0x4,%esp
>    0x00198765 <+5>:     8b 44 24 20          mov    0x20(%esp),%eax
>    0x00198769 <+9>:     8b 54 24 14          mov    0x14(%esp),%edx
>    0x0019876d <+13>:    8b 4c 24 10          mov    0x10(%esp),%ecx
>    0x00198771 <+17>:    8b 5c 24 18          mov    0x18(%esp),%ebx
>    0x00198775 <+21>:    c1 e8 03             shr    $0x3,%eax
>    0x00198778 <+24>:    8b 74 24 1c          mov    0x1c(%esp),%esi
>    0x0019877c <+28>:    39 d0                cmp    %edx,%eax
>    0x0019877e <+30>:    0f 82 9d bb e8 ff    jb     0x24321 <__ppoll_chk.cold>
>    0x00198784 <+36>:    89 74 24 1c          mov    %esi,0x1c(%esp)
>    0x00198788 <+40>:    89 5c 24 18          mov    %ebx,0x18(%esp)
>    0x0019878c <+44>:    89 54 24 14          mov    %edx,0x14(%esp)
>    0x00198790 <+48>:    89 4c 24 10          mov    %ecx,0x10(%esp)
>    0x00198794 <+52>:    83 c4 04             add    $0x4,%esp
>    0x00198797 <+55>:    5b                   pop    %ebx
>    0x00198798 <+56>:    5e                   pop    %esi
>    0x00198799 <+57>:    e9 b2 b9 fb ff       jmp    0x154150 <__GI_ppoll>
> Address range 0x24321 to 0x24326:
>    0x00024321 <-1524799>:    e8 5c ff ff ff       call   0x24282 <__GI___chk_fail>
> End of assembler dump.
>
> It's spending 6 bytes for the 'jb __ppoll_chk.cold', only to jump to
> 'call __GI___chk_fail' which takes 5 bytes. That's negative space
> savings, both overall and inside .text.

My guess is that this is arch-specific, since for aarch64-linux I am not seeing
any '.cold' symbols being generated:

00000000000f4950 <__ppoll_chk>:
   f4950:       eb440c3f        cmp     x1, x4, lsr #3
   f4954:       54000048        b.hi    f495c <__ppoll_chk+0xc>  // b.pmore
   f4958:       17ffa1a6        b       dcff0
   f495c:       a9bf7bfd        stp     x29, x30, [sp, #-16]!
   f4960:       910003fd        mov     x29, sp
   f4964:       97fcdae1        bl      2b4e8 <__chk_fail>
   f4968:       d503201f        nop
   f496c:       d503201f        nop

So I don't have a strong opinion about it.
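For reference, both dumps implement the same bounds check: fdslen divided by sizeof (struct pollfd), which is 8 on these targets (hence 'shr $0x3' and 'lsr #3'), compared against nfds. A minimal C sketch of that shape, modeled only on what the disassembly shows; `chk_fail`, `pollfds_fit` and `checked_poll` are hypothetical names, and POSIX poll () stands in for ppoll () so the sketch stays portable:

```c
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for __chk_fail: cold and noreturn.  */
static void chk_fail (void) __attribute__ ((__cold__, __noreturn__));

static void
chk_fail (void)
{
  abort ();
}

/* True when a buffer of FDSLEN bytes holds at least NFDS entries.  */
bool
pollfds_fit (nfds_t nfds, size_t fdslen)
{
  return fdslen / sizeof (struct pollfd) >= nfds;
}

/* Hypothetical checked wrapper with the same structure as the
   disassembled __ppoll_chk: one compare, a rarely taken failure
   path, and a tail call on the hot path.  */
int
checked_poll (struct pollfd *fds, nfds_t nfds, int timeout, size_t fdslen)
{
  if (!pollfds_fit (nfds, fdslen))
    chk_fail ();   /* unlikely path: candidate for a .cold block */
  return poll (fds, nfds, timeout);
}
```

Compiling this with `gcc -O2 -S` on each target may be a quick way to see whether the failure path gets split out, without building all of glibc.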
Leaving __chk_fail unmarked does seem to generate smaller code for x86 (text
is 1443 bytes smaller on x86_64 and 1696 bytes smaller on i686), although not
for aarch64 (text is 99 bytes larger):

With this patch:

   text    data     bss     dec     hex filename
1867381  411832   55080 2334293  239e55 x86_64-linux-gnu-patch/libc.so
2147360  129084   39524 2315968  2356c0 i686-linux-gnu-patch/libc.so
1574355  410624   51704 2036683  1f13cb aarch64-linux-gnu-patch/libc.so

With this patch with __COLD for __chk_fail prototype:

   text    data     bss     dec     hex filename
1868824  411832   55080 2335736  23a3f8 x86_64-linux-gnu/libc.so
2149056  129084   39524 2317664  235d60 i686-linux-gnu/libc.so
1574256  410624   51704 2036584  1f1368 aarch64-linux-gnu/libc.so

>
> And actually frankly that's bad codegen altogether, unless I'm missing
> something. Why not
>
>     mov 20(%esp), %eax
>     shr $3, %eax
>     cmp 8(%esp), %eax
>     jnb __GI_ppoll
>     push %ebp
>     mov %esp, %ebp
>     call __GI___chk_fail
>
> Then maybe it'd make sense to move the "push, mov, call" into
> .text.unlikely, adding a jmp.

It might be worth opening a GCC bug report.