public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: "Fāng-ruì Sòng" <maskray@google.com>
To: Joseph Myers <joseph@codesourcery.com>
Cc: Paul Eggert <eggert@cs.ucla.edu>, libc-alpha@sourceware.org
Subject: Re: [PATCH] regex: Unnest nested functions in regcomp.c
Date: Wed, 27 Oct 2021 21:12:03 -0700	[thread overview]
Message-ID: <CAFP8O3K17-e0UmF3QT_5sJ7FxOE2Aft9g=iOF0KwEn-Fq-VMGA@mail.gmail.com> (raw)
In-Reply-To: <alpine.DEB.2.22.394.2110271609080.1681686@digraph.polyomino.org.uk>

On Wed, Oct 27, 2021 at 9:11 AM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Wed, 27 Oct 2021, Paul Eggert wrote:
>
> > On 10/26/21 22:29, Fangrui Song wrote:
> > > collseqwc, table_size, symb_table, extra are now initialized to appease
> > > GCC -Werror=maybe-uninitialized false positive.
> >
> > Are the diagnostics really false positives? As I understand it, the modified
> > code would have undefined behavior if these variables were not initialized.

Are sorry, the diagnostics are legitimate.

collseqwc, table_size, symb_table, extra are now passed as arguments
so they need to be initialized.
Previously they were accessed from `auto inline` nested functions.
With the delayed access GCC did not warn.

In file included from regex.c:74:
regcomp.c: In function ‘parse_expression’:
regcomp.c:3102:11: error: ‘table_size’ may be used uninitialized in
this function [-Werror=maybe-uninitialized]
 3102 |   int32_t table_size;
      |           ^~~~~~~~~~

I don't know how to compare performance difference but the new version
is just 16 bytes larger in .text on x86-64 (gcc -O2).

% ~/projects/bloaty/Release/bloaty /tmp/c/regex1.o -- /tmp/c/regex.o
    FILE SIZE        VM SIZE
 --------------  --------------
  +118%     +20  [ = ]       0    [Unmapped]
  +0.0%     +16  +0.0%     +16    .text
  -5.6%      -4  [ = ]       0    .data
  +0.0%     +32  +0.0%     +16    TOTAL

The code move apparently triggers large assembly move in GCC's code generation.
The 16 byte .text increase is likely due to the now needed zero
initialization for the four variables.

% diff -u =(llvm-objdump -d --symbolize-operands -M intel
--no-leading-addr --no-show-raw-insn /tmp/c/regex.o) =(llvm-objdump -d
--symbolize-operands -M intel --no-leading-addr --no-show-raw-insn
/tmp/c/regex1.o)
...
@@ -13236,7 +13236,11 @@
                        mov     r15d, ebx
                        test    ebx, ebx
                        jne      <L33>
-<L102>:
+                       mov     qword ptr [rsp + 96], 0
+                       mov     qword ptr [rsp + 120], 0
+                       mov     dword ptr [rsp + 116], 0
+                       mov     qword ptr [rsp + 88], 0
+<L94>:
                        mov     esi, 1
                        mov     edi, 32
                        call     <L34>

> And if they are false positives, the glibc convention in such cases is
> generally to use the DIAG_* macros from libc-diag.h to suppress the
> warnings, with appropriate comments explaining why they are false
> positives, rather than adding an initialization that should not be
> necessary.  (Sometimes another approach is more appropriate to warning
> suppression - for example, a call to __builtin_unreachable that enables
> the compiler to see that certain paths through the code cannot actually
> occur.)

Thanks for the tip.

> --
> Joseph S. Myers
> joseph@codesourcery.com

  reply	other threads:[~2021-10-28  4:12 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-27  5:29 Fangrui Song
2021-10-27  9:35 ` Paul Eggert
2021-10-27 16:10   ` Joseph Myers
2021-10-28  4:12     ` Fāng-ruì Sòng [this message]
2021-11-02  2:40 ` Carlos O'Donell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFP8O3K17-e0UmF3QT_5sJ7FxOE2Aft9g=iOF0KwEn-Fq-VMGA@mail.gmail.com' \
    --to=maskray@google.com \
    --cc=eggert@cs.ucla.edu \
    --cc=joseph@codesourcery.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).