public inbox for gcc-bugs@sourceware.org
* [Bug target/110533] New: [x86-64] naked with -O0 and register-passed struct/int128 clobbers parameters/callee-saved regs
From: engelke at in dot tum.de @ 2023-07-03 12:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110533

            Bug ID: 110533
           Summary: [x86-64] naked with -O0 and register-passed
                    struct/int128 clobbers parameters/callee-saved regs
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: engelke at in dot tum.de
  Target Milestone: ---

Compiling a naked function with a parameter that is split over multiple
registers generates several mov operations with -O0, clobbering other
parameters and callee-saved registers. This does not happen with -O1. The
problem has existed since the introduction of naked in GCC 8 and persists at
least up to GCC 13.

Example:

    __attribute__((naked))
    void fn(__int128 a) {
        asm("ret");
    }

This compiles to the following; note that rbx (callee-saved) is clobbered:

    fn:
    .LFB0:
            .cfi_startproc
            movq    %rdi, %rdx
            movq    %rsi, %rax
            movq    %rcx, %rbx
            movq    %rdx, %rcx
            movq    %rax, %rbx
    #APP
    # 3 "<stdin>" 1
            ret
    # 0 "" 2
    #NO_APP
            nop
            ud2
            .cfi_endproc

With two parameters:

    __attribute__((naked))
    void fn(__int128 a, __int128 b) {
        asm("ret");
    }

This compiles to the following; note that rbx and the second parameter are
clobbered:

    fn:
    .LFB0:
            .cfi_startproc
            movq    %rdi, %rdx
            movq    %rsi, %rax
            movq    %rcx, %rbx
            movq    %rdx, %rcx
            movq    %rax, %rbx
    #APP
    # 3 "<stdin>" 1
            ret
    # 0 "" 2
    #NO_APP
            nop
            ud2
            .cfi_endproc

With a slight modification everything works as expected:

    __attribute__((naked))
    void fn(int x, int y, __int128 a) {
        asm("ret");
    }

Compiles to:

    fn:
    .LFB0:
            .cfi_startproc
    #APP
    # 3 "<stdin>" 1
            ret
    # 0 "" 2
    #NO_APP
            nop
            ud2
            .cfi_endproc

(Above examples generated with gcc 12.2.1, but many other versions are affected
as well.)


* [Bug target/110533] [x86-64] naked with -O0 and register-passed struct/int128 clobbers parameters/callee-saved regs
From: pinskia at gcc dot gnu.org @ 2023-07-03 20:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110533

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>clobbering other parameters and callee-saved registers.


(insn 2 8 3 2 (set (reg:DI 84)
        (reg:DI 5 di [ aD.2522 ])) "/app/example.cpp":3:25 -1
     (nil))
(insn 3 2 4 2 (set (reg:DI 85)
        (reg:DI 4 si [ aD.2522+8 ])) "/app/example.cpp":3:25 -1
     (nil))
(insn 4 3 5 2 (set (reg:TI 83)
        (subreg:TI (reg:DI 84) 0)) "/app/example.cpp":3:25 -1
     (nil))
(insn 5 4 6 2 (set (subreg:DI (reg:TI 83) 8)
        (reg:DI 85)) "/app/example.cpp":3:25 -1
     (nil))
(insn 6 5 7 2 (set (reg/v:TI 82 [ aD.2522 ])
        (reg:TI 83)) "/app/example.cpp":3:25 -1
     (nil))


* [Bug target/110533] [x86-64] naked with -O0 and register-passed struct/int128 clobbers parameters/callee-saved regs
From: ubizjak at gmail dot com @ 2023-07-04  8:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110533

--- Comment #2 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Andrew Pinski from comment #1)
> >clobbering other parameters and callee-saved registers.
> 
> 
> (insn 2 8 3 2 (set (reg:DI 84)
>         (reg:DI 5 di [ aD.2522 ])) "/app/example.cpp":3:25 -1
>      (nil))
> (insn 3 2 4 2 (set (reg:DI 85)
>         (reg:DI 4 si [ aD.2522+8 ])) "/app/example.cpp":3:25 -1
>      (nil))
> (insn 4 3 5 2 (set (reg:TI 83)
>         (subreg:TI (reg:DI 84) 0)) "/app/example.cpp":3:25 -1
>      (nil))
> (insn 5 4 6 2 (set (subreg:DI (reg:TI 83) 8)
>         (reg:DI 85)) "/app/example.cpp":3:25 -1
>      (nil))
> (insn 6 5 7 2 (set (reg/v:TI 82 [ aD.2522 ])
>         (reg:TI 83)) "/app/example.cpp":3:25 -1
>      (nil))

This is emitted by the middle-end to reconstruct a function argument when the
argument is passed via multiple registers. The function argument is specified
by the same hook for both the caller and the callee, so it is not possible to
simply disable it for naked functions.

When -O2 is used, optimizers figure out that the reconstructed value is unused
and remove the whole reconstruction sequence. This is unfortunately not the
case with -O0.

So, the middle-end should provide some sort of mechanism to suppress the
generation of the reconstruction sequence. The above sequence is emitted in
function.cc/assign_parm_remove_parallels, but similar functionality can
probably be found elsewhere in the function handling code.

Alternatively, only pass simple arguments to naked functions. Naked functions
are a specialist's tool, not intended for the "general public".
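
The "simple arguments" workaround can be sketched as follows. This is a
minimal illustration, not code from the bug report: sum_halves and sum_int128
are hypothetical names, and the sketch assumes the x86-64 System V calling
convention (first two integer arguments in rdi and rsi, result in rax). The
naked function receives the two 64-bit halves as separate scalars, so no
multi-register argument has to be reconstructed; an ordinary wrapper performs
the split.

```c
#include <stdint.h>

/* Hypothetical naked function taking the two halves as plain 64-bit
 * scalars. Assumes x86-64 System V: lo arrives in rdi, hi in rsi, and the
 * result is returned in rax. Only basic asm is used in the body. */
__attribute__((naked))
uint64_t sum_halves(uint64_t lo, uint64_t hi) {
    __asm__("leaq (%rdi,%rsi), %rax\n\t"
            "ret");
}

/* Ordinary (non-naked) wrapper: splits the 128-bit value so that the naked
 * function never sees an argument spread over several registers. */
uint64_t sum_int128(unsigned __int128 v) {
    return sum_halves((uint64_t)v, (uint64_t)(v >> 64));
}
```

Keeping the 128-bit type in the wrapper leaves the argument splitting to
compiler-generated code in a normal function, where the prologue moves are
harmless.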


* [Bug target/110533] [x86-64] naked with -O0 and register-passed struct/int128 clobbers parameters/callee-saved regs
From: roger at nextmovesoftware dot com @ 2023-07-06 12:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110533

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-07-06
     Ever confirmed|0                           |1
                 CC|                            |roger at nextmovesoftware dot com
             Status|UNCONFIRMED                 |NEW

--- Comment #3 from Roger Sayle <roger at nextmovesoftware dot com> ---
The patch recently proposed at
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623756.html would resolve
this issue.


* [Bug target/110533] [x86-64] naked with -O0 and register-passed struct/int128 clobbers parameters/callee-saved regs
From: cvs-commit at gcc dot gnu.org @ 2023-07-07 19:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110533

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:bdf2737cda53a83332db1a1a021653447b05a7e7

commit r14-2386-gbdf2737cda53a83332db1a1a021653447b05a7e7
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Fri Jul 7 20:39:58 2023 +0100

    i386: Improve __int128 argument passing (in ix86_expand_move).

    Passing 128-bit integer (TImode) parameters on x86_64 can sometimes
    result in surprising code.  Consider the example below (from PR 43644):

    unsigned __int128 foo(unsigned __int128 x, unsigned long long y) {
      return x+y;
    }

    which currently results in 6 consecutive movq instructions:

    foo:    movq    %rsi, %rax
            movq    %rdi, %rsi
            movq    %rdx, %rcx
            movq    %rax, %rdi
            movq    %rsi, %rax
            movq    %rdi, %rdx
            addq    %rcx, %rax
            adcq    $0, %rdx
            ret

    The underlying issue is that during RTL expansion, we generate the
    following initial RTL for the x argument:

    (insn 4 3 5 2 (set (reg:TI 85)
            (subreg:TI (reg:DI 86) 0)) "pr43644-2.c":5:1 -1
         (nil))
    (insn 5 4 6 2 (set (subreg:DI (reg:TI 85) 8)
            (reg:DI 87)) "pr43644-2.c":5:1 -1
         (nil))
    (insn 6 5 7 2 (set (reg/v:TI 84 [ x ])
            (reg:TI 85)) "pr43644-2.c":5:1 -1
         (nil))

    which by combine/reload becomes

    (insn 25 3 22 2 (set (reg/v:TI 84 [ x ])
            (const_int 0 [0])) "pr43644-2.c":5:1 -1
         (nil))
    (insn 22 25 23 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 0)
            (reg:DI 93)) "pr43644-2.c":5:1 90 {*movdi_internal}
         (expr_list:REG_DEAD (reg:DI 93)
            (nil)))
    (insn 23 22 28 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 8)
            (reg:DI 94)) "pr43644-2.c":5:1 90 {*movdi_internal}
         (expr_list:REG_DEAD (reg:DI 94)
            (nil)))

    where the heavy use of SUBREG SET_DESTs creates challenges for both
    combine and register allocation.

    The improvement proposed here is to avoid these problematic SUBREGs
    by adding (two) special cases to ix86_expand_move.  For insn 4, which
    sets a TImode destination from a paradoxical SUBREG, to assign the
    lowpart, we can use an explicit zero extension (zero_extendditi2 was
    added in July 2022), and for insn 5, which sets the highpart of a
    TImode register we can use the *insvti_highpart_1 instruction (that
    was added in May 2023, after being approved for stage1 in January).
    This allows combine to work its magic, merging these insns into a
    *concatditi3 and from there into other optimized forms.

    So for the test case above, we now generate only a single movq:

    foo:    movq    %rdx, %rax
            xorl    %edx, %edx
            addq    %rdi, %rax
            adcq    %rsi, %rdx
            ret

    But there is a little bad news.  This patch causes two (minor) missed
    optimization regressions on x86_64; gcc.target/i386/pr82580.c and
    gcc.target/i386/pr91681-1.c.  As shown in the test case above, we're
    no longer generating adcq $0, but instead using xorl.  For the other
    FAIL, register allocation now has more freedom and is (arbitrarily)
    choosing a register assignment that doesn't match what the test is
    expecting.  These issues are easier to explain and fix once this patch
    is in the tree.

    The good news is that this approach fixes a number of long-standing
    issues that need to be checked in bugzilla, including PR target/110533,
    which was just opened/reported earlier this week.

    2023-07-07  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/ChangeLog
            PR target/43644
            PR target/110533
            * config/i386/i386-expand.cc (ix86_expand_move): Convert SETs of
            TImode destinations from paradoxical SUBREGs (setting the lowpart)
            into explicit zero extensions.  Use *insvti_highpart_1 instruction
            to set the highpart of a TImode destination.

    gcc/testsuite/ChangeLog
            PR target/43644
            PR target/110533
            * gcc.target/i386/pr110533.c: New test case.
            * gcc.target/i386/pr43644-2.c: Likewise.
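
The two-step construction described in the commit message (zero-extend the
low half, then insert the high half) can be modelled in plain C. This is only
a value-level sketch under the assumption of a compiler that provides
unsigned __int128; the names u128, zero_extend_low, insert_high and
concat_halves are hypothetical and do not correspond to GCC internals or to
the instruction patterns named above.

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Step 1: models setting the 128-bit value from its low half via an
 * explicit zero extension (the upper 64 bits become zero). */
static u128 zero_extend_low(uint64_t lo) {
    return (u128)lo;
}

/* Step 2: models inserting a 64-bit value into bits 64..127 while keeping
 * the low 64 bits unchanged. */
static u128 insert_high(u128 v, uint64_t hi) {
    return (v & (u128)UINT64_MAX) | ((u128)hi << 64);
}

/* Both steps together yield the concatenation hi:lo. */
static u128 concat_halves(uint64_t lo, uint64_t hi) {
    return insert_high(zero_extend_low(lo), hi);
}
```

Because step 1 defines every bit of the 128-bit value, the result of step 2
depends only on the two inputs, which is what lets the pair be treated as a
single concatenation.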


* [Bug target/110533] [x86-64] naked with -O0 and register-passed struct/int128 clobbers parameters/callee-saved regs
From: cvs-commit at gcc dot gnu.org @ 2023-07-22 20:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110533

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:8125b12f846b41f26e58c0fe3b218d654f65d1c8

commit r14-2730-g8125b12f846b41f26e58c0fe3b218d654f65d1c8
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Sat Jul 22 21:52:55 2023 +0100

    i386: Don't use insvti_{high,low}part with -O0 (for compile-time).

    This patch attempts to help with PR rtl-optimization/110587, a regression
    of -O0 compile time for the pathological pr28071.c.  My recent patch helps
    a bit, but hasn't returned -O0 compile-time to where it was before my
    ix86_expand_move changes.  The obvious solution/workaround is to guard
    these new TImode parameter passing optimizations with "&& optimize", so
    they don't trigger when compiling with -O0.  The very minor complication
    is that "&& optimize" alone leads to the regression of pr110533.c, where
    our improved TImode parameter passing fixes a wrong-code issue with naked
    functions, importantly, when compiling with -O0.  This should explain
    the one line fix below "&& (optimize || ix86_function_naked (cfun))".

    I've an additional fix/tweak or two for this compile-time issue, but
    this change eliminates the part of the regression that I've caused.

    2023-07-22  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/ChangeLog
            * config/i386/i386-expand.cc (ix86_expand_move): Disable the
            64-bit insertions into TImode optimizations with -O0, unless
            the function has the "naked" attribute (for PR target/110533).
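
The guard quoted in the commit message, "&& (optimize || ix86_function_naked
(cfun))", reduces to a small truth table. The sketch below is a standalone
model of that condition only: use_timode_insertion and its parameters are
hypothetical stand-ins, not GCC's actual code.

```c
#include <stdbool.h>

/* Hypothetical model of the guard: the TImode insertion path is taken when
 * optimizing, or when the current function is naked (even at -O0). */
static bool use_timode_insertion(int optimize_level, bool function_is_naked) {
    return optimize_level != 0 || function_is_naked;
}
```

Under this model, an ordinary function at -O0 skips the new path (restoring
the earlier compile time), while a naked function at -O0 still takes it, which
keeps the fix for this PR in effect.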

