[Bug target/91681] Missed optimization for 128 bit arithmetic operations

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug target/91681] Missed optimization for 128 bit arithmetic operations
       [not found] <bug-91681-4@http.gcc.gnu.org/bugzilla/>
@ 2021-12-27  7:29 ` pinskia at gcc dot gnu.org
  2022-07-25 16:37 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-27  7:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91681

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
GCC 11 is better (note the widen multiply is still not detected):
        movq    %rdx, %rax
        movq    %rdi, %r9
        movq    %rdx, %rdi
        mulq    %rsi
        movq    %rdx, %rax
        xorl    %edx, %edx
        movq    %rax, %r10
        movq    %rdi, %rax
        movq    %rdx, %r11
        mulq    %r9
        addq    %rax, %r10
        movq    %rsi, %rax
        adcq    %rdx, %r11
        mulq    %rcx
        movq    %rcx, %rax
        xorl    %edi, %edi
        movq    %r10, (%r8)
        movq    %r11, 8(%r8)
        movq    %rdx, %rsi
        mulq    %r9
        addq    %rax, %rsi
        adcq    %rdx, %rdi
        movq    %rsi, 16(%r8)
        movq    %rdi, 24(%r8)
        ret

Still 2 more instructions than LLVM but at least no longer using the stack.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/91681] Missed optimization for 128 bit arithmetic operations
       [not found] <bug-91681-4@http.gcc.gnu.org/bugzilla/>
  2021-12-27  7:29 ` [Bug target/91681] Missed optimization for 128 bit arithmetic operations pinskia at gcc dot gnu.org
@ 2022-07-25 16:37 ` cvs-commit at gcc dot gnu.org
  2023-07-07 20:00 ` roger at nextmovesoftware dot com
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-07-25 16:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91681

--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:16aafa3194d4851a07cc204f56a5f0618f77e5d7

commit r13-1826-g16aafa3194d4851a07cc204f56a5f0618f77e5d7
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Mon Jul 25 17:33:48 2022 +0100

    PR target/91681: zero_extendditi2 pattern for more optimizations on x86.

    Technically, PR target/91681 has already been resolved; we now recognize
the
    highpart multiplication at the tree-level, we no longer use the stack, and
    we currently generate the same number of instructions as LLVM.  However, it
    is still possible to do better, the current x86_64 code to generate a
double
    word addition of a zero extended operand, looks like:

            xorl    %r11d, %r11d
            addq    %r10, %rax
            adcq    %r11, %rdx

    when it's possible (as LLVM does) to use an immediate constant:

            addq    %r10, %rax
            adcq    $0, %rdx

    This is implemented by introducing a zero_extendditi2 pattern,
    for zero extension from DImode to TImode on TARGET_64BIT that is
    split after reload.  With zero extension now visible to combine,
    we add two new define_insn_and_split that add/subtract a zero
    extended operand in double word mode.  These apply to both 32-bit
    and 64-bit code generation, to produce adc $0 and sbb $0.

    One consequence of this is that these new patterns interfere with
    the optimization that recognizes DW:DI = (HI:SI<<32)+LO:SI as a pair
    of register moves, or more accurately the combine splitter no longer
    triggers as we're now converting two instructions into two instructions
    (not three instructions into two instructions).  This is easily
    repaired (and extended to handle TImode) by changing from a pair
    of define_split (that handle operand commutativity) to a set of
    four define_insn_and_split (again to handle operand commutativity).

    2022-07-25  Roger Sayle  <roger@nextmovesoftware.com>
                UroÅ¡ Bizjak  <ubizjak@gmail.com>

    gcc/ChangeLog
            PR target/91681
            * config/i386/i386-expand.cc (split_double_concat): A new helper
            function for setting a double word value from two word values.
            * config/i386/i386-protos.h (split_double_concat): Prototype here.
            * config/i386/i386.md (zero_extendditi2): New
define_insn_and_split.
            (*add<dwi>3_doubleword_zext): New define_insn_and_split.
            (*sub<dwi>3_doubleword_zext): New define_insn_and_split.
            (*concat<mode><dwi>3_1): New define_insn_and_split replacing
            previous define_split for implementing DST = (HI<<32)|LO as
            pair of move instructions, setting lopart and hipart.
            (*concat<mode><dwi>3_2): Likewise.
            (*concat<mode><dwi>3_3): Likewise, where HI is zero_extended.
            (*concat<mode><dwi>3_4): Likewise, where HI is zero_extended.

    gcc/testsuite/ChangeLog
            PR target/91681
            * g++.target/i386/pr91681.C: New test case (from the PR).
            * gcc.target/i386/pr91681-1.c: New int128 test case.
            * gcc.target/i386/pr91681-2.c: Likewise.
            * gcc.target/i386/pr91681-3.c: Likewise, but for ia32.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/91681] Missed optimization for 128 bit arithmetic operations
       [not found] <bug-91681-4@http.gcc.gnu.org/bugzilla/>
  2021-12-27  7:29 ` [Bug target/91681] Missed optimization for 128 bit arithmetic operations pinskia at gcc dot gnu.org
  2022-07-25 16:37 ` cvs-commit at gcc dot gnu.org
@ 2023-07-07 20:00 ` roger at nextmovesoftware dot com
  2023-07-12 13:14 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-07-07 20:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91681

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |roger at nextmovesoftware dot com
                 CC|                            |roger at nextmovesoftware dot com

--- Comment #4 from Roger Sayle <roger at nextmovesoftware dot com> ---
Advance warning, the testcase pr91681-1.c will start FAILing (temporarily) due
to changes/improvements in __int128 parameter passing, as explained in
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623756.html
Easily fixed by additional define_insn_and_split patterns which I'll submit
soon.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/91681] Missed optimization for 128 bit arithmetic operations
       [not found] <bug-91681-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2023-07-07 20:00 ` roger at nextmovesoftware dot com
@ 2023-07-12 13:14 ` cvs-commit at gcc dot gnu.org
  2023-07-12 13:46 ` cvs-commit at gcc dot gnu.org
  2024-01-21 23:31 ` roger at nextmovesoftware dot com
  5 siblings, 0 replies; 6+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-07-12 13:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91681

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:275a2124e4928c88bd5469096356ba393b6aadfb

commit r14-2466-g275a2124e4928c88bd5469096356ba393b6aadfb
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Wed Jul 12 14:14:15 2023 +0100

    i386: Fix FAIL of gcc.target/i386/pr91681-1.c

    The recent change in TImode parameter passing on x86_64 results in the
    FAIL of pr91681-1.c.  The issue is that with the extra flexibility,
    the combine pass is now spoilt for choice between using either the
    *add<dwi>3_doubleword_concat or the *add<dwi>3_doubleword_zext
    patterns, when one operand is a *concat and the other is a zero_extend.
    The solution proposed below is provide an *add<dwi>3_doubleword_concat_zext
    define_insn_and_split, that can benefit both from the register allocation
    of *concat, and still avoid the xor normally required by zero extension.

    I'm investigating a follow-up refinement to improve register allocation
    further by avoiding the early clobber in the =&r, and handling (custom)
    reloads explicitly, but this piece resolves the testcase failure.

    2023-07-12  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/ChangeLog
            PR target/91681
            * config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): New
            define_insn_and_split derived from *add<dwi>3_doubleword_concat
            and *add<dwi>3_doubleword_zext.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/91681] Missed optimization for 128 bit arithmetic operations
       [not found] <bug-91681-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2023-07-12 13:14 ` cvs-commit at gcc dot gnu.org
@ 2023-07-12 13:46 ` cvs-commit at gcc dot gnu.org
  2024-01-21 23:31 ` roger at nextmovesoftware dot com
  5 siblings, 0 replies; 6+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-07-12 13:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91681

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:30dbfcd86c364da8491634ed4f99def184e2d042

commit r14-2467-g30dbfcd86c364da8491634ed4f99def184e2d042
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Wed Jul 12 14:43:37 2023 +0100

    i386: Fix FAIL of gcc.target/i386/pr91681-1.c

    I committed the wrong version of this patch (with a typo).
    Updating to the correct bootstrapped and regression tested version
    as obvious.

    2023-07-12  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/ChangeLog
            PR target/91681
            * config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): Typo.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug target/91681] Missed optimization for 128 bit arithmetic operations
       [not found] <bug-91681-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2023-07-12 13:46 ` cvs-commit at gcc dot gnu.org
@ 2024-01-21 23:31 ` roger at nextmovesoftware dot com
  5 siblings, 0 replies; 6+ messages in thread
From: roger at nextmovesoftware dot com @ 2024-01-21 23:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91681

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED
   Target Milestone|---                         |14.0

--- Comment #7 from Roger Sayle <roger at nextmovesoftware dot com> ---
This is now fixed on mainline.  GCC is now optimal, and generates one less
instruction than clang.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-01-21 23:31 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-91681-4@http.gcc.gnu.org/bugzilla/>
2021-12-27  7:29 ` [Bug target/91681] Missed optimization for 128 bit arithmetic operations pinskia at gcc dot gnu.org
2022-07-25 16:37 ` cvs-commit at gcc dot gnu.org
2023-07-07 20:00 ` roger at nextmovesoftware dot com
2023-07-12 13:14 ` cvs-commit at gcc dot gnu.org
2023-07-12 13:46 ` cvs-commit at gcc dot gnu.org
2024-01-21 23:31 ` roger at nextmovesoftware dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).