* [Bug target/91681] Missed optimization for 128 bit arithmetic operations
[not found] <bug-91681-4@http.gcc.gnu.org/bugzilla/>
@ 2021-12-27 7:29 ` pinskia at gcc dot gnu.org
2022-07-25 16:37 ` cvs-commit at gcc dot gnu.org
` (4 subsequent siblings)
5 siblings, 0 replies; 6+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-27 7:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91681
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
GCC 11 is better (note the widen multiply is still not detected):
movq %rdx, %rax
movq %rdi, %r9
movq %rdx, %rdi
mulq %rsi
movq %rdx, %rax
xorl %edx, %edx
movq %rax, %r10
movq %rdi, %rax
movq %rdx, %r11
mulq %r9
addq %rax, %r10
movq %rsi, %rax
adcq %rdx, %r11
mulq %rcx
movq %rcx, %rax
xorl %edi, %edi
movq %r10, (%r8)
movq %r11, 8(%r8)
movq %rdx, %rsi
mulq %r9
addq %rax, %rsi
adcq %rdx, %rdi
movq %rsi, 16(%r8)
movq %rdi, 24(%r8)
ret
Still 2 more instructions than LLVM but at least no longer using the stack.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug target/91681] Missed optimization for 128 bit arithmetic operations
[not found] <bug-91681-4@http.gcc.gnu.org/bugzilla/>
2021-12-27 7:29 ` [Bug target/91681] Missed optimization for 128 bit arithmetic operations pinskia at gcc dot gnu.org
@ 2022-07-25 16:37 ` cvs-commit at gcc dot gnu.org
2023-07-07 20:00 ` roger at nextmovesoftware dot com
` (3 subsequent siblings)
5 siblings, 0 replies; 6+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-07-25 16:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91681
--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:16aafa3194d4851a07cc204f56a5f0618f77e5d7
commit r13-1826-g16aafa3194d4851a07cc204f56a5f0618f77e5d7
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Mon Jul 25 17:33:48 2022 +0100
PR target/91681: zero_extendditi2 pattern for more optimizations on x86.
Technically, PR target/91681 has already been resolved; we now recognize
the
highpart multiplication at the tree-level, we no longer use the stack, and
we currently generate the same number of instructions as LLVM. However, it
is still possible to do better, the current x86_64 code to generate a
double
word addition of a zero extended operand, looks like:
xorl %r11d, %r11d
addq %r10, %rax
adcq %r11, %rdx
when it's possible (as LLVM does) to use an immediate constant:
addq %r10, %rax
adcq $0, %rdx
This is implemented by introducing a zero_extendditi2 pattern,
for zero extension from DImode to TImode on TARGET_64BIT that is
split after reload. With zero extension now visible to combine,
we add two new define_insn_and_split that add/subtract a zero
extended operand in double word mode. These apply to both 32-bit
and 64-bit code generation, to produce adc $0 and sbb $0.
One consequence of this is that these new patterns interfere with
the optimization that recognizes DW:DI = (HI:SI<<32)+LO:SI as a pair
of register moves, or more accurately the combine splitter no longer
triggers as we're now converting two instructions into two instructions
(not three instructions into two instructions). This is easily
repaired (and extended to handle TImode) by changing from a pair
of define_split (that handle operand commutativity) to a set of
four define_insn_and_split (again to handle operand commutativity).
2022-07-25 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
PR target/91681
* config/i386/i386-expand.cc (split_double_concat): A new helper
function for setting a double word value from two word values.
* config/i386/i386-protos.h (split_double_concat): Prototype here.
* config/i386/i386.md (zero_extendditi2): New
define_insn_and_split.
(*add<dwi>3_doubleword_zext): New define_insn_and_split.
(*sub<dwi>3_doubleword_zext): New define_insn_and_split.
(*concat<mode><dwi>3_1): New define_insn_and_split replacing
previous define_split for implementing DST = (HI<<32)|LO as
pair of move instructions, setting lopart and hipart.
(*concat<mode><dwi>3_2): Likewise.
(*concat<mode><dwi>3_3): Likewise, where HI is zero_extended.
(*concat<mode><dwi>3_4): Likewise, where HI is zero_extended.
gcc/testsuite/ChangeLog
PR target/91681
* g++.target/i386/pr91681.C: New test case (from the PR).
* gcc.target/i386/pr91681-1.c: New int128 test case.
* gcc.target/i386/pr91681-2.c: Likewise.
* gcc.target/i386/pr91681-3.c: Likewise, but for ia32.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug target/91681] Missed optimization for 128 bit arithmetic operations
[not found] <bug-91681-4@http.gcc.gnu.org/bugzilla/>
2021-12-27 7:29 ` [Bug target/91681] Missed optimization for 128 bit arithmetic operations pinskia at gcc dot gnu.org
2022-07-25 16:37 ` cvs-commit at gcc dot gnu.org
@ 2023-07-07 20:00 ` roger at nextmovesoftware dot com
2023-07-12 13:14 ` cvs-commit at gcc dot gnu.org
` (2 subsequent siblings)
5 siblings, 0 replies; 6+ messages in thread
From: roger at nextmovesoftware dot com @ 2023-07-07 20:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91681
Roger Sayle <roger at nextmovesoftware dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |roger at nextmovesoftware dot com
CC| |roger at nextmovesoftware dot com
--- Comment #4 from Roger Sayle <roger at nextmovesoftware dot com> ---
Advance warning, the testcase pr91681-1.c will start FAILing (temporarily) due
to changes/improvements in __int128 parameter passing, as explained in
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/623756.html
Easily fixed by additional define_insn_and_split patterns which I'll submit
soon.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug target/91681] Missed optimization for 128 bit arithmetic operations
[not found] <bug-91681-4@http.gcc.gnu.org/bugzilla/>
` (2 preceding siblings ...)
2023-07-07 20:00 ` roger at nextmovesoftware dot com
@ 2023-07-12 13:14 ` cvs-commit at gcc dot gnu.org
2023-07-12 13:46 ` cvs-commit at gcc dot gnu.org
2024-01-21 23:31 ` roger at nextmovesoftware dot com
5 siblings, 0 replies; 6+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-07-12 13:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91681
--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:275a2124e4928c88bd5469096356ba393b6aadfb
commit r14-2466-g275a2124e4928c88bd5469096356ba393b6aadfb
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Wed Jul 12 14:14:15 2023 +0100
i386: Fix FAIL of gcc.target/i386/pr91681-1.c
The recent change in TImode parameter passing on x86_64 results in the
FAIL of pr91681-1.c. The issue is that with the extra flexibility,
the combine pass is now spoilt for choice between using either the
*add<dwi>3_doubleword_concat or the *add<dwi>3_doubleword_zext
patterns, when one operand is a *concat and the other is a zero_extend.
The solution proposed below is provide an *add<dwi>3_doubleword_concat_zext
define_insn_and_split, that can benefit both from the register allocation
of *concat, and still avoid the xor normally required by zero extension.
I'm investigating a follow-up refinement to improve register allocation
further by avoiding the early clobber in the =&r, and handling (custom)
reloads explicitly, but this piece resolves the testcase failure.
2023-07-12 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/91681
* config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): New
define_insn_and_split derived from *add<dwi>3_doubleword_concat
and *add<dwi>3_doubleword_zext.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug target/91681] Missed optimization for 128 bit arithmetic operations
[not found] <bug-91681-4@http.gcc.gnu.org/bugzilla/>
` (3 preceding siblings ...)
2023-07-12 13:14 ` cvs-commit at gcc dot gnu.org
@ 2023-07-12 13:46 ` cvs-commit at gcc dot gnu.org
2024-01-21 23:31 ` roger at nextmovesoftware dot com
5 siblings, 0 replies; 6+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-07-12 13:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91681
--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:30dbfcd86c364da8491634ed4f99def184e2d042
commit r14-2467-g30dbfcd86c364da8491634ed4f99def184e2d042
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Wed Jul 12 14:43:37 2023 +0100
i386: Fix FAIL of gcc.target/i386/pr91681-1.c
I committed the wrong version of this patch (with a typo).
Updating to the correct bootstrapped and regression tested version
as obvious.
2023-07-12 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/91681
* config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): Typo.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Bug target/91681] Missed optimization for 128 bit arithmetic operations
[not found] <bug-91681-4@http.gcc.gnu.org/bugzilla/>
` (4 preceding siblings ...)
2023-07-12 13:46 ` cvs-commit at gcc dot gnu.org
@ 2024-01-21 23:31 ` roger at nextmovesoftware dot com
5 siblings, 0 replies; 6+ messages in thread
From: roger at nextmovesoftware dot com @ 2024-01-21 23:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91681
Roger Sayle <roger at nextmovesoftware dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|ASSIGNED |RESOLVED
Target Milestone|--- |14.0
--- Comment #7 from Roger Sayle <roger at nextmovesoftware dot com> ---
This is now fixed on mainline. GCC is now optimal, and generates one less
instruction than clang.
^ permalink raw reply [flat|nested] 6+ messages in thread