* [Bug rtl-optimization/97756] Inefficient handling of 128-bit arguments
2020-11-08 19:53 [Bug rtl-optimization/97756] New: Inefficient handling of 128-bit arguments tkoenig at gcc dot gnu.org
@ 2020-11-09 6:29 ` tkoenig at gcc dot gnu.org
2020-12-25 11:35 ` tkoenig at gcc dot gnu.org
` (15 subsequent siblings)
16 siblings, 0 replies; 18+ messages in thread
From: tkoenig at gcc dot gnu.org @ 2020-11-09 6:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
--- Comment #1 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Actually, it was on a Ryzen 1700 (for the -march=native).
I'm at odds with architecture names...
^ permalink raw reply [flat|nested] 18+ messages in thread
* [Bug rtl-optimization/97756] Inefficient handling of 128-bit arguments
From: tkoenig at gcc dot gnu.org @ 2020-12-25 11:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
Thomas Koenig <tkoenig at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
See Also| |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98438
--- Comment #2 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Might be related to / dup of PR 98438.
* [Bug rtl-optimization/97756] Inefficient handling of 128-bit arguments
From: ppalka at gcc dot gnu.org @ 2021-03-10 12:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
Patrick Palka <ppalka at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |ppalka at gcc dot gnu.org
--- Comment #3 from Patrick Palka <ppalka at gcc dot gnu.org> ---
Perhaps related to this PR: On x86_64, the following basic wrapper around
int128 addition
__uint128_t f(__uint128_t x, __uint128_t y) { return x + y; }
gets compiled (w/ -O3, -O2 or -Os) to the seemingly suboptimal
movq %rdi, %r9
movq %rdx, %rax
movq %rsi, %r8
movq %rcx, %rdx
addq %r9, %rax
adcq %r8, %rdx
ret
Clang does:
movq %rdi, %rax
addq %rdx, %rax
adcq %rcx, %rsi
movq %rsi, %rdx
retq
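For reference, the addq/adcq pair in both listings computes a 128-bit sum one 64-bit half at a time, with the carry from the low half fed into the high half. A minimal C sketch of that arithmetic follows; the type u128_halves and the helper add128_halves are illustrative names, not part of the report:

```c
#include <stdint.h>

/* Hypothetical helper type: a 128-bit value held as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128_halves;

/* Add two 128-bit values half by half, the way addq/adcq do. */
static u128_halves add128_halves(u128_halves x, u128_halves y)
{
    u128_halves r;
    r.lo = x.lo + y.lo;             /* addq: low halves, may wrap around */
    uint64_t carry = r.lo < x.lo;   /* 1 if the low-half addition wrapped */
    r.hi = x.hi + y.hi + carry;     /* adcq: high halves plus the carry */
    return r;
}
```

Only these two arithmetic instructions are strictly needed; any further movq instructions just shuffle the halves into the return registers %rax and %rdx.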
* [Bug rtl-optimization/97756] Inefficient handling of 128-bit arguments
From: pinskia at gcc dot gnu.org @ 2021-04-28 4:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |dushistov at mail dot ru
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 100301 has been marked as a duplicate of this bug. ***
* [Bug rtl-optimization/97756] [9/10/11/12 Regression] Inefficient handling of 128-bit arguments
From: jakub at gcc dot gnu.org @ 2021-04-28 7:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|enhancement |normal
CC| |jakub at gcc dot gnu.org,
| |vmakarov at gcc dot gnu.org
Summary|Inefficient handling of |[9/10/11/12 Regression]
|128-bit arguments |Inefficient handling of
| |128-bit arguments
Last reconfirmed| |2021-04-28
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
Priority|P3 |P2
Target Milestone|--- |9.4
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
On the
__uint128_t f(__uint128_t x, __uint128_t y) { return x + y; }
__uint128_t g(__uint128_t x, __uint128_t y) { return y + x; }
testcase with -O2 this regressed with
r9-6788-g0d2a576a1417b8d4526d369fef1d87cee2c49f99
* [Bug rtl-optimization/97756] [9/10/11/12 Regression] Inefficient handling of 128-bit arguments
From: rguenth at gcc dot gnu.org @ 2021-06-01 8:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|9.4 |9.5
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 9.4 is being released, retargeting bugs to GCC 9.5.
* [Bug rtl-optimization/97756] [9/10/11/12 Regression] Inefficient handling of 128-bit arguments
From: crazylht at gmail dot com @ 2021-08-30 5:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Patrick Palka from comment #3)
> Perhaps related to this PR: On x86_64, the following basic wrapper around
> int128 addition
>
> __uint128_t f(__uint128_t x, __uint128_t y) { return x + y; }
>
> gets compiled (w/ -O3, -O2 or -Os) to the seemingly suboptimal
>
> movq %rdi, %r9
> movq %rdx, %rax
> movq %rsi, %r8
> movq %rcx, %rdx
> addq %r9, %rax
> adcq %r8, %rdx
> ret
>
> Clang does:
>
> movq %rdi, %rax
> addq %rdx, %rax
> adcq %rcx, %rsi
> movq %rsi, %rdx
> retq
Removing addti3/ashlti3 from i386.md also helps here.
* [Bug rtl-optimization/97756] [10/11/12/13 Regression] Inefficient handling of 128-bit arguments
From: rguenth at gcc dot gnu.org @ 2022-05-27 9:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|9.5 |10.4
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 9 branch is being closed
* [Bug rtl-optimization/97756] [10/11/12/13 Regression] Inefficient handling of 128-bit arguments
From: jakub at gcc dot gnu.org @ 2022-06-28 10:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|10.4 |10.5
--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 10.4 is being released, retargeting bugs to GCC 10.5.
* [Bug rtl-optimization/97756] [11/12/13/14 Regression] Inefficient handling of 128-bit arguments
From: rguenth at gcc dot gnu.org @ 2023-07-07 10:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|10.5 |11.5
--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 10 branch is being closed.
* [Bug rtl-optimization/97756] [11/12/13/14 Regression] Inefficient handling of 128-bit arguments
From: pinskia at gcc dot gnu.org @ 2023-07-13 21:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |roger at nextmovesoftware dot com
--- Comment #11 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This seems to be improved on trunk ...
* [Bug rtl-optimization/97756] [11/12/13/14 Regression] Inefficient handling of 128-bit arguments
From: tkoenig at gcc dot gnu.org @ 2023-07-16 11:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
--- Comment #12 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #11)
> This seems to be improved on trunk ...
gcc is down to 37 instructions now for the original test case with -O3.
icc, which appears to be best, has 33, see https://godbolt.org/z/461jeozs9 .
* [Bug rtl-optimization/97756] [11/12/13/14 Regression] Inefficient handling of 128-bit arguments
From: tkoenig at gcc dot gnu.org @ 2023-11-07 18:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
--- Comment #13 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(In reply to Patrick Palka from comment #3)
> Perhaps related to this PR: On x86_64, the following basic wrapper around
> int128 addition
>
> __uint128_t f(__uint128_t x, __uint128_t y) { return x + y; }
>
> gets compiled (w/ -O3, -O2 or -Os) to the seemingly suboptimal
>
> movq %rdi, %r9
> movq %rdx, %rax
> movq %rsi, %r8
> movq %rcx, %rdx
> addq %r9, %rax
> adcq %r8, %rdx
> ret
>
> Clang does:
>
> movq %rdi, %rax
> addq %rdx, %rax
> adcq %rcx, %rsi
> movq %rsi, %rdx
> retq
With current trunk, this is now
movq %rdx, %rax
movq %rcx, %rdx
addq %rdi, %rax
adcq %rsi, %rdx
ret
so it looks OK.
The original test case regressed a bit; it is now 39 instructions.
* [Bug rtl-optimization/97756] [11/12/13/14 Regression] Inefficient handling of 128-bit arguments
From: cvs-commit at gcc dot gnu.org @ 2023-11-13 9:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
--- Comment #14 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:0a140730c970870a5125beb1114f6c01679a040e
commit r14-5385-g0a140730c970870a5125beb1114f6c01679a040e
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Mon Nov 13 09:05:16 2023 +0000
i386: Improve reg pressure of double word right shift then truncate.
This patch improves register pressure during reload, inspired by PR 97756.
Normally, a double-word right-shift by a constant produces a double-word
result, the highpart of which is dead when followed by a truncation.
The dead code calculating the high part gets cleaned up post-reload, so
the issue isn't normally visible, except for the increased register
pressure during reload, sometimes leading to odd register assignments.
Providing a post-reload splitter, which clobbers a single wordmode
result register instead of a doubleword result register, helps (a bit).
An example demonstrating this effect is:
#define MASK60 ((1ul << 60) - 1)
unsigned long foo (__uint128_t n)
{
unsigned long a = n & MASK60;
unsigned long b = (n >> 60);
b = b & MASK60;
unsigned long c = (n >> 120);
return a+b+c;
}
which currently with -O2 generates (13 instructions):
foo: movabsq $1152921504606846975, %rcx
xchgq %rdi, %rsi
movq %rsi, %rax
shrdq $60, %rdi, %rax
movq %rax, %rdx
movq %rsi, %rax
movq %rdi, %rsi
andq %rcx, %rax
shrq $56, %rsi
andq %rcx, %rdx
addq %rsi, %rax
addq %rdx, %rax
ret
with this patch, we generate one less mov (12 instructions):
foo: movabsq $1152921504606846975, %rcx
xchgq %rdi, %rsi
movq %rdi, %rdx
movq %rsi, %rax
movq %rdi, %rsi
shrdq $60, %rdi, %rdx
andq %rcx, %rax
shrq $56, %rsi
addq %rsi, %rax
andq %rcx, %rdx
addq %rdx, %rax
ret
The significant difference is easier to see via diff:
< shrdq $60, %rdi, %rax
< movq %rax, %rdx
---
> shrdq $60, %rdi, %rdx
Admittedly a single "mov" isn't much of a saving on modern architectures,
but as demonstrated by the PR, people still track the number of them.
2023-11-13 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.md (<insn><dwi>3_doubleword_lowpart): New
define_insn_and_split to optimize register usage of doubleword
right shifts followed by truncation.
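To make the "shift then truncate" point concrete, here is a word-level C sketch of what foo needs from its 128-bit argument. It assumes n arrives as two 64-bit halves (hi:lo); the name foo_words is illustrative and does not appear in the patch:

```c
#include <stdint.h>

#define MASK60 ((UINT64_C(1) << 60) - 1)

/* Word-level view of foo, with n passed as the pair (hi:lo). */
static uint64_t foo_words(uint64_t lo, uint64_t hi)
{
    uint64_t a = lo & MASK60;                        /* n & MASK60 */
    uint64_t b = ((lo >> 60) | (hi << 4)) & MASK60;  /* low word of n >> 60 */
    uint64_t c = hi >> 56;                           /* n >> 120 */
    return a + b + c;
}
```

Each shifted value is used only as a single 64-bit word, so the high word of n >> 60 never has to be computed or kept in a register. The middle line corresponds to the shrdq $60 in the listings and the last one to the shrq $56.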
* [Bug rtl-optimization/97756] [11/12/13/14 Regression] Inefficient handling of 128-bit arguments
From: tkoenig at gcc dot gnu.org @ 2023-11-13 17:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
--- Comment #15 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(In reply to CVS Commits from comment #14)
> Admittedly a single "mov" isn't much of a saving on modern architectures,
> but as demonstrated by the PR, people still track the number of them.
Thanks :-)
* [Bug rtl-optimization/97756] [11/12/13/14 Regression] Inefficient handling of 128-bit arguments
From: cvs-commit at gcc dot gnu.org @ 2023-11-14 12:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
--- Comment #16 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
https://gcc.gnu.org/g:aad65285a1c681feb9fc5b041c86d841b24c3d2a
commit r14-5442-gaad65285a1c681feb9fc5b041c86d841b24c3d2a
Author: Jakub Jelinek <jakub@redhat.com>
Date: Tue Nov 14 13:19:48 2023 +0100
i386: Fix up <insn><dwi>3_doubleword_lowpart [PR112523]
On Sun, Nov 12, 2023 at 09:03:42PM -0000, Roger Sayle wrote:
> This patch improves register pressure during reload, inspired by PR 97756.
> Normally, a double-word right-shift by a constant produces a double-word
> result, the highpart of which is dead when followed by a truncation.
> The dead code calculating the high part gets cleaned up post-reload, so
> the issue isn't normally visible, except for the increased register
> pressure during reload, sometimes leading to odd register assignments.
> Providing a post-reload splitter, which clobbers a single wordmode
> result register instead of a doubleword result register, helps (a bit).
Unfortunately this broke bootstrap on i686-linux, broke all ACATS tests
on x86_64-linux, and miscompiled e.g. __floattisf in libgcc there
as well.
The bug is that shrd{l,q} instruction expects the low part of the input
to be the same register as the output, rather than the high part as the
patch implemented.
split_double_mode (<DWI>mode, &operands[1], 1, &operands[1],
&operands[3]);
sets operands[1] to the lo_half and operands[3] to the hi_half, so if
operands[0] is not the same register as operands[1] (rather than [3]) after
RA, we should during splitting move operands[1] into operands[0].
Your testcase:
> #define MASK60 ((1ul << 60) - 1)
> unsigned long foo (__uint128_t n)
> {
> unsigned long a = n & MASK60;
> unsigned long b = (n >> 60);
> b = b & MASK60;
> unsigned long c = (n >> 120);
> return a+b+c;
> }
still has the same number of instructions.
Bootstrapped/regtested on x86_64-linux (where it e.g. turns
=== acats Summary ===
-# of unexpected failures 2328
+# of expected passes 2328
+# of unexpected failures 0
and fixes gcc.dg/torture/fp-int-convert-*timode.c FAILs as well)
and i686-linux (where it previously didn't bootstrap, but compared to
Friday evening's bootstrap the testresults are ok).
2023-11-14 Jakub Jelinek <jakub@redhat.com>
PR target/112523
PR ada/112514
* config/i386/i386.md (<insn><dwi>3_doubleword_lowpart): Move
operands[1] aka low part of input rather than operands[3] aka high
part of input to output if not the same register.
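The operand constraint described in this commit message can be modelled in C. The sketch below is a rough model of "shrdq $n, src, dst" for 0 < n < 64, not the splitter code itself; shrdq_model and lowpart_shift are hypothetical names:

```c
#include <stdint.h>

/* Rough model of x86-64 "shrdq $n, src, dst" for 0 < n < 64:
   dst is shifted right in place and its vacated top bits are
   filled from the low bits of src. */
static void shrdq_model(uint64_t *dst, uint64_t src, unsigned n)
{
    *dst = (*dst >> n) | (src << (64 - n));
}

/* Low word of (hi:lo) >> n. The output must start out holding the
   LOW half; the high half is only a source operand. */
static uint64_t lowpart_shift(uint64_t lo, uint64_t hi, unsigned n)
{
    uint64_t out = lo;         /* move the low half into the output first */
    shrdq_model(&out, hi, n);  /* then shift in bits from the high half */
    return out;
}
```

Seeding out with hi instead of lo would shift the wrong word, which is the kind of miscompilation the fix addresses.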
* [Bug rtl-optimization/97756] [11/12/13 Regression] Inefficient handling of 128-bit arguments
From: roger at nextmovesoftware dot com @ 2024-04-26 12:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
Roger Sayle <roger at nextmovesoftware dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Known to work| |14.0
Summary|[11/12/13/14/15 Regression] |[11/12/13 Regression]
|Inefficient handling of |Inefficient handling of
|128-bit arguments |128-bit arguments
--- Comment #17 from Roger Sayle <roger at nextmovesoftware dot com> ---
I believe this issue is now fixed on mainline (i.e. for both GCC 14 and GCC
15).
Firstly, many thanks to Jakub for correcting the error in my patch. We now
generate optimal code sequences for the code in comments #3 and #5, and
generate fewer instructions than reported in the original description.
The final remaining issue is that with -O3 GCC still uses more instructions
than clang and icc (see Thomas' comments #12 and #13). The good
news is that this is intentional: compiling with -Os (to optimize for size)
generates the same number of instructions as clang and icc [in fact, using icc
-Os generates larger code!?]. So when optimizing for performance, GCC is
taking the opportunity to use more (cheap) instructions to execute faster (or
that's the theory).