* [x86 PATCH] Improve reg pressure of double-word right-shift then truncate.
@ 2023-11-12 21:03 Roger Sayle
2023-11-13 7:23 ` Uros Bizjak
2023-11-14 12:06 ` [PATCH] i386: Fix up <insn><dwi>3_doubleword_lowpart [PR112523] Jakub Jelinek
0 siblings, 2 replies; 5+ messages in thread
From: Roger Sayle @ 2023-11-12 21:03 UTC (permalink / raw)
To: gcc-patches; +Cc: 'Uros Bizjak'
[-- Attachment #1: Type: text/plain, Size: 2374 bytes --]
This patch improves register pressure during reload, inspired by PR 97756.
Normally, a double-word right-shift by a constant produces a double-word
result, the highpart of which is dead when followed by a truncation.
The dead code calculating the high part gets cleaned up post-reload, so
the issue isn't normally visible, except for the increased register
pressure during reload, sometimes leading to odd register assignments.
Providing a post-reload splitter, which clobbers a single wordmode
result register instead of a doubleword result register, helps (a bit).
An example demonstrating this effect is:
#define MASK60 ((1ul << 60) - 1)
unsigned long foo (__uint128_t n)
{
unsigned long a = n & MASK60;
unsigned long b = (n >> 60);
b = b & MASK60;
unsigned long c = (n >> 120);
return a+b+c;
}
which currently with -O2 generates (13 instructions):
foo: movabsq $1152921504606846975, %rcx
xchgq %rdi, %rsi
movq %rsi, %rax
shrdq $60, %rdi, %rax
movq %rax, %rdx
movq %rsi, %rax
movq %rdi, %rsi
andq %rcx, %rax
shrq $56, %rsi
andq %rcx, %rdx
addq %rsi, %rax
addq %rdx, %rax
ret
with this patch, we generate one less mov (12 instructions):
foo: movabsq $1152921504606846975, %rcx
xchgq %rdi, %rsi
movq %rdi, %rdx
movq %rsi, %rax
movq %rdi, %rsi
shrdq $60, %rdi, %rdx
andq %rcx, %rax
shrq $56, %rsi
addq %rsi, %rax
andq %rcx, %rdx
addq %rdx, %rax
ret
The significant difference is easier to see via diff:
< shrdq $60, %rdi, %rax
< movq %rax, %rdx
---
> shrdq $60, %rdi, %rdx
Admittedly a single "mov" isn't much of a saving on modern architectures,
but as demonstrated by the PR, people still track the number of them.
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures. Ok for mainline?
2023-11-12 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.md (<insn><dwi>3_doubleword_lowpart): New
define_insn_and_split to optimize register usage of doubleword
right shifts followed by truncation.
Thanks in advance,
Roger
--
[-- Attachment #2: patchst.txt --]
[-- Type: text/plain, Size: 1449 bytes --]
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 663db73..8a6928f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -14833,6 +14833,31 @@
[(const_int 0)]
"ix86_split_<insn> (operands, operands[3], <DWI>mode); DONE;")
+;; Split truncations of TImode right shifts into x86_64_shrd_1.
+;; Split truncations of DImode right shifts into x86_shrd_1.
+(define_insn_and_split "<insn><dwi>3_doubleword_lowpart"
+ [(set (match_operand:DWIH 0 "register_operand" "=&r")
+ (subreg:DWIH
+ (any_shiftrt:<DWI> (match_operand:<DWI> 1 "register_operand" "r")
+ (match_operand:QI 2 "const_int_operand")) 0))
+ (clobber (reg:CC FLAGS_REG))]
+ "UINTVAL (operands[2]) < <MODE_SIZE> * BITS_PER_UNIT"
+ "#"
+ "&& reload_completed"
+ [(parallel
+ [(set (match_dup 0)
+ (ior:DWIH (lshiftrt:DWIH (match_dup 0) (match_dup 2))
+ (subreg:DWIH
+ (ashift:<DWI> (zero_extend:<DWI> (match_dup 3))
+ (match_dup 4)) 0)))
+ (clobber (reg:CC FLAGS_REG))])]
+{
+ split_double_mode (<DWI>mode, &operands[1], 1, &operands[1], &operands[3]);
+ operands[4] = GEN_INT ((<MODE_SIZE> * BITS_PER_UNIT) - INTVAL (operands[2]));
+ if (!rtx_equal_p (operands[0], operands[3]))
+ emit_move_insn (operands[0], operands[3]);
+})
+
(define_insn "x86_64_shrd"
[(set (match_operand:DI 0 "nonimmediate_operand" "+r*m")
(ior:DI (lshiftrt:DI (match_dup 0)
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [x86 PATCH] Improve reg pressure of double-word right-shift then truncate.
2023-11-12 21:03 [x86 PATCH] Improve reg pressure of double-word right-shift then truncate Roger Sayle
@ 2023-11-13 7:23 ` Uros Bizjak
2023-11-14 12:06 ` [PATCH] i386: Fix up <insn><dwi>3_doubleword_lowpart [PR112523] Jakub Jelinek
1 sibling, 0 replies; 5+ messages in thread
From: Uros Bizjak @ 2023-11-13 7:23 UTC (permalink / raw)
To: Roger Sayle; +Cc: gcc-patches
On Sun, Nov 12, 2023 at 10:03 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch improves register pressure during reload, inspired by PR 97756.
> Normally, a double-word right-shift by a constant produces a double-word
> result, the highpart of which is dead when followed by a truncation.
> The dead code calculating the high part gets cleaned up post-reload, so
> the issue isn't normally visible, except for the increased register
> pressure during reload, sometimes leading to odd register assignments.
> Providing a post-reload splitter, which clobbers a single wordmode
> result register instead of a doubleword result register, helps (a bit).
>
> An example demonstrating this effect is:
>
> #define MASK60 ((1ul << 60) - 1)
> unsigned long foo (__uint128_t n)
> {
> unsigned long a = n & MASK60;
> unsigned long b = (n >> 60);
> b = b & MASK60;
> unsigned long c = (n >> 120);
> return a+b+c;
> }
>
> which currently with -O2 generates (13 instructions):
> foo: movabsq $1152921504606846975, %rcx
> xchgq %rdi, %rsi
> movq %rsi, %rax
> shrdq $60, %rdi, %rax
> movq %rax, %rdx
> movq %rsi, %rax
> movq %rdi, %rsi
> andq %rcx, %rax
> shrq $56, %rsi
> andq %rcx, %rdx
> addq %rsi, %rax
> addq %rdx, %rax
> ret
>
> with this patch, we generate one less mov (12 instructions):
> foo: movabsq $1152921504606846975, %rcx
> xchgq %rdi, %rsi
> movq %rdi, %rdx
> movq %rsi, %rax
> movq %rdi, %rsi
> shrdq $60, %rdi, %rdx
> andq %rcx, %rax
> shrq $56, %rsi
> addq %rsi, %rax
> andq %rcx, %rdx
> addq %rdx, %rax
> ret
>
> The significant difference is easier to see via diff:
> < shrdq $60, %rdi, %rax
> < movq %rax, %rdx
> ---
> > shrdq $60, %rdi, %rdx
>
>
> Admittedly a single "mov" isn't much of a saving on modern architectures,
> but as demonstrated by the PR, people still track the number of them.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures. Ok for mainline?
>
>
> 2023-11-12 Roger Sayle <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
> * config/i386/i386.md (<insn><dwi>3_doubleword_lowpart): New
> define_insn_and_split to optimize register usage of doubleword
> right shifts followed by truncation.
+;; Split truncations of TImode right shifts into x86_64_shrd_1.
+;; Split truncations of DImode right shifts into x86_shrd_1.
You can just say
;; Split truncations of double word right shifts into x86_shrd_1.
OK with the above change.
Thanks,
Uros.
>
> Thanks in advance,
> Roger
> --
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] i386: Fix up <insn><dwi>3_doubleword_lowpart [PR112523]
2023-11-12 21:03 [x86 PATCH] Improve reg pressure of double-word right-shift then truncate Roger Sayle
2023-11-13 7:23 ` Uros Bizjak
@ 2023-11-14 12:06 ` Jakub Jelinek
2023-11-14 12:08 ` Uros Bizjak
2023-11-14 12:48 ` Richard Biener
1 sibling, 2 replies; 5+ messages in thread
From: Jakub Jelinek @ 2023-11-14 12:06 UTC (permalink / raw)
To: Uros Bizjak, Roger Sayle; +Cc: gcc-patches
Hi!
On Sun, Nov 12, 2023 at 09:03:42PM -0000, Roger Sayle wrote:
> This patch improves register pressure during reload, inspired by PR 97756.
> Normally, a double-word right-shift by a constant produces a double-word
> result, the highpart of which is dead when followed by a truncation.
> The dead code calculating the high part gets cleaned up post-reload, so
> the issue isn't normally visible, except for the increased register
> pressure during reload, sometimes leading to odd register assignments.
> Providing a post-reload splitter, which clobbers a single wordmode
> result register instead of a doubleword result register, helps (a bit).
Unfortunately this broke bootstrap on i686-linux, broke all ACATS tests
on x86_64-linux as well as miscompiled e.g. __floattisf in libgcc there
as well.
The bug is that shrd{l,q} instruction expects the low part of the input
to be the same register as the output, rather than the high part as the
patch implemented.
split_double_mode (<DWI>mode, &operands[1], 1, &operands[1], &operands[3]);
sets operands[1] to the lo_half and operands[3] to the hi_half, so if
operands[0] is not the same register as operands[1] (rather than [3]) after
RA, we should during splitting move operands[1] into operands[0].
Your testcase:
> #define MASK60 ((1ul << 60) - 1)
> unsigned long foo (__uint128_t n)
> {
> unsigned long a = n & MASK60;
> unsigned long b = (n >> 60);
> b = b & MASK60;
> unsigned long c = (n >> 120);
> return a+b+c;
> }
still has the same number of instructions.
Bootstrapped/regtested on x86_64-linux (where it e.g. turns
=== acats Summary ===
-# of unexpected failures 2328
+# of expected passes 2328
+# of unexpected failures 0
and fixes gcc.dg/torture/fp-int-convert-*timode.c FAILs as well)
and i686-linux (where it previously didn't bootstrap, but compared to
Friday evening's bootstrap the testresults are ok), ok for trunk?
2023-11-14 Jakub Jelinek <jakub@redhat.com>
PR target/112523
PR ada/112514
* config/i386/i386.md (<insn><dwi>3_doubleword_lowpart): Move
operands[1] aka low part of input rather than operands[3] aka high
part of input to output if not the same register.
--- gcc/config/i386/i386.md.jj 2023-11-14 08:10:18.932549803 +0100
+++ gcc/config/i386/i386.md 2023-11-14 09:31:05.565019207 +0100
@@ -14825,8 +14825,8 @@ (define_insn_and_split "<insn><dwi>3_dou
{
split_double_mode (<DWI>mode, &operands[1], 1, &operands[1], &operands[3]);
operands[4] = GEN_INT ((<MODE_SIZE> * BITS_PER_UNIT) - INTVAL (operands[2]));
- if (!rtx_equal_p (operands[0], operands[3]))
- emit_move_insn (operands[0], operands[3]);
+ if (!rtx_equal_p (operands[0], operands[1]))
+ emit_move_insn (operands[0], operands[1]);
})
(define_insn "x86_64_shrd"
Jakub
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] i386: Fix up <insn><dwi>3_doubleword_lowpart [PR112523]
2023-11-14 12:06 ` [PATCH] i386: Fix up <insn><dwi>3_doubleword_lowpart [PR112523] Jakub Jelinek
@ 2023-11-14 12:08 ` Uros Bizjak
2023-11-14 12:48 ` Richard Biener
1 sibling, 0 replies; 5+ messages in thread
From: Uros Bizjak @ 2023-11-14 12:08 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: Roger Sayle, gcc-patches
On Tue, Nov 14, 2023 at 1:07 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> Hi!
>
> On Sun, Nov 12, 2023 at 09:03:42PM -0000, Roger Sayle wrote:
> > This patch improves register pressure during reload, inspired by PR 97756.
> > Normally, a double-word right-shift by a constant produces a double-word
> > result, the highpart of which is dead when followed by a truncation.
> > The dead code calculating the high part gets cleaned up post-reload, so
> > the issue isn't normally visible, except for the increased register
> > pressure during reload, sometimes leading to odd register assignments.
> > Providing a post-reload splitter, which clobbers a single wordmode
> > result register instead of a doubleword result register, helps (a bit).
>
> Unfortunately this broke bootstrap on i686-linux, broke all ACATS tests
> on x86_64-linux as well as miscompiled e.g. __floattisf in libgcc there
> as well.
>
> The bug is that shrd{l,q} instruction expects the low part of the input
> to be the same register as the output, rather than the high part as the
> patch implemented.
> split_double_mode (<DWI>mode, &operands[1], 1, &operands[1], &operands[3]);
> sets operands[1] to the lo_half and operands[3] to the hi_half, so if
> operands[0] is not the same register as operands[1] (rather than [3]) after
> RA, we should during splitting move operands[1] into operands[0].
>
> Your testcase:
>
> > #define MASK60 ((1ul << 60) - 1)
> > unsigned long foo (__uint128_t n)
> > {
> > unsigned long a = n & MASK60;
> > unsigned long b = (n >> 60);
> > b = b & MASK60;
> > unsigned long c = (n >> 120);
> > return a+b+c;
> > }
>
> still has the same number of instructions.
>
> Bootstrapped/regtested on x86_64-linux (where it e.g. turns
> === acats Summary ===
> -# of unexpected failures 2328
> +# of expected passes 2328
> +# of unexpected failures 0
> and fixes gcc.dg/torture/fp-int-convert-*timode.c FAILs as well)
> and i686-linux (where it previously didn't bootstrap, but compared to
> Friday evening's bootstrap the testresults are ok), ok for trunk?
>
> 2023-11-14 Jakub Jelinek <jakub@redhat.com>
>
> PR target/112523
> PR ada/112514
> * config/i386/i386.md (<insn><dwi>3_doubleword_lowpart): Move
> operands[1] aka low part of input rather than operands[3] aka high
> part of input to output if not the same register.
OK.
Thanks,
Uros.
>
> --- gcc/config/i386/i386.md.jj 2023-11-14 08:10:18.932549803 +0100
> +++ gcc/config/i386/i386.md 2023-11-14 09:31:05.565019207 +0100
> @@ -14825,8 +14825,8 @@ (define_insn_and_split "<insn><dwi>3_dou
> {
> split_double_mode (<DWI>mode, &operands[1], 1, &operands[1], &operands[3]);
> operands[4] = GEN_INT ((<MODE_SIZE> * BITS_PER_UNIT) - INTVAL (operands[2]));
> - if (!rtx_equal_p (operands[0], operands[3]))
> - emit_move_insn (operands[0], operands[3]);
> + if (!rtx_equal_p (operands[0], operands[1]))
> + emit_move_insn (operands[0], operands[1]);
> })
>
> (define_insn "x86_64_shrd"
>
>
> Jakub
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] i386: Fix up <insn><dwi>3_doubleword_lowpart [PR112523]
2023-11-14 12:06 ` [PATCH] i386: Fix up <insn><dwi>3_doubleword_lowpart [PR112523] Jakub Jelinek
2023-11-14 12:08 ` Uros Bizjak
@ 2023-11-14 12:48 ` Richard Biener
1 sibling, 0 replies; 5+ messages in thread
From: Richard Biener @ 2023-11-14 12:48 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: Uros Bizjak, Roger Sayle, gcc-patches
On Tue, Nov 14, 2023 at 1:07 PM Jakub Jelinek <jakub@redhat.com> wrote:
>
> Hi!
>
> On Sun, Nov 12, 2023 at 09:03:42PM -0000, Roger Sayle wrote:
> > This patch improves register pressure during reload, inspired by PR 97756.
> > Normally, a double-word right-shift by a constant produces a double-word
> > result, the highpart of which is dead when followed by a truncation.
> > The dead code calculating the high part gets cleaned up post-reload, so
> > the issue isn't normally visible, except for the increased register
> > pressure during reload, sometimes leading to odd register assignments.
> > Providing a post-reload splitter, which clobbers a single wordmode
> > result register instead of a doubleword result register, helps (a bit).
>
> Unfortunately this broke bootstrap on i686-linux, broke all ACATS tests
> on x86_64-linux as well as miscompiled e.g. __floattisf in libgcc there
> as well.
>
> The bug is that shrd{l,q} instruction expects the low part of the input
> to be the same register as the output, rather than the high part as the
> patch implemented.
> split_double_mode (<DWI>mode, &operands[1], 1, &operands[1], &operands[3]);
> sets operands[1] to the lo_half and operands[3] to the hi_half, so if
> operands[0] is not the same register as operands[1] (rather than [3]) after
> RA, we should during splitting move operands[1] into operands[0].
>
> Your testcase:
>
> > #define MASK60 ((1ul << 60) - 1)
> > unsigned long foo (__uint128_t n)
> > {
> > unsigned long a = n & MASK60;
> > unsigned long b = (n >> 60);
> > b = b & MASK60;
> > unsigned long c = (n >> 120);
> > return a+b+c;
> > }
>
> still has the same number of instructions.
>
> Bootstrapped/regtested on x86_64-linux (where it e.g. turns
> === acats Summary ===
> -# of unexpected failures 2328
> +# of expected passes 2328
> +# of unexpected failures 0
> and fixes gcc.dg/torture/fp-int-convert-*timode.c FAILs as well)
> and i686-linux (where it previously didn't bootstrap, but compared to
> Friday evening's bootstrap the testresults are ok), ok for trunk?
OK.
Thanks,
Richard.
> 2023-11-14 Jakub Jelinek <jakub@redhat.com>
>
> PR target/112523
> PR ada/112514
> * config/i386/i386.md (<insn><dwi>3_doubleword_lowpart): Move
> operands[1] aka low part of input rather than operands[3] aka high
> part of input to output if not the same register.
>
> --- gcc/config/i386/i386.md.jj 2023-11-14 08:10:18.932549803 +0100
> +++ gcc/config/i386/i386.md 2023-11-14 09:31:05.565019207 +0100
> @@ -14825,8 +14825,8 @@ (define_insn_and_split "<insn><dwi>3_dou
> {
> split_double_mode (<DWI>mode, &operands[1], 1, &operands[1], &operands[3]);
> operands[4] = GEN_INT ((<MODE_SIZE> * BITS_PER_UNIT) - INTVAL (operands[2]));
> - if (!rtx_equal_p (operands[0], operands[3]))
> - emit_move_insn (operands[0], operands[3]);
> + if (!rtx_equal_p (operands[0], operands[1]))
> + emit_move_insn (operands[0], operands[1]);
> })
>
> (define_insn "x86_64_shrd"
>
>
> Jakub
>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-11-14 12:52 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-12 21:03 [x86 PATCH] Improve reg pressure of double-word right-shift then truncate Roger Sayle
2023-11-13 7:23 ` Uros Bizjak
2023-11-14 12:06 ` [PATCH] i386: Fix up <insn><dwi>3_doubleword_lowpart [PR112523] Jakub Jelinek
2023-11-14 12:08 ` Uros Bizjak
2023-11-14 12:48 ` Richard Biener
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).