* [PATCH v2] x86_64: Some SUBREG related optimization tweaks to i386 backend.
@ 2021-10-13 8:23 Roger Sayle
2021-10-13 9:07 ` Uros Bizjak
0 siblings, 1 reply; 3+ messages in thread
From: Roger Sayle @ 2021-10-13 8:23 UTC (permalink / raw)
To: 'GCC Patches'
[-- Attachment #1: Type: text/plain, Size: 2190 bytes --]
Good catch. I agree with Hongtao that although my testing revealed
no problems with the previous version of this patch, it makes sense to
call gen_reg_rtx to generate an pseudo intermediate instead of attempting
to reuse the existing logic that uses ix86_gen_scratch_sse_rtx as an
intermediate. I've left the existing behaviour the same, so that
memory-to-memory moves (continue to) use ix86_gen_scatch_sse_rtx.
This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.
Ok for mainline?
2021-10-13 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.c (ix86_expand_vector_move): Use a
pseudo intermediate when moving a SUBREG into a hard register,
by checking ix86_hardreg_mov_ok.
(ix86_expand_vector_extract): Store zero-extended SImode
intermediate in a pseudo, then set target using a SUBREG_PROMOTED
annotated subreg.
* config/i386/sse.md (mov<VMOVE>_internal): Prevent CSE creating
complex (SUBREG) sets of (vector) hard registers before reload, by
checking ix86_hardreg_mov_ok.
Thanks,
Roger
-----Original Message-----
From: Hongtao Liu <crazylht@gmail.com>
Sent: 11 October 2021 12:29
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] x86_64: Some SUBREG related optimization tweaks to i386 backend.
On Mon, Oct 11, 2021 at 4:55 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
> gcc/ChangeLog
> * config/i386/i386-expand.c (ix86_expand_vector_move): Use a
> pseudo intermediate when moving a SUBREG into a hard register,
> by checking ix86_hardreg_mov_ok.
/* Make operand1 a register if it isn't already. */
if (can_create_pseudo_p ()
- && !register_operand (op0, mode)
- && !register_operand (op1, mode))
+ && (!ix86_hardreg_mov_ok (op0, op1) || (!register_operand (op0,
+ mode)
+ && !register_operand (op1, mode))))
{
rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0));
ix86_gen_scratch_sse_rtx probably returns a hard register, but here you want a pseudo register.
--
BR,
Hongtao
[-- Attachment #2: patchv2b.txt --]
[-- Type: text/plain, Size: 1836 bytes --]
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 3e6f7d8e..4a8fa2f 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -615,6 +615,16 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[])
return;
}
+ /* If operand0 is a hard register, make operand1 a pseudo. */
+ if (can_create_pseudo_p ()
+ && !ix86_hardreg_mov_ok (op0, op1))
+ {
+ rtx tmp = gen_reg_rtx (GET_MODE (op0));
+ emit_move_insn (tmp, op1);
+ emit_move_insn (op0, tmp);
+ return;
+ }
+
/* Make operand1 a register if it isn't already. */
if (can_create_pseudo_p ()
&& !register_operand (op0, mode)
@@ -16005,11 +16015,15 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt)
/* Let the rtl optimizers know about the zero extension performed. */
if (inner_mode == QImode || inner_mode == HImode)
{
+ rtx reg = gen_reg_rtx (SImode);
tmp = gen_rtx_ZERO_EXTEND (SImode, tmp);
- target = gen_lowpart (SImode, target);
+ emit_move_insn (reg, tmp);
+ tmp = gen_lowpart (inner_mode, reg);
+ SUBREG_PROMOTED_VAR_P (tmp) = 1;
+ SUBREG_PROMOTED_SET (tmp, 1);
}
- emit_insn (gen_rtx_SET (target, tmp));
+ emit_move_insn (target, tmp);
}
else
{
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4559b0c..e43f597 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1270,7 +1270,8 @@
" C,<sseconstm1>,vm,v"))]
"TARGET_SSE
&& (register_operand (operands[0], <MODE>mode)
- || register_operand (operands[1], <MODE>mode))"
+ || register_operand (operands[1], <MODE>mode))
+ && ix86_hardreg_mov_ok (operands[0], operands[1])"
{
switch (get_attr_type (insn))
{
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v2] x86_64: Some SUBREG related optimization tweaks to i386 backend.
2021-10-13 8:23 [PATCH v2] x86_64: Some SUBREG related optimization tweaks to i386 backend Roger Sayle
@ 2021-10-13 9:07 ` Uros Bizjak
2021-10-13 23:02 ` H.J. Lu
0 siblings, 1 reply; 3+ messages in thread
From: Uros Bizjak @ 2021-10-13 9:07 UTC (permalink / raw)
To: Roger Sayle; +Cc: GCC Patches, Hongtao Liu
On Wed, Oct 13, 2021 at 10:23 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Good catch. I agree with Hongtao that although my testing revealed
> no problems with the previous version of this patch, it makes sense to
> call gen_reg_rtx to generate an pseudo intermediate instead of attempting
> to reuse the existing logic that uses ix86_gen_scratch_sse_rtx as an
> intermediate. I've left the existing behaviour the same, so that
> memory-to-memory moves (continue to) use ix86_gen_scatch_sse_rtx.
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.
>
> Ok for mainline?
>
>
> 2021-10-13 Roger Sayle <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
> * config/i386/i386-expand.c (ix86_expand_vector_move): Use a
> pseudo intermediate when moving a SUBREG into a hard register,
> by checking ix86_hardreg_mov_ok.
> (ix86_expand_vector_extract): Store zero-extended SImode
> intermediate in a pseudo, then set target using a SUBREG_PROMOTED
> annotated subreg.
> * config/i386/sse.md (mov<VMOVE>_internal): Prevent CSE creating
> complex (SUBREG) sets of (vector) hard registers before reload, by
> checking ix86_hardreg_mov_ok.
OK.
Thanks,
Uros.
>
> Thanks,
> Roger
>
> -----Original Message-----
> From: Hongtao Liu <crazylht@gmail.com>
> Sent: 11 October 2021 12:29
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [PATCH] x86_64: Some SUBREG related optimization tweaks to i386 backend.
>
> On Mon, Oct 11, 2021 at 4:55 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
> > gcc/ChangeLog
> > * config/i386/i386-expand.c (ix86_expand_vector_move): Use a
> > pseudo intermediate when moving a SUBREG into a hard register,
> > by checking ix86_hardreg_mov_ok.
>
> /* Make operand1 a register if it isn't already. */
> if (can_create_pseudo_p ()
> - && !register_operand (op0, mode)
> - && !register_operand (op1, mode))
> + && (!ix86_hardreg_mov_ok (op0, op1) || (!register_operand (op0,
> + mode)
> + && !register_operand (op1, mode))))
> {
> rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0));
>
> ix86_gen_scratch_sse_rtx probably returns a hard register, but here you want a pseudo register.
>
> --
> BR,
> Hongtao
>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v2] x86_64: Some SUBREG related optimization tweaks to i386 backend.
2021-10-13 9:07 ` Uros Bizjak
@ 2021-10-13 23:02 ` H.J. Lu
0 siblings, 0 replies; 3+ messages in thread
From: H.J. Lu @ 2021-10-13 23:02 UTC (permalink / raw)
To: Uros Bizjak; +Cc: Roger Sayle, GCC Patches
On Wed, Oct 13, 2021 at 2:08 AM Uros Bizjak via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Wed, Oct 13, 2021 at 10:23 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
> >
> >
> > Good catch. I agree with Hongtao that although my testing revealed
> > no problems with the previous version of this patch, it makes sense to
> > call gen_reg_rtx to generate an pseudo intermediate instead of attempting
> > to reuse the existing logic that uses ix86_gen_scratch_sse_rtx as an
> > intermediate. I've left the existing behaviour the same, so that
> > memory-to-memory moves (continue to) use ix86_gen_scatch_sse_rtx.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> > and "make -k check" with no new failures.
> >
> > Ok for mainline?
> >
> >
> > 2021-10-13 Roger Sayle <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> > * config/i386/i386-expand.c (ix86_expand_vector_move): Use a
> > pseudo intermediate when moving a SUBREG into a hard register,
> > by checking ix86_hardreg_mov_ok.
> > (ix86_expand_vector_extract): Store zero-extended SImode
> > intermediate in a pseudo, then set target using a SUBREG_PROMOTED
> > annotated subreg.
> > * config/i386/sse.md (mov<VMOVE>_internal): Prevent CSE creating
> > complex (SUBREG) sets of (vector) hard registers before reload, by
> > checking ix86_hardreg_mov_ok.
>
> OK.
>
> Thanks,
> Uros.
>
> >
> > Thanks,
> > Roger
> >
> > -----Original Message-----
> > From: Hongtao Liu <crazylht@gmail.com>
> > Sent: 11 October 2021 12:29
> > To: Roger Sayle <roger@nextmovesoftware.com>
> > Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> > Subject: Re: [PATCH] x86_64: Some SUBREG related optimization tweaks to i386 backend.
> >
> > On Mon, Oct 11, 2021 at 4:55 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
> > > gcc/ChangeLog
> > > * config/i386/i386-expand.c (ix86_expand_vector_move): Use a
> > > pseudo intermediate when moving a SUBREG into a hard register,
> > > by checking ix86_hardreg_mov_ok.
> >
> > /* Make operand1 a register if it isn't already. */
> > if (can_create_pseudo_p ()
> > - && !register_operand (op0, mode)
> > - && !register_operand (op1, mode))
> > + && (!ix86_hardreg_mov_ok (op0, op1) || (!register_operand (op0,
> > + mode)
> > + && !register_operand (op1, mode))))
> > {
> > rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0));
> >
> > ix86_gen_scratch_sse_rtx probably returns a hard register, but here you want a pseudo register.
> >
> > --
> > BR,
> > Hongtao
> >
This caused:
https://gcc.gnu.org/pipermail/gcc-regression/2021-October/075498.html
FAIL: gcc.target/i386/avx-1.c (internal compiler error)
FAIL: gcc.target/i386/avx-1.c (test for excess errors)
FAIL: gcc.target/i386/avx-2.c (internal compiler error)
FAIL: gcc.target/i386/avx-2.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesdecwide128kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesdecwide128kl.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesdecwide256kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesdecwide256kl.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesencwide128kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesencwide128kl.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesencwide256kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesencwide256kl.c (test for excess errors)
FAIL: gcc.target/i386/sse-13.c (internal compiler error)
FAIL: gcc.target/i386/sse-13.c (test for excess errors)
FAIL: gcc.target/i386/sse-14.c (internal compiler error)
FAIL: gcc.target/i386/sse-14.c (test for excess errors)
FAIL: gcc.target/i386/sse-22a.c (internal compiler error)
FAIL: gcc.target/i386/sse-22a.c (test for excess errors)
FAIL: gcc.target/i386/sse-22.c (internal compiler error)
FAIL: gcc.target/i386/sse-22.c (test for excess errors)
FAIL: gcc.target/i386/sse-23.c (internal compiler error)
FAIL: gcc.target/i386/sse-23.c (test for excess errors)
FAIL: gcc.target/i386/sse-24.c (internal compiler error)
FAIL: gcc.target/i386/sse-24.c (test for excess errors)
FAIL: gcc.target/i386/sse-25.c (internal compiler error)
FAIL: gcc.target/i386/sse-25.c (test for excess errors)
FAIL: gcc.target/i386/sse-26.c (internal compiler error)
FAIL: gcc.target/i386/sse-26.c (test for excess errors)
You can reproduce them by adding -march=cascadelake to these tests.
--
H.J.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2021-10-13 23:03 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-13 8:23 [PATCH v2] x86_64: Some SUBREG related optimization tweaks to i386 backend Roger Sayle
2021-10-13 9:07 ` Uros Bizjak
2021-10-13 23:02 ` H.J. Lu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).