public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2] x86_64: Some SUBREG related optimization tweaks to i386 backend.
@ 2021-10-13  8:23 Roger Sayle
  2021-10-13  9:07 ` Uros Bizjak
  0 siblings, 1 reply; 3+ messages in thread
From: Roger Sayle @ 2021-10-13  8:23 UTC
  To: 'GCC Patches'

[-- Attachment #1: Type: text/plain, Size: 2190 bytes --]


Good catch.  I agree with Hongtao that although my testing revealed
no problems with the previous version of this patch, it makes sense to
call gen_reg_rtx to generate a pseudo intermediate instead of attempting
to reuse the existing logic that uses ix86_gen_scratch_sse_rtx as an
intermediate.  I've left the existing behaviour the same, so that
memory-to-memory moves continue to use ix86_gen_scratch_sse_rtx.
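
As a rough illustration of the resulting control flow (not the GCC
source itself), the sketch below models which intermediate is picked
for a vector move.  The enum names and the toy_* helpers are made up
for this example; in particular toy_hardreg_mov_ok is only an assumed
stand-in for ix86_hardreg_mov_ok, and the model assumes that
can_create_pseudo_p () is true.

```c
#include <stdbool.h>

/* Toy operand kinds; these are NOT GCC's rtx codes.  */
enum operand_kind { HARD_REG, PSEUDO_REG, MEM, SUBREG_OF_PSEUDO };

/* Which intermediate the expander picks in this model.  */
enum intermediate { INTERMEDIATE_NONE, NEW_PSEUDO, SCRATCH_SSE };

/* Hypothetical stand-in for ix86_hardreg_mov_ok: assume it rejects
   only a SUBREG source being set directly into a hard register.  */
static bool
toy_hardreg_mov_ok (enum operand_kind dst, enum operand_kind src)
{
  return !(dst == HARD_REG && src == SUBREG_OF_PSEUDO);
}

/* Toy register_operand: everything except memory counts.  */
static bool
toy_register_operand (enum operand_kind op)
{
  return op != MEM;
}

/* Mirrors the order of the two checks after the v2 patch.  */
static enum intermediate
choose_intermediate (enum operand_kind dst, enum operand_kind src)
{
  if (!toy_hardreg_mov_ok (dst, src))
    return NEW_PSEUDO;		/* New in v2: the gen_reg_rtx path.  */
  if (!toy_register_operand (dst) && !toy_register_operand (src))
    return SCRATCH_SSE;		/* Unchanged: memory-to-memory path.  */
  return INTERMEDIATE_NONE;	/* Move can be emitted directly.  */
}
```

In this model a SUBREG source moved into a hard register goes through
a fresh pseudo, a memory-to-memory move still goes through the SSE
scratch register, and every other move is emitted directly.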

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures.

Ok for mainline?


2021-10-13  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386-expand.c (ix86_expand_vector_move):  Use a
	pseudo intermediate when moving a SUBREG into a hard register,
	by checking ix86_hardreg_mov_ok.
	(ix86_expand_vector_extract): Store zero-extended SImode
	intermediate in a pseudo, then set target using a SUBREG_PROMOTED
	annotated subreg.
	* config/i386/sse.md (mov<VMOVE>_internal): Prevent CSE creating
	complex (SUBREG) sets of (vector) hard registers before reload, by
	checking ix86_hardreg_mov_ok.
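
To illustrate the ix86_expand_vector_extract entry above, here is a
small stand-alone C sketch (plain C, not RTL) of why recording the zero
extension is useful: once a narrow element has been zero-extended into
a wider value, taking its low part and zero-extending again changes
nothing.  The function names are hypothetical, and uint16_t/uint32_t
merely stand in for HImode and SImode.

```c
#include <stdint.h>

/* Model of extracting a 16-bit element: the value is zero-extended
   into a 32-bit temporary first (the "pseudo" in the patch).  */
static uint32_t
extract_zero_extended (const uint16_t *vec, int elt)
{
  return (uint32_t) vec[elt];
}

/* Take the low 16 bits and zero-extend them again.  Returns nonzero
   when that round trip reproduces WIDE, i.e. when the upper 16 bits
   of WIDE were already zero and the second extension is redundant.  */
static int
reextension_is_noop (uint32_t wide)
{
  uint16_t low = (uint16_t) wide;
  return (uint32_t) low == wide;
}
```

For any value produced by extract_zero_extended the round trip is a
no-op, which is, as far as I understand it, the kind of fact the
SUBREG_PROMOTED annotation is meant to expose to the RTL optimizers.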

Thanks,
Roger

-----Original Message-----
From: Hongtao Liu <crazylht@gmail.com> 
Sent: 11 October 2021 12:29
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] x86_64: Some SUBREG related optimization tweaks to i386 backend.

On Mon, Oct 11, 2021 at 4:55 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
> gcc/ChangeLog
>         * config/i386/i386-expand.c (ix86_expand_vector_move):  Use a
>         pseudo intermediate when moving a SUBREG into a hard register,
>         by checking ix86_hardreg_mov_ok.

   /* Make operand1 a register if it isn't already.  */
   if (can_create_pseudo_p ()
-      && !register_operand (op0, mode)
-      && !register_operand (op1, mode))
+      && (!ix86_hardreg_mov_ok (op0, op1)
+	  || (!register_operand (op0, mode)
+	      && !register_operand (op1, mode))))
     {
       rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0));

ix86_gen_scratch_sse_rtx probably returns a hard register, but here you want a pseudo register.

--
BR,
Hongtao


[-- Attachment #2: patchv2b.txt --]
[-- Type: text/plain, Size: 1836 bytes --]

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 3e6f7d8e..4a8fa2f 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -615,6 +615,16 @@ ix86_expand_vector_move (machine_mode mode, rtx operands[])
       return;
     }
 
+  /* If operand0 is a hard register, make operand1 a pseudo.  */
+  if (can_create_pseudo_p ()
+      && !ix86_hardreg_mov_ok (op0, op1))
+    {
+      rtx tmp = gen_reg_rtx (GET_MODE (op0));
+      emit_move_insn (tmp, op1);
+      emit_move_insn (op0, tmp);
+      return;
+    }
+
   /* Make operand1 a register if it isn't already.  */
   if (can_create_pseudo_p ()
       && !register_operand (op0, mode)
@@ -16005,11 +16015,15 @@ ix86_expand_vector_extract (bool mmx_ok, rtx target, rtx vec, int elt)
       /* Let the rtl optimizers know about the zero extension performed.  */
       if (inner_mode == QImode || inner_mode == HImode)
 	{
+	  rtx reg = gen_reg_rtx (SImode);
 	  tmp = gen_rtx_ZERO_EXTEND (SImode, tmp);
-	  target = gen_lowpart (SImode, target);
+	  emit_move_insn (reg, tmp);
+	  tmp = gen_lowpart (inner_mode, reg);
+	  SUBREG_PROMOTED_VAR_P (tmp) = 1;
+	  SUBREG_PROMOTED_SET (tmp, 1);
 	}
 
-      emit_insn (gen_rtx_SET (target, tmp));
+      emit_move_insn (target, tmp);
     }
   else
     {
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4559b0c..e43f597 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1270,7 +1270,8 @@
 	 " C,<sseconstm1>,vm,v"))]
   "TARGET_SSE
    && (register_operand (operands[0], <MODE>mode)
-       || register_operand (operands[1], <MODE>mode))"
+       || register_operand (operands[1], <MODE>mode))
+   && ix86_hardreg_mov_ok (operands[0], operands[1])"
 {
   switch (get_attr_type (insn))
     {


* Re: [PATCH v2] x86_64: Some SUBREG related optimization tweaks to i386 backend.
  2021-10-13  8:23 [PATCH v2] x86_64: Some SUBREG related optimization tweaks to i386 backend Roger Sayle
@ 2021-10-13  9:07 ` Uros Bizjak
  2021-10-13 23:02   ` H.J. Lu
  0 siblings, 1 reply; 3+ messages in thread
From: Uros Bizjak @ 2021-10-13  9:07 UTC
  To: Roger Sayle; +Cc: GCC Patches, Hongtao Liu

On Wed, Oct 13, 2021 at 10:23 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Good catch.  I agree with Hongtao that although my testing revealed
> no problems with the previous version of this patch, it makes sense to
> call gen_reg_rtx to generate a pseudo intermediate instead of attempting
> to reuse the existing logic that uses ix86_gen_scratch_sse_rtx as an
> intermediate.  I've left the existing behaviour the same, so that
> memory-to-memory moves continue to use ix86_gen_scratch_sse_rtx.
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.
>
> Ok for mainline?
>
>
> 2021-10-13  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-expand.c (ix86_expand_vector_move):  Use a
>         pseudo intermediate when moving a SUBREG into a hard register,
>         by checking ix86_hardreg_mov_ok.
>         (ix86_expand_vector_extract): Store zero-extended SImode
>         intermediate in a pseudo, then set target using a SUBREG_PROMOTED
>         annotated subreg.
>         * config/i386/sse.md (mov<VMOVE>_internal): Prevent CSE creating
>         complex (SUBREG) sets of (vector) hard registers before reload, by
>         checking ix86_hardreg_mov_ok.

OK.

Thanks,
Uros.

>
> Thanks,
> Roger
>
> -----Original Message-----
> From: Hongtao Liu <crazylht@gmail.com>
> Sent: 11 October 2021 12:29
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [PATCH] x86_64: Some SUBREG related optimization tweaks to i386 backend.
>
> On Mon, Oct 11, 2021 at 4:55 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
> > gcc/ChangeLog
> >         * config/i386/i386-expand.c (ix86_expand_vector_move):  Use a
> >         pseudo intermediate when moving a SUBREG into a hard register,
> >         by checking ix86_hardreg_mov_ok.
>
>    /* Make operand1 a register if it isn't already.  */
>    if (can_create_pseudo_p ()
> -      && !register_operand (op0, mode)
> -      && !register_operand (op1, mode))
> +      && (!ix86_hardreg_mov_ok (op0, op1)
> +	  || (!register_operand (op0, mode)
> +	      && !register_operand (op1, mode))))
>      {
>        rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0));
>
> ix86_gen_scratch_sse_rtx probably returns a hard register, but here you want a pseudo register.
>
> --
> BR,
> Hongtao
>


* Re: [PATCH v2] x86_64: Some SUBREG related optimization tweaks to i386 backend.
  2021-10-13  9:07 ` Uros Bizjak
@ 2021-10-13 23:02   ` H.J. Lu
  0 siblings, 0 replies; 3+ messages in thread
From: H.J. Lu @ 2021-10-13 23:02 UTC
  To: Uros Bizjak; +Cc: Roger Sayle, GCC Patches

On Wed, Oct 13, 2021 at 2:08 AM Uros Bizjak via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Wed, Oct 13, 2021 at 10:23 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
> >
> >
> > Good catch.  I agree with Hongtao that although my testing revealed
> > no problems with the previous version of this patch, it makes sense to
> > call gen_reg_rtx to generate a pseudo intermediate instead of attempting
> > to reuse the existing logic that uses ix86_gen_scratch_sse_rtx as an
> > intermediate.  I've left the existing behaviour the same, so that
> > memory-to-memory moves continue to use ix86_gen_scratch_sse_rtx.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> > and "make -k check" with no new failures.
> >
> > Ok for mainline?
> >
> >
> > 2021-10-13  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         * config/i386/i386-expand.c (ix86_expand_vector_move):  Use a
> >         pseudo intermediate when moving a SUBREG into a hard register,
> >         by checking ix86_hardreg_mov_ok.
> >         (ix86_expand_vector_extract): Store zero-extended SImode
> >         intermediate in a pseudo, then set target using a SUBREG_PROMOTED
> >         annotated subreg.
> >         * config/i386/sse.md (mov<VMOVE>_internal): Prevent CSE creating
> >         complex (SUBREG) sets of (vector) hard registers before reload, by
> >         checking ix86_hardreg_mov_ok.
>
> OK.
>
> Thanks,
> Uros.
>
> >
> > Thanks,
> > Roger
> >
> > -----Original Message-----
> > From: Hongtao Liu <crazylht@gmail.com>
> > Sent: 11 October 2021 12:29
> > To: Roger Sayle <roger@nextmovesoftware.com>
> > Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> > Subject: Re: [PATCH] x86_64: Some SUBREG related optimization tweaks to i386 backend.
> >
> > On Mon, Oct 11, 2021 at 4:55 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
> > > gcc/ChangeLog
> > >         * config/i386/i386-expand.c (ix86_expand_vector_move):  Use a
> > >         pseudo intermediate when moving a SUBREG into a hard register,
> > >         by checking ix86_hardreg_mov_ok.
> >
> >    /* Make operand1 a register if it isn't already.  */
> >    if (can_create_pseudo_p ()
> > -      && !register_operand (op0, mode)
> > -      && !register_operand (op1, mode))
> > +      && (!ix86_hardreg_mov_ok (op0, op1)
> > +	  || (!register_operand (op0, mode)
> > +	      && !register_operand (op1, mode))))
> >      {
> >        rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0));
> >
> > ix86_gen_scratch_sse_rtx probably returns a hard register, but here you want a pseudo register.
> >
> > --
> > BR,
> > Hongtao
> >

This caused:

https://gcc.gnu.org/pipermail/gcc-regression/2021-October/075498.html

FAIL: gcc.target/i386/avx-1.c (internal compiler error)
FAIL: gcc.target/i386/avx-1.c (test for excess errors)
FAIL: gcc.target/i386/avx-2.c (internal compiler error)
FAIL: gcc.target/i386/avx-2.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesdecwide128kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesdecwide128kl.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesdecwide256kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesdecwide256kl.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesencwide128kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesencwide128kl.c (test for excess errors)
FAIL: gcc.target/i386/keylocker-aesencwide256kl.c (internal compiler error)
FAIL: gcc.target/i386/keylocker-aesencwide256kl.c (test for excess errors)
FAIL: gcc.target/i386/sse-13.c (internal compiler error)
FAIL: gcc.target/i386/sse-13.c (test for excess errors)
FAIL: gcc.target/i386/sse-14.c (internal compiler error)
FAIL: gcc.target/i386/sse-14.c (test for excess errors)
FAIL: gcc.target/i386/sse-22a.c (internal compiler error)
FAIL: gcc.target/i386/sse-22a.c (test for excess errors)
FAIL: gcc.target/i386/sse-22.c (internal compiler error)
FAIL: gcc.target/i386/sse-22.c (test for excess errors)
FAIL: gcc.target/i386/sse-23.c (internal compiler error)
FAIL: gcc.target/i386/sse-23.c (test for excess errors)
FAIL: gcc.target/i386/sse-24.c (internal compiler error)
FAIL: gcc.target/i386/sse-24.c (test for excess errors)
FAIL: gcc.target/i386/sse-25.c (internal compiler error)
FAIL: gcc.target/i386/sse-25.c (test for excess errors)
FAIL: gcc.target/i386/sse-26.c (internal compiler error)
FAIL: gcc.target/i386/sse-26.c (test for excess errors)

You can reproduce them by adding -march=cascadelake to these tests.
-- 
H.J.
