public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH v2] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453]
@ 2022-07-07  8:30 HAO CHEN GUI
  2022-07-07 17:31 ` Segher Boessenkool
  0 siblings, 1 reply; 4+ messages in thread
From: HAO CHEN GUI @ 2022-07-07  8:30 UTC (permalink / raw)
  To: gcc-patches; +Cc: Segher Boessenkool, David, Kewen.Lin, Peter Bergner

Hi,
  This patch modifies the combine pattern after recog fails. With a helper
- change_pseudo_and_mask, it converts a single pseudo to the pseudo AND with
a mask when the outer operator is IOR/XOR/PLUS and inner operator is ASHIFT
or AND. The conversion helps pattern to match rotate and mask insn on some
targets.

  For test case rlwimi-2.c, current trunk fails on
"scan-assembler-times (?n)^\\s+[a-z]". It reports 21305 times. So my patch
reduces the total number of insns from 21305 to 21279.

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
Is this okay for trunk? Any recommendations? Thanks a lot.

ChangeLog
2022-07-07 Haochen Gui <guihaoc@linux.ibm.com>

gcc/
	PR target/93453
	* combine.cc (change_pseudo_and_mask): New.
	(recog_for_combine): If recog fails, try again with the pattern
	modified by change_pseudo_and_mask.
	* config/rs6000/rs6000.md (plus_ior_xor): Removed.
	(anonymous split pattern for plus_ior_xor): Removed.

gcc/testsuite/
	PR target/93453
	* gcc.target/powerpc/20050603-3.c: Modify dump check conditions.
	* gcc.target/powerpc/rlwimi-2.c: Likewise.
	* gcc.target/powerpc/pr93453-2.c: New.

patch.diff
diff --git a/gcc/combine.cc b/gcc/combine.cc
index a5fabf397f7..3cd7b2b652b 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -11599,6 +11599,47 @@ change_zero_ext (rtx pat)
   return changed;
 }

+/* When the outer code of set_src is IOR/XOR/PLUS and the inner code is
+   ASHIFT/AND, convert a pseudo to psuedo AND with a mask if its nonzero_bits
+   is less than its mode mask.  The nonzero_bits in other pass doesn't return
+   the same value as it does in combine pass.  */
+static bool
+change_pseudo_and_mask (rtx pat)
+{
+  rtx src = SET_SRC (pat);
+  if ((GET_CODE (src) == IOR
+       || GET_CODE (src) == XOR
+       || GET_CODE (src) == PLUS)
+      && (((GET_CODE (XEXP (src, 0)) == ASHIFT
+	    || GET_CODE (XEXP (src, 0)) == AND)
+	   && REG_P (XEXP (src, 1)))))
+    {
+      rtx *reg = &XEXP (SET_SRC (pat), 1);
+      machine_mode mode = GET_MODE (*reg);
+      unsigned HOST_WIDE_INT nonzero = nonzero_bits (*reg, mode);
+      if (nonzero < GET_MODE_MASK (mode))
+	{
+	  int shift;
+
+	  if (GET_CODE (XEXP (src, 0)) == ASHIFT)
+	    shift = INTVAL (XEXP (XEXP (src, 0), 1));
+	  else
+	    shift = ctz_hwi (INTVAL (XEXP (XEXP (src, 0), 1)));
+
+	  if (shift > 0
+	      && ((HOST_WIDE_INT_1U << shift) - 1) >= nonzero)
+	    {
+	      unsigned HOST_WIDE_INT mask = (HOST_WIDE_INT_1U << shift) - 1;
+	      rtx x = gen_rtx_AND (mode, *reg, GEN_INT (mask));
+	      SUBST (*reg, x);
+	      maybe_swap_commutative_operands (SET_SRC (pat));
+	      return true;
+	    }
+	}
+     }
+  return false;
+}
+
 /* Like recog, but we receive the address of a pointer to a new pattern.
    We try to match the rtx that the pointer points to.
    If that fails, we may try to modify or replace the pattern,
@@ -11646,7 +11687,10 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes)
 	    }
 	}
       else
-	changed = change_zero_ext (pat);
+	{
+	  changed = change_pseudo_and_mask (pat);
+	  changed |= change_zero_ext (pat);
+	}
     }
   else if (GET_CODE (pat) == PARALLEL)
     {
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 1367a2cb779..2bd6bd5f908 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4207,24 +4207,6 @@ (define_insn_and_split "*rotl<mode>3_insert_3_<code>"
 	(ior:GPR (and:GPR (match_dup 3) (match_dup 4))
 		 (ashift:GPR (match_dup 1) (match_dup 2))))])

-(define_code_iterator plus_ior_xor [plus ior xor])
-
-(define_split
-  [(set (match_operand:GPR 0 "gpc_reg_operand")
-	(plus_ior_xor:GPR (ashift:GPR (match_operand:GPR 1 "gpc_reg_operand")
-				      (match_operand:SI 2 "const_int_operand"))
-			  (match_operand:GPR 3 "gpc_reg_operand")))]
-  "nonzero_bits (operands[3], <MODE>mode)
-   < HOST_WIDE_INT_1U << INTVAL (operands[2])"
-  [(set (match_dup 0)
-	(ior:GPR (and:GPR (match_dup 3)
-			  (match_dup 4))
-		 (ashift:GPR (match_dup 1)
-			     (match_dup 2))))]
-{
-  operands[4] = GEN_INT ((HOST_WIDE_INT_1U << INTVAL (operands[2])) - 1);
-})
-
 (define_insn "*rotlsi3_insert_4"
   [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
 	(ior:SI (and:SI (match_operand:SI 3 "gpc_reg_operand" "0")
diff --git a/gcc/testsuite/gcc.target/powerpc/20050603-3.c b/gcc/testsuite/gcc.target/powerpc/20050603-3.c
index 4017d34f429..e628be11532 100644
--- a/gcc/testsuite/gcc.target/powerpc/20050603-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/20050603-3.c
@@ -12,7 +12,7 @@ void rotins (unsigned int x)
   b.y = (x<<12) | (x>>20);
 }

-/* { dg-final { scan-assembler-not {\mrlwinm} } } */
+/* { dg-final { scan-assembler-not {\mrlwinm} { target ilp32 } } } */
 /* { dg-final { scan-assembler-not {\mrldic} } } */
 /* { dg-final { scan-assembler-not {\mrot[lr]} } } */
 /* { dg-final { scan-assembler-not {\ms[lr][wd]} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr93453-2.c b/gcc/testsuite/gcc.target/powerpc/pr93453-2.c
new file mode 100644
index 00000000000..34b7834af8d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr93453-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+long foo (char a, long b)
+{
+  long c = a;
+  c = c | (b << 12);
+  return c;
+}
+
+long bar (long b, char a)
+{
+  long c = a;
+  long m = -4096;
+  c = c | (b & m);
+  return c;
+}
+
+/* { dg-final { scan-assembler-times {\mrlwimi\M} 2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mrldimi\M} 2 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
index bafa371db73..ffb5f9e450f 100644
--- a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
@@ -2,14 +2,14 @@
 /* { dg-options "-O2" } */

 /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 14121 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 20217 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 21279 { target lp64 } } } */
 /* { dg-final { scan-assembler-times {(?n)^\s+blr} 6750 } } */
 /* { dg-final { scan-assembler-times {(?n)^\s+mr} 643 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {(?n)^\s+mr} 11 { target lp64 } } } */
 /* { dg-final { scan-assembler-times {(?n)^\s+rldicl} 7790 { target lp64 } } } */

 /* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1666 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target lp64 } } } */

 /* { dg-final { scan-assembler-times {(?n)^\s+mulli} 5036 } } */





^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453]
  2022-07-07  8:30 [PATCH v2] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453] HAO CHEN GUI
@ 2022-07-07 17:31 ` Segher Boessenkool
  2022-07-11  2:13   ` HAO CHEN GUI
  0 siblings, 1 reply; 4+ messages in thread
From: Segher Boessenkool @ 2022-07-07 17:31 UTC (permalink / raw)
  To: HAO CHEN GUI; +Cc: gcc-patches, David, Kewen.Lin, Peter Bergner

Hi!

On Thu, Jul 07, 2022 at 04:30:50PM +0800, HAO CHEN GUI wrote:
>   This patch modifies the combine pattern after recog fails. With a helper

It modifies combine itself, not just a pattern in the machine
description.

> - change_pseudo_and_mask, it converts a single pseudo to the pseudo AND with
> a mask when the outer operator is IOR/XOR/PLUS and inner operator is ASHIFT
> or AND. The conversion helps pattern to match rotate and mask insn on some
> targets.

>   For test case rlwimi-2.c, current trunk fails on
> "scan-assembler-times (?n)^\\s+[a-z]". It reports 21305 times. So my patch
> reduces the total number of insns from 21305 to 21279.

That is incorrect.  You need to figure out what actually changed, and if
that is wanted or not, and then write some explanation about that.

> 	* config/rs6000/rs6000.md (plus_ior_xor): Removed.
> 	(anonymous split pattern for plus_ior_xor): Removed.

"Remove.", in both cases.  Always use imperative in changelogs and
commit messages and the like, not some passive tense.

> +/* When the outer code of set_src is IOR/XOR/PLUS and the inner code is
> +   ASHIFT/AND, convert a pseudo to psuedo AND with a mask if its nonzero_bits

s/psuedo/pseudo/

> +   is less than its mode mask.  The nonzero_bits in other pass doesn't return
> +   the same value as it does in combine pass.  */

That isn't quite the problem.

Later passes can return a mask for nonzero_bits (which means: bits that
are not known to be zero) that is not a superset of what was known
during combine.  If you use nonzero_bits in the insn condition of a
define_insn (or define_insn_and_split, same thing under the covers) you
then can end up with an insns that is fine during combine, but no longer
recog()ed later.

> +static bool
> +change_pseudo_and_mask (rtx pat)
> +{
> +  rtx src = SET_SRC (pat);
> +  if ((GET_CODE (src) == IOR
> +       || GET_CODE (src) == XOR
> +       || GET_CODE (src) == PLUS)
> +      && (((GET_CODE (XEXP (src, 0)) == ASHIFT
> +	    || GET_CODE (XEXP (src, 0)) == AND)
> +	   && REG_P (XEXP (src, 1)))))
> +    {
> +      rtx *reg = &XEXP (SET_SRC (pat), 1);

Why the extra indirection?  SUBST is a macro, it can take lvalues just
fine :-)

> +      machine_mode mode = GET_MODE (*reg);
> +      unsigned HOST_WIDE_INT nonzero = nonzero_bits (*reg, mode);
> +      if (nonzero < GET_MODE_MASK (mode))
> +	{
> +	  int shift;
> +
> +	  if (GET_CODE (XEXP (src, 0)) == ASHIFT)
> +	    shift = INTVAL (XEXP (XEXP (src, 0), 1));
> +	  else
> +	    shift = ctz_hwi (INTVAL (XEXP (XEXP (src, 0), 1)));
> +
> +	  if (shift > 0
> +	      && ((HOST_WIDE_INT_1U << shift) - 1) >= nonzero)

Too many parens.

> +	    {
> +	      unsigned HOST_WIDE_INT mask = (HOST_WIDE_INT_1U << shift) - 1;
> +	      rtx x = gen_rtx_AND (mode, *reg, GEN_INT (mask));
> +	      SUBST (*reg, x);
> +	      maybe_swap_commutative_operands (SET_SRC (pat));
> +	      return true;
> +	    }
> +	}
> +     }
> +  return false;

Broken indentation.

> --- a/gcc/testsuite/gcc.target/powerpc/20050603-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/20050603-3.c
> @@ -12,7 +12,7 @@ void rotins (unsigned int x)
>    b.y = (x<<12) | (x>>20);
>  }
> 
> -/* { dg-final { scan-assembler-not {\mrlwinm} } } */
> +/* { dg-final { scan-assembler-not {\mrlwinm} { target ilp32 } } } */

Why?

> +/* { dg-final { scan-assembler-times {\mrlwimi\M} 2 { target ilp32 } } } */
> +/* { dg-final { scan-assembler-times {\mrldimi\M} 2 { target lp64 } } } */

Can this just be
  /* { dg-final { scan-assembler-times {\mrl[wd]imi\M} 2 } } */
or is it necessary to not want rlwimi on 64-bit?

> --- a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
> @@ -2,14 +2,14 @@
>  /* { dg-options "-O2" } */
> 
>  /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 14121 { target ilp32 } } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 20217 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 21279 { target lp64 } } } */

You are saying there should be 21279 instructions generated by this test
case.  What makes you say that?  Yes, we regressed some time ago, we
generate too many insns in many cases, but that is *bad*.

>  /* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target ilp32 } } } */
> -/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1666 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times {(?n)^\s+rlwimi} 1692 { target lp64 } } } */

This needs an explanation (and then the 32-bit and 64-bit checks can be
merged).


This probably needs changes after 4306339798b6 (if it is still wanted?)


Segher

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453]
  2022-07-07 17:31 ` Segher Boessenkool
@ 2022-07-11  2:13   ` HAO CHEN GUI
  2022-07-11 18:09     ` Segher Boessenkool
  0 siblings, 1 reply; 4+ messages in thread
From: HAO CHEN GUI @ 2022-07-11  2:13 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, David, Kewen.Lin, Peter Bergner

Hi, Segher

On 8/7/2022 上午 1:31, Segher Boessenkool wrote:
>> --- a/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/rlwimi-2.c
>> @@ -2,14 +2,14 @@
>>  /* { dg-options "-O2" } */
>>
>>  /* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 14121 { target ilp32 } } } */
>> -/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 20217 { target lp64 } } } */
>> +/* { dg-final { scan-assembler-times {(?n)^\s+[a-z]} 21279 { target lp64 } } } */
> You are saying there should be 21279 instructions generated by this test
> case.  What makes you say that?  Yes, we regressed some time ago, we
> generate too many insns in many cases, but that is *bad*.
> 

The trunk generates 21305. My patch generates 26 "rlwimi" instead of "rlwimn+ior". So
it saves 26 insns and reduce the total number of insns from 21305 to 21279 and
increase the number of "rlwimi" from 1666 to 1692.

I did a biset for the problem. After commit "commit 8d2d39587: combine: Do not combine
moves from hard registers", the case fails. The root cause is it can't combine from the
hard registers and has to use subreg which causes its high part to be undefined. Thus,
there is an additional "AND" generated.

Before the commit
Trying 2 -> 7:
    2: r125:DI=%3:DI
      REG_DEAD %3:DI
    7: r128:SI=r125:DI#0 0>>0x1f
      REG_DEAD r125:DI
Successfully matched this instruction:
(set (reg:SI 128 [ x ])
    (lshiftrt:SI (reg:SI 3 3 [ x ])
        (const_int 31 [0x1f])))
allowing combination of insns 2 and 7

After the commit
Trying 20 -> 7:
   20: r125:DI=r132:DI
      REG_DEAD r132:DI
    7: r128:SI=r125:DI#0 0>>0x1f
      REG_DEAD r125:DI
Failed to match this instruction:
(set (subreg:DI (reg:SI 128 [ x ]) 0)
    (zero_extract:DI (reg:DI 132)
        (const_int 32 [0x20])
        (const_int 1 [0x1])))
Successfully matched this instruction:
(set (subreg:DI (reg:SI 128 [ x ]) 0)
    (and:DI (lshiftrt:DI (reg:DI 132)
            (const_int 31 [0x1f]))
        (const_int 4294967295 [0xffffffff])))
allowing combination of insns 20 and 7

The problem should be fixed in another case? Please advice.
Thanks
Gui Haochen

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453]
  2022-07-11  2:13   ` HAO CHEN GUI
@ 2022-07-11 18:09     ` Segher Boessenkool
  0 siblings, 0 replies; 4+ messages in thread
From: Segher Boessenkool @ 2022-07-11 18:09 UTC (permalink / raw)
  To: HAO CHEN GUI; +Cc: gcc-patches, David, Kewen.Lin, Peter Bergner

Hi!

On Mon, Jul 11, 2022 at 10:13:41AM +0800, HAO CHEN GUI wrote:
> I did a biset for the problem. After commit "commit 8d2d39587: combine: Do not combine
> moves from hard registers", the case fails. The root cause is it can't combine from the
> hard registers and has to use subreg which causes its high part to be undefined. Thus,
> there is an additional "AND" generated.
> 
> Before the commit
> Trying 2 -> 7:
>     2: r125:DI=%3:DI
>       REG_DEAD %3:DI
>     7: r128:SI=r125:DI#0 0>>0x1f
>       REG_DEAD r125:DI
> Successfully matched this instruction:
> (set (reg:SI 128 [ x ])
>     (lshiftrt:SI (reg:SI 3 3 [ x ])
>         (const_int 31 [0x1f])))
> allowing combination of insns 2 and 7
> 
> After the commit
> Trying 20 -> 7:
>    20: r125:DI=r132:DI
>       REG_DEAD r132:DI
>     7: r128:SI=r125:DI#0 0>>0x1f
>       REG_DEAD r125:DI
> Failed to match this instruction:
> (set (subreg:DI (reg:SI 128 [ x ]) 0)
>     (zero_extract:DI (reg:DI 132)
>         (const_int 32 [0x20])
>         (const_int 1 [0x1])))
> Successfully matched this instruction:
> (set (subreg:DI (reg:SI 128 [ x ]) 0)
>     (and:DI (lshiftrt:DI (reg:DI 132)
>             (const_int 31 [0x1f]))
>         (const_int 4294967295 [0xffffffff])))
> allowing combination of insns 20 and 7
> 
> The problem should be fixed in another case? Please advice.

You should not change the expected counts to what is currently
generated.  What is currently generated is sub-optimal.  It all starts
with those zero_extracts, which are always bad for us -- it is a harder
to manipulate representation of a limited subset of more basic
operations we *do* have.  And combine and simplify can handle the more
general and simpler formulation just fine.

Ideally combine would not try to use *_extract at all if this is not
used in the machine description (compare to rotatert for example, a
similarly redundant operation).  But it currently needs it as
intermediate form, untangling this all is quite a bit of work.

These testcases (all the rl* ones) should have a big fat comment
explaining what the expected, wanted code is.

This was easier to do originally, when I actually tested all 65536
possibly combinations, because the expected counts were more "regular"
numbers.  But this is too slow to test in normal testsuite runs :-)

It is wrong to pretend the current state makes the wanted code, these
testcases are meant to show exactly when we make suboptimal machine
code :-)


Segher

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-07-11 18:10 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-07  8:30 [PATCH v2] Modify combine pattern by a pseudo AND with its nonzero bits [PR93453] HAO CHEN GUI
2022-07-07 17:31 ` Segher Boessenkool
2022-07-11  2:13   ` HAO CHEN GUI
2022-07-11 18:09     ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).