[PATCH] Fold (add -1; zero_ext; add +1) operations to zero

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Fold (add -1; zero_ext; add +1) operations to zero_ext when not zero (PR37451, PR61837)
@ 2020-04-15  8:47 luoxhu
  2020-04-15  9:18 ` Richard Sandiford
  2020-04-15  9:37 ` [PATCH] Fold (add -1; zero_ext; add +1) operations to zero_ext when not zero (PR37451, PR61837) Jakub Jelinek
  0 siblings, 2 replies; 12+ messages in thread
From: luoxhu @ 2020-04-15  8:47 UTC (permalink / raw)
  To: gcc-patches
  Cc: segher, wschmidt, guojiufu, linkw, joseph, pinskia, Xionghu Luo

From: Xionghu Luo <luoxhu@linux.ibm.com>

This "subtract/extend/add" existed for a long time and still annoying us
(PR37451, PR61837) when converting from 32bits to 64bits, as the ctr
register is used as 64bits on powerpc64, Andraw Pinski had a patch but
caused some issue and reverted by Joseph S. Myers(PR37451, PR37782).

Andraw:
http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
Joseph:
https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html

However, the doloop code improved a lot since so many years passed,
gcc.c-torture/execute/doloop-1.c is no longer a simple loop with constant
desc->niter_expr since r125:SI#0 is not SImode, so it is not a valid doloop
and no transform done in doloop again.  Thus we can do the simplification
from "subtract/extend/add" to only extend as the condition in doloop will
never be false based on loop ch's optimization.
What's more, this patch is slightly different with Andrw's implementation,
the check of ZERO_EXT and SImode will guard the count won't be changed
from char/short caused cases not time out on slow platforms before.
Any comments?  Thanks.

doloop-1.c.257r.loop2_doloop
...
12: [r129:DI]=r123:SI
  REG_DEAD r129:DI
  REG_DEAD r123:SI
13: r125:SI=r120:DI#0-0x1
  REG_DEAD r120:DI
14: r120:DI=zero_extend(r125:SI#0)
  REG_DEAD r125:SI
16: r126:CC=cmp(r120:DI,0)
17: pc={(r126:CC!=0)?L43:pc}
  REG_DEAD r126:CC
...

Bootstrap and regression tested pass on Power8-LE.

gcc/ChangeLog

	2020-04-15  Xiong Hu Luo  <luoxhu@linux.ibm.com>

	PR rtl-optimization/37451, PR target/61837
	loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)
	to zero_ext.
---
 gcc/loop-doloop.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
index db6a014e43d..9f967fa3a0b 100644
--- a/gcc/loop-doloop.c
+++ b/gcc/loop-doloop.c
@@ -477,7 +477,31 @@ doloop_modify (class loop *loop, class niter_desc *desc,
     }
 
   if (increment_count)
-    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    {
+      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
+	 is never zero, as gimple pass loop ch will do optimization to simplify
+	 the loop to NO loop for loop condition is false.  */
+      bool simplify_zext = false;
+      rtx extop0 = XEXP (count, 0);
+      if (mode == E_DImode
+	  && GET_CODE (count) == ZERO_EXTEND
+	  && GET_CODE (extop0) == PLUS)
+	{
+	  rtx addop0 = XEXP (extop0, 0);
+	  rtx addop1 = XEXP (extop0, 1);
+	  if (CONST_SCALAR_INT_P (addop1)
+	      && GET_MODE (addop0) == E_SImode
+	      && addop1 == GEN_INT (-1))
+	    {
+	      count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
+					  GET_MODE (addop0));
+	      simplify_zext = true;
+	    }
+	}
+
+      if (!simplify_zext)
+	count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    }
 
   /* Insert initialization of the count register into the loop header.  */
   start_sequence ();
-- 
2.21.0.777.g83232e3864


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fold (add -1; zero_ext; add +1) operations to zero_ext when not zero (PR37451, PR61837)
  2020-04-15  8:47 [PATCH] Fold (add -1; zero_ext; add +1) operations to zero_ext when not zero (PR37451, PR61837) luoxhu
@ 2020-04-15  9:18 ` Richard Sandiford
  2020-04-17  1:21   ` Segher Boessenkool
  2020-04-15  9:37 ` [PATCH] Fold (add -1; zero_ext; add +1) operations to zero_ext when not zero (PR37451, PR61837) Jakub Jelinek
  1 sibling, 1 reply; 12+ messages in thread
From: Richard Sandiford @ 2020-04-15  9:18 UTC (permalink / raw)
  To: luoxhu--- via Gcc-patches; +Cc: luoxhu, segher, wschmidt, linkw, joseph

luoxhu--- via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> From: Xionghu Luo <luoxhu@linux.ibm.com>
>
> This "subtract/extend/add" existed for a long time and still annoying us
> (PR37451, PR61837) when converting from 32bits to 64bits, as the ctr
> register is used as 64bits on powerpc64, Andraw Pinski had a patch but
> caused some issue and reverted by Joseph S. Myers(PR37451, PR37782).
>
> Andraw:
> http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
> http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
> Joseph:
> https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html
>
> However, the doloop code improved a lot since so many years passed,
> gcc.c-torture/execute/doloop-1.c is no longer a simple loop with constant
> desc->niter_expr since r125:SI#0 is not SImode, so it is not a valid doloop
> and no transform done in doloop again.  Thus we can do the simplification
> from "subtract/extend/add" to only extend as the condition in doloop will
> never be false based on loop ch's optimization.
> What's more, this patch is slightly different with Andrw's implementation,
> the check of ZERO_EXT and SImode will guard the count won't be changed
> from char/short caused cases not time out on slow platforms before.
> Any comments?  Thanks.
>
> doloop-1.c.257r.loop2_doloop
> ...
> 12: [r129:DI]=r123:SI
>   REG_DEAD r129:DI
>   REG_DEAD r123:SI
> 13: r125:SI=r120:DI#0-0x1
>   REG_DEAD r120:DI
> 14: r120:DI=zero_extend(r125:SI#0)
>   REG_DEAD r125:SI
> 16: r126:CC=cmp(r120:DI,0)
> 17: pc={(r126:CC!=0)?L43:pc}
>   REG_DEAD r126:CC
> ...
>
> Bootstrap and regression tested pass on Power8-LE.
>
> gcc/ChangeLog
>
> 	2020-04-15  Xiong Hu Luo  <luoxhu@linux.ibm.com>
>
> 	PR rtl-optimization/37451, PR target/61837
> 	loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)
> 	to zero_ext.
> ---
>  gcc/loop-doloop.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
> index db6a014e43d..9f967fa3a0b 100644
> --- a/gcc/loop-doloop.c
> +++ b/gcc/loop-doloop.c
> @@ -477,7 +477,31 @@ doloop_modify (class loop *loop, class niter_desc *desc,
>      }
>  
>    if (increment_count)
> -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +    {
> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
> +	 is never zero, as gimple pass loop ch will do optimization to simplify
> +	 the loop to NO loop for loop condition is false.  */

IMO the code needs to prove this, rather than just assume that previous
passes have made it so.

Thanks,
Richard

> +      bool simplify_zext = false;
> +      rtx extop0 = XEXP (count, 0);
> +      if (mode == E_DImode
> +	  && GET_CODE (count) == ZERO_EXTEND
> +	  && GET_CODE (extop0) == PLUS)
> +	{
> +	  rtx addop0 = XEXP (extop0, 0);
> +	  rtx addop1 = XEXP (extop0, 1);
> +	  if (CONST_SCALAR_INT_P (addop1)
> +	      && GET_MODE (addop0) == E_SImode
> +	      && addop1 == GEN_INT (-1))
> +	    {
> +	      count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
> +					  GET_MODE (addop0));
> +	      simplify_zext = true;
> +	    }
> +	}
> +
> +      if (!simplify_zext)
> +	count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +    }
>  
>    /* Insert initialization of the count register into the loop header.  */
>    start_sequence ();

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fold (add -1; zero_ext; add +1) operations to zero_ext when not zero (PR37451, PR61837)
  2020-04-15  8:47 [PATCH] Fold (add -1; zero_ext; add +1) operations to zero_ext when not zero (PR37451, PR61837) luoxhu
  2020-04-15  9:18 ` Richard Sandiford
@ 2020-04-15  9:37 ` Jakub Jelinek
  1 sibling, 0 replies; 12+ messages in thread
From: Jakub Jelinek @ 2020-04-15  9:37 UTC (permalink / raw)
  To: luoxhu; +Cc: gcc-patches, segher, wschmidt, linkw, joseph

On Wed, Apr 15, 2020 at 03:47:55AM -0500, luoxhu--- via Gcc-patches wrote:
> 	2020-04-15  Xiong Hu Luo  <luoxhu@linux.ibm.com>
> 
> 	PR rtl-optimization/37451, PR target/61837
> 	loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)

"* " missing before loop-doloop.c

> 	to zero_ext.
> ---
>  gcc/loop-doloop.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
> index db6a014e43d..9f967fa3a0b 100644
> --- a/gcc/loop-doloop.c
> +++ b/gcc/loop-doloop.c
> @@ -477,7 +477,31 @@ doloop_modify (class loop *loop, class niter_desc *desc,
>      }
>  
>    if (increment_count)
> -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +    {
> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
> +	 is never zero, as gimple pass loop ch will do optimization to simplify
> +	 the loop to NO loop for loop condition is false.  */

There is no guarantee the loop ch pass was run, one can do
-fno-tree-loop-ch, -fdisable-tree-ch, -fdisable-tree-ch=foobar or
perhaps the zext(x-1)+1 has been introduced after it (either the loop
appeared post that at GIMPLE or later)?
IMHO if you want something like that, you need to prove at the RTL level
that addop0 must be non-zero, perhaps using saved VRP info from the GIMPLE
stuff if there is REG_EXPR mapping it to a SSA_NAME with one (though even
that might not be safe if things changed during RTL opts).

> +      bool simplify_zext = false;
> +      rtx extop0 = XEXP (count, 0);
> +      if (mode == E_DImode

Why hardcode DImode and SImode?  In generic code, shouldn't it work with
something more generic, like word_mode and MODE_INT mode twice as big as
the word_mode (or do you want MODE_INT mode with half the size of word_mode
and word_mode)?

> +	  && GET_CODE (count) == ZERO_EXTEND
> +	  && GET_CODE (extop0) == PLUS)
> +	{
> +	  rtx addop0 = XEXP (extop0, 0);
> +	  rtx addop1 = XEXP (extop0, 1);
> +	  if (CONST_SCALAR_INT_P (addop1)
> +	      && GET_MODE (addop0) == E_SImode
> +	      && addop1 == GEN_INT (-1))
> +	    {
> +	      count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
> +					  GET_MODE (addop0));
> +	      simplify_zext = true;
> +	    }
> +	}
> +
> +      if (!simplify_zext)
> +	count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +    }

	Jakub


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fold (add -1; zero_ext; add +1) operations to zero_ext when not zero (PR37451, PR61837)
  2020-04-15  9:18 ` Richard Sandiford
@ 2020-04-17  1:21   ` Segher Boessenkool
  2020-04-17 16:32     ` Segher Boessenkool
  0 siblings, 1 reply; 12+ messages in thread
From: Segher Boessenkool @ 2020-04-17  1:21 UTC (permalink / raw)
  To: luoxhu--- via Gcc-patches, luoxhu, wschmidt, linkw, joseph,
	richard.sandiford

On Wed, Apr 15, 2020 at 10:18:16AM +0100, Richard Sandiford wrote:
> luoxhu--- via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> > +    {
> > +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
> > +	 is never zero, as gimple pass loop ch will do optimization to simplify
> > +	 the loop to NO loop for loop condition is false.  */
> 
> IMO the code needs to prove this, rather than just assume that previous
> passes have made it so.

Well, it should gcc_assert it, probably.

It is the left-hand side of a+b...  it cannot be 0, because niter always
is simplified!


Segher

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fold (add -1; zero_ext; add +1) operations to zero_ext when not zero (PR37451, PR61837)
  2020-04-17  1:21   ` Segher Boessenkool
@ 2020-04-17 16:32     ` Segher Boessenkool
  2020-04-20  8:21       ` luoxhu
  0 siblings, 1 reply; 12+ messages in thread
From: Segher Boessenkool @ 2020-04-17 16:32 UTC (permalink / raw)
  To: luoxhu--- via Gcc-patches, luoxhu, wschmidt, linkw, joseph,
	richard.sandiford

On Thu, Apr 16, 2020 at 08:21:40PM -0500, Segher Boessenkool wrote:
> On Wed, Apr 15, 2020 at 10:18:16AM +0100, Richard Sandiford wrote:
> > luoxhu--- via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > > -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> > > +    {
> > > +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
> > > +	 is never zero, as gimple pass loop ch will do optimization to simplify
> > > +	 the loop to NO loop for loop condition is false.  */
> > 
> > IMO the code needs to prove this, rather than just assume that previous
> > passes have made it so.
> 
> Well, it should gcc_assert it, probably.
> 
> It is the left-hand side of a+b...  it cannot be 0, because niter always
> is simplified!

Scratch that...  it cannot be *constant* 0, but that isn't the issue here.


Segher

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fold (add -1; zero_ext; add +1) operations to zero_ext when not zero (PR37451, PR61837)
  2020-04-17 16:32     ` Segher Boessenkool
@ 2020-04-20  8:21       ` luoxhu
  2020-04-20  9:05         ` luoxhu
  0 siblings, 1 reply; 12+ messages in thread
From: luoxhu @ 2020-04-20  8:21 UTC (permalink / raw)
  To: Segher Boessenkool, luoxhu--- via Gcc-patches, linkw, joseph,
	richard.sandiford, jakub
  Cc: wschmidt, gcc-patches

Hi,

On 2020/4/18 00:32, Segher Boessenkool wrote:
> On Thu, Apr 16, 2020 at 08:21:40PM -0500, Segher Boessenkool wrote:
>> On Wed, Apr 15, 2020 at 10:18:16AM +0100, Richard Sandiford wrote:
>>> luoxhu--- via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>>> -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
>>>> +    {
>>>> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
>>>> +	 is never zero, as gimple pass loop ch will do optimization to simplify
>>>> +	 the loop to NO loop for loop condition is false.  */
>>>
>>> IMO the code needs to prove this, rather than just assume that previous
>>> passes have made it so.
>>
>> Well, it should gcc_assert it, probably.
>>
>> It is the left-hand side of a+b...  it cannot be 0, because niter always
>> is simplified!
> 
> Scratch that...  it cannot be *constant* 0, but that isn't the issue here.

Sorry that my comments in the code is a bit misleading, it is actually not
related to loop-ch at all.  The instruction sequence at 255r.loop2_invariant:

   25: NOTE_INSN_BASIC_BLOCK 5
   26: r133:SI=r123:DI#0-0x1
      REG_DEAD r123:DI
   27: r123:DI=zero_extend(r133:SI)
      REG_DEAD r133:SI
   28: r124:DI=r124:DI+0x4
   30: r134:CC=cmp(r123:DI,0)
   31: pc={(r134:CC!=0)?L69:pc}

And 257r.loop2_doloop (inserted #72,#73,#74, and #31 is replaced by #71):   

;; Determined upper bound -1.
Loop 2 is simple:
  simple exit 6 -> 7
  number of iterations: (plus:SI (subreg:SI (reg:DI 123 [ doloop.6 ]) 0)
    (const_int -1 [0xffffffffffffffff]))
  upper bound: 2147483646
  likely upper bound: 2147483646
  realistic bound: -1
...
   72: r144:SI=r123:DI#0-0x1
   73: r143:DI=zero_extend(r144:SI)
   74: r142:DI=r143:DI+0x1
...
   25: NOTE_INSN_BASIC_BLOCK 5
   26: r133:SI=r123:DI#0-0x1
      REG_DEAD r123:DI
   27: r123:DI=zero_extend(r133:SI)
      REG_DEAD r133:SI
   28: r124:DI=r124:DI+0x4
   30: r134:CC=cmp(r123:DI,0)
   71: {pc={(r142:DI!=0x1)?L69:pc};r142:DI=r142:DI-0x1;clobber scratch;clobber scratch;}

increment_count is true ensures the (condition NE const1_rtx), r123:DI#0-0x1 is the loop number
of iterations in doloop, it is definitely >= 0, and r123:DI#0 also shouldn't be zero as the
loop upper bound is 2147483646(0x7fffffffe)???

Since this simplification is in doloop-modify,  there is already some doloop form check like
!desc->simple_p || desc->assumptions|| desc->infinite in doloop_valid_p, so it seems
not necessary to repeat check it here again? 
Maybe we just need check the loop upper bound is LEU than 0x7fffffffe to avoid if
instruction #26 overflow?


Updated patch, thanks:


This "subtract/extend/add" existed for a long time and still annoying us
(PR37451, PR61837) when converting from 32bits to 64bits, as the ctr
register is used as 64bits on powerpc64, Andraw Pinski had a patch but
caused some issue and reverted by Joseph S. Myers(PR37451, PR37782).

Andraw:
http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
Joseph:
https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html

However, the doloop code improved a lot since so many years passed,
gcc.c-torture/execute/doloop-1.c is no longer a simple loop with constant
desc->niter_expr since r125:SI#0 is not SImode, so it is not a valid doloop
and no transform done in doloop again.  Thus we can do the simplification
from "subtract/extend/add" to only extend when loop upper_bound is known
to be LE than SINT_MAX-1(not do simplify when r120:DI#0-0x1 overflow).

Bootstrap and regression tested pass on Power8-LE.

gcc/ChangeLog

	2020-04-20  Xiong Hu Luo  <luoxhu@linux.ibm.com>

	PR rtl-optimization/37451, PR target/61837
	* loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)
	to zero_ext.
---
 gcc/loop-doloop.c | 41 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
index db6a014e43d..da537aff60f 100644
--- a/gcc/loop-doloop.c
+++ b/gcc/loop-doloop.c
@@ -477,7 +477,46 @@ doloop_modify (class loop *loop, class niter_desc *desc,
     }
 
   if (increment_count)
-    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    {
+      /* Fold (add -1; zero_ext; add +1) operations to zero_ext. i.e:
+
+	 73: r145:SI=r123:DI#0-0x1
+	 74: r144:DI=zero_extend(r145:SI)
+	 75: r143:DI=r144:DI+0x1
+	 ...
+	 31: r135:CC=cmp(r123:DI,0)
+	 72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;clobber
+	 scratch;clobber scratch;}
+
+	 r123:DI#0-0x1 is the loop iterations be GE than 0, r143 is the loop
+	 count be saved to ctr, if this loop's upper bound is known, r123:DI#0
+	 won't be zero, then the expressions could be simplified to zero_extend
+	 only.  */
+      bool simplify_zext = false;
+      rtx extop0 = XEXP (count, 0);
+      if (loop->any_upper_bound
+	  && wi::leu_p (loop->nb_iterations_upper_bound, 0x7ffffffe)
+	  && GET_CODE (count) == ZERO_EXTEND
+	  && GET_CODE (extop0) == PLUS)
+	{
+	  rtx addop0 = XEXP (extop0, 0);
+	  rtx addop1 = XEXP (extop0, 1);
+
+	  unsigned int_mode
+	    = GET_MODE_PRECISION (as_a<scalar_int_mode> GET_MODE (addop0));
+	  if (CONST_SCALAR_INT_P (addop1)
+	      && GET_MODE_PRECISION (mode) == int_mode * 2
+	      && addop1 == GEN_INT (-1))
+	    {
+	      count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
+					  GET_MODE (addop0));
+	      simplify_zext = true;
+	    }
+	}
+
+      if (!simplify_zext)
+	count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    }
 
   /* Insert initialization of the count register into the loop header.  */
   start_sequence ();
-- 
2.21.0.777.g83232e3864



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] Fold (add -1; zero_ext; add +1) operations to zero_ext when not zero (PR37451, PR61837)
  2020-04-20  8:21       ` luoxhu
@ 2020-04-20  9:05         ` luoxhu
  2020-05-12  6:48           ` [PATCH v2] Fold (add -1; zero_ext; add +1) operations to zero_ext when not overflow (PR37451, part of PR61837) luoxhu
  0 siblings, 1 reply; 12+ messages in thread
From: luoxhu @ 2020-04-20  9:05 UTC (permalink / raw)
  To: Segher Boessenkool, luoxhu--- via Gcc-patches, linkw, joseph,
	richard.sandiford, jakub
  Cc: wschmidt

Tiny update to accommodate unsigned int compare.

On 2020/4/20 16:21, luoxhu via Gcc-patches wrote:
> Hi,
> 
> On 2020/4/18 00:32, Segher Boessenkool wrote:
>> On Thu, Apr 16, 2020 at 08:21:40PM -0500, Segher Boessenkool wrote:
>>> On Wed, Apr 15, 2020 at 10:18:16AM +0100, Richard Sandiford wrote:
>>>> luoxhu--- via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>>>> -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
>>>>> +    {
>>>>> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext based on addop0
>>>>> +	 is never zero, as gimple pass loop ch will do optimization to simplify
>>>>> +	 the loop to NO loop for loop condition is false.  */
>>>>
>>>> IMO the code needs to prove this, rather than just assume that previous
>>>> passes have made it so.
>>>
>>> Well, it should gcc_assert it, probably.
>>>
>>> It is the left-hand side of a+b...  it cannot be 0, because niter always
>>> is simplified!
>>
>> Scratch that...  it cannot be *constant* 0, but that isn't the issue here.
> 
> Sorry that my comments in the code is a bit misleading, it is actually not
> related to loop-ch at all.  The instruction sequence at 255r.loop2_invariant:
> 
>     25: NOTE_INSN_BASIC_BLOCK 5
>     26: r133:SI=r123:DI#0-0x1
>        REG_DEAD r123:DI
>     27: r123:DI=zero_extend(r133:SI)
>        REG_DEAD r133:SI
>     28: r124:DI=r124:DI+0x4
>     30: r134:CC=cmp(r123:DI,0)
>     31: pc={(r134:CC!=0)?L69:pc}
> 
> And 257r.loop2_doloop (inserted #72,#73,#74, and #31 is replaced by #71):
> 
> ;; Determined upper bound -1.
> Loop 2 is simple:
>    simple exit 6 -> 7
>    number of iterations: (plus:SI (subreg:SI (reg:DI 123 [ doloop.6 ]) 0)
>      (const_int -1 [0xffffffffffffffff]))
>    upper bound: 2147483646
>    likely upper bound: 2147483646
>    realistic bound: -1
> ...
>     72: r144:SI=r123:DI#0-0x1
>     73: r143:DI=zero_extend(r144:SI)
>     74: r142:DI=r143:DI+0x1
> ...
>     25: NOTE_INSN_BASIC_BLOCK 5
>     26: r133:SI=r123:DI#0-0x1
>        REG_DEAD r123:DI
>     27: r123:DI=zero_extend(r133:SI)
>        REG_DEAD r133:SI
>     28: r124:DI=r124:DI+0x4
>     30: r134:CC=cmp(r123:DI,0)
>     71: {pc={(r142:DI!=0x1)?L69:pc};r142:DI=r142:DI-0x1;clobber scratch;clobber scratch;}
> 
> increment_count is true ensures the (condition NE const1_rtx), r123:DI#0-0x1 is the loop number
> of iterations in doloop, it is definitely >= 0, and r123:DI#0 also shouldn't be zero as the
> loop upper bound is 2147483646(0x7fffffffe)???
> 
> Since this simplification is in doloop-modify,  there is already some doloop form check like
> !desc->simple_p || desc->assumptions|| desc->infinite in doloop_valid_p, so it seems
> not necessary to repeat check it here again?
> Maybe we just need check the loop upper bound is LEU than 0x7fffffffe to avoid if
> instruction #26 overflow?
0xfffffffe

> 
> 
> Updated patch, thanks:
> 
> 
> This "subtract/extend/add" existed for a long time and still annoying us
> (PR37451, PR61837) when converting from 32bits to 64bits, as the ctr
> register is used as 64bits on powerpc64, Andraw Pinski had a patch but
> caused some issue and reverted by Joseph S. Myers(PR37451, PR37782).
> 
> Andraw:
> http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
> http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
> Joseph:
> https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html
> 
> However, the doloop code improved a lot since so many years passed,
> gcc.c-torture/execute/doloop-1.c is no longer a simple loop with constant
> desc->niter_expr since r125:SI#0 is not SImode, so it is not a valid doloop
> and no transform done in doloop again.  Thus we can do the simplification
> from "subtract/extend/add" to only extend when loop upper_bound is known
> to be LE than SINT_MAX-1(not do simplify when r120:DI#0-0x1 overflow).
UINT_MAX-1

> 
> Bootstrap and regression tested pass on Power8-LE.
> 
> gcc/ChangeLog
> 
> 	2020-04-20  Xiong Hu Luo  <luoxhu@linux.ibm.com>
> 
> 	PR rtl-optimization/37451, PR target/61837
> 	* loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)
> 	to zero_ext.
> ---
>   gcc/loop-doloop.c | 41 ++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 40 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
> index db6a014e43d..da537aff60f 100644
> --- a/gcc/loop-doloop.c
> +++ b/gcc/loop-doloop.c
> @@ -477,7 +477,46 @@ doloop_modify (class loop *loop, class niter_desc *desc,
>       }
> 
>     if (increment_count)
> -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +    {
> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext. i.e:
> +
> +	 73: r145:SI=r123:DI#0-0x1
> +	 74: r144:DI=zero_extend(r145:SI)
> +	 75: r143:DI=r144:DI+0x1
> +	 ...
> +	 31: r135:CC=cmp(r123:DI,0)
> +	 72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;clobber
> +	 scratch;clobber scratch;}
> +
> +	 r123:DI#0-0x1 is the loop iterations be GE than 0, r143 is the loop
> +	 count be saved to ctr, if this loop's upper bound is known, r123:DI#0
> +	 won't be zero, then the expressions could be simplified to zero_extend
> +	 only.  */
> +      bool simplify_zext = false;
> +      rtx extop0 = XEXP (count, 0);
> +      if (loop->any_upper_bound
> +	  && wi::leu_p (loop->nb_iterations_upper_bound, 0x7ffffffe)

Should be 4294967294(0xfffffffe) here.

Xionghu

> +	  && GET_CODE (count) == ZERO_EXTEND
> +	  && GET_CODE (extop0) == PLUS)
> +	{
> +	  rtx addop0 = XEXP (extop0, 0);
> +	  rtx addop1 = XEXP (extop0, 1);
> +
> +	  unsigned int_mode
> +	    = GET_MODE_PRECISION (as_a<scalar_int_mode> GET_MODE (addop0));
> +	  if (CONST_SCALAR_INT_P (addop1)
> +	      && GET_MODE_PRECISION (mode) == int_mode * 2
> +	      && addop1 == GEN_INT (-1))
> +	    {
> +	      count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
> +					  GET_MODE (addop0));
> +	      simplify_zext = true;
> +	    }
> +	}
> +
> +      if (!simplify_zext)
> +	count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +    }
> 
>     /* Insert initialization of the count register into the loop header.  */
>     start_sequence ();
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2] Fold (add -1; zero_ext; add +1) operations to zero_ext when not overflow (PR37451, part of PR61837)
  2020-04-20  9:05         ` luoxhu
@ 2020-05-12  6:48           ` luoxhu
  2020-05-12 14:58             ` Segher Boessenkool
  2020-05-12 18:24             ` Richard Sandiford
  0 siblings, 2 replies; 12+ messages in thread
From: luoxhu @ 2020-05-12  6:48 UTC (permalink / raw)
  To: Segher Boessenkool, luoxhu--- via Gcc-patches, linkw, joseph,
	richard.sandiford, jakub
  Cc: wschmidt

Minor refine of checking iterations nonoverflow and a testcase for stage 1.


This "subtract/extend/add" existed for a long time and still annoying us
(PR37451, part of PR61837) when converting from 32bits to 64bits, as the ctr
register is used as 64bits on powerpc64, Andraw Pinski had a patch but
caused some issue and reverted by Joseph S. Myers(PR37451, PR37782).

Andraw:
http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
Joseph:
https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html

We can do the simplification from "subtract/extend/add" to only extend
when loop iterations is known to be LT than MODE_MAX-1(NOT do simplify
when counter+0x1 overflow).

Bootstrap and regression tested pass on Power8-LE.

gcc/ChangeLog

	2020-05-12  Xiong Hu Luo  <luoxhu@linux.ibm.com>

	PR rtl-optimization/37451, part of PR target/61837
	* loop-doloop.c (doloop_modify): Simplify (add -1; zero_ext; add +1)
	to zero_ext when not wrapping overflow.

gcc/testsuite/ChangeLog

	2020-05-12  Xiong Hu Luo  <luoxhu@linux.ibm.com>

	PR rtl-optimization/37451, part of PR target/61837
	* gcc.target/powerpc/doloop-2.c: New test.
---
 gcc/loop-doloop.c                           | 46 ++++++++++++++++++++-
 gcc/testsuite/gcc.target/powerpc/doloop-2.c | 14 +++++++
 2 files changed, 59 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/doloop-2.c

diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
index db6a014e43d..16372382a22 100644
--- a/gcc/loop-doloop.c
+++ b/gcc/loop-doloop.c
@@ -477,7 +477,51 @@ doloop_modify (class loop *loop, class niter_desc *desc,
     }
 
   if (increment_count)
-    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    {
+      /* Fold (add -1; zero_ext; add +1) operations to zero_ext. i.e:
+
+	 73: r145:SI=r123:DI#0-0x1
+	 74: r144:DI=zero_extend (r145:SI)
+	 75: r143:DI=r144:DI+0x1
+	 ...
+	 31: r135:CC=cmp (r123:DI,0)
+	 72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;clobber
+	 scratch;clobber scratch;}
+
+	 r123:DI#0-0x1 is param count derived from loop->niter_expr equal to the
+	 loop iterations, if loop iterations expression doesn't overflow, then
+	 (zero_extend (r123:DI#0-1))+1 could be simplified to zero_extend only.
+       */
+      bool simplify_zext = false;
+      rtx extop0 = XEXP (count, 0);
+      if (GET_CODE (count) == ZERO_EXTEND && GET_CODE (extop0) == PLUS)
+	{
+	  rtx addop0 = XEXP (extop0, 0);
+	  rtx addop1 = XEXP (extop0, 1);
+
+	  int nonoverflow = 0;
+	  unsigned int_mode
+	    = GET_MODE_PRECISION (as_a<scalar_int_mode> GET_MODE (addop0));
+	  unsigned HOST_WIDE_INT int_mode_max
+	    = (HOST_WIDE_INT_1U << (int_mode - 1) << 1) - 1;
+	  if (get_max_loop_iterations (loop, &iterations)
+	      && wi::ltu_p (iterations, int_mode_max))
+	    nonoverflow = 1;
+
+	  if (nonoverflow
+	      && CONST_SCALAR_INT_P (addop1)
+	      && GET_MODE_PRECISION (mode) == int_mode * 2
+	      && addop1 == GEN_INT (-1))
+	    {
+	      count = simplify_gen_unary (ZERO_EXTEND, mode, addop0,
+					  GET_MODE (addop0));
+	      simplify_zext = true;
+	    }
+	}
+
+      if (!simplify_zext)
+	count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    }
 
   /* Insert initialization of the count register into the loop header.  */
   start_sequence ();
diff --git a/gcc/testsuite/gcc.target/powerpc/doloop-2.c b/gcc/testsuite/gcc.target/powerpc/doloop-2.c
new file mode 100644
index 00000000000..dc8516bb0ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/doloop-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target powerpc*-*-* } } */
+/* { dg-options "-O2 -fno-unroll-loops" } */
+
+int f(int l, int *a)
+{
+    int i;
+      for(i = 0;i < l; i++)
+	    a[i] = i;
+        return l;
+}
+
+/* { dg-final { scan-assembler-not "-1" } } */
+/* { dg-final { scan-assembler "bdnz" } } */
+/* { dg-final { scan-assembler-times "mtctr" 1 } } */
-- 
2.21.0.777.g83232e3864




^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] Fold (add -1; zero_ext; add +1) operations to zero_ext when not overflow (PR37451, part of PR61837)
  2020-05-12  6:48           ` [PATCH v2] Fold (add -1; zero_ext; add +1) operations to zero_ext when not overflow (PR37451, part of PR61837) luoxhu
@ 2020-05-12 14:58             ` Segher Boessenkool
  2020-05-12 18:24             ` Richard Sandiford
  1 sibling, 0 replies; 12+ messages in thread
From: Segher Boessenkool @ 2020-05-12 14:58 UTC (permalink / raw)
  To: luoxhu
  Cc: luoxhu--- via Gcc-patches, linkw, joseph, richard.sandiford,
	jakub, wschmidt

Hi!

On Tue, May 12, 2020 at 02:48:40PM +0800, luoxhu wrote:
> diff --git a/gcc/testsuite/gcc.target/powerpc/doloop-2.c b/gcc/testsuite/gcc.target/powerpc/doloop-2.c
> new file mode 100644
> index 00000000000..dc8516bb0ab
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/doloop-2.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target powerpc*-*-* } } */

Just { dg-do compiler } please, *everything* that runs this testsuite
is powerpc*-*-*; but compile is the default as well, so you can leave
that line completely out as well, if you want.

> +/* { dg-final { scan-assembler-not "-1" } } */

This will fail the test for the string "-1" anywhere in the file.  Like,
if it was called "doloop-1.c" it would fail, or "doloop-12345.c".  \m
and \M can help for that last case, but you probably want to make the
regex a bit more selective ;-)  (And, document what it doesn't want to
see, if it isn't really obvious?)

Segher

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] Fold (add -1; zero_ext; add +1) operations to zero_ext when not overflow (PR37451, part of PR61837)
  2020-05-12  6:48           ` [PATCH v2] Fold (add -1; zero_ext; add +1) operations to zero_ext when not overflow (PR37451, part of PR61837) luoxhu
  2020-05-12 14:58             ` Segher Boessenkool
@ 2020-05-12 18:24             ` Richard Sandiford
  2020-05-14  2:48               ` luoxhu
  1 sibling, 1 reply; 12+ messages in thread
From: Richard Sandiford @ 2020-05-12 18:24 UTC (permalink / raw)
  To: luoxhu
  Cc: Segher Boessenkool, luoxhu--- via Gcc-patches, linkw, joseph,
	jakub, wschmidt

luoxhu <luoxhu@linux.ibm.com> writes:
> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext. i.e:
> +
> +	 73: r145:SI=r123:DI#0-0x1
> +	 74: r144:DI=zero_extend (r145:SI)
> +	 75: r143:DI=r144:DI+0x1
> +	 ...
> +	 31: r135:CC=cmp (r123:DI,0)
> +	 72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;clobber
> +	 scratch;clobber scratch;}

Minor, but it might be worth stubbing out the clobbers, since they're
not really necessary to understand the comment:

	 72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;...}

> +
> +	 r123:DI#0-0x1 is param count derived from loop->niter_expr equal to the
> +	 loop iterations, if loop iterations expression doesn't overflow, then
> +	 (zero_extend (r123:DI#0-1))+1 could be simplified to zero_extend only.
> +       */
> +      bool simplify_zext = false;

I think it'd be easier to follow if this was split out into
a subroutine, rather than having the simplify_zext variable.

> +      rtx extop0 = XEXP (count, 0);
> +      if (GET_CODE (count) == ZERO_EXTEND && GET_CODE (extop0) == PLUS)

This isn't valid: we can only do XEXP (count, 0) *after* checking
for a ZERO_EXTEND.  (It'd be good to test the patch with
--enable-checking=yes,extra,rtl , which hopefully would have
caught this.)

> +	{
> +	  rtx addop0 = XEXP (extop0, 0);
> +	  rtx addop1 = XEXP (extop0, 1);
> +
> +	  int nonoverflow = 0;
> +	  unsigned int_mode
> +	    = GET_MODE_PRECISION (as_a<scalar_int_mode> GET_MODE (addop0));

Heh.  I wondered at first how on earth this compiled.  It looked like
there was a missing "(...)" around the GET_MODE.  But of course,
GET_MODE adds its own parentheses, so it all works out. :-)

Please add the "(...)" anyway though.  We shouldn't rely on that.

"int_mode" seems a bit of a confusing name, since it's actually a precision
in bits rather than a mode.

> +	  unsigned HOST_WIDE_INT int_mode_max
> +	    = (HOST_WIDE_INT_1U << (int_mode - 1) << 1) - 1;
> +	  if (get_max_loop_iterations (loop, &iterations)
> +	      && wi::ltu_p (iterations, int_mode_max))

You could use GET_MODE_MASK instead of int_mode_max here.

For extra safety, it would be good to add a HWI_COMPUTABLE_P test,
to make sure that using HWIs is valid.

> +	    nonoverflow = 1;
> +
> +	  if (nonoverflow

Having the nonoverflow variable doesn't seem necessary.  We could
just fuse the two "if" conditions together.

> +	      && CONST_SCALAR_INT_P (addop1)
> +	      && GET_MODE_PRECISION (mode) == int_mode * 2

This GET_MODE_PRECISION condition also shouldn't be necessary.
If we can prove that the subtraction doesn't wrap, we can extend
to any wider mode, not just to double the width.

> +	      && addop1 == GEN_INT (-1))

This can just be:

   addop1 == constm1_rtx

There's then no need for the CONST_SCALAR_INT_P check.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] Fold (add -1; zero_ext; add +1) operations to zero_ext when not overflow (PR37451, part of PR61837)
  2020-05-12 18:24             ` Richard Sandiford
@ 2020-05-14  2:48               ` luoxhu
  2020-05-14 11:15                 ` Richard Sandiford
  0 siblings, 1 reply; 12+ messages in thread
From: luoxhu @ 2020-05-14  2:48 UTC (permalink / raw)
  To: Segher Boessenkool, luoxhu--- via Gcc-patches, linkw, joseph,
	jakub, wschmidt, richard.sandiford

On 2020/5/13 02:24, Richard Sandiford wrote:
> luoxhu <luoxhu@linux.ibm.com> writes:
>> +      /* Fold (add -1; zero_ext; add +1) operations to zero_ext. i.e:
>> +
>> +	 73: r145:SI=r123:DI#0-0x1
>> +	 74: r144:DI=zero_extend (r145:SI)
>> +	 75: r143:DI=r144:DI+0x1
>> +	 ...
>> +	 31: r135:CC=cmp (r123:DI,0)
>> +	 72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;clobber
>> +	 scratch;clobber scratch;}
> 
> Minor, but it might be worth stubbing out the clobbers, since they're
> not really necessary to understand the comment:
> 
> 	 72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;...}
> 
>> +
>> +	 r123:DI#0-0x1 is param count derived from loop->niter_expr equal to the
>> +	 loop iterations, if loop iterations expression doesn't overflow, then
>> +	 (zero_extend (r123:DI#0-1))+1 could be simplified to zero_extend only.
>> +       */
>> +      bool simplify_zext = false;
> 
> I think it'd be easier to follow if this was split out into
> a subroutine, rather than having the simplify_zext variable.
> 
>> +      rtx extop0 = XEXP (count, 0);
>> +      if (GET_CODE (count) == ZERO_EXTEND && GET_CODE (extop0) == PLUS)
> 
> This isn't valid: we can only do XEXP (count, 0) *after* checking
> for a ZERO_EXTEND.  (It'd be good to test the patch with
> --enable-checking=yes,extra,rtl , which hopefully would have
> caught this.)
> 
>> +	{
>> +	  rtx addop0 = XEXP (extop0, 0);
>> +	  rtx addop1 = XEXP (extop0, 1);
>> +
>> +	  int nonoverflow = 0;
>> +	  unsigned int_mode
>> +	    = GET_MODE_PRECISION (as_a<scalar_int_mode> GET_MODE (addop0));
> 
> Heh.  I wondered at first how on earth this compiled.  It looked like
> there was a missing "(...)" around the GET_MODE.  But of course,
> GET_MODE adds its own parentheses, so it all works out. :-)
> 
> Please add the "(...)" anyway though.  We shouldn't rely on that.
> 
> "int_mode" seems a bit of a confusing name, since it's actually a precision
> in bits rather than a mode.
> 
>> +	  unsigned HOST_WIDE_INT int_mode_max
>> +	    = (HOST_WIDE_INT_1U << (int_mode - 1) << 1) - 1;
>> +	  if (get_max_loop_iterations (loop, &iterations)
>> +	      && wi::ltu_p (iterations, int_mode_max))
> 
> You could use GET_MODE_MASK instead of int_mode_max here.
> 
> For extra safety, it would be good to add a HWI_COMPUTABLE_P test,
> to make sure that using HWIs is valid.
> 
>> +	    nonoverflow = 1;
>> +
>> +	  if (nonoverflow
> 
> Having the nonoverflow variable doesn't seem necessary.  We could
> just fuse the two "if" conditions together.
> 
>> +	      && CONST_SCALAR_INT_P (addop1)
>> +	      && GET_MODE_PRECISION (mode) == int_mode * 2
> 
> This GET_MODE_PRECISION condition also shouldn't be necessary.
> If we can prove that the subtraction doesn't wrap, we can extend
> to any wider mode, not just to double the width.
> 
>> +	      && addop1 == GEN_INT (-1))
> 
> This can just be:
> 
>     addop1 == constm1_rtx
> 
> There's then no need for the CONST_SCALAR_INT_P check.
> 
> Thanks,
> Richard
> 

Thanks for all your great comments, addressed them all with below update,
"--enable-checking=yes,extra,rtl" did catch the ICE with performance penalty.


This "subtract/extend/add" existed for a long time and still annoying us
(PR37451, part of PR61837) when converting from 32bits to 64bits, as the ctr
register is used as 64bits on powerpc64, Andraw Pinski had a patch but
caused some issue and reverted by Joseph S. Myers(PR37451, PR37782).

Andraw:
http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
Joseph:
https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html

We still can do the simplification from "subtract/zero_ext/add" to "zero_ext"
when loop iterations is known to be LT than MODE_MAX (only do simplify
when counter+0x1 NOT overflow).

Bootstrap and regression tested pass on Power8-LE.

gcc/ChangeLog

	2020-05-14  Xiong Hu Luo  <luoxhu@linux.ibm.com>

	PR rtl-optimization/37451, part of PR target/61837
	* loop-doloop.c (doloop_simplify_count): New function.  Simplify
	(add -1; zero_ext; add +1) to zero_ext when not wrapping.
	(doloop_modify): Call doloop_simplify_count.

gcc/testsuite/ChangeLog

	2020-05-14  Xiong Hu Luo  <luoxhu@linux.ibm.com>

	PR rtl-optimization/37451, part of PR target/61837
	* gcc.target/powerpc/doloop-2.c: New test.
---
 gcc/loop-doloop.c                           | 38 ++++++++++++++++++++-
 gcc/testsuite/gcc.target/powerpc/doloop-2.c | 29 ++++++++++++++++
 2 files changed, 66 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/doloop-2.c

diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
index db6a014e43d..02282d45bd5 100644
--- a/gcc/loop-doloop.c
+++ b/gcc/loop-doloop.c
@@ -397,6 +397,42 @@ add_test (rtx cond, edge *e, basic_block dest)
   return true;
 }
 
+/* Fold (add -1; zero_ext; add +1) operations to zero_ext if not wrapping. i.e:
+
+   73: r145:SI=r123:DI#0-0x1
+   74: r144:DI=zero_extend (r145:SI)
+   75: r143:DI=r144:DI+0x1
+   ...
+   31: r135:CC=cmp (r123:DI,0)
+   72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;...}
+
+   r123:DI#0-0x1 is param count derived from loop->niter_expr equal to number of
+   loop iterations, if loop iterations expression doesn't overflow, then
+   (zero_extend (r123:DI#0-1))+1 can be simplified to zero_extend.  */
+
+static rtx
+doloop_simplify_count (class loop *loop, scalar_int_mode mode, rtx count)
+{
+  widest_int iterations;
+  if (GET_CODE (count) == ZERO_EXTEND)
+    {
+      rtx extop0 = XEXP (count, 0);
+      if (GET_CODE (extop0) == PLUS)
+	{
+	  rtx addop0 = XEXP (extop0, 0);
+	  rtx addop1 = XEXP (extop0, 1);
+
+	  if (get_max_loop_iterations (loop, &iterations)
+	      && wi::ltu_p (iterations, GET_MODE_MASK (GET_MODE (addop0)))
+	      && addop1 == constm1_rtx)
+	    return simplify_gen_unary (ZERO_EXTEND, mode, addop0,
+				       GET_MODE (addop0));
+	}
+    }
+
+  return simplify_gen_binary (PLUS, mode, count, const1_rtx);
+}
+
 /* Modify the loop to use the low-overhead looping insn where LOOP
    describes the loop, DESC describes the number of iterations of the
    loop, and DOLOOP_INSN is the low-overhead looping insn to emit at the
@@ -477,7 +513,7 @@ doloop_modify (class loop *loop, class niter_desc *desc,
     }
 
   if (increment_count)
-    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
+    count = doloop_simplify_count (loop, mode, count);
 
   /* Insert initialization of the count register into the loop header.  */
   start_sequence ();
diff --git a/gcc/testsuite/gcc.target/powerpc/doloop-2.c b/gcc/testsuite/gcc.target/powerpc/doloop-2.c
new file mode 100644
index 00000000000..3199fe56d35
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/doloop-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-unroll-loops" } */
+
+unsigned int
+foo1 (unsigned int l, int *a)
+{
+  unsigned int i;
+  for(i = 0;i < l; i++)
+    a[i] = i;
+  return l;
+}
+
+int
+foo2 (int l, int *a)
+{
+  int i;
+  for(i = 0;i < l; i++)
+    a[i] = i;
+  return l;
+}
+
+/* The place where we were getting an extra -1 is when converting from 32bits
+   to 64bits as the ctr register is used as 64bits on powerpc64.  We should be
+   able to do this loop without "add -1/zero_ext/add 1" to the l to get the
+   number of iterations of this loop still doing a do-loop.  */
+
+/* { dg-final { scan-assembler-not {(?n)\maddi .*,.*,-1$} } } */
+/* { dg-final { scan-assembler-times "bdnz" 2 } } */
+/* { dg-final { scan-assembler-times "mtctr" 2 } } */
-- 
2.21.0.777.g83232e3864



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] Fold (add -1; zero_ext; add +1) operations to zero_ext when not overflow (PR37451, part of PR61837)
  2020-05-14  2:48               ` luoxhu
@ 2020-05-14 11:15                 ` Richard Sandiford
  0 siblings, 0 replies; 12+ messages in thread
From: Richard Sandiford @ 2020-05-14 11:15 UTC (permalink / raw)
  To: luoxhu
  Cc: Segher Boessenkool, luoxhu--- via Gcc-patches, linkw, joseph,
	jakub, wschmidt

luoxhu <luoxhu@linux.ibm.com> writes:
> This "subtract/extend/add" existed for a long time and still annoying us
> (PR37451, part of PR61837) when converting from 32bits to 64bits, as the ctr
> register is used as 64bits on powerpc64, Andraw Pinski had a patch but
> caused some issue and reverted by Joseph S. Myers(PR37451, PR37782).
>
> Andraw:
> http://gcc.gnu.org/ml/gcc-patches/2008-09/msg01070.html
> http://gcc.gnu.org/ml/gcc-patches/2008-10/msg01321.html
> Joseph:
> https://gcc.gnu.org/legacy-ml/gcc-patches/2011-11/msg02405.html
>
> We still can do the simplification from "subtract/zero_ext/add" to "zero_ext"
> when loop iterations is known to be LT than MODE_MAX (only do simplify
> when counter+0x1 NOT overflow).
>
> Bootstrap and regression tested pass on Power8-LE.
>
> gcc/ChangeLog
>
> 	2020-05-14  Xiong Hu Luo  <luoxhu@linux.ibm.com>
>
> 	PR rtl-optimization/37451, part of PR target/61837
> 	* loop-doloop.c (doloop_simplify_count): New function.  Simplify
> 	(add -1; zero_ext; add +1) to zero_ext when not wrapping.
> 	(doloop_modify): Call doloop_simplify_count.
>
> gcc/testsuite/ChangeLog
>
> 	2020-05-14  Xiong Hu Luo  <luoxhu@linux.ibm.com>
>
> 	PR rtl-optimization/37451, part of PR target/61837
> 	* gcc.target/powerpc/doloop-2.c: New test.

OK, thanks.

Richard

> ---
>  gcc/loop-doloop.c                           | 38 ++++++++++++++++++++-
>  gcc/testsuite/gcc.target/powerpc/doloop-2.c | 29 ++++++++++++++++
>  2 files changed, 66 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/doloop-2.c
>
> diff --git a/gcc/loop-doloop.c b/gcc/loop-doloop.c
> index db6a014e43d..02282d45bd5 100644
> --- a/gcc/loop-doloop.c
> +++ b/gcc/loop-doloop.c
> @@ -397,6 +397,42 @@ add_test (rtx cond, edge *e, basic_block dest)
>    return true;
>  }
>  
> +/* Fold (add -1; zero_ext; add +1) operations to zero_ext if not wrapping. i.e:
> +
> +   73: r145:SI=r123:DI#0-0x1
> +   74: r144:DI=zero_extend (r145:SI)
> +   75: r143:DI=r144:DI+0x1
> +   ...
> +   31: r135:CC=cmp (r123:DI,0)
> +   72: {pc={(r143:DI!=0x1)?L70:pc};r143:DI=r143:DI-0x1;...}
> +
> +   r123:DI#0-0x1 is param count derived from loop->niter_expr equal to number of
> +   loop iterations, if loop iterations expression doesn't overflow, then
> +   (zero_extend (r123:DI#0-1))+1 can be simplified to zero_extend.  */
> +
> +static rtx
> +doloop_simplify_count (class loop *loop, scalar_int_mode mode, rtx count)
> +{
> +  widest_int iterations;
> +  if (GET_CODE (count) == ZERO_EXTEND)
> +    {
> +      rtx extop0 = XEXP (count, 0);
> +      if (GET_CODE (extop0) == PLUS)
> +	{
> +	  rtx addop0 = XEXP (extop0, 0);
> +	  rtx addop1 = XEXP (extop0, 1);
> +
> +	  if (get_max_loop_iterations (loop, &iterations)
> +	      && wi::ltu_p (iterations, GET_MODE_MASK (GET_MODE (addop0)))
> +	      && addop1 == constm1_rtx)
> +	    return simplify_gen_unary (ZERO_EXTEND, mode, addop0,
> +				       GET_MODE (addop0));
> +	}
> +    }
> +
> +  return simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +}
> +
>  /* Modify the loop to use the low-overhead looping insn where LOOP
>     describes the loop, DESC describes the number of iterations of the
>     loop, and DOLOOP_INSN is the low-overhead looping insn to emit at the
> @@ -477,7 +513,7 @@ doloop_modify (class loop *loop, class niter_desc *desc,
>      }
>  
>    if (increment_count)
> -    count = simplify_gen_binary (PLUS, mode, count, const1_rtx);
> +    count = doloop_simplify_count (loop, mode, count);
>  
>    /* Insert initialization of the count register into the loop header.  */
>    start_sequence ();
> diff --git a/gcc/testsuite/gcc.target/powerpc/doloop-2.c b/gcc/testsuite/gcc.target/powerpc/doloop-2.c
> new file mode 100644
> index 00000000000..3199fe56d35
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/doloop-2.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-unroll-loops" } */
> +
> +unsigned int
> +foo1 (unsigned int l, int *a)
> +{
> +  unsigned int i;
> +  for(i = 0;i < l; i++)
> +    a[i] = i;
> +  return l;
> +}
> +
> +int
> +foo2 (int l, int *a)
> +{
> +  int i;
> +  for(i = 0;i < l; i++)
> +    a[i] = i;
> +  return l;
> +}
> +
> +/* The place where we were getting an extra -1 is when converting from 32bits
> +   to 64bits as the ctr register is used as 64bits on powerpc64.  We should be
> +   able to do this loop without "add -1/zero_ext/add 1" to the l to get the
> +   number of iterations of this loop still doing a do-loop.  */
> +
> +/* { dg-final { scan-assembler-not {(?n)\maddi .*,.*,-1$} } } */
> +/* { dg-final { scan-assembler-times "bdnz" 2 } } */
> +/* { dg-final { scan-assembler-times "mtctr" 2 } } */

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-05-14 11:15 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-15  8:47 [PATCH] Fold (add -1; zero_ext; add +1) operations to zero_ext when not zero (PR37451, PR61837) luoxhu
2020-04-15  9:18 ` Richard Sandiford
2020-04-17  1:21   ` Segher Boessenkool
2020-04-17 16:32     ` Segher Boessenkool
2020-04-20  8:21       ` luoxhu
2020-04-20  9:05         ` luoxhu
2020-05-12  6:48           ` [PATCH v2] Fold (add -1; zero_ext; add +1) operations to zero_ext when not overflow (PR37451, part of PR61837) luoxhu
2020-05-12 14:58             ` Segher Boessenkool
2020-05-12 18:24             ` Richard Sandiford
2020-05-14  2:48               ` luoxhu
2020-05-14 11:15                 ` Richard Sandiford
2020-04-15  9:37 ` [PATCH] Fold (add -1; zero_ext; add +1) operations to zero_ext when not zero (PR37451, PR61837) Jakub Jelinek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).