[PATCH] Be careful about combined chain with length == 0 (PR, tree-optimization/70754).

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Be careful about combined chain with length == 0 (PR, tree-optimization/70754).
@ 2017-01-18 10:21 Martin Liška
  2017-01-18 11:08 ` Bin.Cheng
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Martin Liška @ 2017-01-18 10:21 UTC (permalink / raw)
  To: GCC Patches; +Cc: vp

[-- Attachment #1: Type: text/plain, Size: 2697 bytes --]

Hello.

After basic understanding of loop predictive commoning, the problematic combined chain is:

Loads-only chain 0x38b6730 (combined)
  max distance 0
  references:
    MEM[(real(kind=8) *)vectp_a.29_81] (id 1)
      offset 20
      distance 0
    MEM[(real(kind=8) *)vectp_a.38_141] (id 3)
      offset 20
      distance 0

Loads-only chain 0x38b68b0 (combined)
  max distance 0
  references:
    MEM[(real(kind=8) *)vectp_a.23_102] (id 0)
      offset 0
      distance 0
    MEM[(real(kind=8) *)vectp_a.33_33] (id 2)
      offset 0
      distance 0

Combination chain 0x38b65b0
  max distance 0, may reuse first
  equal to 0x38b6730 + 0x38b68b0 in type vector(2) real(kind=8)
  references:
    combination ref
      in statement predreastmp.48_10 = vect__32.31_78 + vect__28.25_100;

      distance 0
    combination ref
      in statement predreastmp.50_17 = vect__42.41_138 + vect__38.36_29;

      distance 0

It's important to note that distance is equal to zero (happening within a same loop iteration).
Aforementioned chains correspond to:

...
r2:  vect__28.25_100 = MEM[(real(kind=8) *)vectp_a.23_102];
  vectp_a.23_99 = vectp_a.23_102 + 16;
  vect__28.26_98 = MEM[(real(kind=8) *)vectp_a.23_99];
  vect__82.27_97 = vect__22.22_108;
  vect__82.27_96 = vect__22.22_107;
  vect__79.28_95 = vect__82.27_97 + vect__84.17_120;
  vect__79.28_94 = vect__82.27_96 + vect__84.17_119;
r1:  vect__32.31_78 = MEM[(real(kind=8) *)vectp_a.29_81];
  vectp_a.29_77 = vectp_a.29_81 + 16;
  vect__32.32_76 = MEM[(real(kind=8) *)vectp_a.29_77];
  vect__38.35_39 = MEM[(real(kind=8) *)vectp_a.33_57];
r2':  vectp_a.33_33 = vectp_a.33_57 + 16;
  vect__38.36_29 = MEM[(real(kind=8) *)vectp_a.33_33];
  vect__56.37_23 = vect__38.35_39;
  vect__56.37_15 = vect__32.32_76;
  vect__42.40_161 = MEM[(real(kind=8) *)vectp_a.38_163];
  vectp_a.38_141 = vectp_a.38_163 + 16;
r1':  vect__42.41_138 = MEM[(real(kind=8) *)vectp_a.38_141];
  vect__54.42_135 = vect__42.40_161 + vect__56.37_23;
r1'+r2':  predreastmp.50_17 = vect__42.41_138 + vect__38.36_29;
  predreastmp.51_18 = vect__56.37_15;
  vect__54.42_134 = predreastmp.50_17;
r1+r2:  predreastmp.48_10 = vect__32.31_78 + vect__28.25_100;
...

Problematic construct is that while having load-only chains r1->r1' and r2->r2', the combination
is actually r1'+r2'->r1+r2, which cause the troubles. I believe the proper fix is to reject such
combinations where combined root stmt does not dominate usages. It's probably corner case as it does
not reuse any values among loop iterations (which is main motivation of the pass), it's doing PRE
if I'm right.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin


[-- Attachment #2: 0001-Be-careful-about-combined-chain-with-length-0-PR-tre.patch --]
[-- Type: text/x-patch, Size: 2590 bytes --]

From 41b153cf975374fff48419ec8ac5991ac134735f Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Tue, 17 Jan 2017 14:22:40 +0100
Subject: [PATCH] Be careful about combined chain with length == 0 (PR
 tree-optimization/70754).

gcc/testsuite/ChangeLog:

2017-01-17  Martin Liska  <mliska@suse.cz>

	PR tree-optimization/70754
	* gfortran.dg/pr70754.f90: New test.

gcc/ChangeLog:

2017-01-17  Martin Liska  <mliska@suse.cz>

	PR tree-optimization/70754
	* tree-predcom.c (combine_chains): Do not create a combined chain
	with length equal to zero when root_stmt does not dominate
	stmts of references.
---
 gcc/testsuite/gfortran.dg/pr70754.f90 | 35 +++++++++++++++++++++++++++++++++++
 gcc/tree-predcom.c                    | 10 ++++++++++
 2 files changed, 45 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/pr70754.f90

diff --git a/gcc/testsuite/gfortran.dg/pr70754.f90 b/gcc/testsuite/gfortran.dg/pr70754.f90
new file mode 100644
index 00000000000..758901ce2b2
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr70754.f90
@@ -0,0 +1,35 @@
+! { dg-options "-Ofast" }
+
+module m
+  implicit none
+  private
+  save
+
+  integer, parameter, public :: &
+    ii4          = selected_int_kind(6), &
+    rr8          = selected_real_kind(13)
+
+  integer (ii4), dimension(40,40,199), public :: xyz
+  public :: foo
+contains
+  subroutine foo(a)
+    real (rr8), dimension(40,40), intent(out) :: a
+    real (rr8), dimension(40,40) :: b
+    integer (ii4), dimension(40,40) :: c
+    integer  i, j
+
+    do i=1,8
+      b(i,j) = 123 * a(i,j) + a(i,j+1) &
+             + a(i,j) + a(i+1,j+1) &
+             + a(i+1,j) + a(i-1,j+1) &
+             + a(i-1,j)
+      c(i,j) = 123
+    end do
+
+    where ((xyz(:,:,2) /= 0) .and. (c /= 0))
+      a = b/real(c)
+    elsewhere
+      a = 456
+    endwhere
+ end subroutine foo
+end module m
diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index f0b70a61fb8..768f976cb87 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -2323,6 +2323,16 @@ combine_chains (chain_p ch1, chain_p ch2)
   root_stmt = get_chain_root (new_chain)->stmt;
   for (i = 1; new_chain->refs.iterate (i, &nw); i++)
     {
+      /* PR tree-optimization/70754
+	 For a combined chain with length equal to zero, we have to guarantee
+	 that ROOT_STMT dominates all references.  */
+      if (new_chain->length == 0
+	  && !stmt_dominates_stmt_p (root_stmt, nw->stmt))
+	{
+	  release_chain (new_chain);
+	  return NULL;
+	}
+
       if (nw->distance == new_chain->length
 	  && !stmt_dominates_stmt_p (nw->stmt, root_stmt))
 	{
-- 
2.11.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Be careful about combined chain with length == 0 (PR, tree-optimization/70754).
  2017-01-18 10:21 [PATCH] Be careful about combined chain with length == 0 (PR, tree-optimization/70754) Martin Liška
@ 2017-01-18 11:08 ` Bin.Cheng
  2017-01-18 14:59 ` Richard Biener
  2017-01-18 15:00 ` Vidya Praveen
  2 siblings, 0 replies; 11+ messages in thread
From: Bin.Cheng @ 2017-01-18 11:08 UTC (permalink / raw)
  To: Martin Liška; +Cc: GCC Patches, vp

On Wed, Jan 18, 2017 at 10:10 AM, Martin Liška <mliska@suse.cz> wrote:
> Hello.
>
> After basic understanding of loop predictive commoning, the problematic combined chain is:
>
> Loads-only chain 0x38b6730 (combined)
>   max distance 0
>   references:
>     MEM[(real(kind=8) *)vectp_a.29_81] (id 1)
>       offset 20
>       distance 0
>     MEM[(real(kind=8) *)vectp_a.38_141] (id 3)
>       offset 20
>       distance 0
>
> Loads-only chain 0x38b68b0 (combined)
>   max distance 0
>   references:
>     MEM[(real(kind=8) *)vectp_a.23_102] (id 0)
>       offset 0
>       distance 0
>     MEM[(real(kind=8) *)vectp_a.33_33] (id 2)
>       offset 0
>       distance 0
>
> Combination chain 0x38b65b0
>   max distance 0, may reuse first
>   equal to 0x38b6730 + 0x38b68b0 in type vector(2) real(kind=8)
>   references:
>     combination ref
>       in statement predreastmp.48_10 = vect__32.31_78 + vect__28.25_100;
>
>       distance 0
>     combination ref
>       in statement predreastmp.50_17 = vect__42.41_138 + vect__38.36_29;
>
>       distance 0
>
> It's important to note that distance is equal to zero (happening within a same loop iteration).
> Aforementioned chains correspond to:
>
> ...
> r2:  vect__28.25_100 = MEM[(real(kind=8) *)vectp_a.23_102];
>   vectp_a.23_99 = vectp_a.23_102 + 16;
>   vect__28.26_98 = MEM[(real(kind=8) *)vectp_a.23_99];
>   vect__82.27_97 = vect__22.22_108;
>   vect__82.27_96 = vect__22.22_107;
>   vect__79.28_95 = vect__82.27_97 + vect__84.17_120;
>   vect__79.28_94 = vect__82.27_96 + vect__84.17_119;
> r1:  vect__32.31_78 = MEM[(real(kind=8) *)vectp_a.29_81];
>   vectp_a.29_77 = vectp_a.29_81 + 16;
>   vect__32.32_76 = MEM[(real(kind=8) *)vectp_a.29_77];
>   vect__38.35_39 = MEM[(real(kind=8) *)vectp_a.33_57];
> r2':  vectp_a.33_33 = vectp_a.33_57 + 16;
>   vect__38.36_29 = MEM[(real(kind=8) *)vectp_a.33_33];
>   vect__56.37_23 = vect__38.35_39;
>   vect__56.37_15 = vect__32.32_76;
>   vect__42.40_161 = MEM[(real(kind=8) *)vectp_a.38_163];
>   vectp_a.38_141 = vectp_a.38_163 + 16;
> r1':  vect__42.41_138 = MEM[(real(kind=8) *)vectp_a.38_141];
>   vect__54.42_135 = vect__42.40_161 + vect__56.37_23;
> r1'+r2':  predreastmp.50_17 = vect__42.41_138 + vect__38.36_29;
>   predreastmp.51_18 = vect__56.37_15;
>   vect__54.42_134 = predreastmp.50_17;
> r1+r2:  predreastmp.48_10 = vect__32.31_78 + vect__28.25_100;
> ...
>
> Problematic construct is that while having load-only chains r1->r1' and r2->r2', the combination
> is actually r1'+r2'->r1+r2, which cause the troubles. I believe the proper fix is to reject such
> combinations where combined root stmt does not dominate usages. It's probably corner case as it does
> not reuse any values among loop iterations (which is main motivation of the pass), it's doing PRE
> if I'm right.
I think this is OK, though I can't approve it.  Later DOM pass should
be able to pick up the redundant combination operation between
predreastmp.50_17 and predreastmp.48_10, could you double check this?

Thanks,
bin
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?
> Martin
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Be careful about combined chain with length == 0 (PR, tree-optimization/70754).
  2017-01-18 10:21 [PATCH] Be careful about combined chain with length == 0 (PR, tree-optimization/70754) Martin Liška
  2017-01-18 11:08 ` Bin.Cheng
@ 2017-01-18 14:59 ` Richard Biener
  2017-01-18 15:34   ` Bin.Cheng
  2017-01-18 15:00 ` Vidya Praveen
  2 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2017-01-18 14:59 UTC (permalink / raw)
  To: Martin Liška; +Cc: GCC Patches, vp

On Wed, Jan 18, 2017 at 11:10 AM, Martin Liška <mliska@suse.cz> wrote:
> Hello.
>
> After basic understanding of loop predictive commoning, the problematic combined chain is:
>
> Loads-only chain 0x38b6730 (combined)
>   max distance 0
>   references:
>     MEM[(real(kind=8) *)vectp_a.29_81] (id 1)
>       offset 20
>       distance 0
>     MEM[(real(kind=8) *)vectp_a.38_141] (id 3)
>       offset 20
>       distance 0
>
> Loads-only chain 0x38b68b0 (combined)
>   max distance 0
>   references:
>     MEM[(real(kind=8) *)vectp_a.23_102] (id 0)
>       offset 0
>       distance 0
>     MEM[(real(kind=8) *)vectp_a.33_33] (id 2)
>       offset 0
>       distance 0
>
> Combination chain 0x38b65b0
>   max distance 0, may reuse first
>   equal to 0x38b6730 + 0x38b68b0 in type vector(2) real(kind=8)
>   references:
>     combination ref
>       in statement predreastmp.48_10 = vect__32.31_78 + vect__28.25_100;
>
>       distance 0
>     combination ref
>       in statement predreastmp.50_17 = vect__42.41_138 + vect__38.36_29;
>
>       distance 0
>
> It's important to note that distance is equal to zero (happening within a same loop iteration).
> Aforementioned chains correspond to:
>
> ...
> r2:  vect__28.25_100 = MEM[(real(kind=8) *)vectp_a.23_102];
>   vectp_a.23_99 = vectp_a.23_102 + 16;
>   vect__28.26_98 = MEM[(real(kind=8) *)vectp_a.23_99];
>   vect__82.27_97 = vect__22.22_108;
>   vect__82.27_96 = vect__22.22_107;
>   vect__79.28_95 = vect__82.27_97 + vect__84.17_120;
>   vect__79.28_94 = vect__82.27_96 + vect__84.17_119;
> r1:  vect__32.31_78 = MEM[(real(kind=8) *)vectp_a.29_81];
>   vectp_a.29_77 = vectp_a.29_81 + 16;
>   vect__32.32_76 = MEM[(real(kind=8) *)vectp_a.29_77];
>   vect__38.35_39 = MEM[(real(kind=8) *)vectp_a.33_57];
> r2':  vectp_a.33_33 = vectp_a.33_57 + 16;
>   vect__38.36_29 = MEM[(real(kind=8) *)vectp_a.33_33];
>   vect__56.37_23 = vect__38.35_39;
>   vect__56.37_15 = vect__32.32_76;
>   vect__42.40_161 = MEM[(real(kind=8) *)vectp_a.38_163];
>   vectp_a.38_141 = vectp_a.38_163 + 16;
> r1':  vect__42.41_138 = MEM[(real(kind=8) *)vectp_a.38_141];
>   vect__54.42_135 = vect__42.40_161 + vect__56.37_23;
> r1'+r2':  predreastmp.50_17 = vect__42.41_138 + vect__38.36_29;
>   predreastmp.51_18 = vect__56.37_15;
>   vect__54.42_134 = predreastmp.50_17;
> r1+r2:  predreastmp.48_10 = vect__32.31_78 + vect__28.25_100;
> ...
>
> Problematic construct is that while having load-only chains r1->r1' and r2->r2', the combination
> is actually r1'+r2'->r1+r2, which cause the troubles. I believe the proper fix is to reject such
> combinations where combined root stmt does not dominate usages. It's probably corner case as it does
> not reuse any values among loop iterations (which is main motivation of the pass), it's doing PRE
> if I'm right.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

I'm not sure.  If we have such zero distance refs in the IL at the
time pcom runs then not handling
them will pessimize code-gen for cases where they are part of a larger
chain.  Esp. I don't like
another stmt_dominates_stmt_p call and thus rather not handle length
== 0 at all...

We already seem to go great length in associating stuff when combining
stuff thus isn't this
maybe an artifact of this association?  Maybe we simply need to sort
the new chain after
combining it so the root stmt comes last?

Note that there seems to be only a single length per chain but not all
refs in a chain need to
have the same distance.  This means your fix is likely incomplete?
What prevents the situation
to arise for distance != 0?

Richard.

> Martin
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Be careful about combined chain with length == 0 (PR, tree-optimization/70754).
  2017-01-18 10:21 [PATCH] Be careful about combined chain with length == 0 (PR, tree-optimization/70754) Martin Liška
  2017-01-18 11:08 ` Bin.Cheng
  2017-01-18 14:59 ` Richard Biener
@ 2017-01-18 15:00 ` Vidya Praveen
  2 siblings, 0 replies; 11+ messages in thread
From: Vidya Praveen @ 2017-01-18 15:00 UTC (permalink / raw)
  To: GCC Patches; +Cc: vp, nd, Martin Liška

On Wed, Jan 18, 2017 at 11:10:32AM +0100, Martin LiÅ¡ka wrote:
> Hello.
> 
> After basic understanding of loop predictive commoning, the problematic combined chain is:
> 
> Loads-only chain 0x38b6730 (combined)
>   max distance 0
>   references:
>     MEM[(real(kind=8) *)vectp_a.29_81] (id 1)
>       offset 20
>       distance 0
>     MEM[(real(kind=8) *)vectp_a.38_141] (id 3)
>       offset 20
>       distance 0
> 
> Loads-only chain 0x38b68b0 (combined)
>   max distance 0
>   references:
>     MEM[(real(kind=8) *)vectp_a.23_102] (id 0)
>       offset 0
>       distance 0
>     MEM[(real(kind=8) *)vectp_a.33_33] (id 2)
>       offset 0
>       distance 0
> 
> Combination chain 0x38b65b0
>   max distance 0, may reuse first
>   equal to 0x38b6730 + 0x38b68b0 in type vector(2) real(kind=8)
>   references:
>     combination ref
>       in statement predreastmp.48_10 = vect__32.31_78 + vect__28.25_100;
> 
>       distance 0
>     combination ref
>       in statement predreastmp.50_17 = vect__42.41_138 + vect__38.36_29;
> 
>       distance 0
> 
> It's important to note that distance is equal to zero (happening within a same loop iteration).
> Aforementioned chains correspond to:
> 
> ...
> r2:  vect__28.25_100 = MEM[(real(kind=8) *)vectp_a.23_102];
>   vectp_a.23_99 = vectp_a.23_102 + 16;
>   vect__28.26_98 = MEM[(real(kind=8) *)vectp_a.23_99];
>   vect__82.27_97 = vect__22.22_108;
>   vect__82.27_96 = vect__22.22_107;
>   vect__79.28_95 = vect__82.27_97 + vect__84.17_120;
>   vect__79.28_94 = vect__82.27_96 + vect__84.17_119;
> r1:  vect__32.31_78 = MEM[(real(kind=8) *)vectp_a.29_81];
>   vectp_a.29_77 = vectp_a.29_81 + 16;
>   vect__32.32_76 = MEM[(real(kind=8) *)vectp_a.29_77];
>   vect__38.35_39 = MEM[(real(kind=8) *)vectp_a.33_57];
> r2':  vectp_a.33_33 = vectp_a.33_57 + 16;
>   vect__38.36_29 = MEM[(real(kind=8) *)vectp_a.33_33];
>   vect__56.37_23 = vect__38.35_39;
>   vect__56.37_15 = vect__32.32_76;
>   vect__42.40_161 = MEM[(real(kind=8) *)vectp_a.38_163];
>   vectp_a.38_141 = vectp_a.38_163 + 16;
> r1':  vect__42.41_138 = MEM[(real(kind=8) *)vectp_a.38_141];
>   vect__54.42_135 = vect__42.40_161 + vect__56.37_23;
> r1'+r2':  predreastmp.50_17 = vect__42.41_138 + vect__38.36_29;
>   predreastmp.51_18 = vect__56.37_15;
>   vect__54.42_134 = predreastmp.50_17;
> r1+r2:  predreastmp.48_10 = vect__32.31_78 + vect__28.25_100;
> ...
> 
> Problematic construct is that while having load-only chains r1->r1' and r2->r2', the combination
> is actually r1'+r2'->r1+r2, which cause the troubles. I believe the proper fix is to reject such
> combinations where combined root stmt does not dominate usages. It's probably corner case as it does
> not reuse any values among loop iterations (which is main motivation of the pass), it's doing PRE
> if I'm right.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

I could bootstrap on aarch64-none-linux-gnu without any issues, regression
tests are fine and the testcase compiles without ICE.

Thanks for fixing this.

VP.




> 
> Ready to be installed?
> Martin
> 

> From 41b153cf975374fff48419ec8ac5991ac134735f Mon Sep 17 00:00:00 2001
> From: marxin <mliska@suse.cz>
> Date: Tue, 17 Jan 2017 14:22:40 +0100
> Subject: [PATCH] Be careful about combined chain with length == 0 (PR
>  tree-optimization/70754).
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-01-17  Martin Liska  <mliska@suse.cz>
> 
> 	PR tree-optimization/70754
> 	* gfortran.dg/pr70754.f90: New test.
> 
> gcc/ChangeLog:
> 
> 2017-01-17  Martin Liska  <mliska@suse.cz>
> 
> 	PR tree-optimization/70754
> 	* tree-predcom.c (combine_chains): Do not create a combined chain
> 	with length equal to zero when root_stmt does not dominate
> 	stmts of references.
> ---
>  gcc/testsuite/gfortran.dg/pr70754.f90 | 35 +++++++++++++++++++++++++++++++++++
>  gcc/tree-predcom.c                    | 10 ++++++++++
>  2 files changed, 45 insertions(+)
>  create mode 100644 gcc/testsuite/gfortran.dg/pr70754.f90
> 
> diff --git a/gcc/testsuite/gfortran.dg/pr70754.f90 b/gcc/testsuite/gfortran.dg/pr70754.f90
> new file mode 100644
> index 00000000000..758901ce2b2
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/pr70754.f90
> @@ -0,0 +1,35 @@
> +! { dg-options "-Ofast" }
> +
> +module m
> +  implicit none
> +  private
> +  save
> +
> +  integer, parameter, public :: &
> +    ii4          = selected_int_kind(6), &
> +    rr8          = selected_real_kind(13)
> +
> +  integer (ii4), dimension(40,40,199), public :: xyz
> +  public :: foo
> +contains
> +  subroutine foo(a)
> +    real (rr8), dimension(40,40), intent(out) :: a
> +    real (rr8), dimension(40,40) :: b
> +    integer (ii4), dimension(40,40) :: c
> +    integer  i, j
> +
> +    do i=1,8
> +      b(i,j) = 123 * a(i,j) + a(i,j+1) &
> +             + a(i,j) + a(i+1,j+1) &
> +             + a(i+1,j) + a(i-1,j+1) &
> +             + a(i-1,j)
> +      c(i,j) = 123
> +    end do
> +
> +    where ((xyz(:,:,2) /= 0) .and. (c /= 0))
> +      a = b/real(c)
> +    elsewhere
> +      a = 456
> +    endwhere
> + end subroutine foo
> +end module m
> diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
> index f0b70a61fb8..768f976cb87 100644
> --- a/gcc/tree-predcom.c
> +++ b/gcc/tree-predcom.c
> @@ -2323,6 +2323,16 @@ combine_chains (chain_p ch1, chain_p ch2)
>    root_stmt = get_chain_root (new_chain)->stmt;
>    for (i = 1; new_chain->refs.iterate (i, &nw); i++)
>      {
> +      /* PR tree-optimization/70754
> +	 For a combined chain with length equal to zero, we have to guarantee
> +	 that ROOT_STMT dominates all references.  */
> +      if (new_chain->length == 0
> +	  && !stmt_dominates_stmt_p (root_stmt, nw->stmt))
> +	{
> +	  release_chain (new_chain);
> +	  return NULL;
> +	}
> +
>        if (nw->distance == new_chain->length
>  	  && !stmt_dominates_stmt_p (nw->stmt, root_stmt))
>  	{
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Be careful about combined chain with length == 0 (PR, tree-optimization/70754).
  2017-01-18 14:59 ` Richard Biener
@ 2017-01-18 15:34   ` Bin.Cheng
  2017-01-19  9:44     ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Bin.Cheng @ 2017-01-18 15:34 UTC (permalink / raw)
  To: Richard Biener; +Cc: Martin Liška, GCC Patches, vp

On Wed, Jan 18, 2017 at 2:54 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Wed, Jan 18, 2017 at 11:10 AM, Martin Liška <mliska@suse.cz> wrote:
>> Hello.
>>
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
>
> I'm not sure.  If we have such zero distance refs in the IL at the
> time pcom runs then not handling
> them will pessimize code-gen for cases where they are part of a larger
> chain.  Esp. I don't like
Do you mean different chains distributed because of MAX_DISTANCE by
"larger chain"?  With the patch, such chain of refs would still be
pred-commoned, just the arithmetic operation not combined, which could
be handled by later DOM?
> another stmt_dominates_stmt_p call and thus rather not handle length
> == 0 at all...
Not handle length == 0 chains at all may be sub-optimal.  As you said,
such chain of refs at the point may simply because previous dom/cse
fail to analyze the references.
>
> We already seem to go great length in associating stuff when combining
> stuff thus isn't this
> maybe an artifact of this association?  Maybe we simply need to sort
> the new chain after
> combining it so the root stmt comes last?
>
> Note that there seems to be only a single length per chain but not all
> refs in a chain need to
> have the same distance.  This means your fix is likely incomplete?
> What prevents the situation
> to arise for distance != 0?
Yes, it's possible for two refs have the same distance in a chain with
length > 0.  But that should not be a problem, because existing uses
are replaced by newly generated PHI variables which always dominate
the uses, right?

Thanks,
bin
>
> Richard.
>
>> Martin
>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Be careful about combined chain with length == 0 (PR, tree-optimization/70754).
  2017-01-18 15:34   ` Bin.Cheng
@ 2017-01-19  9:44     ` Richard Biener
  2017-01-19 10:25       ` Bin.Cheng
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2017-01-19  9:44 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Martin Liška, GCC Patches, vp

On Wed, Jan 18, 2017 at 4:32 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Wed, Jan 18, 2017 at 2:54 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Wed, Jan 18, 2017 at 11:10 AM, Martin Liška <mliska@suse.cz> wrote:
>>> Hello.
>>>
>>>
>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>
>>> Ready to be installed?
>>
>> I'm not sure.  If we have such zero distance refs in the IL at the
>> time pcom runs then not handling
>> them will pessimize code-gen for cases where they are part of a larger
>> chain.  Esp. I don't like
> Do you mean different chains distributed because of MAX_DISTANCE by
> "larger chain"?  With the patch, such chain of refs would still be
> pred-commoned, just the arithmetic operation not combined, which could
> be handled by later DOM?
>> another stmt_dominates_stmt_p call and thus rather not handle length
>> == 0 at all...
> Not handle length == 0 chains at all may be sub-optimal.  As you said,
> such chain of refs at the point may simply because previous dom/cse
> fail to analyze the references.
>>
>> We already seem to go great length in associating stuff when combining
>> stuff thus isn't this
>> maybe an artifact of this association?  Maybe we simply need to sort
>> the new chain after
>> combining it so the root stmt comes last?
>>
>> Note that there seems to be only a single length per chain but not all
>> refs in a chain need to
>> have the same distance.  This means your fix is likely incomplete?
>> What prevents the situation
>> to arise for distance != 0?
> Yes, it's possible for two refs have the same distance in a chain with
> length > 0.  But that should not be a problem, because existing uses
> are replaced by newly generated PHI variables which always dominate
> the uses, right?

I must admit I don't know predcom in such detail but then can we handle
distance == 0 by simply inserting a PHI for those as well (a degenerate
one of course)?  Or can for distance == 0 the ref be not loop invariant?

Note that for length == 0 all refs in the chain will have a dependence distance
of zero.  So my first argument probably doesn't hold and we could simply
remove handling of length == 0 chains and rely on CSE?

Richard.

> Thanks,
> bin
>>
>> Richard.
>>
>>> Martin
>>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Be careful about combined chain with length == 0 (PR, tree-optimization/70754).
  2017-01-19  9:44     ` Richard Biener
@ 2017-01-19 10:25       ` Bin.Cheng
  2017-01-19 11:23         ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Bin.Cheng @ 2017-01-19 10:25 UTC (permalink / raw)
  To: Richard Biener; +Cc: Martin Liška, GCC Patches, vp

On Thu, Jan 19, 2017 at 9:42 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Wed, Jan 18, 2017 at 4:32 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Wed, Jan 18, 2017 at 2:54 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Wed, Jan 18, 2017 at 11:10 AM, Martin Liška <mliska@suse.cz> wrote:
>>>> Hello.
>>>>
>>>>
>>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>>
>>>> Ready to be installed?
>>>
>>> I'm not sure.  If we have such zero distance refs in the IL at the
>>> time pcom runs then not handling
>>> them will pessimize code-gen for cases where they are part of a larger
>>> chain.  Esp. I don't like
>> Do you mean different chains distributed because of MAX_DISTANCE by
>> "larger chain"?  With the patch, such chain of refs would still be
>> pred-commoned, just the arithmetic operation not combined, which could
>> be handled by later DOM?
>>> another stmt_dominates_stmt_p call and thus rather not handle length
>>> == 0 at all...
>> Not handle length == 0 chains at all may be sub-optimal.  As you said,
>> such chain of refs at the point may simply because previous dom/cse
>> fail to analyze the references.
>>>
>>> We already seem to go great length in associating stuff when combining
>>> stuff thus isn't this
>>> maybe an artifact of this association?  Maybe we simply need to sort
>>> the new chain after
>>> combining it so the root stmt comes last?
>>>
>>> Note that there seems to be only a single length per chain but not all
>>> refs in a chain need to
>>> have the same distance.  This means your fix is likely incomplete?
>>> What prevents the situation
>>> to arise for distance != 0?
>> Yes, it's possible for two refs have the same distance in a chain with
>> length > 0.  But that should not be a problem, because existing uses
>> are replaced by newly generated PHI variables which always dominate
>> the uses, right?
>
> I must admit I don't know predcom in such detail but then can we handle
> distance == 0 by simply inserting a PHI for those as well (a degenerate
> one of course)?  Or can for distance == 0 the ref be not loop invariant?
Not sure if I understand the question correctly.  Distance is
difference of niter between one ref and the root ref of the chain, so
0 distance/length doesn't mean a loop invariant, it's very likely two
(exactly the same) references in each loop iteration, the address of
reference is still a SCEV.  OTOH, invariant chain has invariant
address, and is handled separately.  For the first question, it's
length, rather than distance that decides how the chain is handled.
For length > 0 chain, we have to insert PHIs to pass carried result of
memory reference, even some refs may have 0 distance to the root ref.
>
> Note that for length == 0 all refs in the chain will have a dependence distance
> of zero.  So my first argument probably doesn't hold and we could simply
> remove handling of length == 0 chains and rely on CSE?
I am not sure, that CSE opportunity of references exists at this point
means previous cse pass failed for some reason.  Predcom could be the
only pass that can handle such case as it understands data reference
better.  Note Martin's patch is not to skip handling of length == 0
chain, later ref will still be CSEed with result of root ref, only the
combination operation like chain1 + chain2 is skipped.  In this case,
following dom should be able to handle such (loop independent) cse
opportunities.

Thanks,
bin
>
> Richard.
>
>> Thanks,
>> bin
>>>
>>> Richard.
>>>
>>>> Martin
>>>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Be careful about combined chain with length == 0 (PR, tree-optimization/70754).
  2017-01-19 10:25       ` Bin.Cheng
@ 2017-01-19 11:23         ` Richard Biener
  2017-01-19 12:08           ` Bin.Cheng
  0 siblings, 1 reply; 11+ messages in thread
From: Richard Biener @ 2017-01-19 11:23 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Martin Liška, GCC Patches, vp

On Thu, Jan 19, 2017 at 11:25 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Thu, Jan 19, 2017 at 9:42 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Wed, Jan 18, 2017 at 4:32 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Wed, Jan 18, 2017 at 2:54 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Wed, Jan 18, 2017 at 11:10 AM, Martin Liška <mliska@suse.cz> wrote:
>>>>> Hello.
>>>>>
>>>>>
>>>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>>>
>>>>> Ready to be installed?
>>>>
>>>> I'm not sure.  If we have such zero distance refs in the IL at the
>>>> time pcom runs then not handling
>>>> them will pessimize code-gen for cases where they are part of a larger
>>>> chain.  Esp. I don't like
>>> Do you mean different chains distributed because of MAX_DISTANCE by
>>> "larger chain"?  With the patch, such chain of refs would still be
>>> pred-commoned, just the arithmetic operation not combined, which could
>>> be handled by later DOM?
>>>> another stmt_dominates_stmt_p call and thus rather not handle length
>>>> == 0 at all...
>>> Not handle length == 0 chains at all may be sub-optimal.  As you said,
>>> such chain of refs at the point may simply because previous dom/cse
>>> fail to analyze the references.
>>>>
>>>> We already seem to go great length in associating stuff when combining
>>>> stuff thus isn't this
>>>> maybe an artifact of this association?  Maybe we simply need to sort
>>>> the new chain after
>>>> combining it so the root stmt comes last?
>>>>
>>>> Note that there seems to be only a single length per chain but not all
>>>> refs in a chain need to
>>>> have the same distance.  This means your fix is likely incomplete?
>>>> What prevents the situation
>>>> to arise for distance != 0?
>>> Yes, it's possible for two refs have the same distance in a chain with
>>> length > 0.  But that should not be a problem, because existing uses
>>> are replaced by newly generated PHI variables which always dominate
>>> the uses, right?
>>
>> I must admit I don't know predcom in such detail but then can we handle
>> distance == 0 by simply inserting a PHI for those as well (a degenerate
>> one of course)?  Or can for distance == 0 the ref be not loop invariant?
> Not sure if I understand the question correctly.  Distance is
> difference of niter between one ref and the root ref of the chain, so
> 0 distance/length doesn't mean a loop invariant, it's very likely two
> (exactly the same) references in each loop iteration, the address of
> reference is still a SCEV.  OTOH, invariant chain has invariant
> address, and is handled separately.  For the first question, it's
> length, rather than distance that decides how the chain is handled.
> For length > 0 chain, we have to insert PHIs to pass carried result of
> memory reference, even some refs may have 0 distance to the root ref.
>>
>> Note that for length == 0 all refs in the chain will have a dependence distance
>> of zero.  So my first argument probably doesn't hold and we could simply
>> remove handling of length == 0 chains and rely on CSE?
> I am not sure, that CSE opportunity of references exists at this point
> means previous cse pass failed for some reason.

Or a later pass introduced it (in this case, the vectorizer).

> Predcom could be the
> only pass that can handle such case as it understands data reference
> better.  Note Martin's patch is not to skip handling of length == 0
> chain, later ref will still be CSEed with result of root ref, only the
> combination operation like chain1 + chain2 is skipped.  In this case,
> following dom should be able to handle such (loop independent) cse
> opportunities.

I must admit I don't completely understand the consequences of this
disabling but of course DOM should also be able to handle the CSE
(ok, DOM is known to be quite weak with memory equivalence but
it's not that data-dependence is really better in all cases).

Can't we simply re-order refs in new_chain appropriately or handle
this case in combinable_refs_p instead?

That is, I understand the patch as a hack as it should be always
possible to find dominating refs?

In fact the point of the assert tells us a simpler fix may be

Index: tree-predcom.c
===================================================================
--- tree-predcom.c      (revision 244519)
+++ tree-predcom.c      (working copy)
@@ -2330,6 +2334,12 @@ combine_chains (chain_p ch1, chain_p ch2
          break;
        }
     }
+  if (new_chain->length == 0
+      && ! new_chain->has_max_use_after)
+    {
+      release_chain (new_chain);
+      return NULL;
+    }

   ch1->combined = true;
   ch2->combined = true;

which obviously matches the assert we run into for the testcase?  I'd
be ok with that
(no stmt_dominates_stmt_p, heh) with a suitable comment before it.

Richard.


>
> Thanks,
> bin
>>
>> Richard.
>>
>>> Thanks,
>>> bin
>>>>
>>>> Richard.
>>>>
>>>>> Martin
>>>>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Be careful about combined chain with length == 0 (PR, tree-optimization/70754).
  2017-01-19 11:23         ` Richard Biener
@ 2017-01-19 12:08           ` Bin.Cheng
  2017-01-20 14:31             ` Bin.Cheng
  0 siblings, 1 reply; 11+ messages in thread
From: Bin.Cheng @ 2017-01-19 12:08 UTC (permalink / raw)
  To: Richard Biener; +Cc: Martin Liška, GCC Patches, vp

On Thu, Jan 19, 2017 at 11:22 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, Jan 19, 2017 at 11:25 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Thu, Jan 19, 2017 at 9:42 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Wed, Jan 18, 2017 at 4:32 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>> On Wed, Jan 18, 2017 at 2:54 PM, Richard Biener
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Wed, Jan 18, 2017 at 11:10 AM, Martin Liška <mliska@suse.cz> wrote:
>>>>>> Hello.
>>>>>>
>>>>>>
>>>>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>>>>
>>>>>> Ready to be installed?
>>>>>
>>>>> I'm not sure.  If we have such zero distance refs in the IL at the
>>>>> time pcom runs then not handling
>>>>> them will pessimize code-gen for cases where they are part of a larger
>>>>> chain.  Esp. I don't like
>>>> Do you mean different chains distributed because of MAX_DISTANCE by
>>>> "larger chain"?  With the patch, such chain of refs would still be
>>>> pred-commoned, just the arithmetic operation not combined, which could
>>>> be handled by later DOM?
>>>>> another stmt_dominates_stmt_p call and thus rather not handle length
>>>>> == 0 at all...
>>>> Not handle length == 0 chains at all may be sub-optimal.  As you said,
>>>> such chain of refs at the point may simply because previous dom/cse
>>>> fail to analyze the references.
>>>>>
>>>>> We already seem to go great length in associating stuff when combining
>>>>> stuff thus isn't this
>>>>> maybe an artifact of this association?  Maybe we simply need to sort
>>>>> the new chain after
>>>>> combining it so the root stmt comes last?
>>>>>
>>>>> Note that there seems to be only a single length per chain but not all
>>>>> refs in a chain need to
>>>>> have the same distance.  This means your fix is likely incomplete?
>>>>> What prevents the situation
>>>>> to arise for distance != 0?
>>>> Yes, it's possible for two refs have the same distance in a chain with
>>>> length > 0.  But that should not be a problem, because existing uses
>>>> are replaced by newly generated PHI variables which always dominate
>>>> the uses, right?
>>>
>>> I must admit I don't know predcom in such detail but then can we handle
>>> distance == 0 by simply inserting a PHI for those as well (a degenerate
>>> one of course)?  Or can for distance == 0 the ref be not loop invariant?
>> Not sure if I understand the question correctly.  Distance is
>> difference of niter between one ref and the root ref of the chain, so
>> 0 distance/length doesn't mean a loop invariant, it's very likely two
>> (exactly the same) references in each loop iteration, the address of
>> reference is still a SCEV.  OTOH, invariant chain has invariant
>> address, and is handled separately.  For the first question, it's
>> length, rather than distance that decides how the chain is handled.
>> For length > 0 chain, we have to insert PHIs to pass carried result of
>> memory reference, even some refs may have 0 distance to the root ref.
>>>
>>> Note that for length == 0 all refs in the chain will have a dependence distance
>>> of zero.  So my first argument probably doesn't hold and we could simply
>>> remove handling of length == 0 chains and rely on CSE?
>> I am not sure, that CSE opportunity of references exists at this point
>> means previous cse pass failed for some reason.
>
> Or a later pass introduced it (in this case, the vectorizer).
>
>> Predcom could be the
>> only pass that can handle such case as it understands data reference
>> better.  Note Martin's patch is not to skip handling of length == 0
>> chain, later ref will still be CSEed with result of root ref, only the
>> combination operation like chain1 + chain2 is skipped.  In this case,
>> following dom should be able to handle such (loop independent) cse
>> opportunities.
>
> I must admit I don't completely understand the consequences of this
> disabling but of course DOM should also be able to handle the CSE
> (ok, DOM is known to be quite weak with memory equivalence but
> it's not that data-dependence is really better in all cases).
>
> Can't we simply re-order refs in new_chain appropriately or handle
> this case in combinable_refs_p instead?
It's not refs need to be reordered, root ref always dominates others.
But yes, we need to find a dominator insertion place for combined
operation.  Looking at function reassociate_to_the_same_stmt, it
simply inserts new_stmt at root_stmt of root ref, which causes ICE in
this case.  The new_stmt needs to be inserted at a place also
dominating combination of later refs.  We can either compute the
information in place, or compute and pass the information from
previous combinable_refs_p.  This should be the real fix.

Thanks,
bin
>
> That is, I understand the patch as a hack as it should be always
> possible to find dominating refs?
>
> In fact the point of the assert tells us a simpler fix may be
>
> Index: tree-predcom.c
> ===================================================================
> --- tree-predcom.c      (revision 244519)
> +++ tree-predcom.c      (working copy)
> @@ -2330,6 +2334,12 @@ combine_chains (chain_p ch1, chain_p ch2
>           break;
>         }
>      }
> +  if (new_chain->length == 0
> +      && ! new_chain->has_max_use_after)
> +    {
> +      release_chain (new_chain);
> +      return NULL;
> +    }
>
>    ch1->combined = true;
>    ch2->combined = true;
>
> which obviously matches the assert we run into for the testcase?  I'd
> be ok with that
> (no stmt_dominates_stmt_p, heh) with a suitable comment before it.
>
> Richard.
>
>
>>
>> Thanks,
>> bin
>>>
>>> Richard.
>>>
>>>> Thanks,
>>>> bin
>>>>>
>>>>> Richard.
>>>>>
>>>>>> Martin
>>>>>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Be careful about combined chain with length == 0 (PR, tree-optimization/70754).
  2017-01-19 12:08           ` Bin.Cheng
@ 2017-01-20 14:31             ` Bin.Cheng
  2017-01-23  9:45               ` Richard Biener
  0 siblings, 1 reply; 11+ messages in thread
From: Bin.Cheng @ 2017-01-20 14:31 UTC (permalink / raw)
  To: Richard Biener; +Cc: Martin Liška, GCC Patches, vp

[-- Attachment #1: Type: text/plain, Size: 3540 bytes --]

On Thu, Jan 19, 2017 at 12:07 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Thu, Jan 19, 2017 at 11:22 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Thu, Jan 19, 2017 at 11:25 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>> On Thu, Jan 19, 2017 at 9:42 AM, Richard Biener
>>
>> Or a later pass introduced it (in this case, the vectorizer).
>>
>>> Predcom could be the
>>> only pass that can handle such case as it understands data reference
>>> better.  Note Martin's patch is not to skip handling of length == 0
>>> chain, later ref will still be CSEed with result of root ref, only the
>>> combination operation like chain1 + chain2 is skipped.  In this case,
>>> following dom should be able to handle such (loop independent) cse
>>> opportunities.
>>
>> I must admit I don't completely understand the consequences of this
>> disabling but of course DOM should also be able to handle the CSE
>> (ok, DOM is known to be quite weak with memory equivalence but
>> it's not that data-dependence is really better in all cases).
>>
>> Can't we simply re-order refs in new_chain appropriately or handle
>> this case in combinable_refs_p instead?
> It's not refs need to be reordered, root ref always dominates others.
> But yes, we need to find a dominator insertion place for combined
> operation.  Looking at function reassociate_to_the_same_stmt, it
> simply inserts new_stmt at root_stmt of root ref, which causes ICE in
> this case.  The new_stmt needs to be inserted at a place also
> dominating combination of later refs.  We can either compute the
> information in place, or compute and pass the information from
> previous combinable_refs_p.  This should be the real fix.
Hi All,
As analyzed, root cause is Predcom inserts combined stmt at place only
wrto the root refs.  This is not enough because after combination,
result is also used by the following combined refs, not only root
refs.  This patch fixes the ICE by processing all refs reversely and
computing dominance point for insertion.  When it comes to the root
refs, dominance point is ready for use.
Bootstrap and test on x86_64 and AArch64 ongoing, is it OK if no failures?

Thanks,
bin
2017-01-19  Bin Cheng  <bin.cheng@arm.com>

    PR tree-optimization/70754
    * tree-predcom.c (stmt_combining_refs): New parameter INSERT_BEFORE.
    (reassociate_to_the_same_stmt): New parameter INSERT_BEFORE.  Insert
    combined stmt before it if not NULL.
    (combine_chains): Process refs reversely and compute dominance point
    for root ref.

gcc/testsuite/ChangeLog
2017-01-19  Bin Cheng  <bin.cheng@arm.com>

    PR tree-optimization/70754
    * gfortran.dg/pr70754.f90: New test.

>> That is, I understand the patch as a hack as it should be always
>> possible to find dominating refs?
>>
>> In fact the point of the assert tells us a simpler fix may be
>>
>> Index: tree-predcom.c
>> ===================================================================
>> --- tree-predcom.c      (revision 244519)
>> +++ tree-predcom.c      (working copy)
>> @@ -2330,6 +2334,12 @@ combine_chains (chain_p ch1, chain_p ch2
>>           break;
>>         }
>>      }
>> +  if (new_chain->length == 0
>> +      && ! new_chain->has_max_use_after)
>> +    {
>> +      release_chain (new_chain);
>> +      return NULL;
>> +    }
>>
>>    ch1->combined = true;
>>    ch2->combined = true;
>>
>> which obviously matches the assert we run into for the testcase?  I'd
>> be ok with that
>> (no stmt_dominates_stmt_p, heh) with a suitable comment before it.
>>
>> Richard.
>>

[-- Attachment #2: pr70754-20170118.txt --]
[-- Type: text/plain, Size: 5361 bytes --]

diff --git a/gcc/testsuite/gfortran.dg/pr70754.f90 b/gcc/testsuite/gfortran.dg/pr70754.f90
new file mode 100644
index 0000000..d7e790c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr70754.f90
@@ -0,0 +1,35 @@
+! { dg-do compile }
+! { dg-options "-Ofast" }
+module m
+  implicit none
+  private
+  save
+
+  integer, parameter, public :: &
+    ii4          = selected_int_kind(6), &
+    rr8          = selected_real_kind(13)
+
+  integer (ii4), dimension(40,40,199), public :: xyz
+  public :: foo
+contains
+  subroutine foo(a)
+    real (rr8), dimension(40,40), intent(out) :: a
+    real (rr8), dimension(40,40) :: b
+    integer (ii4), dimension(40,40) :: c
+    integer  i, j
+
+    do i=1,20
+      b(i,j) = 123 * a(i,j) + 34 * a(i,j+1) &
+             + 34 * a(i,j-1) + a(i+1,j+1) &
+             + a(i+1,j-1) + a(i-1,j+1) &
+             + a(i-1,j-1)
+      c(i,j) = 123
+    end do
+
+    where ((xyz(:,:,2) /= 0) .and. (c /= 0))
+      a = b/real(c)
+    elsewhere
+      a = 456
+    endwhere
+ end subroutine foo
+end module m
diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index f0b70a6..9723e9c 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -2164,10 +2164,11 @@ remove_name_from_operation (gimple *stmt, tree op)
 }
 
 /* Reassociates the expression in that NAME1 and NAME2 are used so that they
-   are combined in a single statement, and returns this statement.  */
+   are combined in a single statement, and returns this statement.  Note the
+   statement is inserted before INSERT_BEFORE if it's not NULL.  */
 
 static gimple *
-reassociate_to_the_same_stmt (tree name1, tree name2)
+reassociate_to_the_same_stmt (tree name1, tree name2, gimple *insert_before)
 {
   gimple *stmt1, *stmt2, *root1, *root2, *s1, *s2;
   gassign *new_stmt, *tmp_stmt;
@@ -2224,6 +2225,12 @@ reassociate_to_the_same_stmt (tree name1, tree name2)
   var = create_tmp_reg (type, "predreastmp");
   new_name = make_ssa_name (var);
   new_stmt = gimple_build_assign (new_name, code, name1, name2);
+  if (insert_before && stmt_dominates_stmt_p (insert_before, s1))
+    bsi = gsi_for_stmt (insert_before);
+  else
+    bsi = gsi_for_stmt (s1);
+
+  gsi_insert_before (&bsi, new_stmt, GSI_SAME_STMT);
 
   var = create_tmp_reg (type, "predreastmp");
   tmp_name = make_ssa_name (var);
@@ -2240,7 +2247,6 @@ reassociate_to_the_same_stmt (tree name1, tree name2)
   s1 = gsi_stmt (bsi);
   update_stmt (s1);
 
-  gsi_insert_before (&bsi, new_stmt, GSI_SAME_STMT);
   gsi_insert_before (&bsi, tmp_stmt, GSI_SAME_STMT);
 
   return new_stmt;
@@ -2249,10 +2255,11 @@ reassociate_to_the_same_stmt (tree name1, tree name2)
 /* Returns the statement that combines references R1 and R2.  In case R1
    and R2 are not used in the same statement, but they are used with an
    associative and commutative operation in the same expression, reassociate
-   the expression so that they are used in the same statement.  */
+   the expression so that they are used in the same statement.  The combined
+   statement is inserted before INSERT_BEFORE if it's not NULL.  */
 
 static gimple *
-stmt_combining_refs (dref r1, dref r2)
+stmt_combining_refs (dref r1, dref r2, gimple *insert_before)
 {
   gimple *stmt1, *stmt2;
   tree name1 = name_for_ref (r1);
@@ -2263,7 +2270,7 @@ stmt_combining_refs (dref r1, dref r2)
   if (stmt1 == stmt2)
     return stmt1;
 
-  return reassociate_to_the_same_stmt (name1, name2);
+  return reassociate_to_the_same_stmt (name1, name2, insert_before);
 }
 
 /* Tries to combine chains CH1 and CH2 together.  If this succeeds, the
@@ -2309,14 +2316,42 @@ combine_chains (chain_p ch1, chain_p ch2)
   new_chain->rslt_type = rslt_type;
   new_chain->length = ch1->length;
 
-  for (i = 0; (ch1->refs.iterate (i, &r1)
-	       && ch2->refs.iterate (i, &r2)); i++)
+  gimple *insert = NULL;
+  auto_vec<dref> tmp_refs;
+  gcc_assert (ch1->refs.length () == ch2->refs.length ());
+  /* Process in reverse order so dominance point is ready when it comes
+     to the root ref.  */
+  for (i = ch1->refs.length (); i > 0; i--)
     {
+      r1 = ch1->refs[i - 1];
+      r2 = ch2->refs[i - 1];
       nw = XCNEW (struct dref_d);
-      nw->stmt = stmt_combining_refs (r1, r2);
       nw->distance = r1->distance;
+      nw->stmt = stmt_combining_refs (r1, r2, i == 1 ? insert : NULL);
+
+      /* Record dominance point where root combined stmt should be inserted
+	 for chains with 0 length.  Though all root refs dominate following
+	 refs, it's possible the combined stmt doesn't.  See PR70754.  */
+      if (ch1->length == 0
+	  && (insert == NULL || stmt_dominates_stmt_p (nw->stmt, insert)))
+	insert = nw->stmt;
+
+      tmp_refs.safe_push (nw);
+    }
+
+  /* Restore the order for new chain's refs.  */
+  for (i = tmp_refs.length (); i > 0; i--)
+    new_chain->refs.safe_push (tmp_refs[i - 1]);
+
+  ch1->combined = true;
+  ch2->combined = true;
 
-      new_chain->refs.safe_push (nw);
+  /* For chains with 0 length, has_max_use_after must be true since root
+     combined stmt must dominates others.  */
+  if (new_chain->length == 0)
+    {
+      new_chain->has_max_use_after = true;
+      return new_chain;
     }
 
   new_chain->has_max_use_after = false;
@@ -2331,8 +2366,6 @@ combine_chains (chain_p ch1, chain_p ch2)
 	}
     }
 
-  ch1->combined = true;
-  ch2->combined = true;
   return new_chain;
 }
 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Be careful about combined chain with length == 0 (PR, tree-optimization/70754).
  2017-01-20 14:31             ` Bin.Cheng
@ 2017-01-23  9:45               ` Richard Biener
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Biener @ 2017-01-23  9:45 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: Martin Liška, GCC Patches, vp

On Fri, Jan 20, 2017 at 3:30 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Thu, Jan 19, 2017 at 12:07 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>> On Thu, Jan 19, 2017 at 11:22 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Thu, Jan 19, 2017 at 11:25 AM, Bin.Cheng <amker.cheng@gmail.com> wrote:
>>>> On Thu, Jan 19, 2017 at 9:42 AM, Richard Biener
>>>
>>> Or a later pass introduced it (in this case, the vectorizer).
>>>
>>>> Predcom could be the
>>>> only pass that can handle such case as it understands data reference
>>>> better.  Note Martin's patch is not to skip handling of length == 0
>>>> chain, later ref will still be CSEed with result of root ref, only the
>>>> combination operation like chain1 + chain2 is skipped.  In this case,
>>>> following dom should be able to handle such (loop independent) cse
>>>> opportunities.
>>>
>>> I must admit I don't completely understand the consequences of this
>>> disabling but of course DOM should also be able to handle the CSE
>>> (ok, DOM is known to be quite weak with memory equivalence but
>>> it's not that data-dependence is really better in all cases).
>>>
>>> Can't we simply re-order refs in new_chain appropriately or handle
>>> this case in combinable_refs_p instead?
>> It's not refs need to be reordered, root ref always dominates others.
>> But yes, we need to find a dominator insertion place for combined
>> operation.  Looking at function reassociate_to_the_same_stmt, it
>> simply inserts new_stmt at root_stmt of root ref, which causes ICE in
>> this case.  The new_stmt needs to be inserted at a place also
>> dominating combination of later refs.  We can either compute the
>> information in place, or compute and pass the information from
>> previous combinable_refs_p.  This should be the real fix.
> Hi All,
> As analyzed, root cause is Predcom inserts combined stmt at place only
> wrto the root refs.  This is not enough because after combination,
> result is also used by the following combined refs, not only root
> refs.  This patch fixes the ICE by processing all refs reversely and
> computing dominance point for insertion.  When it comes to the root
> refs, dominance point is ready for use.
> Bootstrap and test on x86_64 and AArch64 ongoing, is it OK if no failures?

Ok.

RIchard.

> Thanks,
> bin
> 2017-01-19  Bin Cheng  <bin.cheng@arm.com>
>
>     PR tree-optimization/70754
>     * tree-predcom.c (stmt_combining_refs): New parameter INSERT_BEFORE.
>     (reassociate_to_the_same_stmt): New parameter INSERT_BEFORE.  Insert
>     combined stmt before it if not NULL.
>     (combine_chains): Process refs reversely and compute dominance point
>     for root ref.
>
> gcc/testsuite/ChangeLog
> 2017-01-19  Bin Cheng  <bin.cheng@arm.com>
>
>     PR tree-optimization/70754
>     * gfortran.dg/pr70754.f90: New test.
>
>>> That is, I understand the patch as a hack as it should be always
>>> possible to find dominating refs?
>>>
>>> In fact the point of the assert tells us a simpler fix may be
>>>
>>> Index: tree-predcom.c
>>> ===================================================================
>>> --- tree-predcom.c      (revision 244519)
>>> +++ tree-predcom.c      (working copy)
>>> @@ -2330,6 +2334,12 @@ combine_chains (chain_p ch1, chain_p ch2
>>>           break;
>>>         }
>>>      }
>>> +  if (new_chain->length == 0
>>> +      && ! new_chain->has_max_use_after)
>>> +    {
>>> +      release_chain (new_chain);
>>> +      return NULL;
>>> +    }
>>>
>>>    ch1->combined = true;
>>>    ch2->combined = true;
>>>
>>> which obviously matches the assert we run into for the testcase?  I'd
>>> be ok with that
>>> (no stmt_dominates_stmt_p, heh) with a suitable comment before it.
>>>
>>> Richard.
>>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-01-23  9:39 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-18 10:21 [PATCH] Be careful about combined chain with length == 0 (PR, tree-optimization/70754) Martin Liška
2017-01-18 11:08 ` Bin.Cheng
2017-01-18 14:59 ` Richard Biener
2017-01-18 15:34   ` Bin.Cheng
2017-01-19  9:44     ` Richard Biener
2017-01-19 10:25       ` Bin.Cheng
2017-01-19 11:23         ` Richard Biener
2017-01-19 12:08           ` Bin.Cheng
2017-01-20 14:31             ` Bin.Cheng
2017-01-23  9:45               ` Richard Biener
2017-01-18 15:00 ` Vidya Praveen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).