From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <tnfchris@sourceware.org>
Received: by sourceware.org (Postfix, from userid 1984)
 id 869AA3858407; Wed, 10 Nov 2021 16:03:32 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 869AA3858407
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="utf-8"
From: Tamar Christina <tnfchris@gcc.gnu.org>
To: gcc-cvs@gcc.gnu.org
Subject: [gcc r12-5129] middle-end: Add an RPO pass after successful
 vectorization
X-Act-Checkin: gcc
X-Git-Author: Tamar Christina <tamar.christina@arm.com>
X-Git-Refname: refs/heads/master
X-Git-Oldrev: eaec20fde587e0695b100dcba5ff56944c3ae8c0
X-Git-Newrev: 8ed62c929c7c44627f41627e085e15d77b2e6ed4
Message-Id: <20211110160332.869AA3858407@sourceware.org>
Date: Wed, 10 Nov 2021 16:03:32 +0000 (GMT)
X-BeenThere: gcc-cvs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-cvs mailing list <gcc-cvs.gcc.gnu.org>
X-List-Received-Date: Wed, 10 Nov 2021 16:03:32 -0000

https://gcc.gnu.org/g:8ed62c929c7c44627f41627e085e15d77b2e6ed4

commit r12-5129-g8ed62c929c7c44627f41627e085e15d77b2e6ed4
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Wed Nov 10 15:58:15 2021 +0000

    middle-end: Add an RPO pass after successful vectorization
    
    My current SVE predicate optimization series has exposed a problem: the way
    vector masks are generated for masked operations relies on CSE to share
    masks efficiently.
    
    The issue, however, is that masking is done using the & operator.  Because &
    is associative, reassoc decides to reassociate the masked operations.
    
    CSE is then unable to match an unmasked operation with its masked
    counterpart, leading to duplicate operations being performed.
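
A minimal sketch of the effect described above, using plain C integers as hypothetical stand-ins for vector predicates (one bit per lane). The type `pred_t` and both function names are illustrative assumptions, not GCC code. The first function keeps `loop_mask & cond` as a distinct subexpression that other statements could share; the second shows the same value after reassociation, where that subexpression no longer appears and so cannot be matched syntactically.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a vector predicate: one bit per lane.  */
typedef uint16_t pred_t;

/* Shape assumed for the code as emitted: the combined mask
   (loop_mask & cond) is its own subexpression, so another statement
   using the same combined mask could reuse `shared`.  */
static pred_t
masked_as_emitted (pred_t cmp, pred_t loop_mask, pred_t cond)
{
  pred_t shared = loop_mask & cond;
  return cmp & shared;
}

/* Shape assumed after reassociation: (cmp & loop_mask) & cond.
   The result is identical, but (loop_mask & cond) is never formed,
   so there is nothing for a later statement to share.  */
static pred_t
masked_after_reassoc (pred_t cmp, pred_t loop_mask, pred_t cond)
{
  pred_t tmp = cmp & loop_mask;
  return tmp & cond;
}
```

Both functions return the same value for every input because & is associative; only the intermediate expressions differ, and those intermediates are what CSE keys on.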
    
    To counter this, add an RPO VN pass over the vectorized loop body when
    vectorization succeeds.  Mask sharing then no longer relies on the RTL-level
    CSE.
    
    I have not added a testcase for this as it requires the changes in my patch
    series.  However, the entire series relies on this patch to work, so all the
    tests there cover it.
    
    gcc/ChangeLog:
    
            * tree-vectorizer.c (vectorize_loops): Do local CSE through RPO VN upon
            successful vectorization.

Diff:
---
 gcc/tree-vectorizer.c | 53 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 21 deletions(-)

diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 71f12b3257e..3247c9af23b 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -81,7 +81,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-pretty-print.h"
 #include "opt-problem.h"
 #include "internal-fn.h"
-
+#include "tree-ssa-sccvn.h"
 
 /* Loop or bb location, with hotness information.  */
 dump_user_location_t vect_location;
@@ -1276,23 +1276,6 @@ vectorize_loops (void)
 	  }
       }
 
-  for (i = 1; i < number_of_loops (cfun); i++)
-    {
-      loop_vec_info loop_vinfo;
-      bool has_mask_store;
-
-      loop = get_loop (cfun, i);
-      if (!loop || !loop->aux)
-	continue;
-      loop_vinfo = (loop_vec_info) loop->aux;
-      has_mask_store = LOOP_VINFO_HAS_MASK_STORE (loop_vinfo);
-      delete loop_vinfo;
-      if (has_mask_store
-	  && targetm.vectorize.empty_mask_is_expensive (IFN_MASK_STORE))
-	optimize_mask_stores (loop);
-      loop->aux = NULL;
-    }
-
   /* Fold IFN_GOMP_SIMD_{VF,LANE,LAST_LANE,ORDERED_{START,END}} builtins.  */
   if (cfun->has_simduid_loops)
     {
@@ -1300,14 +1283,12 @@ vectorize_loops (void)
       /* Avoid stale SCEV cache entries for the SIMD_LANE defs.  */
       scev_reset ();
     }
-
   /* Shrink any "omp array simd" temporary arrays to the
      actual vectorization factors.  */
   if (simd_array_to_simduid_htab)
     shrink_simd_arrays (simd_array_to_simduid_htab, simduid_to_vf_htab);
   delete simduid_to_vf_htab;
   cfun->has_simduid_loops = false;
-  vect_slp_fini ();
 
   if (num_vectorized_loops > 0)
     {
@@ -1315,9 +1296,39 @@ vectorize_loops (void)
 	 ???  Also while we try hard to update loop-closed SSA form we fail
 	 to properly do this in some corner-cases (see PR56286).  */
       rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa_only_virtuals);
-      return TODO_cleanup_cfg;
+      ret |= TODO_cleanup_cfg;
     }
 
+  for (i = 1; i < number_of_loops (cfun); i++)
+    {
+      loop_vec_info loop_vinfo;
+      bool has_mask_store;
+
+      loop = get_loop (cfun, i);
+      if (!loop || !loop->aux)
+	continue;
+      loop_vinfo = (loop_vec_info) loop->aux;
+      has_mask_store = LOOP_VINFO_HAS_MASK_STORE (loop_vinfo);
+      delete loop_vinfo;
+      if (has_mask_store
+	  && targetm.vectorize.empty_mask_is_expensive (IFN_MASK_STORE))
+	optimize_mask_stores (loop);
+
+      auto_bitmap exit_bbs;
+      /* Perform local CSE, this esp. helps because we emit code for
+	 predicates that need to be shared for optimal predicate usage.
+	 However reassoc will re-order them and prevent CSE from working
+	 as it should.  CSE only the loop body, not the entry.  */
+      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
+
+      edge entry = EDGE_PRED (loop_preheader_edge (loop)->src, 0);
+      do_rpo_vn (cfun, entry, exit_bbs);
+
+      loop->aux = NULL;
+    }
+
+  vect_slp_fini ();
+
   return ret;
 }