Date: Tue, 2 Nov 2021 15:23:44 +0100 (CET)
From: Richard Biener
To: Tamar Christina
Cc: gcc-patches@gcc.gnu.org, nd@arm.com
Subject: Re: [PATCH]middle-end Add an RPO pass after successful vectorization
Message-ID: <9nnp8so9-p3nq-r26-3098-s96334191030@fhfr.qr>

On Tue, 2 Nov 2021, Tamar Christina wrote:

> Hi All,
>
> Following my current SVE predicate optimization series, a problem has
> presented itself: the way vector masks are generated for masked
> operations relies on CSE to share masks efficiently.
>
> The issue, however, is that masking is done using the & operator, and
> since & is associative, reassoc decides to reassociate the masked
> operations.

But it does this for the purpose of canonicalization and thus CSE.

> CSE is then unable to CSE an unmasked and a masked operation, leading
> to duplicate operations being performed.
>
> To counter this we want to add an RPO pass over the vectorized loop
> body when vectorization succeeds, so that we no longer rely on
> RTL-level CSE.
>
> I have not added a testcase for this as it requires the changes in my
> patch series; however, the entire series relies on this patch to work,
> so all the tests there cover it.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu and
> x86_64-linux-gnu with no issues.
>
> Ok for master?

You are running VN over _all_ loop bodies rather than only those
vectorized.  We loop over vectorized loops earlier for optimizing masked
store sequences; I suppose you could hook in there.  I'll also note that
we have pass_pre_slp_scalar_cleanup, which eventually runs, plus we have
a late FRE.  So I don't understand why it doesn't work to CSE later.

  for (i = 1; i < number_of_loops (cfun); i++)
    {
      loop_vec_info loop_vinfo;
      bool has_mask_store;

      loop = get_loop (cfun, i);
      if (!loop || !loop->aux)
	continue;
      loop_vinfo = (loop_vec_info) loop->aux;
      has_mask_store = LOOP_VINFO_HAS_MASK_STORE (loop_vinfo);
      delete loop_vinfo;
      if (has_mask_store
	  && targetm.vectorize.empty_mask_is_expensive (IFN_MASK_STORE))
	optimize_mask_stores (loop);
      loop->aux = NULL;
    }
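Concretely, hooking in there would amount to something like the below
(an untested sketch; it only moves the exit_bbs setup and the do_rpo_vn
call from your patch into the walk above, using loop->aux to restrict
the VN run to loops that were actually vectorized):

  for (i = 1; i < number_of_loops (cfun); i++)
    {
      loop = get_loop (cfun, i);
      /* Only loops we vectorized still carry their loop_vec_info.  */
      if (!loop || !loop->aux)
	continue;

      /* ... existing loop_vinfo / mask-store handling ... */

      /* CSE only the vectorized loop body, not the entry: stop the
	 walk at the exit block and the latch.  */
      if (single_exit (loop))
	{
	  bitmap exit_bbs = BITMAP_ALLOC (NULL);
	  bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
	  bitmap_set_bit (exit_bbs, loop->latch->index);
	  do_rpo_vn (cfun, loop_preheader_edge (loop), exit_bbs);
	  BITMAP_FREE (exit_bbs);
	}

      loop->aux = NULL;
    }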
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* tree-vectorizer.c (vectorize_loops): Do local CSE through RPO VN
> 	upon successful vectorization.
>
> --- inline copy of patch --
> diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
> index 4712dc6e7f907637774482a71036a0bd381c2bd2..1e370d60fb19b03c3b6bce45c660af4b6d32dc51 100644
> --- a/gcc/tree-vectorizer.c
> +++ b/gcc/tree-vectorizer.c
> @@ -81,7 +81,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple-pretty-print.h"
>  #include "opt-problem.h"
>  #include "internal-fn.h"
> -
> +#include "tree-ssa-sccvn.h"
>
>  /* Loop or bb location, with hotness information.  */
>  dump_user_location_t vect_location;
> @@ -1323,6 +1323,27 @@ vectorize_loops (void)
>       ??? Also while we try hard to update loop-closed SSA form we fail
>       to properly do this in some corner-cases (see PR56286).  */
>    rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa_only_virtuals);
> +
> +  for (i = 1; i < number_of_loops (cfun); i++)
> +    {
> +      loop = get_loop (cfun, i);
> +      if (!loop || !single_exit (loop))
> +	continue;
> +
> +      bitmap exit_bbs;
> +      /* Perform local CSE, this esp. helps because we emit code for
> +	 predicates that need to be shared for optimal predicate usage.
> +	 However reassoc will re-order them and prevent CSE from working
> +	 as it should.  CSE only the loop body, not the entry.  */
> +      exit_bbs = BITMAP_ALLOC (NULL);
> +      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> +      bitmap_set_bit (exit_bbs, loop->latch->index);
> +
> +      do_rpo_vn (cfun, loop_preheader_edge (loop), exit_bbs);
> +
> +      BITMAP_FREE (exit_bbs);
> +    }
> +
>    return TODO_cleanup_cfg;
>  }

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)