Date: Tue, 2 Nov 2021 15:23:44 +0100 (CET)
From: Richard Biener
To: Tamar Christina
Cc: gcc-patches@gcc.gnu.org, nd@arm.com
Subject: Re: [PATCH]middle-end Add an RPO pass after successful vectorization
Message-ID: <9nnp8so9-p3nq-r26-3098-s96334191030@fhfr.qr>

On Tue, 2 Nov 2021, Tamar Christina wrote:

> Hi All,
>
> Following my current SVE predicate optimization series, a problem has
> presented itself: the way vector masks are generated for masked
> operations relies on CSE to share masks efficiently.
>
> The issue, however, is that masking is done using the & operator, and
> since & is associative, reassoc decides to reassociate the masked
> operations.

But it does this for the purpose of canonicalization and thus CSE.

> CSE is then unable to CSE an unmasked and a masked operation, leading
> to duplicate operations being performed.
>
> To counter this we want to add an RPO pass over the vectorized loop
> body when vectorization succeeds, so that we no longer rely on
> RTL-level CSE.
>
> I have not added a testcase for this as it requires the changes in my
> patch series; however, the entire series relies on this patch to work,
> so all the tests there cover it.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu and
> x86_64-linux-gnu with no issues.
>
> Ok for master?

You are running VN over _all_ loop bodies rather than only those
vectorized.  We loop over vectorized loops earlier for optimizing masked
store sequences; I suppose you could hook in there.  I'll also note that
we have pass_pre_slp_scalar_cleanup, which eventually runs, plus we have
a late FRE.  So I don't understand why it doesn't work to CSE later.

  for (i = 1; i < number_of_loops (cfun); i++)
    {
      loop_vec_info loop_vinfo;
      bool has_mask_store;

      loop = get_loop (cfun, i);
      if (!loop || !loop->aux)
	continue;
      loop_vinfo = (loop_vec_info) loop->aux;
      has_mask_store = LOOP_VINFO_HAS_MASK_STORE (loop_vinfo);
      delete loop_vinfo;
      if (has_mask_store
	  && targetm.vectorize.empty_mask_is_expensive (IFN_MASK_STORE))
	optimize_mask_stores (loop);
      loop->aux = NULL;
    }
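Concretely, hooking in there would amount to something like the below
(an untested sketch; it only moves the exit_bbs setup and the do_rpo_vn
call from your patch into the walk above, using loop->aux to restrict
the VN run to loops that were actually vectorized):

  for (i = 1; i < number_of_loops (cfun); i++)
    {
      loop = get_loop (cfun, i);
      /* Only loops we vectorized still carry their loop_vec_info.  */
      if (!loop || !loop->aux)
	continue;

      /* ... existing loop_vinfo / mask-store handling ... */

      /* CSE only the vectorized loop body, not the entry: stop the
	 walk at the exit block and the latch.  */
      if (single_exit (loop))
	{
	  bitmap exit_bbs = BITMAP_ALLOC (NULL);
	  bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
	  bitmap_set_bit (exit_bbs, loop->latch->index);
	  do_rpo_vn (cfun, loop_preheader_edge (loop), exit_bbs);
	  BITMAP_FREE (exit_bbs);
	}

      loop->aux = NULL;
    }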
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* tree-vectorizer.c (vectorize_loops): Do local CSE through RPO VN
> 	upon successful vectorization.
>
> --- inline copy of patch --
> diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
> index 4712dc6e7f907637774482a71036a0bd381c2bd2..1e370d60fb19b03c3b6bce45c660af4b6d32dc51 100644
> --- a/gcc/tree-vectorizer.c
> +++ b/gcc/tree-vectorizer.c
> @@ -81,7 +81,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple-pretty-print.h"
>  #include "opt-problem.h"
>  #include "internal-fn.h"
> -
> +#include "tree-ssa-sccvn.h"
>
>  /* Loop or bb location, with hotness information.  */
>  dump_user_location_t vect_location;
> @@ -1323,6 +1323,27 @@ vectorize_loops (void)
>       ??? Also while we try hard to update loop-closed SSA form we fail
>       to properly do this in some corner-cases (see PR56286).  */
>    rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa_only_virtuals);
> +
> +  for (i = 1; i < number_of_loops (cfun); i++)
> +    {
> +      loop = get_loop (cfun, i);
> +      if (!loop || !single_exit (loop))
> +	continue;
> +
> +      bitmap exit_bbs;
> +      /* Perform local CSE, this esp. helps because we emit code for
> +	 predicates that need to be shared for optimal predicate usage.
> +	 However reassoc will re-order them and prevent CSE from working
> +	 as it should.  CSE only the loop body, not the entry.  */
> +      exit_bbs = BITMAP_ALLOC (NULL);
> +      bitmap_set_bit (exit_bbs, single_exit (loop)->dest->index);
> +      bitmap_set_bit (exit_bbs, loop->latch->index);
> +
> +      do_rpo_vn (cfun, loop_preheader_edge (loop), exit_bbs);
> +
> +      BITMAP_FREE (exit_bbs);
> +    }
> +
>    return TODO_cleanup_cfg;
>  }

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)