Date: Tue, 9 Jan 2024 13:25:33 +0100 (CET)
From: Richard Biener
To: Tamar Christina
Cc: "gcc-patches@gcc.gnu.org", nd, "jlaw@ventanamicro.com"
Subject: RE: [PATCH]middle-end: Fix dominators updates when peeling with multiple exits [PR113144]
Message-ID: <8pn44s40-pqs9-s99r-6qs4-1p9432692p7q@fhfr.qr>
On Tue, 9 Jan 2024, Tamar Christina wrote:

> > This makes it quadratic in the number of vectorized early exit loops
> > in a function.  The vectorizer CFG manipulation operates in a local
> > enough bubble that programmatic updating of dominators should be
> > possible (after all we manage to produce correct SSA form!); the
> > proposed change gets us too far off, to a point where re-computing
> > dominance info is likely cheaper (but no, we shouldn't do this either).
> >
> > Can you instead give manual updating a try again?  I think
> > versioning should produce up-to-date dominator info; it's only
> > when you redirect branches during peeling that you'd need
> > adjustments - but IIRC we're never introducing new merges?
> >
> > IIRC we can't wipe dominators during transform since we query them
> > during code generation.  We possibly could code generate all
> > CFG manipulations of all vectorized loops, recompute all dominators
> > and then do code generation of all vectorized loops.
> >
> > But then we're doing a loop transform and the exits will ultimately
> > end up in the same place, so the CFG and dominator update is bound to
> > where the original exits went to.
>
> Yeah, that's a fair point; the issue is specifically with at_exit.  So how
> about:
>
> When we peel at_exit we are moving the new loop to the exit of the previous
> loop.  This means that the blocks outside the loop that the previous loop
> used to dominate are no longer being dominated by it.

Hmm, indeed.  Note this does make the dominator update O(function-size),
and when vectorizing multiple loops in a function this becomes quadratic.
That's quite unfortunate, so I wonder if we can delay the update for the
parts where we do not need up-to-date dominators during vectorization
(of course it gets fragile with having only partly correct dominators).

> The new dominators however are hard to predict since if the loop has
> multiple exits and all the exits are an "early" one then we always execute
> the scalar loop.  In this case the scalar loop can completely dominate the
> new loop.
>
> If we later have skip_vector then there's an additional skip edge added
> that might change the dominators.
>
> The previous patch would force an update of all blocks reachable from the
> new exits.  This one updates *only* blocks that we know the scalar exits
> dominated.
>
> For the examples this reduces the blocks to update from 18 to 3.
>
> Bootstrapped and regtested on aarch64-none-linux-gnu and
> x86_64-pc-linux-gnu with no issues, both normally and with
> --enable-checking=release --enable-lto --with-build-config=bootstrap-O3
> --enable-checking=yes,rtl,extra.
>
> Ok for master?

See below.

> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	PR tree-optimization/113144
> 	PR tree-optimization/113145
> 	* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg):
> 	Update all BBs that the original exits dominated.
>
> gcc/testsuite/ChangeLog:
>
> 	PR tree-optimization/113144
> 	PR tree-optimization/113145
> 	* gcc.dg/vect/vect-early-break_94-pr113144.c: New test.
>
> --- inline copy of patch ---
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-early-break_94-pr113144.c b/gcc/testsuite/gcc.dg/vect/vect-early-break_94-pr113144.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..903fe7be6621e81db6f29441e4309fa213d027c5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-early-break_94-pr113144.c
> @@ -0,0 +1,41 @@
> +/* { dg-do compile } */
> +/* { dg-add-options vect_early_break } */
> +/* { dg-require-effective-target vect_early_break } */
> +/* { dg-require-effective-target vect_int } */
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +
> +long tar_atol256_max, tar_atol256_size, tar_atosl_min;
> +char tar_atol256_s;
> +void __errno_location();
> +
> +
> +inline static long tar_atol256(long min) {
> +  char c;
> +  int sign;
> +  c = tar_atol256_s;
> +  sign = c;
> +  while (tar_atol256_size) {
> +    if (c != sign)
> +      return sign ? min : tar_atol256_max;
> +    c = tar_atol256_size--;
> +  }
> +  if ((c & 128) != (sign & 128))
> +    return sign ? min : tar_atol256_max;
> +  return 0;
> +}
> +
> +inline static long tar_atol(long min) {
> +  return tar_atol256(min);
> +}
> +
> +long tar_atosl() {
> +  long n = tar_atol(-1);
> +  if (tar_atosl_min) {
> +    __errno_location();
> +    return 0;
> +  }
> +  if (n > 0)
> +    return 0;
> +  return n;
> +}
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 76d4979c0b3b374dcaacf6825a95a8714114a63b..9bacaa182a3919cae1cb99dfc5ae4923e1f93376 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -1719,8 +1719,6 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>    /* Now link the alternative exits.  */
>    if (multiple_exits_p)
>      {
> -      set_immediate_dominator (CDI_DOMINATORS, new_preheader,
> -			       main_loop_exit_block);
>        for (auto gsi_from = gsi_start_phis (loop->header),
> 	     gsi_to = gsi_start_phis (new_preheader);
> 	   !gsi_end_p (gsi_from) && !gsi_end_p (gsi_to);
> @@ -1776,7 +1774,14 @@ slpeel_tree_duplicate_loop_to_edge_cfg (class loop *loop, edge loop_exit,
>      {
>        update_loop = new_loop;
>        for (edge e : get_loop_exit_edges (loop))
> -	doms.safe_push (e->dest);
> +	{
> +	  /* Basic blocks that the old loop dominated are now dominated by
> +	     the new loop and so we have to update those.  */
> +	  for (auto bb : get_all_dominated_blocks (CDI_DOMINATORS, e->src))
> +	    if (!flow_bb_inside_loop_p (loop, bb))
> +	      doms.safe_push (bb);
> +	  doms.safe_push (e->dest);
> +	}

I think you'll get duplicate blocks that way.  Maybe simplify this all
by instead doing

  auto doms = get_all_dominated_blocks (CDI_DOMINATORS, loop->header);
  for (unsigned i = 0; i < doms.length (); ++i)
    if (flow_bb_inside_loop_p (loop, doms[i]))
      doms.unordered_remove (i--);

(note the i-- so the element unordered_remove swaps into slot i gets
re-examined)?

OK with that change, but really we should see to avoid this
quadraticness :/  It's probably not too bad right now given we have
quite some restrictions on vectorizing loops with multiple exits, but I
suggest you try an artificial testcase with the "same" loop repeated N
times to see whether dominance compute creeps up in the profile.

Richard.