From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 36993 invoked by alias); 13 Jan 2017 18:48:56 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 36426 invoked by uid 89); 13 Jan 2017 18:48:56 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-5.1 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=Wont, Won't, our X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 13 Jan 2017 18:48:45 +0000 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 7BA4E445E0; Fri, 13 Jan 2017 18:48:45 +0000 (UTC) Received: from abulafia.quesejoda.com (ovpn-116-205.ams2.redhat.com [10.36.116.205]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id v0DImhY3009811; Fri, 13 Jan 2017 13:48:44 -0500 Subject: Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2) To: Richard Biener References: <5e79620e-8adb-7768-a34e-32fd286995f0@redhat.com> <1e426a06-270b-07fe-0e95-ef3382b614c2@redhat.com> Cc: gcc-patches , Jeff Law From: Aldy Hernandez Message-ID: <0ff3a68e-3e2c-c0df-8c48-b02dac40d621@redhat.com> Date: Fri, 13 Jan 2017 18:48:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.6.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-SW-Source: 2017-01/txt/msg00989.txt.bz2 [Sorry for the delay, I was sick.] On 01/09/2017 04:30 AM, Richard Biener wrote: > On Sat, Jan 7, 2017 at 1:54 PM, Aldy Hernandez wrote: >> On 01/04/2017 07:11 AM, Richard Biener wrote: >>> >>> On Tue, Jan 3, 2017 at 6:36 PM, Aldy Hernandez wrote: >>>> >>>> On 12/20/2016 09:16 AM, Richard Biener wrote: >>>>> >>>>> >>>>> On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez >>>>> wrote: >>>>>> >>>>>> >>>>>> Hi folks. >>>>>> >>>>>> This is a follow-up on Jeff and Richi's interaction on the >>>>>> aforementioned >>>>>> PR >>>>>> here: >>>>>> >>>>>> https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html >>>>>> >>>>>> I decided to explore the idea of analyzing may-undefness on-demand, >>>>>> which >>>>>> actually looks rather cheap. >>>>>> >>>>>> BTW, I don't understand why we don't have auto_bitmap's, as we already >>>>>> have >>>>>> auto_sbitmap's. I've implemented the former based on auto_sbitmap's >>>>>> code >>>>>> we >>>>>> already have. >>>>>> >>>>>> The attached patch fixes the bug without introducing any regressions. >>>>>> >>>>>> I also tested the patch by compiling 242 .ii files with -O3. These >>>>>> were >>>>>> gathered from a stage1 build with -save-temps. There is a slight time >>>>>> degradation of 4 seconds within 27 minutes of user time: >>>>>> >>>>>> tainted: 26:52 >>>>>> orig: 26:48 >>>>>> >>>>>> This was the average aggregate time of two runs compiling all 242 .ii >>>>>> files. >>>>>> IMO, this looks reasonable. It is after all, -O3. Is it acceptable? >>>>> >>>>> >>>>> >>>>> + while (!worklist.is_empty ()) >>>>> + { >>>>> + name = worklist.pop (); >>>>> + gcc_assert (TREE_CODE (name) == SSA_NAME); >>>>> + >>>>> + if (ssa_undefined_value_p (name, true)) >>>>> + return true; >>>>> + >>>>> + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)); >>>>> >>>>> it should be already set as we use visited_ssa as "was it ever on the >>>>> worklist", >>>>> so maybe renaming it would be a good thing as well. >>>> >>>> >>>> >>>> I don't understand what you would prefer here. >>> >>> >>> Set the bit when you put name on the worklist (and do not do that if the >>> bit is set). Thus simply remove the above and add a bitmap_set_bit >>> for the initial name you put on the worklist. >>> >>>>> >>>>> + if (TREE_CODE (name) == SSA_NAME) >>>>> + { >>>>> + /* If an SSA has already been seen, this may be a >>>>> loop. >>>>> + Fail conservatively. */ >>>>> + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION >>>>> (name))) >>>>> + return false; >>>>> >>>>> so to me "conservative" is returning true, not false. >>>> >>>> >>>> >>>> OK >>>> >>>>> >>>>> + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)); >>>>> + worklist.safe_push (name); >>>>> >>>>> but for loops we can just continue and ignore this use. And >>>>> bitmap_set_bit >>>>> returns whether it set a bit, thus >>>>> >>>>> if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION >>>>> (name))) >>>>> worklist.safe_push (name); >>>>> >>>>> should work? >>>> >>>> >>>> >>>> Fixed. >>>> >>>>> >>>>> + /* Check that any SSA names used to define NAME is also fully >>>>> + defined. */ >>>>> + use_operand_p use_p; >>>>> + ssa_op_iter iter; >>>>> + FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE) >>>>> + { >>>>> + name = USE_FROM_PTR (use_p); >>>>> + if (TREE_CODE (name) == SSA_NAME) >>>>> >>>>> always true. >>>>> >>>>> + { >>>>> + /* If an SSA has already been seen, this may be a loop. >>>>> + Fail conservatively. */ >>>>> + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name))) >>>>> + return false; >>>>> + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)); >>>>> + worklist.safe_push (name); >>>>> >>>>> See above. >>>>> >>>>> In principle the thing is sound but I'd like to be able to pass in a >>>>> bitmap of >>>>> known maybe-undefined/must-defined SSA names to have a cache for >>>>> multiple invocations from within a pass (like this unswitching case). >>>> >>>> >>>> >>>> Done, though bitmaps are now calculated as part of the instantiation. >>>> >>>>> >>>>> Also once you hit defs that are in a post-dominated region of the loop >>>>> entry >>>>> you can treat them as not undefined (as their use invokes undefined >>>>> behavior). This is also how you treat function parameters (well, >>>>> ssa_undefined_value_p does), where the call site invokes undefined >>>>> behavior >>>>> when passing in undefined values. So we need an extra parameter >>>>> specifying >>>>> the post-dominance region. >>>> >>>> >>>> >>>> Done. >>>> >>>>> >>>>> You do not handle memory or calls conservatively which means the >>>>> existing >>>>> testcase only needs some obfuscation to become a problem again. To fix >>>>> that before /* Check that any SSA names used to define NAME is also >>>>> fully >>>>> defined. */ bail out conservatively, like >>>>> >>>>> if (! is_gimple_assign (def) >>>>> || gimple_assign_single_p (def)) >>>>> return true; >>>> >>>> >>>> >>>> As I asked previously, I understand the !is_gimple_assign, which will >>>> skip >>>> over GIMPLE_CALLs setting a value, but the "gimple_assign_single_p(def) >>>> == >>>> true"?? >>>> >>>> Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't >>>> we >>>> want to follow up the def chain precisely on these? >>>> >>>> The attached implementation uses a cache, and a pre-computed >>>> post-dominance >>>> region. A previous incantation of this patch (sans the post-dominance >>>> stuff) had performance within the noise of the previous implementation. >>>> >>>> I am testing again, and will do some performance comparisons in a bit, >>>> but >>>> for now-- are we on the same page here? Is this what you wanted? >>> >>> >>> + /* DEFs in the post-dominated region of the loop entry invoke >>> + undefined behavior. Adding any use won't make things any >>> + worse. */ >>> + for (unsigned i = 0; i < postdom.length (); ++i) >>> + if (def->bb == postdom[i]) >>> >>> gimple_bb (def) >>> >>> + { >>> + set_defined (t); >>> + return false; >>> + } >>> >>> I think you can't really return false here but need to continue processing >>> the worklist. I'd say rather than a linear walk you can as well use >>> dominated_by_p (CDI_POST_DOMINATORS, ...) and record the entry >>> block instead? >>> >>> Note that the way you compute post-dominators doesn't work -- nothing >>> keeps them up-to-date when unswitching does CFG manipulations :/ >>> Fortunately we have a way to compute them per region, see how >>> tree-if-conv.c >>> does that (build_region, calculate_dominance_info_for_region). >> >> >> If I understand correctly, we could compute them per region in >> tree_unswitch_single_loop() for each individual loop with what >> tree-if-conv.c uses. >> >> I could certainly abstract build_region/calculate_dominance_info_for_region >> into something new, say calculate_dominance_info_for_loop_region(). Then we >> could use dominated_by_p(CDI_POST_DOMINATORS, ...) in my class as you >> suggest. >> >> Question... computing the post-dom tree per region as I've just described >> will only create post-dom information for the loop region, which means that >> any definition outside of the loop will have zero dominance info. > > Yes. > >> What if the definition in question post-dominates the loop entry but is >> outside/after of the loop? > > How would we ever arrive at such def? Once we reach a def before the > region/loop > we know it's fine to use (the missing "stop" point I pointed out). Hmmm, it looks like we can't even build the post-dom region appropriately in the presence of infinite loops. First, build_region() fails if it can't find a loop post-header: /* The last element is loop post-header. */ gcc_assert (exit_bb); which we won't have in an infinite loop like: bb2: //loop header | V loop 1: bb3: <-------+ bar(); | | | V | bb4: | goto bb3; >---+ We could just build the region without the post-header, but then we fail while building the DFS dominance region: dom_info::calc_dfs_tree_nonrec (basic_block bb) .... gcc_assert (bn != m_start_block); Presumably because we've looped back to the start block (bb4 in a post dominator world). This doesn't happen calculating going forward because we always have a start block outside of the region (the function entry block). I tried to fake edge my way out of it, but things get really messed up while putting/removing fake edges in the presence of loop info. I'm assuming all this is by design. Would you prefer me to continue down this path, trying to build a post-dom region with infinite loops and fixing build_region / calc_dfs_tree_nonrec, or could we do without this dominance optimization? Pretty please? > >> We will have no post-dom information for this. >> In this case, could we just ignore and continue evaluating the SSA_NAME with >> the rest of our heuristics, or did you have something else in mind? Another >> option would be to calculate the post-dom information for the entire >> function on every loop (calculate_dominance_info(CDI_POST_DOMINATORS)), but >> that seems rather expensive. >> >> As a side note, at this point I just want to fix/close the PR in an >> acceptable manner, not come up with the end-all catch-all most-efficient >> patch for an unlikely scenario ;-). > > Yeah, which is why I wonder whether the caching is worth the trouble (in it's > current form) for the unswitching use (given it's other restrictions > on conditions > to unswitch). We could go back to my original, no caching version (with the other suggestions). That solves the problem quite simply, with no regressions. Thanks again. Aldy