On 12/20/2016 09:16 AM, Richard Biener wrote: > On Fri, Dec 16, 2016 at 3:41 PM, Aldy Hernandez wrote: >> Hi folks. >> >> This is a follow-up on Jeff and Richi's interaction on the aforementioned PR >> here: >> >> https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01397.html >> >> I decided to explore the idea of analyzing may-undefness on-demand, which >> actually looks rather cheap. >> >> BTW, I don't understand why we don't have auto_bitmap's, as we already have >> auto_sbitmap's. I've implemented the former based on auto_sbitmap's code we >> already have. >> >> The attached patch fixes the bug without introducing any regressions. >> >> I also tested the patch by compiling 242 .ii files with -O3. These were >> gathered from a stage1 build with -save-temps. There is a slight time >> degradation of 4 seconds within 27 minutes of user time: >> >> tainted: 26:52 >> orig: 26:48 >> >> This was the average aggregate time of two runs compiling all 242 .ii files. >> IMO, this looks reasonable. It is after all, -O3. Is it acceptable? > > + while (!worklist.is_empty ()) > + { > + name = worklist.pop (); > + gcc_assert (TREE_CODE (name) == SSA_NAME); > + > + if (ssa_undefined_value_p (name, true)) > + return true; > + > + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)); > > it should be already set as we use visited_ssa as "was it ever on the worklist", > so maybe renaming it would be a good thing as well. I don't understand what you would prefer here. > > + if (TREE_CODE (name) == SSA_NAME) > + { > + /* If an SSA has already been seen, this may be a loop. > + Fail conservatively. */ > + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name))) > + return false; > > so to me "conservative" is returning true, not false. OK > > + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)); > + worklist.safe_push (name); > > but for loops we can just continue and ignore this use. And bitmap_set_bit > returns whether it set a bit, thus > > if (bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name))) > worklist.safe_push (name); > > should work? Fixed. > > + /* Check that any SSA names used to define NAME is also fully > + defined. */ > + use_operand_p use_p; > + ssa_op_iter iter; > + FOR_EACH_SSA_USE_OPERAND (use_p, def, iter, SSA_OP_USE) > + { > + name = USE_FROM_PTR (use_p); > + if (TREE_CODE (name) == SSA_NAME) > > always true. > > + { > + /* If an SSA has already been seen, this may be a loop. > + Fail conservatively. */ > + if (bitmap_bit_p (visited_ssa, SSA_NAME_VERSION (name))) > + return false; > + bitmap_set_bit (visited_ssa, SSA_NAME_VERSION (name)); > + worklist.safe_push (name); > > See above. > > In principle the thing is sound but I'd like to be able to pass in a bitmap of > known maybe-undefined/must-defined SSA names to have a cache for > multiple invocations from within a pass (like this unswitching case). Done, though bitmaps are now calculated as part of the instantiation. > > Also once you hit defs that are in a post-dominated region of the loop entry > you can treat them as not undefined (as their use invokes undefined > behavior). This is also how you treat function parameters (well, > ssa_undefined_value_p does), where the call site invokes undefined behavior > when passing in undefined values. So we need an extra parameter specifying > the post-dominance region. Done. > > You do not handle memory or calls conservatively which means the existing > testcase only needs some obfuscation to become a problem again. To fix > that before /* Check that any SSA names used to define NAME is also fully > defined. */ bail out conservatively, like > > if (! is_gimple_assign (def) > || gimple_assign_single_p (def)) > return true; As I asked previously, I understand the !is_gimple_assign, which will skip over GIMPLE_CALLs setting a value, but the "gimple_assign_single_p(def) == true"?? Won't this last one bail on things like e.3_7 = e, or x_77 = y_88? Don't we want to follow up the def chain precisely on these? The attached implementation uses a cache, and a pre-computed post-dominance region. A previous incantation of this patch (sans the post-dominance stuff) had performance within the noise of the previous implementation. I am testing again, and will do some performance comparisons in a bit, but for now-- are we on the same page here? Is this what you wanted? Aldy p.s. I could turn the post-dominance region into a bitmap for faster access if preferred.