On Thu, 3 Aug 2023 at 17:48, Richard Biener wrote: > > On Thu, 3 Aug 2023, Richard Biener wrote: > > > On Thu, 3 Aug 2023, Richard Biener wrote: > > > > > On Thu, 3 Aug 2023, Prathamesh Kulkarni wrote: > > > > > > > On Wed, 2 Aug 2023 at 14:17, Richard Biener via Gcc-patches > > > > wrote: > > > > > > > > > > On Mon, 31 Jul 2023, Jeff Law wrote: > > > > > > > > > > > > > > > > > > > > > > > On 7/28/23 01:05, Richard Biener via Gcc-patches wrote: > > > > > > > The following delays sinking of loads within the same innermost > > > > > > > loop when it was unconditional before. That's a not uncommon > > > > > > > issue preventing vectorization when masked loads are not available. > > > > > > > > > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > > > > > > > > > > > > > I have a followup patch improving sinking that without this would > > > > > > > cause more of the problematic sinking - now that we have a second > > > > > > > sink pass after loop opts this looks like a reasonable approach? > > > > > > > > > > > > > > OK? > > > > > > > > > > > > > > Thanks, > > > > > > > Richard. > > > > > > > > > > > > > > PR tree-optimization/92335 > > > > > > > * tree-ssa-sink.cc (select_best_block): Before loop > > > > > > > optimizations avoid sinking unconditional loads/stores > > > > > > > in innermost loops to conditional executed places. > > > > > > > > > > > > > > * gcc.dg/tree-ssa/ssa-sink-10.c: Disable vectorizing. > > > > > > > * gcc.dg/tree-ssa/predcom-9.c: Clone from ssa-sink-10.c, > > > > > > > expect predictive commoning to happen instead of sinking. > > > > > > > * gcc.dg/vect/pr65947-3.c: Adjust. > > > > > > I think it's reasonable -- there's probably going to be cases where it's not > > > > > > great, but more often than not I think it's going to be a reasonable > > > > > > heuristic. > > > > > > > > > > > > If there is undesirable fallout, better to find it over the coming months than > > > > > > next spring. So I'd suggest we go forward now to give more time to find any > > > > > > pathological cases (if they exist). > > > > > > > > > > Agreed, I've pushed this now. > > > > Hi Richard, > > > > After this patch (committed in 399c8dd44ff44f4b496223c7cc980651c4d6f6a0), > > > > pr65947-7.c "failed" for aarch64-linux-gnu: > > > > FAIL: gcc.dg/vect/pr65947-7.c scan-tree-dump-not vect "LOOP VECTORIZED" > > > > FAIL: gcc.dg/vect/pr65947-7.c -flto -ffat-lto-objects > > > > scan-tree-dump-not vect "LOOP VECTORIZED" > > > > > > > > /* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { > > > > ! vect_fold_extract_last } } } } */ > > > > > > > > With your commit, condition_reduction in pr65947-7.c gets vectorized > > > > regardless of vect_fold_extract_last, > > > > which gates the above test (which is an improvement, because the > > > > function didn't get vectorized before the commit). > > > > > > > > The attached patch thus removes the gating on vect_fold_extract_last, > > > > and the test passes again. > > > > OK to commit ? > > > > > > OK. > > > > Or wait - the loop doesn't vectorize on x86_64, so I guess one > > critical target condition is missing. Can you figure out which? > > I see > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21: > note: vect_is_simple_use: operand last_19 = PHI , > type of def: reduction > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21: > note: vect_is_simple_use: vectype vector(4) int > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21: > missed: multiple types in double reduction or condition reduction or > fold-left reduction. > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:13:1: > missed: not vectorized: relevant phi not supported: last_19 = PHI > > /space/rguenther/src/gcc/gcc/testsuite/gcc.dg/vect/pr65947-7.c:18:21: > missed: bad operation or unsupported loop bound. Hi Richard, Looking at the aarch64 vect dump, it seems the loop in condition_reduction gets vectorized with V4HI mode while fails for other modes in vectorizable_condition: if ((double_reduc || reduction_type != TREE_CODE_REDUCTION) && ncopies > 1) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "multiple types in double reduction or condition " "reduction or fold-left reduction.\n"); return false; } From the dump: foo.c:9:21: note: === vect_analyze_loop_operations === foo.c:9:21: note: examining phi: last_19 = PHI foo.c:9:21: note: vect_is_simple_use: operand (int) aval_13, type of def: internal foo.c:9:21: note: vect_is_simple_use: vectype vector(4) int foo.c:9:21: note: vect_is_simple_use: operand last_19 = PHI , type of def: reduction foo.c:9:21: note: vect_is_simple_use: vectype vector(4) int For V8HI, VF = 8, and vectype_in = vector(4) int. Thus ncopies = VF / length(vectype_in) = 2, which is greater than 1, and thus fails: foo.c:9:21: missed: multiple types in double reduction or condition reduction or fold-left reduction. foo.c:4:1: missed: not vectorized: relevant phi not supported: last_19 = PHI While for V4HI, VF = 4 and thus ncopies = 1, so it succeeds. For x86_64, it seems the vectorizer doesn't seem to try V4HI mode. If I "force" the vectorizer to use V4HI mode, we get the following dump: foo.c:9:21: note: === vect_analyze_loop_operations === foo.c:9:21: note: examining phi: last_19 = PHI foo.c:9:21: note: vect_is_simple_use: operand (int) aval_13, type of def: internal foo.c:9:21: note: vect_is_simple_use: vectype vector(2) int foo.c:9:21: note: vect_is_simple_use: operand last_19 = PHI , type of def: reduction foo.c:9:21: note: vect_is_simple_use: vectype vector(2) int foo.c:9:21: missed: multiple types in double reduction or condition reduction or fold-left reduction. Not sure tho if this is the only reason for the test to fail to vectorize on the target. Will investigate in more details next week. Thanks, Prathamesh > > Richard.