Hi,

As PR100794 shows, the current PRE implementation skips some
optimizations to avoid introducing loop-carried dependences that would
stop the loop vectorizer from vectorizing the loop.  At -O2, no
downstream pass picks this kind of opportunity up again if the loop
vectorizer then fails to vectorize that loop.

This patch follows Richi's suggestion in the PR: if the predcom flag
isn't explicitly set and loop vectorization is enabled, predcom is
enabled implicitly, without any unrolling.

The Power9 SPEC2017 evaluation showed that it speeds up 521.wrf_r by
3.30% and 554.roms_r by 1.08% with the very-cheap cost model, and has
no remarkable impact with the cheap cost model.  The build time and
size impact is fine (see the PR for the details).

By the way, I also tested another proposal, which guards PRE so that it
does not skip the optimization for the cheap and very-cheap vect cost
models.  The evaluation results showed it's fine with the very-cheap
cost model, but it can degrade some benchmarks, e.g. 521.wrf_r -9.17%
and 549.fotonik3d_r -2.07%.

Bootstrapped/regtested on powerpc64le-linux-gnu P9,
x86_64-redhat-linux and aarch64-linux-gnu.

Is it ok for trunk?

BR,
Kewen
-----
gcc/ChangeLog:

	PR tree-optimization/100794
	* tree-predcom.c (tree_predictive_commoning_loop): Add parameter
	allow_unroll_p and only allow unrolling when it's true.
	(tree_predictive_commoning): Add parameter allow_unroll_p and
	adjust for it.
	(run_tree_predictive_commoning): Likewise.
	(class pass_predcom): Add private member allow_unroll_p.
	(pass_predcom::pass_predcom): Init allow_unroll_p.
	(pass_predcom::gate): Check flag_tree_loop_vectorize and
	global_options_set.x_flag_predictive_commoning.
	(pass_predcom::execute): Adjust for allow_unroll_p.

gcc/testsuite/ChangeLog:

	PR tree-optimization/100794
	* gcc.dg/tree-ssa/pr100794.c: New test.
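
For readers unfamiliar with the transformation, here is a small
hand-written sketch of the kind of loop involved.  It is only an
illustration, not the testcase from the PR, and the function names
dot_adjacent and dot_adjacent_carried are made up.  The second function
shows roughly what reusing the loaded value across iterations without
unrolling looks like when written by hand: one load per iteration is
saved, at the price of a loop-carried dependence through a scalar.

```c
#include <stddef.h>

/* Hypothetical illustration.  Each iteration loads arr[i] and
   arr[i + 1]; the second load yields the same value as the first
   load of the next iteration.  */
double
dot_adjacent (const double *arr, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i + 1 < n; i++)
    sum += arr[i] * arr[i + 1];
  return sum;
}

/* Hand-written equivalent that keeps the value loaded as arr[i + 1]
   in a scalar and reuses it as arr[i] in the next iteration.  No
   unrolling is done, so the copy "prev = next" stays in the loop and
   PREV is carried from one iteration to the next.  */
double
dot_adjacent_carried (const double *arr, size_t n)
{
  double sum = 0.0;
  if (n < 2)
    return sum;

  double prev = arr[0];
  for (size_t i = 0; i + 1 < n; i++)
    {
      double next = arr[i + 1];
      sum += prev * next;
      prev = next;
    }
  return sum;
}
```

Both functions perform the same multiplications and additions in the
same order, so they return identical results.  The difference is only
in how many loads each iteration issues and whether a value is carried
across iterations.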