public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/110474] New: Vect: the epilog vect loop should have small VF if the loop is unrolled during vectorization
From: hliu at amperecomputing dot com @ 2023-06-29 4:35 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110474

            Bug ID: 110474
           Summary: Vect: the epilog vect loop should have small VF if the
                    loop is unrolled during vectorization
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hliu at amperecomputing dot com
  Target Milestone: ---

Hi,

I'm trying to tune loop unrolling during vectorization (see suggested_unroll_factor in tree-vect-loop.cc). I find that unrolling may hurt performance, because it also increases the VF (vectorization factor) of the epilog vect loop. For example:

    int foo(short *A, char *B, int N) {
      int sum = 0;
      for (int i = 0; i < N; ++i) {
        sum += A[i] * B[i];
      }
      return sum;
    }

Compile it with "-O3 -mtune=neoverse-n2 -mcpu=neoverse-n1 --param aarch64-vect-unroll-limit=2" (I'm using -mcpu n1 as I want to try a target without SVE). The GCC vectorization pass unrolls the loop by 2 and generates code like the following:

    if N >= 32:
      main vect loop ...
    if N >= 16:   # This may hurt performance if N is small (e.g. 8)
      epilog vect loop ...
    epilog scalar code ...

If the loop is not unrolled (i.e. with "--param aarch64-vect-unroll-limit=1"), GCC generates code like the following:

    if N >= 16:
      main vect loop ...
    if N >= 8:
      epilog vect loop ...
    epilog scalar code ...

The runtime check is based on the VF of the epilog vectorization.

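To make the effect of the two guard layouts above concrete, here is a minimal sketch in C that models how N scalar iterations are split between the main vector loop, the vectorized epilogue and the scalar epilogue. It is not GCC code: `struct split` and `split_iterations` are hypothetical names, and the model assumes each loop simply consumes as many full VF-sized chunks as fit, ignoring peeling and other details.

```c
#include <assert.h>

/* Illustrative model only (not GCC code): how many scalar iterations
   each part of the vectorized loop nest ends up handling.  */
struct split {
  int main_vect;   /* iterations covered by the main vector loop */
  int epilog_vect; /* iterations covered by the vectorized epilogue */
  int scalar;      /* iterations left for the scalar epilogue */
};

/* Model the guards "if N >= main_vf" and "if N >= epilog_vf" from the
   pseudocode: each vector loop takes as many full chunks as fit.  */
struct split split_iterations(int n, int main_vf, int epilog_vf)
{
  assert(n >= 0 && main_vf > 0 && epilog_vf > 0);

  struct split s = {0, 0, 0};
  s.main_vect = (n / main_vf) * main_vf;
  int rest = n - s.main_vect;
  s.epilog_vect = (rest / epilog_vf) * epilog_vf;
  s.scalar = rest - s.epilog_vect;
  return s;
}
```

Under this model, `split_iterations(8, 32, 16)` (the unrolled layout) leaves all 8 iterations to the scalar epilogue, while `split_iterations(8, 16, 8)` (the non-unrolled layout) handles all 8 in the vectorized epilogue.
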
There is code in tree-vect-loop.cc (line 2990) to choose the epilog vect VF:

    /* If we're vectorizing an epilogue loop, the vectorized loop either needs
       to be able to handle fewer than VF scalars, or needs to have a lower VF
       than the main loop.  */
    if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
        && !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
        && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
                     LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
      return opt_result::failure_at (vect_location,
                                     "Vectorization factor too high for"
                                     " epilogue loop.\n");

But it doesn't take suggested_unroll_factor into account. So I'm thinking about adding the following code to unscale orig_loop_vinfo's VF by the unroll factor:

    unscaled_orig_vf = exact_div (LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo),
                                  orig_loop_vinfo->suggested_unroll_factor);

Is this reasonable?
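
The difference between comparing against the main loop's VF and comparing against the unscaled VF can be sketched with plain integers. This is a simplified stand-in, not the GCC implementation: `epilogue_vf_ok` and `epilogue_vf_ok_unscaled` are hypothetical helpers, `unsigned` replaces GCC's `poly_uint64`, and the partial-vectors condition is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the existing check: an epilogue candidate is
   accepted only if its VF is strictly below the main loop's VF.  */
bool epilogue_vf_ok(unsigned epilog_vf, unsigned main_vf)
{
  return epilog_vf < main_vf;
}

/* Hypothetical model of the proposed check: divide the unroll factor
   out of the main loop's VF before comparing.  The assertion mirrors
   the exact-division requirement: main_vf must be a multiple of unroll.  */
bool epilogue_vf_ok_unscaled(unsigned epilog_vf, unsigned main_vf,
                             unsigned unroll)
{
  assert(unroll > 0 && main_vf % unroll == 0);
  unsigned unscaled_orig_vf = main_vf / unroll;
  return epilog_vf < unscaled_orig_vf;
}
```

With a main VF of 32 and an unroll factor of 2, the first check accepts an epilogue VF of 16, while the second rejects 16 and accepts 8, which matches the thresholds of the non-unrolled case.
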
* [Bug tree-optimization/110474] Vect: the epilog vect loop should have small VF if the loop is unrolled during vectorization
From: rguenth at gcc dot gnu.org @ 2023-06-29 7:33 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110474

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think it's reasonable to not apply unrolling to the vectorized epilogue, but note that we have to be careful to adjust the maximum number of iterations we compute for it. Note that this will also necessarily make the vectorized epilogue iterate.

We could also leave the decision to the target, providing a suggested_epilog_"unroll" factor. That could also be used, for example, to get a 128-bit vector epilogue for a 512-bit main loop, to address similar concerns there.
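
The remark about the epilogue iterating can be illustrated with a small calculation. The sketch below assumes a simplified model in which the main vector loop leaves at most main_vf - 1 scalar iterations behind and the vectorized epilogue consumes them in chunks of epilog_vf; `max_epilogue_iterations` is a hypothetical helper, not a GCC function, and it ignores peeling and any other adjustments the vectorizer makes.

```c
#include <assert.h>

/* Simplified upper bound on the number of vectorized-epilogue
   iterations: the main loop leaves at most main_vf - 1 scalar
   iterations, and each epilogue iteration handles epilog_vf of them.  */
unsigned max_epilogue_iterations(unsigned main_vf, unsigned epilog_vf)
{
  assert(main_vf > 0 && epilog_vf > 0);
  return (main_vf - 1) / epilog_vf;
}
```

In this model, a main VF of 32 with an epilogue VF of 16 gives a bound of 1, so the epilogue runs at most once. Keeping the main VF at 32 but lowering the epilogue VF to 8 raises the bound to 3, so the epilogue has to be a real loop and its iteration bound has to be computed accordingly.
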
* [Bug tree-optimization/110474] Vect: the epilog vect loop should have small VF if the loop is unrolled during vectorization
From: cvs-commit at gcc dot gnu.org @ 2023-07-06 2:07 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110474

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hao Liu <hliu@gcc.gnu.org>:

https://gcc.gnu.org/g:7339e725b995912747c01c3ec80ce602512f45df

commit r14-2335-g7339e725b995912747c01c3ec80ce602512f45df
Author: Hao Liu <hliu@os.amperecomputing.com>
Date:   Thu Jul 6 10:03:47 2023 +0800

    tree-optimization/110474 - Vect: select small VF for epilog of unrolled loop

    If a loop is unrolled during vectorization (i.e. suggested_unroll_factor > 1),
    the VFs of both main and epilog loop are enlarged.  The epilog vect loop is
    specific for a loop with small iteration counts, so a large VF may hurt
    performance.

    This patch unscales the main loop VF by suggested_unroll_factor while selecting
    the epilog loop VF, so that it will be the same as vectorized loop without
    unrolling (i.e. suggested_unroll_factor = 1).

    gcc/ChangeLog:

            PR tree-optimization/110474
            * tree-vect-loop.cc (vect_analyze_loop_2): unscale the VF by suggested
            unroll factor while selecting the epilog vect loop VF.

    gcc/testsuite/ChangeLog:

            * gcc.target/aarch64/pr110474.c: New testcase.
* [Bug tree-optimization/110474] Vect: the epilog vect loop should have small VF if the loop is unrolled during vectorization
From: hliu at amperecomputing dot com @ 2023-07-06 2:21 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110474

Hao Liu <hliu at amperecomputing dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #3 from Hao Liu <hliu at amperecomputing dot com> ---
It would be better to have a suggested_epilog_"unroll" factor or to support multiple epilogues, but that needs a lot of work. Let's go with the simple patch first.