public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/110474] New: Vect: the epilog vect loop should have small VF if the loop is unrolled during vectorization
@ 2023-06-29 4:35 hliu at amperecomputing dot com
2023-06-29 7:33 ` [Bug tree-optimization/110474] " rguenth at gcc dot gnu.org
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: hliu at amperecomputing dot com @ 2023-06-29 4:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110474
Bug ID: 110474
Summary: Vect: the epilog vect loop should have small VF if the
loop is unrolled during vectorization
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: hliu at amperecomputing dot com
Target Milestone: ---
Hi, I'm trying to tune loop unrolling during vectorization (see
suggested_unroll_factor in tree-vect-loop.cc). I find that unrolling may hurt
performance, because it also increases the VF (vectorization factor) of the
epilog vect loop.
For example:
int foo(short *A, char *B, int N) {
  int sum = 0;
  for (int i = 0; i < N; ++i) {
    sum += A[i] * B[i];
  }
  return sum;
}
Compile it with "-O3 -mtune=neoverse-n2 -mcpu=neoverse-n1 --param
aarch64-vect-unroll-limit=2" (I'm using -mcpu n1 as I want to try a target
without SVE). The GCC vectorization pass unrolls the loop by 2 and generates
code as follows:
if N >= 32:
    main vect loop ...
if N >= 16:    # This may hurt performance if N is small (e.g. 8)
    epilog vect loop ...
epilog scalar code ...
If the loop is not unrolled (i.e. with "--param aarch64-vect-unroll-limit=1"),
GCC generates code as follows:
if N >= 16:
    main vect loop ...
if N >= 8:
    epilog vect loop ...
epilog scalar code ...
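The two layouts above can be compared with a small model. This is only a sketch, not GCC code: `split_iterations` and `struct split` are hypothetical names, and the VFs (32/16 with unrolling, 16/8 without) are taken from the thresholds in the pseudocode.

```c
#include <assert.h>

/* Hypothetical model, not GCC code: how many scalar iterations of a loop
   with trip count n are covered by the main vector loop, the epilogue
   vector loop, and the scalar epilogue.  */
struct split {
    int main_iters;        /* iterations done by the main vect loop   */
    int epilog_vect_iters; /* iterations done by the epilog vect loop */
    int scalar_iters;      /* iterations left for the scalar code     */
};

static struct split split_iterations(int n, int main_vf, int epilog_vf)
{
    assert(n >= 0 && main_vf > 0 && epilog_vf > 0);

    struct split s;
    /* The main loop consumes whole blocks of main_vf iterations.  */
    s.main_iters = (n / main_vf) * main_vf;
    n -= s.main_iters;
    /* The epilogue vector loop consumes whole blocks of epilog_vf.  */
    s.epilog_vect_iters = (n / epilog_vf) * epilog_vf;
    n -= s.epilog_vect_iters;
    /* Whatever remains runs in the scalar epilogue.  */
    s.scalar_iters = n;
    return s;
}
```

Under this model, `split_iterations(8, 32, 16)` (unrolled) leaves all 8 iterations to the scalar code, while `split_iterations(8, 16, 8)` (not unrolled) handles all 8 in the epilog vect loop.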
The runtime check is based on the VF of epilog vectorization. There is code in
tree-vect-loop.cc (line 2990) to choose epilog vect VF:
  /* If we're vectorizing an epilogue loop, the vectorized loop either needs
     to be able to handle fewer than VF scalars, or needs to have a lower VF
     than the main loop.  */
  if (LOOP_VINFO_EPILOGUE_P (loop_vinfo)
      && !LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
      && maybe_ge (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
                   LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo)))
    return opt_result::failure_at (vect_location,
                                   "Vectorization factor too high for"
                                   " epilogue loop.\n");
But this check doesn't take the suggested_unroll_factor into account. So I'm
thinking about adding the following code to unscale the orig_loop_vinfo's VF by
the unroll factor:
  unscaled_orig_vf = exact_div (LOOP_VINFO_VECT_FACTOR (orig_loop_vinfo),
                                orig_loop_vinfo->suggested_unroll_factor);
Is this reasonable?
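The effect of the proposed unscaling on the check can be sketched with plain integers. This is a simplified stand-in, not the GCC implementation: `epilog_vf_too_high` and its `unscale` flag are hypothetical, ordinary unsigned values replace GCC's poly-int VFs, and the partial-vectors condition is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the quoted check.  Returns true when a
   candidate epilogue VF would be rejected as "too high".  With
   unscale == true the main loop VF is first divided by the unroll
   factor, as proposed in the report.  */
static bool epilog_vf_too_high(unsigned epilog_vf, unsigned orig_vf,
                               unsigned unroll_factor, bool unscale)
{
    /* exact_div in the proposal requires an exact division.  */
    assert(unroll_factor > 0 && orig_vf % unroll_factor == 0);

    unsigned limit = unscale ? orig_vf / unroll_factor : orig_vf;
    return epilog_vf >= limit;
}
```

With a main loop VF of 32 and an unroll factor of 2, the unmodified check accepts an epilogue VF of 16; with unscaling the limit becomes 16, so 16 is rejected and 8 is accepted, matching the non-unrolled case.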
* [Bug tree-optimization/110474] Vect: the epilog vect loop should have small VF if the loop is unrolled during vectorization
From: rguenth at gcc dot gnu.org @ 2023-06-29 7:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110474
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think it's reasonable to not apply unrolling to the vectorized epilogue, but
note we have to be careful to adjust the maximum number of iterations we
compute for it. Note this will also necessarily make the vectorized epilogue
iterate.
We could also leave the decision to the target, providing a
suggested_epilog_"unroll" factor. That could also be used, for example, to
get a 128bit vector epilogue for a 512bit main loop to address similar
concerns there.
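To illustrate why a smaller epilogue VF makes the vectorized epilogue iterate, here is a minimal arithmetic sketch. It is not GCC code: `max_epilog_vect_iters` is a hypothetical helper, and it assumes the main loop leaves at most main_vf - 1 scalar iterations behind.

```c
#include <assert.h>

/* Hypothetical helper: upper bound on how many times the vectorized
   epilogue loop can execute, assuming the main loop leaves at most
   main_vf - 1 scalar iterations and each epilogue iteration consumes
   epilog_vf of them.  */
static unsigned max_epilog_vect_iters(unsigned main_vf, unsigned epilog_vf)
{
    assert(main_vf > 0 && epilog_vf > 0);
    return (main_vf - 1) / epilog_vf;
}
```

For a main loop VF of 32, an epilogue VF of 16 gives a bound of 1, while an epilogue VF of 8 gives a bound of 3, so the epilogue becomes a real loop.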
* [Bug tree-optimization/110474] Vect: the epilog vect loop should have small VF if the loop is unrolled during vectorization
From: cvs-commit at gcc dot gnu.org @ 2023-07-06 2:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110474
--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Hao Liu <hliu@gcc.gnu.org>:
https://gcc.gnu.org/g:7339e725b995912747c01c3ec80ce602512f45df
commit r14-2335-g7339e725b995912747c01c3ec80ce602512f45df
Author: Hao Liu <hliu@os.amperecomputing.com>
Date: Thu Jul 6 10:03:47 2023 +0800
tree-optimization/110474 - Vect: select small VF for epilog of unrolled loop
If a loop is unrolled during vectorization (i.e. suggested_unroll_factor > 1),
the VFs of both main and epilog loop are enlarged. The epilog vect loop is
specific for a loop with small iteration counts, so a large VF may hurt
performance.
This patch unscales the main loop VF by suggested_unroll_factor while selecting
the epilog loop VF, so that it will be the same as vectorized loop without
unrolling (i.e. suggested_unroll_factor = 1).
gcc/ChangeLog:
PR tree-optimization/110474
* tree-vect-loop.cc (vect_analyze_loop_2): unscale the VF by suggested
unroll factor while selecting the epilog vect loop VF.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr110474.c: New testcase.
* [Bug tree-optimization/110474] Vect: the epilog vect loop should have small VF if the loop is unrolled during vectorization
From: hliu at amperecomputing dot com @ 2023-07-06 2:21 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110474
Hao Liu <hliu at amperecomputing dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED
--- Comment #3 from Hao Liu <hliu at amperecomputing dot com> ---
It would be better to have a suggested_epilog_"unroll" factor or to support
multiple epilogues, but that needs a lot of work. Let's go with the simple
patch first.