public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/44794] pre- and post-loops should not be unrolled.
From: rguenth at gcc dot gnu.org @ 2023-03-09 8:06 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
   Last reconfirmed|                              |2023-03-09
             Status|UNCONFIRMED                   |ASSIGNED
     Ever confirmed|0                             |1

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
On trunk we end up with a vectorized loop with unrolled vectorized epilogue and scalar iteration and with a scalar loop copy used for n == 1 and the case when 'a' and 'b' alias, so this loop covers all possible 'n'. Prefetching then (rightfully) prefetches both of these which makes four out of them.

What can be improved is the upper bound on the iteration for the epilog loop as created by tree_unroll_loop. That results in the RTL unroller seeing

-   upper bound: 2147483646
-   likely upper bound: 2147483646
+   upper bound: 14
+   likely upper bound: 14
    realistic bound: -1
 ;; Unable to prove that the loop iterates constant times
+;; Not unrolling loop, doesn't roll
...
-   upper bound: 536870910
-   likely upper bound: 536870910
+   upper bound: 2
+   likely upper bound: 2
    realistic bound: -1
 ;; Unable to prove that the loop iterates constant times
+;; Not unrolling loop, doesn't roll

and thus only unrolling the main loops again.

The prefetching unrolling should have been enough of course - I suppose the RTL unroller could detect prefetch instructions and refrain from unrolling loops with prefetches on the basis they are already tuned well (that could be also implemented in the unroll control target hook).

I am testing a patch to improve the situation.
* [Bug tree-optimization/44794] pre- and post-loops should not be unrolled.
From: cvs-commit at gcc dot gnu.org @ 2023-04-19 11:51 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:a243ce2a52a6c62bc0d6be0b756a85dd9c1bceb7

commit r14-71-ga243ce2a52a6c62bc0d6be0b756a85dd9c1bceb7
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Mar 9 09:02:07 2023 +0100

    tree-optimization/44794 - avoid excessive RTL unrolling on epilogues

    The following adjusts tree_[transform_and_]unroll_loop to set an
    upper bound on the number of iterations on the epilogue loop it
    creates. For the testcase at hand which involves array prefetching
    this avoids applying RTL unrolling to them when -funroll-loops is
    specified.

    Other users of this API includes predictive commoning and
    unroll-and-jam.

            PR tree-optimization/44794
            * tree-ssa-loop-manip.cc (tree_transform_and_unroll_loop): If
            an epilogue loop is required set its iteration upper bound.
* [Bug tree-optimization/44794] pre- and post-loops should not be unrolled.
From: rguenth at gcc dot gnu.org @ 2023-04-19 11:53 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Heuristic to not unroll loops with prefetches is missing. The aprefetch pass could set ->unroll to 1 in the loop structure.
* [Bug tree-optimization/44794] pre- and post-loops should not be unrolled.
From: rguenth at gcc dot gnu.org @ 2023-04-19 11:55 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|rguenth at gcc dot gnu.org  |unassigned at gcc dot gnu.org
             Status|ASSIGNED                    |NEW
* [Bug tree-optimization/44794] New: pre- and post-loops should not be unrolled.
From: changpeng dot fang at amd dot com @ 2010-07-02 23:54 UTC
To: gcc-bugs

void foo(int *a, int *b, int n) {
  int i;
  for(i = 0; i < n; i++)
    a[i] = a[i] + b[i];
}

For this simple loop, the vectorizer does its job and peels the last few iterations as a post-loop that is not vectorized. But the RTL loop unroller does not know that the post-loop has only a few (at most 3 in this case) iterations, and will unroll it.

What is worse, if you compile it with:

    gcc -O3 -fprefetch-loop-arrays -funroll-loops

you may find the prefetch pass will also unroll the post-loop, and generate a new post-loop (post-post-loop) for this post-loop. Again, the RTL loop unroller cannot recognize this post-post-loop, and will unroll it. (The RTL loop unroller will generate yet another post-loop (post-post-post-loop) for the post-post-loop :-))

This will cause compilation time and code size to increase dramatically without any performance benefit.

--
           Summary: pre- and post-loops should not be unrolled.
           Product: gcc
           Version: lno
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: changpeng dot fang at amd dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794
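The split described in the report can be pictured with a hand-written C sketch. This is not what GCC emits; it only models the shape of the result, assuming a vectorization factor of four ints per vector iteration (which would match the "at most 3" figure in the report). The names `foo_split` and `VF` are introduced here for illustration only.

```c
#include <assert.h>

/* Assumed vectorization factor: four ints per vector iteration. */
enum { VF = 4 };

/* Computes the same result as foo() from the report, but split into a
   main loop and a post-loop.  Returns how many times the post-loop ran. */
static int foo_split(int *a, const int *b, int n)
{
    int i = 0;
    int post_iterations = 0;

    /* Main loop: stands in for the vectorized body, VF elements per trip. */
    for (; i + VF <= n; i += VF)
        for (int lane = 0; lane < VF; lane++)
            a[i + lane] += b[i + lane];

    /* Post-loop: scalar leftovers, never more than VF - 1 iterations. */
    for (; i < n; i++) {
        a[i] += b[i];
        post_iterations++;
    }

    assert(post_iterations < VF);
    return post_iterations;
}
```

Because the post-loop can never run more than `VF - 1` times, unrolling it again cannot pay off; that small bound is the information the report says the RTL unroller is missing.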
* [Bug tree-optimization/44794] pre- and post-loops should not be unrolled.
From: rguenth at gcc dot gnu dot org @ 2010-07-03 10:48 UTC
To: gcc-bugs

------- Comment #1 from rguenth at gcc dot gnu dot org 2010-07-03 10:48 -------
It would be interesting to know why/if number-of-iteration analysis fails and if the code the vectorizer emits can be adjusted to fix that.

--

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org, rakdver at gcc dot gnu
                   |                            |dot org
           Severity|major                       |enhancement
            Version|lno                         |4.6.0

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794
* [Bug tree-optimization/44794] pre- and post-loops should not be unrolled.
From: changpeng dot fang at amd dot com @ 2010-07-06 17:59 UTC
To: gcc-bugs

------- Comment #2 from changpeng dot fang at amd dot com 2010-07-06 17:58 -------
We also need to handle the post loop of unrolling. Suppose the unroll_factor is 16, then the post-loop should have up to 15 iterations.

--

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794
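The arithmetic behind the "up to 15 iterations" remark can be written down as a small C sketch. It assumes the simplest possible scheme, where the unrolled loop takes every full group of `factor` iterations and the post-loop takes the remainder; `struct unroll_split` and `split_iterations` are hypothetical names, not GCC internals.

```c
#include <assert.h>

/* How n iterations are divided when a loop is unrolled by `factor`. */
struct unroll_split {
    long main_trips;      /* trips through the unrolled loop body */
    long epilogue_trips;  /* iterations left over for the post-loop */
};

/* Assumed scheme: full groups go to the unrolled loop, the rest to the
   post-loop, so epilogue_trips is always in the range [0, factor - 1]. */
static struct unroll_split split_iterations(long n, long factor)
{
    assert(n >= 0 && factor >= 1);
    struct unroll_split s = { n / factor, n % factor };
    return s;
}
```

For example, `split_iterations(31, 16)` gives one trip through the unrolled body and 15 post-loop iterations, the worst case for a factor of 16.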
* [Bug tree-optimization/44794] pre- and post-loops should not be unrolled.
From: changpeng dot fang at amd dot com @ 2010-07-06 18:36 UTC
To: gcc-bugs

------- Comment #3 from changpeng dot fang at amd dot com 2010-07-06 18:35 -------
Here is the impact of loop unrolling on the compilation time and code size on polyhedron test_fpu.f90:

-O3 -ftree-vectorize -fno-prefetch-loop-arrays -fno-unroll-loops:
    timing: 12.62s, size: 67069 bytes
-O3 -ftree-vectorize -fprefetch-loop-arrays -funroll-loops:
    timing: 51.77s, size: 234045 bytes

I also did an experiment on prefetching in which we don't unroll the pre- and post-loops generated by the vectorizer:

-O3 -ftree-vectorize -fprefetch-loop-arrays:
    timing: 29.32s, size: 92541 bytes
-O3 -ftree-vectorize -fprefetch-loop-arrays (don't unroll pre- and post-loops):
    timing: 18.34s, size: 78909 bytes
-O3 -ftree-vectorize -fno-prefetch-loop-arrays:
    timing: 12.62s, size: 67069 bytes

--

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794
* [Bug tree-optimization/44794] pre- and post-loops should not be unrolled.
From: changpeng dot fang at amd dot com @ 2010-07-15 1:50 UTC
To: gcc-bugs

------- Comment #4 from changpeng dot fang at amd dot com 2010-07-15 01:50 -------
Created an attachment (id=21205)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21205&action=view)
Do not unroll pre and post loops

I did a quick test on polyhedron before and after applying the preliminary patch. Tests are based on -O3 -fprefetch-loop-arrays -funroll-loops.

            timing (s)                |  size (B)
            before   after   %reduc   |  before   after    %reduc
capacita    14.35    10.88   24.18    |  90715    72843    19.7
gas_dyn     34.68    21.58   37.77    |  149608   100936   32.53
nf          33.91    19.32   43.03    |  139150   83054    40.31
protein     51.35    33.23   35.29    |  163672   122808   24.97
rnflow      60.9     43.28   28.93    |  268784   169152   37.07
test_fpu    52.61    30.35   42.31    |  234045   144285   38.35

--

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794
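The percentage columns in the table above appear to be relative reductions, i.e. (before - after) / before * 100. A minimal C sketch of that calculation, with `percent_reduction` as a name introduced here for illustration:

```c
#include <assert.h>

/* Percentage by which `after` is lower than `before`. */
static double percent_reduction(double before, double after)
{
    assert(before > 0.0);
    return (before - after) / before * 100.0;
}
```

Under this reading the first row works out as (14.35 - 10.88) / 14.35, roughly 24.18 percent for compile time, and (90715 - 72843) / 90715, roughly 19.7 percent for code size, which agrees with the reported values.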