From: Tamar Christina <Tamar.Christina@arm.com>
To: Jan Hubicka <hubicka@ucw.cz>, Richard Biener <rguenther@suse.de>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
nd <nd@arm.com>, "jlaw@ventanamicro.com" <jlaw@ventanamicro.com>
Subject: RE: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
Date: Fri, 7 Jul 2023 12:27:56 +0000
Message-ID: <VI1PR08MB53258470C38896D93FD58B5CFF2DA@VI1PR08MB5325.eurprd08.prod.outlook.com>
In-Reply-To: <ZKgC/9WHxvuD3qdA@kam.mff.cuni.cz>
Hi Both,
Thanks for all the reviews/patches so far 😊
> >
> > Looks good, but I wonder what we can do to at least make the multiple
> > exit case behave reasonably? The vectorizer keeps track
> > of a "canonical" exit, would it be possible to pass in the main exit
> > edge and use that instead of single_exit (), would other exits then
> > behave somewhat reasonable or would we totally screw things up here?
> > That is, the "canonical" exit would be the counting exit while the
> > other exits are on data driven conditions and thus wouldn't change
> > probability when we reduce the number of iterations(?)
>
> I can add a canonical_exit parameter and make the function direct flow to it if
> possible. However, overall I think the fixup depends on what transformation led to
> the change.
>
> Assuming that the vectorizer did no prologues and epilogues and we vectorized
> with factor N, then I think the update could be done more specifically as
> follows.
>
If it helps, this patch series addresses multiple exits by forcing a scalar
epilogue: all non-canonical exits are redirected to this scalar
epilogue, so the remaining scalar iteration count will be at most VF.
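
To make the "at most VF" bound concrete, here is a minimal scalar model in C.
It is only a sketch, not the vectorizer's actual codegen: the search loop, the
VF of 4 and the names find_first and epilogue_iters are all assumptions made
for illustration. The main loop inspects whole VF-sized chunks; as soon as a
chunk would take the data-driven exit, or fewer than VF elements remain,
control moves to the scalar epilogue, which handles that chunk one element at
a time.

```c
#include <stddef.h>

#define VF 4  /* assumed vectorization factor, for illustration only */

/* Hypothetical model: return the index of the first element equal to KEY,
   or N if there is none.  *EPILOGUE_ITERS receives the number of scalar
   epilogue iterations that were executed.  */
size_t
find_first (const int *a, size_t n, int key, size_t *epilogue_iters)
{
  size_t i = 0;

  /* "Vector" main loop: inspect one VF-sized chunk per iteration.  */
  while (n - i >= VF)
    {
      int hit = 0;
      for (size_t lane = 0; lane < VF; lane++)
        hit |= (a[i + lane] == key);
      if (hit)
        break;      /* non-canonical exit: go to the scalar epilogue */
      i += VF;
    }

  /* Scalar epilogue: either finds the hit inside the current chunk or
     finishes the fewer-than-VF trailing elements.  */
  *epilogue_iters = 0;
  for (; i < n; i++)
    {
      ++*epilogue_iters;
      if (a[i] == key)
        break;
    }
  return i;
}
```

In this model the epilogue never runs more than VF iterations: a hit is found
within the chunk that triggered the exit, and a miss leaves fewer than VF
trailing elements.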
Regards,
Tamar
> We know that header block count dropped by 4. So we can start from that
> and each time we reach a basic block with an exit edge, we know the original count
> of the edge. This count is unchanged, so one can rescale probabilities out of
> that BB accordingly. If the loop has no inner loops, we can just walk the body in
> RPO and propagate scales downwards and we should arrive at the right result.
>
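
The walk described above could be sketched roughly as follows. This is a
simplified model in C, not GCC's profile code: it assumes the loop body is a
straight-line chain of blocks already in RPO order with at most one exit edge
each, it reads "dropped by 4" as the header count being divided by the
vectorization factor, and struct block and rescale_exits are hypothetical
names.

```c
#include <stddef.h>

/* Hypothetical, simplified model of a loop body: a straight-line chain of
   blocks in RPO order, each with at most one exit edge.  */
struct block
{
  double exit_count;  /* profile count of the exit edge (0 if none); unchanged */
  double count;       /* out: rescaled block count */
  double exit_prob;   /* out: rescaled probability of leaving via the exit */
};

/* Propagate NEW_HEADER_COUNT down the chain and recompute the exit
   probabilities.  Returns the count that reaches the latch.  */
double
rescale_exits (struct block *bbs, size_t n, double new_header_count)
{
  double count = new_header_count;
  for (size_t i = 0; i < n; i++)
    {
      double exit_count = bbs[i].exit_count;
      if (exit_count > count)
        exit_count = count;   /* cap: cannot leave more often than we arrive */
      bbs[i].count = count;
      bbs[i].exit_prob = count > 0 ? exit_count / count : 0.0;
      count -= exit_count;
    }
  return count;
}
```

For example, with made-up numbers: a header count of 1000 scaled down to 250
and exit edges with counts 4 and 6 gives exit probabilities of 4/250 and
6/246, and 240 reaches the latch. The exit edge counts themselves stay the
same; only the probabilities out of each block change.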
> I originally added the bound parameter to handle prologues/epilogues, which
> get a new artificial bound. In the prologue I think you are right that the flow will
> probably be directed to the conditional counting iterations.
>
> In the epilogue we add no artificial iteration cap, so maybe it is more realistic to
> simply scale up the probability of all exits?
>
> To see what is going on I tried the following testcase:
>
> int a[99];
> void test()
> {
>   for (int i = 0; i < 99; i++)
>     a[i]++;
> }
>
> What surprises me is that the vectorizer at -O2 does nothing and we end up
> unrolling the loop:
>
> L2:
> addl $1, (%rax)
> addl $1, 4(%rax)
> addl $1, 8(%rax)
> addq $12, %rax
> cmpq $a+396, %rax
>
> Which seems a silly thing to do. A vectorized loop with an epilogue doing 2 and
> 1 additions would be better.
>
> With -O3 we vectorize it:
>
>
> .L2:
> movdqa (%rax), %xmm0
> addq $16, %rax
> paddd %xmm1, %xmm0
> movaps %xmm0, -16(%rax)
> cmpq %rax, %rdx
> jne .L2
> movq a+384(%rip), %xmm0
> addl $1, a+392(%rip)
> movq .LC1(%rip), %xmm1
> paddd %xmm1, %xmm0
> movq %xmm0, a+384(%rip)
>
>
> and correctly drop the vectorized loop body to 24 iterations. However the
> epilogue has a loop for vector size 2 predicted to iterate once (it won't)
>
> ;; basic block 7, loop depth 0, count 10737416 (estimated locally), maybe hot
> ;; prev block 5, next block 8, flags: (NEW, VISITED)
> ;; pred: 3 [4.0% (adjusted)] count:10737416 (estimated locally) (FALSE_VALUE,EXECUTABLE)
> ;; succ: 8 [always] count:10737416 (estimated locally) (FALLTHRU,EXECUTABLE)
>
> ;; basic block 8, loop depth 1, count 21474835 (estimated locally), maybe hot
> ;; prev block 7, next block 9, flags: (NEW, REACHABLE, VISITED)
> ;; pred: 9 [always] count:10737417 (estimated locally) (FALLTHRU,DFS_BACK,EXECUTABLE)
> ;; 7 [always] count:10737416 (estimated locally) (FALLTHRU,EXECUTABLE)
> # i_9 = PHI <i_17(9), 96(7)>
> # ivtmp_13 = PHI <ivtmp_18(9), 3(7)>
> # vectp_a.14_40 = PHI <vectp_a.14_41(9), &MEM <int[99]> [(void *)&a + 384B](7)>
> # vectp_a.18_46 = PHI <vectp_a.18_47(9), &MEM <int[99]> [(void *)&a + 384B](7)>
> # ivtmp_49 = PHI <ivtmp_50(9), 0(7)>
> # ivtmp_49 = PHI <ivtmp_50(9), 0(7)>
> vect__14.16_42 = MEM <vector(2) int> [(int *)vectp_a.14_40];
> _14 = a[i_9];
> vect__15.17_44 = vect__14.16_42 + { 1, 1 };
> _15 = _14 + 1;
> MEM <vector(2) int> [(int *)vectp_a.18_46] = vect__15.17_44;
> i_17 = i_9 + 1;
> ivtmp_18 = ivtmp_13 - 1;
> vectp_a.14_41 = vectp_a.14_40 + 8;
> vectp_a.18_47 = vectp_a.18_46 + 8;
> ivtmp_50 = ivtmp_49 + 1;
> if (ivtmp_50 < 1)
> goto <bb 9>; [50.00%]
> else
> goto <bb 12>; [50.00%]
>
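
As a cross-check of the numbers in the dump above, the split of the 99
iterations can be worked out with a few lines of C. This is only an
arithmetic sketch: split_iterations and struct split are hypothetical names,
and the vectorization factors 4 (main loop) and 2 (vector epilogue) are read
off the quoted assembly and GIMPLE.

```c
/* Split NITERS scalar iterations into main vector loop, vector epilogue
   and scalar remainder.  All names are illustrative.  */
struct split
{
  unsigned main_iters;      /* iterations of the main vector loop */
  unsigned epilogue_iters;  /* iterations of the vector epilogue */
  unsigned scalar_iters;    /* iterations left for the scalar loop */
};

struct split
split_iterations (unsigned niters, unsigned main_vf, unsigned epilogue_vf)
{
  struct split s;
  unsigned rest = niters % main_vf;   /* left over after the main loop */
  s.main_iters = niters / main_vf;
  s.epilogue_iters = rest / epilogue_vf;
  s.scalar_iters = rest % epilogue_vf;
  return s;
}
```

Calling split_iterations (99, 4, 2) gives 24 main iterations, 1 vector
epilogue iteration and 1 scalar iteration. The epilogue therefore starts at
element 96, which with a 4-byte int is byte offset 384, matching the
`96(7)` and `&a + 384B` PHI arguments, and its body runs exactly once, so
the back edge is never taken.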
> and finally the scalar copy
>
> ;; basic block 12, loop depth 0, count 10737416 (estimated locally), maybe hot
> ;; prev block 9, next block 13, flags: (NEW, VISITED)
> ;; pred: 8 [50.0% (adjusted)] count:10737418 (estimated locally) (FALSE_VALUE,EXECUTABLE)
> ;; succ: 13 [always] count:10737416 (estimated locally) (FALLTHRU)
>
> ;; basic block 13, loop depth 1, count 1063004409 (estimated locally), maybe hot
> ;; prev block 12, next block 14, flags: (NEW, REACHABLE, VISITED)
> ;; pred: 14 [always] count:1052266996 (estimated locally) (FALLTHRU,DFS_BACK,EXECUTABLE)
> ;; 12 [always] count:10737416 (estimated locally) (FALLTHRU)
> # i_30 = PHI <i_36(14), 98(12)>
> # ivtmp_32 = PHI <ivtmp_37(14), 1(12)>
> _33 = a[i_30];
> _34 = _33 + 1;
> a[i_30] = _34;
> i_36 = i_30 + 1;
> ivtmp_37 = ivtmp_32 - 1;
> if (ivtmp_37 != 0)
> goto <bb 14>; [98.99%]
> else
> goto <bb 4>; [1.01%]
>
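
The counts in the two dumps can be turned into "executions per entry" with a
small helper. This is a sketch with hypothetical function names; it only
divides the header count by the preheader count and derives the matching
back-edge probability.

```c
/* Average number of times a loop header executes per entry into the loop,
   estimated from profile counts.  */
double
executions_per_entry (double header_count, double preheader_count)
{
  return header_count / preheader_count;
}

/* Back-edge probability that corresponds to EXECUTIONS header executions
   per entry: one of them leaves the loop, the rest take the back edge.  */
double
backedge_probability (double executions)
{
  return 1.0 - 1.0 / executions;
}
```

For bb 8 this gives 21474835 / 10737416, about 2 executions and a 50%
back edge; for bb 13 it gives 1063004409 / 10737416, about 99 executions and
a 98.99% back edge. Both agree with the probabilities printed in the dumps,
while the constants in the same dumps (ivtmp_49 starting at 0 with the test
`ivtmp_50 < 1`, and ivtmp_32 starting at 1) indicate that each body actually
runs once.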
> Also with a small but non-zero iteration probability. This is papered
> over by my patch from yesterday. But it seems to me that it would be a lot better if
> the vectorizer understood that the epilogue will be loopless and accounted for it in
> the cost model; that would probably make it easy to enable it at cheap costs
> too.
>
> Clang 16 at -O2 is much more aggressive by both vectorizing and unrolling:
>
> test: # @test
> .cfi_startproc
> # %bb.0:
> movdqa a(%rip), %xmm1
> pcmpeqd %xmm0, %xmm0
> psubd %xmm0, %xmm1
> movdqa %xmm1, a(%rip)
> movdqa a+16(%rip), %xmm1
> psubd %xmm0, %xmm1
> movdqa %xmm1, a+16(%rip)
> movdqa a+32(%rip), %xmm1
> psubd %xmm0, %xmm1
> movdqa %xmm1, a+32(%rip)
> movdqa a+48(%rip), %xmm1
> psubd %xmm0, %xmm1
> movdqa %xmm1, a+48(%rip)
> movdqa a+64(%rip), %xmm1
> psubd %xmm0, %xmm1
> movdqa %xmm1, a+64(%rip)
> movdqa a+80(%rip), %xmm1
> psubd %xmm0, %xmm1
> movdqa %xmm1, a+80(%rip)
> movdqa a+96(%rip), %xmm1
> psubd %xmm0, %xmm1
> movdqa %xmm1, a+96(%rip)
> movdqa a+112(%rip), %xmm1
> psubd %xmm0, %xmm1
> ....
> movdqa %xmm1, a+240(%rip)
> movdqa a+256(%rip), %xmm1
> psubd %xmm0, %xmm1
> movdqa %xmm1, a+256(%rip)
> movdqa a+272(%rip), %xmm1
> psubd %xmm0, %xmm1
> movdqa %xmm1, a+272(%rip)
> movdqa a+288(%rip), %xmm1
> psubd %xmm0, %xmm1
> movdqa %xmm1, a+288(%rip)
> movdqa a+304(%rip), %xmm1
> psubd %xmm0, %xmm1
> movdqa %xmm1, a+304(%rip)
> movdqa a+320(%rip), %xmm1
> psubd %xmm0, %xmm1
> movdqa %xmm1, a+320(%rip)
> movdqa a+336(%rip), %xmm1
> psubd %xmm0, %xmm1
> movdqa %xmm1, a+336(%rip)
> movdqa a+352(%rip), %xmm1
> psubd %xmm0, %xmm1
> movdqa %xmm1, a+352(%rip)
> movdqa a+368(%rip), %xmm1
> psubd %xmm0, %xmm1
> movdqa %xmm1, a+368(%rip)
> addl $1, a+384(%rip)
> addl $1, a+388(%rip)
> addl $1, a+392(%rip)
> retq
>
> Honza