public inbox for gcc-patches@gcc.gnu.org
From: Jan Hubicka <hubicka@ucw.cz>
To: Richard Biener <rguenther@suse.de>
Cc: Tamar Christina <tamar.christina@arm.com>,
	gcc-patches@gcc.gnu.org, nd@arm.com, jlaw@ventanamicro.com
Subject: Re: [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits
Date: Fri, 7 Jul 2023 14:20:15 +0200
Message-ID: <ZKgC/9WHxvuD3qdA@kam.mff.cuni.cz>
In-Reply-To: <nycvar.YFH.7.77.849.2307070557080.4723@jbgna.fhfr.qr>

> 
> Looks good, but I wonder what we can do to at least make the
> multiple exit case behave reasonably?  The vectorizer keeps track
> of a "canonical" exit, would it be possible to pass in the main
> exit edge and use that instead of single_exit (), would other
> exits then behave somewhat reasonable or would we totally screw
> things up here?  That is, the "canonical" exit would be the
> counting exit while the other exits are on data driven conditions
> and thus wouldn't change probability when we reduce the number
> of iterations(?)

I can add a canonical_exit parameter and make the function direct flow
to it if possible.  However, overall I think the fixup depends on what
transformation led to the change.

Assuming that the vectorizer did no prologues and epilogues and we
vectorized with factor N, then I think the update could be done more
specifically as follows.

We know that the header block count dropped by a factor of N.  So we can
start from that, and each time we reach a basic block with an exit edge,
we know the original count of the edge.  This count is unchanged, so one
can rescale probabilities out of that BB accordingly.  If the loop has no
inner loops, we can just walk the body in RPO and propagate scales
downwards and we should arrive at the right result.
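
The walk described above can be sketched on a toy model.  This is only
an illustration and not GCC code: struct block_model, struct edge_model,
EXIT_DEST and rescale_after_vectorization are hypothetical names, counts
are plain doubles rather than profile_count, and the sketch assumes
blocks are stored in RPO with the header at index 0, no inner loops and
at most one exit edge per block.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 16
#define MAX_SUCCS 2
#define EXIT_DEST (-1)          /* Hypothetical marker: edge leaves the loop.  */

struct edge_model
{
  int dest;                     /* RPO index of destination, or EXIT_DEST.  */
  double prob;                  /* Branch probability in [0, 1].  */
};

struct block_model
{
  double count;                 /* Execution count of the block.  */
  int n_succs;
  struct edge_model succs[MAX_SUCCS];
};

/* Divide the header count by VF, keep the absolute count of every exit
   edge, and propagate the resulting counts through the body in RPO.  */
void
rescale_after_vectorization (struct block_model *bbs, size_t n, double vf)
{
  double in[MAX_BLOCKS] = { 0.0 };

  assert (n > 0 && n <= MAX_BLOCKS && vf >= 1.0);
  in[0] = bbs[0].count / vf;

  for (size_t i = 0; i < n; i++)
    {
      struct block_model *bb = &bbs[i];
      double old_count = bb->count;
      double new_count = in[i];
      double exit_prob = 0.0;   /* New probability of leaving the loop.  */
      double old_stay = 0.0;    /* Old probability of staying in the loop.  */
      int n_exits = 0;

      /* Exit edges keep their original count, so their probability grows.  */
      for (int s = 0; s < bb->n_succs; s++)
        {
          struct edge_model *e = &bb->succs[s];
          if (e->dest == EXIT_DEST)
            {
              double exit_count = old_count * e->prob;
              e->prob = new_count > 0.0 ? exit_count / new_count : 0.0;
              if (e->prob > 1.0)
                e->prob = 1.0;
              exit_prob = e->prob;
              n_exits++;
            }
          else
            old_stay += e->prob;
        }
      assert (n_exits <= 1);    /* Assumption of this sketch.  */

      /* Remaining edges share what is left, in their old proportions.  */
      for (int s = 0; s < bb->n_succs; s++)
        {
          struct edge_model *e = &bb->succs[s];
          if (e->dest == EXIT_DEST)
            continue;
          e->prob = old_stay > 0.0 ? e->prob * (1.0 - exit_prob) / old_stay : 0.0;
          /* Forward edges feed later blocks; the back edge is skipped.  */
          if (e->dest > (int) i)
            in[e->dest] += new_count * e->prob;
        }
      bb->count = new_count;
    }
}
```

For a header with count 1000 and a 1% exit edge followed by a latch
block, calling this with vf = 4 gives a header count of 250, an exit
probability of 4% and a latch count of 240.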

I originally added the bound parameter to handle prologues/epilogues,
which get a new artificial bound.  In the prologue I think you are right
that the flow will probably be directed to the conditional counting
iterations.

In the epilogue we add no artificial iteration cap, so maybe it is more
realistic to simply scale up the probability of all exits?

To see what is going on I tried the following testcase:

int a[99];
void test (void)
{
  for (int i = 0; i < 99; i++)
    a[i]++;
}

What surprises me is that the vectorizer at -O2 does nothing and we end
up unrolling the loop:

L2:
        addl    $1, (%rax)
        addl    $1, 4(%rax)
        addl    $1, 8(%rax)
        addq    $12, %rax
        cmpq    $a+396, %rax

Which seems a silly thing to do.  A vectorized loop with an epilogue
doing 2 and 1 additions would be better.

With -O3 we vectorize it:


.L2:
        movdqa  (%rax), %xmm0
        addq    $16, %rax
        paddd   %xmm1, %xmm0
        movaps  %xmm0, -16(%rax)
        cmpq    %rax, %rdx
        jne     .L2
        movq    a+384(%rip), %xmm0
        addl    $1, a+392(%rip)
        movq    .LC1(%rip), %xmm1
        paddd   %xmm1, %xmm0
        movq    %xmm0, a+384(%rip)


and correctly drops the vectorized loop body to 24 iterations.  However,
the epilogue has a loop for vector size 2 predicted to iterate once (it won't)

;;   basic block 7, loop depth 0, count 10737416 (estimated locally), maybe hot 
;;    prev block 5, next block 8, flags: (NEW, VISITED)                         
;;    pred:       3 [4.0% (adjusted)]  count:10737416 (estimated locally) (FALSE_VALUE,EXECUTABLE)
;;    succ:       8 [always]  count:10737416 (estimated locally) (FALLTHRU,EXECUTABLE)
                                                                                
;;   basic block 8, loop depth 1, count 21474835 (estimated locally), maybe hot 
;;    prev block 7, next block 9, flags: (NEW, REACHABLE, VISITED)              
;;    pred:       9 [always]  count:10737417 (estimated locally) (FALLTHRU,DFS_BACK,EXECUTABLE)
;;                7 [always]  count:10737416 (estimated locally) (FALLTHRU,EXECUTABLE)
  # i_9 = PHI <i_17(9), 96(7)>                                                  
  # ivtmp_13 = PHI <ivtmp_18(9), 3(7)>                                          
  # vectp_a.14_40 = PHI <vectp_a.14_41(9), &MEM <int[99]> [(void *)&a + 384B](7)>
  # vectp_a.18_46 = PHI <vectp_a.18_47(9), &MEM <int[99]> [(void *)&a + 384B](7)>
  # ivtmp_49 = PHI <ivtmp_50(9), 0(7)>                                          
  vect__14.16_42 = MEM <vector(2) int> [(int *)vectp_a.14_40];                  
  _14 = a[i_9];                                                                 
  vect__15.17_44 = vect__14.16_42 + { 1, 1 };                                   
  _15 = _14 + 1;                                                                
  MEM <vector(2) int> [(int *)vectp_a.18_46] = vect__15.17_44;                  
  i_17 = i_9 + 1;                                                               
  ivtmp_18 = ivtmp_13 - 1;                                                      
  vectp_a.14_41 = vectp_a.14_40 + 8;                                            
  vectp_a.18_47 = vectp_a.18_46 + 8;                                            
  ivtmp_50 = ivtmp_49 + 1;                                                      
  if (ivtmp_50 < 1)                                                             
    goto <bb 9>; [50.00%]                                                       
  else                                                                          
    goto <bb 12>; [50.00%]                                                      

and finally the scalar copy

;;   basic block 12, loop depth 0, count 10737416 (estimated locally), maybe hot
;;    prev block 9, next block 13, flags: (NEW, VISITED)                        
;;    pred:       8 [50.0% (adjusted)]  count:10737418 (estimated locally) (FALSE_VALUE,EXECUTABLE)
;;    succ:       13 [always]  count:10737416 (estimated locally) (FALLTHRU)    
                                                                                
;;   basic block 13, loop depth 1, count 1063004409 (estimated locally), maybe hot
;;    prev block 12, next block 14, flags: (NEW, REACHABLE, VISITED)            
;;    pred:       14 [always]  count:1052266996 (estimated locally) (FALLTHRU,DFS_BACK,EXECUTABLE)
;;                12 [always]  count:10737416 (estimated locally) (FALLTHRU)    
  # i_30 = PHI <i_36(14), 98(12)>                                               
  # ivtmp_32 = PHI <ivtmp_37(14), 1(12)>                                        
  _33 = a[i_30];                                                                
  _34 = _33 + 1;                                                                
  a[i_30] = _34;                                                                
  i_36 = i_30 + 1;                                                              
  ivtmp_37 = ivtmp_32 - 1;                                                      
  if (ivtmp_37 != 0)                                                            
    goto <bb 14>; [98.99%]                                                      
  else                                                                          
    goto <bb 4>; [1.01%]                                                        

This also has a small but non-zero iteration probability.  This is
papered over by my patch from yesterday.  But it seems to me that it
would be a lot better if the vectorizer understood that the epilogue
will be loopless and accounted for that in the cost model; that would
probably make it easy to enable it at cheap costs too.
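
As a rough cross-check of the numbers above, here is a small sketch of
the arithmetic.  The helper names split_iterations and
expected_iterations are made up for this illustration, and the second
helper assumes the simplest model, where the loop is left with the same
probability on every iteration.

```c
#include <assert.h>

struct iter_split
{
  int main_iters;               /* Iterations of the main vector loop.  */
  int epilogue_iters;           /* Iterations of the vector epilogue.  */
  int scalar_iters;             /* Iterations of the scalar tail.  */
};

/* Split NITERS scalar iterations between a main loop with vectorization
   factor VF, a vector epilogue with factor EPILOGUE_VF and a scalar
   tail.  */
struct iter_split
split_iterations (int niters, int vf, int epilogue_vf)
{
  struct iter_split s;

  assert (niters >= 0 && vf > 0 && epilogue_vf > 0);
  s.main_iters = niters / vf;
  int rest = niters % vf;
  s.epilogue_iters = rest / epilogue_vf;
  s.scalar_iters = rest % epilogue_vf;
  return s;
}

/* Expected number of header executions per loop entry when every
   iteration leaves the loop with probability EXIT_PROB.  */
double
expected_iterations (double exit_prob)
{
  assert (exit_prob > 0.0 && exit_prob <= 1.0);
  return 1.0 / exit_prob;
}
```

With niters = 99, vf = 4 and epilogue_vf = 2 the split is 24 + 1 + 1.
Under the simple model, the 50% and 1.01% exit probabilities in the
dumps correspond to about 2 and about 99 header executions per entry,
which is in line with the ratios of the block counts shown (bb 8 versus
bb 7, and bb 13 versus bb 12), while in reality each of the two epilogue
bodies runs once.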

Clang 16 at -O2 is much more aggressive, both vectorizing and unrolling:

test:                                   # @test
        .cfi_startproc
# %bb.0:
        movdqa  a(%rip), %xmm1
        pcmpeqd %xmm0, %xmm0
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a(%rip)
        movdqa  a+16(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+16(%rip)
        movdqa  a+32(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+32(%rip)
        movdqa  a+48(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+48(%rip)
        movdqa  a+64(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+64(%rip)
        movdqa  a+80(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+80(%rip)
        movdqa  a+96(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+96(%rip)
        movdqa  a+112(%rip), %xmm1
        psubd   %xmm0, %xmm1
....
        movdqa  %xmm1, a+240(%rip)
        movdqa  a+256(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+256(%rip)
        movdqa  a+272(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+272(%rip)
        movdqa  a+288(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+288(%rip)
        movdqa  a+304(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+304(%rip)
        movdqa  a+320(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+320(%rip)
        movdqa  a+336(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+336(%rip)
        movdqa  a+352(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+352(%rip)
        movdqa  a+368(%rip), %xmm1
        psubd   %xmm0, %xmm1
        movdqa  %xmm1, a+368(%rip)
        addl    $1, a+384(%rip)
        addl    $1, a+388(%rip)
        addl    $1, a+392(%rip)
        retq

Honza
