public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Richard Biener <rguenther@suse.de>
To: Tamar Christina <tamar.christina@arm.com>
Cc: gcc-patches@gcc.gnu.org, nd@arm.com
Subject: Re: [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization
Date: Mon, 6 Nov 2023 14:25:15 +0000 (UTC)	[thread overview]
Message-ID: <nycvar.YFH.7.77.849.2311061418210.8772@jbgna.fhfr.qr> (raw)
In-Reply-To: <patch-17494-tamar@arm.com>

On Mon, 6 Nov 2023, Tamar Christina wrote:

> Hi All,
> 
> This patch adds initial support for early break vectorization in GCC.
> The support is added for any target that implements a vector cbranch optab,
> this includes both fully masked and non-masked targets.
> 
> Depending on the operation, the vectorizer may also require support for boolean
> mask reductions using Inclusive OR.  This is however only checked then the
> comparison would produce multiple statements.
> 
> Note: I am currently struggling to get patch 7 correct in all cases and could use
>       some feedback there.
> 
> Concretely the kind of loops supported are of the forms:
> 
>  for (int i = 0; i < N; i++)
>  {
>    <statements1>
>    if (<condition>)
>      {
>        ...
>        <action>;
>      }
>    <statements2>
>  }
> 
> where <action> can be:
>  - break
>  - return
>  - goto
> 
> Any number of statements can be used before the <action> occurs.
> 
> Since this is an initial version for GCC 14 it has the following limitations and
> features:
> 
> - Only fixed sized iterations and buffers are supported.  That is to say any
>   vectors loaded or stored must be to statically allocated arrays with known
>   sizes. N must also be known.  This limitation is because our primary target
>   for this optimization is SVE.  For VLA SVE we can't easily do cross page
>   iteraion checks. The result is likely to also not be beneficial. For that
>   reason we punt support for variable buffers till we have First-Faulting
>   support in GCC.
> - any stores in <statements1> should not be to the same objects as in
>   <condition>.  Loads are fine as long as they don't have the possibility to
>   alias.  More concretely, we block RAW dependencies when the intermediate value
>   can't be separated fromt the store, or the store itself can't be moved.
> - Prologue peeling, alignment peelinig and loop versioning are supported.
> - Fully masked loops, unmasked loops and partially masked loops are supported
> - Any number of loop early exits are supported.
> - No support for epilogue vectorization.  The only epilogue supported is the
>   scalar final one.  Peeling code supports it but the code motion code cannot
>   find instructions to make the move in the epilog.
> - Early breaks are only supported for inner loop vectorization.
> 
> I have pushed a branch to refs/users/tnfchris/heads/gcc-14-early-break
> 
> With the help of IPA and LTO this still gets hit quite often.  During bootstrap
> it hit rather frequently.  Additionally TSVC s332, s481 and s482 all pass now
> since these are tests for support for early exit vectorization.
> 
> This implementation does not support completely handling the early break inside
> the vector loop itself but instead supports adding checks such that if we know
> that we have to exit in the current iteration then we branch to scalar code to
> actually do the final VF iterations which handles all the code in <action>.
> 
> For the scalar loop we know that whatever exit you take you have to perform at
> most VF iterations.  For vector code we only case about the state of fully
> performed iteration and reset the scalar code to the (partially) remaining loop.
> 
> That is to say, the first vector loop executes so long as the early exit isn't
> needed.  Once the exit is taken, the scalar code will perform at most VF extra
> iterations.  The exact number depending on peeling and iteration start and which
> exit was taken (natural or early).   For this scalar loop, all early exits are
> treated the same.
> 
> When we vectorize we move any statement not related to the early break itself
> and that would be incorrect to execute before the break (i.e. has side effects)
> to after the break.  If this is not possible we decline to vectorize.
> 
> This means that we check at the start of iterations whether we are going to exit
> or not.  During the analyis phase we check whether we are allowed to do this
> moving of statements.  Also note that we only move the scalar statements, but
> only do so after peeling but just before we start transforming statements.
> 
> Codegen:
> 
> for e.g.
> 
> #define N 803
> unsigned vect_a[N];
> unsigned vect_b[N];
> 
> unsigned test4(unsigned x)
> {
>  unsigned ret = 0;
>  for (int i = 0; i < N; i++)
>  {
>    vect_b[i] = x + i;
>    if (vect_a[i] > x)
>      break;
>    vect_a[i] = x;
> 
>  }
>  return ret;
> }
> 
> We generate for Adv. SIMD:
> 
> test4:
>         adrp    x2, .LC0
>         adrp    x3, .LANCHOR0
>         dup     v2.4s, w0
>         add     x3, x3, :lo12:.LANCHOR0
>         movi    v4.4s, 0x4
>         add     x4, x3, 3216
>         ldr     q1, [x2, #:lo12:.LC0]
>         mov     x1, 0
>         mov     w2, 0
>         .p2align 3,,7
> .L3:
>         ldr     q0, [x3, x1]
>         add     v3.4s, v1.4s, v2.4s
>         add     v1.4s, v1.4s, v4.4s
>         cmhi    v0.4s, v0.4s, v2.4s
>         umaxp   v0.4s, v0.4s, v0.4s
>         fmov    x5, d0
>         cbnz    x5, .L6
>         add     w2, w2, 1
>         str     q3, [x1, x4]
>         str     q2, [x3, x1]
>         add     x1, x1, 16
>         cmp     w2, 200
>         bne     .L3
>         mov     w7, 3
> .L2:
>         lsl     w2, w2, 2
>         add     x5, x3, 3216
>         add     w6, w2, w0
>         sxtw    x4, w2
>         ldr     w1, [x3, x4, lsl 2]
>         str     w6, [x5, x4, lsl 2]
>         cmp     w0, w1
>         bcc     .L4
>         add     w1, w2, 1
>         str     w0, [x3, x4, lsl 2]
>         add     w6, w1, w0
>         sxtw    x1, w1
>         ldr     w4, [x3, x1, lsl 2]
>         str     w6, [x5, x1, lsl 2]
>         cmp     w0, w4
>         bcc     .L4
>         add     w4, w2, 2
>         str     w0, [x3, x1, lsl 2]
>         sxtw    x1, w4
>         add     w6, w1, w0
>         ldr     w4, [x3, x1, lsl 2]
>         str     w6, [x5, x1, lsl 2]
>         cmp     w0, w4
>         bcc     .L4
>         str     w0, [x3, x1, lsl 2]
>         add     w2, w2, 3
>         cmp     w7, 3
>         beq     .L4
>         sxtw    x1, w2
>         add     w2, w2, w0
>         ldr     w4, [x3, x1, lsl 2]
>         str     w2, [x5, x1, lsl 2]
>         cmp     w0, w4
>         bcc     .L4
>         str     w0, [x3, x1, lsl 2]
> .L4:
>         mov     w0, 0
>         ret
>         .p2align 2,,3
> .L6:
>         mov     w7, 4
>         b       .L2
> 
> and for SVE:
> 
> test4:
>         adrp    x2, .LANCHOR0
>         add     x2, x2, :lo12:.LANCHOR0
>         add     x5, x2, 3216
>         mov     x3, 0
>         mov     w1, 0
>         cntw    x4
>         mov     z1.s, w0
>         index   z0.s, #0, #1
>         ptrue   p1.b, all
>         ptrue   p0.s, all
>         .p2align 3,,7
> .L3:
>         ld1w    z2.s, p1/z, [x2, x3, lsl 2]
>         add     z3.s, z0.s, z1.s
>         cmplo   p2.s, p0/z, z1.s, z2.s
>         b.any   .L2
>         st1w    z3.s, p1, [x5, x3, lsl 2]
>         add     w1, w1, 1
>         st1w    z1.s, p1, [x2, x3, lsl 2]
>         add     x3, x3, x4
>         incw    z0.s
>         cmp     w3, 803
>         bls     .L3
> .L5:
>         mov     w0, 0
>         ret
>         .p2align 2,,3
> .L2:
>         cntw    x5
>         mul     w1, w1, w5
>         cbz     w5, .L5
>         sxtw    x1, w1
>         sub     w5, w5, #1
>         add     x5, x5, x1
>         add     x6, x2, 3216
>         b       .L6
>         .p2align 2,,3
> .L14:
>         str     w0, [x2, x1, lsl 2]
>         cmp     x1, x5
>         beq     .L5
>         mov     x1, x4
> .L6:
>         ldr     w3, [x2, x1, lsl 2]
>         add     w4, w0, w1
>         str     w4, [x6, x1, lsl 2]
>         add     x4, x1, 1
>         cmp     w0, w3
>         bcs     .L14
>         mov     w0, 0
>         ret
> 
> On the workloads this work is based on we see between 2-3x performance uplift
> using this patch.
> 
> Follow up plan:
>  - Boolean vectorization has several shortcomings.  I've filed PR110223 with the
>    bigger ones that cause vectorization to fail with this patch.
>  - SLP support.  This is planned for GCC 15 as for majority of the cases build
>    SLP itself fails.

It would be nice to get at least single-lane SLP support working.  I think
you need to treat the gcond as SLP root stmt and basically do discovery
on the condition as to as if it were a mask generating condition.

Code generation would then simply schedule the gcond root instances
first (that would get you the code motion automagically).

So, add a new slp_instance_kind, for example slp_inst_kind_early_break,
and record the gcond as root stmt.  Possibly "pattern" recognizing

 gcond <_1 != _2>

as

 _mask = _1 != _2;
 gcond <_mask != 0>

makes the SLP discovery less fiddly (but in theory you can of course
handle gconds directly).

Is there any part of the series that can be pushed independelty?  If
so I'll try to look at those parts first.

Thanks,
Richard.

  parent reply	other threads:[~2023-11-06 14:25 UTC|newest]

Thread overview: 200+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-28 13:40 [PATCH v5 0/19] " Tamar Christina
2023-06-28 13:41 ` [PATCH 1/19]middle-end ifcvt: Support bitfield lowering of multiple-exit loops Tamar Christina
2023-07-04 11:29   ` Richard Biener
2023-06-28 13:41 ` [PATCH 2/19][front-end] C/C++ front-end: add pragma GCC novector Tamar Christina
2023-06-29 22:17   ` Jason Merrill
2023-06-30 16:18     ` Tamar Christina
2023-06-30 16:44       ` Jason Merrill
2023-06-28 13:42 ` [PATCH 3/19]middle-end clean up vect testsuite using pragma novector Tamar Christina
2023-06-28 13:54   ` Tamar Christina
2023-07-04 11:31   ` Richard Biener
2023-06-28 13:43 ` [PATCH 4/19]middle-end: Fix scale_loop_frequencies segfault on multiple-exits Tamar Christina
2023-07-04 11:52   ` Richard Biener
2023-07-04 14:57     ` Jan Hubicka
2023-07-06 14:34       ` Jan Hubicka
2023-07-07  5:59         ` Richard Biener
2023-07-07 12:20           ` Jan Hubicka
2023-07-07 12:27             ` Tamar Christina
2023-07-07 14:10               ` Jan Hubicka
2023-07-10  7:07             ` Richard Biener
2023-07-10  8:33               ` Jan Hubicka
2023-07-10  9:24                 ` Richard Biener
2023-07-10  9:23               ` Jan Hubicka
2023-07-10  9:29                 ` Richard Biener
2023-07-11  9:28                   ` Jan Hubicka
2023-07-11 10:31                     ` Richard Biener
2023-07-11 12:40                       ` Jan Hubicka
2023-07-11 13:04                         ` Richard Biener
2023-06-28 13:43 ` [PATCH 5/19]middle-end: Enable bit-field vectorization to work correctly when we're vectoring inside conds Tamar Christina
2023-07-04 12:05   ` Richard Biener
2023-07-10 15:32     ` Tamar Christina
2023-07-11 11:03       ` Richard Biener
2023-06-28 13:44 ` [PATCH 6/19]middle-end: Don't enter piecewise expansion if VF is not constant Tamar Christina
2023-07-04 12:10   ` Richard Biener
2023-07-06 10:37     ` Tamar Christina
2023-07-06 10:51       ` Richard Biener
2023-06-28 13:44 ` [PATCH 7/19]middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables Tamar Christina
2023-07-13 11:32   ` Richard Biener
2023-07-13 11:54     ` Tamar Christina
2023-07-13 12:10       ` Richard Biener
2023-06-28 13:45 ` [PATCH 8/19]middle-end: updated niters analysis to handle multiple exits Tamar Christina
2023-07-13 11:49   ` Richard Biener
2023-07-13 12:03     ` Tamar Christina
2023-07-14  9:09     ` Richard Biener
2023-06-28 13:45 ` [PATCH 9/19]AArch64 middle-end: refactor vectorizable_comparison to make the main body re-usable Tamar Christina
2023-06-28 13:55   ` [PATCH 9/19] " Tamar Christina
2023-07-13 16:23     ` Richard Biener
2023-06-28 13:46 ` [PATCH 10/19]middle-end: implement vectorizable_early_break Tamar Christina
2023-06-28 13:46 ` [PATCH 11/19]middle-end: implement code motion for early break Tamar Christina
2023-06-28 13:47 ` [PATCH 12/19]middle-end: implement loop peeling and IV updates " Tamar Christina
2023-07-13 17:31   ` Richard Biener
2023-07-13 19:05     ` Tamar Christina
2023-07-14 13:34       ` Richard Biener
2023-07-17 10:56         ` Tamar Christina
2023-07-17 12:48           ` Richard Biener
2023-08-18 11:35         ` Tamar Christina
2023-08-18 12:53           ` Richard Biener
2023-08-18 13:12             ` Tamar Christina
2023-08-18 13:15               ` Richard Biener
2023-10-23 20:21         ` Tamar Christina
2023-06-28 13:47 ` [PATCH 13/19]middle-end testsuite: un-xfail TSVC loops that check for exit control flow vectorization Tamar Christina
2023-06-28 13:47 ` [PATCH 14/19]middle-end testsuite: Add new tests for early break vectorization Tamar Christina
2023-06-28 13:48 ` [PATCH 15/19]AArch64: Add implementation for vector cbranch for Advanced SIMD Tamar Christina
2023-06-28 13:48 ` [PATCH 16/19]AArch64 Add optimization for vector != cbranch fed into compare with 0 " Tamar Christina
2023-06-28 13:48 ` [PATCH 17/19]AArch64 Add optimization for vector cbranch combining SVE and " Tamar Christina
2023-06-28 13:49 ` [PATCH 18/19]Arm: Add Advanced SIMD cbranch implementation Tamar Christina
2023-06-28 13:50 ` [PATCH 19/19]Arm: Add MVE " Tamar Christina
     [not found] ` <MW5PR11MB5908414D8B2AB0580A888ECAA924A@MW5PR11MB5908.namprd11.prod.outlook.com>
2023-06-28 14:49   ` FW: [PATCH v5 0/19] Support early break/return auto-vectorization 钟居哲
2023-06-28 16:00     ` Tamar Christina
2023-11-06  7:36 ` [PATCH v6 0/21]middle-end: " Tamar Christina
2023-11-06  7:37 ` [PATCH 1/21]middle-end testsuite: Add more pragma novector to new tests Tamar Christina
2023-11-07  9:46   ` Richard Biener
2023-11-06  7:37 ` [PATCH 2/21]middle-end testsuite: Add tests for early break vectorization Tamar Christina
2023-11-07  9:52   ` Richard Biener
2023-11-16 10:53     ` Richard Biener
2023-11-06  7:37 ` [PATCH 3/21]middle-end: Implement code motion and dependency analysis for early breaks Tamar Christina
2023-11-07 10:53   ` Richard Biener
2023-11-07 11:34     ` Tamar Christina
2023-11-07 14:23       ` Richard Biener
2023-12-19 10:11         ` Tamar Christina
2023-12-19 14:05           ` Richard Biener
2023-12-20 10:51             ` Tamar Christina
2023-12-20 12:24               ` Richard Biener
2023-11-06  7:38 ` [PATCH 4/21]middle-end: update loop peeling code to maintain LCSSA form " Tamar Christina
2023-11-15  0:00   ` Tamar Christina
2023-11-15 12:40     ` Richard Biener
2023-11-20 21:51       ` Tamar Christina
2023-11-24 10:16         ` Tamar Christina
2023-11-24 12:38           ` Richard Biener
2023-11-06  7:38 ` [PATCH 5/21]middle-end: update vectorizer's control update to support picking an exit other than loop latch Tamar Christina
2023-11-07 15:04   ` Richard Biener
2023-11-07 23:10     ` Tamar Christina
2023-11-13 20:11     ` Tamar Christina
2023-11-14  7:56       ` Richard Biener
2023-11-14  8:07         ` Tamar Christina
2023-11-14 23:59           ` Tamar Christina
2023-11-15 12:14             ` Richard Biener
2023-11-06  7:38 ` [PATCH 6/21]middle-end: support multiple exits in loop versioning Tamar Christina
2023-11-07 14:54   ` Richard Biener
2023-11-06  7:39 ` [PATCH 7/21]middle-end: update IV update code to support early breaks and arbitrary exits Tamar Christina
2023-11-15  0:03   ` Tamar Christina
2023-11-15 13:01     ` Richard Biener
2023-11-15 13:09       ` Tamar Christina
2023-11-15 13:22         ` Richard Biener
2023-11-15 14:14           ` Tamar Christina
2023-11-16 10:40             ` Richard Biener
2023-11-16 11:08               ` Tamar Christina
2023-11-16 11:27                 ` Richard Biener
2023-11-16 12:01                   ` Tamar Christina
2023-11-16 12:30                     ` Richard Biener
2023-11-16 13:22                       ` Tamar Christina
2023-11-16 13:35                         ` Richard Biener
2023-11-16 14:14                           ` Tamar Christina
2023-11-16 14:17                             ` Richard Biener
2023-11-16 15:19                               ` Tamar Christina
2023-11-16 18:41                                 ` Tamar Christina
2023-11-17 10:40                                   ` Tamar Christina
2023-11-17 12:13                                     ` Richard Biener
2023-11-20 21:54                                       ` Tamar Christina
2023-11-24 10:18                                         ` Tamar Christina
2023-11-24 12:41                                           ` Richard Biener
2023-11-06  7:39 ` [PATCH 8/21]middle-end: update vectorizable_live_reduction with support for multiple exits and different exits Tamar Christina
2023-11-15  0:05   ` Tamar Christina
2023-11-15 13:41     ` Richard Biener
2023-11-15 14:26       ` Tamar Christina
2023-11-16 11:16         ` Richard Biener
2023-11-20 21:57           ` Tamar Christina
2023-11-24 10:20             ` Tamar Christina
2023-11-24 13:23               ` Richard Biener
2023-11-27 22:47                 ` Tamar Christina
2023-11-29 13:28                   ` Richard Biener
2023-11-29 21:22                     ` Tamar Christina
2023-11-30 13:23                       ` Richard Biener
2023-12-06  4:21                         ` Tamar Christina
2023-12-06  9:33                           ` Richard Biener
2023-11-06  7:39 ` [PATCH 9/21]middle-end: implement vectorizable_early_exit for codegen of exit code Tamar Christina
2023-11-27 22:49   ` Tamar Christina
2023-11-29 13:50     ` Richard Biener
2023-12-06  4:37       ` Tamar Christina
2023-12-06  9:37         ` Richard Biener
2023-12-08  8:58           ` Tamar Christina
2023-12-08 10:28             ` Richard Biener
2023-12-08 13:45               ` Tamar Christina
2023-12-08 13:59                 ` Richard Biener
2023-12-08 15:01                   ` Tamar Christina
2023-12-11  7:09                   ` Tamar Christina
2023-12-11  9:36                     ` Richard Biener
2023-12-11 23:12                       ` Tamar Christina
2023-12-12 10:10                         ` Richard Biener
2023-12-12 10:27                           ` Tamar Christina
2023-12-12 10:59                           ` Richard Sandiford
2023-12-12 11:30                             ` Richard Biener
2023-12-13 14:13                               ` Tamar Christina
2023-12-14 13:12                                 ` Richard Biener
2023-12-14 18:44                                   ` Tamar Christina
2023-11-06  7:39 ` [PATCH 10/21]middle-end: implement relevancy analysis support for control flow Tamar Christina
2023-11-27 22:49   ` Tamar Christina
2023-11-29 14:47     ` Richard Biener
2023-12-06  4:10       ` Tamar Christina
2023-12-06  9:44         ` Richard Biener
2023-11-06  7:40 ` [PATCH 11/21]middle-end: wire through peeling changes and dominator updates after guard edge split Tamar Christina
2023-11-06  7:40 ` [PATCH 12/21]middle-end: Add remaining changes to peeling and vectorizer to support early breaks Tamar Christina
2023-11-27 22:48   ` Tamar Christina
2023-12-06  8:31   ` Richard Biener
2023-12-06  9:10     ` Tamar Christina
2023-12-06  9:27       ` Richard Biener
2023-11-06  7:40 ` [PATCH 13/21]middle-end: Update loop form analysis to support early break Tamar Christina
2023-11-27 22:48   ` Tamar Christina
2023-12-06  4:00     ` Tamar Christina
2023-12-06  8:18   ` Richard Biener
2023-12-06  8:52     ` Tamar Christina
2023-12-06  9:15       ` Richard Biener
2023-12-06  9:29         ` Tamar Christina
2023-11-06  7:41 ` [PATCH 14/21]middle-end: Change loop analysis from looking at at number of BB to actual cfg Tamar Christina
2023-11-06 14:44   ` Richard Biener
2023-11-06  7:41 ` [PATCH 15/21]middle-end: [RFC] conditionally support forcing final edge for debugging Tamar Christina
2023-12-09 10:38   ` Richard Sandiford
2023-12-11  7:38     ` Richard Biener
2023-12-11  8:49       ` Tamar Christina
2023-12-11  9:00         ` Richard Biener
2023-11-06  7:41 ` [PATCH 16/21]middle-end testsuite: un-xfail TSVC loops that check for exit control flow vectorization Tamar Christina
2023-11-06  7:41 ` [PATCH 17/21]AArch64: Add implementation for vector cbranch for Advanced SIMD Tamar Christina
2023-11-28 16:37   ` Richard Sandiford
2023-11-28 17:55     ` Richard Sandiford
2023-12-06 16:25       ` Tamar Christina
2023-12-07  0:56         ` Richard Sandiford
2023-12-14 18:40           ` Tamar Christina
2023-12-14 19:34             ` Richard Sandiford
2023-11-06  7:42 ` [PATCH 18/21]AArch64: Add optimization for vector != cbranch fed into compare with 0 " Tamar Christina
2023-11-06  7:42 ` [PATCH 19/21]AArch64: Add optimization for vector cbranch combining SVE and " Tamar Christina
2023-11-06  7:42 ` [PATCH 20/21]Arm: Add Advanced SIMD cbranch implementation Tamar Christina
2023-11-27 12:48   ` Kyrylo Tkachov
2023-11-06  7:43 ` [PATCH 21/21]Arm: Add MVE " Tamar Christina
2023-11-27 12:47   ` Kyrylo Tkachov
2023-11-06 14:25 ` Richard Biener [this message]
2023-11-06 15:17   ` [PATCH v6 0/21]middle-end: Support early break/return auto-vectorization Tamar Christina
2023-11-07  9:42     ` Richard Biener
2023-11-07 10:47       ` Tamar Christina
2023-11-07 13:58         ` Richard Biener
2023-11-27 18:30           ` Richard Sandiford
2023-11-28  8:11             ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.YFH.7.77.849.2311061418210.8772@jbgna.fhfr.qr \
    --to=rguenther@suse.de \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=nd@arm.com \
    --cc=tamar.christina@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).