public inbox for gcc-bugs@sourceware.org
* [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code
@ 2023-12-25 11:29 juzhe.zhong at rivai dot ai
  2023-12-25 12:35 ` [Bug c/113134] " tnfchris at gcc dot gnu.org
                   ` (22 more replies)
  0 siblings, 23 replies; 24+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-12-25 11:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

            Bug ID: 113134
           Summary: Middle end early break vectorization: Fail to
                    vectorize a simple early break code
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

Hi, as the following reference shows:

https://compiler-explorer.com/z/zMzba7WT1

void add(int N, int *__restrict a, int *__restrict b, int *__restrict c) {
  for (int i = 0; i < N; i++) {
    c[i] = a[i] + b[i];
    if (i > 1000) {
        break;
    }
  }
}

GCC failed to vectorize it:

add:
        cmp     w0, 0
        ble     .L1
        sbfiz   x6, x0, 2, 32
        mov     x4, 0
.L3:
        ldr     w0, [x1, x4]
        ldr     w5, [x2, x4]
        add     w0, w0, w5
        str     w0, [x3, x4]
        cmp     x4, 4004
        beq     .L1
        add     x4, x4, 4
        cmp     x6, x4
        bne     .L3
.L1:
        ret

But clang is able to vectorize it:

add:                                    // @add
        cmp     w0, #1
        b.lt    .LBB0_8
        mov     w8, w0
        mov     w9, #1001                       // =0x3e9
        sub     x8, x8, #1
        cmp     x8, #1001
        csel    x9, x8, x9, lo
        add     x10, x9, #1
        cnth    x9
        cmp     x10, x9
        b.hs    .LBB0_3
        mov     x9, xzr
        b       .LBB0_6
.LBB0_3:
        ptrue   p0.s
        neg     x9, x9
        mov     x11, xzr
        and     x9, x10, x9
        addvl   x12, x1, #1
        addvl   x13, x2, #1
        addvl   x14, x3, #1
.LBB0_4:                                // =>This Inner Loop Header: Depth=1
        ld1w    { z0.s }, p0/z, [x1, x11, lsl #2]
        ld1w    { z1.s }, p0/z, [x2, x11, lsl #2]
        ld1w    { z2.s }, p0/z, [x12, x11, lsl #2]
        ld1w    { z3.s }, p0/z, [x13, x11, lsl #2]
        add     z0.s, z1.s, z0.s
        add     z1.s, z3.s, z2.s
        st1w    { z0.s }, p0, [x3, x11, lsl #2]
        st1w    { z1.s }, p0, [x14, x11, lsl #2]
        inch    x11
        cmp     x9, x11
        b.ne    .LBB0_4
        cmp     x10, x9
        b.eq    .LBB0_8
.LBB0_6:                                // =>This Inner Loop Header: Depth=1
        lsl     x10, x9, #2
        cmp     x9, #1001
        ldr     w11, [x1, x10]
        ldr     w12, [x2, x10]
        add     w11, w12, w11
        str     w11, [x3, x10]
        b.eq    .LBB0_8
        cmp     x8, x9
        add     x9, x9, #1
        b.ne    .LBB0_6
.LBB0_8:
        ret

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug c/113134] Middle end early break vectorization: Fail to vectorize a simple early break code
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
@ 2023-12-25 12:35 ` tnfchris at gcc dot gnu.org
  2023-12-27 15:21 ` [Bug c/113134] gcc does not version loops with side-effect early breaks tnfchris at gcc dot gnu.org
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-12-25 12:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-12-25
           Keywords|                            |missed-optimization

--- Comment #1 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Yeah, the patch wasn't intended to handle that case.

As the patch says, it requires the sizes of the memory objects being accessed
to be constant, i.e. https://compiler-explorer.com/z/87d61h3x5 already works
today.

My patch would allow this if we relaxed vect_analyze_early_break_dependences to
not worry about memory accesses when the condition itself does not access
memory.

However, I don't think this should get to my patch at all. This loop

void add(int N, int *__restrict a, int *__restrict b, int *__restrict c) {
  for (int i = 0; i < N; i++) {
    c[i] = a[i] + b[i];
    if (i > 1000) {
        break;
    }
  }
}

is essentially

void add2(int N, int *__restrict a, int *__restrict b, int *__restrict c) {
  for (int i = 0; i < 1001; i++) {
    c[i] = a[i] + b[i];
  }
}

and is what clang has rewritten it to.

GCC does the same in ivcanon but then doesn't realize the additional exit is
unreachable and should be removed.  So this case needs to be handled before it
reaches the vectorizer.
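
To make the "pointless exit" reasoning concrete, here is a minimal sketch
that counts how often the loop body runs and compares it with a closed form.
The names add_counting and add_trip_count are hypothetical and not part of
GCC. Counting by hand, the break fires only after the body has run for
i == 1001, so this sketch caps the count at 1002, one more than the bound
written in add2 above; treat it as an illustration, not as what ivcanon
computes.

```c
/* Original loop from the report, instrumented to count body executions.  */
static int add_counting(int N, const int *a, const int *b, int *c)
{
  int count = 0;
  for (int i = 0; i < N; i++) {
    c[i] = a[i] + b[i];
    count++;
    if (i > 1000)
      break;
  }
  return count;
}

/* Hypothetical helper: closed-form trip count worked out by hand.
   The break fires after the body has run for i == 1001, so the body
   runs min (max (N, 0), 1002) times.  */
static int add_trip_count(int N)
{
  if (N <= 0)
    return 0;
  return N < 1002 ? N : 1002;
}
```

Once the trip count is known to be a clamp of N against a constant, the
second exit carries no extra information, which is why it can be removed
before the vectorizer sees the loop.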


* [Bug c/113134] gcc does not version loops with side-effect early breaks
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
  2023-12-25 12:35 ` [Bug c/113134] " tnfchris at gcc dot gnu.org
@ 2023-12-27 15:21 ` tnfchris at gcc dot gnu.org
  2023-12-28  1:21 ` [Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects juzhe.zhong at rivai dot ai
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-12-27 15:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|2023-12-25 00:00:00         |2023-12-27
            Summary|Middle end early break      |gcc does not version loops
                   |vectorization: Fail to      |with side-effect early
                   |vectorize a simple early    |breaks
                   |break code                  |
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
So GCC's approach is quite different from clang's.

I think this should be handled by ivcanon, as it makes the vectorizer code much
easier.  At the moment the vectorizer assumes that any exit it sees is
actually needed.  So even if I relax my patch to allow this, we still produce a
pointless compare.

Looking at ivcanon, for a constant-sized array it reports:

Loop 1 iterates 1001 times.
Loop 1 iterates at most 999 times.
Loop 1 likely iterates at most 999 times.
Analyzing # of iterations of loop 1
  exit condition [0, + , 1](no_overflow) <= 1000
  bounds on difference of bases: 1000 ... 1000
  result:
    # of iterations 1001, bounded by 1001
Removed pointless exit: if (i_13 > 1000)

but for the example attached:

Loop 1 iterates 1001 times.
Loop 1 iterates at most 1001 times.
Loop 1 likely iterates at most 1001 times.
Analyzing # of iterations of loop 1
  exit condition [1, + , 1](no_overflow) < N_13(D)
  bounds on difference of bases: 0 ... 2147483646

It has correctly determined that the loop bound is at most 1001, but since N
can be < 1001 it doesn't think the additional exit is useless.

However, like clang, we can just version the loop. Unlike clang, however, we
can probably do better.

If N >= 1000 then we can enter the vector code without the additional exit, but
if N < 1000 we can use my new pass.

It's not hard to allow this through the pass, but I doubt this will be accepted
in stage3.

For best results the loop should be versioned like clang does.

Richi?
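
A minimal sketch of what versioning could look like at the source level,
assuming nothing about how GCC would implement it. add_versioned is a
hypothetical name, and the threshold 1002 is worked out by hand from the
original loop, so it differs slightly from the round numbers used above.

```c
/* Original loop from the report.  */
void add(int N, int *__restrict a, int *__restrict b, int *__restrict c)
{
  for (int i = 0; i < N; i++) {
    c[i] = a[i] + b[i];
    if (i > 1000)
      break;
  }
}

/* Hand-versioned sketch (hypothetical): pick a loop by comparing N with
   the point at which the early break would fire.  */
void add_versioned(int N, int *__restrict a, int *__restrict b,
                   int *__restrict c)
{
  if (N >= 1002) {
    /* The break always fires at i == 1001, so the trip count is constant
       and N is no longer needed.  */
    for (int i = 0; i < 1002; i++)
      c[i] = a[i] + b[i];
  } else {
    /* i never exceeds 1000 here, so the break is dead and can be dropped.  */
    for (int i = 0; i < N; i++)
      c[i] = a[i] + b[i];
  }
}
```

Both versions are plain counted loops with a single exit, which is the shape
the vectorizer already handles without any early break support.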


* [Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
  2023-12-25 12:35 ` [Bug c/113134] " tnfchris at gcc dot gnu.org
  2023-12-27 15:21 ` [Bug c/113134] gcc does not version loops with side-effect early breaks tnfchris at gcc dot gnu.org
@ 2023-12-28  1:21 ` juzhe.zhong at rivai dot ai
  2023-12-28  3:48 ` tnfchris at gcc dot gnu.org
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-12-28  1:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Thanks Tamar, I agree with you. Early break vectorization is not supposed to
handle this.  It should be an optimization on the scalar IR rather than in the
loop vectorizer.

I believe it is a GCC-15 topic.

Btw, I am trying to enable early break on the RISC-V port, but I failed to do
so because the following code appears to reject length-based targets:

      if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
        {
          if (direct_internal_fn_supported_p (IFN_VCOND_MASK_LEN, vectype,
                                              OPTIMIZE_FOR_SPEED))
            return false;
          else
            vect_record_loop_mask (loop_vinfo, masks, ncopies, vectype, NULL);
        }

I guess this code just disables partial vectors with length for now, and I
need to test and port this part for length in follow-up patches.

Am I right?


* [Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (2 preceding siblings ...)
  2023-12-28  1:21 ` [Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects juzhe.zhong at rivai dot ai
@ 2023-12-28  3:48 ` tnfchris at gcc dot gnu.org
  2023-12-28  3:55 ` juzhe.zhong at rivai dot ai
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-12-28  3:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #4 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #3)
> I guess this code is just disabling partial vector for length for now.
> 
> And need me to test and port this part for length in the followup patches.
> 
> Am I right ?

Yeah, it was needed to safely reject it for now. Once implemented, you'll hit
an assert in vectorizable_live_operation where you need to provide a way to
also get the first active element from a vector.


* [Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (3 preceding siblings ...)
  2023-12-28  3:48 ` tnfchris at gcc dot gnu.org
@ 2023-12-28  3:55 ` juzhe.zhong at rivai dot ai
  2023-12-28  4:02 ` tnfchris at gcc dot gnu.org
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-12-28  3:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #5 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Tamar Christina from comment #4)
> Yeah, it needed to safely not allow it through for now. Once implemented 
> you'll hit an assert in vectorizable_live_operations where you need to
> provide a way to also get the first active element from a vector.

So for a length target, I enable the cbranch optab but not the vcond_mask_len
optab. Will it behave incorrectly?

Another question: could you give me more hints about
vectorizable_live_operation?

I thought vectorizable_live_operation extracts the last active element;
I didn't see it extracting the first active element.


* [Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (4 preceding siblings ...)
  2023-12-28  3:55 ` juzhe.zhong at rivai dot ai
@ 2023-12-28  4:02 ` tnfchris at gcc dot gnu.org
  2023-12-28  4:05 ` tnfchris at gcc dot gnu.org
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-12-28  4:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #6 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #5)
> So for a length target, I enable cbranch optab but no vcond_mask_len optab.
> Will it behavior wrong ?
> 

You need both; if the operation requires a mask, it'll be rejected without
vcond_mask_len support.  Because I didn't know how to extract the first element
using vcond_mask_len, I had to disable it.

> Another question is could you give me more hints about
> vectorizable_live_operation?
> 
> I thought vectorizable_live_operation is doing extract last active element,
> I didn't see extract first active element.

Normally yes, but I added extract first active element for this patch.  This is
because when you hit and take an early exit we restart the vector iteration,
since there may be partial effects to perform between where the loop started
and where the element is found.

Specifically, look at vectorizable_live_operation_1; there's an assert under
  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))

with a comment saying what's needed.
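
The "restart the vector iteration" idea can be sketched in plain C, with an
inner loop over VF lanes standing in for one vector operation. This is an
illustration under stated assumptions, not the code GCC generates: VF,
copy_until_ref and copy_until_blocked are hypothetical names, and the example
loop (copy until a key is found) is chosen only because its break condition
depends on the data.

```c
#define VF 4  /* assumed number of lanes per "vector" */

/* Scalar reference: copy elements up to and including the first one equal
   to key.  Returns the index where the loop stopped, or n if no break.  */
static int copy_until_ref(int n, const int *a, int *c, int key)
{
  int i;
  for (i = 0; i < n; i++) {
    c[i] = a[i];
    if (a[i] == key)
      break;
  }
  return i;
}

/* Blocked sketch: test the break condition for a whole block first.  If any
   lane hits, leave i at the start of that block and let the scalar loop
   redo the block, so the partial side effects before the break happen.  */
static int copy_until_blocked(int n, const int *a, int *c, int key)
{
  int i = 0;
  while (i + VF <= n) {
    int any_hit = 0;
    for (int l = 0; l < VF; l++)      /* stands in for one vector compare */
      any_hit |= (a[i + l] == key);
    if (any_hit)
      break;                          /* early exit taken; i is block start */
    for (int l = 0; l < VF; l++)      /* stands in for one vector store */
      c[i + l] = a[i + l];
    i += VF;
  }
  /* Scalar loop: restarts the interrupted block and handles the tail.  */
  for (; i < n; i++) {
    c[i] = a[i];
    if (a[i] == key)
      break;
  }
  return i;
}
```

Note that the blocked version reads a whole block of a[] before knowing
whether the break fires, which is one reason the sizes of the accessed
objects matter for this transformation.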


* [Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (5 preceding siblings ...)
  2023-12-28  4:02 ` tnfchris at gcc dot gnu.org
@ 2023-12-28  4:05 ` tnfchris at gcc dot gnu.org
  2023-12-28  4:23 ` juzhe.zhong at rivai dot ai
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-12-28  4:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #7 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
You may be able to use the same approach as

  else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))

that is, reverse both the mask and the vector and use extract last.
It's not going to be performance critical, so it's more important to be correct
than fast.


* [Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (6 preceding siblings ...)
  2023-12-28  4:05 ` tnfchris at gcc dot gnu.org
@ 2023-12-28  4:23 ` juzhe.zhong at rivai dot ai
  2023-12-28  4:30 ` tnfchris at gcc dot gnu.org
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-12-28  4:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #8 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Tamar Christina from comment #7)
> You may be able to use the same approach as
> 
>   else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
> 
> that is, reverse both the mask and the vector and using extract last.
> It's not going to be performance critical so it's more important to be
> correct rather than fast.

I just carefully read this code:

      /* For an inverted control flow with early breaks we want EXTRACT_FIRST
         instead of EXTRACT_LAST.  Emulate by reversing the vector and mask. */
      if (restart_loop && LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
        {
          /* First create the permuted mask.  */
          tree perm_mask = perm_mask_for_reverse (TREE_TYPE (mask));
          tree perm_dest = copy_ssa_name (mask);
          gimple *perm_stmt
                = gimple_build_assign (perm_dest, VEC_PERM_EXPR, mask,
                                       mask, perm_mask);
          vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
                                       &gsi);
          mask = perm_dest;

          /* Then permute the vector contents.  */
          tree perm_elem = perm_mask_for_reverse (vectype);
          perm_dest = copy_ssa_name (vec_lhs_phi);
          perm_stmt
                = gimple_build_assign (perm_dest, VEC_PERM_EXPR, vec_lhs_phi,
                                       vec_lhs_phi, perm_elem);
          vect_finish_stmt_generation (loop_vinfo, stmt_info, perm_stmt,
                                       &gsi);
          vec_lhs_phi = perm_dest;
        }


Suppose the loop mask is generated by the whilelo instruction of ARM SVE.

Suppose we have 8 elements in a single whole vector.

mask = whilelo (0, res); if res = 6, then mask = 11111100.
data = 12345678

Then, on an early break, you reverse both data and mask as follows:

new_mask = 00111111
new_data = 87654321

Then, using EXTRACT_LAST, we will get value = 1 for the early break.

Am I right?
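
The reverse-then-extract-last trick can be checked with a small scalar model.
Everything here is a hypothetical stand-in for the vector operations (the
names whilelo_model, reverse_lanes, extract_last_model and
extract_first_via_reverse are not GCC or SVE APIs), and the model assumes a
whilelo-style mask where lane i is active when base + i < limit, so a limit
of 6 activates lanes 0 to 5.

```c
#define LANES 8

/* Model of a whilelo-style mask: lane i is active iff base + i < limit.  */
static void whilelo_model(unsigned base, unsigned limit, int mask[LANES])
{
  for (int i = 0; i < LANES; i++)
    mask[i] = (base + (unsigned) i) < limit;
}

/* Model of a full-vector reversal permute.  */
static void reverse_lanes(const int in[LANES], int out[LANES])
{
  for (int i = 0; i < LANES; i++)
    out[i] = in[LANES - 1 - i];
}

/* Model of EXTRACT_LAST: value of the last active lane.  Returns 0 when no
   lane is active; that fallback is an assumption of this sketch.  */
static int extract_last_model(const int mask[LANES], const int data[LANES])
{
  int result = 0;
  for (int i = 0; i < LANES; i++)
    if (mask[i])
      result = data[i];
  return result;
}

/* EXTRACT_FIRST emulated by reversing both mask and data.  */
static int extract_first_via_reverse(const int mask[LANES],
                                     const int data[LANES])
{
  int rmask[LANES], rdata[LANES];
  reverse_lanes(mask, rmask);
  reverse_lanes(data, rdata);
  return extract_last_model(rmask, rdata);
}
```

With data 1..8 and a mask from whilelo_model (0, 6), extract_last_model
yields 6 and extract_first_via_reverse yields 1. The emulation also works
for masks that do not start at lane 0, which a plain lane-0 read would not.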


* [Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (7 preceding siblings ...)
  2023-12-28  4:23 ` juzhe.zhong at rivai dot ai
@ 2023-12-28  4:30 ` tnfchris at gcc dot gnu.org
  2023-12-28  4:35 ` juzhe.zhong at rivai dot ai
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-12-28  4:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #9 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #8)
> Suppose the loop mask is generated by the whilelo instruction of ARM SVE.
> 
> Suppose we have 8 elements in a single whole vector.
> 
> mask = whilelo (0, res); if res = 6, then mask = 11111100.
> data = 12345678
> 
> Then, on an early break, you reverse both data and mask as follows:
> 
> new_mask = 00111111
> new_data = 87654321
> 
> Then, using EXTRACT_LAST, we will get value = 1 for the early break.
> 
> Am I right?

Yeah, the idea being that the scalar loop will then run from 1 to 6 to perform
any side effects that we couldn't apply.

We went with this approach first because it works for non-masked architectures
too. In GCC-15 we'll try to stay entirely inside the vector loop by splitting
the mask into the elements before the first active one and the elements from
the first active one onward, so we can correctly mask the operations.


* [Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (8 preceding siblings ...)
  2023-12-28  4:30 ` tnfchris at gcc dot gnu.org
@ 2023-12-28  4:35 ` juzhe.zhong at rivai dot ai
  2023-12-28  4:45 ` tnfchris at gcc dot gnu.org
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-12-28  4:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #10 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Tamar Christina from comment #9)
> Yeah, the idea being the scalar loop will then run from 1 to 6 to do any
> side effects that we couldn't apply.
> 
> We went with this approach first because it works for non-masked
> architectures too. In GCC-15 we'll try to implement staying entirely inside
> a vector loop by splitting the mask in elements until first active and
> element from first active so we can correctly mask the operations.

OK. With the current approach, isn't the first active element always element 0?

For ARM SVE the loop mask is generated by whilelo instructions, which always
set a contiguous run of mask bits from element 0 up to the last active element.


* [Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (9 preceding siblings ...)
  2023-12-28  4:35 ` juzhe.zhong at rivai dot ai
@ 2023-12-28  4:45 ` tnfchris at gcc dot gnu.org
  2023-12-28  4:46 ` juzhe.zhong at rivai dot ai
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-12-28  4:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #11 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #10)
> Ok. For the current approach. Isn't it the first element is always element 0 ?
> 
> Since for ARM SVE loop mask is generated by whilelo instructions, it always
> set mask bit from 0 to the last active element - 1.

Sure, but you can't use BIT_FIELD_REF on VLA vectors.


* [Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (10 preceding siblings ...)
  2023-12-28  4:45 ` tnfchris at gcc dot gnu.org
@ 2023-12-28  4:46 ` juzhe.zhong at rivai dot ai
  2023-12-28  4:49 ` tnfchris at gcc dot gnu.org
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-12-28  4:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #12 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Tamar Christina from comment #11)
> sure, but you can't use BIT_FIELD_REF on VLA vectors.

So, for length-based partial vectors, we can use VEC_EXTRACT with index = 0,
since the VEC_EXTRACT optab now allows VLA vectors on length targets.
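
A small scalar model of why index 0 is enough for length-controlled vectors:
when the active lanes are always a prefix of length len >= 1, the first
active lane is lane 0. The names first_active_generic and first_active_len
are hypothetical and only illustrate the reasoning; they are not GCC
internals.

```c
#define LANES 8

/* Generic model: scan the mask for the first active lane.  Returns 0 when
   no lane is active; that fallback is an assumption of this sketch.  */
static int first_active_generic(const int mask[LANES], const int data[LANES])
{
  for (int i = 0; i < LANES; i++)
    if (mask[i])
      return data[i];
  return 0;
}

/* Length-controlled model: lanes [0, len) are active and len >= 1, so the
   first active lane is always lane 0 and len itself is not needed.  */
static int first_active_len(const int data[LANES], int len)
{
  (void) len;
  return data[0];
}
```

The two agree for every len from 1 to LANES; they are not comparable when no
lane is active, which is the case that has to be ruled out separately.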


* [Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (11 preceding siblings ...)
  2023-12-28  4:46 ` juzhe.zhong at rivai dot ai
@ 2023-12-28  4:49 ` tnfchris at gcc dot gnu.org
  2023-12-28  4:51 ` juzhe.zhong at rivai dot ai
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-12-28  4:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #13 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #12)
> So, for length partial vector. We can use VEC_EXTRACT with index = 0 since
> VEC_EXTRACT optab allows VLA vectors now for length target.

Sounds good :)


* [Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (12 preceding siblings ...)
  2023-12-28  4:49 ` tnfchris at gcc dot gnu.org
@ 2023-12-28  4:51 ` juzhe.zhong at rivai dot ai
  2023-12-28  4:53 ` tnfchris at gcc dot gnu.org
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-12-28  4:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #14 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Tamar Christina from comment #13)
> > So, for length partial vector. We can use VEC_EXTRACT with index = 0 since
> > VEC_EXTRACT optab allows VLA vectors now for length target.
> 
> Sounds good :)

I wonder whether ARM SVE can also use this approach VEC_EXTRACT with index = 0.

I guess the only issue is when the mask is all zero, that is, when there are no
active elements. What should the behavior be for early break in that case?
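
For illustration, here is a small scalar C model of the quoted example (not GCC
code). The helper names, the fixed 8-lane width and the fallback argument are
assumptions made for this sketch; what a real EXTRACT_LAST returns for an
all-false mask is target specific and is not modelled here. With a
whilelo-style mask (a prefix of active lanes), the last active lane of the
reversed vector holds the same value as lane 0, provided at least one lane is
active:

```c
#include <stdbool.h>
#include <stddef.h>

#define LANES 8 /* assumed fixed lane count, for illustration only */

/* Model of a whilelo-style loop mask: lane i is active while
   start + i < limit, so the active lanes always form a prefix.  */
static void model_whilelo(bool mask[LANES], unsigned start, unsigned limit)
{
    for (size_t i = 0; i < LANES; i++)
        mask[i] = (start + i) < limit;
}

/* Model of EXTRACT_LAST: value of the last active lane.  The fallback
   for an all-false mask is a modelling choice, not target behavior.  */
static int model_extract_last(const bool mask[LANES], const int data[LANES],
                              int fallback)
{
    int result = fallback;
    for (size_t i = 0; i < LANES; i++)
        if (mask[i])
            result = data[i];
    return result;
}

/* Reverse both mask and data, then take the last active lane.  */
static int first_via_reverse(const bool mask[LANES], const int data[LANES],
                             int fallback)
{
    bool rmask[LANES];
    int rdata[LANES];
    for (size_t i = 0; i < LANES; i++) {
        rmask[i] = mask[LANES - 1 - i];
        rdata[i] = data[LANES - 1 - i];
    }
    return model_extract_last(rmask, rdata, fallback);
}
```

With data = 1..8 and model_whilelo (mask, 0, 6), first_via_reverse returns 1,
the same as data[0]. With an all-false mask it returns the fallback, whereas a
plain extract of lane 0 would still return data[0]; that difference is the
case asked about above.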


* [Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (13 preceding siblings ...)
  2023-12-28  4:51 ` juzhe.zhong at rivai dot ai
@ 2023-12-28  4:53 ` tnfchris at gcc dot gnu.org
  2023-12-28  5:08 ` tnfchris at gcc dot gnu.org
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-12-28  4:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #15 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #14)
> > > > sure, but you can't use BIT_FIELD_REF on VLA vectors.
> > > 
> > > So, for length partial vector. We can use VEC_EXTRACT with index = 0 since
> > > VEC_EXTRACT optab allows VLA vectors now for length target.
> > 
> > Sounds good :)
> 
> I wonder whether ARM SVE can also use this approach VEC_EXTRACT with index =
> 0.
> 
> I guess the only issue is when the mask is all zero, that is, when there are no
> active elements. What should the behavior be for early break in that case?

That shouldn't happen: in that case you wouldn't have entered the loop. To
prevent this, there's always a compare of n > 0 at the start of the loop to
skip the vector body entirely.
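
A scalar C sketch of that argument, under stated assumptions: the lane count
VL, the function name and the strip-mined loop shape are made up for
illustration and are not what the vectorizer emits. It walks the vector
iterations that would run for a given n and reports the smallest number of
active lanes seen in any whilelo-style mask. Because of the n > 0 guard and
the i < n loop test, every mask that is actually used has at least lane 0
active:

```c
#define VL 8u /* assumed number of lanes per vector */

/* Returns the smallest number of active lanes in any executed vector
   iteration, or VL + 1 if the vector body never runs.  */
static unsigned min_active_lanes(int n)
{
    unsigned min_active = VL + 1;

    if (n <= 0)                 /* guard: skip the vector body entirely */
        return min_active;

    unsigned limit = (unsigned) n;
    for (unsigned i = 0; i < limit; i += VL) {
        unsigned active = 0;
        /* Lane-by-lane model of whilelo (i, limit).  */
        for (unsigned lane = 0; lane < VL; lane++)
            if (i + lane < limit)
                active++;
        if (active < min_active)
            min_active = active;
    }
    return min_active;
}
```

For n = 13 the two masks have 8 and 5 active lanes, so the function returns 5;
for n <= 0 no mask is ever formed.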


* [Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (14 preceding siblings ...)
  2023-12-28  4:53 ` tnfchris at gcc dot gnu.org
@ 2023-12-28  5:08 ` tnfchris at gcc dot gnu.org
  2023-12-28  9:11 ` juzhe.zhong at rivai dot ai
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2023-12-28  5:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #16 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> 
> I wonder whether ARM SVE can also use this approach VEC_EXTRACT with index =
> 0.

Perhaps, I'll look into it, thanks. Though this is of course only applicable when
the mask comes from whilelo.

In the future when we get to loops such as:

for (int i = ..;;)
{
  if (a)
    {
      ....
      if (b)
        return i;
    }
}

the reduction would come from the first active element of the mask created by
the condition a and not the whilelo.
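
To make the distinction concrete, here is a scalar C model (helper name and
lane count are assumptions for this sketch). For a whilelo-style mask the
first active lane is always lane 0, but for a mask produced by a data
dependent condition it can be any lane, so extracting lane 0 is not a
substitute for finding the first active element:

```c
#include <stdbool.h>
#include <stddef.h>

#define LANES 8 /* assumed fixed lane count, for illustration only */

/* Index of the first active lane, or LANES if no lane is active.  */
static size_t first_active(const bool mask[LANES])
{
    for (size_t i = 0; i < LANES; i++)
        if (mask[i])
            return i;
    return LANES;
}
```

For a prefix mask such as 11111100 this returns 0, while for a condition mask
such as 00110111 it returns 2.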


* [Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (15 preceding siblings ...)
  2023-12-28  5:08 ` tnfchris at gcc dot gnu.org
@ 2023-12-28  9:11 ` juzhe.zhong at rivai dot ai
  2023-12-28 21:20 ` [Bug tree-optimization/113134] " pinskia at gcc dot gnu.org
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-12-28  9:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #17 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Tamar Christina from comment #16)
> > 
> > I wonder whether ARM SVE can also use this approach VEC_EXTRACT with index =
> > 0.
> 
> Perhaps, I'll look into it, thanks. Though this is of course only applicable
> when the mask comes from whilelo.
> 
> In the future when we get to loops such as:
> 
> for (int i = ..;;)
> {
>   if (a)
>     {
>       ....
>       if (b)
>         return i;
>     }
> }
> 
> the reduction would come from the first active element of the mask created
> by the condition a and not the whilelo.

If the mask comes from a condition, the VEC_EXTRACT approach is definitely
incorrect.

However, looking at vectorizable_live_operation_1, the mask should always come
from a whilelo instruction (in other words, it is always the loop mask):

      tree mask = vect_get_loop_mask (loop_vinfo, &gsi,
                                      &LOOP_VINFO_MASKS (loop_vinfo),
                                      1, vectype, 0);

So I think using VEC_EXTRACT with index = 0 should be correct there.

But vectorizable_early_break, which will handle masks that come from a
condition, is another story.


* [Bug tree-optimization/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (16 preceding siblings ...)
  2023-12-28  9:11 ` juzhe.zhong at rivai dot ai
@ 2023-12-28 21:20 ` pinskia at gcc dot gnu.org
  2024-01-08  8:11 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-12-28 21:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
          Component|c                           |tree-optimization


* [Bug tree-optimization/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (17 preceding siblings ...)
  2023-12-28 21:20 ` [Bug tree-optimization/113134] " pinskia at gcc dot gnu.org
@ 2024-01-08  8:11 ` rguenth at gcc dot gnu.org
  2024-01-31 11:48 ` juzhe.zhong at rivai dot ai
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-08  8:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> ---
The loop is rotated by header copying and the "combined" exit test looks like

  if (i_21 == 1001)
    goto <bb 5>; [1.00%]
  else
    goto <bb 4>; [99.00%]

  <bb 4> [local count: 1004539166]:
  i_18 = i_21 + 1;
  if (N_13(D) > i_18)
    goto <bb 3>; [94.50%]
  else
    goto <bb 5>; [5.50%]

this could be rewritten to use min(1002, N_13(D)) with the knowledge how
'i' evolves.  We get i_21 != 1001 && N_13(D) > (i_21 + 1) for the iteration
condition which I think we cannot combine in general.

The "easiest" way would be to have loop splitting split the loop on the
i_21 == 1001 condition I think.
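
At the source level, the min(1002, N) rewrite can be sketched as below. The
function names and the comparison helper are made up for this illustration;
the first function is the loop from the report and the second is a
single-exit form that should store to the same elements.

```c
#include <string.h>

#define MAX_N 2048 /* assumed test array size, not from the report */

/* Loop from the report: stores c[0..1001] at most, then breaks.  */
static void add_early_break(int N, const int *restrict a,
                            const int *restrict b, int *restrict c)
{
    for (int i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
        if (i > 1000)
            break;
    }
}

/* Single-exit form: the break is folded into the bound min (1002, N).  */
static void add_min_bound(int N, const int *restrict a,
                          const int *restrict b, int *restrict c)
{
    int bound = N < 1002 ? N : 1002;
    for (int i = 0; i < bound; i++)
        c[i] = a[i] + b[i];
}

/* Returns 1 if both versions write exactly the same output for N,
   which must not exceed MAX_N.  */
static int outputs_match(int N)
{
    static int a[MAX_N], b[MAX_N], c1[MAX_N], c2[MAX_N];
    for (int i = 0; i < MAX_N; i++) {
        a[i] = i;
        b[i] = 2 * i;
    }
    memset(c1, 0xff, sizeof c1);
    memset(c2, 0xff, sizeof c2);
    add_early_break(N, a, b, c1);
    add_min_bound(N, a, b, c2);
    return memcmp(c1, c2, sizeof c1) == 0;
}
```

The interesting values of N are those around the break point (1001, 1002,
1003) and N <= 0, where neither loop should store anything.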


* [Bug tree-optimization/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (18 preceding siblings ...)
  2024-01-08  8:11 ` rguenth at gcc dot gnu.org
@ 2024-01-31 11:48 ` juzhe.zhong at rivai dot ai
  2024-01-31 12:04 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2024-01-31 11:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #19 from JuzheZhong <juzhe.zhong at rivai dot ai> ---

The loop is:

bb 3 -> bb 4 -> bb 5
  |       |______⬆
  |______________⬆

The condition in bb 3 is if (i_21 == 1001).
The condition in bb 4 is if (N_13(D) > i_18).

Looking at lsplit, this loop doesn't satisfy the check:
if (split_loop (loop) || split_loop_on_cond (loop))

split_loop_on_cond tries to split a loop on a condition that is loop
invariant.  However, neither the condition in bb 3 nor the one in bb 4
is loop invariant.

I wonder whether we should add a new kind of loop splitter like:

diff --git a/gcc/tree-ssa-loop-split.cc b/gcc/tree-ssa-loop-split.cc
index 04215fe7937..a4081b9b6f5 100644
--- a/gcc/tree-ssa-loop-split.cc
+++ b/gcc/tree-ssa-loop-split.cc
@@ -1769,7 +1769,8 @@ tree_ssa_split_loops (void)
       if (optimize_loop_for_size_p (loop))
        continue;

-      if (split_loop (loop) || split_loop_on_cond (loop))
+      if (split_loop (loop) || split_loop_on_cond (loop)
+         || split_loop_for_early_break (loop))
        {
          /* Mark our containing loop as having had some split inner loops.  */
          loop_outer (loop)->aux = loop;
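
For reference, one possible source-level picture of what splitting on the
i == 1001 condition could produce is sketched below. This is a hypothetical
illustration only: split_loop_for_early_break does not exist in the patch
above beyond its name, and the function name add_split is made up. The first
loop covers the iterations where the break test is known to be false, and the
remainder is the single iteration in which the break fires.

```c
/* Hypothetical result of splitting the reported loop at i == 1001.  */
static void add_split(int N, const int *restrict a, const int *restrict b,
                      int *restrict c)
{
    int i = 0;

    /* First part: i < 1001, so "i > 1000" is always false here and the
       loop has a single exit with a loop-invariant bound.  */
    int bound = N < 1001 ? N : 1001;
    for (; i < bound; i++)
        c[i] = a[i] + b[i];

    /* Remainder: at most one more iteration (i == 1001), after which the
       original loop would have taken the break.  */
    if (i < N)
        c[i] = a[i] + b[i];
}
```

The first loop has no early exit left, which is the shape the vectorizer
already handles.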


* [Bug tree-optimization/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (19 preceding siblings ...)
  2024-01-31 11:48 ` juzhe.zhong at rivai dot ai
@ 2024-01-31 12:04 ` rguenth at gcc dot gnu.org
  2024-02-02  3:38 ` juzhe.zhong at rivai dot ai
  2024-02-02  8:49 ` juzhe.zhong at rivai dot ai
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-31 12:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think we want split_loop () to handle this case.  That means extending it to
handle loops with multiple exits.  OTOH after loop rotation to

  if (i_21 == 1001)
    goto <bb 5>; [1.00%]
  else
    goto <bb 4>; [99.00%]

  <bb 4> [local count: 1004539166]:
  i_18 = i_21 + 1;
  if (N_13(D) > i_18)
    goto <bb 3>; [94.50%]
  else
    goto <bb 5>; [5.50%]

it could also be IVCANON's job to rewrite the exit test so that the bound is
loop invariant and the loop becomes single-exit.

There's another recent PR where an exit condition like i < N && i < M
should become i < MIN(N,M).


* [Bug tree-optimization/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (20 preceding siblings ...)
  2024-01-31 12:04 ` rguenth at gcc dot gnu.org
@ 2024-02-02  3:38 ` juzhe.zhong at rivai dot ai
  2024-02-02  8:49 ` juzhe.zhong at rivai dot ai
  22 siblings, 0 replies; 24+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2024-02-02  3:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #21 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Hi, Richard. I looked into ivcanon.

I found that:

      /* If the loop has more than one exit, try checking all of them
         for # of iterations determinable through scev.  */
      if (!exit)
        niter = find_loop_niter (loop, &exit);

In find_loop_niter, we iterate over 2 exit edges:

1. bb 5 -> bb 6 with niter = (unsigned int) N_13(D).
2. bb 3 -> bb 6 with niter = 1001.

It just skips niter = (unsigned int) N_13(D) in:
      if (!integer_zerop (desc.may_be_zero))
        continue;

find_loop_niter (loop, &exit) returns 1001, skipping (unsigned int)
N_13(D).

Should it return MIN (1001, (unsigned int) N_13(D)) instead?
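
As a sanity check on that formula, here is a scalar C sketch. It assumes that
niter counts how many times the loop latch runs before the loop exits, and
that the loop has the non-rotated shape (header test i < N, then the break
test at the end of the body); the function names are made up. For N >= 0 the
counted value agrees with MIN (1001, (unsigned int) N). For negative N the
cast wraps to a huge value and the MIN yields 1001 although the loop never
runs, which is presumably what the may_be_zero check guards against.

```c
/* Counts latch executions of the reported loop, body omitted.  */
static unsigned count_latch_executions(int N)
{
    unsigned latch = 0;
    for (int i = 0; i < N; i++) {
        if (i > 1000)   /* early exit: the latch is not reached */
            break;
        latch++;        /* falling through to i++ is one latch execution */
    }
    return latch;
}

/* The proposed closed form, MIN (1001, (unsigned int) N).  */
static unsigned min_niter(int N)
{
    unsigned n = (unsigned) N;
    return n < 1001u ? n : 1001u;
}
```

For example, N = 5 gives 5 for both functions and N = 2000 gives 1001 for
both, while N = -1 gives 0 for the counted value and 1001 for the closed form.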

I would prefer to fix it in ivcanon, since I believe that would be more elegant
than fixing it in the loop splitter.

I am still investigating; any guidance would be much appreciated.

Thanks.


* [Bug tree-optimization/113134] gcc does not version loops with early break conditions that don't have side-effects
  2023-12-25 11:29 [Bug c/113134] New: Middle end early break vectorization: Fail to vectorize a simple early break code juzhe.zhong at rivai dot ai
                   ` (21 preceding siblings ...)
  2024-02-02  3:38 ` juzhe.zhong at rivai dot ai
@ 2024-02-02  8:49 ` juzhe.zhong at rivai dot ai
  22 siblings, 0 replies; 24+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2024-02-02  8:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #22 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
I have done the following experiment.


diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
index bf017137260..8c36cc63d3b 100644
--- a/gcc/tree-ssa-loop-ivcanon.cc
+++ b/gcc/tree-ssa-loop-ivcanon.cc
@@ -1260,6 +1260,39 @@ canonicalize_loop_induction_variables (class loop *loop,
          may_be_zero = false;
        }

+      if (!exit)
+       {
+         auto_vec<edge> exits = get_loop_exit_edges (loop);
+         exit = exits[0];
+         class tree_niter_desc desc1;
+         class tree_niter_desc desc2;
+         if (number_of_iterations_exit (loop, exits[0], &desc1, false)
+             && number_of_iterations_exit (loop, exits[1], &desc2, false))
+           {
+             niter = fold_build2 (MIN_EXPR, unsigned_type_node, desc1.niter,
+                                  desc2.niter);
+             create_canonical_iv (loop, exit, niter);
+             gcond *cond_stmt;
+             class nb_iter_bound *elt;
+             for (elt = loop->bounds; elt; elt = elt->next)
+               {
+                 if (elt->is_exit
+                     && !wi::ltu_p (loop->nb_iterations_upper_bound,
+                                    elt->bound))
+                   {
+                     cond_stmt = as_a <gcond *> (elt->stmt);
+                     break;
+                   }
+               }
+             if (exits[1]->flags & EDGE_TRUE_VALUE)
+               gimple_cond_make_false (cond_stmt);
+             else
+               gimple_cond_make_true (cond_stmt);
+             update_stmt (cond_stmt);
+             return false;
+           }
+       }
+

I know the check is wrong; this is just an experiment. With it, we get:

  <bb 2> [local count: 69202658]:
  _21 = (unsigned int) N_13(D);
  _22 = MIN_EXPR <_21, 1001>;    ---- > Use MIN_EXPR as the check.
  _23 = _22 + 1;
  goto <bb 5>; [100.00%]

  <bb 3> [local count: 1014686025]:
  _1 = (long unsigned int) i_9;
  _2 = _1 * 4;
  _3 = a_14(D) + _2;
  _4 = *_3;
  _5 = b_15(D) + _2;
  _6 = *_5;
  _7 = c_16(D) + _2;
  _8 = _4 + _6;
  *_7 = _8;
  if (0 != 0)
    goto <bb 6>; [1.00%]
  else
    goto <bb 4>; [99.00%]

  <bb 4> [local count: 1004539166]:
  i_18 = i_9 + 1;

  <bb 5> [local count: 1073741824]:
  # i_9 = PHI <0(2), i_18(4)>
  # ivtmp_19 = PHI <_23(2), ivtmp_20(4)>
  ivtmp_20 = ivtmp_19 - 1;
  if (ivtmp_20 != 0)
    goto <bb 3>; [94.50%]
  else
    goto <bb 6>; [5.50%]

  <bb 6> [local count: 69202658]:
  return;

Then it can vectorize.

I am not sure whether this is the right place to put the code.
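
For anyone checking the experiment, the dump above can be read back into C
roughly as follows. This is only a transcription sketch, assuming the dump is
read correctly; the function name is made up. The counter starts at
MIN ((unsigned) N, 1001) + 1 and is decremented before each test, so the body
runs MIN ((unsigned) N, 1001) times. The original loop also stores c[1001]
before breaking when N > 1001, and does nothing for N < 0, so those two cases
look like the places where the check would need adjusting.

```c
/* C reading of the experimental GIMPLE: count-down IV in the header,
   break test folded to false.  Not equivalent to the source loop for
   N > 1001 (c[1001] is not stored) or for N < 0 (the cast wraps and
   1001 iterations run), so callers must keep N within the arrays.  */
static void add_canonical_iv(int N, const int *restrict a,
                             const int *restrict b, int *restrict c)
{
    unsigned n = (unsigned) N;                    /* _21 */
    unsigned niter = n < 1001u ? n : 1001u;       /* _22 = MIN_EXPR */
    unsigned ivtmp = niter + 1;                   /* _23 */
    int i = 0;

    while (--ivtmp != 0) {                        /* bb 5 */
        c[i] = a[i] + b[i];                       /* bb 3 */
        i++;                                      /* bb 4 */
    }
}
```

With N = 5 this stores c[0..4] as expected; with N = 1100 it stores c[0..1000]
only.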


