public inbox for gcc-bugs@sourceware.org
* [Bug c/112438] New: RISC-V: Failed to AVL propagation through induction variable
@ 2023-11-08  6:43 juzhe.zhong at rivai dot ai
  2023-11-08  6:58 ` [Bug target/112438] " kito at gcc dot gnu.org
                   ` (14 more replies)
  0 siblings, 15 replies; 16+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-11-08  6:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

            Bug ID: 112438
           Summary: RISC-V: Failed to AVL propagation through induction
                    variable
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

void
foo (int n, int * __restrict in, int * __restrict out)
{
  for (int i = 0; i < n; i += 1)
    {
      out[i] = in[i] + i;
    }
}

ASM:

foo(int, int*, int*):
        ble     a0,zero,.L5
        csrr    a5,vlenb
        srli    a5,a5,2
        vsetvli a3,zero,e32,m1,ta,ma
        vmv.v.x v4,a5
        vid.v   v2
.L3:
        vsetvli a5,a0,e32,m1,ta,ma
        slli    a4,a5,2
        vle32.v v1,0(a1)
        sub     a0,a0,a5
        vadd.vv v1,v1,v2
        vse32.v v1,0(a2)
        add     a1,a1,a4
        vsetvli a5,zero,e32,m1,ta,ma    # redundant
        add     a2,a2,a4
        vadd.vv v2,v2,v4
        bne     a0,zero,.L3
.L5:
        ret

This is a known issue that I noticed while I was implementing AVL propagation.
I now have time to add support for it.

* [Bug target/112438] RISC-V: Failed to AVL propagation through induction variable
From: kito at gcc dot gnu.org @ 2023-11-08  6:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

Kito Cheng <kito at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kito at gcc dot gnu.org

--- Comment #1 from Kito Cheng <kito at gcc dot gnu.org> ---
Actually, I suspect this is a bug rather than a missed optimization, one that
will only trigger on some CPU implementations, because the ISA spec doesn't
guarantee that the penultimate iteration will always get VLMAX for vl...

https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#63-constraints-on-setting-vl
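
For readers who don't have the spec open: the constraints in the linked
section can be written down as a small predicate. This is only a sketch in
plain C for illustration; `vl_is_legal` is a hypothetical helper name and is
not part of GCC or of the spec text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if `vl` is a value that vsetvli may
   return for the given AVL and VLMAX under the "Constraints on Setting vl"
   section linked above.  */
bool
vl_is_legal (size_t avl, size_t vlmax, size_t vl)
{
  assert (vlmax > 0);
  if (avl <= vlmax)
    return vl == avl;                 /* vl = AVL if AVL <= VLMAX */
  if (avl >= 2 * vlmax)
    return vl == vlmax;               /* vl = VLMAX if AVL >= 2 * VLMAX */
  /* VLMAX < AVL < 2 * VLMAX: ceil (AVL / 2) <= vl <= VLMAX.  */
  return vl >= (avl + 1) / 2 && vl <= vlmax;
}
```

With VLMAX = 4 and AVL = 5, both vl = 4 and vl = 3 satisfy the predicate, so
the penultimate iteration is not guaranteed to run with VLMAX.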

* [Bug target/112438] RISC-V: Failed to AVL propagation through induction variable
From: kito at gcc dot gnu.org @ 2023-11-08  7:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #2 from Kito Cheng <kito at gcc dot gnu.org> ---
Oh, but the root cause might be a little bit deeper than just whether or not
the AVL is propagated.

* [Bug target/112438] RISC-V: Failed to AVL propagation through induction variable
From: juzhe.zhong at rivai dot ai @ 2023-11-08  7:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
You mean the current codegen is a bug?

No, I don't think there is a bug in the current codegen.

It's an induction variable.


        ble     a0,zero,.L5
        ...
        vsetvli a3,zero,e32,m1,ta,ma
        ...
        vid.v   v2                 
.L3:
        vsetvli a5,a0,e32,m1,ta,ma
        ...
        vadd.vv v1,v1,v2
        ...
        vsetvli a5,zero,e32,m1,ta,ma    # redundant
        ...
        vadd.vv v2,v2,v4
        bne     a0,zero,.L3


You can see that the 'v2' in vadd.vv v1,v1,v2 comes either from the preheader
vid.v or from the previous iteration's 'vadd.vv v2,v2,v4'.

It's generated by the induction variable
# vect_vec_iv_.6_22 = PHI <_21(4), { 0, 1, 2, ... }(3)>

* [Bug target/112438] RISC-V: Failed to AVL propagation through induction variable
From: juzhe.zhong at rivai dot ai @ 2023-11-08  7:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #4 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Oh. I see what you mean.

I think it may not be a valid optimization.

Consider the following code:

.L3:
        vsetvli a5,a0,e32,m1,ta,ma
        slli    a4,a5,2
        vle32.v v1,0(a1)
        sub     a0,a0,a5
        vadd.vv v1,v1,v2
        vse32.v v1,0(a2)
        add     a1,a1,a4
        vsetvli a5,zero,e32,m1,ta,ma    # seems redundant
        add     a2,a2,a4
        vadd.vv v2,v2,v4
        bne     a0,zero,.L3

Suppose the vector length is 8 elements and a0 is 13 with two iterations left.

If we remove the VLMAX vsetvli that seems redundant, we may have issues on
some hardware.

With 13 elements left, the hardware can choose to process 6 elements in the
second-to-last iteration and 7 elements in the last iteration.

The result of the VLMAX vadd.vv is used by the next iteration, NOT the current
one. Without the VLMAX vsetvli, the vadd.vv would only produce 6 elements for
the last iteration, which needs 7 elements.

That would cause a bug. So, it is not a valid optimization...

* [Bug target/112438] RISC-V: Failed to AVL propagation through induction variable
From: kito at gcc dot gnu.org @ 2023-11-08  7:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #5 from Kito Cheng <kito at gcc dot gnu.org> ---
Assume:

VLEN = 128 and n = 5, *in is {0, 0, 0, 0, 0}
so VLMAX = 4 for e32m1

It can run with vl = 4 for the first iteration and vl = 1 for the second
iteration.

But it could also be something like this: vl = 3 for the first iteration and
vl = 2 for the second iteration. OK, let's run the code with that:

foo(int, int*, int*):
        ble     a0,zero,.L5
        csrr    a5,vlenb
        srli    a5,a5,2
        vsetvli a3,zero,e32,m1,ta,ma
        vmv.v.x v4,a5                 # v4 = {4, 4, 4, 4}
        vid.v   v2                    # v2 = {0, 1, 2, 3}
.L3:
        vsetvli a5,a0,e32,m1,ta,ma    # first iteration got vl = 3
        slli    a4,a5,2
        vle32.v v1,0(a1)              # v1 = {0, 0, 0}
        sub     a0,a0,a5
        vadd.vv v1,v1,v2              # v1 = {0, 0, 0} + {0, 1, 2}
        vse32.v v1,0(a2)              # out = {0, 1, 2, 0, 0}
        add     a1,a1,a4
        vsetvli a5,zero,e32,m1,ta,ma
        add     a2,a2,a4
        vadd.vv v2,v2,v4              # v2 = {0, 1, 2, 3} + {4, 4, 4, 4}
                                      #    = {4, 5, 6, 7}
        bne     a0,zero,.L3
.L5:
        ret

OK, let's run the second iteration:

.L3:
        vsetvli a5,a0,e32,m1,ta,ma    # second iteration got vl = 2
        slli    a4,a5,2
        vle32.v v1,0(a1)              # v1 = {0, 0}
        sub     a0,a0,a5
        vadd.vv v1,v1,v2              # v1 = {0, 0} + {4, 5}
        vse32.v v1,0(a2)              # out = {0, 1, 2, 4, 5}
        add     a1,a1,a4
        vsetvli a5,zero,e32,m1,ta,ma
        add     a2,a2,a4
        vadd.vv v2,v2,v4              # v2 = {4, 5, 6, 7} + {4, 4, 4, 4}
                                      #    = {8, 9, 10, 11}
        bne     a0,zero,.L3

And then you will get {0, 1, 2, 4, 5} rather than {0, 1, 2, 3, 4}.
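
The walk-through above can be replayed with a scalar model. This is a sketch
only: `foo_model_buggy` and the `vls` array (the vl value the hardware is
assumed to return in each iteration) are hypothetical names introduced for
illustration, and VLMAX = 4 matches the e32/m1/VLEN = 128 assumption above.

```c
#include <assert.h>

#define VLMAX 4  /* e32, m1, VLEN = 128, as assumed in the example above */

/* Scalar model of the loop as currently generated: the induction vector
   v2 is always advanced by VLMAX, whatever vl was actually used.
   `vls` is a hypothetical list of per-iteration vl values; their sum
   must equal n.  */
void
foo_model_buggy (int n, const int *in, int *out, const int *vls)
{
  int v2[VLMAX];
  for (int lane = 0; lane < VLMAX; lane++)
    v2[lane] = lane;                          /* vid.v v2 */

  int done = 0;
  for (int iter = 0; done < n; iter++)
    {
      int vl = vls[iter];                     /* vsetvli a5,a0,... */
      assert (vl > 0 && vl <= VLMAX && vl <= n - done);
      for (int lane = 0; lane < vl; lane++)   /* vle32 / vadd.vv / vse32 */
        out[done + lane] = in[done + lane] + v2[lane];
      done += vl;
      for (int lane = 0; lane < VLMAX; lane++)
        v2[lane] += VLMAX;                    /* vadd.vv v2,v2,v4 at VLMAX */
    }
}
```

Calling it with n = 5, an all-zero input and vls = {4, 1} gives
{0, 1, 2, 3, 4}, while vls = {3, 2} gives {0, 1, 2, 4, 5}, matching the
trace above.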

* [Bug target/112438] RISC-V: Failed to AVL propagation through induction variable
From: kito at gcc dot gnu.org @ 2023-11-08  7:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #6 from Kito Cheng <kito at gcc dot gnu.org> ---
The key is that the splat of VLMAX needs to move into the loop body and splat
vl instead, but AVL propagation should still be possible:

```
foo(int, int*, int*):
        ble     a0,zero,.L5
        csrr    a5,vlenb
        srli    a5,a5,2
        vsetvli a3,zero,e32,m1,ta,ma
        vid.v   v2
.L3:
        vsetvli a5,a0,e32,m1,ta,ma
        slli    a4,a5,2
        vle32.v v1,0(a1)
        sub     a0,a0,a5
        vadd.vv v1,v1,v2
        vse32.v v1,0(a2)
        add     a1,a1,a4
        vmv.v.x v4,a5           # Moved here: splat vl (a5) rather than VLMAX
        vsetvli a5,zero,e32,m1,ta,ma    # redundant

        add     a2,a2,a4
        vadd.vv v2,v2,v4
        bne     a0,zero,.L3
.L5:
        ret
```

* [Bug target/112438] RISC-V: Failed to AVL propagation through induction variable
From: juzhe.zhong at rivai dot ai @ 2023-11-08  7:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #7 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Kito Cheng from comment #6)
> The key is that the splat of VLMAX needs to move into the loop body and splat
> vl instead, but AVL propagation should still be possible:
> 
> ```
> foo(int, int*, int*):
>         ble     a0,zero,.L5
>         csrr    a5,vlenb
>         srli    a5,a5,2
>         vsetvli a3,zero,e32,m1,ta,ma
>         vid.v   v2
> .L3:
>         vsetvli a5,a0,e32,m1,ta,ma
>         slli    a4,a5,2
>         vle32.v v1,0(a1)
>         sub     a0,a0,a5
>         vadd.vv v1,v1,v2
>         vse32.v v1,0(a2)
>         add     a1,a1,a4
>         vmv.v.x v4,a5           # Moved here: splat vl (a5) rather than VLMAX
>         vsetvli a5,zero,e32,m1,ta,ma    # redundant
> 
>         add     a2,a2,a4
>         vadd.vv v2,v2,v4
>         bne     a0,zero,.L3
> .L5:
>         ret
> ```

Oh. I understand it now. I think it's a bug.

And... I just took a look at my internal LLVM...
It has the same issue too....

I think we need to adapt the GIMPLE IR here:

  _35 = .SELECT_VL (ivtmp_33, POLY_INT_CST [4, 4]);
  _21 = vect_vec_iv_.6_22 + { POLY_INT_CST [4, 4], ... };

change it into:

  _35 = .SELECT_VL (ivtmp_33, POLY_INT_CST [4, 4]);
  _21 = vect_vec_iv_.6_22 + _35;
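
In scalar terms, the proposed change advances the induction vector by the
SELECT_VL result instead of by the vectorization factor. A minimal sketch,
with the same caveats as any model: `foo_model_fixed` and `vls` (the assumed
per-iteration vl values) are hypothetical names, and VLMAX = 4 is assumed.

```c
#include <assert.h>

#define VLMAX 4  /* assumed: e32, m1, VLEN = 128 */

/* Scalar model of the loop after the proposed IR change: every lane of
   the induction vector is advanced by the vl that was actually used.
   `vls` is a hypothetical list of per-iteration vl values; their sum
   must equal n.  */
void
foo_model_fixed (int n, const int *in, int *out, const int *vls)
{
  int iv[VLMAX];
  for (int lane = 0; lane < VLMAX; lane++)
    iv[lane] = lane;                          /* { 0, 1, 2, ... } */

  int done = 0;
  for (int iter = 0; done < n; iter++)
    {
      int vl = vls[iter];                     /* _35 = .SELECT_VL (...) */
      assert (vl > 0 && vl <= VLMAX && vl <= n - done);
      for (int lane = 0; lane < vl; lane++)
        out[done + lane] = in[done + lane] + iv[lane];
      done += vl;
      for (int lane = 0; lane < VLMAX; lane++)
        iv[lane] += vl;                       /* step by vl, not by VF */
    }
}
```

With n = 5 and an all-zero input, both vls = {4, 1} and vls = {3, 2} now
produce {0, 1, 2, 3, 4}.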

* [Bug target/112438] RISC-V: Wrong auto-vectorization on induction variable of RVV
From: kito at gcc dot gnu.org @ 2023-11-08  8:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #8 from Kito Cheng <kito at gcc dot gnu.org> ---
> Oh. I understand it now. I think it's a bug.
> 
> And... I just took a look at my internal LLVM...
> It has the same issue too....
> 
> I think we need to adapt the GIMPLE IR here:
> 
>   _35 = .SELECT_VL (ivtmp_33, POLY_INT_CST [4, 4]);
>   _21 = vect_vec_iv_.6_22 + { POLY_INT_CST [4, 4], ... };
> 
> change it into:
> 
>   _35 = .SELECT_VL (ivtmp_33, POLY_INT_CST [4, 4]);
>   _21 = vect_vec_iv_.6_22 + _35;

Yeah, so... I guess the original report is still valid; it just brought up
another potential bug :P

Personally, I really hate that magic constraint on vl, but it's just toooooo
late.

* [Bug target/112438] RISC-V: Wrong auto-vectorization on induction variable of RVV
From: juzhe.zhong at rivai dot ai @ 2023-11-08  9:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #9 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
I have a draft patch to fix it:

foo:
        ble     a0,zero,.L5
        vsetvli a5,zero,e32,m1,ta,ma
        vid.v   v2
.L3:
        vsetvli a5,a0,e32,m1,ta,ma
        slli    a4,a5,2
        vle32.v v3,0(a1)
        sub     a0,a0,a5
        vadd.vv v1,v2,v3
        vse32.v v1,0(a2)
        add     a1,a1,a4
        add     a2,a2,a4
        vsetvli a4,zero,e32,m1,ta,ma
        vmv.v.x v1,a5
        vadd.vv v2,v2,v1
        bne     a0,zero,.L3
.L5:
        ret

Seems correct ?

* [Bug target/112438] RISC-V: Wrong auto-vectorization on induction variable of RVV
From: kito at gcc dot gnu.org @ 2023-11-08  9:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #10 from Kito Cheng <kito at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #9)
> I have a draft patch to fix it:
> 
> foo:
> 	ble	a0,zero,.L5
> 	vsetvli	a5,zero,e32,m1,ta,ma
> 	vid.v	v2
> .L3:
> 	vsetvli	a5,a0,e32,m1,ta,ma
> 	slli	a4,a5,2
> 	vle32.v	v3,0(a1)
> 	sub	a0,a0,a5
> 	vadd.vv	v1,v2,v3
> 	vse32.v	v1,0(a2)
> 	add	a1,a1,a4
> 	add	a2,a2,a4
> 	vsetvli	a4,zero,e32,m1,ta,ma
> 	vmv.v.x	v1,a5

^^^^ This splat must be executed under "vsetvli a5,a0,e32,m1,ta,ma" rather
than "vsetvli a4,zero,e32,m1,ta,ma".

> 	vadd.vv	v2,v2,v1
> 	bne	a0,zero,.L3
> .L5:
> 	ret
> 
> Seems correct ?

* [Bug target/112438] RISC-V: Wrong auto-vectorization on induction variable of RVV
From: juzhe.zhong at rivai dot ai @ 2023-11-08  9:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #11 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Why can't the splat be VLMAX?

I think it must be VLMAX; otherwise, it could be wrong.

* [Bug target/112438] RISC-V: Wrong auto-vectorization on induction variable of RVV
From: kito at gcc dot gnu.org @ 2023-11-08  9:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #12 from Kito Cheng <kito at gcc dot gnu.org> ---
Oh, yeah, you are right: it already uses a5 for the splat, so it's correct.
And as you said, it must be VLMAX, unless AVL propagation is applied to both
the splat and the following vadd.vv.
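
One way to reason about the "unless" case: if both instructions ran under the
loop's vl, lanes at or beyond vl would be left stale, which is only harmless
if no later iteration uses a larger vl. The brute-force check below is a
sketch based on the vl constraints in the spec section linked earlier in the
thread, not on anything in GCC; `vl_never_grows` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if, starting from `avl` remaining
   elements, every vl sequence allowed by the spec constraints stays
   at or below `prev_vl` and never increases afterwards.  */
bool
vl_never_grows (int avl, int vlmax, int prev_vl)
{
  assert (vlmax > 0 && avl >= 0);
  if (avl == 0)
    return true;                      /* loop has finished */

  int lo, hi;
  if (avl <= vlmax)
    lo = hi = avl;                    /* vl = AVL */
  else if (avl >= 2 * vlmax)
    lo = hi = vlmax;                  /* vl = VLMAX */
  else
    {
      lo = (avl + 1) / 2;             /* ceil (AVL / 2) <= vl <= VLMAX */
      hi = vlmax;
    }

  /* Try every vl the hardware could legally pick for this iteration.  */
  for (int vl = lo; vl <= hi; vl++)
    if (vl > prev_vl || !vl_never_grows (avl - vl, vlmax, vl))
      return false;
  return true;
}
```

Under these assumptions the check holds for every AVL and VLMAX tried (for
example AVL up to 64 and VLMAX up to 8, starting with prev_vl = VLMAX), which
is consistent with the remark that the VLMAX setting is only needed when the
AVL is not propagated to both instructions.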

* [Bug target/112438] RISC-V: Wrong auto-vectorization on induction variable of RVV
From: juzhe.zhong at rivai dot ai @ 2023-11-08 10:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #13 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Hi, kito.

https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635688.html 

Candidate patch to fix this.

Could you comment and give more explanation to Richards, since I don't think I
can explain it better than you?

Thanks.

* [Bug target/112438] RISC-V: Wrong auto-vectorization on induction variable of RVV
From: cvs-commit at gcc dot gnu.org @ 2023-11-10 14:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #14 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <panli@gcc.gnu.org>:

https://gcc.gnu.org/g:fb906061e10662280f602886c3659ac1c7522a37

commit r14-5326-gfb906061e10662280f602886c3659ac1c7522a37
Author: Juzhe-Zhong <juzhe.zhong@rivai.ai>
Date:   Fri Nov 10 20:20:11 2023 +0800

    Middle-end: Fix bug of induction variable vectorization for RVV

    PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

    1. The SELECT_VL result is not necessarily VF in a non-final iteration.

    The current GIMPLE IR is wrong:

    ...
    _35 = .SELECT_VL (ivtmp_33, VF);
    _21 = vect_vec_iv_.8_22 + { VF, ... };

    E.g. consider a total of N = 6 iterations with VF = 4.
    The SELECT_VL output is defined as not always being VF in a non-final
    iteration; it depends on the hardware implementation.

    Suppose we have an RVV CPU core whose vsetvl distributes the workload
    evenly. It may process 3 elements in the 1st iteration and 3 elements in
    the last iteration.
    Then the induction variable update here:
      _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... };
    is wrong: it adds VF, which is 4, although we didn't actually process 4
    elements.

    It should add 3, which is the result of SELECT_VL.
    So the correct IR should be:

      _36 = .SELECT_VL (ivtmp_34, VF);
      _22 = (int) _36;
      vect_cst__21 = [vec_duplicate_expr] _22;

    2. This issue only happens with non-SLP vectorization of a single rgroup,
    since:

      if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
        {
          tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
          if (direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
                                              OPTIMIZE_FOR_SPEED)
              && LOOP_VINFO_LENS (loop_vinfo).length () == 1
              && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1 && !slp
              && (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
                  || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))
            LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
        }

    3. This issue doesn't appear in nested loops, regardless of whether
    LOOP_VINFO_USING_SELECT_VL_P is true or false.

    Since:

      # vect_vec_iv_.6_5 = PHI <_19(3), { 0, ... }(5)>
      # vect_diff_15.7_20 = PHI <vect_diff_9.8_22(3), vect_diff_18.5_11(5)>
      _19 = vect_vec_iv_.6_5 + { 1, ... };
      vect_diff_9.8_22 = .COND_LEN_ADD ({ -1, ... }, vect_vec_iv_.6_5,
                                        vect_diff_15.7_20, vect_diff_15.7_20,
                                        _28, 0);
      ivtmp_1 = ivtmp_4 + 4294967295;
      ....
      <bb 5> [local count: 6549826]:
      # vect_diff_18.5_11 = PHI <vect_diff_9.8_22(4), { 0, ... }(2)>
      # ivtmp_26 = PHI <ivtmp_27(4), 40(2)>
      _28 = .SELECT_VL (ivtmp_26, POLY_INT_CST [4, 4]);
      goto <bb 3>; [100.00%]

    Note the induction variable IR:
      _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... };
    It updates the induction variable independently of VF (that is, it doesn't
    care how many elements are processed in the iteration).

    The update is loop invariant, so it won't be a problem even if
    LOOP_VINFO_USING_SELECT_VL_P is true.

    Testing passed, Ok for trunk ?

            PR tree-optimization/112438

    gcc/ChangeLog:

            * tree-vect-loop.cc (vectorizable_induction): Bugfix when
            LOOP_VINFO_USING_SELECT_VL_P.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/autovec/pr112438.c: New test.

* [Bug target/112438] RISC-V: Wrong auto-vectorization on induction variable of RVV
From: juzhe.zhong at rivai dot ai @ 2023-11-10 15:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

JuzheZhong <juzhe.zhong at rivai dot ai> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #15 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Fixed
