public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/111791] New: RISC-V: Strange loop vectorizaion on popcount function
@ 2023-10-12 19:56 kito at gcc dot gnu.org
  2023-10-12 20:49 ` [Bug tree-optimization/111791] " pinskia at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: kito at gcc dot gnu.org @ 2023-10-12 19:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791

            Bug ID: 111791
           Summary: RISC-V: Strange loop vectorizaion on popcount function
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: kito at gcc dot gnu.org
  Target Milestone: ---

Symptom:

A typical popcount implementation with Brian Kernighan’s algorithm, vectorizer
has recognized that as popcount, but...come with strange vectorization result,
I  know that might because I add -fno-vect-cost-model, but I still don't
understand why it vectorized, so I guess maybe it's something worth to report.

NOTE:
Those bad/strange code gen will gone once scalar popcount instruction
available. 

Case:
```
int popcount(unsigned long value)
  {
    int nbits;
    for (nbits = 0; value != 0; value &= value - 1)
      nbits++;
    return nbits;
  }

```

Command to reproduce:
```
$ riscv64-unknown-linux-gnu-gcc x.c -march=rv64gcv -o - -S -fno-vect-cost-model
-O3
```

Sha1: g:faae30c49560f1481f036061fa2f894b0f7257f8 (some random point of top of
trunk)

Current output:
```
        .globl  popcount
        .type   popcount, @function
popcount:
.LFB0:
        .cfi_startproc
        beq     a0,zero,.L4
        addi    sp,sp,-16
        .cfi_def_cfa_offset 16
        sd      ra,8(sp)
        .cfi_offset 1, -8
        call    __popcountdi2
        csrr    a2,vlenb
        sext.w  a0,a0
        srli    a2,a2,2
        vsetvli a3,zero,e32,m1,ta,ma
        vid.v   v1
.L3:
        vsetvli a5,a0,e8,mf4,ta,ma
        sub     a0,a0,a5
        vsetvli a3,zero,e32,m1,ta,ma
        vmv1r.v v3,v1
        vmv.v.x v2,a2
        vadd.vv v1,v1,v2
        bne     a0,zero,.L3
        ld      ra,8(sp)
        .cfi_restore 1
        addi    a5,a5,-1
        vadd.vi v3,v3,1
        vslidedown.vx   v3,v3,a5
        addi    sp,sp,16
        .cfi_def_cfa_offset 0
        vmv.x.s a0,v3
        jr      ra
.L4:
        li      a0,0
        ret
        .cfi_endproc
.LFE0:
        .size   popcount, .-popcount

```

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/111791] RISC-V: Strange loop vectorizaion on popcount function
  2023-10-12 19:56 [Bug tree-optimization/111791] New: RISC-V: Strange loop vectorizaion on popcount function kito at gcc dot gnu.org
@ 2023-10-12 20:49 ` pinskia at gcc dot gnu.org
  2023-10-13  6:54 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-12 20:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-10-12
                 CC|                            |pinskia at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed.

I almost want to say SCCP should always do this loop into _builtin_popcountN if
there are no other statements in it and then it will be removed and not be
vectorized but maybe that there is another way to fix ...

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/111791] RISC-V: Strange loop vectorizaion on popcount function
  2023-10-12 19:56 [Bug tree-optimization/111791] New: RISC-V: Strange loop vectorizaion on popcount function kito at gcc dot gnu.org
  2023-10-12 20:49 ` [Bug tree-optimization/111791] " pinskia at gcc dot gnu.org
@ 2023-10-13  6:54 ` rguenth at gcc dot gnu.org
  2023-10-13 23:13 ` juzhe.zhong at rivai dot ai
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-10-13  6:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
It does do that (if there's a POPCOUNT optab, that is).  Replacing with
__builtin_popcount would eventually lead to an infinite recursion in this case
;) (see that old memcpy + ldist bug)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/111791] RISC-V: Strange loop vectorizaion on popcount function
  2023-10-12 19:56 [Bug tree-optimization/111791] New: RISC-V: Strange loop vectorizaion on popcount function kito at gcc dot gnu.org
  2023-10-12 20:49 ` [Bug tree-optimization/111791] " pinskia at gcc dot gnu.org
  2023-10-13  6:54 ` rguenth at gcc dot gnu.org
@ 2023-10-13 23:13 ` juzhe.zhong at rivai dot ai
  2023-10-18 19:53 ` rdapp at gcc dot gnu.org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-13 23:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791

JuzheZhong <juzhe.zhong at rivai dot ai> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |juzhe.zhong at rivai dot ai

--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
This is because RISC-V didn't vectorize popcount:

It just a scalar call popcount:

call    __popcountdi2

I have a patch support popcount optab:

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625870.html

However, I don't think we should enable popcount optab if we don't have
a single vector popcount instruction.

An ideal way is to add fallback popcount in loop vectorizer if the target
doesn't
enable vector popcount optab, Robin is working on that:

https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626567.html

So, that's why I drop my patch.

More details need Robin comments.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/111791] RISC-V: Strange loop vectorizaion on popcount function
  2023-10-12 19:56 [Bug tree-optimization/111791] New: RISC-V: Strange loop vectorizaion on popcount function kito at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-10-13 23:13 ` juzhe.zhong at rivai dot ai
@ 2023-10-18 19:53 ` rdapp at gcc dot gnu.org
  2023-10-18 23:15 ` vineetg at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rdapp at gcc dot gnu.org @ 2023-10-18 19:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791

--- Comment #4 from Robin Dapp <rdapp at gcc dot gnu.org> ---
This is a scalar popcount and as Kito already noted we will just emit

  cpop a0, a0

once the zbb extension is present.

As to the question what is actually being vectorized here, I'm not so sure :D
It looks like we're generating a vectorized scalar popcount by something like a
reduction?  But we already did the call to __popcountdi2?

Analyzing loop at pr111791.c:8
pr111791.c:8:25: note:  === analyze_loop_nest ===
pr111791.c:8:25: note:   === vect_analyze_loop_form ===
pr111791.c:8:25: note:    === get_loop_niters ===
Matching expression match.pd:1919, generic-match-8.cc:27
Applying pattern match.pd:1975, generic-match-2.cc:4670
Matching expression match.pd:2707, generic-match-4.cc:36
Matching expression match.pd:2710, generic-match-3.cc:53
Matching expression match.pd:2717, generic-match-2.cc:23
Matching expression match.pd:2707, generic-match-4.cc:36
Matching expression match.pd:2710, generic-match-3.cc:53
Matching expression match.pd:2717, generic-match-2.cc:23
Matching expression match.pd:2707, generic-match-4.cc:36
Matching expression match.pd:2710, generic-match-3.cc:53
Matching expression match.pd:2717, generic-match-2.cc:23
Matching expression match.pd:148, generic-match-10.cc:27
Matching expression match.pd:148, generic-match-10.cc:27
Applying pattern match.pd:4519, generic-match-4.cc:2923
Applying pattern match.pd:201, generic-match-4.cc:3103
Applying pattern match.pd:3393, generic-match-2.cc:182
pr111791.c:8:25: note:   Symbolic number of iterations is (unsigned intD.4)
__builtin_popcountlD.1952 (value_4(D))

Ah, interesting: ranger(?) recognizes that the loop runs "popcount" iterations.
Shouldn't that still be 64?  Well, it probably knows better :)

Regardless, we use this symbolic value as number of iterations:
  _5 = __builtin_popcountlD.1952 (value_4(D));
  niters.4_9 = (unsigned intD.4) _5;
  _2 = __builtin_popcountlD.1952 (value_4(D));
  bnd.5_3 = (unsigned intD.4) _2;
  _23 = (unsigned long) bnd.5_3;

Then, it gets funnier:

  # nbits_11 = PHI <nbits_7(6), 0(5)>
  # vect_vec_iv_.6_15 = PHI <_16(6), { 0, 1, 2, ... }(5)>
  # ivtmp_24 = PHI <ivtmp_25(6), _23(5)>
  _26 = .SELECT_VL (ivtmp_24, POLY_INT_CST [4, 4]);
  _16 = vect_vec_iv_.6_15 + { POLY_INT_CST [4, 4], ... };
  vect_nbits_7.7_18 = vect_vec_iv_.6_15 + { 1, ... };
  # RANGE [irange] int [1, 65]
  nbits_7 = nbits_11 + 1;
  # RANGE [irange] long unsigned int [0, 18446744073709551614]
  _1 = value_10 + 18446744073709551615;
  # RANGE [irange] long unsigned int [0, 18446744073709551614]
  value_8 = _1 & value_10;
  ivtmp_25 = ivtmp_24 - _26;

i.e. we have a vector IV that we add to the vectorized nbits.  Finally we
extract the niter-th (=popcount) element from that vector only to get -
popcount :)

Still not sure why that happens but a vector-mode popcount expander doesn't
help here as everything is scalar.  Maybe the explanation is simple in that we
would vectorize such a loop anyway and here it just looks particularly bad
because we already know the result via ranger?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/111791] RISC-V: Strange loop vectorizaion on popcount function
  2023-10-12 19:56 [Bug tree-optimization/111791] New: RISC-V: Strange loop vectorizaion on popcount function kito at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2023-10-18 19:53 ` rdapp at gcc dot gnu.org
@ 2023-10-18 23:15 ` vineetg at gcc dot gnu.org
  2023-10-18 23:17 ` pinskia at gcc dot gnu.org
  2023-10-19  2:26 ` juzhe.zhong at rivai dot ai
  6 siblings, 0 replies; 8+ messages in thread
From: vineetg at gcc dot gnu.org @ 2023-10-18 23:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791

--- Comment #5 from Vineet Gupta <vineetg at gcc dot gnu.org> ---
(In reply to Robin Dapp from comment #4)

> Analyzing loop at pr111791.c:8
> pr111791.c:8:25: note:  === analyze_loop_nest ===
> pr111791.c:8:25: note:   === vect_analyze_loop_form ===
> pr111791.c:8:25: note:    === get_loop_niters ===
> Matching expression match.pd:1919, generic-match-8.cc:27
> Applying pattern match.pd:1975, generic-match-2.cc:4670
> Matching expression match.pd:2707, generic-match-4.cc:36
> Matching expression match.pd:2710, generic-match-3.cc:53
> Matching expression match.pd:2717, generic-match-2.cc:23
> Matching expression match.pd:2707, generic-match-4.cc:36
> Matching expression match.pd:2710, generic-match-3.cc:53
> Matching expression match.pd:2717, generic-match-2.cc:23
> Matching expression match.pd:2707, generic-match-4.cc:36
> Matching expression match.pd:2710, generic-match-3.cc:53
> Matching expression match.pd:2717, generic-match-2.cc:23
> Matching expression match.pd:148, generic-match-10.cc:27
> Matching expression match.pd:148, generic-match-10.cc:27
> Applying pattern match.pd:4519, generic-match-4.cc:2923
> Applying pattern match.pd:201, generic-match-4.cc:3103
> Applying pattern match.pd:3393, generic-match-2.cc:182
> pr111791.c:8:25: note:   Symbolic number of iterations is (unsigned intD.4)
> __builtin_popcountlD.1952 (value_4(D))

Curious, how did you get this debug output - is this just one of -fdump-tree-?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/111791] RISC-V: Strange loop vectorizaion on popcount function
  2023-10-12 19:56 [Bug tree-optimization/111791] New: RISC-V: Strange loop vectorizaion on popcount function kito at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2023-10-18 23:15 ` vineetg at gcc dot gnu.org
@ 2023-10-18 23:17 ` pinskia at gcc dot gnu.org
  2023-10-19  2:26 ` juzhe.zhong at rivai dot ai
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-10-18 23:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Vineet Gupta from comment #5)
> (In reply to Robin Dapp from comment #4)
> 
> > Analyzing loop at pr111791.c:8
> > pr111791.c:8:25: note:  === analyze_loop_nest ===
> > pr111791.c:8:25: note:   === vect_analyze_loop_form ===
> > pr111791.c:8:25: note:    === get_loop_niters ===
> > Matching expression match.pd:1919, generic-match-8.cc:27
> > Applying pattern match.pd:1975, generic-match-2.cc:4670
> > Matching expression match.pd:2707, generic-match-4.cc:36
> > Matching expression match.pd:2710, generic-match-3.cc:53
> > Matching expression match.pd:2717, generic-match-2.cc:23
> > Matching expression match.pd:2707, generic-match-4.cc:36
> > Matching expression match.pd:2710, generic-match-3.cc:53
> > Matching expression match.pd:2717, generic-match-2.cc:23
> > Matching expression match.pd:2707, generic-match-4.cc:36
> > Matching expression match.pd:2710, generic-match-3.cc:53
> > Matching expression match.pd:2717, generic-match-2.cc:23
> > Matching expression match.pd:148, generic-match-10.cc:27
> > Matching expression match.pd:148, generic-match-10.cc:27
> > Applying pattern match.pd:4519, generic-match-4.cc:2923
> > Applying pattern match.pd:201, generic-match-4.cc:3103
> > Applying pattern match.pd:3393, generic-match-2.cc:182
> > pr111791.c:8:25: note:   Symbolic number of iterations is (unsigned intD.4)
> > __builtin_popcountlD.1952 (value_4(D))
> 
> Curious, how did you get this debug output - is this just one of
> -fdump-tree-?

The `applying pattern`/`matching expression` comes from `-folding` option of
`-fdump-tree-`. It is enabled with `-all` at the end too.
So in this case it looks like it was:
`-fdump-tree-vect-all` since both __builtin_popcount and the type `unsigned
int` has the decl ID at the end (that is what `D.4` and `D.1952` are).

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/111791] RISC-V: Strange loop vectorizaion on popcount function
  2023-10-12 19:56 [Bug tree-optimization/111791] New: RISC-V: Strange loop vectorizaion on popcount function kito at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2023-10-18 23:17 ` pinskia at gcc dot gnu.org
@ 2023-10-19  2:26 ` juzhe.zhong at rivai dot ai
  6 siblings, 0 replies; 8+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-10-19  2:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111791

--- Comment #7 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
I don't think this is popcount vectorization issue.

This code should not be vectorized. It's true this code won' be vectorized if
we
use default COST model.

So this is not an issue.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-10-19  2:26 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-12 19:56 [Bug tree-optimization/111791] New: RISC-V: Strange loop vectorizaion on popcount function kito at gcc dot gnu.org
2023-10-12 20:49 ` [Bug tree-optimization/111791] " pinskia at gcc dot gnu.org
2023-10-13  6:54 ` rguenth at gcc dot gnu.org
2023-10-13 23:13 ` juzhe.zhong at rivai dot ai
2023-10-18 19:53 ` rdapp at gcc dot gnu.org
2023-10-18 23:15 ` vineetg at gcc dot gnu.org
2023-10-18 23:17 ` pinskia at gcc dot gnu.org
2023-10-19  2:26 ` juzhe.zhong at rivai dot ai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).