public inbox for gcc-bugs@sourceware.org
* [Bug c/97503] New: Suboptimal use of cntlzw and cntlzd
@ 2020-10-20 14:44 christophe.leroy at csgroup dot eu
  2020-10-20 16:25 ` [Bug c/97503] " jakub at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: christophe.leroy at csgroup dot eu @ 2020-10-20 14:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503

            Bug ID: 97503
           Summary: Suboptimal use of cntlzw and cntlzd
           Product: gcc
           Version: 10.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: christophe.leroy at csgroup dot eu
  Target Milestone: ---

int f(int x)
{
        return x ? __builtin_clz(x) : 32;
}

Is built as

00000000 <f>:
   0:   2c 03 00 00     cmpwi   r3,0
   4:   40 82 00 0c     bne     10 <f+0x10>
   8:   38 60 00 20     li      r3,32
   c:   4e 80 00 20     blr
  10:   7c 63 00 34     cntlzw  r3,r3
  14:   4e 80 00 20     blr


I would expect

00000000 <f>:
   0:   7c 63 00 34     cntlzw  r3,r3
   4:   4e 80 00 20     blr

Because cntlzw (Count Leading Zeros Word) is documented in the PowerPC
instruction set as returning 0 to 32 inclusive.

The same applies to the 64-bit version:

long f(long x)
{
        return x ? __builtin_clzll(x) : 64;
}

0000000000000000 <.f>:
   0:   2c 23 00 00     cmpdi   r3,0
   4:   41 82 00 0c     beq     10 <.f+0x10>
   8:   7c 63 00 74     cntlzd  r3,r3
   c:   4e 80 00 20     blr
  10:   38 60 00 40     li      r3,64
  14:   4e 80 00 20     blr


* [Bug c/97503] Suboptimal use of cntlzw and cntlzd
  2020-10-20 14:44 [Bug c/97503] New: Suboptimal use of cntlzw and cntlzd christophe.leroy at csgroup dot eu
@ 2020-10-20 16:25 ` jakub at gcc dot gnu.org
  2020-10-20 16:49 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-20 16:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 49409
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49409&action=edit
gcc11-pr97503.patch

While that is something that can be, and often is, done during ifcvt, I think
for various architectures we can do it at the GIMPLE level too, as done in this
patch (untested so far).


* [Bug c/97503] Suboptimal use of cntlzw and cntlzd
  2020-10-20 14:44 [Bug c/97503] New: Suboptimal use of cntlzw and cntlzd christophe.leroy at csgroup dot eu
  2020-10-20 16:25 ` [Bug c/97503] " jakub at gcc dot gnu.org
@ 2020-10-20 16:49 ` jakub at gcc dot gnu.org
  2020-10-21  8:54 ` [Bug target/97503] " cvs-commit at gcc dot gnu.org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-20 16:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
On the RTL side, there is simplify_cond_clz_ctz that should simplify it and
noce_try_ifelse_collapse that should be matching it (it does on x86 with -mbmi
-mlzcnt).


* [Bug target/97503] Suboptimal use of cntlzw and cntlzd
  2020-10-20 14:44 [Bug c/97503] New: Suboptimal use of cntlzw and cntlzd christophe.leroy at csgroup dot eu
  2020-10-20 16:25 ` [Bug c/97503] " jakub at gcc dot gnu.org
  2020-10-20 16:49 ` jakub at gcc dot gnu.org
@ 2020-10-21  8:54 ` cvs-commit at gcc dot gnu.org
  2023-11-09  5:59 ` lh_mouse at 126 dot com
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-10-21  8:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503

--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:5244b4af5e47bc98a2a9cf36f048981583a1b163

commit r11-4183-g5244b4af5e47bc98a2a9cf36f048981583a1b163
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Oct 21 10:51:33 2020 +0200

    phiopt: Optimize x ? __builtin_clz (x) : 32 in GIMPLE [PR97503]

    While we have at the RTL level noce_try_ifelse_collapse combined with
    simplify_cond_clz_ctz, that optimization doesn't always trigger because
    e.g. on powerpc there is a define_insn to compare a reg against zero and
    copy that register to another one and so we end up with a different pseudo
    in the simplify_cond_clz_ctz test and punt.

    For targets that define C?Z_DEFINED_VALUE_AT_ZERO to 2 for certain modes,
    we can optimize it already in phiopt though, just need to ensure that
    we transform the __builtin_c?z* calls into .C?Z ifns because my recent
    VRP changes codified that the builtin calls are always undefined at zero,
    while ifns honor C?Z_DEFINED_VALUE_AT_ZERO equal to 2.
    And, in phiopt we already have popcount handling that does pretty much the
    same thing, except for always using a zero value rather than the one set
    by C?Z_DEFINED_VALUE_AT_ZERO.

    So, this patch extends that function to handle not just popcount, but also
    clz and ctz.

    2020-10-21  Jakub Jelinek  <jakub@redhat.com>

            PR tree-optimization/97503
            * tree-ssa-phiopt.c: Include internal-fn.h.
            (cond_removal_in_popcount_pattern): Rename to ...
            (cond_removal_in_popcount_clz_ctz_pattern): ... this.  Handle not
            just popcount, but also clz and ctz if it has
            C?Z_DEFINED_VALUE_AT_ZERO 2.

            * gcc.dg/tree-ssa/pr97503.c: New test.
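
The condition described in the commit message can be pictured with a toy model
in C. This is only a sketch under stated assumptions: struct target_clz_info
and can_drop_zero_check are hypothetical names invented here, and the code is
not taken from tree-ssa-phiopt.c. It models just the rule stated above: the
conditional can go away when the target reports a defined value at zero with
status 2 and that value equals the constant used in the zero arm.

```c
#include <stdbool.h>

/* Hypothetical summary of what a target reports for one mode:
   status mirrors the result of C?Z_DEFINED_VALUE_AT_ZERO (0, 1 or 2),
   value_at_zero is the value the target defines for a zero operand. */
struct target_clz_info {
    int status;
    int value_at_zero;
};

/* Toy decision: "x ? clz(x) : else_value" can become a plain clz only if
   the status is 2 and the zero arm matches the target's defined value. */
static inline bool can_drop_zero_check(struct target_clz_info t, int else_value)
{
    return t.status == 2 && t.value_at_zero == else_value;
}
```

With an assumed status of 2 and a value at zero of 32, the expression
x ? __builtin_clz (x) : 32 qualifies, while x ? __builtin_clz (x) : 0 does not.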


* [Bug target/97503] Suboptimal use of cntlzw and cntlzd
  2020-10-20 14:44 [Bug c/97503] New: Suboptimal use of cntlzw and cntlzd christophe.leroy at csgroup dot eu
                   ` (2 preceding siblings ...)
  2020-10-21  8:54 ` [Bug target/97503] " cvs-commit at gcc dot gnu.org
@ 2023-11-09  5:59 ` lh_mouse at 126 dot com
  2023-11-09  6:05 ` lh_mouse at 126 dot com
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: lh_mouse at 126 dot com @ 2023-11-09  5:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503

LIU Hao <lh_mouse at 126 dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lh_mouse at 126 dot com

--- Comment #4 from LIU Hao <lh_mouse at 126 dot com> ---
Are there any reasons why this was not done for 64?
(https://gcc.godbolt.org/z/7vddPdxaP)


```
using int32_t = int;
using int64_t = long long;
using uint32_t = unsigned int;
using uint64_t = unsigned long long;

void
xlzcnt32(int32_t& val)
  {
    val = val ? (__builtin_clz(val) & 31) : 32;
  }

void
xlzcnt64(int64_t& val)
  {
    val = val ? (__builtin_clzll(val) & 63) : 64;
  }
```

results in
```
xlzcnt32(int&):
        xor     eax, eax
        lzcnt   eax, DWORD PTR [rdi]
        mov     DWORD PTR [rdi], eax
        ret
xlzcnt64(long long&):
        mov     rdx, QWORD PTR [rdi]
        xor     eax, eax
        lzcnt   rax, rdx
        test    rdx, rdx
        mov     edx, 64
        cmove   rax, rdx
        mov     QWORD PTR [rdi], rax
        ret
```


* [Bug target/97503] Suboptimal use of cntlzw and cntlzd
  2020-10-20 14:44 [Bug c/97503] New: Suboptimal use of cntlzw and cntlzd christophe.leroy at csgroup dot eu
                   ` (3 preceding siblings ...)
  2023-11-09  5:59 ` lh_mouse at 126 dot com
@ 2023-11-09  6:05 ` lh_mouse at 126 dot com
  2023-11-09 16:51 ` ubizjak at gmail dot com
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: lh_mouse at 126 dot com @ 2023-11-09  6:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503

--- Comment #5 from LIU Hao <lh_mouse at 126 dot com> ---
(In reply to LIU Hao from comment #4)
>         lzcnt   rax, rdx
>         test    rdx, rdx
>         mov     edx, 64
>         cmove   rax, rdx

There is actually another missed optimization here. LZCNT sets CF if the source
operand is zero, so the TEST instruction is totally unnecessary. We can do
this:

```
  ...
  xor eax, eax
  lzcnt rax, rdx
  mov edx, 64           # or something else, whatever
  cmovb eax, edx
```
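
For readers who want to check the logic, the sequence above can be modeled in
C. This is a sketch, not compiler output: struct lzcnt_result, lzcnt64_model
and clz64_or are hypothetical names, and the model covers only the two facts
the sequence relies on (LZCNT yields the operand size and sets CF when the
source is zero).

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of the modeled instruction: the count plus the carry flag. */
struct lzcnt_result {
    uint64_t value;
    bool carry;
};

/* Model of a 64-bit LZCNT: value is the number of leading zero bits
   (64 for a zero source), carry is set exactly when the source is zero. */
static inline struct lzcnt_result lzcnt64_model(uint64_t src)
{
    struct lzcnt_result r = { 0, src == 0 };
    for (uint64_t bit = UINT64_C(1) << 63; bit != 0 && (src & bit) == 0; bit >>= 1)
        r.value++;
    return r;
}

/* Model of "lzcnt; mov edx, K; cmovb": select on the carry flag
   instead of re-testing the source operand. */
static inline uint64_t clz64_or(uint64_t src, uint64_t value_if_zero)
{
    struct lzcnt_result r = lzcnt64_model(src);
    return r.carry ? value_if_zero : r.value;
}
```

Selecting on carry gives the same answer as testing the source again, for any
replacement constant, which is the point of dropping the TEST.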


* [Bug target/97503] Suboptimal use of cntlzw and cntlzd
  2020-10-20 14:44 [Bug c/97503] New: Suboptimal use of cntlzw and cntlzd christophe.leroy at csgroup dot eu
                   ` (4 preceding siblings ...)
  2023-11-09  6:05 ` lh_mouse at 126 dot com
@ 2023-11-09 16:51 ` ubizjak at gmail dot com
  2023-11-09 16:53 ` ubizjak at gmail dot com
  2023-11-27  8:14 ` lh_mouse at 126 dot com
  7 siblings, 0 replies; 9+ messages in thread
From: ubizjak at gmail dot com @ 2023-11-09 16:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503

--- Comment #6 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to LIU Hao from comment #4)
> Are there any reasons why this was not done for 64?
> (https://gcc.godbolt.org/z/7vddPdxaP)

There is zero-extension from the result of __builtin_clzll that confuses
optimizers.


* [Bug target/97503] Suboptimal use of cntlzw and cntlzd
  2020-10-20 14:44 [Bug c/97503] New: Suboptimal use of cntlzw and cntlzd christophe.leroy at csgroup dot eu
                   ` (5 preceding siblings ...)
  2023-11-09 16:51 ` ubizjak at gmail dot com
@ 2023-11-09 16:53 ` ubizjak at gmail dot com
  2023-11-27  8:14 ` lh_mouse at 126 dot com
  7 siblings, 0 replies; 9+ messages in thread
From: ubizjak at gmail dot com @ 2023-11-09 16:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503

--- Comment #7 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Uroš Bizjak from comment #6)
> (In reply to LIU Hao from comment #4)
> > Are there any reasons why this was not done for 64?
> > (https://gcc.godbolt.org/z/7vddPdxaP)
> 
> There is zero-extension from the result of __builtin_clzll that confuses
> optimizers.

Actually, sign-extension, but the result is never sign-extended.


* [Bug target/97503] Suboptimal use of cntlzw and cntlzd
  2020-10-20 14:44 [Bug c/97503] New: Suboptimal use of cntlzw and cntlzd christophe.leroy at csgroup dot eu
                   ` (6 preceding siblings ...)
  2023-11-09 16:53 ` ubizjak at gmail dot com
@ 2023-11-27  8:14 ` lh_mouse at 126 dot com
  7 siblings, 0 replies; 9+ messages in thread
From: lh_mouse at 126 dot com @ 2023-11-27  8:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97503

--- Comment #8 from LIU Hao <lh_mouse at 126 dot com> ---
(In reply to Uroš Bizjak from comment #7)
> Actually, sign-extension, but the result is never sign-extended.

Yes, but it should be a no-op, right?
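
A small sketch of why I would expect that, assuming a 32-bit int and a 64-bit
long long (extensions_agree is a hypothetical helper, only for illustration):
for any non-negative int, which covers every possible count from 0 to 64,
sign-extension and zero-extension to 64 bits produce the same value.

```c
#include <stdbool.h>

/* Returns true when widening n by sign-extension and by zero-extension
   gives the same 64-bit value. Assumes int is 32 bits and long long is
   64 bits, as on the targets discussed in this thread. */
static inline bool extensions_agree(int n)
{
    long long sign_extended = (long long)n;
    long long zero_extended = (long long)(unsigned int)n;
    return sign_extended == zero_extended;
}
```

The two only differ when the top bit of the int is set, which cannot happen
for a leading-zero count.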

