From: "wilco at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/90838] Detect table-based ctz implementation
Date: Fri, 17 Feb 2023 14:27:19 +0000
Message-ID: <bug-90838-4-yDWu1ixilw@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-90838-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90838

--- Comment #17 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #16)
> (In reply to Wilco from comment #15)
> > It would make more sense to move x86 backends to CTZ_DEFINED_VALUE_AT_ZERO
> > == 2 so that you always get the same result even when you don't have tzcnt.
> > A conditional move would be possible, so it adds an extra 2 instructions at
> > worst (ie. still significantly faster than doing the table lookup, multiply
> > etc). And it could be optimized when you know CLZ/CTZ input is non-zero.
> 
> Conditional moves are a lottery on x86, in many cases very bad idea.  And
> when people actually use __builtin_clz*, they state that they don't care
> about the 0 value, so emitting terribly performing code for it just in case
> would be wrong.
> If forwprop emits the conditional in separate blocks for the CTZ_DVAZ!=2
> case, on targets where conditional moves are beneficial for it it can also
> emit them, or emit the jump which say on x86 will be most likely faster than
> cmov.

Well, GCC emits a cmov for this (-O2 -march=x86-64-v2):

int ctz(long a)
{
  return (a == 0) ? 64 : __builtin_ctzl (a);
}

ctz:
        xor     edx, edx
        mov     eax, 64
        rep bsf rdx, rdi
        test    rdi, rdi
        cmovne  eax, edx
        ret

Note that the extra 'test' seems redundant, since IIRC bsf sets ZF=1 if the
input is zero.

On Zen 2 this has identical performance to the plain builtin when you loop it
as res = ctz (res) + 1; (i.e. measuring the latency of the non-zero case). So I
find it hard to believe cmov is expensive on modern cores.
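
For anyone who wants to repeat the latency measurement, here is a rough sketch
of the dependent chain described above. It is not the harness actually used:
the names ctz_safe, ctz_plain, chain and time_chain are made up for
illustration, noinline is an assumption to stop the compiler folding the loop,
and the value 64 assumes a 64-bit long (LP64).

```c
#include <assert.h>
#include <time.h>

/* Zero-safe variant from the message: yields 64 for a == 0
   (assumes long is 64 bits wide). */
__attribute__((noinline)) int ctz_safe(long a)
{
  return (a == 0) ? 64 : __builtin_ctzl(a);
}

/* Plain builtin; the result is undefined for a == 0. */
__attribute__((noinline)) int ctz_plain(long a)
{
  return __builtin_ctzl(a);
}

/* Dependent chain res = ctz (res) + 1: each call needs the previous
   result, so the loop measures latency rather than throughput.
   The "+ 1" keeps res >= 1, so only the non-zero path is exercised. */
long chain(int (*f)(long), long start, long iters)
{
  long res = start;
  for (long i = 0; i < iters; i++) {
    assert(res != 0);   /* start must be non-zero for the plain builtin */
    res = f(res) + 1;
  }
  return res;
}

/* CPU time in seconds for one chain of the given length
   (hypothetical helper; clock() resolution is coarse, so use many
   iterations). */
double time_chain(int (*f)(long), long iters)
{
  clock_t t0 = clock();
  volatile long sink = chain(f, 8, iters);  /* keep the result live */
  (void)sink;
  return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling time_chain (ctz_safe, n) and time_chain (ctz_plain, n) from main with a
large n and comparing the two numbers gives a rough latency comparison. The
call overhead from noinline is the same for both variants, so only the
difference between them is meaningful.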

