public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Builtin for consulting value analysis (better ffs() code gen)
@ 2024-03-14  0:03 Andrew Cooper
  2024-03-14  6:04 ` Alexander Monakov
  2024-03-14 11:02 ` Florian Weimer
  0 siblings, 2 replies; 8+ messages in thread
From: Andrew Cooper @ 2024-03-14  0:03 UTC (permalink / raw)
  To: gcc

Hello,

I've come across an issue that I would have thought there would be a
builtin for, but perhaps that's just wishful thinking.  I'd like to be
able to write something like this:

    if (__builtin_expr_is_true(x > 0))
        ... // one thing
    else
        ... // something else

This stems from trying to clean up the mess of bit operation helpers in Xen.

On x86, __builtin_ffs() doesn't have great code generation.  This is a
consequence of the BSF instruction having miserable semantics, and the
builtin emits code with a branch or cmov to compensate for undefined
case of passing 0 in.

On x86_64 at least, Intel and AMD have made enough guarantees in writing
to allow a condition-less form:

    mov $-1, %dst
    bsf %src, %dst
    add $1, %dst

which is good, but not great.  It is common to have an __ffs() variant
which states that a src of 0 is undefined, and while this makes a
reasonable improvement to the code generation within loops, it's still
not great to rely on the programmer to get this right.

A common pattern to find is something like:

    while (x) {
        int b = ffs(x);
        ... // do something with x and b

where range analysis can know that x is nonzero.  Indeed, the builtin
manages to spot this, and emits a condition-less form too.

However, doing this for a local implementation of ffs() doesn't work.  With:

unsigned int my_ffs(unsigned int x)
{
    int res;

    if (x) {
        asm ("bsf ..." : "=r" (res) : "rm" (x));
    } else {
        res = -1;
        asm ("bsf ..." : "+r" (res) : "rm" (x));
    }

    return res + 1;
}

the while() example above really does get generated with ideal form. 
However, in general code where the value of x is unknown, the entire
if/else chain is emitted, which is strictly worse than just emitting the
else case which is the safe catch-all.

I suppose that what I'm looking for is something a little like
__builtin_constant_p() which can either be used in a straight if(), or
in a __builtin_choose_expr().

Anyway - is there a way of doing this that I've managed to overlook?

~Andrew

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Builtin for consulting value analysis (better ffs() code gen)
  2024-03-14  0:03 Builtin for consulting value analysis (better ffs() code gen) Andrew Cooper
@ 2024-03-14  6:04 ` Alexander Monakov
  2024-03-14 11:30   ` Andrew Cooper
  2024-03-14 11:02 ` Florian Weimer
  1 sibling, 1 reply; 8+ messages in thread
From: Alexander Monakov @ 2024-03-14  6:04 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: gcc


On Thu, 14 Mar 2024, Andrew Cooper via Gcc wrote:

> I suppose that what I'm looking for is something a little like
> __builtin_constant_p() which can either be used in a straight if(), or
> in a __builtin_choose_expr().
> 
> Anyway - is there a way of doing this that I've managed to overlook?

I am missing what is lacking for you with __builtin_constant_p, I would
do it like this:

unsigned ffs(unsigned x)
{
    unsigned res;
    unsigned nonzero = x != 0;
    if (__builtin_constant_p(nonzero) && nonzero)
        asm("bsf %1, %0" : "=r"(res) : "rm"(x));
    else {
        res = -1;
        asm("bsf %1, %0" : "+r"(res) : "rm"(x));
    }
    return res;
}

or with handling known-zero-input case like this:

unsigned ffs(unsigned x)
{
    unsigned res;
    unsigned nonzero = x != 0;
    if (!__builtin_constant_p(nonzero)) {
        res = -1;
        asm("bsf %1, %0" : "+r"(res) : "rm"(x));
    } else if (nonzero) {
        asm("bsf %1, %0" : "=r"(res) : "rm"(x));
    } else {
        res = -1;
    }
    return res;
}


Does it work for you?

HTH
Alexander

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Builtin for consulting value analysis (better ffs() code gen)
  2024-03-14  0:03 Builtin for consulting value analysis (better ffs() code gen) Andrew Cooper
  2024-03-14  6:04 ` Alexander Monakov
@ 2024-03-14 11:02 ` Florian Weimer
  2024-03-14 11:52   ` Andrew Cooper
  1 sibling, 1 reply; 8+ messages in thread
From: Florian Weimer @ 2024-03-14 11:02 UTC (permalink / raw)
  To: Andrew Cooper via Gcc; +Cc: Andrew Cooper

* Andrew Cooper via Gcc:

> Anyway - is there a way of doing this that I've managed to overlook?

Use __builtin_ffs?

I think there's some concern for GCC that some of the alternative x86-64
implementations do not follow the AMD and Intel behavior.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Builtin for consulting value analysis (better ffs() code gen)
  2024-03-14  6:04 ` Alexander Monakov
@ 2024-03-14 11:30   ` Andrew Cooper
  2024-03-14 12:03     ` Andreas Schwab
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Cooper @ 2024-03-14 11:30 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc

On 14/03/2024 6:04 am, Alexander Monakov wrote:
> On Thu, 14 Mar 2024, Andrew Cooper via Gcc wrote:
>
>> I suppose that what I'm looking for is something a little like
>> __builtin_constant_p() which can either be used in a straight if(), or
>> in a __builtin_choose_expr().
>>
>> Anyway - is there a way of doing this that I've managed to overlook?
> I am missing what is lacking for you with __builtin_constant_p, I would
> do it like this:
>
> unsigned ffs(unsigned x)
> {
>     unsigned res;
>     unsigned nonzero = x != 0;
>     if (__builtin_constant_p(nonzero) && nonzero)
>         asm("bsf %1, %0" : "=r"(res) : "rm"(x));
>     else {
>         res = -1;
>         asm("bsf %1, %0" : "+r"(res) : "rm"(x));
>     }
>     return res;
> }

Oh - so it does.  I'd not considered that expressing it like that would
still work.

>
> or with handling known-zero-input case like this:
>
> unsigned ffs(unsigned x)
> {
>     unsigned res;
>     unsigned nonzero = x != 0;
>     if (!__builtin_constant_p(nonzero)) {
>         res = -1;
>         asm("bsf %1, %0" : "+r"(res) : "rm"(x));
>     } else if (nonzero) {
>         asm("bsf %1, %0" : "=r"(res) : "rm"(x));
>     } else {
>         res = -1;
>     }
>     return res;
> }
>
>
> Does it work for you?

I simplified things when asking the question.  The real implementation
has a general

    if (__builtin_constant_p(x))
        return __builtin_ffs(x);

so any known-constant value can be folded.  What I'm dealing with is the
remainder of the cases.

Anyway - thankyou for your help.

~Andrew

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Builtin for consulting value analysis (better ffs() code gen)
  2024-03-14 11:02 ` Florian Weimer
@ 2024-03-14 11:52   ` Andrew Cooper
  0 siblings, 0 replies; 8+ messages in thread
From: Andrew Cooper @ 2024-03-14 11:52 UTC (permalink / raw)
  To: Florian Weimer, Andrew Cooper via Gcc

On 14/03/2024 11:02 am, Florian Weimer wrote:
> * Andrew Cooper via Gcc:
>
>> Anyway - is there a way of doing this that I've managed to overlook?
> Use __builtin_ffs?
>
> I think there's some concern for GCC that some of the alternative x86-64
> implementations do not follow the AMD and Intel behavior.

This is why I'm not asking "please improve the code gen of
__builtin_ffs()".  Its a can of worms I don't expect anyone wants to open.

But in the case that I can do better owing to a narrower scope, I'd like
to do so.

~Andrew

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Builtin for consulting value analysis (better ffs() code gen)
  2024-03-14 11:30   ` Andrew Cooper
@ 2024-03-14 12:03     ` Andreas Schwab
  2024-03-14 15:33       ` Andrew Cooper
  0 siblings, 1 reply; 8+ messages in thread
From: Andreas Schwab @ 2024-03-14 12:03 UTC (permalink / raw)
  To: Andrew Cooper via Gcc; +Cc: Alexander Monakov, Andrew Cooper

On Mär 14 2024, Andrew Cooper via Gcc wrote:

> so any known-constant value can be folded.  What I'm dealing with is the
> remainder of the cases.

Which cases remain?

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Builtin for consulting value analysis (better ffs() code gen)
  2024-03-14 12:03     ` Andreas Schwab
@ 2024-03-14 15:33       ` Andrew Cooper
  2024-03-21  8:15         ` LIU Hao
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Cooper @ 2024-03-14 15:33 UTC (permalink / raw)
  To: Andreas Schwab, Andrew Cooper via Gcc; +Cc: Alexander Monakov

On 14/03/2024 12:03 pm, Andreas Schwab wrote:
> On Mär 14 2024, Andrew Cooper via Gcc wrote:
>
>> so any known-constant value can be folded.  What I'm dealing with is the
>> remainder of the cases.
> Which cases remain?

None, thanks to the answers on this thread.

The overall structure I've got now is:

unsigned int ffs(unsigned int x)
{
    if ( __builtin_constant_p(x) )
        return __builtin_ffs(x);  // Allows constant folding

#ifndef arch_ffs
#define arch_ffs __builtin_ffs
#endif

    return arch_ffs(x);
}

And for x86's arch_ffs(),

unsigned int arch_ffs(unsigned int x)
{
    unsigned int res;

    if ( __builtin_constant_p(x > 0) && x > 0 )
    {
        // Well defined when x is known non-zero
        asm("bsf %1, %0" : "=r"(res) : "rm"(x));
    }
    else
    {
        // The architects say this is safe even for 0.
        res = -1;
        asm("bsf %1, %0" : "+r"(res) : "rm"(x));
    }

    return res + 1;
}


This gives the same code gen as before (give or take some register
shuffling), without having to rely on the programmer to remember to get
their ffs()'s separated from their __ffs()'s as it pertains to undefined
input.

The other architectures which have better-defined instructions don't
need any of these games to retain the same good code-gen that is
currently expressed with a maze of subtly-different functions.

~Andrew

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Builtin for consulting value analysis (better ffs() code gen)
  2024-03-14 15:33       ` Andrew Cooper
@ 2024-03-21  8:15         ` LIU Hao
  0 siblings, 0 replies; 8+ messages in thread
From: LIU Hao @ 2024-03-21  8:15 UTC (permalink / raw)
  To: Andrew Cooper, Andrew Cooper via Gcc


[-- Attachment #1.1: Type: text/plain, Size: 931 bytes --]

在 2024-03-14 23:33, Andrew Cooper via Gcc 写道:
> And for x86's arch_ffs(),
> 
> unsigned int arch_ffs(unsigned int x)
> {
>      unsigned int res;
> 
>      if ( __builtin_constant_p(x > 0) && x > 0 )
>      {
>          // Well defined when x is known non-zero
>          asm("bsf %1, %0" : "=r"(res) : "rm"(x));

Even if you may assume that the destination operand is always destroyed, the hardware has no 
knowledge about that, so this statement has a dependency on `res`, and shouldn't be declared `=r`.

I think it's better to remove this `if`. The other branch below clearly eliminates the dependency.


>      }
>      else
>      {
>          // The architects say this is safe even for 0.
>          res = -1;
>          asm("bsf %1, %0" : "+r"(res) : "rm"(x));
>      }
> 
>      return res + 1;
> }

-- 
Best regards,
LIU Hao


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-03-21  8:15 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-14  0:03 Builtin for consulting value analysis (better ffs() code gen) Andrew Cooper
2024-03-14  6:04 ` Alexander Monakov
2024-03-14 11:30   ` Andrew Cooper
2024-03-14 12:03     ` Andreas Schwab
2024-03-14 15:33       ` Andrew Cooper
2024-03-21  8:15         ` LIU Hao
2024-03-14 11:02 ` Florian Weimer
2024-03-14 11:52   ` Andrew Cooper

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).