* Builtin for consulting value analysis (better ffs() code gen)
@ 2024-03-14 0:03 Andrew Cooper
2024-03-14 6:04 ` Alexander Monakov
2024-03-14 11:02 ` Florian Weimer
0 siblings, 2 replies; 8+ messages in thread
From: Andrew Cooper @ 2024-03-14 0:03 UTC (permalink / raw)
To: gcc
Hello,
I've come across an issue that I would have thought there would be a
builtin for, but perhaps that's just wishful thinking. I'd like to be
able to write something like this:
if (__builtin_expr_is_true(x > 0))
... // one thing
else
... // something else
This stems from trying to clean up the mess of bit operation helpers in Xen.
On x86, __builtin_ffs() doesn't have great code generation. This is a
consequence of the BSF instruction having miserable semantics, and the
builtin emits code with a branch or cmov to compensate for undefined
case of passing 0 in.
On x86_64 at least, Intel and AMD have made enough guarantees in writing
to allow a condition-less form:
mov $-1, %dst
bsf %src, %dst
add $1, %dst
which is good, but not great. It is common to have an __ffs() variant
which states that a src of 0 is undefined, and while this makes a
reasonable improvement to the code generation within loops, it's still
not great to rely on the programmer to get this right.
A common pattern to find is something like:
while (x) {
int b = ffs(x);
... // do something with x and b
where range analysis can know that x is nonzero. Indeed, the builtin
manages to spot this, and emits a condition-less form too.
However, doing this for a local implementation of ffs() doesn't work. With:
unsigned int my_ffs(unsigned int x)
{
int res;
if (x) {
asm ("bsf ..." : "=r" (res) : "rm" (x));
} else {
res = -1;
asm ("bsf ..." : "+r" (res) : "rm" (x));
}
return res + 1;
}
the while() example above really does get generated with ideal form.
However, in general code where the value of x is unknown, the entire
if/else chain is emitted, which is strictly worse than just emitting the
else case which is the safe catch-all.
I suppose that what I'm looking for is something a little like
__builtin_constant_p() which can either be used in a straight if(), or
in a __builtin_choose_expr().
Anyway - is there a way of doing this that I've managed to overlook?
~Andrew
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Builtin for consulting value analysis (better ffs() code gen)
2024-03-14 0:03 Builtin for consulting value analysis (better ffs() code gen) Andrew Cooper
@ 2024-03-14 6:04 ` Alexander Monakov
2024-03-14 11:30 ` Andrew Cooper
2024-03-14 11:02 ` Florian Weimer
1 sibling, 1 reply; 8+ messages in thread
From: Alexander Monakov @ 2024-03-14 6:04 UTC (permalink / raw)
To: Andrew Cooper; +Cc: gcc
On Thu, 14 Mar 2024, Andrew Cooper via Gcc wrote:
> I suppose that what I'm looking for is something a little like
> __builtin_constant_p() which can either be used in a straight if(), or
> in a __builtin_choose_expr().
>
> Anyway - is there a way of doing this that I've managed to overlook?
I am missing what is lacking for you with __builtin_constant_p, I would
do it like this:
unsigned ffs(unsigned x)
{
unsigned res;
unsigned nonzero = x != 0;
if (__builtin_constant_p(nonzero) && nonzero)
asm("bsf %1, %0" : "=r"(res) : "rm"(x));
else {
res = -1;
asm("bsf %1, %0" : "+r"(res) : "rm"(x));
}
return res;
}
or with handling known-zero-input case like this:
unsigned ffs(unsigned x)
{
unsigned res;
unsigned nonzero = x != 0;
if (!__builtin_constant_p(nonzero)) {
res = -1;
asm("bsf %1, %0" : "+r"(res) : "rm"(x));
} else if (nonzero) {
asm("bsf %1, %0" : "=r"(res) : "rm"(x));
} else {
res = -1;
}
return res;
}
Does it work for you?
HTH
Alexander
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Builtin for consulting value analysis (better ffs() code gen)
2024-03-14 0:03 Builtin for consulting value analysis (better ffs() code gen) Andrew Cooper
2024-03-14 6:04 ` Alexander Monakov
@ 2024-03-14 11:02 ` Florian Weimer
2024-03-14 11:52 ` Andrew Cooper
1 sibling, 1 reply; 8+ messages in thread
From: Florian Weimer @ 2024-03-14 11:02 UTC (permalink / raw)
To: Andrew Cooper via Gcc; +Cc: Andrew Cooper
* Andrew Cooper via Gcc:
> Anyway - is there a way of doing this that I've managed to overlook?
Use __builtin_ffs?
I think there's some concern for GCC that some of the alternative x86-64
implementations do not follow the AMD and Intel behavior.
Thanks,
Florian
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Builtin for consulting value analysis (better ffs() code gen)
2024-03-14 6:04 ` Alexander Monakov
@ 2024-03-14 11:30 ` Andrew Cooper
2024-03-14 12:03 ` Andreas Schwab
0 siblings, 1 reply; 8+ messages in thread
From: Andrew Cooper @ 2024-03-14 11:30 UTC (permalink / raw)
To: Alexander Monakov; +Cc: gcc
On 14/03/2024 6:04 am, Alexander Monakov wrote:
> On Thu, 14 Mar 2024, Andrew Cooper via Gcc wrote:
>
>> I suppose that what I'm looking for is something a little like
>> __builtin_constant_p() which can either be used in a straight if(), or
>> in a __builtin_choose_expr().
>>
>> Anyway - is there a way of doing this that I've managed to overlook?
> I am missing what is lacking for you with __builtin_constant_p, I would
> do it like this:
>
> unsigned ffs(unsigned x)
> {
> unsigned res;
> unsigned nonzero = x != 0;
> if (__builtin_constant_p(nonzero) && nonzero)
> asm("bsf %1, %0" : "=r"(res) : "rm"(x));
> else {
> res = -1;
> asm("bsf %1, %0" : "+r"(res) : "rm"(x));
> }
> return res;
> }
Oh - so it does. I'd not considered that expressing it like that would
still work.
>
> or with handling known-zero-input case like this:
>
> unsigned ffs(unsigned x)
> {
> unsigned res;
> unsigned nonzero = x != 0;
> if (!__builtin_constant_p(nonzero)) {
> res = -1;
> asm("bsf %1, %0" : "+r"(res) : "rm"(x));
> } else if (nonzero) {
> asm("bsf %1, %0" : "=r"(res) : "rm"(x));
> } else {
> res = -1;
> }
> return res;
> }
>
>
> Does it work for you?
I simplified things when asking the question. The real implementation
has a general
if (__builtin_constant_p(x))
return __builtin_ffs(x);
so any known-constant value can be folded. What I'm dealing with is the
remainder of the cases.
Anyway - thankyou for your help.
~Andrew
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Builtin for consulting value analysis (better ffs() code gen)
2024-03-14 11:02 ` Florian Weimer
@ 2024-03-14 11:52 ` Andrew Cooper
0 siblings, 0 replies; 8+ messages in thread
From: Andrew Cooper @ 2024-03-14 11:52 UTC (permalink / raw)
To: Florian Weimer, Andrew Cooper via Gcc
On 14/03/2024 11:02 am, Florian Weimer wrote:
> * Andrew Cooper via Gcc:
>
>> Anyway - is there a way of doing this that I've managed to overlook?
> Use __builtin_ffs?
>
> I think there's some concern for GCC that some of the alternative x86-64
> implementations do not follow the AMD and Intel behavior.
This is why I'm not asking "please improve the code gen of
__builtin_ffs()". Its a can of worms I don't expect anyone wants to open.
But in the case that I can do better owing to a narrower scope, I'd like
to do so.
~Andrew
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Builtin for consulting value analysis (better ffs() code gen)
2024-03-14 11:30 ` Andrew Cooper
@ 2024-03-14 12:03 ` Andreas Schwab
2024-03-14 15:33 ` Andrew Cooper
0 siblings, 1 reply; 8+ messages in thread
From: Andreas Schwab @ 2024-03-14 12:03 UTC (permalink / raw)
To: Andrew Cooper via Gcc; +Cc: Alexander Monakov, Andrew Cooper
On Mär 14 2024, Andrew Cooper via Gcc wrote:
> so any known-constant value can be folded. What I'm dealing with is the
> remainder of the cases.
Which cases remain?
--
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Builtin for consulting value analysis (better ffs() code gen)
2024-03-14 12:03 ` Andreas Schwab
@ 2024-03-14 15:33 ` Andrew Cooper
2024-03-21 8:15 ` LIU Hao
0 siblings, 1 reply; 8+ messages in thread
From: Andrew Cooper @ 2024-03-14 15:33 UTC (permalink / raw)
To: Andreas Schwab, Andrew Cooper via Gcc; +Cc: Alexander Monakov
On 14/03/2024 12:03 pm, Andreas Schwab wrote:
> On Mär 14 2024, Andrew Cooper via Gcc wrote:
>
>> so any known-constant value can be folded. What I'm dealing with is the
>> remainder of the cases.
> Which cases remain?
None, thanks to the answers on this thread.
The overall structure I've got now is:
unsigned int ffs(unsigned int x)
{
if ( __builtin_constant_p(x) )
return __builtin_ffs(x); // Allows constant folding
#ifndef arch_ffs
#define arch_ffs __builtin_ffs
#endif
return arch_ffs(x);
}
And for x86's arch_ffs(),
unsigned int arch_ffs(unsigned int x)
{
unsigned int res;
if ( __builtin_constant_p(x > 0) && x > 0 )
{
// Well defined when x is known non-zero
asm("bsf %1, %0" : "=r"(res) : "rm"(x));
}
else
{
// The architects say this is safe even for 0.
res = -1;
asm("bsf %1, %0" : "+r"(res) : "rm"(x));
}
return res + 1;
}
This gives the same code gen as before (give or take some register
shuffling), without having to rely on the programmer to remember to get
their ffs()'s separated from their __ffs()'s as it pertains to undefined
input.
The other architectures which have better-defined instructions don't
need any of these games to retain the same good code-gen that is
currently expressed with a maze of subtly-different functions.
~Andrew
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Builtin for consulting value analysis (better ffs() code gen)
2024-03-14 15:33 ` Andrew Cooper
@ 2024-03-21 8:15 ` LIU Hao
0 siblings, 0 replies; 8+ messages in thread
From: LIU Hao @ 2024-03-21 8:15 UTC (permalink / raw)
To: Andrew Cooper, Andrew Cooper via Gcc
[-- Attachment #1.1: Type: text/plain, Size: 931 bytes --]
在 2024-03-14 23:33, Andrew Cooper via Gcc 写道:
> And for x86's arch_ffs(),
>
> unsigned int arch_ffs(unsigned int x)
> {
> unsigned int res;
>
> if ( __builtin_constant_p(x > 0) && x > 0 )
> {
> // Well defined when x is known non-zero
> asm("bsf %1, %0" : "=r"(res) : "rm"(x));
Even if you may assume that the destination operand is always destroyed, the hardware has no
knowledge about that, so this statement has a dependency on `res`, and shouldn't be declared `=r`.
I think it's better to remove this `if`. The other branch below clearly eliminates the dependency.
> }
> else
> {
> // The architects say this is safe even for 0.
> res = -1;
> asm("bsf %1, %0" : "+r"(res) : "rm"(x));
> }
>
> return res + 1;
> }
--
Best regards,
LIU Hao
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-03-21 8:15 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-14 0:03 Builtin for consulting value analysis (better ffs() code gen) Andrew Cooper
2024-03-14 6:04 ` Alexander Monakov
2024-03-14 11:30 ` Andrew Cooper
2024-03-14 12:03 ` Andreas Schwab
2024-03-14 15:33 ` Andrew Cooper
2024-03-21 8:15 ` LIU Hao
2024-03-14 11:02 ` Florian Weimer
2024-03-14 11:52 ` Andrew Cooper
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).