public inbox for gcc@gcc.gnu.org
* inline asm constraints for conditions
@ 2003-09-26 18:47 David Howells
  2003-09-26 22:05 ` Richard Henderson
  0 siblings, 1 reply; 12+ messages in thread
From: David Howells @ 2003-09-26 18:47 UTC (permalink / raw)
  To: gcc


Would it be possible to get a new type of inline asm constraint added, such
that a "condition" can be an output? This would be useful in the Linux kernel
as there are a number of places where we have to write a value into a register
inside the asm statement and then test that value later.

Take something like the x86 test-and-set bit function for instance:

	static __inline__ int test_and_set_bit(int nr, volatile void * addr)
	{
		int oldbit;

		__asm__ __volatile__( LOCK_PREFIX
			"btsl %2,%1\n\tsbbl %0,%0"
			:"=r" (oldbit),"=m" (ADDR)
			:"Ir" (nr) : "memory");
		return oldbit;
	}

Most of the time, all we want to do with the result is test it, eg:

	if (test_and_set_bit(9, &wibble->flags)) {
		...
	}

However, this incurs unnecessary extra instructions:

	lock bts	%eax,(%edx)
	sbb		%eax,%eax		<---
	test		%eax,%eax		<---
	jne		1c <x+0x1c>
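
A minimal, compilable sketch of the idiom above, for anyone who wants to
look at the generated code themselves. It is not the kernel's header: the
name tas_bit, the "+m" constraint and the restriction to one 32-bit word
are assumptions made so the snippet stands alone, and the non-x86 branch
falls back to the __atomic_fetch_or builtin found in later GCC and Clang.

```c
/* Sketch only: tas_bit() is a hypothetical stand-in for the kernel's
 * test_and_set_bit(), restricted to bits 0..31 of a single word. */
static inline int tas_bit(int nr, volatile unsigned int *addr)
{
#if defined(__i386__) || defined(__x86_64__)
	int oldbit;

	/* bts copies the old bit into CF; sbb then turns CF into 0 or -1. */
	__asm__ __volatile__("lock; btsl %2,%1\n\tsbbl %0,%0"
			     : "=r" (oldbit), "+m" (*addr)
			     : "Ir" (nr)
			     : "memory", "cc");
	return oldbit;
#else
	/* Portable fallback (assumption: GCC/Clang atomic builtins). */
	unsigned int mask = 1u << nr;
	unsigned int old = __atomic_fetch_or(addr, mask, __ATOMIC_SEQ_CST);

	return (old & mask) ? -1 : 0;
#endif
}
```

Compiling a caller of the form "if (tas_bit(9, &word)) ..." with
optimisation on should show the sbb/test pair that the arrows point at.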

What if, instead, it was possible to specify a special output-only constraint
of the form:

	"=?<true-cond>/<false-cond>"

Where <true-cond> and <false-cond> are the condition parts of a branch, set or
move instruction mnemonic appropriate to the CPU. <true-cond> would be used to
test if the condition is true, and <false-cond> to test if it is false.

For example, the bit-test-and-set above could then be rewritten thus:

	static __inline__ int test_and_set_bit(int nr, volatile void * addr)
	{
		int oldbit;

		__asm__ __volatile__( LOCK_PREFIX
			"btsl %2,%1"
			:"=?c/nc" (oldbit),"=m" (ADDR)
			:"Ir" (nr) : "memory");
		return oldbit;
	}

If the statement that uses it is an if-statement or a conditional operator,
this would be rendered into assembly using a jump or a conditional
instruction:

	lock bts	%eax,(%edx)
	jnc		1c <x+0x1c>

Or:

	lock bts	%eax,(%edx)
	cmovc		%ecx,%ebx

Anyone trying to access the value directly would cause it to be turned into
zero if the condition was false, non-zero otherwise:

	lock bts	%eax,(%edx)
	setc		%al
	movzx		%al,%eax

Possibly the compiler could detect c/nc and do the following as a special
case:

	lock bts	%eax,(%edx)
	sbb		%eax,%eax
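
For comparison, later GCC releases (6 and newer, on x86) provide flag
output operands spelled "=@cc<cond>", which cover much of what is asked
for here; compilers that support them define __GCC_ASM_FLAG_OUTPUTS__.
The sketch below is written against that feature and drops back to the
setc form when the macro is absent. The function name tas_bit_cc is an
assumption, as is the portable branch for non-x86 targets.

```c
/* Sketch only: returns 1 if the bit was already set, 0 otherwise. */
static inline int tas_bit_cc(int nr, volatile unsigned int *addr)
{
#if (defined(__i386__) || defined(__x86_64__)) && \
	defined(__GCC_ASM_FLAG_OUTPUTS__)
	int carry;

	/* "=@ccc" asks the compiler to consume CF straight after the asm,
	 * so it is free to emit jc/jnc, cmovc or setc as the use demands. */
	__asm__ __volatile__("lock; btsl %2,%1"
			     : "=@ccc" (carry), "+m" (*addr)
			     : "Ir" (nr)
			     : "memory");
	return carry;
#elif defined(__i386__) || defined(__x86_64__)
	unsigned char carry;

	/* No flag outputs: materialise CF by hand with setc. */
	__asm__ __volatile__("lock; btsl %2,%1\n\tsetc %0"
			     : "=q" (carry), "+m" (*addr)
			     : "Ir" (nr)
			     : "memory", "cc");
	return carry;
#else
	/* Portable fallback (assumption: GCC/Clang atomic builtins). */
	unsigned int mask = 1u << nr;

	return (__atomic_fetch_or(addr, mask, __ATOMIC_SEQ_CST) & mask) != 0;
#endif
}
```

Note that the flag operand is numbered like any other output (%0 here)
but is never named in the template, matching the "=?c/nc" example.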


As for non-x86 CPUs, on something like the SH where, IIRC, you have a single
flag (T), the following would then be possible:

	"=?t/f" or "=?f/t"

And on the IA64 where predicate registers are available:

	"=?p4/p7"

David


* Re: inline asm constraints for conditions
  2003-09-26 18:47 inline asm constraints for conditions David Howells
@ 2003-09-26 22:05 ` Richard Henderson
  2003-09-29 12:14   ` Gabriel Paubert
  2003-09-29 17:38   ` Jamie Lokier
  0 siblings, 2 replies; 12+ messages in thread
From: Richard Henderson @ 2003-09-26 22:05 UTC (permalink / raw)
  To: David Howells; +Cc: gcc

On Fri, Sep 26, 2003 at 07:07:12PM +0100, David Howells wrote:
> Would it be possible to get a new type of inline asm constraint added, such
> that a "condition" can be an output?

Not like this, no.

The most basic problem for x86 is that the flags register dies too often.
If we were to add the ability for it to live longer and be reloaded (say
with lahf/sahf), then generic parts of the compiler would start trying to
make use of it, which would almost certainly result in worse code overall
even when the feature was not in use.

> As for non-x86 CPUs, on something like the SH where, IIRC, you have a single
> flag (T), the following would then be possible:
> 
> 	"=?t/f" or "=?f/t"
> 
> And on the IA64 where predicate registers are available:
> 
> 	"=?p4/p7"

For ia64, this ought to be possible.  On ia64 you'd use a constraint
of "=c", which would imply the pair (pN,pN+1) for N even.  The one
missing piece is that there is no language-level access to a BImode
type.  Something that I've been meaning to try for a while is to 
arrange for "bool" to have BImode and see what happens...

For sh, I suspect there will be similar problems as with x86, though
not as bad because fewer instructions clobber T.  It would definitely
have similar problems as ia64 wrt BImode.


r~


* Re: inline asm constraints for conditions
  2003-09-26 22:05 ` Richard Henderson
@ 2003-09-29 12:14   ` Gabriel Paubert
  2003-09-29 17:21     ` Richard Henderson
  2003-09-29 17:38   ` Jamie Lokier
  1 sibling, 1 reply; 12+ messages in thread
From: Gabriel Paubert @ 2003-09-29 12:14 UTC (permalink / raw)
  To: Richard Henderson, David Howells, gcc

On Fri, Sep 26, 2003 at 01:39:35PM -0700, Richard Henderson wrote:
> On Fri, Sep 26, 2003 at 07:07:12PM +0100, David Howells wrote:
> > Would it be possible to get a new type of inline asm constraint added, such
> > that a "condition" can be an output?
> 
> Not like this, no.
> 
> The most basic problem for x86 is that the flags register dies too often.

Indeed.

> If we were to add the ability for it to live longer and be reloaded (say
> with lahf/sahf), then generic parts of the compiler would start trying to
> make use of it, which would almost certainly result in worse code overall
> even when the feature was not in use.

Furthermore lahf/sahf won't work since the overflow flag (used
for signed compares) is in another byte of the flags. According
to my docs, AMD has decided to remove these instructions in
64 bit mode (I don't have the hardware to test).

This would leave pushf/popf as the only way of moving flags
and it has some potentially dangerous side effects, especially
for kernel code (modifying interrupt mask to start with). 

Ok, that was my nitpicking of the day :-)
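
To make the point concrete, here is the documented (E)FLAGS bit layout
as C constants, with a tiny helper showing which flags LAHF can see.
LAHF/SAHF move only the low byte of the flags through AH, so OF (bit 11),
which signed branches such as jl/jg need alongside SF, is out of reach.
IF (bit 9) is part of the image that PUSHF/POPF move around, which is
where the interrupt-mask concern comes from. The macro and function
names are made up for illustration.

```c
/* Architectural bit positions in the x86 (E)FLAGS register. */
#define FLAG_CF (1u << 0)	/* carry */
#define FLAG_PF (1u << 2)	/* parity */
#define FLAG_AF (1u << 4)	/* auxiliary carry */
#define FLAG_ZF (1u << 6)	/* zero */
#define FLAG_SF (1u << 7)	/* sign */
#define FLAG_IF (1u << 9)	/* interrupt enable */
#define FLAG_OF (1u << 11)	/* overflow */

/* LAHF/SAHF transfer only bits 0..7 of the flags register. */
static inline int lahf_covers(unsigned int flag)
{
	return (flag & 0xffu) == flag;
}
```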

> For ia64, this ought to be possible.  On ia64 you'd use a constraint
> of "=c", which would imply the pair (pN,pN+1) for N even.  The one
> missing piece is that there is no language-level access to a BImode
> type.  Something that I've been meaning to try for a while is to 
> arrange for "bool" to have BImode and see what happens...
>

Any idea for PPC?

Each condition register is a 4-bit field, looks like PSImode, but some
instructions (CR logical) treat them as individual 1 bit fields, where 
BImode would seem optimal.

	Regards,
	Gabriel



* Re: inline asm constraints for conditions
  2003-09-29 12:14   ` Gabriel Paubert
@ 2003-09-29 17:21     ` Richard Henderson
  2003-09-29 17:32       ` David Edelsohn
  0 siblings, 1 reply; 12+ messages in thread
From: Richard Henderson @ 2003-09-29 17:21 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: David Howells, gcc

On Mon, Sep 29, 2003 at 12:47:50PM +0200, Gabriel Paubert wrote:
> Any idea for PPC?
> 
> Each condition register is a 4-bit field, looks like PSImode, but some
> instructions (CR logical) treat them as individual 1 bit fields, where 
> BImode would seem optimal.

For PPC, you'd have to have a builtin type to get access to PQImode or CCmode.
You'd then need a builtin function to get access to the bits you want to
use out of the condition (__builtin_cc_gtu(x), or maybe just
__builtin_ppc_compare(x, <4-bit-immediate>)).  The only bit of ugliness
here is that you might have to hack the generic compare-and-branch 
expansion code to make this work, much as we did for __builtin_expect.

I could see that it would be possible to make this work on PPC, since
there are 8 of these registers to allocate, three of which are even
call-saved.  "A mere matter of programming", as they say.

The solution would be largely ppc specific, but could probably be mirrored
on other targets as applicable.



r~


* Re: inline asm constraints for conditions
  2003-09-29 17:21     ` Richard Henderson
@ 2003-09-29 17:32       ` David Edelsohn
  2003-09-29 21:02         ` Gabriel Paubert
  0 siblings, 1 reply; 12+ messages in thread
From: David Edelsohn @ 2003-09-29 17:32 UTC (permalink / raw)
  To: Richard Henderson, Gabriel Paubert, David Howells, gcc

>>>>> Richard Henderson writes:

Richard> For PPC, you'd have to have a builtin type to get access to PQImode or CCmode.
Richard> You'd then need a builtin function to get access to the bits you want to
Richard> use out of the condition (__builtin_cc_gtu(x), or maybe just
Richard> __builtin_ppc_compare(x, <4-bit-immediate>)).  The only bit of ugliness
Richard> here is that you might have to hack the generic compare-and-branch 
Richard> expansion code to make this work, much as we did for __builtin_expect.

Richard> I could see that it would be possible to make this work on PPC, since
Richard> there are 8 of these registers to allocate, three of which are even
Richard> call-saved.  "A mere matter of programming", as they say.

	An optimization opportunity for PowerPC is to treat the condition
register bitfields as bitfields.  Instead of interpreting the CR bits as
representing different comparison results, use the appropriate branch
instruction to test the bitfield bits of interest.
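
As a plain-C model of what treating the CR as bitfields means: the 32-bit
condition register is eight 4-bit fields, CR0 in the most significant
nibble, and within each field the bits are LT, GT, EQ, SO in that order.
A conditional branch can name any one of those 32 bits directly. The
helpers below only model the numbering on an ordinary integer; the names
are hypothetical and nothing here touches the real register.

```c
#include <stdint.h>

enum cr_bit { CR_LT = 0, CR_GT = 1, CR_EQ = 2, CR_SO = 3 };

/* PowerPC numbers CR bits from the most significant end:
 * index = 4 * field + bit, and index 0 corresponds to 0x80000000. */
static inline unsigned int cr_bit_index(unsigned int field, enum cr_bit bit)
{
	return 4u * field + (unsigned int)bit;
}

/* Return 1 if the selected bit of the CR image is set, else 0. */
static inline int cr_test(uint32_t cr, unsigned int field, enum cr_bit bit)
{
	return (int)((cr >> (31u - cr_bit_index(field, bit))) & 1u);
}
```

With this numbering, "is bit 2 of CR1 set" is simply a test of CR bit 6,
regardless of whether a compare instruction put it there.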

David


* Re: inline asm constraints for conditions
  2003-09-26 22:05 ` Richard Henderson
  2003-09-29 12:14   ` Gabriel Paubert
@ 2003-09-29 17:38   ` Jamie Lokier
  2003-09-29 19:45     ` Richard Henderson
  1 sibling, 1 reply; 12+ messages in thread
From: Jamie Lokier @ 2003-09-29 17:38 UTC (permalink / raw)
  To: Richard Henderson, David Howells, gcc

Richard Henderson wrote:
> On Fri, Sep 26, 2003 at 07:07:12PM +0100, David Howells wrote:
> > Would it be possible to get a new type of inline asm constraint added, such
> > that a "condition" can be an output?
> 
> Not like this, no.
> 
> The most basic problem for x86 is that the flags register dies too often.
> If we were to add the ability for it to live longer and be reloaded (say
> with lahf/sahf), then generic parts of the compiler would start trying to
> make use of it, which would almost certainly result in worse code overall
> even when the feature was not in use.

Why does it need the ability to reload the whole flags register?

If the flags register is killed, then "reloading" should be a simple
matter of either redoing the comparison, as effectively happens all
the time now, or saving the _particular_ comparison results that are
wanted using setcc, sbbl etc. into an ordinary register.

Would it work to define a multitude of condition registers on the x86,
one for each branch condition, like this:

	1. cmp/sub set all the condition registers.

	2. other flags-affecting instructions clobber all the condition
	   registers or (in the case of inc/dec) just some of them.

	3. branches and other conditionals use one of the many
           condition registers.

	4. reload "moves" one condition register _to_ a general register
	   or memory using setcc/sbb.

	5. reload "moves" one condition register _from_ a general
	   register or memory by doing a comparison against the saved
	   value to set the real flags again.

	   Of course this clobbers all the other condition registers :)
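
Steps 4 and 5 can be sketched with ordinary inline asm (hypothetical
function name; x86 only, with a plain-C fallback elsewhere). The carry
produced inside the asm is spilled to a byte register with setc, after
which anything may trash the flags, and the condition is re-established
later by testing the saved byte, which is all the "if" below does.

```c
/* Returns 1 if a + b carries out of 32 bits, else 0; *sum gets a + b. */
static inline int add_carries(unsigned int a, unsigned int b,
			      unsigned int *sum)
{
#if defined(__i386__) || defined(__x86_64__)
	unsigned char saved;

	/* Step 4: the add sets CF; setc "moves" that one condition
	 * into a general register before anything can clobber it. */
	__asm__("addl %2,%0\n\tsetc %1"
		: "+r" (a), "=q" (saved)
		: "r" (b)
		: "cc");
	*sum = a;

	/* Step 5: comparing the saved byte with zero sets real flags
	 * again, and the branch is taken on those. */
	if (saved)
		return 1;
	return 0;
#else
	/* Portable fallback: unsigned wrap-around reveals the carry. */
	*sum = a + b;
	return *sum < a;
#endif
}
```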

Would this work?

-- Jamie


* Re: inline asm constraints for conditions
  2003-09-29 17:38   ` Jamie Lokier
@ 2003-09-29 19:45     ` Richard Henderson
  2003-09-29 22:04       ` Jamie Lokier
  2003-09-30 13:34       ` David Howells
  0 siblings, 2 replies; 12+ messages in thread
From: Richard Henderson @ 2003-09-29 19:45 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: David Howells, gcc

On Mon, Sep 29, 2003 at 06:21:34PM +0100, Jamie Lokier wrote:
> If the flags register is killed, then "reloading" should be a simple
> matter of either redoing the comparison, as effectively happens all
> the time now, or saving the _particular_ comparison results that are
> wanted using setcc, sbbl etc. into an ordinary register.

In the case under discussion the compiler would not have access to the
instruction that set the flags -- it's hidden in an asm.  If the insns
in question were exposed via __builtins, the flags might have been set
by an instruction like xaddl or cmpxchg which cannot be re-done.

So, no, this would not help at all.
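
A small illustration of why such an instruction cannot simply be run
again to regenerate its flags. It uses the __atomic_compare_exchange_n
builtin from later GCC and Clang releases as a stand-in for the kind of
builtin mentioned above; try_claim is an invented name. The first call
succeeds and changes memory, so repeating the same operation reports the
opposite condition.

```c
#include <stdbool.h>

/* Try to change *slot from 0 to 1; true means this caller did it.
 * On x86 this is typically a lock cmpxchg, with ZF as the result. */
static inline bool try_claim(int *slot)
{
	int expected = 0;

	return __atomic_compare_exchange_n(slot, &expected, 1, false,
					   __ATOMIC_SEQ_CST,
					   __ATOMIC_SEQ_CST);
}
```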



r~


* Re: inline asm constraints for conditions
  2003-09-29 21:02         ` Gabriel Paubert
@ 2003-09-29 20:31           ` David Edelsohn
  0 siblings, 0 replies; 12+ messages in thread
From: David Edelsohn @ 2003-09-29 20:31 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: Richard Henderson, David Howells, gcc

>>>>> Gabriel Paubert writes:

Gabriel> One thing I've done in a few cases when testing several flags in PPC
Gabriel> assembly is to move the flags into the CR with an mtcrf instruction and
Gabriel> then use bt/bf on the corresponding bit. This eliminates quite a few
Gabriel> test instructions and allows the CR field(s) to be prepared well in
Gabriel> advance. GCC does not do anything remotely similar AFAICT.
Gabriel> Is this kind of technique what you have in mind?

	Yes, this is what I was describing.

David


* Re: inline asm constraints for conditions
  2003-09-29 17:32       ` David Edelsohn
@ 2003-09-29 21:02         ` Gabriel Paubert
  2003-09-29 20:31           ` David Edelsohn
  0 siblings, 1 reply; 12+ messages in thread
From: Gabriel Paubert @ 2003-09-29 21:02 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Richard Henderson, David Howells, gcc

On Mon, Sep 29, 2003 at 01:17:18PM -0400, David Edelsohn wrote:
> >>>>> Richard Henderson writes:
> 
> Richard> For PPC, you'd have to have a builtin type to get access to PQImode or CCmode.
> Richard> You'd then need a builtin function to get access to the bits you want to
> Richard> use out of the condition (__builtin_cc_gtu(x), or maybe just
> Richard> __builtin_ppc_compare(x, <4-bit-immediate>)).  The only bit of ugliness
> Richard> here is that you might have to hack the generic compare-and-branch 
> Richard> expansion code to make this work, much as we did for __builtin_expect.
> 
> Richard> I could see that it would be possible to make this work on PPC, since
> Richard> there are 8 of these registers to allocate, three of which are even
> Richard> call-saved.  "A mere matter of programming", as they say.
> 
> 	An optimization opportunity for PowerPC is to treat the condition
> register bitfields as bitfields.  Instead of interpreting the CR bits as
> representing different comparison results, use the appropriate branch
> instruction to test the bitfield bits of interest.

Ugh, I don't understand anything in this. Sorry. 

One thing I've done in a few cases when testing several flags in PPC
assembly is to move the flags into the CR with an mtcrf instruction and
then use bt/bf on the corresponding bit. This eliminates quite a few
test instructions and allows the CR field(s) to be prepared well in
advance. GCC does not do anything remotely similar AFAICT.
Is this kind of technique what you have in mind?

That said, I have no idea how to hack GCC to do this :-(

	Regards,
	Gabriel


* Re: inline asm constraints for conditions
  2003-09-29 19:45     ` Richard Henderson
@ 2003-09-29 22:04       ` Jamie Lokier
  2003-09-30 13:34       ` David Howells
  1 sibling, 0 replies; 12+ messages in thread
From: Jamie Lokier @ 2003-09-29 22:04 UTC (permalink / raw)
  To: Richard Henderson, David Howells, gcc

Richard Henderson wrote:
> On Mon, Sep 29, 2003 at 06:21:34PM +0100, Jamie Lokier wrote:
> > If the flags register is killed, then "reloading" should be a simple
> > matter of either redoing the comparison, as effectively happens all
> > the time now, or saving the _particular_ comparison results that are
> > wanted using setcc, sbbl etc. into an ordinary register.
> 
> In the case under discussion the compiler would not have access to the
> instruction that set the flags -- it's hidden in an asm.  If the insns
> in question were exposed via __builtins, the flags might have been set
> by an instruction like xaddl or cmpxchg which cannot be re-done.

Later in that mail I described what to do when the instruction cannot
be re-done, or when it's better not to redo it:

	4. reload "moves" one condition register _to_ a general register
	   or memory using setcc/sbb.

	5. reload "moves" one condition register _from_ a general
	   register or memory by doing a comparison against the saved
	   value to set the real flags again.

	   Of course this clobbers all the other condition registers :)

In other words, a particular condition is saved as a 0 or 1 value,
using setcc/sbb.  The originating asm instruction is not redone.

> So, no, this would not help at all.

Perhaps it wouldn't, but I think the particular reason you gave is not
the right one :)

-- Jamie


* Re: inline asm constraints for conditions
  2003-09-29 19:45     ` Richard Henderson
  2003-09-29 22:04       ` Jamie Lokier
@ 2003-09-30 13:34       ` David Howells
  1 sibling, 0 replies; 12+ messages in thread
From: David Howells @ 2003-09-30 13:34 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Richard Henderson, gcc


Jamie Lokier <jamie@shareable.org> wrote:
> > On Mon, Sep 29, 2003 at 06:21:34PM +0100, Jamie Lokier wrote:
> > > If the flags register is killed, then "reloading" should be a simple
> > > matter of either redoing the comparison, as effectively happens all
> > > the time now, or saving the _particular_ comparison results that are
> > > wanted using setcc, sbbl etc. into an ordinary register.
> > 
> > In the case under discussion the compiler would not have access to the
> > instruction that set the flags -- it's hidden in an asm.  If the insns
> > in question were exposed via __builtins, the flags might have been set
> > by an instruction like xaddl or cmpxchg which cannot be re-done.

I don't think you can expect to recreate the condition - as was mentioned, the
compiler can't predict which bit(s) of the ASM statement were responsible, nor
can it predict what the side-effects of redoing would be.

> Later in that mail I described what to do when the instruction cannot
> be re-done, or when it's better not to redo it:
> 
> 	4. reload "moves" one condition register _to_ a general register
> 	   or memory using setcc/sbb.
> 
> 	5. reload "moves" one condition register _from_ a general
> 	   register or memory by doing a comparison against the saved
> 	   value to set the real flags again.

That's the sort of thing I was envisioning. The fallback is basically what
these constructs tend to do at the moment - use SETcc/SBB/CMOVcc (or their
equivalents) to store a "boolean" value in a variable, and then compare it
against something appropriate in an if statement later.

As a last resort, you could use conditional jumps to store the
value:

	<-- got condition in EFLAGS c/nc -->
		mov	$0,%eax
		jnc	0f
		mov	$1,%eax
	0:

> 	   Of course this clobbers all the other condition registers :)

Either the number of returning conditions could be limited to just one, or you
could use SETcc/CMOVcc multiple times.

> In other words, a particular condition is saved as a 0 or 1 value,
> using setcc/sbb.  The originating asm instruction is not redone.

Or, at least, zero or non-zero. Then you don't have to do SETcc+AND.

> > So, no, this would not help at all.
> 
> Perhaps it wouldn't, but I think the particular reason you gave is not
> the right one :)

But if you added this sort of constraint, you'd have to have the ability to
turn the result of such an asm statement into a real value anyway, and so you
could just build on that:

	int x;
	asm ("lock; bts %1,%2" : "=?c/nc"(x) : "rI"(nr), "m"(addr));
	printk("Bit was %d\n", x);

What you really want, of course, is:

	bool x;
	asm ("lock; bts %1,%2" : "=?c/nc"(x) : "rI"(nr), "m"(addr));
	printk("Bit was %d\n", x);

David


* Re: inline asm constraints for conditions
       [not found]   ` <20030929104750.GA6467@iram.es.suse.lists.egcs>
@ 2003-09-29 14:34     ` Andi Kleen
  0 siblings, 0 replies; 12+ messages in thread
From: Andi Kleen @ 2003-09-29 14:34 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: Richard Henderson, David Howells, gcc

Gabriel Paubert <paubert@iram.es> writes:

> This would leave pushf/popf as the only way of moving flags
> and it has some potentially dangerous side effects, especially
> for kernel code (modifying interrupt mask to start with). 

Also pushf/popf is incredibly slow on the P4.

-Andi

