* PR target/17101: question about powerpc s<cond> expanders
From: Nathan Sidwell @ 2004-11-12 17:16 UTC
  To: gcc, David Edelsohn, Geoffrey Keating

PR 17101 is a problem with boolean operations.  rs6000.md contains
seq, sne, sgt ... expanders, but the signed ones are explicitly
disabled for non-POWER (i.e. POWERPC) targets.  For instance:

(define_expand "sgt"
   [(clobber (match_operand:SI 0 "gpc_reg_operand" ""))]
   ""
   "
{
   if (! rs6000_compare_fp_p
       && (! TARGET_POWER || rs6000_compare_op1 == const0_rtx))
     FAIL;

   rs6000_emit_sCOND (GT, operands[0]);
   DONE;
}")
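
Read as a plain predicate, the guard in the expander above can be sketched
in C as follows.  This is only an illustration: the function and its
parameter names are hypothetical stand-ins for rs6000_compare_fp_p,
TARGET_POWER and the test rs6000_compare_op1 == const0_rtx, and are not
part of GCC.

```c
#include <stdbool.h>

/* Hypothetical model of the guard in the "sgt" expander.
   Returns true when the expander would execute FAIL, i.e. when
   rs6000_emit_sCOND (GT, ...) is never reached.  */
static bool sgt_expander_fails(bool compare_fp_p, bool target_power,
                               bool op1_is_const0)
{
    return !compare_fp_p && (!target_power || op1_is_const0);
}
```

With target_power false the function returns true for every integer
comparison, which is the "disabled for non-POWER" behaviour being asked
about; with target_power true it returns true only for comparisons
against zero.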

Why is that?  The unsigned variants are not so encumbered.

Also, the signed variants cut out comparisons with zero -- why
do the unsigned ones not do so? Oversight?  In addition, some of
the special cases seem ineffective at best, pessimizing at worst.
Examining each in detail I find the following assembler for 'V OP 0'

.seq: // baseline of 3 insns using the condition regs
         cmpwi 7,3,0
         mfcr 3
         rlwinm 3,3,31,1

.sne: // hm, four insns emitted
         srawi 0,3,31
         xor 3,0,3
         subf 3,3,0
         srwi 3,3,31

.sge: // 2 insns, this is better.
         nor 3,3,3
         srwi 3,3,31

.sgt: // 3 insns, no better
         srawi 0,3,31
         subf 0,3,0
         srwi 0,0,31

.sle: // 3 insns, no better
         addi 0,3,-1
         or 0,0,3
         srwi 0,0,31

.slt: // 1 insn, yay!
         srwi 3,3,31
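
For anyone who wants to check the branch-free sequences above by hand,
here is a rough C model of each one on 32-bit values.  The function names
are made up for this sketch, srawi31 is a portable stand-in for
"srawi rD,rS,31", and the .seq baseline is left out because it goes
through the condition register rather than plain integer arithmetic.

```c
#include <stdint.h>

/* srawi rD,rS,31: copy the sign bit into every bit (0 or 0xFFFFFFFF). */
static uint32_t srawi31(uint32_t x) { return 0u - (x >> 31); }

/* .sne: V != 0 */
static uint32_t sne0(uint32_t v)
{
    uint32_t t = srawi31(v);   /* srawi 0,3,31              */
    v = t ^ v;                 /* xor   3,0,3               */
    v = t - v;                 /* subf  3,3,0  (r3 = r0-r3) */
    return v >> 31;            /* srwi  3,3,31              */
}

/* .sge: V >= 0 */
static uint32_t sge0(uint32_t v)
{
    v = (uint32_t)~v;          /* nor   3,3,3               */
    return v >> 31;            /* srwi  3,3,31              */
}

/* .sgt: V > 0 */
static uint32_t sgt0(uint32_t v)
{
    uint32_t t = srawi31(v);   /* srawi 0,3,31              */
    t = t - v;                 /* subf  0,3,0  (r0 = r0-r3) */
    return t >> 31;            /* srwi  0,0,31              */
}

/* .sle: V <= 0 */
static uint32_t sle0(uint32_t v)
{
    uint32_t t = v - 1u;       /* addi  0,3,-1              */
    t |= v;                    /* or    0,0,3               */
    return t >> 31;            /* srwi  0,0,31              */
}

/* .slt: V < 0 */
static uint32_t slt0(uint32_t v) { return v >> 31; }   /* srwi 3,3,31 */
```

Each function takes the bit pattern of a signed 32-bit V and returns 0 or
1, so it can be compared directly against the corresponding signed C
comparison with zero.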

comments?

nathan
-- 
Nathan Sidwell    ::   http://www.codesourcery.com   ::     CodeSourcery LLC
nathan@codesourcery.com    ::     http://www.planetfall.pwp.blueyonder.co.uk


* Re: PR target/17101: question about powerpc s<cond> expanders
From: David Edelsohn @ 2004-11-12 18:37 UTC
  To: Nathan Sidwell; +Cc: gcc, Geoffrey Keating

>>>>> Nathan Sidwell writes:

Nathan> PR 17101 is a problem with boolean operations.  rs6000.md contains
Nathan> seq, sne, sgt ... expanders, but the signed ones are explicitly
Nathan> disabled for non-POWER (i.e. POWERPC) targets. for instance

Nathan> why is that?  The unsigned variants are not so encumbered.

	There are short, straight-line PowerPC code sequences for
unsigned, but not for signed.  The signed sequences use the POWER "doz"
instruction that was removed from PowerPC.

Nathan> Also, the signed variants cut out comparisons with zero -- why
Nathan> do the unsigned ones not do so? Oversight?  In addition, some of
Nathan> the special cases seem ineffective at best, pessimizing at worst.
Nathan> Examining each in detail I find the following assembler for 'V OP 0'

	GCC's generic straight-line sequences for comparison with 0 are as
good as the custom sequences.

Nathan> .seq: // baseline of 3 insns using the condition regs
Nathan> cmpwi 7,3,0
Nathan> mfcr 3
Nathan> rlwinm 3,3,31,1

Nathan> .sne: // hm, four insns emitted
Nathan> srawi 0,3,31
Nathan> xor 3,0,3
Nathan> subf 3,3,0
Nathan> srwi 3,3,31

Nathan> .sge: // 2 insns, this is better.
Nathan> nor 3,3,3
Nathan> srwi 3,3,31

Nathan> .sgt: // 3 insns, no better
Nathan> srawi 0,3,31
Nathan> subf 0,3,0
Nathan> srwi 0,0,31

Nathan> .sle: // 3 insns, no better
Nathan> addi 0,3,-1
Nathan> or 0,0,3
Nathan> srwi 0,0,31

Nathan> .slt: // 1 insn, yay!
Nathan> srwi 3,3,31

Nathan> comments?

	The number of instructions does not correspond to the cost.
Compares, moving bits from condition registers, and bit extraction are
slower on newer PowerPC processors.

David

* Re: PR target/17101: question about powerpc s<cond> expanders
From: Nathan Sidwell @ 2004-11-12 19:30 UTC
  To: David Edelsohn; +Cc: gcc, Geoffrey Keating

David Edelsohn wrote:

> Nathan> why is that?  The unsigned variants are not so encumbered.
> 
> 	There are short, straight-line PowerPC code sequences for
> unsigned, but not for signed.  The signed sequences use the POWER "doz"
> instruction that was removed from PowerPC.

That is not what I observe when I remove the TARGET_POWER check.
For instance, for V > 55 I get
.sgt:
         cmpwi 7,3,55
         mfcr 3
         rlwinm 3,3,30,1
Is that worse than the following?
	li  9,0
	cmpwi 7,3,55
	ble- 7,.skip
	li  9,1
.skip:

> 	The number of instructions do not correspond to the cost.
> Compares, moving bits from condition registers, and bit extraction is
> slower on newer PowerPC processors.

OK, that makes sense.  There should probably be an optimize_size check,
though.  Thanks.

nathan

-- 
Nathan Sidwell    ::   http://www.codesourcery.com   ::     CodeSourcery LLC
nathan@codesourcery.com    ::     http://www.planetfall.pwp.blueyonder.co.uk


* Re: PR target/17101: question about powerpc s<cond> expanders
From: Geoffrey Keating @ 2004-11-12 19:45 UTC
  To: Nathan Sidwell; +Cc: gcc, David Edelsohn

On 12/11/2004, at 9:00 AM, Nathan Sidwell wrote:

> 17101

Actually, 17101 is a bug about forward store motion.  What's the 
correct reference?

* Re: PR target/17101: question about powerpc s<cond> expanders
From: David Edelsohn @ 2004-11-12 20:57 UTC
  To: Nathan Sidwell; +Cc: gcc, Geoffrey Keating

>>>>> Nathan Sidwell writes:

Nathan> that is not what I observe, when I remove the TARGET_POWER check.
Nathan> For instance, V > 55

Nathan> .sgt:
Nathan> cmpwi 7,3,55
Nathan> mfcr 3
Nathan> rlwinm 3,3,30,1
Nathan> is that worse than
Nathan> li  9,0
Nathan> cmpwi 7,3,55
Nathan> ble- 7,.skip
Nathan> li  9,1
Nathan> .skip:

	The branch sequence may be better, depending on how well predicted
it is. 

David

* Re: PR target/17101: question about powerpc s<cond> expanders
From: Nathan Sidwell @ 2004-11-12 22:45 UTC
  To: Geoffrey Keating; +Cc: gcc, David Edelsohn

Geoffrey Keating wrote:
> 
> On 12/11/2004, at 9:00 AM, Nathan Sidwell wrote:
> 
>> 17101
> 
> 
> Actually, 17101 is a bug about forward store motion.  What's the correct 
> reference?

Sorry, it was PR 17107.

nathan

-- 
Nathan Sidwell    ::   http://www.codesourcery.com   ::     CodeSourcery LLC
nathan@codesourcery.com    ::     http://www.planetfall.pwp.blueyonder.co.uk


* Re: PR target/17101: question about powerpc s<cond> expanders
From: Gabriel Paubert @ 2004-11-15 23:39 UTC
  To: Nathan Sidwell; +Cc: gcc, David Edelsohn, Geoffrey Keating

On Fri, Nov 12, 2004 at 05:00:34PM +0000, Nathan Sidwell wrote:
> PR 17101 is a problem with boolean operations.  rs6000.md contains
> seq, sne, sgt ... expanders, but the signed ones are explicitly
> disabled for non-POWER (i.e. POWERPC) targets. for instance
> 
> (define_expand "sgt"
>   [(clobber (match_operand:SI 0 "gpc_reg_operand" ""))]
>   ""
>   "
> {
>   if (! rs6000_compare_fp_p
>       && (! TARGET_POWER || rs6000_compare_op1 == const0_rtx))
>     FAIL;
> 
>   rs6000_emit_sCOND (GT, operands[0]);
>   DONE;
> }")
> 
> why is that?  The unsigned variants are not so encumbered.
> 
> Also, the signed variants cut out comparisons with zero -- why
> do the unsigned ones not do so? Oversight?  In addition, some of
> the special cases seem ineffective at best, pessimizing at worst.
> Examining each in detail I find the following assembler for 'V OP 0'
> 
> .seq: // baseline of 3 insns using the condition regs
>         cmpwi 7,3,0
>         mfcr 3
>         rlwinm 3,3,31,1

The shortest and fastest here is likely:
	  cntlzw 3,3
	  srwi 3,3,5 
i.e. no mfcr, which has dependencies on all the cr fields 
except on Power4 and later where there is a single field mfcr.
Put a xor or sub in front for any value to compare with.
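
A small C model of the cntlzw idea, in case it helps to see why it
works.  The cntlzw function below is a portable stand-in written for this
sketch (not a compiler builtin) that returns 32 for a zero input, as the
instruction does; the other names are made up as well.

```c
#include <stdint.h>

/* Portable stand-in for cntlzw: number of leading zero bits,
   with the result 32 when x is 0. */
static uint32_t cntlzw(uint32_t x)
{
    uint32_t n = 0;
    while (n < 32 && !(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* V == 0: only a count of 32 has bit 5 set. */
static uint32_t seq0(uint32_t v)
{
    return cntlzw(v) >> 5;          /* cntlzw 3,3 ; srwi 3,3,5 */
}

/* x == y: put an xor in front so equality becomes "is zero". */
static uint32_t seq(uint32_t x, uint32_t y)
{
    return cntlzw(x ^ y) >> 5;      /* xor ; cntlzw ; srwi */
}
```

The count is in the range 0..32, and 32 is the only value in that range
with bit 5 set, so the shift by 5 yields 1 exactly when the input is zero.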

> 
> .sne: // hm, four insns emitted
>         srawi 0,3,31
>         xor 3,0,3
>         subf 3,3,0
>         srwi 3,3,31

Unless I'm mistaken, it's essentially a negative absolute
value shifted right by 31, and it works only for 0.  We can
and should do better.  For example, the following should work
to evaluate x!=y for any x and y, without even clobbering
the carry:
	sub t1,x,y
	sub t2,y,x
	or t1,t1,t2
	srwi t1,t1,31
(same size, but the first two can be executed in
parallel; if either difference is negative, the result is true).
In the case of y=0, this reduces to:
	neg t1,x
	or t1,t1,x
	srwi t1,t1,31
which saves one instruction and shortens the dependency
chain by one.
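
The x != y idea can be modelled in C by OR-ing the two differences x - y
and y - x and extracting the sign bit.  This is a sketch on 32-bit
unsigned values with wraparound arithmetic; the function names are
invented for illustration.

```c
#include <stdint.h>

/* x != y for any x and y: d = x - y is nonzero exactly when d or -d
   has its top bit set (for d = 0x80000000 both are equal and set). */
static uint32_t sne_any(uint32_t x, uint32_t y)
{
    uint32_t t1 = x - y;        /* first difference          */
    uint32_t t2 = y - x;        /* second difference, = -t1  */
    t1 |= t2;                   /* or   t1,t1,t2             */
    return t1 >> 31;            /* srwi t1,t1,31             */
}

/* Special case y == 0: the first difference is x itself. */
static uint32_t sne_zero(uint32_t x)
{
    uint32_t t1 = 0u - x;       /* neg  t1,x                 */
    t1 |= x;                    /* or   t1,t1,x              */
    return t1 >> 31;            /* srwi t1,t1,31             */
}
```

The two subtractions do not depend on each other, which is what allows
them to issue in parallel; only the or and the shift form a dependency
chain.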

> 
> .sge: // 2 insns, this is better.
>         nor 3,3,3
>         srwi 3,3,31

No way to improve in simple cases, there are many equivalent 
solutions however which may turn out to be better when you have 
a complex logic expression. For example if you generate:
	srwi 3,3,31
	xori 3,3,1 
the xor might be absorbed in a negation since Power/PPC has the 
full complement of 8 logical operations (and, andc, nand, or, orc,
nor, xor, eqv). When writing short chunks of assembly code, I've
never had to worry about getting the right "polarity" for inputs 
of logical instructions (when immediates are not involved). 
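
To make the "polarity" point concrete, here is a tiny C model of the two
equivalent V >= 0 sequences.  The names are made up for this sketch.

```c
#include <stdint.h>

/* nor 3,3,3 ; srwi 3,3,31 : invert first, then extract the sign bit. */
static uint32_t sge0_nor(uint32_t v)
{
    return (uint32_t)~v >> 31;
}

/* srwi 3,3,31 ; xori 3,3,1 : extract the sign bit, then invert it. */
static uint32_t sge0_xori(uint32_t v)
{
    return (v >> 31) ^ 1u;
}
```

Both return the inverted sign bit of V.  In the second form the inversion
is the last step, so a consumer such as an and could in principle become
an andc and make the xori unnecessary.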

> .sgt: // 3 insns, no better
>         srawi 0,3,31
>         subf 0,3,0
>         srwi 0,0,31

Using a variant of the code sequence for sle:
	addi 0,3,-1
	nor 0,0,3
	srwi 0,0,31
you avoid the srawi that clobbers the carry but that's the only 
(really minor) improvement I can think of, it only illustrates 
what I said about the polarity.
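
A C model of this sgt variant next to the sle sequence it is derived
from, to show that the only difference is or versus nor.  Function names
are invented for the sketch.

```c
#include <stdint.h>

/* V <= 0: addi 0,3,-1 ; or 0,0,3 ; srwi 0,0,31 */
static uint32_t sle0_or(uint32_t v)
{
    uint32_t t = v - 1u;            /* addi 0,3,-1 */
    t = t | v;                      /* or   0,0,3  */
    return t >> 31;                 /* srwi 0,0,31 */
}

/* V > 0: same sequence with nor in place of or. */
static uint32_t sgt0_nor(uint32_t v)
{
    uint32_t t = v - 1u;            /* addi 0,3,-1 */
    t = (uint32_t)~(t | v);         /* nor  0,0,3  */
    return t >> 31;                 /* srwi 0,0,31 */
}
```

For every input the two results are complements of each other, so
swapping or for nor flips the sense of the test without adding an
instruction.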

> 
> .sle: // 3 insns, no better
>         addi 0,3,-1
>         or 0,0,3
>         srwi 0,0,31

better than any mfcr based solution.

> 
> .slt: // 1 insn, yay!
>         srwi 3,3,31

That one is rather obvious.

	Regards,
	Gabriel

* Re: PR target/17101: question about powerpc s<cond> expanders
From: Gabriel Paubert @ 2004-11-16  8:26 UTC
  To: Nathan Sidwell; +Cc: David Edelsohn, gcc, Geoffrey Keating

On Fri, Nov 12, 2004 at 06:37:39PM +0000, Nathan Sidwell wrote:
> David Edelsohn wrote:
> 
> >Nathan> why is that?  The unsigned variants are not so encumbered.
> >
> >	There are short, straight-line PowerPC code sequences for
> >unsigned, but not for signed.  The signed sequences use the POWER "doz"
> >instruction that was removed from PowerPC.
> 
> that is not what I observe, when I remove the TARGET_POWER check.
> For instance, V > 55
> .sgt:
>         cmpwi 7,3,55
>         mfcr 3
>         rlwinm 3,3,30,1
> is that worse than
> 	li  9,0
> 	cmpwi 7,3,55
> 	ble- 7,.skip
> 	li  9,1
> .skip:

Wouldn't the best solution in this case be?
	addi tmp,val,-56 
	nor tmp,tmp,val
	srwi tmp,tmp,31
now when comparing two variables that is different. But when one 
of the operands is a constant there are lots of tricks to play 
with using the complete set of Power/PPC logical instructions on
the sign bit.
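
Here is a C model of the addi/nor/srwi sequence, generalised from 55 to a
constant c.  Two assumptions are mine rather than stated in the thread:
c is non-negative (the trick relies on a negative val already having its
sign bit set), and c + 1 is small enough for addi's 16-bit signed
immediate.  The function name is invented for the sketch.

```c
#include <stdint.h>

/* val > c for a signed 32-bit val and a constant c >= 0 (assumed).
   val is passed as its 32-bit pattern; arithmetic wraps as on the CPU. */
static uint32_t sgt_const(uint32_t val, uint32_t c)
{
    uint32_t tmp = val - (c + 1u);  /* addi tmp,val,-(c+1) */
    tmp = (uint32_t)~(tmp | val);   /* nor  tmp,tmp,val    */
    return tmp >> 31;               /* srwi tmp,tmp,31     */
}
```

The result is 1 only when neither val nor val - (c + 1) is negative.  If
val is negative its own sign bit forces the result to 0, so a wrapped
subtraction cannot produce a false positive.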
	
> 
> >	The number of instructions do not correspond to the cost.
> >Compares, moving bits from condition registers, and bit extraction is
> >slower on newer PowerPC processors.

I'm quite surprised that integer compares have been affected, that's a 
frequent operation and quite critical in a lot of applications. I was 
not aware that rlwinm (and I suppose rlwimi/cntlzw/sraw/srawi and their 
64 bit equivalents) are becoming slower (relative to add/subtract and 
logical), but I suspect that this mostly affects high-end 64 bit 
processors and not 32 bit variants for embedded systems.

	Regards,
	Gabriel

* Re: PR target/17101: question about powerpc s<cond> expanders
From: David Edelsohn @ 2004-11-16 15:38 UTC
  To: Gabriel Paubert; +Cc: Nathan Sidwell, gcc, Geoffrey Keating

>>>>> Gabriel Paubert writes:

Gabriel> Wouldn't the best solution in this case be?
Gabriel> addi tmp,val,-56 
Gabriel> nor tmp,tmp,val
Gabriel> srwi tmp,tmp,31

Gabriel> now when comparing two variables that is different. But when one 
Gabriel> of the operands is a constant there are lots of tricks to play 
Gabriel> with using the complete set of Power/PPC logical instructions on
Gabriel> the sign bit.

	Straight-line, simple integer sequences are better.
	
>> >	The number of instructions do not correspond to the cost.
>> >Compares, moving bits from condition registers, and bit extraction is
>> >slower on newer PowerPC processors.

Gabriel> I'm quite surprised that integer compares have been affected, that's a 
Gabriel> frequent operation and quite critical in a lot of applications. I was 
Gabriel> not aware that rlwinm (and I suppose rlwimi/cntlzw/sraw/srawi and their 
Gabriel> 64 bit equivalents) are becoming slower (relative to add/subtract and 
Gabriel> logical), but I suspect that this mostly affects high-end 64 bit 
Gabriel> processors and not 32 bit variants for embedded systems.

	I would appreciate if you would avoid hyperbole and spreading
misinformation.  You can look at the scheduling description in GCC to see
that your assumptions and statements are incorrect.

David
