public inbox for gcc-help@gcc.gnu.org
* workaround for "error: more than 30 operands in 'asm'"?
@ 2008-03-12 23:25 Clem Taylor
  2008-03-13  5:56 ` Ian Lance Taylor
  0 siblings, 1 reply; 9+ messages in thread
From: Clem Taylor @ 2008-03-12 23:25 UTC (permalink / raw)
  To: gcc-help

Hi,

I'm working on taking PowerPC VMX code that uses altivec intrinsics
and rescheduling it with inline assembly. gcc is making some fairly
bad scheduling choices with this code, resulting in code that is
running 4x slower than I was hoping for. I have a simplified version
working, but with the real version gcc is failing with: "error: more
than 30 operands in 'asm'". The code is using 28 vector registers and
6 serial registers.

The code is a mixture of setup code in C and only the inner loop is in
assembly, so it wouldn't be convenient to write this directly in
assembly. Also, because the code is highly pipelined (to overcome the
latency of the VMX floating point unit) it is a mess to split this up
into multiple asm() statements. Beyond recompiling gcc with a larger
operand count, is there a workaround for this problem?

I'm using gcc 4.2.2 and gas 2.18.

                                 Thanks,
                                 Clem


* Re: workaround for "error: more than 30 operands in 'asm'"?
  2008-03-12 23:25 workaround for "error: more than 30 operands in 'asm'"? Clem Taylor
@ 2008-03-13  5:56 ` Ian Lance Taylor
  2008-03-13  9:45   ` Andrew Haley
  2008-03-14  1:10   ` Clem Taylor
  0 siblings, 2 replies; 9+ messages in thread
From: Ian Lance Taylor @ 2008-03-13  5:56 UTC (permalink / raw)
  To: Clem Taylor; +Cc: gcc-help

"Clem Taylor" <clem.taylor@gmail.com> writes:

> I'm working on taking PowerPC VMX code that uses altivec intrinsics
> and rescheduling it with inline assembly. gcc is making some fairly
> bad scheduling choices with this code, resulting in code that is
> running 4x slower than I was hoping for. I have a simplified version
> working, but with the real version gcc is failing with: "error: more
> than 30 operands in 'asm'". The code is using 28 vector registers and
> 6 serial registers.
>
> The code is a mixture of setup code in C and only the inner loop is in
> assembly, so it wouldn't be convenient to write this directly in
> assembly. Also, because the code is highly pipelined (to overcome the
> latency of the VMX floating point unit) it is a mess to split this up
> into multiple asm() statements. Beyond recompiling gcc with a larger
> operand count, is there a workaround for this problem?

Use fewer operands?  Otherwise, no.  It's a hard limit in gcc.

Since you mention the number of registers you are using, note that
that only matters if they are inputs or outputs.  If you need a
temporary register, just pick one, and add it to the clobber list.  But
if you really have that many inputs and outputs, then you are stuck.
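
A minimal sketch of the clobber-list point, under stated assumptions: it
uses x86-64 AT&T syntax only so that it builds on a common host (the
original code targets PowerPC VMX), and the function name and the choice
of rcx as scratch register are made up for illustration.  The asm has
three operands; the temporary is listed as a clobber and does not count
toward the operand limit.

```c
#include <stdint.h>

/* Returns 2*a + b.  The doubled value lives in a scratch register that is
   named in the clobber list, so it is not an asm operand. */
int64_t twice_a_plus_b(int64_t a, int64_t b)
{
    int64_t out;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ ("mov %[a], %%rcx\n\t"        /* rcx is a hand-picked temporary */
             "add %%rcx, %%rcx\n\t"       /* rcx = 2*a */
             "lea (%%rcx,%[b]), %[out]"   /* out = rcx + b */
             : [out] "=r" (out)           /* 1 output operand */
             : [a] "r" (a), [b] "r" (b)   /* 2 input operands */
             : "rcx", "cc");              /* temporary: clobber, not operand */
#else
    out = 2 * a + b;                      /* plain C fallback on other targets */
#endif
    return out;
}
```

The trade-off is that the compiler no longer chooses that register, so
the register name is fixed in the asm text.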

Ian


* Re: workaround for "error: more than 30 operands in 'asm'"?
  2008-03-13  5:56 ` Ian Lance Taylor
@ 2008-03-13  9:45   ` Andrew Haley
  2008-03-13 16:54     ` Ian Lance Taylor
  2008-03-14  1:10   ` Clem Taylor
  1 sibling, 1 reply; 9+ messages in thread
From: Andrew Haley @ 2008-03-13  9:45 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Clem Taylor, gcc-help

Ian Lance Taylor wrote:
> Use fewer operands?  Otherwise, no.  It's a hard limit in gcc.
> 
> Since you mention the number of registers you are using, note that
> that only matters if they are inputs or outputs.  If you need a
> temporary register, just pick one, and add it to the clobber list.  But
> if you really have that many inputs and outputs, then you are stuck.

Isn't this trivially fixed by changing:

  /* Allow at least 30 operands for the sake of asm constructs.  */
  /* ??? We *really* ought to reorganize things such that there
     is no fixed upper bound.  */
  max_recog_operands = 29;  /* We will add 1 later.  */
  max_dup_operands = 1;

in genconfig.c ?

Andrew.


* Re: workaround for "error: more than 30 operands in 'asm'"?
  2008-03-13  9:45   ` Andrew Haley
@ 2008-03-13 16:54     ` Ian Lance Taylor
  2008-03-14 10:11       ` Andrew Haley
  0 siblings, 1 reply; 9+ messages in thread
From: Ian Lance Taylor @ 2008-03-13 16:54 UTC (permalink / raw)
  To: Andrew Haley; +Cc: Clem Taylor, gcc-help

Andrew Haley <aph@redhat.com> writes:

> Isn't this trivially fixed by changing:
>
>   /* Allow at least 30 operands for the sake of asm constructs.  */
>   /* ??? We *really* ought to reorganize things such that there
>      is no fixed upper bound.  */
>   max_recog_operands = 29;  /* We will add 1 later.  */
>   max_dup_operands = 1;
>
> in genconfig.c ?

Yes.  Sorry for being confusing, I was answering the question "beyond
recompiling gcc...."

Ian


* Re: workaround for "error: more than 30 operands in 'asm'"?
  2008-03-13  5:56 ` Ian Lance Taylor
  2008-03-13  9:45   ` Andrew Haley
@ 2008-03-14  1:10   ` Clem Taylor
  1 sibling, 0 replies; 9+ messages in thread
From: Clem Taylor @ 2008-03-14  1:10 UTC (permalink / raw)
  To: gcc-help

On Thu, Mar 13, 2008 at 1:56 AM, Ian Lance Taylor <iant@google.com> wrote:
>  Since you mention the number of registers you are using, note that
>  that only matters if they are inputs or outputs.  If you need a
>  temporary register, just pick one, and add it to the clobber list.  But
>  if you really have that many inputs and outputs, then you are stuck.

I'm using inputs and outputs because I want the compiler to pick the
registers and I want to have named values. The inline block looks
something like:

asm (
    "... bunch-o-vmx code ..."
    : [rIn0]  "=rv" (rIn0),  [gIn0]  "=rv" (gIn0),  [bIn0]  "=rv" (bIn0), ...
    : [rpix]  "r"   (rpix),  [gpix]  "r"   (gpix),  [bpix]  "r"   (bpix), ...
    : "memory" );

Writing this type of code using %0 %1 ... %n would be very painful and
unpleasant to maintain. If gcc 4.2.x did a sane job of scheduling the C
intrinsic version of this code, I wouldn't need to use inline assembly.
I wrote the C version in an order that shouldn't have any stalls. But
the compiler re-orders the code and takes offset constants and
recomputes them inside the loop [values like 0, 16, 32, ... 112]. With
all the write/read stalls and extra addi instructions, the C intrinsic
version runs at >5 cycles per instruction and overall the asm version
is ~10x faster, ouch.
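
One way to keep named operands while using fewer of them might be to
group related values in a struct and hand the asm a single pointer.
This is only a sketch under assumptions: the struct, its fields and the
function name are hypothetical, and x86-64 AT&T syntax is used just so
the example builds on a common host.  The operand count stays at three
no matter how many fields the struct has.

```c
#include <stdint.h>

/* Hypothetical grouping of per-channel values that would otherwise each
   need their own asm operand. */
struct pixel_in {
    int64_t r, g, b;
};

int64_t sum_channels(const struct pixel_in *in)
{
    int64_t sum;
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ ("mov  0(%[in]), %[sum]\n\t"   /* sum  = in->r */
             "add  8(%[in]), %[sum]\n\t"   /* sum += in->g */
             "add 16(%[in]), %[sum]"       /* sum += in->b */
             : [sum] "=&r" (sum)           /* early clobber: written before
                                              the last read of [in] */
             : [in] "r" (in),              /* one pointer for all fields */
               "m" (*in)                   /* tells gcc the struct is read */
             : "cc");
#else
    sum = in->r + in->g + in->b;           /* plain C fallback */
#endif
    return sum;
}
```

The cost is that the values must be loaded from memory inside the asm
and the field offsets are hard-coded, so this only pays off when the
data lives in memory anyway.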

                                     --Clem


* Re: workaround for "error: more than 30 operands in 'asm'"?
  2008-03-13 16:54     ` Ian Lance Taylor
@ 2008-03-14 10:11       ` Andrew Haley
  2008-03-14 10:22         ` Segher Boessenkool
  2008-03-14 14:10         ` Ian Lance Taylor
  0 siblings, 2 replies; 9+ messages in thread
From: Andrew Haley @ 2008-03-14 10:11 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Clem Taylor, gcc-help

Ian Lance Taylor wrote:
> Andrew Haley <aph@redhat.com> writes:
> 
>> Isn't this trivially fixed by changing:
>>
>>   /* Allow at least 30 operands for the sake of asm constructs.  */
>>   /* ??? We *really* ought to reorganize things such that there
>>      is no fixed upper bound.  */
>>   max_recog_operands = 29;  /* We will add 1 later.  */
>>   max_dup_operands = 1;
>>
>> in genconfig.c ?
> 
> Yes.  Sorry for being confusing, I was answering the question "beyond
> recompiling gcc...."

I wonder if we could do something more sensible than simply using the
constant 30.  Perhaps some function of FIRST_PSEUDO_REGISTER, like
FIRST_PSEUDO_REGISTER+20, or FIRST_PSEUDO_REGISTER*2 or even
MAX (FIRST_PSEUDO_REGISTER, 29).  This would at least solve the problem
here.

Andrew.


* Re: workaround for "error: more than 30 operands in 'asm'"?
  2008-03-14 10:11       ` Andrew Haley
@ 2008-03-14 10:22         ` Segher Boessenkool
  2008-03-14 15:13           ` Ian Lance Taylor
  2008-03-14 14:10         ` Ian Lance Taylor
  1 sibling, 1 reply; 9+ messages in thread
From: Segher Boessenkool @ 2008-03-14 10:22 UTC (permalink / raw)
  To: Andrew Haley; +Cc: Ian Lance Taylor, Clem Taylor, gcc-help

> I wonder if we could do something more sensible than simply using the
> constant 30.  Perhaps some function of FIRST_PSEUDO_REGISTER, like
> FIRST_PSEUDO_REGISTER+20, or FIRST_PSEUDO_REGISTER*2 or even
> MAX (FIRST_PSEUDO_REGISTER, 29).  This would at least solve the problem
> here.

Why is there a maximum here at all -- some hugely non-linear
algorithm or so?

FIRST_PSEUDO_REGISTER is 114 for rs6000, that would be a bit high
in that case.  If it's really just an arbitrary maximum, with no
deeper reason behind it, why not just up it to 1000 and be done
with it for the next ten years ;-)
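
For concreteness, here is how the suggested formulas work out with the
rs6000 value, compared against the 28 + 6 = 34 operands from the
original report.  The enum and function names are made up for this
sketch, and whether the generator's later "+1" would apply on top of
each candidate is left aside.

```c
/* Values taken from the thread; the CAND_ names are hypothetical. */
#define FIRST_PSEUDO_REGISTER 114          /* rs6000 */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

enum {
    CURRENT_LIMIT = 29 + 1,                        /* 30  */
    CAND_PLUS_20  = FIRST_PSEUDO_REGISTER + 20,    /* 134 */
    CAND_TIMES_2  = FIRST_PSEUDO_REGISTER * 2,     /* 228 */
    CAND_MAX      = MAX(FIRST_PSEUDO_REGISTER, 29),/* 114 */
    NEEDED        = 28 + 6                         /* 34 operands reported */
};

/* Nonzero if an asm with NEEDED operands fits under the given limit. */
int fits(int limit)
{
    return NEEDED <= limit;
}
```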


Segher


* Re: workaround for "error: more than 30 operands in 'asm'"?
  2008-03-14 10:11       ` Andrew Haley
  2008-03-14 10:22         ` Segher Boessenkool
@ 2008-03-14 14:10         ` Ian Lance Taylor
  1 sibling, 0 replies; 9+ messages in thread
From: Ian Lance Taylor @ 2008-03-14 14:10 UTC (permalink / raw)
  To: Andrew Haley; +Cc: Clem Taylor, gcc-help

Andrew Haley <aph@redhat.com> writes:

> I wonder if we could do something more sensible than simply using the
> constant 30.  Perhaps some function of FIRST_PSEUDO_REGISTER, like
> FIRST_PSEUDO_REGISTER+20, or FIRST_PSEUDO_REGISTER*2 or even
> MAX (FIRST_PSEUDO_REGISTER, 29).  This would at least solve the problem
> here.

That should work.  Making MAX_RECOG_OPERANDS a variable would have
some effect on compile time, but simply making it a larger constant
shouldn't have any significant effect.

Ian


* Re: workaround for "error: more than 30 operands in 'asm'"?
  2008-03-14 10:22         ` Segher Boessenkool
@ 2008-03-14 15:13           ` Ian Lance Taylor
  0 siblings, 0 replies; 9+ messages in thread
From: Ian Lance Taylor @ 2008-03-14 15:13 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Andrew Haley, Clem Taylor, gcc-help

Segher Boessenkool <segher@kernel.crashing.org> writes:

>> I wonder if we could do something more sensible than simply using the
>> constant 30.  Perhaps some function of FIRST_PSEUDO_REGISTER, like
>> FIRST_PSEUDO_REGISTER+20, or FIRST_PSEUDO_REGISTER*2 or even
>> MAX (FIRST_PSEUDO_REGISTER, 29).  This would at least solve the problem
>> here.
>
> Why is there a maximum here at all -- some hugely non-linear
> algorithm or so?

No, just a lot of statically defined arrays.  Having a maximum doesn't
matter much; having a constant matters.

Ian
