public inbox for gcc@gcc.gnu.org
* Re: modification to inline asm + 128 bit cop2 register support
@ 2000-06-12 14:29 Mike Stump
From: Mike Stump @ 2000-06-12 14:29 UTC
  To: dylan_cuthbert, gcc

> From: "Dylan Cuthbert" <dylan_cuthbert@hotmail.com>
> Date: Sat, 10 Jun 2000 22:00:24 JST

> I'll reply in three parts - the first part argues the case for my patch with 
> some undeniable logic (IMHO)

Disagree.

> 1. my case for "%A" to determine which_alternative in asm statements:

> Currently, the gcc inline "asm" instruction supports "alternative" 
> constraints for the operands supplied which is a great idea to help the 
> compiler optimize for commands it can't even see.

It is great, except better ways exist.

> Unfortunately, this rather useful feature is fundamentally flawed in
> its current implementation.

Sounds like you're arguing my point.  While I could include this as
fodder on my side, I don't think my side is weak enough to need it.

> I think we need to give inline assembler programmers (who have a
> hard enough time as it is) the ability to use the fastest
> "alternative" the compiler has calculated it can provide, regardless
> of what processor they are working with.

Agreed.  Though, I think better ways exist.

> 2. problem #1:

> Here's my situation:

> I am programming the Toshiba R5900 which is a 128 bit dual-issue
> processor.  Cygnus have supplied a fairly good machine description
> that supplies a simple 128-bit TI type that can only be
> loaded/stored or copied.

Would be great if Toshiba contributed improvements to gcc that allowed
gcc to take more advantage of their processor.  If they don't,
performance on Toshiba processors will hurt.  This is, in part, between
you and your vendor (Toshiba/Cygnus).

> The 128 bit registers are generally used in 64 bit mode for regular
> math/operations which allows them to be dual-issued to get twice the
> thru-put.  Therefore for regular use the compiler is running in 64
> bit mode.

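int main(void) {
       __simd128_t a, b, c;              /* hypothetical 128-bit type */
       __simd_2_64bit_mul  (a, b, c);    /* hypothetical builtin: two 64-bit multiplies */
       return 0;
}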

You then have the compiler's register allocator allocate the registers,
and have the __ builtin forward out to something in the md file.
Relatively simple, not too hard to get working, and we can extend the
compiler out in natural ways to auto vectorize, later.  In the shorter
time frame, you allow users to use these builtins to _get at_ the
features of the machine.  Porting is even easier, as one can redefine
these builtins to forward out to plain C code to emulate the
instructions.  Also, as time goes on, and enough machines start doing
this, we can unify and merge like things together, and make them even
more portable.
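
A minimal sketch of the porting point, assuming a target without the
instruction.  The names simd128_t and simd_2_64bit_mul are hypothetical
stand-ins for the __simd128_t and __simd_2_64bit_mul of the example
above, and the two-lane layout is an assumption, not something any
existing port defines:

```c
#include <stdint.h>

/* Hypothetical stand-in for the 128-bit type: two independent 64-bit lanes. */
typedef struct {
    uint64_t lane[2];
} simd128_t;

/* Plain C emulation of a hypothetical "two 64-bit multiplies" builtin.
   Each lane is multiplied independently, wrapping modulo 2^64. */
static simd128_t simd_2_64bit_mul(simd128_t b, simd128_t c)
{
    simd128_t a;
    a.lane[0] = b.lane[0] * c.lane[0];
    a.lane[1] = b.lane[1] * c.lane[1];
    return a;
}
```

User code written against the builtin would keep compiling on such a
target; only the definition behind the name changes.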

> However, the processor has a whole ton of "extra" instructions that
> operate on the 128 bit registers in numerous irregular ways.

Each one becomes a separate builtin.

> Ways that, as far as I can see, would take several years to get the
> compiler to use and optimize properly (if possible at all).

Yes, that is true in the general sense, but not in the shorter term,
which is what I'm talking about: one maps directly from what the user
wrote into a builtin, directly from the builtin to a line in the
md file, and from there directly into the specific asm instruction.

So, while the general scheme might take a long (too long a) time, my
scheme is much less aggressive, and far easier to implement.  Also, I
describe not just my opinion, but also how the compiler has been
extended before in practice, though, the code isn't in the main gcc
tree yet.  In the longer run, would be nice to have:

main() {
       long long a1, b1, c1;
       long long a2, b2, c2;
       a1 = b1 * c1;
       a2 = b2 * c2;
}

map directly into the above code, but this is _much_ harder.

> AFAICS, there is no basic type in C/C++ to allow the use of these extra 
> instructions in any way whatsoever.  We *have* to use inline asm.

Doesn't follow.  The assumption is that a port cannot add new register
classes, nor can add new builtins.  I'd like to suggest this is false.

> 3. co-processor 2 registers

> There is a co-processor with 32 128-bit registers.  For reasons similar to 
> the above problem, the actual operations on these registers are impossible 
> for the compiler to generate, (without making a major modification to the 
> C++ iso standard!).

Again, wrong.  Compiler builtins don't have to be added to the C++
language standard to be added to g++.

> However, the compiler could at least help me with register
> allocation:

Agreed.

> This would produce very efficient code if I had the know-how to get it 
> working.

You can always learn, or pay someone else to learn/do it.


* Re: modification to inline asm + 128 bit cop2 register support
  2000-06-12 22:53 Dylan_S_Cuthbert
@ 2000-06-12 23:06 ` Richard Henderson
From: Richard Henderson @ 2000-06-12 23:06 UTC
  To: Dylan_S_Cuthbert; +Cc: Mike Stump, gcc

On Tue, Jun 13, 2000 at 02:51:17PM +0900,
Dylan_S_Cuthbert@hq.scei.sony.co.jp wrote:
> PS. Regarding compiler built-ins.  As far as I can see there is no difference
> between a compiler built-in and an inline asm function?

Asms cannot be scheduled.


r~


* Re: modification to inline asm + 128 bit cop2 register support
@ 2000-06-12 22:53 Dylan_S_Cuthbert
  2000-06-12 23:06 ` Richard Henderson
From: Dylan_S_Cuthbert @ 2000-06-12 22:53 UTC
  To: Mike Stump; +Cc: gcc

Your e-mail and mine are both correct and I don't disagree with any of your
points, but in the same light, my points are valid too.  The difference is that
you are talking from the point of view of someone who has experience porting
gcc, knows the inner-workings and can make substantial changes without
flinching.

Whereas, I'm talking from the point of view of the end-user, who has no
experience of gcc's internals, has no time to compile/test the gcc source and
more importantly would have to re-implement/merge the patch every time the
compiler is updated at the source.

Not everyone is in such a convenient position and with new compiler update
releases every month (in my case), having to make the kind of machine
description patch you're talking about becomes a complete nightmare with regards
to future support.  I work in the video games industry which has extremely tough
deadlines and very short turn-arounds, we have to write an enormous amount of
code (I think you'd be surprised how many lines of code an average video-games
programmer writes in a single week), which has to be incredibly optimized (we
have to synchronise everything to 60Hz), and as close-to-the-metal as possible.

I agree, the %A feature is a short-term solution, but it would *really* help
those of us who have to wait years to see machine-dependent features implemented
by the vendor.

It doesn't make sense to leave the operand "alternative" functionality so
half-assed.  You might as well delete the functionality completely for MIPS
users, which seems a bit mean.

Best Regards
Dylan Cuthbert
(views are mine, no copying!)

PS. Regarding compiler built-ins.  As far as I can see there is no difference
between a compiler built-in and an inline asm function?  Surely the inline
function is the more flexible route (short to medium term)?




* Re: modification to inline asm + 128 bit cop2 register support
  2000-06-10  6:00 Dylan Cuthbert
@ 2000-06-12  6:20 ` Denis Chertykov
From: Denis Chertykov @ 2000-06-12  6:20 UTC
  To: Dylan Cuthbert; +Cc: gcc

"Dylan Cuthbert" <dylan_cuthbert@hotmail.com> writes:

> (replying to Mike Stump from my home account)
> (this mail is really really way too long! sorry!)
> 
> I'll reply in three parts - the first part argues the case for my patch with 
> some undeniable logic (IMHO).  The second and third describe two problems 
> with the current compiler and the very latest chips that are coming 
> available.  The third situation would really be improved if someone would 
> help me with those darned machine description files.
> 
> >Mike Stump wrote:
> >This goes in the wrong direction I feel.  Instead, add support to
> >expose more of the assembly to the compiler, and have the compiler
> >generate the `more optimal' code.  I think you'll win better that way.
> >You can then have the full power of the optimizer to optimize.
> 
> 1. my case for "%A" to determine which_alternative in asm
> statements:

The avr port already uses %A[0-9].
Maybe use %W instead.

Denis.


* Re: modification to inline asm + 128 bit cop2 register support
@ 2000-06-10  6:00 Dylan Cuthbert
  2000-06-12  6:20 ` Denis Chertykov
From: Dylan Cuthbert @ 2000-06-10  6:00 UTC
  To: gcc

(replying to Mike Stump from my home account)
(this mail is really really way too long! sorry!)

I'll reply in three parts - the first part argues the case for my patch with 
some undeniable logic (IMHO).  The second and third describe two problems 
with the current compiler and the very latest chips that are coming 
available.  The third situation would really be improved if someone would 
help me with those darned machine description files.

>Mike Stump wrote:
>This goes in the wrong direction I feel.  Instead, add support to
>expose more of the assembly to the compiler, and have the compiler
>generate the `more optimal' code.  I think you'll win better that way.
>You can then have the full power of the optimizer to optimize.

1. my case for "%A" to determine which_alternative in asm statements:

Currently, the gcc inline "asm" instruction supports "alternative" 
constraints for the operands supplied which is a great idea to help the 
compiler optimize for commands it can't even see.

Unfortunately, this rather useful feature is fundamentally flawed in its 
current implementation.  This part of asm's *functionality* is 
"machine-dependent", in fact, it is worse than that!  It is assembler 
mnemonic dependent!  So much so, that this very nice feature is *completely* 
unusable for MIPS processors as far as I can see.

It relies too much on the assembler's mnemonic format being able to take 
arguments of different types.  This is way too arbitrary for such a generic 
and otherwise multi-platform compiler such as gcc. (IMHO)
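
To illustrate the dependence on mnemonics, here is a minimal sketch
assuming an x86 target and a GCC-compatible compiler.  The function name
add_with_alternatives is hypothetical.  It works only because the single
"addl" mnemonic accepts register, memory and immediate operands, so one
template string can serve both constraint alternatives; on a target where
a load and a register move use different mnemonics, no single template
could cover both cases:

```c
/* Adds y to x.  On x86 the one "addl" template is valid for either
   alternative the compiler picks; on other targets fall back to plain C. */
static int add_with_alternatives(int x, int y)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("addl %1, %0"
            : "+r,m" (x)   /* alternative 1: x in a register; 2: x in memory */
            : "g,ri" (y)   /* alternative 1: y anywhere; 2: y register or immediate */
            : "cc");
#else
    x += y;                /* portable fallback, no inline asm */
#endif
    return x;
}
```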

I think we need to give inline assembler programmers (who have a hard enough 
time as it is) the ability to use the fastest "alternative" the compiler has 
calculated it can provide, regardless of what processor they are working 
with.

>
>If you want a concrete example, show me what you wanted to do (and
>annotate it some so I can grasp what you want and why it is better).

2. problem #1:

Here's my situation:

I am programming the Toshiba R5900 which is a 128 bit dual-issue processor.  
Cygnus have supplied a fairly good machine description that supplies a 
simple 128-bit TI type that can only be loaded/stored or copied.

The 128 bit registers are generally used in 64 bit mode for regular 
math/operations which allows them to be dual-issued to get twice the 
thru-put.  Therefore for regular use the compiler is running in 64 bit mode.

However, the processor has a whole ton of "extra" instructions that operate 
on the 128 bit registers in numerous irregular ways.  Ways that, as far as I 
can see, would take several years to get the compiler to use and optimize 
properly (if possible at all).  For example, swapping bits 32-63 with bits 
64-95, or adding every 8 bits of one register with every 8 bits of another 
register etc...  (for more info, the spec for the R5900 is available from 
toshiba.)
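
For readers without the spec, a rough plain C model of the two example
operations, as a sketch only.  The type reg128_t and both function names
are hypothetical, and the code does not claim to match the exact semantics
of any particular R5900 instruction:

```c
#include <stdint.h>

/* Hypothetical model of one 128-bit register as four 32-bit words,
   word[0] holding bits 0-31 and word[3] holding bits 96-127. */
typedef struct {
    uint32_t word[4];
} reg128_t;

/* Swap bits 32-63 with bits 64-95, i.e. exchange the two middle words. */
static reg128_t swap_middle_words(reg128_t r)
{
    uint32_t tmp = r.word[1];
    r.word[1] = r.word[2];
    r.word[2] = tmp;
    return r;
}

/* Add each 8-bit field of a to the matching field of b, wrapping
   modulo 256 with no carry between fields (16 independent byte adds). */
static reg128_t add_bytes(reg128_t a, reg128_t b)
{
    reg128_t out;
    for (int w = 0; w < 4; w++) {
        uint32_t sum = 0;
        for (int i = 0; i < 4; i++) {
            uint32_t x = (a.word[w] >> (8 * i)) & 0xFFu;
            uint32_t y = (b.word[w] >> (8 * i)) & 0xFFu;
            sum |= ((x + y) & 0xFFu) << (8 * i);
        }
        out.word[w] = sum;
    }
    return out;
}
```

Neither operation maps onto an ordinary C integer type, which is why the
compiler cannot be expected to pick such instructions from plain
arithmetic expressions.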

AFAICS, there is no basic type in C/C++ to allow the use of these extra 
instructions in any way whatsoever.  We *have* to use inline asm.

I don't see any solution to the above problem, however, I have an additional 
problem which the compiler could at least help me with:

3. co-processor 2 registers

There is a co-processor with 32 128-bit registers.  For reasons similar to 
the above problem, the actual operations on these registers are impossible 
for the compiler to generate, (without making a major modification to the 
C++ iso standard!).  However, the compiler could at least help me with 
register allocation:

Currently, I have to pass 128 bit values through the main core's 128 bit 
registers, execute the COP2 instruction and then pass the values back even 
if the value is being used again by the COP2 in the very next instruction.

eg.

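extern inline void PerformCop2Insn( TItype value )
{
  asm
  (
    "move to cop2\n\t"             // placeholder, not a real mnemonic
    "execute cop2 insn %0\n\t"     // placeholder, not a real mnemonic
    "move to core"                 // placeholder, not a real mnemonic
    : "+r" (value)
  );
}

int main(void)
{
  TItype value;
  PerformCop2Insn( value );
  PerformCop2Insn( value );
}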

This is incredibly suboptimal, as you can probably see from the example (it 
is even more suboptimal when I don't have a %A code to determine whether the 
registers need to be written to memory or a register).

If the compiler could allocate the COP2 registers for me and even supply the 
relevant load/store/move command for the input and output I could write 
"PerformCop2Insn" simply as:

extern inline void PerformCop2Insn( TFtype value )
{
  asm
  (
    "execute cop2 insn %0"
    : "=v" (value)   // v is 128 bit co-processor2 register class
  );
}

This would produce very efficient code if I had the know-how to get it 
working.

I would also need inter-assignability between 128 bit core registers and 
cop2 registers.

As an additional complication: (not totally necessary)
Because the co-processor *also* has a "micro" operational mode (where it 
executes programs internal to itself and hence can use all/any of its 
registers), I need to be able to switch the registers available to the 
compiler for allocation on-the-fly between functions. (maybe with a function 
attribute?)

Apologies for the rather long message,

Best Regards

Dylan Cuthbert
(All views and opinions are mine and mine only, etc etc)
