From mboxrd@z Thu Jan  1 00:00:00 1970
From: ralf@uni-koblenz.de
To: Ted Krovetz <krovetz@cs.ucdavis.edu>
Cc: egcs@cygnus.com
Subject: Re: MIPS long long using inline asm
Date: Thu, 09 Apr 1998 21:23:00 -0000
Message-id: <19980409135244.51637@uni-koblenz.de>
References: <9804081640.AA16957@toadflax.cs.ucdavis.edu>
X-SW-Source: 1998-04/msg00411.html

On Wed, Apr 08, 1998 at 09:40:48AM -0700, Ted Krovetz wrote:

> I'm doing cryptographic research at UC Davis and have need for fast 
> 32-bit x 32-bit -> 64-bit multiplications. On intel using gcc's inline 
> assembler I get this using 
> 
> #define XMUL(x, y) \
> ({ UINT64 __res; UINT32 __x = (x), __y = (y); \
>   __asm__ ("mull %2" : "=A" (__res) : "a" (__x), "r" (__y)); \
>   __res; })
> 
> where the output specifier "=A" (__res) tells the compiler to bind the 
> long long variable __res to the two 32-bit registers EDX:EAX.
> 
> I want to do something similar on MIPS. How can I use gcc and inline 
> assembly to bind a pair of 32-bit registers to a long long variable?

(AFAIK this is still undocumented, so reading the GCC source is the only
way to find out ...)

A long long, or any other 64-bit integer variable on a 32-bit machine, is
always placed in an even/odd register pair, for example $8/$9.  The
individual registers of such a pair are accessed by using certain format
letters in the inline assembler template.  These are:

[snippet from gcc 2.7.2]
   'D'  print second register of double-word register operand.
   'L'  print low-order register of double-word register operand.
   'M'  print high-order register of double-word register operand.
[...]

So your example would look like:

#define XMUL(x, y) \
({ UINT64 __res; UINT32 __x = (x), __y = (y); \
  __asm__ ("multu %1,%2\n\t"	/* Unsigned multiply of the two inputs */ \
	   "mfhi %M0\n\t"	/* High word */ \
	   "mflo %L0"		/* Low word */ \
	   : "=r" (__res) \
	   : "r" (__x), "r" (__y)); \
  __res; })

Be careful when choosing the format letters for accessing the halves of the
register pairs.  It's easy to introduce byte-order problems.
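
To make the byte-order caveat concrete, here is a small portable C sketch
(not MIPS-specific and not taken from the GCC source) that reports which
32-bit word of a 64-bit value comes first in memory.  The helper names are
made up for illustration.  My understanding, which should be checked against
the GCC source for your version, is that a register pair follows the same
word order, so 'D' names the high word on a little-endian target and the low
word on a big-endian one, while 'L' and 'M' select the low and high word on
either.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first_byte;

    /* Look at the byte stored at the lowest address of the probe. */
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

/* Hypothetical helper: index (0 or 1) of the 32-bit word in memory that
   holds the low-order half of a 64-bit value. */
int low_word_index(void)
{
    const uint64_t value = UINT64_C(0x1111111122222222);
    uint32_t words[2];

    assert(sizeof value == sizeof words);
    memcpy(words, &value, sizeof words);	/* View the value as two words */
    return words[0] == UINT32_C(0x22222222) ? 0 : 1;
}
```

On a little-endian host low_word_index() gives 0 and on a big-endian host it
gives 1, which is exactly the difference that a fixed "first/second register"
choice would get wrong on one of the two.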

Another problem for your specific application is that GCC is basically free
to do whatever it wants with the hi/lo register pair.  In practice it uses
these two registers for all integer multiplies and divides.  This means that,
if you want to play safe, you have to put the mfhi and mflo instructions into
the same inline assembler statement as the actual multiply instruction, which
in turn knocks out the advantage of multiplies being processed in their own,
separate functional unit.  That may be a heavy performance loss; the good ol'
R4000 could execute the above instruction in effectively three cycles, but
with the bad instruction scheduling enforced by the constraints of this case
the time might go up to somewhere well over 40 cycles.
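
One way to sidestep the scheduling problem, as a sketch only: write the
multiply in plain C and let the compiler pick and schedule the instructions
itself.  Whether a given GCC version really turns this into a single
multu/mfhi/mflo sequence on MIPS is an assumption that has to be checked in
the generated assembly.  The typedefs stand in for the UINT32/UINT64 types
assumed by the macro above, and xmul_c and xmul_c_check are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the UINT32/UINT64 types assumed by the XMUL macro. */
typedef uint32_t UINT32;
typedef uint64_t UINT64;

/* Hypothetical plain C counterpart of XMUL: widening one operand to
   64 bits before multiplying yields the full 64-bit product. */
UINT64 xmul_c(UINT32 x, UINT32 y)
{
    return (UINT64)x * y;
}

/* Small self-check that also serves as a usage example. */
void xmul_c_check(void)
{
    assert(xmul_c(0, 12345) == 0);
    assert(xmul_c(UINT32_C(0x10000), UINT32_C(0x10000))
           == UINT64_C(0x100000000));
    /* (2^32 - 1)^2 = 2^64 - 2^33 + 1 */
    assert(xmul_c(UINT32_C(0xFFFFFFFF), UINT32_C(0xFFFFFFFF))
           == UINT64_C(0xFFFFFFFE00000001));
}
```

The same function is also handy as a reference for testing an inline
assembler version against, in particular with operands that have the top bit
set.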

  Ralf