From mboxrd@z Thu Jan 1 00:00:00 1970
From: ralf@uni-koblenz.de
To: Ted Krovetz
Cc: egcs@cygnus.com
Subject: Re: MIPS long long using inline asm
Date: Thu, 09 Apr 1998 21:23:00 -0000
Message-id: <19980409135244.51637@uni-koblenz.de>
References: <9804081640.AA16957@toadflax.cs.ucdavis.edu>
X-SW-Source: 1998-04/msg00411.html

On Wed, Apr 08, 1998 at 09:40:48AM -0700, Ted Krovetz wrote:

> I'm doing cryptographic research at UC Davis and have need for fast
> 32-bit x 32-bit -> 64-bit multiplications. On intel using gcc's inline
> assembler I get this using
>
> #define XMUL(x, y) \
>   ({ UINT64 __res; UINT32 __x = (x), __y = (y); \
>      __asm__ ("mull %2" : "=A" (__res) : "a" (__x), "r" (__y)); \
>      __res; })
>
> where the output specifier "=A" (__res) tells the compiler to bind the
> long long variable __res to the two 32-bit registers EDX:EAX.
>
> I want to do something similar on MIPS. How can I use gcc and inline
> assembly to bind a pair of 32-bit registers to a long long variable?

(Afaik this is still not documented, so reading the GCC source is the
only way to find out ...)

A long long, or any other 64-bit integer variable on a 32-bit machine,
is always passed in an even/odd register pair, for example $8/$9.  The
individual registers of such a pair are accessed by using certain format
letters in the inline assembler code, which are:

[snippet from gcc 2.7.2]
   'D'  print second register of double-word register operand.
   'L'  print low-order register of double-word register operand.
   'M'  print high-order register of double-word register operand.
[...]

So your example would look like:

#define XMUL(x, y) \
  ({ UINT64 __res; UINT32 __x = (x), __y = (y); \
     __asm__ ("mult %1,%2\n\t"   /* Multiply  */ \
              "mfhi %M0\n\t"     /* High word */ \
              "mflo %L0"         /* Low word  */ \
              : "=r" (__res) \
              : "r" (__x), "r" (__y)); \
     __res; })

Be careful when choosing the format letters for accessing the halves of
a register pair.  It's easy to introduce byte-order problems.

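One way to catch such a swapped-halves mistake is to compare the macro's
result against a plain C reference.  The following is only a sketch and
not part of the GCC sources: xmul_ref, swap_halves and xmul_self_check
are hypothetical helper names, and UINT32/UINT64 are assumed to be
32-bit and 64-bit unsigned types.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definitions of the types used by XMUL. */
typedef uint32_t UINT32;
typedef uint64_t UINT64;

/* Portable reference: widen one operand first so the multiply
   itself is carried out in 64 bits. */
UINT64 xmul_ref(UINT32 x, UINT32 y)
{
    return (UINT64)x * y;
}

/* Exchange the two 32-bit halves of a 64-bit value.  This is the
   result one would see if %L0 and %M0 were mixed up. */
UINT64 swap_halves(UINT64 v)
{
    return (v << 32) | (v >> 32);
}

/* Check the reference on an input whose product has different
   high and low words, so a swap cannot go unnoticed. */
void xmul_self_check(void)
{
    UINT64 p = xmul_ref(0xFFFFFFFFu, 0xFFFFFFFFu);

    assert(p == 0xFFFFFFFE00000001ULL);
    assert(swap_halves(p) == 0x00000001FFFFFFFEULL);
    assert(swap_halves(p) != p);
}
```

On the MIPS target one would compare XMUL(x, y) against xmul_ref(x, y)
for a few such asymmetric inputs; a result equal to
swap_halves(xmul_ref(x, y)) points at the wrong format letter.
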
Another problem for your specific application is that GCC is basically
free to do whatever it wants with the hi/lo register pair.  In practice
it will use these two registers for all integer multiplies and divides.
This means that if you want to play it safe, you have to put the mfhi
and mflo instructions into the same inline assembler statement as the
actual multiply instruction, which in turn knocks out the advantage of
multiplies being processed in their own, separate functional unit.  That
may be a heavy performance loss; the good ol' R4000 could execute the
above instruction sequence in effectively three cycles, but with the bad
instruction scheduling enforced by the constraints of this case the time
might go up to somewhere well over 40 cycles.

  Ralf