public inbox for gcc-help@gcc.gnu.org
* problems with gcc inline assembly using xmm registers
@ 2004-12-03 15:29 David Palao
From: David Palao @ 2004-12-03 15:29 UTC (permalink / raw)
  To: gcc-help

Hi!

I'm a newbie at assembly, but I work in computational physics and we 
need to design a very fast function for an operation that is repeated 
billions of times in our calculations.
To gain performance we decided to use some of the SSE features 
(properly used, the xmm registers should provide a very nice increase in 
performance).
However, I'm having serious trouble using the xmm[0...7] registers with gcc 
inline assembly.
One example of the code that gives me this trouble is at the end of this 
message.

The error I get is:

../Libraries/fermiqcd_fermi_actions_sse2.h:818: can't find a register in class
   `GENERAL_REGS' while reloading `asm'


What I understand is that the function I'm trying to compile uses too many 
`GENERAL_REGS' registers, but...
...do the xmm registers belong to this group? Aren't they a special set of 
registers in the SSE area?
...could it be that the inline asm statements use too many intermediate 
registers from the general registers group? In that case, how could I avoid 
such behaviour?

The problem occurs whenever I try to use the xmm registers intensively; if I 
don't use the SSE extensions everything goes OK, but I NEED the xmm registers!

Thanks in advance!!!
Best regards


David


PS gcc -v:
gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-34)




CODE:

__asm__ __volatile__ ("movsd %0, %%xmm3 \n\t" \
      "movsd %1, %%xmm6 \n\t" \
      "movsd %2, %%xmm4 \n\t" \
      "movsd %3, %%xmm7 \n\t" \
      "movsd %4, %%xmm5 \n\t" \
      "unpcklpd %%xmm3, %%xmm3 \n\t" \
      "unpcklpd %%xmm6, %%xmm6 \n\t" \
      "unpcklpd %%xmm4, %%xmm4 \n\t" \
      "mulpd %%xmm0, %%xmm3 \n\t" \
      "unpcklpd %%xmm7, %%xmm7 \n\t" \
      "mulpd %%xmm1, %%xmm6 \n\t" \
      "unpcklpd %%xmm5, %%xmm5 \n\t" \
      "mulpd %%xmm0, %%xmm4 \n\t" \
      "addpd %%xmm6, %%xmm3 \n\t" \
      "mulpd %%xmm2, %%xmm7 \n\t" \
      "mulpd %%xmm0, %%xmm5 \n\t" \
      "addpd %%xmm7, %%xmm4 \n\t" \
      "movsd %5, %%xmm6 \n\t" \
      "movsd %6, %%xmm7 \n\t" \
      "unpcklpd %%xmm6, %%xmm6 \n\t" \
      "unpcklpd %%xmm7, %%xmm7 \n\t" \
      "mulpd %%xmm1, %%xmm6 \n\t" \
      "mulpd %%xmm2, %%xmm7 \n\t" \
      "addpd %%xmm6, %%xmm5 \n\t" \
      "addpd %%xmm7, %%xmm3 \n\t" \
      "movsd %7, %%xmm6 \n\t" \
      "movsd %8, %%xmm7 \n\t" \
      "unpcklpd %%xmm6, %%xmm6 \n\t" \
      "unpcklpd %%xmm7, %%xmm7 \n\t" \
      "mulpd %%xmm1, %%xmm6 \n\t" \
      "mulpd %%xmm2, %%xmm7 \n\t" \
      "addpd %%xmm6, %%xmm4 \n\t" \
      "addpd %%xmm7, %%xmm5" \
      : \
      : \
      "m" ((u).c11.real()), \
      "m" ((u).c12.real()), \
      "m" ((u).c21.real()), \
      "m" ((u).c23.real()), \
      "m" ((u).c31.real()), \
      "m" ((u).c32.real()), \
      "m" ((u).c13.real()), \
      "m" ((u).c22.real()), \
      "m" ((u).c33.real())); \

__asm__ __volatile__ ("movsd %0, %%xmm6 \n\t" \
      "movsd %1, %%xmm7 \n\t" \
      "shufpd $0x1, %%xmm0, %%xmm0 \n\t" \
      "shufpd $0x1, %%xmm1, %%xmm1 \n\t" \
      "shufpd $0x1, %%xmm2, %%xmm2 \n\t" \
      "unpcklpd %%xmm6, %%xmm6 \n\t" \
      "unpcklpd %%xmm7, %%xmm7 \n\t" \
      "xorpd %9, %%xmm0 \n\t" \
      "xorpd %9, %%xmm1 \n\t" \
      "xorpd %9, %%xmm2 \n\t" \
      "mulpd %%xmm0, %%xmm6 \n\t" \
      "mulpd %%xmm1, %%xmm7 \n\t" \
      "addpd %%xmm6, %%xmm3 \n\t" \
      "addpd %%xmm7, %%xmm4 \n\t" \
      "movsd %2, %%xmm6 \n\t" \
      "movsd %3, %%xmm7 \n\t" \
      "unpcklpd %%xmm6, %%xmm6 \n\t" \
      "unpcklpd %%xmm7, %%xmm7 \n\t" \
      "mulpd %%xmm2, %%xmm6 \n\t" \
      "mulpd %%xmm0, %%xmm7 \n\t" \
      "addpd %%xmm6, %%xmm5 \n\t" \
      "addpd %%xmm7, %%xmm4 \n\t" \
      "movsd %4, %%xmm6 \n\t" \
      "movsd %5, %%xmm7 \n\t" \
      "unpcklpd %%xmm6, %%xmm6 \n\t" \
      "unpcklpd %%xmm7, %%xmm7 \n\t" \
      "mulpd %%xmm1, %%xmm6 \n\t" \
      "mulpd %%xmm0, %%xmm7 \n\t" \
      "addpd %%xmm6, %%xmm3 \n\t" \
      "addpd %%xmm7, %%xmm5 \n\t" \
      "movsd %6, %%xmm0 \n\t" \
      "movsd %7, %%xmm6 \n\t" \
      "movsd %8, %%xmm7 \n\t" \
      "unpcklpd %%xmm0, %%xmm0 \n\t" \
      "unpcklpd %%xmm6, %%xmm6 \n\t" \
      "unpcklpd %%xmm7, %%xmm7 \n\t" \
      "mulpd %%xmm2, %%xmm0 \n\t" \
      "mulpd %%xmm1, %%xmm6 \n\t" \
      "mulpd %%xmm2, %%xmm7 \n\t" \
      "addpd %%xmm0, %%xmm3 \n\t" \
      "addpd %%xmm6, %%xmm5 \n\t" \
      "addpd %%xmm7, %%xmm4" \
      : \
      : \
      "m" ((u).c11.imag()), \
      "m" ((u).c22.imag()), \
      "m" ((u).c33.imag()), \
      "m" ((u).c21.imag()), \
      "m" ((u).c12.imag()), \
      "m" ((u).c31.imag()), \
      "m" ((u).c13.imag()), \
      "m" ((u).c32.imag()), \
      "m" ((u).c23.imag()), \
      "m" (_sse_double_sgn));
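
For readers following the question, here is a minimal sketch of one way the pressure on general registers might be reduced. It rests on an assumption that the post does not confirm: that the error comes from each "m" operand needing its own general register to hold an address, which 32-bit x86 runs out of quickly. The sketch passes a single base pointer as an "r" operand and reaches the elements through fixed offsets. It also lists the xmm registers it modifies as clobbers, which the statements above do not do. The names Coeffs, scale_pair and check_scale_pair are hypothetical, and the arithmetic is a toy operation, not the kernel from the post.

```cpp
#include <cassert>

// Hypothetical stand-in for the matrix in the post: nine doubles stored
// contiguously, so every element sits at a fixed offset from one base pointer.
struct Coeffs {
    double c[9];
};

// Toy operation (not the poster's kernel):
//   out[i] = (c[0] + c[1] + c[2]) * v[i]   for i = 0, 1
static void scale_pair(const Coeffs *u, const double *v, double *out) {
#if defined(__GNUC__) && defined(__SSE2__)
    __asm__ __volatile__(
        "movupd (%1), %%xmm0       \n\t"  // xmm0 = {v[0], v[1]}
        "movsd  0(%0), %%xmm3      \n\t"  // xmm3 = c[0]
        "addsd  8(%0), %%xmm3      \n\t"  // xmm3 += c[1]
        "addsd  16(%0), %%xmm3     \n\t"  // xmm3 += c[2]
        "unpcklpd %%xmm3, %%xmm3   \n\t"  // copy the sum into both lanes
        "mulpd  %%xmm0, %%xmm3     \n\t"  // xmm3 = sum * {v[0], v[1]}
        "movupd %%xmm3, (%2)       \n\t"  // store both lanes to out
        :                                 // no register outputs; result goes via memory
        : "r"(u), "r"(v), "r"(out)        // three general registers in total
        : "xmm0", "xmm3", "memory");      // registers and memory the asm modifies
#else
    // Portable fallback so the sketch still builds where SSE2 is unavailable.
    const double sum = u->c[0] + u->c[1] + u->c[2];
    out[0] = sum * v[0];
    out[1] = sum * v[1];
#endif
}

// Small check with values that are exact in double precision.
static void check_scale_pair() {
    const Coeffs u = {{1, 2, 3, 4, 5, 6, 7, 8, 9}};
    const double v[2] = {2.0, -1.0};
    double out[2] = {0.0, 0.0};
    scale_pair(&u, v, out);
    assert(out[0] == 12.0);
    assert(out[1] == -6.0);
}
```

Whether this removes the reload error in gcc 3.2.3 depends on how the real operands are laid out in memory, so it is a direction to try and not a confirmed fix. The fixed offsets 0, 8 and 16 are only valid because the hypothetical Coeffs stores its doubles contiguously.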
