MMX regs and GCC

public inbox for gcc@gcc.gnu.org

* MMX regs and GCC
@ 1999-09-08  2:03 Jeff Garzik
  1999-09-30 18:02 ` Jeff Garzik
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff Garzik @ 1999-09-08  2:03 UTC (permalink / raw)
  To: gcc

Dumb question...  can the ia32 MMX regs be used to ease register
pressure on that platform?  No SIMD, just store a single integral value
of <= 64 bits in the available regs.

Thanks,

	Jeff




-- 
Custom driver development	|    Never worry about theory as long
Open source programming		|    as the machinery does what it's
				|    supposed to do.  -- R. A. Heinlein


* Re: MMX regs and GCC
  1999-09-08 13:30 ` Jeff Garzik
  1999-09-08 13:42   ` Ben Combee
@ 1999-09-11 16:11   ` Marc Lehmann
  1999-09-30 18:02     ` Marc Lehmann
  1999-09-30 18:02   ` Jeff Garzik
  2 siblings, 1 reply; 12+ messages in thread
From: Marc Lehmann @ 1999-09-11 16:11 UTC (permalink / raw)
  To: gcc

On Wed, Sep 08, 1999 at 04:28:07PM -0400, Jeff Garzik <jgarzik@pobox.com> wrote:
> hmmm.  I have the Pentium Pro databooks, and the quoted text below seems
> to imply direct moves between general registers and MMX registers are
> possible.  Since

Fortunately, I have some direct data for an MMX implementation of this sort
in gcc (actually, pgcc).

pgcc implements (very suboptimally) MMX registers as general integer
registers that can store one and only one integer value (no parallelism).

There are two options. The first automatically emits emms (quite
aggressively, though), which works even when you mix integer (MMX) and FP
code.

That option is a loss on every cpu except the p-ii, where it is almost as
fast as without that option.

The second option just removes all emms from the output. With it, benchmarks
seem to run faster on p-ii, and a bit slower on other cpus.

I'm quite sure that with proper scheduling the first case could be improved
into a net win, even if we make no effort to optimize for
parallel use. (I have also just been told that pgcc's placement of emms is
_very_ bad.)

> reg<->MMX reg transfers are possible, that seems to imply that loading
> and using data in MMX registers would be cheaper than loading data from
> memory.

Yes, on the p-ii, that is. On other cpus (intel or not) moves between
general registers and mmx might be very slow.

> "The MOVD (Move 32 Bits) instruction transfers 32 bits of packed data
> from memory to MMX registers and vice versa, or from integer registers
> to MMX registers and vice versa."

The problem was actually 64 bit moves, which are implemented as
push;push;movq;add.
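
The push;push;movq;add sequence can be pictured with a small portable C
model. This is only a sketch: the helper name movq_via_stack is
hypothetical, the push order (high half first) and the AT&T-style mnemonics
in the comments are assumptions rather than pgcc's actual output, and the
code models the stack slot with an array instead of touching real MMX
registers. It also assumes a little-endian host, as on ia32.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: models moving a 64-bit value held in two 32-bit
   integer registers into one 64-bit register by bouncing it through memory. */
static uint64_t movq_via_stack(uint32_t lo, uint32_t hi)
{
    uint32_t slot[2];   /* stands in for 8 bytes of stack */
    uint64_t quad;      /* stands in for the MMX register */

    slot[1] = hi;       /* push hi: pushed first, lands at the higher address */
    slot[0] = lo;       /* push lo: pushed second, lands at the lower address */

    /* movq (%esp), %mmN: a single 64-bit load from the stack slot */
    memcpy(&quad, slot, sizeof quad);

    /* add $8, %esp would release the stack slot at this point */
    return quad;
}
```

The point of the model is that the value makes a round trip through memory,
so the 64-bit case gets none of the benefit of a direct register move.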

Using the 3dnow or katmai instructions will eventually get rid of the
x86-fp nonsense, independent of mmx usage or not (but I don't have such a cpu
to test that).

-- 
      -----==-                                             |
      ----==-- _                                           |
      ---==---(_)__  __ ____  __       Marc Lehmann      +--
      --==---/ / _ \/ // /\ \/ /       pcg@goof.com      |e|
      -=====/_/_//_/\_,_/ /_/\_\       XX11-RIPE         --+
    The choice of a GNU generation                       |
                                                         |


* Re: MMX regs and GCC
  1999-09-08 13:50     ` Jeff Garzik
@ 1999-09-10  8:24       ` Jamie Lokier
  1999-09-30 18:02         ` Jamie Lokier
  1999-09-30 18:02       ` Jeff Garzik
  1 sibling, 1 reply; 12+ messages in thread
From: Jamie Lokier @ 1999-09-10  8:24 UTC (permalink / raw)
  To: Jeff Garzik

Jeff Garzik wrote:
> What do you think about the case of a program that does little or no
> floating point at all?  It seems like that would be an optimal, and
> common, case for using MMX registers, while minimizing FP context
> save/restore.  Also I was wondering if it was possible to avoid EMMS by
> using 3DNow for 'single' and 'float'.

Is copying to/from MMX really any faster than a stack memory access?
I was under the impression that they're both fast, but MMX has
additional emms overhead.

-- Jamie


* Re: MMX regs and GCC
  1999-09-08 13:42   ` Ben Combee
@ 1999-09-08 13:50     ` Jeff Garzik
  1999-09-10  8:24       ` Jamie Lokier
  1999-09-30 18:02       ` Jeff Garzik
  1999-09-30 18:02     ` Ben Combee
  1 sibling, 2 replies; 12+ messages in thread
From: Jeff Garzik @ 1999-09-08 13:50 UTC (permalink / raw)
  To: Ben Combee; +Cc: gcc

Ben Combee wrote:
> 
> > hmmm.  I have the Pentium Pro databooks, and the quoted text below seems
> > to imply direct moves between general registers and MMX registers are
> > possible.  Since reg<->MMX reg transfers are possible, that seems to
> > imply that loading and using data in MMX registers would be cheaper
> > than loading data from memory.
> 
> Actually, we have considered adding code to do that to CodeWarrior -- the
> biggest problem is that using MMX registers interferes with any usage of the
> FP registers, and on some MMX-capable chips, the time to handle an EMMS
> instruction can be quite substantial.  However, if you are already
> generating MMX or 3DNow! code, then spilling a 32-bit value to an MMX
> register is probably just fine, as long as by doing so you aren't causing
> additional spills of MMX registers (the CodeWarrior MMX-3DNow! calling
> convention that we use for 3DNow! optimized code specifies that MM4-MM7 are
> preserved across calls).

What do you think about the case of a program that does little or no
floating point at all?  It seems like that would be an optimal, and
common, case for using MMX registers, while minimizing FP context
save/restore.  Also I was wondering if it was possible to avoid EMMS by
using 3DNow for 'single' and 'float'.

Thanks,

	Jeff




-- 
Custom driver development	|    Never worry about theory as long
Open source programming		|    as the machinery does what it's
				|    supposed to do.  -- R. A. Heinlein


* Re: MMX regs and GCC
  1999-09-08 13:30 ` Jeff Garzik
@ 1999-09-08 13:42   ` Ben Combee
  1999-09-08 13:50     ` Jeff Garzik
  1999-09-30 18:02     ` Ben Combee
  1999-09-11 16:11   ` Marc Lehmann
  1999-09-30 18:02   ` Jeff Garzik
  2 siblings, 2 replies; 12+ messages in thread
From: Ben Combee @ 1999-09-08 13:42 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: gcc

> hmmm.  I have the Pentium Pro databooks, and the quoted text below seems
> to imply direct moves between general registers and MMX registers are
> possible.  Since reg<->MMX reg transfers are possible, that seems to
> imply that loading and using data in MMX registers would be cheaper
> than loading data from memory.

Actually, we have considered adding code to do that to CodeWarrior -- the
biggest problem is that using MMX registers interferes with any usage of the
FP registers, and on some MMX-capable chips, the time to handle an EMMS
instruction can be quite substantial.  However, if you are already
generating MMX or 3DNow! code, then spilling a 32-bit value to an MMX
register is probably just fine, as long as by doing so you aren't causing
additional spills of MMX registers (the CodeWarrior MMX-3DNow! calling
convention that we use for 3DNow! optimized code specifies that MM4-MM7 are
preserved across calls).
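
One way to picture the EMMS cost trade-off is to keep all MMX traffic in one
region and pay for a single EMMS before any floating point runs. The sketch
below is an illustration only, not CodeWarrior's or GCC's code generation:
the function name mean_with_single_emms is hypothetical, it assumes a
compiler that provides <mmintrin.h> when __MMX__ is defined, and it falls
back to plain integer code elsewhere. An optimizing compiler is also free to
simplify the register round trip.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: passes each value through an MMX register, then
   issues one EMMS before the floating-point division. */
static double mean_with_single_emms(const int32_t *values, size_t count)
{
    int64_t sum = 0;

    for (size_t i = 0; i < count; i++) {
#if defined(__MMX__)
        __m64 parked = _mm_cvtsi32_si64(values[i]); /* integer -> MMX  */
        sum += _mm_cvtsi64_si32(parked);            /* MMX -> integer  */
#else
        sum += values[i];                           /* no MMX available */
#endif
    }

#if defined(__MMX__)
    _mm_empty();    /* one EMMS for the whole loop, before any FP code */
#endif

    return count ? (double)sum / (double)count : 0.0;
}
```

Calling it on the four values 1, 2, 3, 4 gives a mean of 2.5; the EMMS is
paid once per call rather than once per parked value.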

-- Ben Combee, x86 CompilerWarrior


* Re: MMX regs and GCC
       [not found] <199909081748.KAA22465@netcom1.netcom.com>
@ 1999-09-08 13:30 ` Jeff Garzik
  1999-09-08 13:42   ` Ben Combee
                     ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Jeff Garzik @ 1999-09-08 13:30 UTC (permalink / raw)
  To: Toshiyasu Morita; +Cc: gcc

hmmm.  I have the Pentium Pro databooks, and the quoted text below seems
to imply direct moves between general registers and MMX registers are
possible.  Since reg<->MMX reg transfers are possible, that seems to
imply that loading and using data in MMX registers would be cheaper
than loading data from memory.

"The MMX instructions move the packed data types and the quadword data
type to-and-from memory or to-and-from the Intel Architecture
general-purpose registers in 64-bit blocks."
...
"The MOVD (Move 32 Bits) instruction transfers 32 bits of packed data
from memory to MMX registers and vice versa, or from integer registers
to MMX registers and vice versa."
...
"[MOVD] Copies doubleword from the source operand (second operand) to
the destination operand (first operand).  Source and destination
operands can be MMX registers, memory locations, or 32-bit
general-purpose registers; however, data cannot be transferred from an
MMX register to an MMX register, from one memory location to another
memory location, or from one general-purpose register to another
general-purpose register."
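
For illustration, the register-to-register MOVD transfer described above can
be requested from C through the MMX intrinsics. This is a minimal sketch
under stated assumptions: the function name park_in_mmx is hypothetical, the
intrinsics come from <mmintrin.h> and are used only when the compiler
defines __MMX__, and whether they really become a pair of MOVD instructions
is up to the compiler (an optimizer may remove the round trip entirely).

```c
#include <stdint.h>

#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: parks a 32-bit integer in an MMX register and
   reads it back into an integer register. */
static int32_t park_in_mmx(int32_t value)
{
#if defined(__MMX__)
    __m64 parked = _mm_cvtsi32_si64(value);      /* integer reg -> MMX reg */
    int32_t restored = _mm_cvtsi64_si32(parked); /* MMX reg -> integer reg */
    _mm_empty();   /* EMMS: hand the aliased registers back to the x87 FPU */
    return restored;
#else
    return value;  /* fallback when MMX intrinsics are unavailable */
#endif
}
```

Note that only 32 bits move per MOVD; a 64-bit value held in two integer
registers has no single-instruction equivalent here.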

Regards,

	Jeff

Toshiyasu Morita wrote:
> 
> Actually, I just thought about this, and the answer is no.
> A movdi3 from int to MMX would require a post-reload split through mem.
> 
> Toshi
> 
> >
> > Thanks!
> >
> >       Jeff
> >
> >
> >
> >
> > Toshiyasu Morita wrote:
> > >
> > > Yes, I'm sure this can be done, given enough time and effort.
> > >
> > > Toshi
> > >
> > > >
> > > > Toshiyasu Morita wrote:
> > > > >
> > > > > No.
> > > > >
> > > > > MMX regs are aliased FP regs.
> > > >
> > > > Yes, I am aware of that.  GP context can be saved and restored.  You
> > > > still didn't answer my question.
> > > >
> > > > Regards,
> > > >
> > > >       Jeff
> > > >
> > > >
> > > >
> > > >
> > > > >
> > > > > Toshi
> > > > >
> > > > > >
> > > > > > Dumb question...  can the ia32 MMX regs be used to ease register
> > > > > > pressure on that platform?  No SIMD, just store a single integral <= 64
> > > > > > bits in the available regs.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > >       Jeff
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Custom driver development     |    Never worry about theory as long
> > > > > > Open source programming               |    as the machinery does what it's
> > > > > >                               |    supposed to do.  -- R. A. Heinlein
> > > > > >
> > > >
> > > > --
> > > > Custom driver development     |    Never worry about theory as long
> > > > Open source programming               |    as the machinery does what it's
> > > >                               |    supposed to do.  -- R. A. Heinlein
> > > >
> >
> > --
> > Custom driver development     |    Never worry about theory as long
> > Open source programming               |    as the machinery does what it's
> >                               |    supposed to do.  -- R. A. Heinlein
> >

-- 
Custom driver development	|    Never worry about theory as long
Open source programming		|    as the machinery does what it's
				|    supposed to do.  -- R. A. Heinlein
