public inbox for gcc@gcc.gnu.org
* How is generic SIMD support supposed to work?
@ 2002-10-20 15:49 Jan Hubicka
  2002-10-20 22:55 ` Richard Henderson
  2002-10-21  7:24 ` Aldy Hernandez
  0 siblings, 2 replies; 5+ messages in thread
From: Jan Hubicka @ 2002-10-20 15:49 UTC (permalink / raw)
  To: gcc, rth, shebs, aldyh, aj, rakdver

Hi,
I didn't like the code we generate for _mm_set-style inlines: even when
we load a constant vector, we first construct it on the stack from
individual elements, creating a memory mismatch stall at load time.

I tried to use the SIMD support.  I suppose something like this should
work (and it is accepted by the parser):
#include <xmmintrin.h>
__v4si
t()
{
  __v4si val = {1,2,3,4};
  return val;
}
and I hoped this would be compiled into a static initializer loaded at once.
Unfortunately this does not happen, and even worse, the compiler dies:
athlon:~ # gcc -O2 t.c -msse2 -da
t.c: In function `t':
t.c:7: error: unable to find a register to spill in class `GENERAL_REGS'
t.c:7: error: this is the insn:
(insn:HI 10 9 11 0 0x403ad1e4 (set (subreg:SI (reg/v:V4SI 21 exmm0 [59]) 0)
    (const_int 1 [0x1])) 38 {*movsi_1} (insn_list 9 (nil))
    (nil))
t.c:7: confused by earlier errors, bailing out

The compiler always generates four moves, each setting a different SImode
subreg of the vector.  There is no XMM alternative for such an instruction,
and I am not sure it is valid: how do we define
(set (subreg:SI (vector) 4) (value))
?
Is the other part of the register killed, or do we use the rule that SImode
vectors are mixing?  Why don't we use the new vector operations for
this?
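
One possible reading of that question can be sketched in plain C. The sketch assumes that a store to an SImode subreg touches only the four bytes at the given offset and leaves the other lanes intact; whether that is the right definition for an XMM register is exactly what is being asked. The names `v4si_model`, `subreg_si_store` and `build_by_subregs` are hypothetical, and the code uses the GCC/Clang vector extension rather than anything from the compiler's internals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Generic 128-bit vector of four 32-bit ints (GCC/Clang vector extension). */
typedef int32_t v4si_model __attribute__((vector_size(16)));

/* Hypothetical model of (set (subreg:SI (reg:V4SI) BYTE) (value)):
   overwrite the 4 bytes at offset BYTE and keep the other lanes. */
static v4si_model subreg_si_store(v4si_model v, unsigned byte, int32_t value)
{
    assert(byte % 4 == 0 && byte < 16);
    memcpy((char *)&v + byte, &value, sizeof value);
    return v;
}

/* Build {1,2,3,4} with four separate SImode stores, one per lane,
   mirroring the four moves described in the text. */
static v4si_model build_by_subregs(void)
{
    v4si_model v = {0, 0, 0, 0};
    for (unsigned i = 0; i < 4; i++)
        v = subreg_si_store(v, 4 * i, (int32_t)(i + 1));
    return v;
}
```

Under this assumption the four stores are independent of each other, which is what makes the element-by-element construction well defined when each lane maps to its own integer register.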

What would be the preferred way to map this into XMM instructions?  (I can
expand the code in emit_move_insn into the XMM sequence and hope that it
gets simplified later, which won't happen currently, or perhaps we can
refine the API to be more flexible.)

I would like to get the generic SIMD working on SSE in the next period.
Honza

* Re: How is generic SIMD support supposed to work?
  2002-10-20 15:49 How is generic SIMD support supposed to work? Jan Hubicka
@ 2002-10-20 22:55 ` Richard Henderson
  2002-10-21  8:55   ` Jan Hubicka
  2002-12-08 16:49   ` Jan Hubicka
  2002-10-21  7:24 ` Aldy Hernandez
  1 sibling, 2 replies; 5+ messages in thread
From: Richard Henderson @ 2002-10-20 22:55 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc, shebs, aldyh, aj, rakdver

On Sun, Oct 20, 2002 at 02:46:28PM +0200, Jan Hubicka wrote:
>   __v4si val = {1,2,3,4};
>     return val;
> }
> and I hope this to be compiled into static initializer loaded at once.
> Unfortunately this does not happen, but even worse compiler dies:

What happens on other architectures is that this value gets
loaded into 4 integer registers, and the subregging works as
expected.  That's going to be prohibitive on x86, so we either
need to arrange for such pseudos to get allocated to the stack
(via CLASS_CANNOT_CHANGE_MODE_P), and/or come up with another
mechanism (via named patterns, I assume) for the code generator
to ask to read or set a vector element.


r~

* Re: How is generic SIMD support supposed to work?
  2002-10-20 15:49 How is generic SIMD support supposed to work? Jan Hubicka
  2002-10-20 22:55 ` Richard Henderson
@ 2002-10-21  7:24 ` Aldy Hernandez
  1 sibling, 0 replies; 5+ messages in thread
From: Aldy Hernandez @ 2002-10-21  7:24 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: gcc, rth, shebs, aj, rakdver

Use CLASS_CANNOT_CHANGE_MODE to disable subregs of the vectors.

--
Aldy Hernandez                                E-mail: aldyh@redhat.com
Professional Gypsy in a one-street town.
Red Hat, Inc.

* Re: How is generic SIMD support supposed to work?
  2002-10-20 22:55 ` Richard Henderson
@ 2002-10-21  8:55   ` Jan Hubicka
  2002-12-08 16:49   ` Jan Hubicka
  1 sibling, 0 replies; 5+ messages in thread
From: Jan Hubicka @ 2002-10-21  8:55 UTC (permalink / raw)
  To: Richard Henderson, Jan Hubicka, gcc, shebs, aldyh, aj, rakdver

> On Sun, Oct 20, 2002 at 02:46:28PM +0200, Jan Hubicka wrote:
> >   __v4si val = {1,2,3,4};
> >     return val;
> > }
> > and I hope this to be compiled into static initializer loaded at once.
> > Unfortunately this does not happen, but even worse compiler dies:
> 
> What happens on other architectures is that this value gets
> loaded into 4 integer registers, and the subregging works as

Yes, I know roughly how Sparc works and I guess PPC has a similar
architecture.  Too bad that XMM doesn't ;(

> expected.  That's going to be prohibitive on x86, so we either
> need to arrange for such pseudos to get allocated to the stack
> (via CLASS_CANNOT_CHANGE_MODE_P), and/or come up with another
> mechanism (via named patterns, I assume) for the code generator
> to ask to read or set a vector element.

Yes, named patterns are what I was thinking about, but the interface
worries me a bit.

We can define setM1M2 patterns pretty much in the style of the Intel
intrinsics.  However, if you consider for instance moving from two MMX
registers to a single SSE register, you have to decide whether to offload
both to sequential memory and read them back, or whether to do two MMX->SSE
moves and a shuffle.  This should ideally be decided at reload time, but I
don't see how that can be accomplished.

For now I would be happy with the CLASS_CANNOT_CHANGE_MODE_P solution.
Unfortunately, XMM registers can be subregged to SImode or DImode sanely
when the subreg is the lowpart, but not for non-lowpart subregs.
It may be useful to do 64-bit logical operations in SSE with movq used
for loading/storing operands, so CLASS_CANNOT_CHANGE_MODE_P does not
appear to do the job for me here either.
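
The lowpart/non-lowpart distinction can be illustrated with a small C model. It assumes little-endian x86, where the low part of the register is element 0, and it stands in for the hardware with the GCC/Clang vector extension. The names `v2di_model`, `read_lowpart_di`, `read_part_di` and `and64_via_vector` are hypothetical, and the "move into the low position" step is only a placeholder for whatever shift or shuffle would really be used.

```c
#include <assert.h>
#include <stdint.h>

/* Generic 128-bit vector of two 64-bit ints (GCC/Clang vector extension). */
typedef uint64_t v2di_model __attribute__((vector_size(16)));

/* Low-part read: models taking the low 64 bits directly (the movq case). */
static uint64_t read_lowpart_di(v2di_model v)
{
    return v[0];
}

/* Non-lowpart read: modelled as moving the wanted lane into the low
   position first, then reusing the low-part read. */
static uint64_t read_part_di(v2di_model v, unsigned lane)
{
    assert(lane < 2);
    if (lane == 0)
        return read_lowpart_di(v);
    v2di_model moved = { v[1], 0 };   /* stand-in for a shift or shuffle */
    return read_lowpart_di(moved);
}

/* 64-bit logical AND carried out on whole vectors; only the low part
   of the result is read back. */
static uint64_t and64_via_vector(uint64_t a, uint64_t b)
{
    v2di_model va = { a, 0 };
    v2di_model vb = { b, 0 };
    return read_lowpart_di(va & vb);
}
```

In this model the low-part read needs no extra step, while any other part costs an additional operation, which is why a blanket ban on mode changes would also rule out the cheap case.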

I will try to define it for a start to see what happens - at the moment
we don't use SI or DImode subregs on XMM.

Honza
> 
> 
> r~

* Re: How is generic SIMD support supposed to work?
  2002-10-20 22:55 ` Richard Henderson
  2002-10-21  8:55   ` Jan Hubicka
@ 2002-12-08 16:49   ` Jan Hubicka
  1 sibling, 0 replies; 5+ messages in thread
From: Jan Hubicka @ 2002-12-08 16:49 UTC (permalink / raw)
  To: Richard Henderson, Jan Hubicka, gcc, shebs, aldyh, aj, rakdver

> On Sun, Oct 20, 2002 at 02:46:28PM +0200, Jan Hubicka wrote:
> >   __v4si val = {1,2,3,4};
> >     return val;
> > }
> > and I hope this to be compiled into static initializer loaded at once.
> > Unfortunately this does not happen, but even worse compiler dies:
> 
> What happens on other architectures is that this value gets
> loaded into 4 integer registers, and the subregging works as
> expected.  That's going to be prohibitive on x86, so we either
> need to arrange for such pseudos to get allocated to the stack
> (via CLASS_CANNOT_CHANGE_MODE_P), and/or come up with another
> mechanism (via named patterns, I assume) for the code generator
> to ask to read or set a vector element.
Hi,
I've hit this problem in a different context.  We currently miscompile
gcc on P4 and K8 since we generate:

(insn:HI 178 177 179 3 0x2a95ce16c0 (set (subreg:DF (reg/v:DI 94) 0)
        (reg/v:DF 58)) 93 {*movdf_nointeger} (nil)
    (expr_list:REG_DEAD (reg/v:DF 58)
        (nil)))

(insn:HI 179 178 180 3 0x2a95ce16c0 (set (mem/f:SI (plus:SI (reg/f:SI 7 esp)
                (const_int 16 [0x10])) [0 S4 A32])
        (subreg:SI (reg/v:DI 94) 0)) 44 {*movsi_1} (insn_list 178 (nil))
    (nil))

(insn:HI 180 179 181 3 0x2a95ce16c0 (set (mem/f:SI (plus:SI (reg/f:SI 7 esp)
                (const_int 12 [0xc])) [0 S4 A32])
        (subreg:SI (reg/v:DI 94) 4)) 44 {*movsi_1} (nil)
    (expr_list:REG_DEAD (reg/v:DI 94)
        (nil)))

The register allocator decides to put 94 into an XMM register, which is a
reasonably sane decision.  The sequence then comes out as two movqs with
xmm0 as operand.

I tried to use CLASS_CANNOT_CHANGE_MODE_P like this:
#define CANNOT_CHANGE_MODE_CLASS(FROM, TO) \
  ((FROM) != (TO) ? SSE_REGS : NO_REGS)
But this makes us refuse subregs like (subreg:V2DF (reg:DF) 0), which
is valid and which we use to represent some scalar FP operations in
later optimization passes.

The problem is that I don't see how to distinguish it from
(subreg:V2DF (reg:DF) 8), which is invalid.
What would you think about adding a SUBREG_BYTE argument to the macro
and updating everything?  This is the only way out I see, and I think
this is an important regression we should fix in 3.3.
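
The effect of the extra argument can be sketched as a pair of C predicates. This is only a model: the real macro yields a register class rather than a boolean, and the names `mode_model`, `sse_mode_change_ok_2` and `sse_mode_change_ok_3` are hypothetical. It assumes little-endian x86, where the low part of a register sits at byte offset 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the machine modes mentioned in the thread. */
enum mode_model { M_SI, M_DI, M_DF, M_V2DF, M_V4SI };

/* Two-argument form, mirroring the macro that was tried: any change of
   mode in an SSE register is refused. */
static bool sse_mode_change_ok_2(enum mode_model from, enum mode_model to)
{
    return from == to;
}

/* Three-argument form with the subreg byte offset added (hypothetical).
   A mode change is allowed only when it refers to the low part. */
static bool sse_mode_change_ok_3(enum mode_model from, enum mode_model to,
                                 unsigned subreg_byte)
{
    assert(subreg_byte < 16);   /* offsets within one 16-byte XMM register */
    return from == to || subreg_byte == 0;
}
```

With the offset available, (subreg:V2DF (reg:DF) 0) is accepted while (subreg:V2DF (reg:DF) 8) and (subreg:SI (reg:DI) 4) are refused; the two-argument form cannot tell these cases apart.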

Honza
> 
> 
> r~
