public inbox for gcc@gcc.gnu.org
* VIS2 pattern review
@ 2011-10-13 12:29 Richard Henderson
  2011-10-13 19:56 ` David Miller
  2011-10-13 22:46 ` David Miller
  0 siblings, 2 replies; 10+ messages in thread
From: Richard Henderson @ 2011-10-13 12:29 UTC
  To: David Miller; +Cc: gcc

[ Using the UltraSparc Architecture, Draft D0.9.4, 27 Sep 2010.
  I believe this is the most recent public manual.  It covers
  VIS1 and VIS2 but not VIS3. ]

The comment for fpmerge_vis is not correct.
I believe that the operation is representable with

  (vec_select:V8QI
    (vec_concat:V8QI
      (match_operand:V4QI 1 ...)
      (match_operand:V4QI 2 ...))
    (parallel [
	(const_int 0) (const_int 4) (const_int 1) (const_int 5)
	(const_int 2) (const_int 6) (const_int 3) (const_int 7)
	]))

which can be used as the basis for both of the

  vec_interleave_lowv8qi
  vec_interleave_highv8qi

named patterns.
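To make the selector concrete, here is a small C model of the RTL above: it concatenates two four-byte vectors and picks bytes with the index list 0 4 1 5 2 6 3 7. The helper names (`vec_select_concat`, `fpmerge_model`) are hypothetical, and plain byte arrays stand in for registers, so this is only a sketch of the element order, not of any GCC internal API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of (vec_select (vec_concat A B) (parallel [SEL ...])):
   element k of the result is element SEL[k] of the N+N element vector A:B. */
static void vec_select_concat(const uint8_t *a, const uint8_t *b, size_t n,
                              const unsigned *sel, size_t nsel, uint8_t *out)
{
    for (size_t k = 0; k < nsel; k++) {
        size_t idx = sel[k];
        out[k] = idx < n ? a[idx] : b[idx - n];
    }
}

/* fpmerge: interleave the four bytes of A with the four bytes of B. */
static void fpmerge_model(const uint8_t a[4], const uint8_t b[4], uint8_t out[8])
{
    static const unsigned sel[8] = { 0, 4, 1, 5, 2, 6, 3, 7 };
    vec_select_concat(a, b, 4, sel, 8, out);
}
```

Feeding it the first (or last) four bytes of two V8QI operands yields the two interleave patterns; which half GCC calls "low" versus "high" follows its element-numbering rules for the target, which this sketch does not try to capture.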

------

> (define_insn "fmul8x16_vis"
>   [(set (match_operand:V4HI 0 "register_operand" "=e")
>         (mult:V4HI (match_operand:V4QI 1 "register_operand" "f")
>                    (match_operand:V4HI 2 "register_operand" "e")))]

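This is invalid RTL: the operands of a mult must have the same mode as
the result, so the V4QI input has to be widened first.  You need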

  (mult:V4HI
    (zero_extend:V4HI
      (match_operand:V4QI 1 ...))
    (match_operand:V4HI 2 ...))

> (define_insn "fmul8x16au_vis"
>   [(set (match_operand:V4HI 0 "register_operand" "=e")
>         (mult:V4HI (match_operand:V4QI 1 "register_operand" "f")
>                    (match_operand:V2HI 2 "register_operand" "f")))]

AFAICS, this needs an unspec, like fmul8x16al.
Similarly for fmul8sux16_vis and fmuld8sux16_vis.

Code sample 7-1 in the manual illustrates a 16x16 multiply:

	fmul8sux16 %f0, %f2, %f4
	fmul8ulx16 %f0, %f2, %f6
	fpadd16    %f4, %f6, %f8

This expansion ought to be available via the "mulv4hi3" named pattern.

Similarly there's a 16x16 -> 32 multiply example:

	fmuld8sux16 %f0, %f1, %f2
	fmuld8ulx16 %f0, %f1, %f4
	fpadd32     %f2, %f4, %f6

that ought to be available via the "vec_widen_smult_{hi,lo}_v4hi"
named patterns.
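The arithmetic behind the widening example can be checked in C. The model below assumes the instruction semantics as described in the manual: fmuld8sux16 multiplies the signed upper byte of each 16-bit element of rs1 by the corresponding signed 16-bit element of rs2 and shifts the 24-bit product left by 8, while fmuld8ulx16 multiplies the unsigned lower byte by the same element and sign-extends the product to 32 bits. Because a = hi*256 + lo, the two partial products add up to the exact 32-bit product. The function names are hypothetical, and each function models a single lane.

```c
#include <stdint.h>

/* Hypothetical per-lane model of fmuld8sux16: signed upper byte of A
   times signed 16-bit B, product shifted left by 8 bits. */
static int32_t fmuld8sux16_elem(int16_t a, int16_t b)
{
    int32_t lo = (int32_t)((uint16_t)a & 0xff);  /* unsigned lower byte */
    int32_t hi = ((int32_t)a - lo) / 256;         /* signed upper byte (exact division) */
    return hi * b * 256;                          /* magnitude at most 2^30 */
}

/* Hypothetical per-lane model of fmuld8ulx16: unsigned lower byte of A
   times signed 16-bit B, product sign-extended to 32 bits. */
static int32_t fmuld8ulx16_elem(int16_t a, int16_t b)
{
    int32_t lo = (int32_t)((uint16_t)a & 0xff);
    return lo * b;
}

/* fpadd32 on one lane: (hi*256 + lo) * b == a * b, and the sum of the
   two partial products always fits in 32 bits. */
static int32_t widen_smult_elem(int16_t a, int16_t b)
{
    return fmuld8sux16_elem(a, b) + fmuld8ulx16_elem(a, b);
}
```

Each fmuld8 instruction consumes a 32-bit half of a V4HI value, so presumably the hi and lo widening patterns would differ only in which half of the inputs they feed to the pair.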

------

The "movmisalign<mode>" named pattern ought to be provided, utilizing the
alignaddr / faligndata insns.
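A rough C model of how such an expansion fits together, assuming the documented behavior of the two insns: alignaddr rounds the address down to a multiple of 8 and leaves the low three bits in GSR.align, and faligndata extracts eight bytes starting at that offset from the 16-byte concatenation of two aligned doublewords (big-endian, as on SPARC). The function names are hypothetical, and `addr` is treated as an offset into a byte buffer rather than a real pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Read 8 bytes in big-endian (SPARC) byte order. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical model of faligndata: bytes OFF..OFF+7 of the 16-byte
   concatenation HI:LO, where OFF (0..7) comes from GSR.align. */
static uint64_t faligndata_model(uint64_t hi, uint64_t lo, unsigned off)
{
    if (off == 0)
        return hi;                       /* avoid an undefined shift by 64 */
    return (hi << (8 * off)) | (lo >> (64 - 8 * off));
}

/* Misaligned 8-byte load built from two aligned loads.  ADDR is an offset
   into MEM, and multiples of 8 stand in for aligned addresses.  This model
   always reads the following aligned doubleword, so MEM needs slack. */
static uint64_t misaligned_load_model(const uint8_t *mem, size_t addr)
{
    size_t aligned = addr & ~(size_t)7;     /* what alignaddr returns */
    unsigned off = (unsigned)(addr & 7);    /* what alignaddr puts in GSR.align */
    uint64_t w0 = load_be64(mem + aligned);
    uint64_t w1 = load_be64(mem + aligned + 8);
    return faligndata_model(w0, w1, off);
}
```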

------

The "vec_perm{,_const}v8qi" named patterns ought to be provided using
the bmask / bshuffle insns.

For vec_perm_constv8qi, the compaction of the input byte data to nibble
data, as input to bmask, can happen at compile-time.  For vec_permv8qi,
you'll need to do this at runtime:

Considering each character as a nibble (x = garbage, . = zero):

	i = input 			= xaxbxcxdxexfxgxh
	t1 = i  & 0x0f0f0f0f0f0f0f0f	= .a.b.c.d.e.f.g.h
	t2 = t1 >> 4			= ..a.b.c.d.e.f.g.
	t3 = t1 + t2			= .aabbccddeeffggh
	t4 = t3 & 0x00ff00ff00ff00ff    = ..ab..cd..ef..gh
	t5 = t4 >> 8			= ....ab..cd..ef..
	t6 = t4 + t5			= ..ababcdcdefefgh
	t7 = t6 & 0x0000ffff0000ffff	= ....abcd....efgh
	t8 = t7 >> 16			= ........abcd....
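	t9 = t7 + t8			= ....abcdabcdefgh

where that last addition can be performed by the bmask itself.
Since bmask copies only the low 32 bits of its sum into GSR.mask,
the stray abcd left in the upper half is harmless.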
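The compaction steps can be written out in C to check them. The sketch assumes the eight selector bytes arrive packed big-endian in a 64-bit integer, with only the low nibble of each byte significant (vec_perm indices for two V8QI inputs are taken modulo 16). It returns the 32-bit value bmask would leave in GSR.mask and pairs it with a simple bshuffle model, so the result can be compared against a direct byte permutation. All function names are hypothetical.

```c
#include <stdint.h>

/* Compact the low nibble of each of the eight selector bytes in SEL
   (byte 0 most significant) into a 32-bit mask, nibble 0 most significant.
   Follows the t1..t9 steps, with the shifts folded into the additions. */
static uint32_t compact_selector(uint64_t sel)
{
    uint64_t t1 = sel & 0x0f0f0f0f0f0f0f0fULL;
    uint64_t t3 = t1 + (t1 >> 4);
    uint64_t t4 = t3 & 0x00ff00ff00ff00ffULL;
    uint64_t t6 = t4 + (t4 >> 8);
    uint64_t t7 = t6 & 0x0000ffff0000ffffULL;
    uint64_t t9 = t7 + (t7 >> 16);   /* the addition bmask would perform */
    return (uint32_t)t9;             /* GSR.mask receives the low 32 bits */
}

/* Model of bshuffle: result byte k is byte MASK[k] (a nibble, 0..15)
   of the 16-byte concatenation A:B, where byte 0 is the MSB of A. */
static uint64_t bshuffle_model(uint64_t a, uint64_t b, uint32_t mask)
{
    uint64_t r = 0;
    for (int k = 0; k < 8; k++) {
        unsigned idx = (mask >> (28 - 4 * k)) & 0xf;
        uint64_t src = idx < 8 ? a : b;
        unsigned byte = idx & 7;
        r = (r << 8) | ((src >> (56 - 8 * byte)) & 0xff);
    }
    return r;
}

/* Runtime vec_perm on V8QI: compact the selector, then shuffle. */
static uint64_t vec_perm_v8qi_model(uint64_t a, uint64_t b, uint64_t sel)
{
    return bshuffle_model(a, b, compact_selector(sel));
}
```

For vec_perm_constv8qi, the same `compact_selector` computation would simply run at compile time, leaving only the bmask and bshuffle at run time.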

Dunno if you can come up with a more efficient sequence.  Indeed,
you may want two totally separate sequences depending on whether
the original input is in fp (vector) or integer registers, which
of course means delaying the expansion until reload.

------

The comment above cmask8<>_vis suggests an implementation of
the named "vcond<><>" patterns.

------

> (define_insn "fpadd64_vis"
>   [(set (match_operand:DI 0 "register_operand" "=e")
>         (plus:DI (match_operand:DI 1 "register_operand" "e")
>                  (match_operand:DI 2 "register_operand" "e")))]
>   "TARGET_VIS3"
>   "fpadd64\t%1, %2, %0")

This must be folded into the main "adddi3" pattern, like fpadd32s.
It's not recognizable otherwise.  Similarly fpsub64.  If these
patterns were earlier in the file you'd have noticed them breaking
the build.

------

> (define_code_iterator vis3_addsub_ss [ss_plus ss_minus])
> (define_code_attr vis3_addsub_ss_insn
>   [(ss_plus "fpadds") (ss_minus "fpsubs")])
> 
> (define_insn "<vis3_addsub_ss_insn><vbits>_vis"
>   [(set (match_operand:VASS 0 "register_operand" "=<vconstr>")
>         (vis3_addsub_ss:VASS (match_operand:VASS 1 "register_operand" "<vconstr>")
>                              (match_operand:VASS 2 "register_operand" "<vconstr>")))]
>   "TARGET_VIS3"
>   "<vis3_addsub_ss_insn><vbits>\t%1, %2, %0")

These should be exposed as "ssadd<mode>3" and "sssub<mode>3".

Unfortunately, the compiler won't do anything with them yet,
but those are the canonical names for signed saturating add/sub,
and if you use those names we'll automatically use them properly
once the vectorizer is extended in the right directions.
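For reference, the element-wise meaning of ss_plus and ss_minus behind those names is: compute the exact result, then clamp it to the signed range of the element. The C below models 16-bit lanes (presumably what fpadds16/fpsubs16 provide); the function names are hypothetical.

```c
#include <stdint.h>

/* Clamp a wider intermediate to the int16_t range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Per-lane ss_plus / ss_minus on 16-bit elements. */
static int16_t ssadd16(int16_t a, int16_t b) { return sat16((int32_t)a + b); }
static int16_t sssub16(int16_t a, int16_t b) { return sat16((int32_t)a - b); }

/* ssadd applied to all four lanes of a V4HI value held in an array. */
static void ssaddv4hi_model(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = ssadd16(a[i], b[i]);
}
```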

------

Other missing vectorization patterns:

  vec_init
  vec_set
  vec_extract
  vec_extract_even
  vec_extract_odd
  vec_unpack{s,u}_{hi,lo}
  vec_unpack{s,u}_float_{hi,lo}
  vec_pack_{trunc,ssat,usat}
  vec_pack_{s,u}fix_trunc

The first three should be provided any time any vector operation
is supported, if at all possible.  Otherwise the compiler will
wind up dropping the data to memory to manipulate it.

The even/odd patterns can be implemented with bshuffle.  We probably ought
to handle this in the middle-end by falling back to vec_perm*,
but we currently don't.  PPC and SPU could be simplified with this.

The vec_pack_trunc pattern is essentially the same as even/odd,
with the right one selected by endianness.  That said, we still
don't fall back to another pattern.
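With a constant selector the bmask value for these patterns can be written down directly. The sketch below assumes big-endian element numbering (element 0 in the most significant byte, as on SPARC) and models bshuffle in C. For V8QI, extract_even selects bytes 0, 2, ..., 14 of the 16-byte concatenation (mask 0x02468ace) and extract_odd selects bytes 1, 3, ..., 15 (mask 0x13579bdf). Truncating each 16-bit element of two V4HI inputs to a byte keeps its less significant byte, which on a big-endian target is the odd one, so a V4HI to V8QI pack_trunc would use the odd mask; a little-endian target would want the even mask. Function and constant names are hypothetical.

```c
#include <stdint.h>

/* Model of bshuffle: result byte k is byte MASK[k] (0..15) of A:B,
   where byte 0 is the MSB of A. */
static uint64_t bshuffle_model(uint64_t a, uint64_t b, uint32_t mask)
{
    uint64_t r = 0;
    for (int k = 0; k < 8; k++) {
        unsigned idx = (mask >> (28 - 4 * k)) & 0xf;
        uint64_t src = idx < 8 ? a : b;
        r = (r << 8) | ((src >> (56 - 8 * (idx & 7))) & 0xff);
    }
    return r;
}

static const uint32_t BMASK_EVEN = 0x02468aceu;  /* bytes 0, 2, ..., 14 */
static const uint32_t BMASK_ODD  = 0x13579bdfu;  /* bytes 1, 3, ..., 15 */

static uint64_t extract_even_v8qi(uint64_t a, uint64_t b)
{
    return bshuffle_model(a, b, BMASK_EVEN);
}

static uint64_t extract_odd_v8qi(uint64_t a, uint64_t b)
{
    return bshuffle_model(a, b, BMASK_ODD);
}

/* Two V4HI -> one V8QI truncation on a big-endian target: keep the
   less significant (odd-addressed) byte of every halfword. */
static uint64_t pack_trunc_v4hi_be(uint64_t a, uint64_t b)
{
    return extract_odd_v8qi(a, b);
}
```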

I don't believe the other patterns could be helped by the middle-end.
At least not yet.  I seem to recall we've been talking about some
generic representation of vector comparisons, which could be used to
aid middle-end expansions of vec_unpack via compares, zeros, and
vec_interleave.
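One way to read that suggestion: sign-extending bytes to halfwords is an interleave of each byte with a mask that is all ones when the byte is negative (what a compare against zero would produce), and zero-extension interleaves with zero instead. The C below models this for the first four elements of a V8QI on a big-endian target, where the mask or zero byte must land in the more significant (first) byte of each halfword. Function names are hypothetical, and the compare is modeled element-wise rather than with any particular VIS instruction.

```c
#include <stdint.h>

/* Interleave the first four bytes of HI with those of LO:
   HI0 LO0 HI1 LO1 ..., i.e. selector 0 8 1 9 2 10 3 11 over HI:LO. */
static void interleave_first_half(const uint8_t hi[8], const uint8_t lo[8],
                                  uint8_t out[8])
{
    for (int k = 0; k < 4; k++) {
        out[2 * k] = hi[k];
        out[2 * k + 1] = lo[k];
    }
}

/* Big-endian halfword K of an 8-byte vector. */
static uint16_t halfword_be(const uint8_t v[8], int k)
{
    return (uint16_t)((v[2 * k] << 8) | v[2 * k + 1]);
}

/* Signed unpack of elements 0..3: the "compare" gives 0xff for negative
   bytes and 0x00 otherwise, and supplies the upper byte of each halfword. */
static void unpack_signed_first_half(const int8_t x[8], int16_t out[4])
{
    uint8_t mask[8], bytes[8], merged[8];
    for (int i = 0; i < 8; i++) {
        mask[i] = x[i] < 0 ? 0xff : 0x00;
        bytes[i] = (uint8_t)x[i];
    }
    interleave_first_half(mask, bytes, merged);
    for (int k = 0; k < 4; k++) {
        int32_t v = halfword_be(merged, k);
        out[k] = (int16_t)(v >= 0x8000 ? v - 0x10000 : v);  /* portable reinterpret */
    }
}

/* Unsigned unpack of elements 0..3: interleave with zeros. */
static void unpack_unsigned_first_half(const uint8_t x[8], uint16_t out[4])
{
    const uint8_t zero[8] = {0};
    uint8_t merged[8];
    interleave_first_half(zero, x, merged);
    for (int k = 0; k < 4; k++)
        out[k] = halfword_be(merged, k);
}
```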

I don't know how much VIS3 provides that could create specialized
versions of any of these.


Happy hacking,


r~



Thread overview: 10+ messages
2011-10-13 12:29 VIS2 pattern review Richard Henderson
2011-10-13 19:56 ` David Miller
2011-10-13 20:06   ` David Miller
2011-10-13 20:20   ` Richard Henderson
2011-10-13 22:46 ` David Miller
2011-10-13 22:53   ` Richard Henderson
2011-10-14  1:50     ` David Miller
2011-10-14  4:38   ` Eric Botcazou
2011-10-14  6:06     ` David Miller
2011-10-14 16:40       ` Eric Botcazou
