VIS2 pattern review

public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed

* VIS2 pattern review
@ 2011-10-13 12:29 Richard Henderson
  2011-10-13 19:56 ` David Miller
  2011-10-13 22:46 ` David Miller
  0 siblings, 2 replies; 10+ messages in thread
From: Richard Henderson @ 2011-10-13 12:29 UTC (permalink / raw)
  To: David Miller; +Cc: gcc

[ Using the UltraSparc Architecture, Draft D0.9.4, 27 Sep 2010.
  I believe this is the most recent public manual.  It covers
  VIS1 and VIS2 but not VIS3. ]

The comment for fpmerge_vis is not correct.
I believe that the operation is representable with

  (vec_select:V8QI
    (vec_concat:V8QI
      (match_operand:V4QI 1 ...)
      (match_operand:V4QI 2 ...)
    (parallel [
	0 4 1 5 2 6 3 7
	]))

which can be used as the basis for both of the

  vec_interleave_lowv8qi
  vec_interleave_highv8qi

named patterns.

------

> (define_insn "fmul8x16_vis"
>   [(set (match_operand:V4HI 0 "register_operand" "=e")
>         (mult:V4HI (match_operand:V4QI 1 "register_operand" "f")
>                    (match_operand:V4HI 2 "register_operand" "e")))]

This is invalid rtl.  You need

  (mult:V4HI
    (zero_extend:V4HI
      (match_operand:V4QI 1 ...))
    (match_operand:V4HI 2 ...))

> (define_insn "fmul8x16au_vis"
>   [(set (match_operand:V4HI 0 "register_operand" "=e")
>         (mult:V4HI (match_operand:V4QI 1 "register_operand" "f")
>                    (match_operand:V2HI 2 "register_operand" "f")))]

AFAICS, this needs an unspec, like fmul8x16al.
Similarly for fmul8sux16_vis, fmuld8sux16_vis,

There's a code sample 7-1 that illustrates a 16x16 multiply:

	fmul8sux16 %f0, %f1, %f2
	fmul8ulx16 %f0, %f1, %f3
	fpadd16    %f2, %f3, %f4

This expansion ought to be available via the "mulv4hi3" named pattern.

Similarly there's a 16x16 -> 32 multiply example:

	fmuld8sux16 %f0, %f1, %f2
	fmuld8ulx16 %f0, %f1, %f3
	fpadd32     %f2, %f3, %f4

that ought to be available via the "vec_widen_smult_{hi,lo}_v4hi"
named patterns.

------

The "movmisalign<mode>" named pattern ought be provided, utilizing the
alignaddr / faligndata insns.

------

The "vec_perm{,_const}v8qi" named patterns ought to be provided using
the bmask / bshuffle insns.

For vec_perm_constv8qi, the compaction of the input byte data to nibble
data, as input to bmask, can happen at compile-time.  For vec_permv8qi,
you'll need to do this at runtime:

Considering each character as a nibble (x = garbage, . = zero):

	i = input 			= xaxbxcxdxexfxgxh
	t1 = i  & 0x0f0f0f0f0f0f0f0f	= .a.b.c.d.e.f.g.h
	t2 = t1 >> 4			= ..a.b.c.d.e.f.g.
	t3 = t1 + t2			= ..abbccddeeffggh
	t4 = t3 & 0x00ff00ff00ff00ff    = ..ab..cd..ef..gh
	t5 = t4 >> 8			= ....ab..cd..ef..
	t6 = t4 + t5			= ..ababcdcdefefgh
	t7 = t6 & 0x0000ffff0000ffff	= ....abcd....efgh
	t8 = t7 >> 16			= ........abcd....
	t9 = t7 + t8			= ........abcdefgh

where that last addition can be performed by the bmask itself.

Dunno if you can come up with a more efficient sequence.  Indeed,
you may want two totally separate sequences depending on whether
the original input is in fp (vector) or integer registers.  Which
of course means delaying the expansion until reload.

------

The comment above cmask8<>_vis suggests an implementation of
the named "vcond<><>" patterns.

------

> (define_insn "fpadd64_vis"
>   [(set (match_operand:DI 0 "register_operand" "=e")
>         (plus:DI (match_operand:DI 1 "register_operand" "e")
>                  (match_operand:DI 2 "register_operand" "e")))]
>   "TARGET_VIS3"
>   "fpadd64\t%1, %2, %0")

This must be folded into the main "adddi3" pattern, like fpadd32s.
It's not recognizable otherwise.  Similarly fpsub64.  If these
patterns were earlier in the file you'd have noticed them breaking
the build.

------

> (define_code_iterator vis3_addsub_ss [ss_plus ss_minus])
> (define_code_attr vis3_addsub_ss_insn
>   [(ss_plus "fpadds") (ss_minus "fpsubs")])
> 
> (define_insn "<vis3_addsub_ss_insn><vbits>_vis"
>   [(set (match_operand:VASS 0 "register_operand" "=<vconstr>")
>         (vis3_addsub_ss:VASS (match_operand:VASS 1 "register_operand" "<vconstr>")
>                              (match_operand:VASS 2 "register_operand" "<vconstr>")))]
>   "TARGET_VIS3"
>   "<vis3_addsub_ss_insn><vbits>\t%1, %2, %0")

These should be exposed as "ssadd<mode>3" "sssub<mode>3".

Unfortunately, the compiler won't do anything with them yet,
but those are the canonical names for signed saturating add/sub,
and if you use those names we'll automatically use them properly
once the vectorizer is extended in the right directions.

------

Other missing vectorization patterns:

  vec_init
  vec_set
  vec_extract
  vec_extract_even
  vec_extract_odd
  vec_unpack{s,u}_{hi,lo}
  vec_unpack{s,u}_float_{hi,lo}
  vec_pack_{trunc,ssat,usat}
  vec_pack_{s,u}fix_trunc

The first three should be provided any time any vector operation
is supported, if at all possible.  Otherwise the compiler will
wind up dropping the data to memory to manipulate it.

The even/odd can be implemented with bshuffle.  We probably ought
to handle this in the middle-end by falling back to vec_perm*, 
but we currently don't.  PPC and SPU could be simplified with this.

The vec_pack_trunc pattern is essentially the same as even/odd,
with the right one selected by endianness.  That said, we still
don't fall back to another pattern.

The other patterns, I don't believe could be helped by the middle-end.
At least not yet.  I seem to recall we've been talking about some
generic representation of vector comparisons, which could be used to
aid middle-end expansions of vec_unpack via compares, zeros, and
vec_interleave.

I don't know how much VIS3 provides that could create specialized
versions of any of these.

Happy hacking,

r~

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: VIS2 pattern review
  2011-10-13 12:29 VIS2 pattern review Richard Henderson
@ 2011-10-13 19:56 ` David Miller
  2011-10-13 20:06   ` David Miller
  2011-10-13 20:20   ` Richard Henderson
  2011-10-13 22:46 ` David Miller
  1 sibling, 2 replies; 10+ messages in thread
From: David Miller @ 2011-10-13 19:56 UTC (permalink / raw)
  To: rth; +Cc: gcc

From: Richard Henderson <rth@redhat.com>
Date: Wed, 12 Oct 2011 17:49:19 -0700

> There's a code sample 7-1 that illustrates a 16x16 multiply:
> 
> 	fmul8sux16 %f0, %f1, %f2
> 	fmul8ulx16 %f0, %f1, %f3
> 	fpadd16    %f2, %f3, %f4

Be wary of code examples that don't even assemble (even numbered
float registers are required here).

fmul8sux16 basically does, for each element:

	src1 = (rs1 >> 8) & 0xff;
	src2 = rs2 & 0xffff;

	product = src1 * src2;

	scaled = (product & 0x00ffff00) >> 8;
	if (product & 0x80)
		scaled++;

	rd = scaled & 0xffff;

fmul8ulx16 does the same except the assignment to src1 is:

	src1 = rs1 & 0xff;

Therefore, I think this "16 x 16 multiply" operation isn't the kind
you think it is, and it's therefore not appropriate to use this in the
compiler for vector multiplies.

Just for shits and grins I tried it and the slp-7 testcase, as expected,
fails.  The main multiply loop in that test case is compiled to:

	sethi   %hi(.LLC6), %i3
	sethi   %hi(in2), %g1
	ldd     [%i3+%lo(.LLC6)], %f22
	sethi   %hi(.LLC7), %i4
	sethi   %hi(.LLC8), %i2
	sethi   %hi(.LLC9), %i3
	add     %fp, -256, %g2
	ldd     [%i4+%lo(.LLC7)], %f20
	or      %g1, %lo(in2), %g1  
	ldd     [%i2+%lo(.LLC8)], %f18
	mov     %fp, %i5
	ldd     [%i3+%lo(.LLC9)], %f16
	mov     %g1, %g4
	mov     %g2, %g3
.LL10:
	ldd     [%g4+8], %f14
	ldd     [%g4+16], %f12
	fmul8sux16      %f14, %f22, %f26
	ldd     [%g4+24], %f10    
	fmul8ulx16      %f14, %f22, %f24
     	ldd     [%g4], %f8
	fmul8sux16      %f12, %f20, %f34
	fmul8ulx16      %f12, %f20, %f32
	fmul8sux16      %f10, %f18, %f30
	fpadd16 %f26, %f24, %f14
	fmul8ulx16      %f10, %f18, %f28
	fmul8sux16      %f8, %f16, %f26
	fmul8ulx16      %f8, %f16, %f24
	fpadd16 %f34, %f32, %f12
	std     %f14, [%g3+8]
	fpadd16 %f30, %f28, %f10
	std     %f12, [%g3+16]
     	fpadd16 %f26, %f24, %f8
	std     %f10, [%g3+24]
	std     %f8, [%g3]
	add     %g3, 32, %g3
	cmp     %g3, %i5
	bne,pt  %icc, .LL10
	 add    %g4, 32, %g4

and it simply gives the wrong results.

The entire out2[] array is all zeros.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: VIS2 pattern review
  2011-10-13 19:56 ` David Miller
@ 2011-10-13 20:06   ` David Miller
  2011-10-13 20:20   ` Richard Henderson
  1 sibling, 0 replies; 10+ messages in thread
From: David Miller @ 2011-10-13 20:06 UTC (permalink / raw)
  To: rth; +Cc: gcc

From: David Miller <davem@davemloft.net>
Date: Thu, 13 Oct 2011 14:26:36 -0400 (EDT)

> 	product = src1 * src2;
> 
> 	scaled = (product & 0x00ffff00) >> 8;
> 	if (product & 0x80)
> 		scaled++;

In fact, all of the partitioned multiply instructions scale the result
by 8 bits with rounding towards positive infinity.

Therefore, we have to use an unspec for all of them.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: VIS2 pattern review
  2011-10-13 19:56 ` David Miller
  2011-10-13 20:06   ` David Miller
@ 2011-10-13 20:20   ` Richard Henderson
  1 sibling, 0 replies; 10+ messages in thread
From: Richard Henderson @ 2011-10-13 20:20 UTC (permalink / raw)
  To: David Miller; +Cc: gcc

On 10/13/2011 11:26 AM, David Miller wrote:
> Therefore, I think this "16 x 16 multiply" operation isn't the kind
> you think it is, and it's therefore not appropriate to use this in the
> compiler for vector multiplies.

Ah, I see the magic word in the docs now: "fixed point".
I.e. class MODE_ACCUM not class MODE_INT.

I guess that's a totally different kind of support that could be added.


r~

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: VIS2 pattern review
  2011-10-13 12:29 VIS2 pattern review Richard Henderson
  2011-10-13 19:56 ` David Miller
@ 2011-10-13 22:46 ` David Miller
  2011-10-13 22:53   ` Richard Henderson
  2011-10-14  4:38   ` Eric Botcazou
  1 sibling, 2 replies; 10+ messages in thread
From: David Miller @ 2011-10-13 22:46 UTC (permalink / raw)
  To: rth; +Cc: gcc

From: Richard Henderson <rth@redhat.com>
Date: Wed, 12 Oct 2011 17:49:19 -0700

> The comment for fpmerge_vis is not correct.
> I believe that the operation is representable with
> 
>   (vec_select:V8QI
>     (vec_concat:V8QI
>       (match_operand:V4QI 1 ...)
>       (match_operand:V4QI 2 ...)
>     (parallel [
> 	0 4 1 5 2 6 3 7
> 	]))
> 
> which can be used as the basis for both of the
> 
>   vec_interleave_lowv8qi
>   vec_interleave_highv8qi
> 
> named patterns.

Agreed.

> AFAICS, this needs an unspec, like fmul8x16al.
> Similarly for fmul8sux16_vis, fmuld8sux16_vis,

Yes, as we found all the partitioned multiplies need to be unspecs.

>> (define_code_iterator vis3_addsub_ss [ss_plus ss_minus])
>> (define_code_attr vis3_addsub_ss_insn
>>   [(ss_plus "fpadds") (ss_minus "fpsubs")])
>> 
>> (define_insn "<vis3_addsub_ss_insn><vbits>_vis"
>>   [(set (match_operand:VASS 0 "register_operand" "=<vconstr>")
>>         (vis3_addsub_ss:VASS (match_operand:VASS 1 "register_operand" "<vconstr>")
>>                              (match_operand:VASS 2 "register_operand" "<vconstr>")))]
>>   "TARGET_VIS3"
>>   "<vis3_addsub_ss_insn><vbits>\t%1, %2, %0")
> 
> These should be exposed as "ssadd<mode>3" "sssub<mode>3".

Agreed.

I'm currently regstrapping the patch at the end of this mail and will
commit it to trunk if no regressions pop up.

I'll look into the rest of your feedback.  But I want to look into a more
fundamental issue with VIS support before moving much further.

I worked for several evenings on adding support for the VIS3
instructions that move directly between float and integer regs.  I tried
really hard to get the compiler to do something sensible but it's next
to impossible for two reasons:

1) We don't represent single entry vectors using vector modes, we just
   use SImode, DImode etc.

   I think this is a huge mistake, because the compiler now thinks it
   can do "SImode stuff" in the float regs.  Other backends are able
   to segregate vector vs. non-vector operations by using the single
   entry vector modes.

2) In addition to that, because of how we number the registers for
   allocation on sparc for leaf functions, the compiler starts trying
   to reload SImode and DImode values into the floating point registers
   before trying to use the non-leaf integer regs.

   Because the leaf allocation order is "leaf integer regs", "float
   regs", "non-leaf integer regs".

Even if I jacked up the register move cost for the cases where the
VIS3 instructions applied, it still did these kinds of reloads.

This also gets reload into trouble because it believes that if it can
move an address value (say Pmode == SImode) from one place to another,
then a plus on the same operands can be performed (with perhaps minor
reloading).  But that doesn't work when this "move" is "move float reg
to int reg" and therefore the operands are "%f12" and "%g3".  It
tries to do things like "(plus:SI (reg:SI %f12) (reg:SI %g3))"

All of these troubles would be eliminated if we used vector modes for
all the VIS operations instead of using SImode and DImode for the single
entry vector cases.

Unfortunately, that would involve some ABI changes for the VIS
builtins.  I'm trending towards considering just changing things
anyways since the VIS intrinsics were next to unusable beforehand.
I've scoured the net for examples of people actually using the GCC
intrinsics before all of my recent changes, and they all fall into two
categories: 1) they use inline assembler because the VIS intrinsics
don't work and 2) they try to use the intrinsics but the code is
disabled because it "doesn't work".

--------------------
Fix the RTL of some sparc VIS patterns.

	* config/sparc/sparc.md (UNSPEC_FPMERGE): Delete.
	(UNSPEC_MUL16AU, UNSPEC_MUL8, UNSPEC_MUL8SU, UNSPEC_MULDSU): New
	unspecs.
	(fpmerge_vis): Remove inaccurate comment, represent using vec_select
	of a vec_concat.
	(vec_interleave_lowv8qi, vec_interleave_highv8qi): New insns.
	(fmul8x16_vis, fmul8x16au_vis, fmul8sux16_vis, fmuld8sux16_vis):
	Reimplement as unspecs and remove inaccurate comments.
	(vis3_shift_patname): New code attr.
	(<vis3_shift_insn><vbits>_vis): Rename to "v<vis3_shift_patname><mode>3".
	(vis3_addsub_ss_patname): New code attr.
	(<vis3_addsub_ss_insn><vbits>_vis): Rename to "<vis3_addsub_ss_patname><mode>".

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 017594f..ae36634 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,18 @@
 2011-10-12  David S. Miller  <davem@davemloft.net>
 
+	* config/sparc/sparc.md (UNSPEC_FPMERGE): Delete.
+	(UNSPEC_MUL16AU, UNSPEC_MUL8, UNSPEC_MUL8SU, UNSPEC_MULDSU): New
+	unspecs.
+	(fpmerge_vis): Remove inaccurate comment, represent using vec_select
+	of a vec_concat.
+	(vec_interleave_lowv8qi, vec_interleave_highv8qi): New insns.
+	(fmul8x16_vis, fmul8x16au_vis, fmul8sux16_vis, fmuld8sux16_vis):
+	Reimplement as unspecs and remove inaccurate comments.
+	(vis3_shift_patname): New code attr.
+	(<vis3_shift_insn><vbits>_vis): Rename to "v<vis3_shift_patname><mode>3".
+	(vis3_addsub_ss_patname): New code attr.
+	(<vis3_addsub_ss_insn><vbits>_vis): Rename to "<vis3_addsub_ss_patname><mode>".
+
 	* config/sparc/sparc.h: Do not force TARGET_VIS3 and TARGET_FMAF
 	to zero when assembler lacks support for such instructions.
 	* config/sparc/sparc.c (sparc_option_override): Clear MASK_VIS3
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index fa790b3..7211658 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -9492,21 +9492,21 @@ sparc_vis_init_builtins (void)
       def_builtin_const ("__builtin_vis_fchksm16", CODE_FOR_fchksm16_vis,
 			 v4hi_ftype_v4hi_v4hi);
 
-      def_builtin_const ("__builtin_vis_fsll16", CODE_FOR_fsll16_vis,
+      def_builtin_const ("__builtin_vis_fsll16", CODE_FOR_vashlv4hi3,
 			 v4hi_ftype_v4hi_v4hi);
-      def_builtin_const ("__builtin_vis_fslas16", CODE_FOR_fslas16_vis,
+      def_builtin_const ("__builtin_vis_fslas16", CODE_FOR_vssashlv4hi3,
 			 v4hi_ftype_v4hi_v4hi);
-      def_builtin_const ("__builtin_vis_fsrl16", CODE_FOR_fsrl16_vis,
+      def_builtin_const ("__builtin_vis_fsrl16", CODE_FOR_vlshrv4hi3,
 			 v4hi_ftype_v4hi_v4hi);
-      def_builtin_const ("__builtin_vis_fsra16", CODE_FOR_fsra16_vis,
+      def_builtin_const ("__builtin_vis_fsra16", CODE_FOR_vashrv4hi3,
 			 v4hi_ftype_v4hi_v4hi);
-      def_builtin_const ("__builtin_vis_fsll32", CODE_FOR_fsll32_vis,
+      def_builtin_const ("__builtin_vis_fsll32", CODE_FOR_vashlv2si3,
 			 v2si_ftype_v2si_v2si);
-      def_builtin_const ("__builtin_vis_fslas32", CODE_FOR_fslas32_vis,
+      def_builtin_const ("__builtin_vis_fslas32", CODE_FOR_vssashlv2si3,
 			 v2si_ftype_v2si_v2si);
-      def_builtin_const ("__builtin_vis_fsrl32", CODE_FOR_fsrl32_vis,
+      def_builtin_const ("__builtin_vis_fsrl32", CODE_FOR_vlshrv2si3,
 			 v2si_ftype_v2si_v2si);
-      def_builtin_const ("__builtin_vis_fsra32", CODE_FOR_fsra32_vis,
+      def_builtin_const ("__builtin_vis_fsra32", CODE_FOR_vashrv2si3,
 			 v2si_ftype_v2si_v2si);
 
       if (TARGET_ARCH64)
@@ -9523,21 +9523,21 @@ sparc_vis_init_builtins (void)
       def_builtin_const ("__builtin_vis_fpsub64", CODE_FOR_fpsub64_vis,
 			 di_ftype_di_di);
 
-      def_builtin_const ("__builtin_vis_fpadds16", CODE_FOR_fpadds16_vis,
+      def_builtin_const ("__builtin_vis_fpadds16", CODE_FOR_ssaddv4hi,
 			 v4hi_ftype_v4hi_v4hi);
-      def_builtin_const ("__builtin_vis_fpadds16s", CODE_FOR_fpadds16s_vis,
+      def_builtin_const ("__builtin_vis_fpadds16s", CODE_FOR_ssaddv2hi,
 			 v2hi_ftype_v2hi_v2hi);
-      def_builtin_const ("__builtin_vis_fpsubs16", CODE_FOR_fpsubs16_vis,
+      def_builtin_const ("__builtin_vis_fpsubs16", CODE_FOR_sssubv4hi,
 			 v4hi_ftype_v4hi_v4hi);
-      def_builtin_const ("__builtin_vis_fpsubs16s", CODE_FOR_fpsubs16s_vis,
+      def_builtin_const ("__builtin_vis_fpsubs16s", CODE_FOR_sssubv2hi,
 			 v2hi_ftype_v2hi_v2hi);
-      def_builtin_const ("__builtin_vis_fpadds32", CODE_FOR_fpadds32_vis,
+      def_builtin_const ("__builtin_vis_fpadds32", CODE_FOR_ssaddv2si,
 			 v2si_ftype_v2si_v2si);
-      def_builtin_const ("__builtin_vis_fpadds32s", CODE_FOR_fpadds32s_vis,
+      def_builtin_const ("__builtin_vis_fpadds32s", CODE_FOR_ssaddsi,
 			 v1si_ftype_v1si_v1si);
-      def_builtin_const ("__builtin_vis_fpsubs32", CODE_FOR_fpsubs32_vis,
+      def_builtin_const ("__builtin_vis_fpsubs32", CODE_FOR_sssubv2si,
 			 v2si_ftype_v2si_v2si);
-      def_builtin_const ("__builtin_vis_fpsubs32s", CODE_FOR_fpsubs32s_vis,
+      def_builtin_const ("__builtin_vis_fpsubs32s", CODE_FOR_sssubsi,
 			 v1si_ftype_v1si_v1si);
 
       if (TARGET_ARCH64)
diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 24993fb..fb8fe8b 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -53,7 +53,7 @@
    (UNSPEC_FPACK32		41)
    (UNSPEC_FPACKFIX		42)
    (UNSPEC_FEXPAND		43)
-   (UNSPEC_FPMERGE		44)
+   (UNSPEC_MUL16AU		44)
    (UNSPEC_MUL16AL		45)
    (UNSPEC_MUL8UL		46)
    (UNSPEC_MULDUL		47)
@@ -89,6 +89,9 @@
    (UNSPEC_FHADD		83)
    (UNSPEC_FHSUB		84)
    (UNSPEC_XMUL			85)
+   (UNSPEC_MUL8			86)
+   (UNSPEC_MUL8SU		87)
+   (UNSPEC_MULDSU		88)
   ])
 
 (define_constants
@@ -8004,36 +8007,64 @@
  [(set_attr "type" "fga")
   (set_attr "fptype" "double")])
 
-;; It may be possible to describe this operation as (1 indexed):
-;; (vec_select (vec_duplicate (vec_duplicate (vec_concat 1 2)))
-;;  1,5,10,14,19,23,28,32)
-;; Note that (vec_merge:V8QI [(V4QI) (V4QI)] (10101010 = 170) doesn't work
-;; because vec_merge expects all the operands to be of the same type.
 (define_insn "fpmerge_vis"
   [(set (match_operand:V8QI 0 "register_operand" "=e")
-        (unspec:V8QI [(match_operand:V4QI 1 "register_operand" "f")
-                      (match_operand:V4QI 2 "register_operand" "f")]
-         UNSPEC_FPMERGE))]
+        (vec_select:V8QI
+          (vec_concat:V8QI (match_operand:V4QI 1 "register_operand" "f")
+                           (match_operand:V4QI 2 "register_operand" "f"))
+          (parallel [(const_int 0) (const_int 4)
+                     (const_int 1) (const_int 5)
+                     (const_int 2) (const_int 6)
+		     (const_int 3) (const_int 7)])))]
  "TARGET_VIS"
  "fpmerge\t%1, %2, %0"
  [(set_attr "type" "fga")
   (set_attr "fptype" "double")])
 
+(define_insn "vec_interleave_lowv8qi"
+  [(set (match_operand:V8QI 0 "register_operand" "=e")
+        (vec_select:V8QI
+          (vec_concat:V16QI (match_operand:V8QI 1 "register_operand" "f")
+                            (match_operand:V8QI 2 "register_operand" "f"))
+          (parallel [(const_int 0) (const_int 8)
+                     (const_int 1) (const_int 9)
+                     (const_int 2) (const_int 10)
+		     (const_int 3) (const_int 11)])))]
+ "TARGET_VIS"
+ "fpmerge\t%L1, %L2, %0"
+ [(set_attr "type" "fga")
+  (set_attr "fptype" "double")])
+
+(define_insn "vec_interleave_highv8qi"
+  [(set (match_operand:V8QI 0 "register_operand" "=e")
+        (vec_select:V8QI
+          (vec_concat:V16QI (match_operand:V8QI 1 "register_operand" "f")
+                            (match_operand:V8QI 2 "register_operand" "f"))
+          (parallel [(const_int 4) (const_int 12)
+                     (const_int 5) (const_int 13)
+                     (const_int 6) (const_int 14)
+		     (const_int 7) (const_int 15)])))]
+ "TARGET_VIS"
+ "fpmerge\t%H1, %H2, %0"
+ [(set_attr "type" "fga")
+  (set_attr "fptype" "double")])
+
 ;; Partitioned multiply instructions
 (define_insn "fmul8x16_vis"
   [(set (match_operand:V4HI 0 "register_operand" "=e")
-        (mult:V4HI (match_operand:V4QI 1 "register_operand" "f")
-                   (match_operand:V4HI 2 "register_operand" "e")))]
+        (unspec:V4HI [(match_operand:V4QI 1 "register_operand" "f")
+                      (match_operand:V4HI 2 "register_operand" "e")]
+         UNSPEC_MUL8))]
   "TARGET_VIS"
   "fmul8x16\t%1, %2, %0"
   [(set_attr "type" "fpmul")
    (set_attr "fptype" "double")])
 
-;; Only one of the following two insns can be a multiply.
 (define_insn "fmul8x16au_vis"
   [(set (match_operand:V4HI 0 "register_operand" "=e")
-        (mult:V4HI (match_operand:V4QI 1 "register_operand" "f")
-                   (match_operand:V2HI 2 "register_operand" "f")))]
+        (unspec:V4HI [(match_operand:V4QI 1 "register_operand" "f")
+                      (match_operand:V2HI 2 "register_operand" "f")]
+         UNSPEC_MUL16AU))]
   "TARGET_VIS"
   "fmul8x16au\t%1, %2, %0"
   [(set_attr "type" "fpmul")
@@ -8049,11 +8080,11 @@
   [(set_attr "type" "fpmul")
    (set_attr "fptype" "double")])
 
-;; Only one of the following two insns can be a multiply.
 (define_insn "fmul8sux16_vis"
   [(set (match_operand:V4HI 0 "register_operand" "=e")
-        (mult:V4HI (match_operand:V8QI 1 "register_operand" "e")
-                   (match_operand:V4HI 2 "register_operand" "e")))]
+        (unspec:V4HI [(match_operand:V8QI 1 "register_operand" "e")
+                      (match_operand:V4HI 2 "register_operand" "e")]
+         UNSPEC_MUL8SU))]
   "TARGET_VIS"
   "fmul8sux16\t%1, %2, %0"
   [(set_attr "type" "fpmul")
@@ -8069,11 +8100,11 @@
   [(set_attr "type" "fpmul")
    (set_attr "fptype" "double")])
 
-;; Only one of the following two insns can be a multiply.
 (define_insn "fmuld8sux16_vis"
   [(set (match_operand:V2SI 0 "register_operand" "=e")
-        (mult:V2SI (match_operand:V4QI 1 "register_operand" "f")
-                   (match_operand:V2HI 2 "register_operand" "f")))]
+        (unspec:V2SI [(match_operand:V4QI 1 "register_operand" "f")
+                      (match_operand:V2HI 2 "register_operand" "f")]
+         UNSPEC_MULDSU))]
   "TARGET_VIS"
   "fmuld8sux16\t%1, %2, %0"
   [(set_attr "type" "fpmul")
@@ -8440,8 +8471,10 @@
 (define_code_iterator vis3_shift [ashift ss_ashift lshiftrt ashiftrt])
 (define_code_attr vis3_shift_insn
   [(ashift "fsll") (ss_ashift "fslas") (lshiftrt "fsrl") (ashiftrt "fsra")])
+(define_code_attr vis3_shift_patname
+  [(ashift "ashl") (ss_ashift "ssashl") (lshiftrt "lshr") (ashiftrt "ashr")])
    
-(define_insn "<vis3_shift_insn><vbits>_vis"
+(define_insn "v<vis3_shift_patname><mode>3"
   [(set (match_operand:V64N8 0 "register_operand" "=<vconstr>")
         (vis3_shift:V64N8 (match_operand:V64N8 1 "register_operand" "<vconstr>")
                           (match_operand:V64N8 2 "register_operand" "<vconstr>")))]
@@ -8490,8 +8523,10 @@
 (define_code_iterator vis3_addsub_ss [ss_plus ss_minus])
 (define_code_attr vis3_addsub_ss_insn
   [(ss_plus "fpadds") (ss_minus "fpsubs")])
+(define_code_attr vis3_addsub_ss_patname
+  [(ss_plus "ssadd") (ss_minus "sssub")])
 
-(define_insn "<vis3_addsub_ss_insn><vbits>_vis"
+(define_insn "<vis3_addsub_ss_patname><mode>"
   [(set (match_operand:VASS 0 "register_operand" "=<vconstr>")
         (vis3_addsub_ss:VASS (match_operand:VASS 1 "register_operand" "<vconstr>")
                              (match_operand:VASS 2 "register_operand" "<vconstr>")))]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: VIS2 pattern review
  2011-10-13 22:46 ` David Miller
@ 2011-10-13 22:53   ` Richard Henderson
  2011-10-14  1:50     ` David Miller
  2011-10-14  4:38   ` Eric Botcazou
  1 sibling, 1 reply; 10+ messages in thread
From: Richard Henderson @ 2011-10-13 22:53 UTC (permalink / raw)
  To: David Miller; +Cc: gcc

On 10/13/2011 12:55 PM, David Miller wrote:
> -(define_insn "<vis3_addsub_ss_insn><vbits>_vis"
> +(define_insn "<vis3_addsub_ss_patname><mode>"

Missing a "3" on the end.  Otherwise these look ok.

> Unfortunately, that would involve some ABI changes for the VIS
> builtins.  I'm trending towards considering just changing things
> anyways since the VIS intrinsics were next to unusable beforehand.

Why?  You can do just about anything you like inside the builtin
expander, including frobbing the modes around.

E.g. for x86, we can't create anything TImode at the user level,
and thus can't expose either TImode or V2TImode operands into the
builtins.  So we have the builtins use V2DI or V4DImode and do
some gen_lowpart frobbing while expanding the builtins to rtl.

I don't see why you couldn't accept DImode from the builtin and
transform it to V1DImode in the rtl.  Etc.

r~

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: VIS2 pattern review
  2011-10-13 22:53   ` Richard Henderson
@ 2011-10-14  1:50     ` David Miller
  0 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2011-10-14  1:50 UTC (permalink / raw)
  To: rth; +Cc: gcc

From: Richard Henderson <rth@redhat.com>
Date: Thu, 13 Oct 2011 13:06:19 -0700

> On 10/13/2011 12:55 PM, David Miller wrote:
>> -(define_insn "<vis3_addsub_ss_insn><vbits>_vis"
>> +(define_insn "<vis3_addsub_ss_patname><mode>"
> 
> Missing a "3" on the end.  Otherwise these look ok.

Thanks for finding that.

>> Unfortunately, that would involve some ABI changes for the VIS
>> builtins.  I'm trending towards considering just changing things
>> anyways since the VIS intrinsics were next to unusable beforehand.
> 
> Why?  You can do just about anything you like inside the builtin
> expander, including frobbing the modes around.

Hmmm, ok, I'll look into approaching the change that way.

Thanks again Richard.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: VIS2 pattern review
  2011-10-13 22:46 ` David Miller
  2011-10-13 22:53   ` Richard Henderson
@ 2011-10-14  4:38   ` Eric Botcazou
  2011-10-14  6:06     ` David Miller
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Botcazou @ 2011-10-14  4:38 UTC (permalink / raw)
  To: David Miller; +Cc: gcc, rth

> Unfortunately, that would involve some ABI changes for the VIS
> builtins.  I'm trending towards considering just changing things
> anyways since the VIS intrinsics were next to unusable beforehand.

Could you elaborate?  The calling conventions for vectors (like for the other 
classes) shouldn't depend on the mode but only on the type.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: VIS2 pattern review
  2011-10-14  4:38   ` Eric Botcazou
@ 2011-10-14  6:06     ` David Miller
  2011-10-14 16:40       ` Eric Botcazou
  0 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2011-10-14  6:06 UTC (permalink / raw)
  To: ebotcazou; +Cc: gcc, rth

From: Eric Botcazou <ebotcazou@adacore.com>
Date: Fri, 14 Oct 2011 00:41:42 +0200

>> Unfortunately, that would involve some ABI changes for the VIS
>> builtins.  I'm trending towards considering just changing things
>> anyways since the VIS intrinsics were next to unusable beforehand.
> 
> Could you elaborate?  The calling conventions for vectors (like for the other 
> classes) shouldn't depend on the mode but only on the type.

Right and as Richard said I can munge the modes during expansion of
existing builtins when needed.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: VIS2 pattern review
  2011-10-14  6:06     ` David Miller
@ 2011-10-14 16:40       ` Eric Botcazou
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Botcazou @ 2011-10-14 16:40 UTC (permalink / raw)
  To: David Miller; +Cc: gcc, rth

> Right and as Richard said I can munge the modes during expansion of
> existing builtins when needed.

OK, but you precisely shouldn't need to do it since the type is fixed.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2011-10-14  6:59 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-13 12:29 VIS2 pattern review Richard Henderson
2011-10-13 19:56 ` David Miller
2011-10-13 20:06   ` David Miller
2011-10-13 20:20   ` Richard Henderson
2011-10-13 22:46 ` David Miller
2011-10-13 22:53   ` Richard Henderson
2011-10-14  1:50     ` David Miller
2011-10-14  4:38   ` Eric Botcazou
2011-10-14  6:06     ` David Miller
2011-10-14 16:40       ` Eric Botcazou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).