From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-170642-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 30868 invoked by alias); 13 Oct 2011 00:49:42 -0000
Received: (qmail 30860 invoked by uid 22791); 13 Oct 2011 00:49:41 -0000
X-SWARE-Spam-Status: No, hits=-6.4 required=5.0	tests=AWL,BAYES_00,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD,SPF_HELO_PASS,TW_BX,TW_CX,TW_DX,TW_VB,TW_XC,TW_XD,TW_XF
X-Spam-Check-By: sourceware.org
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28)    by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 13 Oct 2011 00:49:22 +0000
Received: from int-mx02.intmail.prod.int.phx2.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12])	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p9D0nKSc010149	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK);	Wed, 12 Oct 2011 20:49:20 -0400
Received: from anchor.twiddle.net (vpn-237-55.phx2.redhat.com [10.3.237.55])	by int-mx02.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id p9D0nJMS029331;	Wed, 12 Oct 2011 20:49:20 -0400
Message-ID: <4E96358F.30405@redhat.com>
Date: Thu, 13 Oct 2011 12:29:00 -0000
From: Richard Henderson <rth@redhat.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:7.0) Gecko/20110927 Thunderbird/7.0
MIME-Version: 1.0
To: David Miller <davem@davemloft.net>
CC: "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>
Subject: VIS2 pattern review
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2011-10/txt/msg00200.txt.bz2

[ Using the UltraSparc Architecture, Draft D0.9.4, 27 Sep 2010.
  I believe this is the most recent public manual.  It covers
  VIS1 and VIS2 but not VIS3. ]

The comment for fpmerge_vis is not correct.
I believe that the operation is representable with

  (vec_select:V8QI
    (vec_concat:V8QI
      (match_operand:V4QI 1 ...)
      (match_operand:V4QI 2 ...))
    (parallel [
	0 4 1 5 2 6 3 7
	]))

which can be used as the basis for both of the

  vec_interleave_lowv8qi
  vec_interleave_highv8qi

named patterns.
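
As a rough illustration of what that selector does, here is a small C
model of the vec_select over the vec_concat.  The function name
fpmerge_model and the byte-array representation are assumptions made
for the sketch, not anything in the sparc backend; element 0 is taken
to be the first byte of each operand.

```c
#include <stddef.h>
#include <stdint.h>

/* Selector from the vec_select above: indices into the 8-byte
   concatenation of operand 1 (bytes 0..3) and operand 2 (bytes 4..7). */
static const unsigned char fpmerge_sel[8] = {0, 4, 1, 5, 2, 6, 3, 7};

/* Hypothetical model, not a GCC function:
   out = vec_select (vec_concat (a, b), fpmerge_sel).  */
void fpmerge_model(const uint8_t a[4], const uint8_t b[4], uint8_t out[8])
{
    uint8_t cat[8];

    for (size_t i = 0; i < 4; i++) {
        cat[i] = a[i];       /* operand 1 fills the first half */
        cat[i + 4] = b[i];   /* operand 2 fills the second half */
    }
    for (size_t i = 0; i < 8; i++)
        out[i] = cat[fpmerge_sel[i]];
}
```

With a = {a0,a1,a2,a3} and b = {b0,b1,b2,b3} this yields
a0 b0 a1 b1 a2 b2 a3 b3, which is the interleave that the two named
patterns need when applied to the matching 4-byte halves of their
V8QI inputs.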

------

> (define_insn "fmul8x16_vis"
>   [(set (match_operand:V4HI 0 "register_operand" "=e")
>         (mult:V4HI (match_operand:V4QI 1 "register_operand" "f")
>                    (match_operand:V4HI 2 "register_operand" "e")))]

This is invalid rtl: both operands of a mult must have the same mode
as the result.  You need

  (mult:V4HI
    (zero_extend:V4HI
      (match_operand:V4QI 1 ...))
    (match_operand:V4HI 2 ...))

> (define_insn "fmul8x16au_vis"
>   [(set (match_operand:V4HI 0 "register_operand" "=e")
>         (mult:V4HI (match_operand:V4QI 1 "register_operand" "f")
>                    (match_operand:V2HI 2 "register_operand" "f")))]

AFAICS, this needs an unspec, like fmul8x16al.
Similarly for fmul8sux16_vis and fmuld8sux16_vis.

There's a code sample 7-1 that illustrates a 16x16 multiply:

	fmul8sux16 %f0, %f1, %f2
	fmul8ulx16 %f0, %f1, %f3
	fpadd16    %f2, %f3, %f4

This expansion ought to be available via the "mulv4hi3" named pattern.

Similarly there's a 16x16 -> 32 multiply example:

	fmuld8sux16 %f0, %f1, %f2
	fmuld8ulx16 %f0, %f1, %f3
	fpadd32     %f2, %f3, %f4

that ought to be available via the "vec_widen_smult_{hi,lo}_v4hi"
named patterns.
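
The arithmetic behind the widening sequence can be sketched per lane
in C.  This assumes, from my reading of the manual, that fmuld8sux16
multiplies the signed upper byte by the 16-bit operand and shifts the
product left by 8, and that fmuld8ulx16 multiplies the unsigned lower
byte and sign-extends.  The function names are made up for the
sketch.  The non-widening fmul8sux16 / fmul8ulx16 pair additionally
rounds and keeps only part of each product, which is not modelled
here.

```c
#include <stdint.h>

/* Hypothetical per-lane models; the names are illustrative only. */

/* Signed high byte of a, chosen so that a == hi * 256 + lo holds
   exactly with lo the unsigned low byte. */
static int32_t high_byte_signed(int16_t a)
{
    int32_t lo = (uint16_t)a & 0xFF;
    return ((int32_t)a - lo) / 256;   /* exact: a - lo is a multiple of 256 */
}

/* Assumed fmuld8sux16 lane: signed upper byte times b, scaled by 256. */
int32_t muld8sux16_lane(int16_t a, int16_t b)
{
    return high_byte_signed(a) * (int32_t)b * 256;
}

/* Assumed fmuld8ulx16 lane: unsigned lower byte times b. */
int32_t muld8ulx16_lane(int16_t a, int16_t b)
{
    int32_t lo = (uint16_t)a & 0xFF;
    return lo * (int32_t)b;
}

/* fpadd32 of the two partial products gives the full 16x16 -> 32 product. */
int32_t widen_mul_lane(int16_t a, int16_t b)
{
    return muld8sux16_lane(a, b) + muld8ulx16_lane(a, b);
}
```

Under those assumptions the sum equals (int32_t)a * b for every pair
of inputs, since a = hi * 256 + lo.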

------

The "movmisalign<mode>" named pattern ought to be provided, utilizing the
alignaddr / faligndata insns.
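
A minimal C model of how such a misaligned load would work, assuming
big-endian byte order and that faligndata extracts 8 bytes starting
GSR.align bytes into the concatenation of its two operands.  The
function names are hypothetical, and the model reads two aligned
doublewords even when the offset is already aligned.

```c
#include <stdint.h>

/* Read 8 bytes as a big-endian 64-bit value. */
static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Assumed faligndata behaviour: 8 bytes starting `align` bytes into hi:lo. */
uint64_t faligndata_model(uint64_t hi, uint64_t lo, unsigned align)
{
    align &= 7;
    if (align == 0)
        return hi;                    /* avoid an undefined 64-bit shift */
    return (hi << (8 * align)) | (lo >> (64 - 8 * align));
}

/* Misaligned 8-byte load at base + offset; base stands in for an
   8-byte-aligned address, and 16 bytes from the rounded-down offset
   must be readable. */
uint64_t load_misaligned_model(const uint8_t *base, unsigned offset)
{
    unsigned aligned = offset & ~7u;  /* the address alignaddr would return */
    unsigned align = offset & 7u;     /* the value it would leave in GSR.align */
    uint64_t hi = load_be64(base + aligned);
    uint64_t lo = load_be64(base + aligned + 8);
    return faligndata_model(hi, lo, align);
}
```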

------

The "vec_perm{,_const}v8qi" named patterns ought to be provided using
the bmask / bshuffle insns.

For vec_perm_constv8qi, the compaction of the input byte data to nibble
data, as input to bmask, can happen at compile-time.  For vec_permv8qi,
you'll need to do this at runtime:

Considering each character as a nibble (x = garbage, . = zero):

	i = input 			= xaxbxcxdxexfxgxh
	t1 = i  & 0x0f0f0f0f0f0f0f0f	= .a.b.c.d.e.f.g.h
	t2 = t1 >> 4			= ..a.b.c.d.e.f.g.
	t3 = t1 + t2			= .aabbccddeeffggh
	t4 = t3 & 0x00ff00ff00ff00ff    = ..ab..cd..ef..gh
	t5 = t4 >> 8			= ....ab..cd..ef..
	t6 = t4 + t5			= ..ababcdcdefefgh
	t7 = t6 & 0x0000ffff0000ffff	= ....abcd....efgh
	t8 = t7 >> 16			= ........abcd....
	t9 = t7 + t8			= ....abcdabcdefgh

where that last addition can be performed by the bmask itself, which
keeps only the low 32 bits (abcdefgh) of the sum as the GSR mask.

Dunno if you can come up with a more efficient sequence.  Indeed,
you may want two totally separate sequences depending on whether
the original input is in fp (vector) or integer registers.  Which
of course means delaying the expansion until reload.
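
For reference, the runtime compaction steps can be written out in
plain C on a 64-bit integer.  This is only a sketch of the arithmetic:
the function name is invented, and the final truncation to 32 bits
stands in for what bmask is assumed to do when it writes the GSR mask.

```c
#include <stdint.h>

/* Hypothetical helper: pack the low nibble of each of the 8 input
   bytes into a 32-bit bshuffle selector, most significant byte first. */
uint32_t compact_selector_model(uint64_t i)
{
    uint64_t t = i & 0x0f0f0f0f0f0f0f0fULL;       /* t1: drop garbage nibbles */
    t = (t + (t >> 4)) & 0x00ff00ff00ff00ffULL;   /* t2..t4: pair up nibbles */
    t = (t + (t >> 8)) & 0x0000ffff0000ffffULL;   /* t5..t7: pair up bytes */
    return (uint32_t)(t + (t >> 16));             /* t8, t9: low 32 bits only */
}
```

The additions never carry, because at each step the two addends have
their nonzero nibbles in disjoint positions.  For example, an input of
0xF1E2D3C4B5A69788 compacts to 0x12345678.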

------

The comment above cmask8<>_vis suggests an implementation of
the named "vcond<><>" patterns.

------

> (define_insn "fpadd64_vis"
>   [(set (match_operand:DI 0 "register_operand" "=e")
>         (plus:DI (match_operand:DI 1 "register_operand" "e")
>                  (match_operand:DI 2 "register_operand" "e")))]
>   "TARGET_VIS3"
>   "fpadd64\t%1, %2, %0")

This must be folded into the main "adddi3" pattern, like fpadd32s.
It's not recognizable otherwise.  Similarly fpsub64.  If these
patterns were earlier in the file you'd have noticed them breaking
the build.

------

> (define_code_iterator vis3_addsub_ss [ss_plus ss_minus])
> (define_code_attr vis3_addsub_ss_insn
>   [(ss_plus "fpadds") (ss_minus "fpsubs")])
> 
> (define_insn "<vis3_addsub_ss_insn><vbits>_vis"
>   [(set (match_operand:VASS 0 "register_operand" "=<vconstr>")
>         (vis3_addsub_ss:VASS (match_operand:VASS 1 "register_operand" "<vconstr>")
>                              (match_operand:VASS 2 "register_operand" "<vconstr>")))]
>   "TARGET_VIS3"
>   "<vis3_addsub_ss_insn><vbits>\t%1, %2, %0")

These should be exposed as "ssadd<mode>3" "sssub<mode>3".

Unfortunately, the compiler won't do anything with them yet,
but those are the canonical names for signed saturating add/sub,
and if you use those names we'll automatically use them properly
once the vectorizer is extended in the right directions.
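
For reference, the semantics that ss_plus and ss_minus stand for can
be modelled per 16-bit lane in C as below.  The function names are
illustrative only and do not correspond to anything in GCC.

```c
#include <stddef.h>
#include <stdint.h>

/* Signed saturating add of one 16-bit lane (the meaning of ss_plus). */
int16_t ss_plus16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + (int32_t)b;
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Signed saturating subtract of one 16-bit lane (the meaning of ss_minus). */
int16_t ss_minus16(int16_t a, int16_t b)
{
    int32_t d = (int32_t)a - (int32_t)b;
    if (d > INT16_MAX) return INT16_MAX;
    if (d < INT16_MIN) return INT16_MIN;
    return (int16_t)d;
}

/* What an "ssaddv4hi3" expansion would compute, lane by lane. */
void ssaddv4hi_model(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
    for (size_t i = 0; i < 4; i++)
        out[i] = ss_plus16(a[i], b[i]);
}
```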

------

Other missing vectorization patterns:

  vec_init
  vec_set
  vec_extract
  vec_extract_even
  vec_extract_odd
  vec_unpack{s,u}_{hi,lo}
  vec_unpack{s,u}_float_{hi,lo}
  vec_pack_{trunc,ssat,usat}
  vec_pack_{s,u}fix_trunc

The first three should be provided any time any vector operation
is supported, if at all possible.  Otherwise the compiler will
wind up dropping the data to memory to manipulate it.

The even/odd can be implemented with bshuffle.  We probably ought
to handle this in the middle-end by falling back to vec_perm*, 
but we currently don't.  PPC and SPU could be simplified with this.
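
To make the bshuffle idea concrete, here is a C model under the
assumption that each nibble of the 32-bit GSR mask, most significant
first, selects one of the 16 bytes of rs1:rs2, with byte 0 being the
most significant byte of rs1.  The function and macro names are
invented for the sketch.

```c
#include <stdint.h>

/* Assumed bshuffle behaviour: result byte k (MSB first) is the byte of
   the 16-byte concatenation rs1:rs2 named by nibble k of the mask. */
uint64_t bshuffle_model(uint64_t rs1, uint64_t rs2, uint32_t mask)
{
    uint64_t out = 0;

    for (int k = 0; k < 8; k++) {
        unsigned sel = (mask >> (28 - 4 * k)) & 0xF;
        uint64_t src = (sel < 8) ? rs1 : rs2;
        unsigned byte = (unsigned)(src >> (56 - 8 * (sel & 7))) & 0xFF;
        out = (out << 8) | byte;
    }
    return out;
}

/* Constant selectors: even or odd bytes of the two V8QI inputs. */
#define EXTRACT_EVEN_MASK 0x02468aceu
#define EXTRACT_ODD_MASK  0x13579bdfu
```

Since both selectors are compile-time constants, each pattern would
reduce to one bmask to load the selector plus one bshuffle.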

The vec_pack_trunc pattern is essentially the same as even/odd,
with the right one selected by endianness.  That said, we still
don't fall back to another pattern.

I don't believe the other patterns could be helped by the middle-end.
At least not yet.  I seem to recall we've been talking about some
generic representation of vector comparisons, which could be used to
aid middle-end expansions of vec_unpack via compares, zeros, and
vec_interleave.
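
One way to picture such an expansion, sketched in C for a signed
unpack of bytes to 16-bit lanes: a compare against zero produces an
all-ones or all-zeros byte per element, and interleaving that with
the data yields the sign-extended value.  The function name and the
big-endian byte layout of the result are assumptions of the sketch;
an unsigned unpack would interleave with plain zeros instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: sign-extend four bytes to four big-endian
   16-bit lanes using only a compare with zero and an interleave. */
void unpack_signed_model(const int8_t in[4], uint8_t out_bytes[8])
{
    for (size_t i = 0; i < 4; i++) {
        /* Vector compare (0 > x): all ones where the element is negative. */
        uint8_t sign = (in[i] < 0) ? 0xFF : 0x00;

        /* Interleave: compare result becomes the high byte of the lane. */
        out_bytes[2 * i] = sign;
        out_bytes[2 * i + 1] = (uint8_t)in[i];
    }
}
```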

I don't know how much VIS3 provides that could create specialized
versions of any of these.


Happy hacking,


r~