Message-ID: <4E96358F.30405@redhat.com>
Date: Thu, 13 Oct 2011 12:29:00 -0000
From: Richard Henderson
To: David Miller
CC: "gcc@gcc.gnu.org"
Subject: VIS2 pattern review

[ Using the UltraSPARC Architecture, Draft D0.9.4, 27 Sep 2010.
  I believe this is the most recent public manual.  It covers
  VIS1 and VIS2 but not VIS3. ]

The comment for fpmerge_vis is not correct.  I believe that the
operation is representable with

  (vec_select:V8QI
    (vec_concat:V8QI
      (match_operand:V4QI 1 ...)
      (match_operand:V4QI 2 ...))
    (parallel [ 0 4 1 5 2 6 3 7 ]))

which can be used as the basis for both the vec_interleave_lowv8qi
and vec_interleave_highv8qi named patterns.

------

> (define_insn "fmul8x16_vis"
>   [(set (match_operand:V4HI 0 "register_operand" "=e")
>         (mult:V4HI (match_operand:V4QI 1 "register_operand" "f")
>                    (match_operand:V4HI 2 "register_operand" "e")))]

This is invalid rtl.  You need

  (mult:V4HI
    (zero_extend:V4HI (match_operand:V4QI 1 ...))
    (match_operand:V4HI 2 ...))

> (define_insn "fmul8x16au_vis"
>   [(set (match_operand:V4HI 0 "register_operand" "=e")
>         (mult:V4HI (match_operand:V4QI 1 "register_operand" "f")
>                    (match_operand:V2HI 2 "register_operand" "f")))]

AFAICS, this needs an unspec, like fmul8x16al.  Similarly for
fmul8sux16_vis and fmuld8sux16_vis.

The manual's code sample 7-1 illustrates a 16x16 multiply:

    fmul8sux16  %f0, %f1, %f2
    fmul8ulx16  %f0, %f1, %f3
    fpadd16     %f2, %f3, %f4

This expansion ought to be available via the "mulv4hi3" named pattern.

Similarly, there's a 16x16 -> 32 multiply example:

    fmuld8sux16 %f0, %f1, %f2
    fmuld8ulx16 %f0, %f1, %f3
    fpadd32     %f2, %f3, %f4

that ought to be available via the "vec_widen_smult_{hi,lo}_v4hi"
named patterns.

------

The "movmisalign" named pattern ought to be provided, utilizing the
alignaddr / faligndata insns.

------

The "vec_perm{,_const}v8qi" named patterns ought to be provided using
the bmask / bshuffle insns.

For vec_perm_constv8qi, the compaction of the input byte data to
nibble data, as input to bmask, can happen at compile time.

For vec_permv8qi, you'll need to do this at runtime.  Considering
each character as a nibble (x = garbage, . = zero):

  i  = input                   = xaxbxcxdxexfxgxh
  t1 = i & 0x0f0f0f0f0f0f0f0f  = .a.b.c.d.e.f.g.h
  t2 = t1 >> 4                 = ..a.b.c.d.e.f.g.
  t3 = t1 + t2                 = .aabbccddeeffggh
  t4 = t3 & 0x00ff00ff00ff00ff = ..ab..cd..ef..gh
  t5 = t4 >> 8                 = ....ab..cd..ef..
  t6 = t4 + t5                 = ..ababcdcdefefgh
  t7 = t6 & 0x0000ffff0000ffff = ....abcd....efgh
  t8 = t7 >> 16                = ........abcd....
  t9 = t7 + t8                 = ....abcdabcdefgh

where that last addition can be performed by the bmask itself, which
keeps only the low 32 bits (abcdefgh) of its sum in GSR.mask.

Dunno if you can come up with a more efficient sequence.  Indeed, you
may want two totally separate sequences depending on whether the
original input is in fp (vector) or integer registers, which of course
means delaying the expansion until after reload.

------

The comment above cmask8<>_vis suggests an implementation of the
named "vcond<><>" patterns.

------

> (define_insn "fpadd64_vis"
>   [(set (match_operand:DI 0 "register_operand" "=e")
>         (plus:DI (match_operand:DI 1 "register_operand" "e")
>                  (match_operand:DI 2 "register_operand" "e")))]
>   "TARGET_VIS3"
>   "fpadd64\t%1, %2, %0")

This must be folded into the main "adddi3" pattern, like fpadd32s.
It's not recognizable otherwise.  Similarly fpsub64.  If these patterns
were earlier in the file you'd have noticed them breaking the build.

------

> (define_code_iterator vis3_addsub_ss [ss_plus ss_minus])
> (define_code_attr vis3_addsub_ss_insn
>   [(ss_plus "fpadds") (ss_minus "fpsubs")])
>
> (define_insn "_vis"
>   [(set (match_operand:VASS 0 "register_operand" "=")
>         (vis3_addsub_ss:VASS (match_operand:VASS 1 "register_operand" "")
>                              (match_operand:VASS 2 "register_operand" "")))]
>   "TARGET_VIS3"
>   "\t%1, %2, %0")

These should be exposed as "ssadd3" and "sssub3".  Unfortunately, the
compiler won't do anything with them yet, but those are the canonical
names for signed saturating add/sub, and if you use those names we'll
automatically use them properly once the vectorizer is extended in the
right directions.

------

Other missing vectorization patterns:

  vec_init
  vec_set
  vec_extract
  vec_extract_even
  vec_extract_odd
  vec_unpack{s,u}_{hi,lo}
  vec_unpack{s,u}_float_{hi,lo}
  vec_pack_{trunc,ssat,usat}
  vec_pack_{s,u}fix_trunc

The first three should be provided any time any vector operation is
supported, if at all possible.  Otherwise the compiler will wind up
dropping the data to memory to manipulate it.

The even/odd extractions can be implemented with bshuffle.
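
The even/odd extractions, like the vec_perm{,_const}v8qi cases above, come down to computing a 32-bit GSR.mask for bshuffle.  The C below is a rough model of that idea, not GCC code: bshuffle_model, compact_selector, vec_perm_v8qi_model and the EXTRACT_*_MASK constants are hypothetical names, and the byte numbering assumed for bshuffle (rs1's most significant byte is index 0, rs2 supplies indices 8..15, and the most significant mask nibble picks the most significant result byte) should be checked against the manual.  compact_selector follows the t1..t9 sequence, with its final add standing in for bmask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the VIS2 bshuffle insn, not GCC code.
   Assumed byte numbering: bytes 0..7 are rs1 from most to least
   significant, bytes 8..15 are rs2, and the most significant nibble
   of the 32-bit GSR.mask selects the most significant result byte. */
static uint64_t bshuffle_model(uint64_t rs1, uint64_t rs2, uint32_t mask)
{
    uint64_t rd = 0;
    for (int k = 0; k < 8; k++) {
        unsigned sel = (mask >> (28 - 4 * k)) & 0xf;  /* nibble k, MSB first */
        uint64_t src = sel < 8 ? rs1 : rs2;           /* 8..15 index rs2 */
        uint64_t byte = (src >> (56 - 8 * (sel & 7))) & 0xff;
        rd |= byte << (56 - 8 * k);
    }
    return rd;
}

/* Runtime compaction of a V8QI selector (one index per byte, high
   nibble ignored) into eight nibbles, following t1..t9.  The last add
   stands in for bmask; the cast keeps only the low 32 bits. */
static uint32_t compact_selector(uint64_t sel)
{
    uint64_t t1 = sel & 0x0f0f0f0f0f0f0f0fULL;
    uint64_t t4 = (t1 + (t1 >> 4)) & 0x00ff00ff00ff00ffULL;  /* t2, t3, t4 */
    uint64_t t7 = (t4 + (t4 >> 8)) & 0x0000ffff0000ffffULL;  /* t5, t6, t7 */
    return (uint32_t)(t7 + (t7 >> 16));                      /* t8, t9 */
}

/* Hypothetical model of an expanded vec_permv8qi: compact, then shuffle. */
static uint64_t vec_perm_v8qi_model(uint64_t op1, uint64_t op2, uint64_t sel)
{
    return bshuffle_model(op1, op2, compact_selector(sel));
}

/* Constant selectors compact at compile time; even/odd extraction of
   the 16-byte concatenation reduces to two fixed masks. */
#define EXTRACT_EVEN_MASK 0x02468aceu   /* bytes 0, 2, ..., 14 */
#define EXTRACT_ODD_MASK  0x13579bdfu   /* bytes 1, 3, ..., 15 */
```

For example, with op1 = 0x0001020304050607 and op2 = 0x08090a0b0c0d0e0f (each byte holding its own index), vec_perm_v8qi_model(op1, op2, 0x00020406080a0c0e) returns 0x00020406080a0c0e, the same as bshuffle_model(op1, op2, EXTRACT_EVEN_MASK).  Only the runtime case pays for compact_selector; the constant case folds it away.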
We probably ought to handle even/odd extraction in the middle-end by
falling back to vec_perm*, but we currently don't.  PPC and SPU could
be simplified with this.

The vec_pack_trunc pattern is essentially the same as even/odd, with
the right one selected by endianness.  That said, we still don't fall
back to another pattern.

I don't believe the other patterns could be helped by the middle-end,
at least not yet.  I seem to recall we've been talking about some
generic representation of vector comparisons, which could be used to
aid middle-end expansions of vec_unpack via compares, zeros, and
vec_interleave.

I don't know how much VIS3 provides that could create specialized
versions of any of these.

Happy hacking,

r~