Date: Thu, 13 Oct 2011 19:56:00 -0000
Message-Id: <20111013.142636.1859659747859622111.davem@davemloft.net>
To: rth@redhat.com
Cc: gcc@gcc.gnu.org
Subject: Re: VIS2 pattern review
From: David Miller
In-Reply-To: <4E96358F.30405@redhat.com>
References: <4E96358F.30405@redhat.com>

From: Richard Henderson
Date: Wed, 12 Oct 2011 17:49:19 -0700

> There's a code sample 7-1 that illustrates a 16x16 multiply:
>
>         fmul8sux16 %f0, %f1, %f2
>         fmul8ulx16 %f0, %f1, %f3
>         fpadd16 %f2, %f3, %f4

Be wary of code examples that don't even assemble (even numbered
float registers are required here).

fmul8sux16 basically does, for each element:

        src1 = (rs1 >> 8) & 0xff;
        src2 = rs2 & 0xffff;

        product = src1 * src2;

        scaled = (product & 0x00ffff00) >> 8;
        if (product & 0x80)
                scaled++;

        rd = scaled & 0xffff;

fmul8ulx16 does the same except the assignment to src1 is:

        src1 = rs1 & 0xff;

Therefore, I think this "16 x 16 multiply" operation isn't the kind
you think it is, and it's therefore not appropriate to use this in the
compiler for vector multiplies.

Just for shits and grins I tried it and the slp-7 testcase, as expected,
fails.  The main multiply loop in that test case is compiled to:

        sethi   %hi(.LLC6), %i3
        sethi   %hi(in2), %g1
        ldd     [%i3+%lo(.LLC6)], %f22
        sethi   %hi(.LLC7), %i4
        sethi   %hi(.LLC8), %i2
        sethi   %hi(.LLC9), %i3
        add     %fp, -256, %g2
        ldd     [%i4+%lo(.LLC7)], %f20
        or      %g1, %lo(in2), %g1
        ldd     [%i2+%lo(.LLC8)], %f18
        mov     %fp, %i5
        ldd     [%i3+%lo(.LLC9)], %f16
        mov     %g1, %g4
        mov     %g2, %g3
.LL10:
        ldd     [%g4+8], %f14
        ldd     [%g4+16], %f12
        fmul8sux16      %f14, %f22, %f26
        ldd     [%g4+24], %f10
        fmul8ulx16      %f14, %f22, %f24
        ldd     [%g4], %f8
        fmul8sux16      %f12, %f20, %f34
        fmul8ulx16      %f12, %f20, %f32
        fmul8sux16      %f10, %f18, %f30
        fpadd16 %f26, %f24, %f14
        fmul8ulx16      %f10, %f18, %f28
        fmul8sux16      %f8, %f16, %f26
        fmul8ulx16      %f8, %f16, %f24
        fpadd16 %f34, %f32, %f12
        std     %f14, [%g3+8]
        fpadd16 %f30, %f28, %f10
        std     %f12, [%g3+16]
        fpadd16 %f26, %f24, %f8
        std     %f10, [%g3+24]
        std     %f8, [%g3]
        add     %g3, 32, %g3
        cmp     %g3, %i5
        bne,pt  %icc, .LL10
         add    %g4, 32, %g4

and it simply gives the wrong results.  The entire out2[] array is all
zeros.
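
To make the mismatch concrete, here is a minimal C sketch of the
per-element behaviour exactly as described in the pseudocode in this
message.  It is a model of that description only and has not been
checked against the VIS manual or real hardware.  All function names
(mul8x16_step, model_fmul8sux16, model_fmul8ulx16, model_fpadd16,
vis_mul_sequence, plain_mul16, model_self_check) are hypothetical, and
fpadd16 is assumed to be a 16-bit add that wraps modulo 2^16.

```c
#include <assert.h>
#include <stdint.h>

/* Shared per-element step, transcribed from the pseudocode in this
 * message. src1 is the 8-bit slice of rs1 already selected by the caller.
 * This follows the message's description, not the VIS manual. */
static uint16_t mul8x16_step(uint32_t src1, uint16_t rs2)
{
    uint32_t src2 = rs2 & 0xffff;
    uint32_t product = src1 * src2;                 /* fits in 24 bits */
    uint32_t scaled = (product & 0x00ffff00) >> 8;  /* drop low 8 bits */

    if (product & 0x80)                             /* round on bit 7 */
        scaled++;
    return (uint16_t)(scaled & 0xffff);
}

/* Upper byte of rs1 times rs2, as described for fmul8sux16. */
uint16_t model_fmul8sux16(uint16_t rs1, uint16_t rs2)
{
    return mul8x16_step((rs1 >> 8) & 0xff, rs2);
}

/* Lower byte of rs1 times rs2, as described for fmul8ulx16. */
uint16_t model_fmul8ulx16(uint16_t rs1, uint16_t rs2)
{
    return mul8x16_step(rs1 & 0xff, rs2);
}

/* Assumption: fpadd16 adds one 16-bit element with wraparound. */
uint16_t model_fpadd16(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b);
}

/* The three-instruction sequence from the quoted code sample. */
uint16_t vis_mul_sequence(uint16_t rs1, uint16_t rs2)
{
    return model_fpadd16(model_fmul8sux16(rs1, rs2),
                         model_fmul8ulx16(rs1, rs2));
}

/* What a vector multiply is expected to produce: the low 16 bits. */
uint16_t plain_mul16(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint32_t)a * (uint32_t)b);
}

/* A few hand-checked cases showing the two results diverge. */
void model_self_check(void)
{
    assert(plain_mul16(3, 5) == 15);
    assert(vis_mul_sequence(3, 5) == 0);    /* product 15 is scaled away */
    assert(plain_mul16(16, 16) == 256);
    assert(vis_mul_sequence(16, 16) == 1);  /* 256 >> 8 */
}
```

Under this model, any pair of operands whose product stays below 128
comes out as 0 from the three-instruction sequence, while an ordinary
multiply keeps the low 16 bits of the product.  Whether the slp-7
inputs fall in that range is not stated here, so treat this only as an
illustration of why the sequence is not a drop-in vector multiply.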