Date: Thu, 13 Oct 2011 19:56:00 -0000
Message-Id: <20111013.142636.1859659747859622111.davem@davemloft.net>
To: rth@redhat.com
Cc: gcc@gcc.gnu.org
Subject: Re: VIS2 pattern review
From: David Miller <davem@davemloft.net>
In-Reply-To: <4E96358F.30405@redhat.com>
References: <4E96358F.30405@redhat.com>

From: Richard Henderson <rth@redhat.com>
Date: Wed, 12 Oct 2011 17:49:19 -0700

> There's a code sample 7-1 that illustrates a 16x16 multiply:
> 
> 	fmul8sux16 %f0, %f1, %f2
> 	fmul8ulx16 %f0, %f1, %f3
> 	fpadd16    %f2, %f3, %f4

Be wary of code examples that don't even assemble (even-numbered
float registers are required here).

fmul8sux16 does essentially the following for each 16-bit element:

	src1 = (rs1 >> 8) & 0xff;
	src2 = rs2 & 0xffff;

	product = src1 * src2;

	scaled = (product & 0x00ffff00) >> 8;
	if (product & 0x80)
		scaled++;

	rd = scaled & 0xffff;

fmul8ulx16 does the same except the assignment to src1 is:

	src1 = rs1 & 0xff;

So this "16 x 16 multiply" isn't the kind of multiply you think it is:
each partial product is shifted right by 8 bits and rounded, so the sum
is not the low 16 bits of the full product.  It is therefore not
appropriate for the compiler to use it for vector multiplies.

Just for shits and grins I tried it anyway, and as expected the slp-7
testcase fails.  The main multiply loop in that testcase is compiled to:

	sethi   %hi(.LLC6), %i3
	sethi   %hi(in2), %g1
	ldd     [%i3+%lo(.LLC6)], %f22
	sethi   %hi(.LLC7), %i4
	sethi   %hi(.LLC8), %i2
	sethi   %hi(.LLC9), %i3
	add     %fp, -256, %g2
	ldd     [%i4+%lo(.LLC7)], %f20
	or      %g1, %lo(in2), %g1
	ldd     [%i2+%lo(.LLC8)], %f18
	mov     %fp, %i5
	ldd     [%i3+%lo(.LLC9)], %f16
	mov     %g1, %g4
	mov     %g2, %g3
.LL10:
	ldd     [%g4+8], %f14
	ldd     [%g4+16], %f12
	fmul8sux16      %f14, %f22, %f26
	ldd     [%g4+24], %f10
	fmul8ulx16      %f14, %f22, %f24
	ldd     [%g4], %f8
	fmul8sux16      %f12, %f20, %f34
	fmul8ulx16      %f12, %f20, %f32
	fmul8sux16      %f10, %f18, %f30
	fpadd16 %f26, %f24, %f14
	fmul8ulx16      %f10, %f18, %f28
	fmul8sux16      %f8, %f16, %f26
	fmul8ulx16      %f8, %f16, %f24
	fpadd16 %f34, %f32, %f12
	std     %f14, [%g3+8]
	fpadd16 %f30, %f28, %f10
	std     %f12, [%g3+16]
	fpadd16 %f26, %f24, %f8
	std     %f10, [%g3+24]
	std     %f8, [%g3]
	add     %g3, 32, %g3
	cmp     %g3, %i5
	bne,pt  %icc, .LL10
	 add    %g4, 32, %g4

and it simply gives the wrong results.

The entire out2[] array is all zeros.