From: "rajagopal, dwarak"
To: "'Jan Hubicka'", "gcc-patches@gcc.gnu.org"
CC: "Harle, Christophe", "Jagasia, Harsha", "'Jan Hubicka'"
Date: Mon, 12 Oct 2009 16:54:00 -0000
Subject: RE: PATCH: Add XOP 128-bit and 256-bit support for upcoming AMD Orochi processor.
Message-ID: <1C8DE0332CB01445BF7ADEDE3DDD57071F74C4@sausexmbp02.amd.com>

Hi Honza,

I am taking over this patch from Harsha and will wrap it up so that I can check it in.
It would be great if you could answer Harsha's comments below so that I have more clarity.

Thanks,
Dwarak


Hi Honza,

I have gone through your feedback on the XOP patch. It makes sense, and I will fix the patch as per your suggestions.

However, I am not entirely sure I understand the two comments below. Since Mike Meissner added these patterns, I mostly inherited them from him.
Can you please elaborate or provide some more guidance on what I need to do?
(Please see my comments below.)

> +;; We don't have a straight 32-bit parallel multiply on XOP, so fake it with a
> +;; multiply/add. In general, we expect the define_split to occur before
> +;; register allocation, so we have to handle the corner case where the target
> +;; is the same as one of the inputs.
> +(define_insn_and_split "*xop_mulv4si3"
> +  [(set (match_operand:V4SI 0 "register_operand" "=&x")
> +        (mult:V4SI (match_operand:V4SI 1 "register_operand" "%x")
> +                   (match_operand:V4SI 2 "nonimmediate_operand" "xm")))]
> > +  "TARGET_XOP"
> > +  "#"
> > +  "&& (reload_completed
> > +       || (!reg_mentioned_p (operands[0], operands[1])
> > +           && !reg_mentioned_p (operands[0], operands[2])))"
>
> What happens when regs are mentioned?
> There are other cases 2 memory operand multiply-add splitting testing
> these, are we somehow making sure this conditional will always hold and
> we won't ICE not being able to satisfy the conditions?

Actually, I am not even sure this xop_mulv4si3 pattern is needed, because XOP now implies SSE 4.2 and AVX, so we can just generate the mulv4si3 patterns for AVX or SSE 4.1 when -mxop is used. Can I just remove this xop_mulv4si3 pattern then?

As for your reference to "other cases 2 memory operand multiply-add splitting", I assume you are referring to the vpmac/d* define_splits.
In the XOP vpmac/d* instructions, there is no longer any restriction that the destination register be the same as the third source operand, unlike SSE5. Only the second source can be memory. Also, I don't see anything in the manual requiring the destination register to differ from the source 1, source 2, or source 3 operands individually.

Should I then remove the condition below from the pmac/d* patterns?

(!reg_mentioned_p (operands[0], operands[1])
> > +           && !reg_mentioned_p (operands[0], operands[2]))
>
> > +;; The following are for the various unpack insns which doesn't need the first
> > +;; source operand, so we can just use the output operand for the first operand.
> > +;; This allows either of the other two operands to be a memory operand. We
> > +;; can't just use the first operand as an argument to the normal pperm because
> > +;; then an output only argument, suddenly becomes an input operand.
> > +(define_insn "xop_pperm_zero_v16qi_v8hi"
> > +  [(set (match_operand:V8HI 0 "register_operand" "=x,x")
> > +        (zero_extend:V8HI
> > +         (vec_select:V8QI
> > +          (match_operand:V16QI 1 "nonimmediate_operand" "xm,x")
> > +          (match_operand 2 "" ""))))   ;; parallel with const_int's
> > +   (use (match_operand:V16QI 3 "nonimmediate_operand" "x,xm"))]
>
> Hmm, there is no unspec or something that would make it clear that we can
> not ever somehow simplify into this form with operand 2 being something
> different than parallel with const_ints. I think this needs new
> predicate.

I can define a new predicate for it in predicates.md, but I am not sure how exactly to represent the "parallel with const ints" part. Any suggestions?

Thanks,
Harsha