Date: Wed, 05 Jun 2013 20:24:00 -0000
From: Michael Meissner
To: Segher Boessenkool
Cc: Michael Meissner, David Edelsohn, GCC Patches, Pat Haugen, Peter Bergner
Subject: Re: [PATCH, rs6000] power8 patches, patch #4 (revised), new power8 builtins
Message-ID: <20130605202420.GA8002@ibm-tiger.the-meissners.org>

On Wed, Jun 05, 2013 at 10:06:08PM +0200, Segher Boessenkool wrote:
> > I also wonder whether it would be useful to have 32-bit do the vector logical
> > ops in GPRs as well. At the moment, the patches don't allow it (vector types
> > must be done in the altivec/vsx registers, and TImode is done by splitting the
> > operation into 4 separate categories). On the 64-bit side, having __int128_t
> > passed in GPRs means you want to avoid ping-ponging between the GPRs and VSX
> > registers. In addition, the atomic quad word support (patch #7) has to run in
> > GPRs, so we need add/subtract/logical to have versions that run in GPRs.
>
> It might work better if you added a mode V1TI for TI in vector
> regs, and then used plain TI only for GPRs. It certainly will
> make things a lot more regular; whether it actually works better,
> I have no idea.
>
> The way you have things now, only after reload the vector patterns
> are split to GPR patterns; much too late to do most optimisations
> on it. On the other hand, deciding early what register set some
> op should go to isn't too pleasant either; is it always the best
> choice to use the vector regs when possible?

It depends. For example, consider:

    #ifndef TYPE
    #define TYPE __int128_t
    #endif

    TYPE a_and (TYPE p, TYPE q)
    {
      return p & q;
    }

    void p_and (TYPE *p, TYPE *q, TYPE *r)
    {
      *p = *q & *r;
    }

In a_and, p and q are passed in GPRs, so you want to use the GPR-based
instructions. In p_and, it is simpler to do the operation in the VSX
registers. This is what my code from patch #4 generates:

    .L.a_and:
            and 3,3,5
            and 4,4,6
            blr

    .L.p_and:
            lxvd2x 12,0,4
            lxvd2x 0,0,5
            xxland 0,12,0
            stxvd2x 0,0,3
            blr

Unfortunately, when I added TImode in VSX registers, I didn't notice this,
and the current code generates:

    .L.a_and:
            addi 9,1,-16
            std 3,0(9)
            std 4,8(9)
            ori 2,2,0
            lxvd2x 12,0,9
            std 5,0(9)
            std 6,8(9)
            ori 2,2,0
            lxvd2x 0,0,9
            xxland 0,12,0
            stxvd2x 0,0,9
            ori 2,2,0
            ld 3,0(9)
            ld 4,8(9)
            blr

    .L.p_and:
            lxvd2x 12,0,4
            lxvd2x 0,0,5
            xxland 0,12,0
            stxvd2x 0,0,3
            blr

Previous versions (and -mno-vsx-timode) generate:

    .L.a_and:
            and 3,3,5
            and 4,4,6
            blr

    .L.p_and:
            ld 10,0(4)
            ld 9,0(5)
            and 9,10,9
            std 9,0(3)
            ld 10,8(4)
            ld 9,8(5)
            and 9,10,9
            std 9,8(3)
            blr

Note that the scheduler does not interleave the loads and the and
instructions; instead it emits ld/ld/and/std for each doubleword.

This bouncing back and forth will get somewhat worse when support for doing
__int128_t add/subtract in the vector registers is added. We don't want to
hard-wire doing all of TImode in vector registers, because this breaks the
8-byte atomic fetch_and_add functions (short of using an UNSPEC to hide the
add).

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
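A minimal C sketch (an editorial illustration, not code from the patch) of why the GPR version of a_and needs only two `and` instructions while p_and can use a single `xxland`: a 128-bit bitwise AND has no carries between bit positions, so it is exactly two independent 64-bit ANDs on the high and low doublewords. It assumes GCC's `unsigned __int128` extension on a 64-bit target; the helper names `hi64`, `lo64`, `make128`, `and128_whole`, `and128_split` and `check_and128` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: GCC's unsigned __int128 extension on a 64-bit target.
   All helper names below are illustrative, not from the rs6000 port. */
typedef unsigned __int128 u128;

/* Extract the high and low 64-bit doublewords of a 128-bit value. */
static uint64_t hi64(u128 x) { return (uint64_t)(x >> 64); }
static uint64_t lo64(u128 x) { return (uint64_t)x; }

/* Rebuild a 128-bit value from its two doublewords. */
static u128 make128(uint64_t hi, uint64_t lo)
{
  return ((u128)hi << 64) | lo;
}

/* Whole-register form: one 128-bit AND, as xxland computes in VSX regs. */
static u128 and128_whole(u128 p, u128 q)
{
  return p & q;
}

/* GPR form: two independent 64-bit ANDs, one per doubleword,
   mirroring the two "and" instructions in .L.a_and. */
static u128 and128_split(u128 p, u128 q)
{
  return make128(hi64(p) & hi64(q), lo64(p) & lo64(q));
}

/* Property check: both forms must agree for any pair of inputs. */
static void check_and128(u128 p, u128 q)
{
  assert(and128_split(p, q) == and128_whole(p, q));
}
```

The split works only because bitwise operations have no carries; an add or subtract done the same way would also need the carry out of the low doubleword propagated into the high one.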