From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-423152-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 129321 invoked by alias); 11 Mar 2016 22:42:05 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 129309 invoked by uid 89); 11 Mar 2016 22:42:05 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=1.5 required=5.0 tests=AWL,BAYES_00,KAM_ASCII_DIVIDERS,KAM_LAZY_DOMAIN_SECURITY,TO_NO_BRKTS_PCNT autolearn=no version=3.3.2 spammy=H*u:1.5.20, H*UA:1.5.20, 031, 206
X-HELO: e32.co.us.ibm.com
Received: from e32.co.us.ibm.com (HELO e32.co.us.ibm.com) (32.97.110.150) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (CAMELLIA256-SHA encrypted) ESMTPS; Fri, 11 Mar 2016 22:41:55 +0000
Received: from localhost	by e32.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted	for <gcc-patches@gcc.gnu.org> from <meissner@ibm-tiger.the-meissners.org>;	Fri, 11 Mar 2016 15:41:53 -0700
Received: from d03dlp01.boulder.ibm.com (9.17.202.177)	by e32.co.us.ibm.com (192.168.1.132) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted;	Fri, 11 Mar 2016 15:41:51 -0700
X-IBM-Helo: d03dlp01.boulder.ibm.com
X-IBM-MailFrom: meissner@ibm-tiger.the-meissners.org
X-IBM-RcptTo: gcc-patches@gcc.gnu.org
Received: from b03cxnp08027.gho.boulder.ibm.com (b03cxnp08027.gho.boulder.ibm.com [9.17.130.19])	by d03dlp01.boulder.ibm.com (Postfix) with ESMTP id A37311FF0021	for <gcc-patches@gcc.gnu.org>; Fri, 11 Mar 2016 15:29:59 -0700 (MST)
Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169])	by b03cxnp08027.gho.boulder.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id u2BMfob630539996	for <gcc-patches@gcc.gnu.org>; Fri, 11 Mar 2016 15:41:50 -0700
Received: from d03av03.boulder.ibm.com (localhost [127.0.0.1])	by d03av03.boulder.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id u2BMfofj017345	for <gcc-patches@gcc.gnu.org>; Fri, 11 Mar 2016 15:41:50 -0700
Received: from ibm-tiger.the-meissners.org (dhcp-9-32-77-111.usma.ibm.com [9.32.77.111])	by d03av03.boulder.ibm.com (8.14.4/8.14.4/NCO v10.0 AVin) with ESMTP id u2BMfnru017313;	Fri, 11 Mar 2016 15:41:49 -0700
Received: by ibm-tiger.the-meissners.org (Postfix, from userid 500)	id 51DC543A1F; Fri, 11 Mar 2016 17:41:49 -0500 (EST)
Date: Fri, 11 Mar 2016 22:42:00 -0000
From: Michael Meissner <meissner@linux.vnet.ibm.com>
To: gcc-patches@gcc.gnu.org, dje.gcc@gmail.com
Subject: [PATCH], Fix PR 70131, disable (double)(int) optimization for power8
Message-ID: <20160311224148.GA31239@ibm-tiger.the-meissners.org>
Mail-Followup-To: Michael Meissner <meissner@linux.vnet.ibm.com>,	gcc-patches@gcc.gnu.org, dje.gcc@gmail.com
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="IJpNTDwzlM2Ie8A6"
Content-Disposition: inline
User-Agent: Mutt/1.5.20 (2009-12-10)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 16031122-0005-0000-0000-00001D753225
X-IsSubscribed: yes
X-SW-Source: 2016-03/txt/msg00724.txt.bz2


--IJpNTDwzlM2Ie8A6
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-length: 2356

As I was auditing rs6000.md for power9 changes, I noticed that changes I had
made in 2010 for power7 weren't as effective with power8.

The FCTIWZ/FCTIWUZ instructions convert the scalar floating point value to a
32-bit signed/unsigned integer in bits 32-63 of the floating point or vector
register.  Unfortunately, the hardware does not guarantee that bits 0-31 are
copies of the sign, so that it can be used as a valid 64-bit integer.  There is
no conversion from 32-bit int to floating point.  This meant in the power7
days, if you wanted to round a floating point value to 32-bit integer, you
would need to do:

	convert to 32-bit integer
	store 32-bit value on the stack
	load 32-bit value to a GPR
	sign/zero extend it
	store 32-bit value to the stack
	load 32-bit value to a FPR/vector register.

The optimization does a store/load to sign/zero extend, rather than going
through the GPRs.

On power8, we have a direct move instruction that copies the value between the
register sets, and the compiler will generate this if the above optimization is
turned off (which is what this patch does).

There are other ways to sign/zero extend a value in the vector registers
without doing a move using multiple instructions, but in practice direct move
seems to be as fast as the other instructions.

I bootstrapped the compiler and there were no regressions with this patch.

I rebuilt the Spec 2006 benchmark suite, and there 7 of the benchmarks that
used this sequence somewhere in the code.  I ran those benchmarks with this
patch, and compared them to the original benchmarks.  In 6 of the benchmarks,
the run time was almost precisely the same.  The 416.gamess benchmark was about
2% faster, and there were no regressions.

Is this patch ok to apply to the trunk?  I would like to apply it to the gcc 5
branch as well.  Is this ok also?

[gcc]
2016-03-11  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/70131
	* config/rs6000/rs6000.md (round32<mode>2_fprs): Do not do the
	optimization if we have direct move.
	(roundu32<mode>2_fprs): Likewise.

[gcc/testsuite]
2016-03-11  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/70131
	* gcc.target/powerpc/ppc-round2.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

--IJpNTDwzlM2Ie8A6
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="pr70131.patch01b"
Content-length: 3542

Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 234147)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -5387,10 +5387,12 @@ (define_insn "*friz"
    xsrdpiz %x0,%x1"
   [(set_attr "type" "fp")])
 
-;; Since FCTIWZ doesn't sign extend the upper bits, we have to do a store and a
-;; load to properly sign extend the value, but at least doing a store, load
-;; into a GPR to sign extend, a store from the GPR and a load back into the FPR
-;; if we have 32-bit memory ops
+;; Opitmize converting SF/DFmode to signed SImode and back to SF/DFmode.  This
+;; optimization prevents on ISA 2.06 systems and earlier having to store the
+;; value from the FPR/vector unit to the stack, load the value into a GPR, sign
+;; extend it, store it back on the stack from the GPR, load it back into the
+;; FP/vector unit to do the rounding. If we have direct move (ISA 2.07),
+;; disable using store and load to sign/zero extend the value.
 (define_insn_and_split "*round32<mode>2_fprs"
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d")
 	(float:SFDF
@@ -5399,7 +5401,7 @@ (define_insn_and_split "*round32<mode>2_
    (clobber (match_scratch:DI 3 "=d"))]
   "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
    && <SI_CONVERT_FP> && TARGET_LFIWAX && TARGET_STFIWX && TARGET_FCFID
-   && can_create_pseudo_p ()"
+   && !TARGET_DIRECT_MOVE && can_create_pseudo_p ()"
   "#"
   ""
   [(pc)]
@@ -5431,7 +5433,7 @@ (define_insn_and_split "*roundu32<mode>2
    (clobber (match_scratch:DI 2 "=d"))
    (clobber (match_scratch:DI 3 "=d"))]
   "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_DOUBLE_FLOAT
-   && TARGET_LFIWZX && TARGET_STFIWX && TARGET_FCFIDU
+   && TARGET_LFIWZX && TARGET_STFIWX && TARGET_FCFIDU && !TARGET_DIRECT_MOVE
    && can_create_pseudo_p ()"
   "#"
   ""
Index: gcc/testsuite/gcc.target/powerpc/ppc-round2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc-round2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ppc-round2.c	(working copy)
@@ -0,0 +1,42 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
+/* { dg-options "-O2 -mcpu=power8" } */
+/* { dg-final { scan-assembler-times "fcfid "      2 } } */
+/* { dg-final { scan-assembler-times "fcfids "     2 } } */
+/* { dg-final { scan-assembler-times "fctiwuz "    2 } } */
+/* { dg-final { scan-assembler-times "fctiwz "     2 } } */
+/* { dg-final { scan-assembler-times "mfvsrd "     4 } } */
+/* { dg-final { scan-assembler-times "mtvsrwa "    2 } } */
+/* { dg-final { scan-assembler-times "mtvsrwz "    2 } } */
+/* { dg-final { scan-assembler-not   "lwz"           } } */
+/* { dg-final { scan-assembler-not   "lfiwax "       } } */
+/* { dg-final { scan-assembler-not   "lfiwzx "       } } */
+/* { dg-final { scan-assembler-not   "stw"           } } */
+/* { dg-final { scan-assembler-not   "stfiwx "       } } */
+
+/* Make sure we don't have loads/stores to the GPR unit.  */
+double
+round_double_int (double a)
+{
+  return (double)(int)a;
+}
+
+float
+round_float_int (float a)
+{
+  return (float)(int)a;
+}
+
+double
+round_double_uint (double a)
+{
+  return (double)(unsigned int)a;
+}
+
+float
+round_float_uint (float a)
+{
+  return (float)(unsigned int)a;
+}

--IJpNTDwzlM2Ie8A6--