[PATCH], PR target/80510, Optimize 32-bit offsettable memory references on power7/power8

public inbox for gcc-patches@gcc.gnu.org

From: Michael Meissner <meissner@linux.vnet.ibm.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>,
	       Segher Boessenkool <segher@kernel.crashing.org>,
	       David Edelsohn <dje.gcc@gmail.com>,
	       Bill Schmidt <wschmidt@linux.vnet.ibm.com>
Subject: [PATCH], PR target/80510, Optimize 32-bit offsettable memory references on power7/power8
Date: Thu, 22 Jun 2017 22:55:00 -0000
Message-ID: <20170622225452.GA7801@ibm-tiger.the-meissners.org>

[-- Attachment #1: Type: text/plain, Size: 2342 bytes --]

Andreas Schwab noticed that the two tests for PR 80510 failed on 32-bit systems
due to long being only a 32-bit type.
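
As a minimal sketch of the width issue (this is not the test source, and the
helper name is hypothetical): under the 32-bit ABIs long is 32 bits wide, while
under the 64-bit ABIs it is 64 bits, so code that assumes a 64-bit long behaves
differently when built with -m32.

```c
#include <limits.h>

/* Hypothetical helper: width of long in bits for the ABI in use.
   Yields 32 under ILP32 (e.g. -m32) and 64 under LP64 (e.g. -m64).  */
static int
long_width_bits (void)
{
  return (int) (sizeof (long) * CHAR_BIT);
}

/* Hypothetical helper: long long is at least 64 bits under every ABI,
   so it is the safer choice when a 64-bit integer is required.  */
static int
long_long_width_bits (void)
{
  return (int) (sizeof (long long) * CHAR_BIT);
}
```

Comparing the two results on a given target shows whether long can be relied on
to hold a 64-bit value there.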

Yesterday, I committed a patch to disable the tests on 32-bit:
https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01607.html

This patch implements the necessary move and peephole support for 32-bit ISA
2.06/2.07 (power7/power8) targets, so that the compiler can optimize:

	load FPR, <offsettable memory>		move FPR, ALTIVEC
	move ALTIVEC, FPR			store FPR, <offsettable memory>

into:

	ADDI GPR, <offsettable address>		ADDI GPR, <offsettable address>
	load ALTIVEC, GPR			store ALTIVEC, GPR
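
The address arithmetic behind this rewrite can be modeled in portable C.  This
is only a sketch under stated assumptions, not GCC code: the function names and
the 16-byte displacement are hypothetical.  The first function stands in for a
D-form access (base register plus constant displacement), the second for the
X-form access the peephole emits (displacement first placed in a scratch GPR,
then used as an index).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical constant displacement used by the D-form model.  */
#define DISPLACEMENT 16

/* D-form model: the displacement is a constant folded into the access.  */
static double
load_dform (const unsigned char *base)
{
  double value;
  memcpy (&value, base + DISPLACEMENT, sizeof value);
  return value;
}

/* X-form model: the displacement arrives in an index "register" and is
   added to the base, as in the peephole's replacement sequence.  */
static double
load_xform (const unsigned char *base, size_t index)
{
  double value;
  memcpy (&value, base + index, sizeof value);
  return value;
}
```

Both functions read the same location when index equals DISPLACEMENT; the only
difference is whether the offset is encoded in the access or held in a
register, which is what lets the load or store use an Altivec register
directly.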

I tested it on two systems: 1) a big endian power7 system that has the 32-bit
libraries installed, and 2) a little endian power8 system.  On both systems, it
bootstrapped and passed make check.  On the power7 system, I also verified that
the two tests for this functionality ran correctly.

FWIW, I built two 32-bit versions of SPEC 2006, one with the unmodified compiler
and one with these changes installed.  Unlike 64-bit, I don't see any code
changes as a result of this optimization, and all 30 SPEC benchmarks built
correctly.

The tests show that the compiler will generate the new instructions in some
cases, but evidently nothing in SPEC 2006 currently triggers the optimization.

Can I install this into the trunk and, after a burn-in period, install it on the
GCC 7 and GCC 6 branches (the previous patch for 64-bit is already installed on
both branches)?  If desired, I can make sure it gets into 6.4, or I can wait to
install the patch until after 6.4 ships.

[gcc]
2017-06-22  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/80510
	* config/rs6000/rs6000.md (ALTIVEC_DFORM): Do not allow DImode in
	32-bit, since indexed is not valid for DImode.
	(mov<mode>_hardfloat32): Reorder ISA 2.07 load/stores before ISA
	3.0 d-form load/stores to be the same as mov<mode>_hardfloat64.
	(define_peephole2 for Altivec d-form load): Add 32-bit support.
	(define_peephole2 for Altivec d-form store): Likewise.

[gcc/testsuite]
2017-06-22  Michael Meissner  <meissner@linux.vnet.ibm.com>

	PR target/80510
	* gcc.target/powerpc/pr80510-1.c: Allow test to run on 32-bit.
	* gcc.target/powerpc/pr80510-2.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

[-- Attachment #2: pr80510.patch04b --]
[-- Type: text/plain, Size: 5532 bytes --]

Index: gcc/config/rs6000/rs6000.md
===================================================================
--- gcc/config/rs6000/rs6000.md	(revision 249488)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -690,7 +690,9 @@ (define_code_attr     SMINMAX	[(smin "SM
 ;; Iterator to optimize the following cases:
 ;;	D-form load to FPR register & move to Altivec register
 ;;	Move Altivec register to FPR register and store
-(define_mode_iterator ALTIVEC_DFORM [DI DF SF])
+(define_mode_iterator ALTIVEC_DFORM [DF
+				     SF
+				     (DI "TARGET_POWERPC64")])
 
 \f
 ;; Start with fixed-point load and store insns.  Here we put only the more
@@ -7391,8 +7393,8 @@ (define_split
 ;; except for 0.0 which can be created on VSX with an xor instruction.
 
 (define_insn "*mov<mode>_hardfloat32"
-  [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m,d,d,<f64_av>,Z,<f64_p9>,wY,<f64_vsx>,<f64_vsx>,!r,Y,r,!r")
-	(match_operand:FMOVE64 1 "input_operand" "d,m,d,Z,<f64_av>,wY,<f64_p9>,<f64_vsx>,<zero_fp>,<zero_fp>,r,Y,r"))]
+  [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m,d,d,<f64_p9>,wY,<f64_av>,Z,<f64_vsx>,<f64_vsx>,!r,Y,r,!r")
+	(match_operand:FMOVE64 1 "input_operand" "d,m,d,wY,<f64_p9>,Z,<f64_av>,<f64_vsx>,<zero_fp>,<zero_fp>,r,Y,r"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT 
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -7400,10 +7402,10 @@ (define_insn "*mov<mode>_hardfloat32"
    stfd%U0%X0 %1,%0
    lfd%U1%X1 %0,%1
    fmr %0,%1
-   lxsd%U1x %x0,%y1
-   stxsd%U0x %x1,%y0
    lxsd %0,%1
    stxsd %1,%0
+   lxsd%U1x %x0,%y1
+   stxsd%U0x %x1,%y0
    xxlor %x0,%x1,%x1
    xxlxor %x0,%x0,%x0
    #
@@ -13967,13 +13969,13 @@ (define_insn "*fusion_p9_<mode>_constant
 ;;	LXSDX 32,3,9
 
 (define_peephole2
-  [(match_scratch:DI 0 "b")
+  [(match_scratch:P 0 "b")
    (set (match_operand:ALTIVEC_DFORM 1 "fpr_reg_operand")
 	(match_operand:ALTIVEC_DFORM 2 "simple_offsettable_mem_operand"))
    (set (match_operand:ALTIVEC_DFORM 3 "altivec_register_operand")
 	(match_dup 1))]
-  "TARGET_VSX && TARGET_POWERPC64 && TARGET_UPPER_REGS_<MODE>
-   && !TARGET_P9_DFORM_SCALAR && peep2_reg_dead_p (2, operands[1])"
+  "TARGET_VSX && TARGET_UPPER_REGS_<MODE> && !TARGET_P9_DFORM_SCALAR
+   && peep2_reg_dead_p (2, operands[1])"
   [(set (match_dup 0)
 	(match_dup 4))
    (set (match_dup 3)
@@ -13988,7 +13990,7 @@ (define_peephole2
   add_op0 = XEXP (addr, 0);
   add_op1 = XEXP (addr, 1);
   gcc_assert (REG_P (add_op0));
-  new_addr = gen_rtx_PLUS (DImode, add_op0, tmp_reg);
+  new_addr = gen_rtx_PLUS (Pmode, add_op0, tmp_reg);
 
   operands[4] = add_op1;
   operands[5] = change_address (mem, <MODE>mode, new_addr);
@@ -14004,13 +14006,13 @@ (define_peephole2
 ;;	STXSDX 32,3,9
 
 (define_peephole2
-  [(match_scratch:DI 0 "b")
+  [(match_scratch:P 0 "b")
    (set (match_operand:ALTIVEC_DFORM 1 "fpr_reg_operand")
 	(match_operand:ALTIVEC_DFORM 2 "altivec_register_operand"))
    (set (match_operand:ALTIVEC_DFORM 3 "simple_offsettable_mem_operand")
 	(match_dup 1))]
-  "TARGET_VSX && TARGET_POWERPC64 && TARGET_UPPER_REGS_<MODE>
-   && !TARGET_P9_DFORM_SCALAR && peep2_reg_dead_p (2, operands[1])"
+  "TARGET_VSX && TARGET_UPPER_REGS_<MODE> && !TARGET_P9_DFORM_SCALAR
+   && peep2_reg_dead_p (2, operands[1])"
   [(set (match_dup 0)
 	(match_dup 4))
    (set (match_dup 5)
@@ -14025,7 +14027,7 @@ (define_peephole2
   add_op0 = XEXP (addr, 0);
   add_op1 = XEXP (addr, 1);
   gcc_assert (REG_P (add_op0));
-  new_addr = gen_rtx_PLUS (DImode, add_op0, tmp_reg);
+  new_addr = gen_rtx_PLUS (Pmode, add_op0, tmp_reg);
 
   operands[4] = add_op1;
   operands[5] = change_address (mem, <MODE>mode, new_addr);
Index: gcc/testsuite/gcc.target/powerpc/pr80510-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr80510-1.c	(revision 249488)
+++ gcc/testsuite/gcc.target/powerpc/pr80510-1.c	(working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
@@ -6,9 +6,7 @@
 
 /* Make sure that STXSDX is generated for double scalars in Altivec registers
    on power7 instead of moving the value to a FPR register and doing a X-FORM
-   store.
-
-   32-bit currently does not have support for STXSDX in the mov{df,dd} patterns.  */
+   store.  */
 
 #ifndef TYPE
 #define TYPE double
Index: gcc/testsuite/gcc.target/powerpc/pr80510-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/pr80510-2.c	(revision 249488)
+++ gcc/testsuite/gcc.target/powerpc/pr80510-2.c	(working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
@@ -6,9 +6,7 @@
 
 /* Make sure that STXSSPX is generated for float scalars in Altivec registers
    on power7 instead of moving the value to a FPR register and doing a X-FORM
-   store.
-
-   32-bit currently does not have support for STXSSPX in the mov{sf,sd} patterns.  */
+   store.  */
 
 #ifndef TYPE
 #define TYPE float

Thread overview: 2+ messages
2017-06-22 22:55 Michael Meissner [this message]
2017-06-23 17:19 ` Segher Boessenkool
