public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Power10 PCREL_OPT support
@ 2020-08-18  6:31 Michael Meissner
  2020-08-18  6:34 ` [PATCH 1/3] Power10: Add PCREL_OPT load support Michael Meissner
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Michael Meissner @ 2020-08-18  6:31 UTC (permalink / raw)
  To: gcc-patches, Segher Boessenkool, David Edelsohn,
	Michael Meissner, Bill Schmidt, Peter Bergner, Alan Modra

The ELF-v2 ISA 3.1 support for Power10 has relocations to optimize cases where
the code is references an external variable in only one location.  This patch
is similar to the optimizations that the linker already does to optimize TOC
accesses.

I will be submitting 3 patches as follow-ups to this message:

    * The first patch adds support for PCREL_OPT loads;
    * The second patch adds support for PCREL_OPT stores; (and)
    * The third patch adds the tests.

If the program is compiled to be the main program, and the variable is defined
in the main program, these relocations will convert loading up the address of
the external variable and then doing a load or store using that address to be
doing the prefixed load or store directly and converting the second instruction
into a NOP.

For example, consider the following program:

	extern int ext_variable;

	int ret_var (void)
	{
	  return ext_variable;
	}

	void store_var (int i)
	{
	  ext_variable = i;
	}

Currently on power10, the compiler compiles this as:

	ret_var:
	        pld 9,ext_variable@got@pcrel
		lwa 3,0(9)
	        blr

	store_var:
		pld 9,ext_variable@got@pcrel
		stw 3,0(9)
		blr

That is, it loads up the address of 'ext_variable' from the GOT table into
register r9, and then uses r9 as a base register to reference the actual
variable.

The linker does optimize the case where you are compiling the main program, and
the variable is also defined in the main program to be:

	ret_var:
		pla	9,ext_variable,1
		lwa	3,0(9)
		blr

	store_var:
		pla	9,ext_variable,1
		stw	3,0(9)
		blr

These patches generate:

	ret_var:
	        pld	9,ext_variable@got@pcrel
	.Lpcrel1:
		.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
	        lwa	3,0(9)
		blr

	store_var:
	        pld	9,ext_variable@got@pcrel
	.Lpcrel2:
		.reloc .Lpcrel2-8,R_PPC64_PCREL_OPT,.-(.Lpcrel2-8)
	        stw	3,0(9)
		blr

Note, the label for locating the PLD occurs after the PLD and not before it.
This is so that if the assembler adds a NOP in front of the PLD to align it,
the relocations will still work.

If the linker can, it will convert the code into:

	ret_var:
		plwa	3,ext_variable,1
		nop
		blr

	store_var:
		pstw	3,ext_variable,1
		nop
		blr

These patches allow the load of the address to not be physically adjacent to
the actual load or store, which should allow for better code.

For loads, there must no references to the register that is being loaded
between the PLD and the actual load.

For stores, it becomes a little trickier, in that the register being stored
must be live at the time the PLD instruction is done, and it must continue to
be live and unmodified between the PLD and the store.

For both loads and stores, there must be only one reference to the address
being loaded into a base register, and that base register must die at the point
of the load/store.

In order to do this, the pass that converts the load address and load/store
must occur late in the compilation cycle.  In particular, the second scheduler
pass will duplicate and optimize some of the references and it will produce an
invalid program.  In the past, Segher has said that we should be able to move
it earlier.  I have my doubts whether that is feasible.  What I would like to
do is put these patches into GCC 11, which will enable many of the cases that
we want to optimize.

Then somebody else can take a swing at doing the optimization to allow the code
to do this optimization earlier.  That way, even if we can't get the super
optimized code to work, we at least will get the majority of cases to work.

For reference, here is what the current compiler generates for a medium code
model system targeting power9 with the TOC support:

	        .section        ".toc","aw"
	.LC0:
		.quad   ext_variable
		.section	".text"

	ret_var:
	.LCF0:
	0:      addis	2,12,.TOC.-.LCF0@ha
		addi	2,2,.TOC.-.LCF0@l
		.localentry     ret_var,.-ret_var
		addis	9,2,.LC0@toc@ha
		ld	9,.LC0@toc@l(9)
		lwa	3,0(9)
		blr

		.section        ".toc","aw"
		.set .LC1,.LC0

		.section        ".text"
	store_var:
	.LCF1:
	0:      addis	2,12,.TOC.-.LCF1@ha
		addi	2,2,.TOC.-.LCF1@l
		.localentry     store_var,.-store_var
		addis	9,2,.LC1@toc@ha
		ld	9,.LC1@toc@l(9)
		stw	3,0(9)
		blr

And the linker optimizes this to:

	ret_var:
		lis	2,.TOC@ha
		addi	2,2,.TOC@l 
		.localentry     ret_var,.-ret_var
		nop			; addis eliminated due to small TOC
		addi	9,2,<offset>	; ld converted into addi
		lwa	3,0(9)		; actual load

	store_var:
		lis	2,.TOC@ha
		addi	2,2,.TOC@l
		.localentry	store_var,.-store_var
		nop			; addis eliminated due to small TOC
		addi	9,2,<offset>	; ld converted into addi
		stw	3,0(9)		; actual store

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] Power10: Add PCREL_OPT load support
  2020-08-18  6:31 [PATCH 0/3] Power10 PCREL_OPT support Michael Meissner
@ 2020-08-18  6:34 ` Michael Meissner
  2020-08-21  2:09   ` Segher Boessenkool
  2020-08-18  6:35 ` [PATCH 2/3] Power10: Add PCREL_OPT store support Michael Meissner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Michael Meissner @ 2020-08-18  6:34 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Alan Modra

[PATCH 1/3] Power10: Add PCREL_OPT load support.

This patch adds support for optimizing power10 loads of an external variable to
eliminate loading the address of the variable, and then doing a subsequent load
using that address.

I have built compilers with and without these set of 3 patches doing a
bootstrap build and make check.  There were no regressions, and the new tests
passed.  Can I check these patches into the master branch for GCC?  Because
this is new functionality, I do not intend to back port these patches to GCC 10
at this time.

gcc/
2020-08-18  Michael Meissner  <meissner@linux.ibm.com>

	* config.gcc (powerpc*-*-*): Add pcrel-opt.o.
	(rs6000*-*-*): Add pcrel-opt.o.
	* config/rs6000/pcrel-opt.c: New file.
	* config/rs6000/pcrel-opt.md: New file.
	* config/rs6000/predicates.md (d_form_memory): New predicate.
	* config/rs6000/rs6000-cpus.def (OTHER_POWER10_MASKS): Add
	-mpcrel-opt.
	(POWERPC_MASKS): Add -mpcrel-opt.
	* config/rs6000/rs6000-passes.def: Add PCREL_OPT pass.
	* config/rs6000/rs6000-protos.h (reg_to_non_prefixed): New
	declaration.
	(make_pass_pcrel_opt): New declaration.
	* config/rs6000/rs6000.c (rs6000_option_override_internal): Add
	support for -mpcrel-opt.
	(rs6000_delegitimize_address): Add support for PCREL_OPT
	addresses.
	(print_operand, 'r' case): New operand for PCREL_OPT.
	(rs6000_opt_masks): Add -mpcrel-opt.
	(rs6000_asm_output_opcode): Reset flag to emit the initial 'p'
	after use.
	* config/rs6000/rs6000.md (loads_extern_addr attribute): New
	attribute.
	(isa attribute): Add pcrel_opt sub-case.
	(enabled attribute): Add support for pcrel_opt.
	(pcrel_extern_addr): Set loads_extern_addr attribute.
	(toplevel): Include pcrel-opt.md.
	* config/rs6000/rs6000.opt (-mpcrel-opt): New debug option.
	* config/rs6000/t-rs6000 (pcrel-opt.o): Add build rule.
	(MD_INCLUDES): Add pcrel-opt.md.
---
 gcc/config.gcc                      |   6 +-
 gcc/config/rs6000/pcrel-opt.c       | 656 ++++++++++++++++++++++++++++++++++++
 gcc/config/rs6000/pcrel-opt.md      | 248 ++++++++++++++
 gcc/config/rs6000/predicates.md     |  23 ++
 gcc/config/rs6000/rs6000-cpus.def   |   2 +
 gcc/config/rs6000/rs6000-passes.def |   8 +
 gcc/config/rs6000/rs6000-protos.h   |   2 +
 gcc/config/rs6000/rs6000.c          |  40 ++-
 gcc/config/rs6000/rs6000.md         |  14 +-
 gcc/config/rs6000/rs6000.opt        |   4 +
 gcc/config/rs6000/t-rs6000          |   7 +-
 11 files changed, 1001 insertions(+), 9 deletions(-)
 create mode 100644 gcc/config/rs6000/pcrel-opt.c
 create mode 100644 gcc/config/rs6000/pcrel-opt.md

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 2370368..605d743 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -505,7 +505,7 @@ or1k*-*-*)
 	;;
 powerpc*-*-*)
 	cpu_type=rs6000
-	extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o"
+	extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o pcrel-opt.o"
 	extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h"
 	extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h"
 	extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h"
@@ -520,6 +520,7 @@ powerpc*-*-*)
 	esac
 	extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
 	target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c \$(srcdir)/config/rs6000/rs6000-call.c"
+	target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/pcrel-opt.c"
 	;;
 pru-*-*)
 	cpu_type=pru
@@ -531,8 +532,9 @@ riscv*)
 	;;
 rs6000*-*-*)
 	extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt"
-	extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o"
+	extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o pcrel-opt.o"
 	target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c \$(srcdir)/config/rs6000/rs6000-call.c"
+	target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/pcrel-opt.c"
 	;;
 sparc*-*-*)
 	cpu_type=sparc
diff --git a/gcc/config/rs6000/pcrel-opt.c b/gcc/config/rs6000/pcrel-opt.c
new file mode 100644
index 0000000..10b4bc4
--- /dev/null
+++ b/gcc/config/rs6000/pcrel-opt.c
@@ -0,0 +1,656 @@
+/* Subroutines used support the pc-relative linker optimization.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This file implements a RTL pass that looks for pc-relative loads of the
+   address of an external variable using the PCREL_GOT relocation and a single
+   load that uses that external address.  If that is found we create the
+   PCREL_OPT relocation to possibly convert:
+
+	pld addr_reg,var@pcrel@got(0),1
+
+	<possibly other insns that do not use 'addr_reg' or 'data_reg'>
+
+	lwz data_reg,0(addr_reg)
+
+   into:
+
+	plwz data_reg,var@pcrel(0),1
+
+	<possibly other insns that do not use 'addr_reg' or 'data_reg'>
+
+	nop
+
+   If the variable is not defined in the main program or the code using it is
+   not in the main program, the linker put the address in the .got section and
+   do:
+
+		.section .got
+	.Lvar_got:
+		.dword var
+
+		.section .text
+		pld addr_reg,.Lvar_got@pcrel(0),1
+
+		<possibly other insns that do not use 'addr_reg' or 'data_reg'>
+
+		lwz data_reg,0(addr_reg)
+
+   We only look for a single usage in the basic block where the external
+   address is loaded.  Multiple uses or references in another basic block will
+   force us to not use the PCREL_OPT relocation.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "tree.h"
+#include "memmodel.h"
+#include "expmed.h"
+#include "optabs.h"
+#include "recog.h"
+#include "df.h"
+#include "tm_p.h"
+#include "ira.h"
+#include "print-tree.h"
+#include "varasm.h"
+#include "explow.h"
+#include "expr.h"
+#include "output.h"
+#include "tree-pass.h"
+#include "rtx-vector-builder.h"
+#include "print-rtl.h"
+#include "insn-attr.h"
+#include "insn-codes.h"
+
+\f
+// Maximum number of insns to scan between the load address and the load that
+// uses that address.  This can be bumped up if desired.  If the insns are far
+// enough away, the PCREL_OPT optimization probably does not help, since the
+// load of the external address has probably completed by the time we do the
+// load of the variable at that address.
+const int MAX_PCREL_OPT_INSNS	= 10;
+
+/* Next PCREL_OPT label number.  */
+static unsigned int pcrel_opt_next_num;
+
+/* Various counters.  */
+static struct {
+  unsigned long extern_addrs;
+  unsigned long loads;
+  unsigned long load_separation[MAX_PCREL_OPT_INSNS+1];
+} counters;
+
+\f
+// Optimize a PC-relative load address to be used in a load.
+//
+// If the sequence of insns is safe to use the PCREL_OPT optimization (i.e. no
+// additional references to the address register, the address register dies at
+// the load, and no references to the load), convert insns of the form:
+//
+//	(set (reg:DI addr)
+//	     (symbol_ref:DI "ext_symbol"))
+//
+//	...
+//
+//	(set (reg:<MODE> value)
+//	     (mem:<MODE> (reg:DI addr)))
+//
+// into:
+//
+//	(parallel [(set (reg:DI addr)
+//                      (unspec:<MODE> [(symbol_ref:DI "ext_symbol")
+//                                      (const_int label_num)
+//                                      (const_int 0)]
+//                                     UNSPEC_PCREL_OPT_LD_ADDR))
+//                 (set (reg:DI data)
+//                      (unspec:DI [(const_int 0)]
+//	                           UNSPEC_PCREL_OPT_LD_ADDR))])
+//
+//	...
+//
+//	(parallel [(set (reg:<MODE>)
+//                      (unspec:<MODE> [(mem:<MODE> (reg:DI addr))
+//	                                (reg:DI data)
+//                                      (const_int label_num)]
+//                                     UNSPEC_PCREL_OPT_LD_RELOC))
+//                 (clobber (reg:DI addr))])
+//
+// If the register being loaded is the same register that was used to hold the
+// external address, we generate the following insn instead:
+//
+//	(set (reg:DI data)
+//           (unspec:DI [(symbol_ref:DI "ext_symbol")
+//                       (const_int label_num)
+//                       (const_int 1)]
+//	                UNSPEC_PCREL_OPT_LD_ADDR))
+//
+// In the first insn, we set both the address of the external variable, and
+// mark that the variable being loaded both are created in that insn, and are
+// consumed in the second insn.  It doesn't matter what mode the register that
+// we will ultimately do the load into, so we use DImode.  We just need to mark
+// that both registers may be set in the first insn, and will be used in the
+// second insn.
+//
+// The UNSPEC_PCREL_OPT_LD_ADDR insn will generate the load address plus
+// a definition of a label (.Lpcrel<n>), while the UNSPEC_PCREL_OPT_LD_RELOC
+// insn will generate the .reloc to tell the linker to tie the load address and
+// load using that address together.
+//
+//	pld b,ext_symbol@got@pcrel(0),1
+// .Lpcrel1:
+//
+//	...
+//
+//	.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
+//	lwz r,0(b)
+//
+// If ext_symbol is defined in another object file in the main program and we
+// are linking the main program, the linker will convert the above instructions
+// to:
+//
+//	plwz r,ext_symbol@got@pcrel(0),1
+//
+//	...
+//
+//	nop
+//
+// Return true if the PCREL_OPT load optimization succeeded.
+
+static bool
+do_pcrel_opt_load (rtx_insn *addr_insn,		// insn loading address
+		   rtx_insn *load_insn)		// insn using address
+{
+  rtx addr_set = PATTERN (addr_insn);
+  rtx addr_reg = SET_DEST (addr_set);
+  rtx addr_symbol = SET_SRC (addr_set);
+  rtx load_set = single_set (load_insn);
+  rtx reg = SET_DEST (load_set);
+  rtx mem = SET_SRC (load_set);
+  machine_mode reg_mode = GET_MODE (reg);
+  machine_mode mem_mode = GET_MODE (mem);
+  rtx mem_inner = mem;
+  unsigned int reg_regno = reg_or_subregno (reg);
+
+  // LWA is a DS format instruction, but LWZ is a D format instruction.  We use
+  // DImode for the mode to force checking whether the bottom 2 bits are 0.
+  // However FPR and vector registers uses the LFIWAX instruction which is
+  // indexed only.
+  if (GET_CODE (mem) == SIGN_EXTEND && GET_MODE (XEXP (mem, 0)) == SImode)
+    {
+      if (!INT_REGNO_P (reg_regno))
+	return false;
+
+      mem_inner = XEXP (mem, 0);
+      mem_mode = DImode;
+    }
+
+  else if (GET_CODE (mem) == SIGN_EXTEND
+	   || GET_CODE (mem) == ZERO_EXTEND
+	   || GET_CODE (mem) == FLOAT_EXTEND)
+    {
+      mem_inner = XEXP (mem, 0);
+      mem_mode = GET_MODE (mem_inner);
+    }
+
+  if (!MEM_P (mem_inner))
+    return false;
+
+  // If this is LFIWAX or similar instructions that are indexed only, we can't
+  // do the optimization.
+  enum non_prefixed_form non_prefixed = reg_to_non_prefixed (reg, mem_mode);
+  if (non_prefixed == NON_PREFIXED_X)
+    return false;
+
+  // The optimization will only work on non-prefixed offsettable loads.
+  rtx addr = XEXP (mem_inner, 0);
+  enum insn_form iform = address_to_insn_form (addr, mem_mode, non_prefixed);
+  if (iform != INSN_FORM_BASE_REG
+      && iform != INSN_FORM_D
+      && iform != INSN_FORM_DS
+      && iform != INSN_FORM_DQ)
+    return false;
+
+  // Allocate a new PC-relative label, and update the load external address
+  // insn.
+  //
+  // (parallel [(set (reg load)
+  //                 (unspec [(symbol_ref addr_symbol)
+  //                          (const_int label_num)
+  //                          (const_int 0)]
+  //                         UNSPEC_PCREL_OPT_LD_ADDR))
+  //            (set (reg addr)
+  //                 (unspec [(const_int 0)]
+  //	                     UNSPEC_PCREL_OPT_LD_ADDR))])
+
+  ++pcrel_opt_next_num;
+  unsigned int addr_regno = reg_or_subregno (addr_reg);
+  rtx label_num = GEN_INT (pcrel_opt_next_num);
+  rtx reg_di = gen_rtx_REG (DImode, reg_regno);
+
+  PATTERN (addr_insn)
+    = ((addr_regno != reg_regno)
+       ? gen_pcrel_opt_ld_addr (addr_reg, addr_symbol, label_num, reg_di)
+       : gen_pcrel_opt_ld_addr_same_reg (addr_reg, addr_symbol, label_num));
+
+  // Revalidate the insn, backing out of the optimization if the insn is not
+  // supported.
+  INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0);
+  if (INSN_CODE (addr_insn) < 0)
+    {
+      PATTERN (addr_insn) = addr_set;
+      INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0);
+      return false;
+    }
+
+  // Update the load insn.  If the mem had a sign/zero/float extend, add that
+  // also after doing the UNSPEC.  Add an explicit clobber of the external
+  // address register just to make it clear that the address register dies.
+  //
+  // (parallel [(set (reg:<MODE> data)
+  //                 (unspec:<MODE> [(mem (addr_reg)
+  //                                 (reg:DI data)
+  //                                 (const_int label_num)]
+  //                                UNSPEC_PCREL_OPT_LD_RELOC))
+  //            (clobber (reg:DI addr_reg))])
+
+  rtvec v_load = gen_rtvec (3, mem_inner, reg_di, label_num);
+  rtx new_load = gen_rtx_UNSPEC (GET_MODE (mem_inner), v_load,
+				 UNSPEC_PCREL_OPT_LD_RELOC);
+
+  if (GET_CODE (mem) != GET_CODE (mem_inner))
+    new_load = gen_rtx_fmt_e (GET_CODE (mem), reg_mode, new_load);
+
+  rtx old_load_set = PATTERN (load_insn);
+  rtx new_load_set = gen_rtx_SET (reg, new_load);
+  rtx load_clobber = gen_rtx_CLOBBER (VOIDmode,
+				      (addr_regno == reg_regno
+				       ? gen_rtx_SCRATCH (Pmode)
+				       : addr_reg));
+  PATTERN (load_insn)
+    = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, new_load_set, load_clobber));
+
+  // Revalidate the insn, backing out of the optimization if the insn is not
+  // supported.
+
+  INSN_CODE (load_insn) = recog (PATTERN (load_insn), load_insn, 0);
+  if (INSN_CODE (load_insn) < 0)
+    {
+      PATTERN (addr_insn) = addr_set;
+      INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0);
+
+      PATTERN (load_insn) = old_load_set;
+      INSN_CODE (load_insn) = recog (PATTERN (load_insn), load_insn, 0);
+      return false;
+    }
+
+  return true;
+}
+
+\f
+/* Given an insn, find the next insn in the basic block.  Stop if we find a the
+   end of a basic block, such as a label, call or jump, and return NULL.  */
+
+static rtx_insn *
+next_active_insn_in_basic_block (rtx_insn *insn)
+{
+  insn = NEXT_INSN (insn);
+
+  while (insn != NULL_RTX)
+    {
+      /* If the basic block ends or there is a jump of some kind, exit the
+	 loop.  */
+      if (CALL_P (insn)
+	  || JUMP_P (insn)
+	  || JUMP_TABLE_DATA_P (insn)
+	  || LABEL_P (insn)
+	  || BARRIER_P (insn))
+	return NULL;
+
+      /* If this is a real insn, return it.  */
+      if (!insn->deleted ()
+	  && NONJUMP_INSN_P (insn)
+	  && GET_CODE (PATTERN (insn)) != USE
+	  && GET_CODE (PATTERN (insn)) != CLOBBER)
+	return insn;
+
+      /* Loop for USE, CLOBBER, DEBUG_INSN, NOTEs.  */
+      insn = NEXT_INSN (insn);
+    }
+
+  return NULL;
+}
+
+\f
+// Validate that a load is actually a single instruction that can be optimized
+// with the PCREL_OPT optimization.
+
+static bool
+is_single_instruction (rtx_insn *insn, rtx reg)
+{
+  if (!REG_P (reg) && !SUBREG_P (reg))
+    return false;
+
+  if (get_attr_length (insn) != 4)
+    return false;
+
+  // _Decimal128 and IBM extended double are always multiple instructions.
+  machine_mode mode = GET_MODE (reg);
+  if (mode == TFmode && !TARGET_IEEEQUAD)
+    return false;
+
+  if (mode == TDmode || mode == IFmode)
+    return false;
+
+  // Don't optimize PLQ/PSTQ instructions
+  unsigned int regno = reg_or_subregno (reg);
+  unsigned int size = GET_MODE_SIZE (mode);
+  if (size >= 16 && !VSX_REGNO_P (regno))
+    return false;
+
+  return true;
+}
+
+\f
+// Given an insn with that loads up a base register with the address of an
+// external symbol, see if we can optimize it with the PCREL_OPT optimization.
+
+static void
+do_pcrel_opt_addr (rtx_insn *addr_insn)
+{
+  int num_insns = 0;
+
+  // Do some basic validation.
+  rtx addr_set = PATTERN (addr_insn);
+  if (GET_CODE (addr_set) != SET)
+    return;
+
+  rtx addr_reg = SET_DEST (addr_set);
+  rtx addr_symbol = SET_SRC (addr_set);
+
+  if (!base_reg_operand (addr_reg, Pmode)
+      || !pcrel_external_address (addr_symbol, Pmode))
+    return;
+
+  rtx_insn *insn = addr_insn;
+  bool looping = true;
+  bool had_load = false;	// whether intermediate insns had a load
+  bool had_store = false;	// whether intermediate insns had a store
+  bool is_load = false;		// whether the current insn is a load
+  bool is_store = false;	// whether the current insn is a store
+
+  // Check the following insns and see if it is a load or store that uses the
+  // external address.  If we can't do the optimization, just return.
+  while (looping)
+    {
+      is_load = is_store = false;
+
+      // Don't allow too many insns between the load of the external address
+      // and the eventual load or store.
+      if (++num_insns >= MAX_PCREL_OPT_INSNS)
+	return;
+
+      insn = next_active_insn_in_basic_block (insn);
+      if (!insn)
+	return;
+
+      // See if the current insn is a load or store
+      switch (get_attr_type (insn))
+	{
+	  // While load of the external address is a 'load' for scheduling
+	  // purposes, it should be safe to allow loading other external
+	  // addresses between the load of the external address we are
+	  // currently looking at and the load or store using that address.
+	case TYPE_LOAD:
+	  if (get_attr_loads_extern_addr (insn) == LOADS_EXTERN_ADDR_YES)
+	    break;
+	  /* fall through */
+
+	case TYPE_FPLOAD:
+	case TYPE_VECLOAD:
+	  is_load = true;
+	  break;
+
+	case TYPE_STORE:
+	case TYPE_FPSTORE:
+	case TYPE_VECSTORE:
+	  is_store = true;
+	  break;
+
+	  // Don't do the optimization through atomic operations.
+	case TYPE_LOAD_L:
+	case TYPE_STORE_C:
+	case TYPE_HTM:
+	case TYPE_HTMSIMPLE:
+	  return;
+
+	default:
+	  break;
+	}
+
+      // If the external addresss register was referenced, it must also die in
+      // the same insn.
+      if (reg_referenced_p (addr_reg, PATTERN (insn)))
+	{
+	  if (!dead_or_set_p (insn, addr_reg))
+	    return;
+
+	  looping = false;
+	}
+
+      // If it dies by being set without being referenced, exit.
+      else if (dead_or_set_p (insn, addr_reg))
+	return;
+
+      // If it isn't the insn we want, remember if there were loads or stores.
+      else
+	{
+	  had_load |= is_load;
+	  had_store |= is_store;
+	}
+    }
+
+  // If the insn does not use the external address, or the external address
+  // register does not die at this insn, we can't do the optimization.
+  if (!reg_referenced_p (addr_reg, PATTERN (insn))
+      || !dead_or_set_p (insn, addr_reg))
+    return;
+
+  rtx set = single_set (insn);
+  if (!set)
+    return;
+
+  // Optimize loads
+  if (is_load)
+    {
+      // If there were any stores in the insns between loading the external
+      // address and doing the load, turn off the optimization.
+      if (had_store)
+	return;
+
+      rtx reg = SET_DEST (set);
+      if (!is_single_instruction (insn, reg))
+	return;
+
+      rtx mem = SET_SRC (set);
+      switch (GET_CODE (mem))
+	{
+	case MEM:
+	  break;
+
+	case SIGN_EXTEND:
+	case ZERO_EXTEND:
+	case FLOAT_EXTEND:
+	  if (!MEM_P (XEXP (mem, 0)))
+	    return;
+	  break;
+
+	default:
+	  return;
+	}
+
+      // If the register being loaded was used or set between the load of
+      // the external address and the load using the address, we can't do
+      // the optimization.
+      if (reg_used_between_p (reg, addr_insn, insn)
+	  || reg_set_between_p (reg, addr_insn, insn))
+	return;
+
+      // Process the load in detail
+      if (do_pcrel_opt_load (addr_insn, insn))
+	{
+	  counters.loads++;
+	  counters.load_separation[num_insns-1]++;
+	}
+    }
+
+  return;
+}
+
+\f
+// Optimize pcrel external variable references
+
+static unsigned int
+do_pcrel_opt_pass (function *fun)
+{
+  basic_block bb;
+  rtx_insn *insn, *curr_insn = 0;
+
+  memset ((char *) &counters, '\0', sizeof (counters));
+
+  // Dataflow analysis for use-def chains.
+  df_set_flags (DF_RD_PRUNE_DEAD_DEFS);
+  df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN);
+  df_note_add_problem ();
+  df_analyze ();
+  df_set_flags (DF_DEFER_INSN_RESCAN | DF_LR_RUN_DCE);
+
+  // Look at each basic block to see if there is a load of an external
+  // variable's external address, and a single load using that external
+  // address.
+  FOR_ALL_BB_FN (bb, fun)
+    {
+      FOR_BB_INSNS_SAFE (bb, insn, curr_insn)
+	{
+	  if (NONJUMP_INSN_P (insn) && single_set (insn)
+	      && get_attr_loads_extern_addr (insn) == LOADS_EXTERN_ADDR_YES)
+	    {
+	      counters.extern_addrs++;
+	      do_pcrel_opt_addr (insn);
+	    }
+	}
+    }
+
+  df_remove_problem (df_chain);
+  df_process_deferred_rescans ();
+  df_set_flags (DF_RD_PRUNE_DEAD_DEFS | DF_LR_RUN_DCE);
+  df_chain_add_problem (DF_UD_CHAIN);
+  df_note_add_problem ();
+  df_analyze ();
+
+  if (dump_file)
+    {
+      if (!counters.extern_addrs)
+	fprintf (dump_file, "\nNo external symbols were referenced\n");
+
+      else
+	{
+	  fprintf (dump_file,
+		   "\n# of loads of an address of an external symbol = %lu\n",
+		   counters.extern_addrs);
+
+	  if (!counters.loads)
+	    fprintf (dump_file,
+		     "\nNo PCREL_OPT load optimizations were done\n");
+
+	  else
+	    {
+	      fprintf (dump_file, "# of PCREL_OPT loads = %lu\n",
+		       counters.loads);
+
+	      fprintf (dump_file, "# of adjacent PCREL_OPT loads = %lu\n",
+		       counters.load_separation[0]);
+
+	      for (int i = 1; i < MAX_PCREL_OPT_INSNS; i++)
+		{
+		  if (counters.load_separation[i])
+		    fprintf (dump_file,
+			     "# of PCREL_OPT loads separated by %d insn%s = %lu\n",
+			     i, (i == 1) ? "" : "s",
+			     counters.load_separation[i]);
+		}
+	    }
+	}
+
+      fprintf (dump_file, "\n");
+    }
+
+  return 0;
+}
+
+\f
+// Optimize pc-relative references for the new PCREL_OPT pass
+const pass_data pass_data_pcrel_opt =
+{
+  RTL_PASS,			// type
+  "pcrel_opt",			// name
+  OPTGROUP_NONE,		// optinfo_flags
+  TV_NONE,			// tv_id
+  0,				// properties_required
+  0,				// properties_provided
+  0,				// properties_destroyed
+  0,				// todo_flags_start
+  TODO_df_finish,		// todo_flags_finish
+};
+
+// Pass data structures
+class pcrel_opt : public rtl_opt_pass
+{
+public:
+  pcrel_opt (gcc::context *ctxt)
+  : rtl_opt_pass (pass_data_pcrel_opt, ctxt)
+  {}
+
+  ~pcrel_opt (void)
+  {}
+
+  // opt_pass methods:
+  virtual bool gate (function *)
+  {
+    return (TARGET_PCREL && TARGET_PCREL_OPT && optimize);
+  }
+
+  virtual unsigned int execute (function *fun)
+  {
+    return do_pcrel_opt_pass (fun);
+  }
+
+  opt_pass *clone ()
+  {
+    return new pcrel_opt (m_ctxt);
+  }
+};
+
+rtl_opt_pass *
+make_pass_pcrel_opt (gcc::context *ctxt)
+{
+  return new pcrel_opt (ctxt);
+}
diff --git a/gcc/config/rs6000/pcrel-opt.md b/gcc/config/rs6000/pcrel-opt.md
new file mode 100644
index 0000000..00a3bc4
--- /dev/null
+++ b/gcc/config/rs6000/pcrel-opt.md
@@ -0,0 +1,248 @@
+;; Machine description for the PCREL_OPT optimization.
+;; Copyright (C) 2020 Free Software Foundation, Inc.
+;; Contributed by Michael Meissner (meissner@linux.ibm.com)
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; Support for the PCREL_OPT optimization.  PCREL_OPT looks for instances where
+;; an external variable is used only once, either for reading or for writing.
+;;
+;; If we are optimizing a single read, normally the code would look like:
+;;
+;;	(set (reg:DI <ptr>)
+;;	     (symbol_ref:DI "<extern_addr>"))	# <data> is currently dead
+;;
+;;		...	# insns do not need to be adjacent
+;;
+;;	(set (reg:SI <data>)
+;;	     (mem:SI (reg:DI <xxx>)))		# <ptr> dies with this insn
+;;
+;; We transform this into:
+;;
+;;	(parallel [(set (reg:DI <ptr>)
+;;	                (unspec:SI [(symbol_ref:DI <extern_addr>)
+;;	                            (const_int <marker>)]
+;;	                           UNSPEC_PCREL_OPT_LD_ADDR))
+;;	           (set (reg:DI <data>)
+;;	                (unspec:DI [(const_int 0)]
+;;	                           UNSPEC_PCREL_OPT_LD_ADDR))])
+;;
+;;	...
+;;
+;;	(parallel [(set (reg:SI <data>)
+;;	           (unspec:SI [(mem:SI (reg:DI <ptr>))
+;;	                       (reg:DI <data>)
+;;	                       (const_int <marker>)]
+;;	                      UNSPEC_PCREL_OPT_LD))
+;;	           (clobber (reg:DI <ptr>))])
+;;
+;; The marker is an integer constant that links the load of the external
+;; address to the load of the actual variable.
+;;
+;; In the first insn, we set both the address of the external variable, and
+;; mark that the variable being loaded both are created in that insn, and are
+;; consumed in the second insn.  It doesn't matter what mode the register that
+;; we will ultimately do the load into, so we use DImode.  We just need to mark
+;; that both registers may be set in the first insn, and will be used in the
+;; second insn.
+;;
+;; Since we use UNSPEC's and link both the the register holding the external
+;; address and the value being loaded, it should prevent other passes from
+;; modifying it.
+;;
+;; If the register being loaded is the same as the base register, we use an
+;; alternate form of the insns.
+;;
+;;	(set (reg:DI <data_ptr>)
+;;	     (unspec:DI [(symbol_ref:DI <extern_addr>)
+;;	                 (const_int <marker>)]
+;;	                UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG))
+;;
+;;	...
+;;
+;;	(parallel [(set (reg:SI <data>)
+;;	           (unspec:SI [(mem:SI (reg:DI <ptr>))
+;;	                       (reg:DI <data>)
+;;	                       (const_int <marker>)]
+;;	                      UNSPEC_PCREL_OPT_LD))
+;;	           (clobber (reg:DI <ptr>))])
+
+(define_c_enum "unspec"
+  [UNSPEC_PCREL_OPT_LD_ADDR
+   UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG
+   UNSPEC_PCREL_OPT_LD_RELOC])
+
+;; Modes that are supported for PCREL_OPT
+(define_mode_iterator PO [QI HI SI DI TI SF DF KF
+			  V1TI V2DI V4SI V8HI V16QI V2DF V4SF
+			  (TF "TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD")])
+
+;; Vector modes for PCREL_OPT
+(define_mode_iterator PO_VECT [TI KF V1TI V2DI V4SI V8HI V16QI V2DF V4SF
+			       (TF "TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD")])
+
+;; Insn for loading the external address, where the register being loaded is not
+;; the same as the register being loaded with the data.
+(define_insn "pcrel_opt_ld_addr"
+  [(set (match_operand:DI 0 "base_reg_operand" "=&b,&b")
+	(unspec:DI [(match_operand:DI 1 "pcrel_external_address")
+		    (match_operand 2 "const_int_operand" "n,n")]
+		   UNSPEC_PCREL_OPT_LD_ADDR))
+   (set (match_operand:DI 3 "gpc_reg_operand" "=r,wa")
+	(unspec:DI [(const_int 0)]
+		   UNSPEC_PCREL_OPT_LD_ADDR))]
+  "TARGET_PCREL_OPT
+   && reg_or_subregno (operands[0]) != reg_or_subregno (operands[3])"
+  "ld %0,%a1\n.Lpcrel%2:"
+  [(set_attr "prefixed" "yes")
+   (set_attr "type" "load")
+   (set_attr "isa" "pcrel_opt")
+   (set_attr "loads_extern_addr" "yes")])
+
+;; Alternate form of loading up the external address that is the same register
+;; as the final load.
+(define_insn "pcrel_opt_ld_addr_same_reg"
+  [(set (match_operand:DI 0 "base_reg_operand" "=b")
+	(unspec:DI [(match_operand:DI 1 "pcrel_external_address")
+		    (match_operand 2 "const_int_operand" "n")]
+		   UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG))]
+  "TARGET_PCREL_OPT"
+  "ld %0,%a1\n.Lpcrel%2:"
+  [(set_attr "prefixed" "yes")
+   (set_attr "type" "load")
+   (set_attr "isa" "pcrel_opt")
+   (set_attr "loads_extern_addr" "yes")])
+
+;; PCREL_OPT modes that are optimized for loading or storing GPRs.
+(define_mode_iterator PO_GPR [QI HI SI DI SF DF])
+
+(define_mode_attr PO_GPR_LD [(QI "lbz")
+			     (HI "lhz")
+			     (SI "lwz")
+			     (SF "lwz")
+			     (DI "ld")
+			     (DF "ld")])
+
+;; PCREL_OPT load operation of GPRs.  Operand 4 (the register used to hold the
+;; address of the external symbol) is SCRATCH if the same register is used for
+;; the normal load.
+(define_insn "*pcrel_opt_ld<mode>_gpr"
+  [(parallel [(set (match_operand:PO_GPR 0 "int_reg_operand" "+r")
+		   (unspec:PO_GPR [(match_operand:PO_GPR 1 "d_form_memory" "o")
+				   (match_operand:DI 2 "int_reg_operand" "0")
+				   (match_operand 3 "const_int_operand" "n")]
+				  UNSPEC_PCREL_OPT_LD_RELOC))
+	      (clobber (match_scratch:DI 4 "=bX"))])]
+  "TARGET_PCREL_OPT
+   && (GET_CODE (operands[4]) == SCRATCH
+       || reg_mentioned_p (operands[4], operands[1]))"
+  "%r3<PO_GPR_LD> %0,%1"
+  [(set_attr "type" "load")
+   (set_attr "isa" "pcrel_opt")])
+
+;; PCREL_OPT load with sign/zero extension
+(define_insn "*pcrel_opt_ldsi_<u><mode>_gpr"
+  [(set (match_operand:EXTSI 0 "int_reg_operand" "+r")
+	(any_extend:EXTSI
+	 (unspec:SI [(match_operand:SI 1 "d_form_memory" "o")
+		     (match_operand:DI 2 "int_reg_operand" "0")
+		     (match_operand 3 "const_int_operand" "n")]
+		     UNSPEC_PCREL_OPT_LD_RELOC)))
+   (clobber (match_scratch:DI 4 "=bX"))]
+  "TARGET_PCREL_OPT"
+  "%r3lw<az> %0,%1"
+  [(set_attr "type" "load")
+   (set_attr "isa" "pcrel_opt")])
+
+(define_insn "*pcrel_opt_ldhi_<u><mode>_gpr"
+  [(set (match_operand:EXTHI 0 "int_reg_operand" "+r")
+	(any_extend:EXTHI
+	 (unspec:HI [(match_operand:HI 1 "d_form_memory" "o")
+		     (match_operand:DI 2 "int_reg_operand" "0")
+		     (match_operand 3 "const_int_operand" "n")]
+		     UNSPEC_PCREL_OPT_LD_RELOC)))
+   (clobber (match_scratch:DI 4 "=bX"))]
+  "TARGET_PCREL_OPT"
+  "%r3lh<az> %0,%1"
+  [(set_attr "type" "load")
+   (set_attr "isa" "pcrel_opt")])
+
+(define_insn "*pcrel_opt_ldqi_u<mode>_gpr"
+  [(set (match_operand:EXTQI 0 "int_reg_operand" "+r")
+	(zero_extend:EXTQI
+	 (unspec:QI [(match_operand:QI 1 "d_form_memory" "o")
+		     (match_operand:DI 2 "int_reg_operand" "0")
+		     (match_operand 3 "const_int_operand" "n")]
+		     UNSPEC_PCREL_OPT_LD_RELOC)))
+   (clobber (match_scratch:DI 4 "=bX"))]
+  "TARGET_PCREL_OPT"
+  "%r3lbz %0,%1"
+  [(set_attr "type" "load")
+   (set_attr "isa" "pcrel_opt")])
+
+;; Scalar types that can be optimized by loading them into floating point
+;; or Altivec registers.
+(define_mode_iterator PO_FP [DI DF SF])
+
+;; Load instructions to load up scalar floating point or 64-bit integer values
+;; into floating point registers or Altivec registers.
+(define_mode_attr PO_FPR_LD [(DI "lfd")  (DF "lfd")  (SF "lfs")])
+(define_mode_attr PO_AVX_LD [(DI "lxsd") (DF "lxsd") (SF "lxssp")])
+
+;; PCREL_OPT load operation of scalar DF/DI/SF into vector registers.
+(define_insn "*pcrel_opt_ld<mode>_vsx"
+  [(set (match_operand:PO_FP 0 "vsx_register_operand" "+d,v")
+	(unspec:PO_FP [(match_operand:PO_FP 1 "d_form_memory" "o,o")
+		       (match_operand:DI 2 "vsx_register_operand" "0,0")
+		       (match_operand 3 "const_int_operand" "n,n")]
+		       UNSPEC_PCREL_OPT_LD_RELOC))
+   (clobber (match_operand:DI 4 "base_reg_operand" "=b,b"))]
+  "TARGET_PCREL_OPT"
+  "@
+   %r3<PO_FPR_LD> %0,%1
+   %r3<PO_AVX_LD> %0,%1"
+  [(set_attr "type" "fpload")
+   (set_attr "isa" "pcrel_opt")])
+
+;; PCREL_OPT optimization extending SFmode to DFmode via a load.
+(define_insn "*pcrel_opt_ldsf_df"
+  [(set (match_operand:DF 0 "vsx_register_operand" "+d,v")
+	(float_extend:DF
+	 (unspec:SF [(match_operand:SF 1 "d_form_memory" "o,o")
+		     (match_operand:DI 2 "vsx_register_operand" "0,0")
+		     (match_operand 3 "const_int_operand" "n,n")]
+		    UNSPEC_PCREL_OPT_LD_RELOC)))
+   (clobber (match_operand:DI 4 "base_reg_operand" "=b,b"))]
+  "TARGET_PCREL_OPT"
+  "@
+   %r3lfs %0,%1
+   %r3lxssp %0,%1"
+  [(set_attr "type" "fpload")
+   (set_attr "isa" "pcrel_opt")])
+
+;; PCREL_OPT load operation of vector/float128 types into vector registers.
+(define_insn "*pcrel_opt_ld<mode>"
+  [(set (match_operand:PO_VECT 0 "vsx_register_operand" "+wa")
+	(unspec:PO_VECT [(match_operand:PO_VECT 1 "d_form_memory" "o")
+			 (match_operand:DI 2 "vsx_register_operand" "0")
+			 (match_operand 3 "const_int_operand" "n")]
+			UNSPEC_PCREL_OPT_LD_RELOC))
+   (clobber (match_operand:DI 4 "base_reg_operand" "=b"))]
+  "TARGET_PCREL_OPT"
+  "%r3lxv %x0,%1"
+  [(set_attr "type" "vecload")
+   (set_attr "isa" "pcrel_opt")])
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 2709e46..38ae9cd 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1876,3 +1876,26 @@ (define_predicate "prefixed_memory"
 {
   return address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT);
 })
+
+;; Return true if the operand is a valid memory operand with an offsettable
+;; address that could be merged with the load of a PC-relative external address
+;; with the PCREL_OPT optimization.  We don't check here whether or not the
+;; offset needs to be used in a DS-FORM (bottom 2 bits 0) or DQ-FORM (bottom 4
+;; bits 0) instruction.
+(define_predicate "d_form_memory"
+  (match_code "mem")
+{
+  if (!memory_operand (op, mode))
+    return false;
+
+  rtx addr = XEXP (op, 0);
+
+  if (REG_P (addr) || SUBREG_P (addr))
+    return true;
+
+  if (GET_CODE (addr) != PLUS)
+    return false;
+
+  return (base_reg_operand (XEXP (addr, 0), Pmode)
+	  && satisfies_constraint_I (XEXP (addr, 1)));
+})
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index 8d2c1ff..d3f72d7 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -78,6 +78,7 @@
 /* Flags that need to be turned off if -mno-power10.  */
 #define OTHER_POWER10_MASKS	(OPTION_MASK_MMA			\
 				 | OPTION_MASK_PCREL			\
+				 | OPTION_MASK_PCREL_OPT		\
 				 | OPTION_MASK_PREFIXED)
 
 #define ISA_3_1_MASKS_SERVER	(ISA_3_0_MASKS_SERVER			\
@@ -142,6 +143,7 @@
 				 | OPTION_MASK_P9_MISC			\
 				 | OPTION_MASK_P9_VECTOR		\
 				 | OPTION_MASK_PCREL			\
+				 | OPTION_MASK_PCREL_OPT		\
 				 | OPTION_MASK_POPCNTB			\
 				 | OPTION_MASK_POPCNTD			\
 				 | OPTION_MASK_POWERPC64		\
diff --git a/gcc/config/rs6000/rs6000-passes.def b/gcc/config/rs6000/rs6000-passes.def
index 5164c52..9b93fc7 100644
--- a/gcc/config/rs6000/rs6000-passes.def
+++ b/gcc/config/rs6000/rs6000-passes.def
@@ -24,4 +24,12 @@ along with GCC; see the file COPYING3.  If not see
    REPLACE_PASS (PASS, INSTANCE, TGT_PASS)
  */
 
+  /* Pass to add the appropriate vector swaps on power8 little endian systems.
+     The power8 does not have instructions that automaticaly do the byte swaps
+     for loads and stores.  */
   INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps);
+
+  /* Pass to do the PCREL_OPT optimization that combines the load of an
+     external symbol's address along with a single load or store using that
+     address as a base register.  */
+  INSERT_PASS_AFTER (pass_sched2, 1, pass_pcrel_opt);
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 28e859f..517713a 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -189,6 +189,7 @@ enum non_prefixed_form {
 
 extern enum insn_form address_to_insn_form (rtx, machine_mode,
 					    enum non_prefixed_form);
+extern enum non_prefixed_form reg_to_non_prefixed (rtx, machine_mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
@@ -305,6 +306,7 @@ namespace gcc { class context; }
 class rtl_opt_pass;
 
 extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *);
+extern rtl_opt_pass *make_pass_pcrel_opt (gcc::context *);
 extern bool rs6000_sum_of_two_registers_p (const_rtx expr);
 extern bool rs6000_quadword_masked_address_p (const_rtx exp);
 extern rtx rs6000_gen_lvx (enum machine_mode, rtx, rtx);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index fe93cf6..6877de5 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1175,7 +1175,6 @@ static bool rs6000_secondary_reload_move (enum rs6000_reg_type,
 					  machine_mode,
 					  secondary_reload_info *,
 					  bool);
-static enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 rtl_opt_pass *make_pass_analyze_swaps (gcc::context*);
 
 /* Hash table stuff for keeping track of TOC entries.  */
@@ -4316,6 +4315,14 @@ rs6000_option_override_internal (bool global_init_p)
       rs6000_isa_flags &= ~OPTION_MASK_MMA;
     }
 
+  if (!TARGET_PCREL && TARGET_PCREL_OPT)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL_OPT) != 0)
+	error ("%qs requires %qs", "-mpcrel-opt", "-mpcrel");
+
+	rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT;
+    }
+
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
 
@@ -8515,7 +8522,10 @@ rs6000_delegitimize_address (rtx orig_x)
 {
   rtx x, y, offset;
 
-  if (GET_CODE (orig_x) == UNSPEC && XINT (orig_x, 1) == UNSPEC_FUSION_GPR)
+  if (GET_CODE (orig_x) == UNSPEC
+      && (XINT (orig_x, 1) == UNSPEC_FUSION_GPR
+	  || XINT (orig_x, 1) == UNSPEC_PCREL_OPT_LD_ADDR
+	  || XINT (orig_x, 1) == UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG))
     orig_x = XVECEXP (orig_x, 0, 0);
 
   orig_x = delegitimize_mem_from_attrs (orig_x);
@@ -13197,6 +13207,19 @@ print_operand (FILE *file, rtx x, int code)
 	fprintf (file, "%d", 128 >> (REGNO (x) - CR0_REGNO));
       return;
 
+    case 'r':
+      /* X is a label number for the PCREL_OPT optimization.  Emit the .reloc
+	 to enable this optimization, unless the value is 0.  */
+      gcc_assert (CONST_INT_P (x));
+      if (UINTVAL (x) != 0)
+	{
+	  unsigned int label_num = UINTVAL (x);
+	  fprintf (file,
+		   ".reloc .Lpcrel%u-8,R_PPC64_PCREL_OPT,.-(.Lpcrel%u-8)\n\t",
+		   label_num, label_num);
+	}
+      return;
+
     case 's':
       /* Low 5 bits of 32 - value */
       if (! INT_P (x))
@@ -23244,6 +23267,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "mulhw",			OPTION_MASK_MULHW,		false, true  },
   { "multiple",			OPTION_MASK_MULTIPLE,		false, true  },
   { "pcrel",			OPTION_MASK_PCREL,		false, true  },
+  { "pcrel-opt",		OPTION_MASK_PCREL_OPT,		false, true  },
   { "popcntb",			OPTION_MASK_POPCNTB,		false, true  },
   { "popcntd",			OPTION_MASK_POPCNTD,		false, true  },
   { "power8-fusion",		OPTION_MASK_P8_FUSION,		false, true  },
@@ -25368,7 +25392,7 @@ is_lfs_stfs_insn (rtx_insn *insn)
 /* Helper function to take a REG and a MODE and turn it into the non-prefixed
    instruction format (D/DS/DQ) used for offset memory.  */
 
-static enum non_prefixed_form
+enum non_prefixed_form
 reg_to_non_prefixed (rtx reg, machine_mode mode)
 {
   /* If it isn't a register, use the defaults.  */
@@ -25591,7 +25615,15 @@ void
 rs6000_asm_output_opcode (FILE *stream)
 {
   if (next_insn_prefixed_p)
-    fprintf (stream, "p");
+    {
+      fprintf (stream, "p");
+
+      /* Reset flag in case there are separate insn lines in the sequence, so
+	 the 'p' is only emited for the first line.  This shows up when we are
+	 doing the PCREL_OPT optimization, in that the label created with %r<n>
+	 would have a leading 'p' printed.  */
+      next_insn_prefixed_p = false;
+    }
 
   return;
 }
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 43b620a..d9dd25f 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -292,6 +292,10 @@ (define_attr "prefixed" "no,yes"
 
 	(const_string "no")))
 
+;; Whether an insn loads an external address for the PCREL_OPT optimizaton.
+(define_attr "loads_extern_addr" "no,yes"
+  (const_string "no"))
+
 ;; Return the number of real hardware instructions in a combined insn.  If it
 ;; is 0, just use the length / 4.
 (define_attr "num_insns" "" (const_int 0))
@@ -323,7 +327,7 @@ (define_attr "cpu"
   (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
 
 ;; The ISA we implement.
-(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
+(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10,pcrel_opt"
   (const_string "any"))
 
 ;; Is this alternative enabled for the current CPU/ISA/etc.?
@@ -371,6 +375,10 @@ (define_attr "enabled" ""
      (and (eq_attr "isa" "p10")
 	  (match_test "TARGET_POWER10"))
      (const_int 1)
+
+     (and (eq_attr "isa" "pcrel_opt")
+	  (match_test "TARGET_PCREL_OPT"))
+     (const_int 1)
     ] (const_int 0)))
 
 ;; If this instruction is microcoded on the CELL processor
@@ -10226,7 +10234,8 @@ (define_insn "*pcrel_extern_addr"
   "TARGET_PCREL"
   "ld %0,%a1"
   [(set_attr "prefixed" "yes")
-   (set_attr "type" "load")])
+   (set_attr "type" "load")
+   (set_attr "loads_extern_addr" "yes")])
 
 ;; TOC register handling.
 
@@ -14900,3 +14909,4 @@ (define_insn "*cmpeqb_internal"
 (include "dfp.md")
 (include "crypto.md")
 (include "htm.md")
+(include "pcrel-opt.md")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d3e740..22d3af4 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -582,6 +582,10 @@ mpcrel
 Target Report Mask(PCREL) Var(rs6000_isa_flags)
 Generate (do not generate) pc-relative memory addressing.
 
+mpcrel-opt
+Target Undocumented Mask(PCREL_OPT) Var(rs6000_isa_flags)
+Generate (do not generate) pc-relative memory optimizations for externals.
+
 mmma
 Target Report Mask(MMA) Var(rs6000_isa_flags)
 Generate (do not generate) MMA instructions.
diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index 1ddb572..a617276 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -23,6 +23,10 @@ TM_H += $(srcdir)/config/rs6000/rs6000-cpus.def
 TM_H += $(srcdir)/config/rs6000/rs6000-modes.h
 PASSES_EXTRA += $(srcdir)/config/rs6000/rs6000-passes.def
 
+pcrel-opt.o: $(srcdir)/config/rs6000/pcrel-opt.c
+	$(COMPILE) $<
+	$(POSTCOMPILE)
+
 rs6000-c.o: $(srcdir)/config/rs6000/rs6000-c.c
 	$(COMPILE) $<
 	$(POSTCOMPILE)
@@ -86,4 +90,5 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \
 	$(srcdir)/config/rs6000/mma.md \
 	$(srcdir)/config/rs6000/crypto.md \
 	$(srcdir)/config/rs6000/htm.md \
-	$(srcdir)/config/rs6000/dfp.md
+	$(srcdir)/config/rs6000/dfp.md \
+	$(srcdir)/config/rs6000/pcrel-opt.md
-- 
1.8.3.1


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] Power10: Add PCREL_OPT store support.
  2020-08-18  6:31 [PATCH 0/3] Power10 PCREL_OPT support Michael Meissner
  2020-08-18  6:34 ` [PATCH 1/3] Power10: Add PCREL_OPT load support Michael Meissner
@ 2020-08-18  6:35 ` Michael Meissner
  2020-08-18  6:37 ` [PATCH 3/3] Power10: Add tests for PCREL_OPT support Michael Meissner
  2020-08-20 23:33 ` [PATCH 0/3] Power10 " Segher Boessenkool
  3 siblings, 0 replies; 12+ messages in thread
From: Michael Meissner @ 2020-08-18  6:35 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Alan Modra

[PATCH 2/3] Power10: Add PCREL_OPT store support.

This patch adds support for optimizing power10 stores to an external variable
to eliminate loading the address of the variable, and then doing a subsequent
store using that address.

I have built compilers with and without these set of 3 patches doing a
bootstrap build and make check.  There were no regressions, and the new tests
passed.  Can I check these patches into the master branch for GCC?  Because
this is new functionality, I do not intend to back port these patches to GCC 10
at this time.

gcc/
2020-08-18  Michael Meissner  <meissner@linux.ibm.com>

	* config/rs6000/pcrel-opt.c (counters): Add fields to count number
	of PCREL_OPT stores that were processed.
	(do_pcrel_opt_store): New function to do PCREL_OPT stores.
	(do_pcrel_opt_addr): Add support to optimize PCREL_OPT stores.
	(do_pcrel_opt_pass): Print out statistics for PCREL_OPT stores.
	* config/rs6000/pcrel-opt.md (UNSPEC_PCREL_OPT_ST_ADDR): New
	unspec.
	(UNSPEC_PCREL_OPT_ST_RELOC): New unspec.
	(pcrel_opt_st_addr<mode>): New insns for PCREL_OPT stores.
	(pcrel_opt_st<mode>): New insns for QI/HI/SI PCREL_OPT stores.
	(pcrel_opt_stdi): New insn to optimize DI PCREL_OPT stores.
	(pcrel_opt_stsf): New insn to optimize SF PCREL_OPT stores.
	(pcrel_opt_stdf): New insn to optimize DF PCREL_OPT stores.
	(pcrel_opt_st<mode>): New insns to optimize vector PCREL_OPT
	stores.
	* config/rs6000/rs6000.c (rs6000_delegitimize_address): Add
	support to de-legitimize PCREL_OPT stores.
---
 gcc/config/rs6000/pcrel-opt.c  | 259 +++++++++++++++++++++++++++++++++++++++--
 gcc/config/rs6000/pcrel-opt.md | 115 +++++++++++++++++-
 gcc/config/rs6000/rs6000.c     |   3 +-
 3 files changed, 367 insertions(+), 10 deletions(-)

diff --git a/gcc/config/rs6000/pcrel-opt.c b/gcc/config/rs6000/pcrel-opt.c
index 10b4bc4..61dce67 100644
--- a/gcc/config/rs6000/pcrel-opt.c
+++ b/gcc/config/rs6000/pcrel-opt.c
@@ -53,6 +53,43 @@
 
    We only look for a single usage in the basic block where the external
    address is loaded.  Multiple uses or references in another basic block will
+   force us to not use the PCREL_OPT relocation.
+
+   We also optimize stores to the address of an external variable using the
+   PCREL_GOT relocation and a single store that uses that external address.  If
+   that is found we create the PCREL_OPT relocation to possibly convert:
+
+	pld addr_reg,var@pcrel@got(0),1
+
+	<possibly other insns that do not use 'addr_reg' or 'data_reg'>
+
+	stw data_reg,0(addr_reg)
+
+   into:
+
+	pstw data_reg,var@pcrel(0),1
+
+	<possibly other insns that do not use 'addr_reg' or 'data_reg'>
+
+	nop
+
+   If the variable is not defined in the main program or the code using it is
+   not in the main program, the linker put the address in the .got section and
+   do:
+
+		.section .got
+	.Lvar_got:
+		.dword var
+
+		.section .text
+		pld addr_reg,.Lvar_got@pcrel(0),1
+
+		<possibly other insns that do not use 'addr_reg' or 'data_reg'>
+
+		stw data_reg,0(addr_reg)
+
+   We only look for a single usage in the basic block where the external
+   address is loaded.  Multiple uses or references in another basic block will
    force us to not use the PCREL_OPT relocation.  */
 
 #define IN_TARGET_CODE 1
@@ -82,11 +119,11 @@
 #include "insn-codes.h"
 
 \f
-// Maximum number of insns to scan between the load address and the load that
-// uses that address.  This can be bumped up if desired.  If the insns are far
-// enough away, the PCREL_OPT optimization probably does not help, since the
-// load of the external address has probably completed by the time we do the
-// load of the variable at that address.
+// Maximum number of insns to scan between the load address and the load or
+// store that uses that address.  This can be bumped up if desired.  If the
+// insns are far enough away, the PCREL_OPT optimization probably does not
+// help, since the load of the external address has probably completed by the
+// time we do the load or store of the variable at that address.
 const int MAX_PCREL_OPT_INSNS	= 10;
 
 /* Next PCREL_OPT label number.  */
@@ -97,6 +134,8 @@ static struct {
   unsigned long extern_addrs;
   unsigned long loads;
   unsigned long load_separation[MAX_PCREL_OPT_INSNS+1];
+  unsigned long stores;
+  unsigned long store_separation[MAX_PCREL_OPT_INSNS+1];
 } counters;
 
 \f
@@ -306,6 +345,156 @@ do_pcrel_opt_load (rtx_insn *addr_insn,		// insn loading address
 }
 
 \f
+// Optimize a PC-relative load address to be used in a store.
+
+// If the sequence of insns is safe to use the PCREL_OPT optimization (i.e. no
+// additional references to the address register, the address register dies at
+// the load, and no references to the load), convert insns of the form:
+//
+//	(set (reg:DI addr)
+//	     (symbol_ref:DI "ext_symbol"))
+//
+//	...
+//
+//	(set (mem:<MODE> (reg:DI addr))
+//	     (reg:<MODE> value))
+//
+// into:
+//
+//	(parallel [(set (reg:DI addr)
+//                      (unspec:DI [(symbol_ref:DI "ext_symbol")
+//                                  (const_int label_num)]
+//                                 UNSPEC_PCREL_OPT_ST_ADDR))
+//                 (use (reg:<MODE> value))])
+//
+//	...
+//
+//	(parallel [(set (mem:<MODE> (reg:DI addr))
+//                      (unspec:<MODE> [(reg:<MODE>)
+//                                      (const_int label_num)]
+//                                     UNSPEC_PCREL_OPT_ST_RELOC))
+//                 (clobber (reg:DI addr))])
+//
+//
+// The UNSPEC_PCREL_OPT_ST_ADDR insn will generate the load address plus
+// a definition of a label (.Lpcrel<n>), while the UNSPEC_PCREL_OPT_ST_RELOC
+// insn will generate the .reloc to tell the linker to tie the load address and
+// load using that address together.
+//
+//	pld b,ext_symbol@got@pcrel(0),1
+// .Lpcrel1:
+//
+//	...
+//
+//	.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
+//	stw r,0(b)
+//
+// If ext_symbol is defined in another object file in the main program and we
+// are linking the main program, the linker will convert the above instructions
+// to:
+//
+//	pstwz r,ext_symbol@got@pcrel(0),1
+//
+//	...
+//
+//	nop
+//
+// Return the number of insns between the load of the external address and the
+// actual load or 0 if the load of the external address could not be combined
+// with a load with the PCREL_OPT optimization (i.e. if the load of the
+// external address was adjacent to the load that uses that external address, 1
+// would be returned)..
+//
+// Return true if the PCREL_OPT store optimization succeeded.
+
+static bool
+do_pcrel_opt_store (rtx_insn *addr_insn,	// insn loading address
+		    rtx_insn *store_insn)	// insn using address
+{
+  rtx addr_set = PATTERN (addr_insn);
+  rtx addr_reg = SET_DEST (addr_set);
+  rtx addr_symbol = SET_SRC (addr_set);
+  rtx store_set = single_set (store_insn);
+  rtx mem = SET_DEST (store_set);
+  rtx reg = SET_SRC (store_set);
+  machine_mode mem_mode = GET_MODE (mem);
+
+  // If this is LFIWAX or similar instructions that are indexed only, we can't
+  // do the optimization.
+  enum non_prefixed_form non_prefixed = reg_to_non_prefixed (reg, mem_mode);
+  if (non_prefixed == NON_PREFIXED_X)
+    return false;
+
+  // The optimization will only work on non-prefixed offsettable loads.
+  rtx addr = XEXP (mem, 0);
+  enum insn_form iform = address_to_insn_form (addr, mem_mode, non_prefixed);
+  if (iform != INSN_FORM_BASE_REG
+      && iform != INSN_FORM_D
+      && iform != INSN_FORM_DS
+      && iform != INSN_FORM_DQ)
+    return false;
+
+  // Allocate a new PC-relative label, and update the load address insn.
+
+  ++pcrel_opt_next_num;
+  rtx label_num = GEN_INT (pcrel_opt_next_num);
+  rtvec v_addr = gen_rtvec (2, addr_symbol, label_num);
+  rtx addr_unspec = gen_rtx_UNSPEC (Pmode, v_addr,
+				   UNSPEC_PCREL_OPT_ST_ADDR);
+  rtx addr_new_set = gen_rtx_SET (addr_reg, addr_unspec);
+  rtx addr_use = gen_rtx_USE (VOIDmode, reg);
+
+  PATTERN (addr_insn)
+    = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, addr_new_set, addr_use));
+
+  // Revalidate the insn, backing out of the optimization if the insn is not
+  // supported.
+  INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0);
+  if (INSN_CODE (addr_insn) < 0)
+    {
+      PATTERN (addr_insn) = addr_set;
+      INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0);
+      return false;
+    }
+
+  // Update the store insn.  Add an explicit clobber of the external address
+  // register just in case something runs after this pass.
+  //
+  // (parallel [(set (mem (addr_reg)
+  //                 (unspec:<MODE> [(reg)
+  //                                 (const_int label_num)]
+  //                                UNSPEC_PCREL_OPT_ST_RELOC))
+  //            (clobber (reg:DI addr_reg))])
+
+  rtvec v_store = gen_rtvec (2, reg, label_num);
+  rtx new_store = gen_rtx_UNSPEC (mem_mode, v_store,
+				  UNSPEC_PCREL_OPT_ST_RELOC);
+
+  rtx old_store_set = PATTERN (store_insn);
+  rtx new_store_set = gen_rtx_SET (mem, new_store);
+  rtx store_clobber = gen_rtx_CLOBBER (VOIDmode, addr_reg);
+
+  PATTERN (store_insn)
+    = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, new_store_set, store_clobber));
+
+  // Revalidate the insn, backing out of the optimization if the insn is not
+  // supported.
+
+  INSN_CODE (store_insn) = recog (PATTERN (store_insn), store_insn, 0);
+  if (INSN_CODE (store_insn) < 0)
+    {
+      PATTERN (addr_insn) = addr_set;
+      INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0);
+
+      PATTERN (store_insn) = old_store_set;
+      INSN_CODE (store_insn) = recog (PATTERN (store_insn), store_insn, 0);
+      return false;
+    }
+
+  return true;
+}
+
+\f
 /* Given an insn, find the next insn in the basic block.  Stop if we find a the
    end of a basic block, such as a label, call or jump, and return NULL.  */
 
@@ -340,8 +529,8 @@ next_active_insn_in_basic_block (rtx_insn *insn)
 }
 
 \f
-// Validate that a load is actually a single instruction that can be optimized
-// with the PCREL_OPT optimization.
+// Validate that a load or store is actually a single instruction that can be
+// optimized with the PCREL_OPT optimization.
 
 static bool
 is_single_instruction (rtx_insn *insn, rtx reg)
@@ -522,6 +711,36 @@ do_pcrel_opt_addr (rtx_insn *addr_insn)
 	}
     }
 
+  // Optimize stores
+  else if (is_store)
+    {
+      // If there were any loads in the insns between loading the external
+      // address and doing the store, turn off the optimization.
+      if (had_load)
+	return;
+
+      rtx reg = SET_SRC (set);
+      rtx mem = SET_DEST (set);
+      if (!is_single_instruction (insn, reg))
+	return;
+
+      if (!MEM_P (mem))
+	return;
+
+      // If the register being loaded or stored was used or set between the
+      // load of the external address and the load or store using the address,
+      // we can't do the optimization.
+      if (reg_used_between_p (reg, addr_insn, insn)
+	  || reg_set_between_p (reg, addr_insn, insn))
+	return;
+
+      if (do_pcrel_opt_store (addr_insn, insn))
+	{
+	  counters.stores++;
+	  counters.store_separation[num_insns-1]++;
+	}
+    }
+
   return;
 }
 
@@ -544,7 +763,7 @@ do_pcrel_opt_pass (function *fun)
   df_set_flags (DF_DEFER_INSN_RESCAN | DF_LR_RUN_DCE);
 
   // Look at each basic block to see if there is a load of an external
-  // variable's external address, and a single load using that external
+  // variable's external address, and a single load/store using that external
   // address.
   FOR_ALL_BB_FN (bb, fun)
     {
@@ -598,6 +817,30 @@ do_pcrel_opt_pass (function *fun)
 			     counters.load_separation[i]);
 		}
 	    }
+
+	  if (!counters.stores)
+	    fprintf (dump_file,
+		     "No PCREL_OPT store optimizations were done\n");
+
+	  else
+	    {
+	      fprintf (dump_file, "# of PCREL_OPT stores = %lu\n",
+		       counters.stores);
+
+	      fprintf (dump_file, "# of adjacent PCREL_OPT stores = %lu\n",
+		       counters.store_separation[0]);
+
+	      for (int i = 1; i < MAX_PCREL_OPT_INSNS; i++)
+		{
+		  if (counters.store_separation[i])
+		    fprintf (dump_file,
+			     "# of PCREL_OPT stores separated by "
+			     "%d insn%s = %lu\n",
+			     i, (i == 1) ? "" : "s",
+			     counters.store_separation[i]);
+		}
+	    }
+
 	}
 
       fprintf (dump_file, "\n");
diff --git a/gcc/config/rs6000/pcrel-opt.md b/gcc/config/rs6000/pcrel-opt.md
index 00a3bc4..d98c6ce 100644
--- a/gcc/config/rs6000/pcrel-opt.md
+++ b/gcc/config/rs6000/pcrel-opt.md
@@ -84,7 +84,9 @@
 (define_c_enum "unspec"
   [UNSPEC_PCREL_OPT_LD_ADDR
    UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG
-   UNSPEC_PCREL_OPT_LD_RELOC])
+   UNSPEC_PCREL_OPT_LD_RELOC
+   UNSPEC_PCREL_OPT_ST_ADDR
+   UNSPEC_PCREL_OPT_ST_RELOC])
 
 ;; Modes that are supported for PCREL_OPT
 (define_mode_iterator PO [QI HI SI DI TI SF DF KF
@@ -246,3 +248,114 @@ (define_insn "*pcrel_opt_ld<mode>"
   "%r3lxv %x0,%1"
   [(set_attr "type" "vecload")
    (set_attr "isa" "pcrel_opt")])
+
+\f
+;; PCREL_OPT optimization for stores.  We need to put the label after the PLD
+;; instruction, because the assembler might insert a NOP before the PLD for
+;; alignment.
+;;
+;; If we are optimizing a single write, normally the code would look like:
+;;
+;;	(set (reg:DI <ptr>)
+;;	     (symbol_ref:DI "<extern_addr>"))	# <data> must be live here
+;;
+;;	    ...              # insns do not need to be adjacent
+;;
+;;	(set (mem:SI (reg:DI <xxx>))
+;;	     (reg:SI <data>))			# <ptr> dies with this insn
+;;
+;; We optimize this to be:
+;;
+;;	(parallel [(set (reg:DI <ptr>)
+;;	                (unspec:DI [(symbol_ref:DI "<extern_addr>")
+;;	                            (const_int <marker>)]
+;;	                           UNSPEC_PCREL_OPT_ST_ADDR))
+;;	           (use (reg:<MODE> <data>))])
+;;
+;;	    ...              # insns do not need to be adjacent
+;;
+;;	(parallel [(set (mem:<MODE> (reg:DI <ptr>))
+;;	                (unspec:<MODE> [(reg:<MODE> <data>)
+;;	                                (const_int <marker>)]
+;;	                               UNSPEC_PCREL_OPT_ST_RELOC))
+;;	           (clobber (reg:DI <ptr>))])
+
+(define_insn "*pcrel_opt_st_addr<mode>"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=b")
+	(unspec:DI [(match_operand:DI 1 "pcrel_external_address")
+		    (match_operand 2 "const_int_operand" "n")]
+		UNSPEC_PCREL_OPT_ST_ADDR))
+   (use (match_operand:PO 3 "gpc_reg_operand" "rwa"))]
+  "TARGET_PCREL_OPT"
+  "ld %0,%a1\n.Lpcrel%2:"
+  [(set_attr "prefixed" "yes")
+   (set_attr "type" "load")
+   (set_attr "isa" "pcrel_opt")
+   (set_attr "loads_extern_addr" "yes")])
+
+;; Alternate form of the stores that include a marker to identify whether we
+;; can do the PCREL_OPT optimization.
+(define_insn "*pcrel_opt_st<mode>"
+  [(set (match_operand:QHSI 0 "d_form_memory" "=o")
+	(unspec:QHSI [(match_operand:QHSI 1 "gpc_reg_operand" "r")
+		      (match_operand 2 "const_int_operand" "n")]
+		     UNSPEC_PCREL_OPT_ST_RELOC))
+   (clobber (match_operand:DI 3 "base_reg_operand" "=b"))]
+  "TARGET_PCREL_OPT"
+  "%r2st<wd> %1,%0"
+  [(set_attr "type" "store")
+   (set_attr "isa" "pcrel_opt")])
+
+(define_insn "*pcrel_opt_stdi"
+  [(set (match_operand:DI 0 "d_form_memory" "=o,o,o")
+	(unspec:DI [(match_operand:DI 1 "gpc_reg_operand" "r,d,v")
+		    (match_operand 2 "const_int_operand" "n,n,n")]
+		   UNSPEC_PCREL_OPT_ST_RELOC))
+   (clobber (match_operand:DI 3 "base_reg_operand" "=b,b,b"))]
+  "TARGET_PCREL_OPT && TARGET_POWERPC64"
+  "@
+   %r2std %1,%0
+   %r2stfd %1,%0
+   %r2stxsd %1,%0"
+  [(set_attr "type" "store,fpstore,fpstore")
+   (set_attr "isa" "pcrel_opt")])
+
+(define_insn "*pcrel_opt_stsf"
+  [(set (match_operand:SF 0 "d_form_memory" "=o,o,o")
+	(unspec:SF [(match_operand:SF 1 "gpc_reg_operand" "d,v,r")
+		    (match_operand 2 "const_int_operand" "n,n,n")]
+		   UNSPEC_PCREL_OPT_ST_RELOC))
+   (clobber (match_operand:DI 3 "base_reg_operand" "=b,b,b"))]
+  "TARGET_PCREL_OPT"
+  "@
+   %r2stfs %1,%0
+   %r2stxssp %1,%0
+   %r2stw %1,%0"
+  [(set_attr "type" "fpstore,fpstore,store")
+   (set_attr "isa" "pcrel_opt")])
+
+(define_insn "*pcrel_opt_stdf"
+  [(set (match_operand:DF 0 "d_form_memory" "=o,o,o")
+	(unspec:DF [(match_operand:DF 1 "gpc_reg_operand" "d,v,r")
+		    (match_operand 2 "const_int_operand" "n,n,n")]
+		   UNSPEC_PCREL_OPT_ST_RELOC))
+   (clobber (match_operand:DI 3 "base_reg_operand" "=b,b,b"))]
+  "TARGET_PCREL_OPT
+   && (TARGET_POWERPC64 || vsx_register_operand (operands[1], DFmode))"
+  "@
+   %r2stfd %1,%0
+   %r2stxsd %1,%0
+   %r2std %1,%0"
+  [(set_attr "type" "fpstore,fpstore,store")
+   (set_attr "isa" "pcrel_opt")])
+
+(define_insn "*pcrel_opt_st<mode>"
+  [(set (match_operand:PO_VECT 0 "d_form_memory" "=o")
+	(unspec:PO_VECT [(match_operand:PO_VECT 1 "gpc_reg_operand" "wa")
+		     (match_operand 2 "const_int_operand" "n")]
+		    UNSPEC_PCREL_OPT_ST_RELOC))
+   (clobber (match_operand:DI 3 "base_reg_operand" "=b"))]
+  "TARGET_PCREL_OPT"
+  "%r2stxv %x1,%0"
+  [(set_attr "type" "vecstore")
+   (set_attr "isa" "pcrel_opt")])
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6877de5..9ec346c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -8525,7 +8525,8 @@ rs6000_delegitimize_address (rtx orig_x)
   if (GET_CODE (orig_x) == UNSPEC
       && (XINT (orig_x, 1) == UNSPEC_FUSION_GPR
 	  || XINT (orig_x, 1) == UNSPEC_PCREL_OPT_LD_ADDR
-	  || XINT (orig_x, 1) == UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG))
+	  || XINT (orig_x, 1) == UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG
+	  || XINT (orig_x, 1) == UNSPEC_PCREL_OPT_ST_ADDR))
     orig_x = XVECEXP (orig_x, 0, 0);
 
   orig_x = delegitimize_mem_from_attrs (orig_x);
-- 
1.8.3.1


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] Power10: Add tests for PCREL_OPT support.
  2020-08-18  6:31 [PATCH 0/3] Power10 PCREL_OPT support Michael Meissner
  2020-08-18  6:34 ` [PATCH 1/3] Power10: Add PCREL_OPT load support Michael Meissner
  2020-08-18  6:35 ` [PATCH 2/3] Power10: Add PCREL_OPT store support Michael Meissner
@ 2020-08-18  6:37 ` Michael Meissner
  2020-08-20 23:33 ` [PATCH 0/3] Power10 " Segher Boessenkool
  3 siblings, 0 replies; 12+ messages in thread
From: Michael Meissner @ 2020-08-18  6:37 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Alan Modra

[PATCH 3/3] Power10: Add tests for PCREL_OPT support.

These are the tests for the PCREL_OPT support.

gcc/testsuite/
2020-08-18  Michael Meissner  <meissner@linux.ibm.com>

	* gcc.target/powerpc/pcrel-opt-inc-di.c: New PCREL_OPT test.
	* gcc.target/powerpc/pcrel-opt-ld-df.c: New PCREL_OPT test.
	* gcc.target/powerpc/pcrel-opt-ld-di.c: New PCREL_OPT test.
	* gcc.target/powerpc/pcrel-opt-ld-hi.c: New PCREL_OPT test.
	* gcc.target/powerpc/pcrel-opt-ld-qi.c: New PCREL_OPT test.
	* gcc.target/powerpc/pcrel-opt-ld-sf.c: New PCREL_OPT test.
	* gcc.target/powerpc/pcrel-opt-ld-si.c: New PCREL_OPT test.
	* gcc.target/powerpc/pcrel-opt-ld-vector.c: New PCREL_OPT test.
	* gcc.target/powerpc/pcrel-opt-st-df.c: New PCREL_OPT test.
	* gcc.target/powerpc/pcrel-opt-st-di.c: New PCREL_OPT test.
	* gcc.target/powerpc/pcrel-opt-st-hi.c: New PCREL_OPT test.
	* gcc.target/powerpc/pcrel-opt-st-qi.c: New PCREL_OPT test.
	* gcc.target/powerpc/pcrel-opt-st-sf.c: New PCREL_OPT test.
	* gcc.target/powerpc/pcrel-opt-st-si.c: New PCREL_OPT test.
	* gcc.target/powerpc/pcrel-opt-st-vector.c: New PCREL_OPT test.
---
 .../gcc.target/powerpc/pcrel-opt-inc-di.c          | 18 +++++++++
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c | 36 ++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c | 43 ++++++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c | 42 +++++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c | 42 +++++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-sf.c | 42 +++++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-si.c | 41 +++++++++++++++++++++
 .../gcc.target/powerpc/pcrel-opt-ld-vector.c       | 36 ++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-df.c | 36 ++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-di.c | 43 ++++++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-hi.c | 42 +++++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-qi.c | 42 +++++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-sf.c | 36 ++++++++++++++++++
 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-si.c | 41 +++++++++++++++++++++
 .../gcc.target/powerpc/pcrel-opt-st-vector.c       | 36 ++++++++++++++++++
 15 files changed, 576 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-sf.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-si.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-vector.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-df.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-di.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-hi.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-qi.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-sf.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-si.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-vector.c

diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c
new file mode 100644
index 0000000..f165068
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#define TYPE	unsigned int
+
+/* Test whether using an external variable twice (doing an increment) prevents
+   the PCREL_OPT optimization.  */
+extern TYPE ext;
+
+void
+inc (void)
+{
+  ext++;		/* No PCREL_OPT (uses address twice).  */
+}
+
+/* { dg-final { scan-assembler-not "R_PPC64_PCREL_OPT" } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c
new file mode 100644
index 0000000..d35862f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#define TYPE	double
+#define LARGE	0x20000
+
+/* Test whether we get the right number of PCREL_OPT optimizations for
+   double.  */
+extern TYPE ext[];
+
+TYPE
+get (void)
+{
+  return ext[0];		/* PCREL_OPT relocation.  */
+}
+
+TYPE
+get2 (void)
+{
+  return ext[2];		/* PCREL_OPT relocation.  */
+}
+
+TYPE
+get_large (void)
+{
+  return ext[LARGE];		/* No PCREL_OPT (load is  prefixed).  */
+}
+
+TYPE
+get_variable (unsigned long n)
+{
+  return ext[n];		/* No PCREL_OPT (load is indexed).  */
+}
+
+/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT"  2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c
new file mode 100644
index 0000000..12b51ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#define TYPE	long long
+#define LARGE	0x20000
+
+/* Test whether we get the right number of PCREL_OPT optimizations for long
+   long.  */
+extern TYPE ext[];
+
+TYPE
+get (void)
+{
+  return ext[0];		/* PCREL_OPT relocation.  */
+}
+
+TYPE
+get2 (void)
+{
+  return ext[2];		/* PCREL_OPT relocation.  */
+}
+
+TYPE
+get_large (void)
+{
+  return ext[LARGE];		/* No PCREL_OPT (load is  prefixed).  */
+}
+
+TYPE
+get_variable (unsigned long n)
+{
+  return ext[n];		/* No PCREL_OPT (load is indexed).  */
+}
+
+double
+get_double (void)
+{
+  return (double) ext[0];	/* PCREL_OPT relocation.  */
+}
+
+/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT"  3 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c
new file mode 100644
index 0000000..4143aeb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#define TYPE	unsigned short
+#define LARGE	0x20000
+
+/* Test whether we get the right number of PCREL_OPT optimizations for unsigned
+   short.  */
+extern TYPE ext[];
+
+TYPE
+get (void)
+{
+  return ext[0];		/* PCREL_OPT relocation.  */
+}
+
+TYPE
+get2 (void)
+{
+  return ext[2];		/* PCREL_OPT relocation.  */
+}
+
+TYPE
+get_large (void)
+{
+  return ext[LARGE];		/* No PCREL_OPT (load is  prefixed).  */
+}
+
+TYPE
+get_variable (unsigned long n)
+{
+  return ext[n];		/* No PCREL_OPT (load is indexed).  */
+}
+
+double
+get_double (void)
+{
+  return (double) ext[0];	/* No PCREL_OPT (LXSIHZX is indexed).  */
+}
+
+/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT"  2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c
new file mode 100644
index 0000000..30d3236
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#define TYPE	unsigned char
+#define LARGE	0x20000
+
+/* Test whether we get the right number of PCREL_OPT optimizations for unsigned
+   char.  */
+extern TYPE ext[];
+
+TYPE
+get (void)
+{
+  return ext[0];		/* PCREL_OPT relocation.  */
+}
+
+TYPE
+get2 (void)
+{
+  return ext[2];		/* PCREL_OPT relocation.  */
+}
+
+TYPE
+get_large (void)
+{
+  return ext[LARGE];		/* No PCREL_OPT (load is  prefixed).  */
+}
+
+TYPE
+get_variable (unsigned long n)
+{
+  return ext[n];		/* No PCREL_OPT (load is indexed).  */
+}
+
+double
+get_double (void)
+{
+  return (double) ext[0];	/* No PCREL_OPT (LXSIBZX is indexed).  */
+}
+
+/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT"  2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-sf.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-sf.c
new file mode 100644
index 0000000..9d1e2a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-sf.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#define TYPE	float
+#define LARGE	0x20000
+
+/* Test whether we get the right number of PCREL_OPT optimizations for
+   float.  */
+extern TYPE ext[];
+
+TYPE
+get (void)
+{
+  return ext[0];		/* PCREL_OPT relocation.  */
+}
+
+TYPE
+get2 (void)
+{
+  return ext[2];		/* PCREL_OPT relocation.  */
+}
+
+TYPE
+get_large (void)
+{
+  return ext[LARGE];		/* No PCREL_OPT (load is  prefixed).  */
+}
+
+TYPE
+get_variable (unsigned long n)
+{
+  return ext[n];		/* No PCREL_OPT (load is indexed).  */
+}
+
+double
+get_double (void)
+{
+  return (double) ext[0];	/* PCREL_OPT relocation.  */
+}
+
+/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT"  3 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-si.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-si.c
new file mode 100644
index 0000000..17be6fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-si.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#define TYPE	int
+#define LARGE	0x20000
+
+/* Test whether we get the right number of PCREL_OPT optimizations for int.  */
+extern TYPE ext[];
+
+TYPE
+get (void)
+{
+  return ext[0];		/* PCREL_OPT relocation.  */
+}
+
+TYPE
+get2 (void)
+{
+  return ext[2];		/* PCREL_OPT relocation.  */
+}
+
+TYPE
+get_large (void)
+{
+  return ext[LARGE];		/* No PCREL_OPT (load is  prefixed).  */
+}
+
+TYPE
+get_variable (unsigned long n)
+{
+  return ext[n];		/* No PCREL_OPT (load is indexed).  */
+}
+
+double
+get_double (void)
+{
+  return (double) ext[0];	/* No PCREL_OPT (LFIWAX is indexed).  */
+}
+
+/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT"  2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-vector.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-vector.c
new file mode 100644
index 0000000..8c12aea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-vector.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#define TYPE	vector double
+#define LARGE	0x20000
+
+/* Test whether we get the right number of PCREL_OPT optimizations for
+   vector double.  */
+extern TYPE ext[];
+
+TYPE
+get (void)
+{
+  return ext[0];		/* PCREL_OPT relocation.  */
+}
+
+TYPE
+get2 (void)
+{
+  return ext[2];		/* PCREL_OPT relocation.  */
+}
+
+TYPE
+get_large (void)
+{
+  return ext[LARGE];		/* No PCREL_OPT (load is  prefixed).  */
+}
+
+TYPE
+get_variable (unsigned long n)
+{
+  return ext[n];		/* No PCREL_OPT (load is indexed).  */
+}
+
+/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT"  2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-df.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-df.c
new file mode 100644
index 0000000..d795d35
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-df.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#define TYPE	double
+#define LARGE	0x20000
+
+/* Test whether we get the right number of PCREL_OPT optimizations for
+   double.  */
+extern TYPE ext[];
+
+void
+store (TYPE a)
+{
+  ext[0] = a;			/* PCREL_OPT relocation.  */
+}
+
+void
+store2 (TYPE a)
+{
+  ext[2] = a;			/* PCREL_OPT relocation.  */
+}
+
+void
+store_large (TYPE a)
+{
+  ext[LARGE] = a;		/* No PCREL_OPT (store is prefixed).  */
+}
+
+void
+store_variable (TYPE a, unsigned long n)
+{
+  ext[n] = a;			/* No PCREL_OPT (store is indexed).  */
+}
+
+/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT"  2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-di.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-di.c
new file mode 100644
index 0000000..a065d83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-di.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#define TYPE	long long
+#define LARGE	0x20000
+
+/* Test whether we get the right number of PCREL_OPT optimizations for long
+   long.  */
+extern TYPE ext[];
+
+void
+store (TYPE a)
+{
+  ext[0] = a;			/* PCREL_OPT relocation.  */
+}
+
+void
+store2 (TYPE a)
+{
+  ext[2] = a;			/* PCREL_OPT relocation.  */
+}
+
+void
+store_large (TYPE a)
+{
+  ext[LARGE] = a;		/* No PCREL_OPT (store is prefixed).  */
+}
+
+void
+store_variable (TYPE a, unsigned long n)
+{
+  ext[n] = a;			/* No PCREL_OPT (store is indexed).  */
+}
+
+void
+store_double (double a)
+{
+  ext[0] = (TYPE) a;		/* PCREL_OPT relocation.  */
+}
+
+/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT"  3 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-hi.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-hi.c
new file mode 100644
index 0000000..8822e76
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-hi.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#define TYPE	unsigned short
+#define LARGE	0x20000
+
+/* Test whether we get the right number of PCREL_OPT optimizations for unsigned
+   short.  */
+extern TYPE ext[];
+
+void
+store (TYPE a)
+{
+  ext[0] = a;			/* PCREL_OPT relocation.  */
+}
+
+void
+store2 (TYPE a)
+{
+  ext[2] = a;			/* PCREL_OPT relocation.  */
+}
+
+void
+store_large (TYPE a)
+{
+  ext[LARGE] = a;		/* No PCREL_OPT (store is prefixed).  */
+}
+
+void
+store_variable (TYPE a, unsigned long n)
+{
+  ext[n] = a;			/* No PCREL_OPT (store is indexed).  */
+}
+
+void
+store_double (double a)
+{
+  ext[0] = (TYPE) a;		/* No PCREL_OPT (STXIHZX is indexed).  */
+}
+
+/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT"  2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-qi.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-qi.c
new file mode 100644
index 0000000..2f75683
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-qi.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#define TYPE	unsigned char
+#define LARGE	0x20000
+
+/* Test whether we get the right number of PCREL_OPT optimizations for unsigned
+   char.  */
+extern TYPE ext[];
+
+void
+store (TYPE a)
+{
+  ext[0] = a;			/* PCREL_OPT relocation.  */
+}
+
+void
+store2 (TYPE a)
+{
+  ext[2] = a;			/* PCREL_OPT relocation.  */
+}
+
+void
+store_large (TYPE a)
+{
+  ext[LARGE] = a;		/* No PCREL_OPT (store is prefixed).  */
+}
+
+void
+store_variable (TYPE a, unsigned long n)
+{
+  ext[n] = a;			/* No PCREL_OPT (store is indexed).  */
+}
+
+void
+store_double (double a)
+{
+  ext[0] = (TYPE) a;		/* No PCREL_OPT (STXIBZX is indexed).  */
+}
+
+/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT"  2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-sf.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-sf.c
new file mode 100644
index 0000000..3dd88aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-sf.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#define TYPE	float
+#define LARGE	0x20000
+
+/* Test whether we get the right number of PCREL_OPT optimizations for
+   float.  */
+extern TYPE ext[];
+
+void
+store (TYPE a)
+{
+  ext[0] = a;			/* PCREL_OPT relocation.  */
+}
+
+void
+store2 (TYPE a)
+{
+  ext[2] = a;			/* PCREL_OPT relocation.  */
+}
+
+void
+store_large (TYPE a)
+{
+  ext[LARGE] = a;		/* No PCREL_OPT (store is prefixed).  */
+}
+
+void
+store_variable (TYPE a, unsigned long n)
+{
+  ext[n] = a;			/* No PCREL_OPT (store is indexed).  */
+}
+
+/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT"  2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-si.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-si.c
new file mode 100644
index 0000000..78dc812
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-si.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#define TYPE	int
+#define LARGE	0x20000
+
+/* Test whether we get the right number of PCREL_OPT optimizations for int.  */
+extern TYPE ext[];
+
+void
+store (TYPE a)
+{
+  ext[0] = a;			/* PCREL_OPT relocation.  */
+}
+
+void
+store2 (TYPE a)
+{
+  ext[2] = a;			/* PCREL_OPT relocation.  */
+}
+
+void
+store_large (TYPE a)
+{
+  ext[LARGE] = a;		/* No PCREL_OPT (store is prefixed).  */
+}
+
+void
+store_variable (TYPE a, unsigned long n)
+{
+  ext[n] = a;			/* No PCREL_OPT (store is indexed).  */
+}
+
+void
+store_double (double a)
+{
+  ext[0] = (TYPE) a;		/* No PCREL_OPT (STFIWX is indexed).  */
+}
+
+/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT"  2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-vector.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-vector.c
new file mode 100644
index 0000000..2c602eb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-vector.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_pcrel } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10" } */
+
+#define TYPE	vector double
+#define LARGE	0x20000
+
+/* Test whether we get the right number of PCREL_OPT optimizations for
+   vector double.  */
+extern TYPE ext[];
+
+void
+store (TYPE a)
+{
+  ext[0] = a;			/* PCREL_OPT relocation.  */
+}
+
+void
+store2 (TYPE a)
+{
+  ext[2] = a;			/* PCREL_OPT relocation.  */
+}
+
+void
+store_large (TYPE a)
+{
+  ext[LARGE] = a;		/* No PCREL_OPT (store is prefixed).  */
+}
+
+void
+store_variable (TYPE a, unsigned long n)
+{
+  ext[n] = a;			/* No PCREL_OPT (store is indexed).  */
+}
+
+/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT"  2 } } */
-- 
1.8.3.1


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] Power10 PCREL_OPT support
  2020-08-18  6:31 [PATCH 0/3] Power10 PCREL_OPT support Michael Meissner
                   ` (2 preceding siblings ...)
  2020-08-18  6:37 ` [PATCH 3/3] Power10: Add tests for PCREL_OPT support Michael Meissner
@ 2020-08-20 23:33 ` Segher Boessenkool
  2020-08-23  0:05   ` Bill Schmidt
  2020-08-25  3:48   ` Michael Meissner
  3 siblings, 2 replies; 12+ messages in thread
From: Segher Boessenkool @ 2020-08-20 23:33 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt,
	Peter Bergner, Alan Modra

Hi!

On Tue, Aug 18, 2020 at 02:31:41AM -0400, Michael Meissner wrote:
> Currently on power10, the compiler compiles this as:
> 
> 	ret_var:
> 	        pld 9,ext_variable@got@pcrel
> 		lwa 3,0(9)
> 	        blr
> 
> 	store_var:
> 		pld 9,ext_variable@got@pcrel
> 		stw 3,0(9)
> 		blr
> 
> That is, it loads up the address of 'ext_variable' from the GOT table into
> register r9, and then uses r9 as a base register to reference the actual
> variable.
> 
> The linker does optimize the case where you are compiling the main program, and
> the variable is also defined in the main program to be:
> 
> 	ret_var:
> 		pla	9,ext_variable,1
> 		lwa	3,0(9)
> 		blr
> 
> 	store_var:
> 		pla	9,ext_variable,1
> 		stw	3,0(9)
> 		blr

Those "pla" insns are invalid; please correct them?  (You mixed "pla"
and "paddi" syntax I think.)

> These patches generate:
> 
> 	ret_var:
> 	        pld	9,ext_variable@got@pcrel
> 	.Lpcrel1:
> 		.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
> 	        lwa	3,0(9)
> 		blr
> 
> 	store_var:
> 	        pld	9,ext_variable@got@pcrel
> 	.Lpcrel2:
> 		.reloc .Lpcrel2-8,R_PPC64_PCREL_OPT,.-(.Lpcrel2-8)
> 	        stw	3,0(9)
> 		blr
> 
> Note, the label for locating the PLD occurs after the PLD and not before it.
> This is so that if the assembler adds a NOP in front of the PLD to align it,
> the relocations will still work.
> 
> If the linker can, it will convert the code into:
> 
> 	ret_var:
> 		plwa	3,ext_variable,1
> 		nop
> 		blr
> 
> 	store_var:
> 		pstw	3,ext_variable,1
> 		nop
> 		blr

Those "plwa" and "pstw" are invalid syntax as well (should have "(0)"
after the symbol name).

> These patches allow the load of the address to not be physically adjacent to
> the actual load or store, which should allow for better code.

Why is that?  That is not what it does anyway?  /confused

> In order to do this, the pass that converts the load address and load/store
> must occur late in the compilation cycle.

That does not follow afaics.

> In particular, the second scheduler
> pass will duplicate and optimize some of the references and it will produce an
> invalid program.  In the past, Segher has said that we should be able to move
> it earlier.

I said that you shouldn't require this to be the very last pass.  There
is no reason for that, and that will not scale (what if a second pass
shows up that also requires this!)

It also makes it impossible to do normal late optimisations on code
produced here (optimisations like peephole, cprop_hardreg, dce).

I also said that you should use the DF framework, not parse all RTL by
hand and getting it all wrong, as *everyone* does: this stuff is hard.


Segher

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] Power10: Add PCREL_OPT load support
  2020-08-18  6:34 ` [PATCH 1/3] Power10: Add PCREL_OPT load support Michael Meissner
@ 2020-08-21  2:09   ` Segher Boessenkool
  2020-09-03 17:24     ` Michael Meissner
  0 siblings, 1 reply; 12+ messages in thread
From: Segher Boessenkool @ 2020-08-21  2:09 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt,
	Peter Bergner, Alan Modra

Hi!

On Tue, Aug 18, 2020 at 02:34:01AM -0400, Michael Meissner wrote:
> +// Maximum number of insns to scan between the load address and the load that

Please don't mix /* and // comments.  Just stick to /* comments, like
all the rest of our backend?

> +const int MAX_PCREL_OPT_INSNS	= 10;

"static const", and not tab please (just a space).

> +  // LWA is a DS format instruction, but LWZ is a D format instruction.  We use
> +  // DImode for the mode to force checking whether the bottom 2 bits are 0.

How so?  Can't you just check if the resulting insn is accepted? That
would be so much more robust (and can have a correct description more
easily, too!)

> +  // However FPR and vector registers uses the LFIWAX instruction which is
> +  // indexed only.

(vectors use lxsiwax)

> +  if (GET_CODE (mem) == SIGN_EXTEND && GET_MODE (XEXP (mem, 0)) == SImode)

You're checking here whether the address of the MEM is SImode.

> +    {
> +      if (!INT_REGNO_P (reg_regno))

That should use base_reg_operand instead?

> +  // The optimization will only work on non-prefixed offsettable loads.
> +  rtx addr = XEXP (mem_inner, 0);
> +  enum insn_form iform = address_to_insn_form (addr, mem_mode, non_prefixed);
> +  if (iform != INSN_FORM_BASE_REG
> +      && iform != INSN_FORM_D
> +      && iform != INSN_FORM_DS
> +      && iform != INSN_FORM_DQ)
> +    return false;

Sounds like something for a utility function?  Do we use this elsewhere?

> +  ++pcrel_opt_next_num;

Normal style is postfix increment.  Empty lines around this are nice, it
indicates this is the point where we decided to do the thing.  Or add a
comment even, maybe?

> +  unsigned int addr_regno = reg_or_subregno (addr_reg);
> +  rtx label_num = GEN_INT (pcrel_opt_next_num);
> +  rtx reg_di = gen_rtx_REG (DImode, reg_regno);
> +
> +  PATTERN (addr_insn)
> +    = ((addr_regno != reg_regno)
> +       ? gen_pcrel_opt_ld_addr (addr_reg, addr_symbol, label_num, reg_di)
> +       : gen_pcrel_opt_ld_addr_same_reg (addr_reg, addr_symbol, label_num));

Please use if() instead of multi-line expressions.  The outer parens are
unnecessary, too.

> +  // Revalidate the insn, backing out of the optimization if the insn is not
> +  // supported.

So the count will be incorrect then?

> +  INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0);
> +  if (INSN_CODE (addr_insn) < 0)
> +    {
> +      PATTERN (addr_insn) = addr_set;
> +      INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0);
> +      return false;
> +    }

Can you use the normal undo mechanisms, instead?  apply_change_group
etc.

> +static rtx_insn *
> +next_active_insn_in_basic_block (rtx_insn *insn)
> +{
> +  insn = NEXT_INSN (insn);
> +
> +  while (insn != NULL_RTX)
> +    {
> +      /* If the basic block ends or there is a jump of some kind, exit the
> +	 loop.  */
> +      if (CALL_P (insn)
> +	  || JUMP_P (insn)
> +	  || JUMP_TABLE_DATA_P (insn)
> +	  || LABEL_P (insn)
> +	  || BARRIER_P (insn))
> +	return NULL;
> +
> +      /* If this is a real insn, return it.  */
> +      if (!insn->deleted ()
> +	  && NONJUMP_INSN_P (insn)
> +	  && GET_CODE (PATTERN (insn)) != USE
> +	  && GET_CODE (PATTERN (insn)) != CLOBBER)
> +	return insn;
> +
> +      /* Loop for USE, CLOBBER, DEBUG_INSN, NOTEs.  */
> +      insn = NEXT_INSN (insn);
> +    }
> +
> +  return NULL;
> +}

There are things to do this.  Please don't write code manually parsing
RTL unless you have to.

> +// Validate that a load is actually a single instruction that can be optimized
> +// with the PCREL_OPT optimization.
> +
> +static bool
> +is_single_instruction (rtx_insn *insn, rtx reg)

Of course it is a single instruction!  A single RTL instruction...
Clarify as "single machine instruction"?

> +{
> +  if (!REG_P (reg) && !SUBREG_P (reg))
> +    return false;

You need to check if is a subreg of reg, then.  There are subregs of
other things.  Not that you should see those here, but still.

> +  // _Decimal128 and IBM extended double are always multiple instructions.
> +  machine_mode mode = GET_MODE (reg);
> +  if (mode == TFmode && !TARGET_IEEEQUAD)
> +    return false;
> +
> +  if (mode == TDmode || mode == IFmode)
> +    return false;

Don't we have a helper for this?  The ibm128 part at least.

Maybe we should just have an attribute on the insns that *can* get this
optimisation?

> +  return (base_reg_operand (XEXP (addr, 0), Pmode)
> +	  && satisfies_constraint_I (XEXP (addr, 1)));

short_cint_operand.  But maybe just rs6000_legitimate_offset_address_p?

>  /* Flags that need to be turned off if -mno-power10.  */
>  #define OTHER_POWER10_MASKS	(OPTION_MASK_MMA			\
>  				 | OPTION_MASK_PCREL			\
> +				 | OPTION_MASK_PCREL_OPT		\
>  				 | OPTION_MASK_PREFIXED)

This isn't a CPU flag.  Instead, it is just an optimisation option.

> @@ -8515,7 +8522,10 @@ rs6000_delegitimize_address (rtx orig_x)
>  {
>    rtx x, y, offset;
>  
> -  if (GET_CODE (orig_x) == UNSPEC && XINT (orig_x, 1) == UNSPEC_FUSION_GPR)
> +  if (GET_CODE (orig_x) == UNSPEC
> +      && (XINT (orig_x, 1) == UNSPEC_FUSION_GPR
> +	  || XINT (orig_x, 1) == UNSPEC_PCREL_OPT_LD_ADDR
> +	  || XINT (orig_x, 1) == UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG))
>      orig_x = XVECEXP (orig_x, 0, 0);
>  
>    orig_x = delegitimize_mem_from_attrs (orig_x);

Why this?  It needs an explanation (in the code).

> @@ -13197,6 +13207,19 @@ print_operand (FILE *file, rtx x, int code)
>  	fprintf (file, "%d", 128 >> (REGNO (x) - CR0_REGNO));
>        return;
>  
> +    case 'r':
> +      /* X is a label number for the PCREL_OPT optimization.  Emit the .reloc
> +	 to enable this optimization, unless the value is 0.  */
> +      gcc_assert (CONST_INT_P (x));
> +      if (UINTVAL (x) != 0)
> +	{
> +	  unsigned int label_num = UINTVAL (x);
> +	  fprintf (file,
> +		   ".reloc .Lpcrel%u-8,R_PPC64_PCREL_OPT,.-(.Lpcrel%u-8)\n\t",
> +		   label_num, label_num);
> +	}
> +      return;

Don't eat output modifiers please.  We have only so few left, and we
cannot recycle any more without pain.

I don't see why we cannot just do this in the normal output (C) code of
the one or few insns that want this?

>  ;; The ISA we implement.
> -(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
> +(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10,pcrel_opt"
>    (const_string "any"))

No.  Please read the heading.


Segher

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] Power10 PCREL_OPT support
  2020-08-20 23:33 ` [PATCH 0/3] Power10 " Segher Boessenkool
@ 2020-08-23  0:05   ` Bill Schmidt
  2020-08-25  4:01     ` Michael Meissner
  2020-08-25  3:48   ` Michael Meissner
  1 sibling, 1 reply; 12+ messages in thread
From: Bill Schmidt @ 2020-08-23  0:05 UTC (permalink / raw)
  To: Segher Boessenkool, Michael Meissner, gcc-patches,
	David Edelsohn, Peter Bergner, Alan Modra

On 8/20/20 6:33 PM, Segher Boessenkool wrote:
> Hi!
>
> On Tue, Aug 18, 2020 at 02:31:41AM -0400, Michael Meissner wrote:
>
>> In order to do this, the pass that converts the load address and load/store
>> must occur late in the compilation cycle.
> That does not follow afaics.
>
Let me see if I can help explain this.

I think the issue is that this optimization creates a dependency that 
isn't directly represented in RTL.  We either have to figure out how to 
represent it, or we have to do this very late to avoid problems.

Suppose we are at a point where hard registers have been assigned, and 
the RTL looks like:

     addi  r5,r3,4
     sldi  r6,r5,2
     pld  r10,symbol@got@pcrel
     lwz  r5,0(r10)

Everything is fine for the optimization to take place, since the two 
instructions are adjacent and therefore we can't have any problems with 
r10 being redefined in between, or r5 being used. So we stick on the 
relocation telling the linker to change this if resolved during static 
link time to:

     addi  r5,r3,4
     sldi  r6,r5,2
     plwz  r5,symbol@pcrel
     nop

Now, suppose after we insert the relocation we get a reordering of 
instructions such as

     addi  r5,r3,4
     pld  r10,symbol@got@pcrel
     sldi  r6,r5,2
     lwz  r5,0(r10)

When the linker performs the replacement, we will now end up with

     addi  r5,r3,4
     plwz  r5,symbol@pcrel
     sldi  r6,r5,2
     nop

which has altered the semantics of the program.

What is necessary in order to allow this optimization to occur earlier 
is to make this hidden dependency explicit.  When the relocation is 
inserted, we have to change the "pld" instruction to have a specific 
clobber of (in this case) r5, which represents what will happen if the 
linker makes the substitution.

I agree that it's too fragile to force this to be the last pass, so I 
think if Mike can look into introducing a clobber of the hard register 
when performing the optimization, that would at least allow us to move 
this anywhere after reload.

I don't immediately see a solution that works prior to register 
allocation because we basically are representing two potential starting 
points of a live range, only one of which will survive in the final 
code.  That is too ugly a problem to hand to the register allocator.

Thanks,
Bill


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] Power10 PCREL_OPT support
  2020-08-20 23:33 ` [PATCH 0/3] Power10 " Segher Boessenkool
  2020-08-23  0:05   ` Bill Schmidt
@ 2020-08-25  3:48   ` Michael Meissner
  2020-08-25  4:09     ` Michael Meissner
  1 sibling, 1 reply; 12+ messages in thread
From: Michael Meissner @ 2020-08-25  3:48 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt,
	Peter Bergner, Alan Modra

On Thu, Aug 20, 2020 at 06:33:29PM -0500, Segher Boessenkool wrote:
> > These patches allow the load of the address to not be physically adjacent to
> > the actual load or store, which should allow for better code.
> 
> Why is that?  That is not what it does anyway?  /confused

It does allow that.  Perhaps I'm not being clear.

Consider this example:

	extern int a, b, c;

	int sum (void)
	{
	  return a + b + c;
	}

With my patches it generates:

	sum:
		pld 8,a@got@pcrel
	.Lpcrel1:

		pld 10,b@got@pcrel
	.Lpcrel2:

		pld 9,c@got@pcrel
	.Lpcrel3:

		.reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8)
		lwz 3,0(8)

		.reloc .Lpcrel2-8,R_PPC64_PCREL_OPT,.-(.Lpcrel2-8)
		lwz 10,0(10)

		.reloc .Lpcrel3-8,R_PPC64_PCREL_OPT,.-(.Lpcrel3-8)
		lwz 9,0(9)

		add 3,3,10
		add 3,3,9
		extsw 3,3
		blr


Thus it separates the load of the 3 external addresses from the actual LWZ used
to load the values.

For example, in a recent Spec 2017 build for power10, over all of the benchmarks:

    Total PCREL_OPTs found for load/store:			41,440
    Times the PLD/PLA was separated from the load/store:	17,893
    Times the PLD/PLA was adjacent to the load/store:		23,547

    Number of PCREL_OPT loads:					38,657
    Number of PCREL_OPT loads separated from the PLD:		15,768
    Number of PCREL_OPT loads adjancent to the PLD:		22,889

    Number of PCREL_OPT stores:					 2,783
    Number of PCREL_OPT stores separated from the PLD:		 2,125
    Number of PCREL_OPT stores adjancent to the PLD:		   658

Where it wins is if the external variable is in a shared library.  There the
PLD is in fact a load, and having some separation from the dependent load/store
helps.


> > In order to do this, the pass that converts the load address and load/store
> > must occur late in the compilation cycle.
> 
> That does not follow afaics.
> 
> > In particular, the second scheduler
> > pass will duplicate and optimize some of the references and it will produce an
> > invalid program.  In the past, Segher has said that we should be able to move
> > it earlier.
> 
> I said that you shouldn't require this to be the very last pass.  There
> is no reason for that, and that will not scale (what if a second pass
> shows up that also requires this!)

The patches I submitted don't require it to be the 'last' pass.  In fact, I put
it after sched2 because earlier versions of the patch could not be moved
earlier.  There are 11 passes after sched2 before final.

However, it turns out that in the last spin of the patches, I added the
necessary clobbers and such, so it can now go any where after register
allocation.  I built 3 versions of the compiler:

    The first version had the pass after sched2 (version in patches);
    The second version had the pass before sched2; (and)
    The third version had the pass immediately after reload.

I built Spec 2017 with the two compilers.  Unlike before, there were no linker
failures.  I also wrote a perl script to verify that each PCREL_OPT relocation
only targeted one PLD/PLA with one load or store.

> It also makes it impossible to do normal late optimisations on code
> produced here (optimisations like peephole, cprop_hardreg, dce).

Now, it can do those optimizations.

> I also said that you should use the DF framework, not parse all RTL by
> hand and getting it all wrong, as *everyone* does: this stuff is hard.

Bill has said he would look into helping convert it to DF format.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] Power10 PCREL_OPT support
  2020-08-23  0:05   ` Bill Schmidt
@ 2020-08-25  4:01     ` Michael Meissner
  2020-08-25  4:24       ` Bill Schmidt
  0 siblings, 1 reply; 12+ messages in thread
From: Michael Meissner @ 2020-08-25  4:01 UTC (permalink / raw)
  To: Bill Schmidt
  Cc: Segher Boessenkool, Michael Meissner, gcc-patches,
	David Edelsohn, Peter Bergner, Alan Modra

On Sat, Aug 22, 2020 at 07:05:51PM -0500, Bill Schmidt wrote:
> What is necessary in order to allow this optimization to occur
> earlier is to make this hidden dependency explicit.  When the
> relocation is inserted, we have to change the "pld" instruction to
> have a specific clobber of (in this case) r5, which represents what
> will happen if the linker makes the substitution.
> 
> I agree that it's too fragile to force this to be the last pass, so
> I think if Mike can look into introducing a clobber of the hard
> register when performing the optimization, that would at least allow
> us to move this anywhere after reload.
> 
> I don't immediately see a solution that works prior to register
> allocation because we basically are representing two potential
> starting points of a live range, only one of which will survive in
> the final code.  That is too ugly a problem to hand to the register
> allocator.

As I said in a private message, I have the appropriate clobbers and such
already.

Here is the program I used in my previous reply to Segher:

	extern int a, b, c;

	int sum (void)
	{
	  return a + b + c;
	}


Here is the RTL before the PCREL_OPT pass from sched2:

	;; Load the address of a into r8
	(insn:TI 5 13 6 2 (set (reg/f:DI 8 8 [123])
		(symbol_ref:DI ("a") [flags 0xc0]  <var_decl 0x7ff100832480 a>)) "foo02.c":5:12 722 {*pcrel_extern_addr}
	     (expr_list:REG_EQUIV (symbol_ref:DI ("a") [flags 0xc0]  <var_decl 0x7ff100832480 a>)
		(nil)))

	;; Load the address of b into r10
	(insn 6 5 10 2 (set (reg/f:DI 10 10 [124])
		(symbol_ref:DI ("b") [flags 0xc0]  <var_decl 0x7ff100832510 b>)) "foo02.c":5:12 722 {*pcrel_extern_addr}
	     (expr_list:REG_EQUIV (symbol_ref:DI ("b") [flags 0xc0]  <var_decl 0x7ff100832510 b>)
		(nil)))

	;; Load the address of c into r9
	(insn 10 6 7 2 (set (reg/f:DI 9 9 [128])
		(symbol_ref:DI ("c") [flags 0xc0]  <var_decl 0x7ff1008325a0 c>)) "foo02.c":5:16 722 {*pcrel_extern_addr}
	     (expr_list:REG_EQUIV (symbol_ref:DI ("c") [flags 0xc0]  <var_decl 0x7ff1008325a0 c>)
		(nil)))

	;; Load a's value into r3, using r8 as the base register
	(insn:TI 7 10 8 2 (set (reg:DI 3 3)
		(zero_extend:DI (mem/c:SI (reg/f:DI 8 8 [123]) [1 a+0 S4 A32]))) "foo02.c":5:12 16 {zero_extendsidi2}
	     (expr_list:REG_DEAD (reg/f:DI 8 8 [123])
		(nil)))

	;; Load b's value into r10, using r10 as the base register
	(insn 8 7 11 2 (set (reg:DI 10 10)
		(zero_extend:DI (mem/c:SI (reg/f:DI 10 10 [124]) [1 b+0 S4 A32]))) "foo02.c":5:12 16 {zero_extendsidi2}
	     (nil))

	;; Load c's value into r9, using r9 as the base register
	(insn 11 8 9 2 (set (reg:DI 9 9)
		(zero_extend:DI (mem/c:SI (reg/f:DI 9 9 [128]) [1 c+0 S4 A32]))) "foo02.c":5:16 16 {zero_extendsidi2}
	     (nil))

	;; Add a+b
	(insn:TI 9 11 12 2 (set (reg:SI 3 3 [125])
		(plus:SI (reg:SI 3 3 [orig:126 a ] [126])
		    (reg:SI 10 10 [orig:127 b ] [127]))) "foo02.c":5:12 65 {*addsi3}
	     (expr_list:REG_DEAD (reg:SI 10 10 [orig:127 b ] [127])
		(nil)))

	;; Add (a+b)+c
	(insn:TI 12 9 18 2 (set (reg:SI 3 3 [122])
		(plus:SI (reg:SI 3 3 [125])
		    (reg:SI 9 9 [orig:129 c ] [129]))) "foo02.c":5:16 65 {*addsi3}
	     (expr_list:REG_DEAD (reg:SI 9 9 [orig:129 c ] [129])
		(nil)))

	;; Sign extend
	(insn:TI 18 12 19 2 (set (reg/i:DI 3 3)
		(sign_extend:DI (reg:SI 3 3 [122]))) "foo02.c":6:1 31 {extendsidi2}
	     (nil))

	;; Return
	(insn 19 18 29 2 (use (reg/i:DI 3 3)) "foo02.c":6:1 -1
	     (nil))
	(note 29 19 25 2 NOTE_INSN_EPILOGUE_BEG)
	(jump_insn 25 29 26 2 (simple_return) "foo02.c":6:1 866 {simple_return}
	     (nil)
	 -> simple_return)


And here is the RTL after the PCREL_OPT:

	;; Load of address a into r8, a will be loaded into r3
	(insn:TI 5 13 6 2 (parallel [
		    (set (reg/f:DI 8 8 [123])
			(unspec:DI [
				(symbol_ref:DI ("a") [flags 0xc0]  <var_decl 0x7ff100832480 a>)
				(const_int 1 [0x1])
			    ] UNSPEC_PCREL_OPT_LD_ADDR))
		    (set (reg:DI 3 3)
			(unspec:DI [
				(const_int 0 [0])
			    ] UNSPEC_PCREL_OPT_LD_ADDR))
		]) "foo02.c":5:12 2198 {pcrel_opt_ld_addr}
	     (expr_list:REG_EQUIV (symbol_ref:DI ("a") [flags 0xc0]  <var_decl 0x7ff100832480 a>)
		(nil)))

	;; Load of address b into r10, which will be the same register b's value is loaded into
	(insn 6 5 10 2 (set (reg/f:DI 10 10 [124])
		(unspec:DI [
			(symbol_ref:DI ("b") [flags 0xc0]  <var_decl 0x7ff100832510 b>)
			(const_int 2 [0x2])
		    ] UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG)) "foo02.c":5:12 2199 {pcrel_opt_ld_addr_same_reg}
	     (expr_list:REG_EQUIV (symbol_ref:DI ("b") [flags 0xc0]  <var_decl 0x7ff100832510 b>)
		(nil)))

	;; Load of address c into r9, which will be the same register c's value is loaded into
	(insn 10 6 7 2 (set (reg/f:DI 9 9 [128])
		(unspec:DI [
			(symbol_ref:DI ("c") [flags 0xc0]  <var_decl 0x7ff1008325a0 c>)
			(const_int 3 [0x3])
		    ] UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG)) "foo02.c":5:16 2199 {pcrel_opt_ld_addr_same_reg}
	     (expr_list:REG_EQUIV (symbol_ref:DI ("c") [flags 0xc0]  <var_decl 0x7ff1008325a0 c>)
		(nil)))

	;; Load & zero extend the variable a into r3, using base register r8
	(insn:TI 7 10 8 2 (parallel [
		    (set (reg:DI 3 3)
			(zero_extend:DI (unspec:SI [
				    (mem/c:SI (reg/f:DI 8 8 [123]) [1 a+0 S4 A32])
				    (reg:DI 3 3)
				    (const_int 1 [0x1])
				] UNSPEC_PCREL_OPT_LD_RELOC)))
		    (clobber (reg/f:DI 8 8 [123]))
		]) "foo02.c":5:12 2207 {*pcrel_opt_ldsi_udi_gpr}
	     (expr_list:REG_DEAD (reg/f:DI 8 8 [123])
		(nil)))

	;; Load & zero extend the variable b into r10, using r10 as the base register
	(insn 8 7 11 2 (parallel [
		    (set (reg:DI 10 10)
			(zero_extend:DI (unspec:SI [
				    (mem/c:SI (reg/f:DI 10 10 [124]) [1 b+0 S4 A32])
				    (reg:DI 10 10)
				    (const_int 2 [0x2])
				] UNSPEC_PCREL_OPT_LD_RELOC)))
		    (clobber (scratch:DI))
		]) "foo02.c":5:12 2207 {*pcrel_opt_ldsi_udi_gpr}
	     (nil))

	;; Load and zero extend the variable c into r9, using r9 as the base register
	(insn 11 8 9 2 (parallel [
		    (set (reg:DI 9 9)
			(zero_extend:DI (unspec:SI [
				    (mem/c:SI (reg/f:DI 9 9 [128]) [1 c+0 S4 A32])
				    (reg:DI 9 9)
				    (const_int 3 [0x3])
				] UNSPEC_PCREL_OPT_LD_RELOC)))
		    (clobber (scratch:DI))
		]) "foo02.c":5:16 2207 {*pcrel_opt_ldsi_udi_gpr}
	     (nil))

	;; Add a+b
	(insn:TI 9 11 12 2 (set (reg:SI 3 3 [125])
		(plus:SI (reg:SI 3 3 [orig:126 a ] [126])
		    (reg:SI 10 10 [orig:127 b ] [127]))) "foo02.c":5:12 65 {*addsi3}
	     (expr_list:REG_DEAD (reg:SI 10 10 [orig:127 b ] [127])
		(nil)))

	;; Add (a+b)+c
	(insn:TI 12 9 18 2 (set (reg:SI 3 3 [122])
		(plus:SI (reg:SI 3 3 [125])
		    (reg:SI 9 9 [orig:129 c ] [129]))) "foo02.c":5:16 65 {*addsi3}
	     (expr_list:REG_DEAD (reg:SI 9 9 [orig:129 c ] [129])
		(nil)))

	;; Sign extend the result
	(insn:TI 18 12 19 2 (set (reg/i:DI 3 3)
		(sign_extend:DI (reg:SI 3 3 [122]))) "foo02.c":6:1 31 {extendsidi2}
	     (nil))

	;; Return
	(insn 19 18 29 2 (use (reg/i:DI 3 3)) "foo02.c":6:1 -1
	     (nil))
	(note 29 19 25 2 NOTE_INSN_EPILOGUE_BEG)
	(jump_insn 25 29 26 2 (simple_return) "foo02.c":6:1 866 {simple_return}
	     (nil)
	 -> simple_return)

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] Power10 PCREL_OPT support
  2020-08-25  3:48   ` Michael Meissner
@ 2020-08-25  4:09     ` Michael Meissner
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Meissner @ 2020-08-25  4:09 UTC (permalink / raw)
  To: Michael Meissner, Segher Boessenkool, gcc-patches,
	David Edelsohn, Bill Schmidt, Peter Bergner, Alan Modra

I forgot to mention that comparing the three tests of placement of the
PCREL_OPT pass:

Having the pass after sched2 generated the same number of PCREL_OPT relocations
as having the pass immediately after reload.

But having the pass just before sched2 resulted in some more PCREL_OPT
relocations.  Mostly the number of instructions was the same, but a few
benchmarks had a few more 4-byte instructions generated.  It looks like those
new PCREL_OPT relocations replaced a few places where we loaded the address,
but did not do a PCREL_OPT optimization.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/3] Power10 PCREL_OPT support
  2020-08-25  4:01     ` Michael Meissner
@ 2020-08-25  4:24       ` Bill Schmidt
  0 siblings, 0 replies; 12+ messages in thread
From: Bill Schmidt @ 2020-08-25  4:24 UTC (permalink / raw)
  To: Michael Meissner, Segher Boessenkool, gcc-patches,
	David Edelsohn, Peter Bergner, Alan Modra

On 8/24/20 11:01 PM, Michael Meissner wrote:
> On Sat, Aug 22, 2020 at 07:05:51PM -0500, Bill Schmidt wrote:
>> What is necessary in order to allow this optimization to occur
>> earlier is to make this hidden dependency explicit.  When the
>> relocation is inserted, we have to change the "pld" instruction to
>> have a specific clobber of (in this case) r5, which represents what
>> will happen if the linker makes the substitution.
>>
>> I agree that it's too fragile to force this to be the last pass, so
>> I think if Mike can look into introducing a clobber of the hard
>> register when performing the optimization, that would at least allow
>> us to move this anywhere after reload.
>>
>> I don't immediately see a solution that works prior to register
>> allocation because we basically are representing two potential
>> starting points of a live range, only one of which will survive in
>> the final code.  That is too ugly a problem to hand to the register
>> allocator.
> As I said in a private message, I have the appropriate clobbers and such
> already.

Great, thanks!  I had failed to understand this was there now -- excellent.

I started cobbling up some code to use the du-chains to simplify things 
a little (but only a little).  Dealing with intervening loads and stores 
still requires a little mess.  I'll send you something in the next day 
or two, I hope.

Bill

>
> Here is the program I used in my previous reply to Segher:
>
> 	extern int a, b, c;
>
> 	int sum (void)
> 	{
> 	  return a + b + c;
> 	}
>
>
> Here is the RTL before the PCREL_OPT pass from sched2:
>
> 	;; Load the address of a into r8
> 	(insn:TI 5 13 6 2 (set (reg/f:DI 8 8 [123])
> 		(symbol_ref:DI ("a") [flags 0xc0]  <var_decl 0x7ff100832480 a>)) "foo02.c":5:12 722 {*pcrel_extern_addr}
> 	     (expr_list:REG_EQUIV (symbol_ref:DI ("a") [flags 0xc0]  <var_decl 0x7ff100832480 a>)
> 		(nil)))
>
> 	;; Load the address of b into r10
> 	(insn 6 5 10 2 (set (reg/f:DI 10 10 [124])
> 		(symbol_ref:DI ("b") [flags 0xc0]  <var_decl 0x7ff100832510 b>)) "foo02.c":5:12 722 {*pcrel_extern_addr}
> 	     (expr_list:REG_EQUIV (symbol_ref:DI ("b") [flags 0xc0]  <var_decl 0x7ff100832510 b>)
> 		(nil)))
>
> 	;; Load the address of c into r9
> 	(insn 10 6 7 2 (set (reg/f:DI 9 9 [128])
> 		(symbol_ref:DI ("c") [flags 0xc0]  <var_decl 0x7ff1008325a0 c>)) "foo02.c":5:16 722 {*pcrel_extern_addr}
> 	     (expr_list:REG_EQUIV (symbol_ref:DI ("c") [flags 0xc0]  <var_decl 0x7ff1008325a0 c>)
> 		(nil)))
>
> 	;; Load a's value into r3, using r8 as the base register
> 	(insn:TI 7 10 8 2 (set (reg:DI 3 3)
> 		(zero_extend:DI (mem/c:SI (reg/f:DI 8 8 [123]) [1 a+0 S4 A32]))) "foo02.c":5:12 16 {zero_extendsidi2}
> 	     (expr_list:REG_DEAD (reg/f:DI 8 8 [123])
> 		(nil)))
>
> 	;; Load b's value into r10, using r10 as the base register
> 	(insn 8 7 11 2 (set (reg:DI 10 10)
> 		(zero_extend:DI (mem/c:SI (reg/f:DI 10 10 [124]) [1 b+0 S4 A32]))) "foo02.c":5:12 16 {zero_extendsidi2}
> 	     (nil))
>
> 	;; Load c's value into r9, using r9 as the base register
> 	(insn 11 8 9 2 (set (reg:DI 9 9)
> 		(zero_extend:DI (mem/c:SI (reg/f:DI 9 9 [128]) [1 c+0 S4 A32]))) "foo02.c":5:16 16 {zero_extendsidi2}
> 	     (nil))
>
> 	;; Add a+b
> 	(insn:TI 9 11 12 2 (set (reg:SI 3 3 [125])
> 		(plus:SI (reg:SI 3 3 [orig:126 a ] [126])
> 		    (reg:SI 10 10 [orig:127 b ] [127]))) "foo02.c":5:12 65 {*addsi3}
> 	     (expr_list:REG_DEAD (reg:SI 10 10 [orig:127 b ] [127])
> 		(nil)))
>
> 	;; Add (a+b)+c
> 	(insn:TI 12 9 18 2 (set (reg:SI 3 3 [122])
> 		(plus:SI (reg:SI 3 3 [125])
> 		    (reg:SI 9 9 [orig:129 c ] [129]))) "foo02.c":5:16 65 {*addsi3}
> 	     (expr_list:REG_DEAD (reg:SI 9 9 [orig:129 c ] [129])
> 		(nil)))
>
> 	;; Sign extend
> 	(insn:TI 18 12 19 2 (set (reg/i:DI 3 3)
> 		(sign_extend:DI (reg:SI 3 3 [122]))) "foo02.c":6:1 31 {extendsidi2}
> 	     (nil))
>
> 	;; Return
> 	(insn 19 18 29 2 (use (reg/i:DI 3 3)) "foo02.c":6:1 -1
> 	     (nil))
> 	(note 29 19 25 2 NOTE_INSN_EPILOGUE_BEG)
> 	(jump_insn 25 29 26 2 (simple_return) "foo02.c":6:1 866 {simple_return}
> 	     (nil)
> 	 -> simple_return)
>
>
> And here is the RTL after the PCREL_OPT:
>
> 	;; Load of address a into r8, a will be loaded into r3
> 	(insn:TI 5 13 6 2 (parallel [
> 		    (set (reg/f:DI 8 8 [123])
> 			(unspec:DI [
> 				(symbol_ref:DI ("a") [flags 0xc0]  <var_decl 0x7ff100832480 a>)
> 				(const_int 1 [0x1])
> 			    ] UNSPEC_PCREL_OPT_LD_ADDR))
> 		    (set (reg:DI 3 3)
> 			(unspec:DI [
> 				(const_int 0 [0])
> 			    ] UNSPEC_PCREL_OPT_LD_ADDR))
> 		]) "foo02.c":5:12 2198 {pcrel_opt_ld_addr}
> 	     (expr_list:REG_EQUIV (symbol_ref:DI ("a") [flags 0xc0]  <var_decl 0x7ff100832480 a>)
> 		(nil)))
>
> 	;; Load of address b into r10, which will be the same register b's value is loaded into
> 	(insn 6 5 10 2 (set (reg/f:DI 10 10 [124])
> 		(unspec:DI [
> 			(symbol_ref:DI ("b") [flags 0xc0]  <var_decl 0x7ff100832510 b>)
> 			(const_int 2 [0x2])
> 		    ] UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG)) "foo02.c":5:12 2199 {pcrel_opt_ld_addr_same_reg}
> 	     (expr_list:REG_EQUIV (symbol_ref:DI ("b") [flags 0xc0]  <var_decl 0x7ff100832510 b>)
> 		(nil)))
>
> 	;; Load of address c into r9, which will be the same register c's value is loaded into
> 	(insn 10 6 7 2 (set (reg/f:DI 9 9 [128])
> 		(unspec:DI [
> 			(symbol_ref:DI ("c") [flags 0xc0]  <var_decl 0x7ff1008325a0 c>)
> 			(const_int 3 [0x3])
> 		    ] UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG)) "foo02.c":5:16 2199 {pcrel_opt_ld_addr_same_reg}
> 	     (expr_list:REG_EQUIV (symbol_ref:DI ("c") [flags 0xc0]  <var_decl 0x7ff1008325a0 c>)
> 		(nil)))
>
> 	;; Load & zero extend the variable a into r3, using base register r8
> 	(insn:TI 7 10 8 2 (parallel [
> 		    (set (reg:DI 3 3)
> 			(zero_extend:DI (unspec:SI [
> 				    (mem/c:SI (reg/f:DI 8 8 [123]) [1 a+0 S4 A32])
> 				    (reg:DI 3 3)
> 				    (const_int 1 [0x1])
> 				] UNSPEC_PCREL_OPT_LD_RELOC)))
> 		    (clobber (reg/f:DI 8 8 [123]))
> 		]) "foo02.c":5:12 2207 {*pcrel_opt_ldsi_udi_gpr}
> 	     (expr_list:REG_DEAD (reg/f:DI 8 8 [123])
> 		(nil)))
>
> 	;; Load & zero extend the variable b into r10, using r10 as the base register
> 	(insn 8 7 11 2 (parallel [
> 		    (set (reg:DI 10 10)
> 			(zero_extend:DI (unspec:SI [
> 				    (mem/c:SI (reg/f:DI 10 10 [124]) [1 b+0 S4 A32])
> 				    (reg:DI 10 10)
> 				    (const_int 2 [0x2])
> 				] UNSPEC_PCREL_OPT_LD_RELOC)))
> 		    (clobber (scratch:DI))
> 		]) "foo02.c":5:12 2207 {*pcrel_opt_ldsi_udi_gpr}
> 	     (nil))
>
> 	;; Load and zero extend the variable c into r9, using r9 as the base register
> 	(insn 11 8 9 2 (parallel [
> 		    (set (reg:DI 9 9)
> 			(zero_extend:DI (unspec:SI [
> 				    (mem/c:SI (reg/f:DI 9 9 [128]) [1 c+0 S4 A32])
> 				    (reg:DI 9 9)
> 				    (const_int 3 [0x3])
> 				] UNSPEC_PCREL_OPT_LD_RELOC)))
> 		    (clobber (scratch:DI))
> 		]) "foo02.c":5:16 2207 {*pcrel_opt_ldsi_udi_gpr}
> 	     (nil))
>
> 	;; Add a+b
> 	(insn:TI 9 11 12 2 (set (reg:SI 3 3 [125])
> 		(plus:SI (reg:SI 3 3 [orig:126 a ] [126])
> 		    (reg:SI 10 10 [orig:127 b ] [127]))) "foo02.c":5:12 65 {*addsi3}
> 	     (expr_list:REG_DEAD (reg:SI 10 10 [orig:127 b ] [127])
> 		(nil)))
>
> 	;; Add (a+b)+c
> 	(insn:TI 12 9 18 2 (set (reg:SI 3 3 [122])
> 		(plus:SI (reg:SI 3 3 [125])
> 		    (reg:SI 9 9 [orig:129 c ] [129]))) "foo02.c":5:16 65 {*addsi3}
> 	     (expr_list:REG_DEAD (reg:SI 9 9 [orig:129 c ] [129])
> 		(nil)))
>
> 	;; Sign extend the result
> 	(insn:TI 18 12 19 2 (set (reg/i:DI 3 3)
> 		(sign_extend:DI (reg:SI 3 3 [122]))) "foo02.c":6:1 31 {extendsidi2}
> 	     (nil))
>
> 	;; Return
> 	(insn 19 18 29 2 (use (reg/i:DI 3 3)) "foo02.c":6:1 -1
> 	     (nil))
> 	(note 29 19 25 2 NOTE_INSN_EPILOGUE_BEG)
> 	(jump_insn 25 29 26 2 (simple_return) "foo02.c":6:1 866 {simple_return}
> 	     (nil)
> 	 -> simple_return)
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] Power10: Add PCREL_OPT load support
  2020-08-21  2:09   ` Segher Boessenkool
@ 2020-09-03 17:24     ` Michael Meissner
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Meissner @ 2020-09-03 17:24 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt,
	Peter Bergner, Alan Modra

On Thu, Aug 20, 2020 at 09:09:40PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Aug 18, 2020 at 02:34:01AM -0400, Michael Meissner wrote:
> > +// Maximum number of insns to scan between the load address and the load that
> 
> Please don't mix /* and // comments.  Just stick to /* comments, like
> all the rest of our backend?
> 
> > +const int MAX_PCREL_OPT_INSNS	= 10;
> 
> "static const", and not tab please (just a space).

With the switch to using flow control, this will go away.

> > +  // LWA is a DS format instruction, but LWZ is a D format instruction.  We use
> > +  // DImode for the mode to force checking whether the bottom 2 bits are 0.
> 
> How so?  Can't you just check if the resulting insn is accepted? That
> would be so much more robust (and can have a correct description more
> easily, too!)

Unfortunately no.  The prefixed form will be accepted, and the whole PCREL_OPT
optimization requires a non-prefixed instruction.

> > +  // However FPR and vector registers uses the LFIWAX instruction which is
> > +  // indexed only.
> 
> (vectors use lxsiwax)

Yes.

> > +  if (GET_CODE (mem) == SIGN_EXTEND && GET_MODE (XEXP (mem, 0)) == SImode)
> 
> You're checking here whether the address of the MEM is SImode.

Whoops.

> > +    {
> > +      if (!INT_REGNO_P (reg_regno))
> 
> That should use base_reg_operand instead?
> 
> > +  // The optimization will only work on non-prefixed offsettable loads.
> > +  rtx addr = XEXP (mem_inner, 0);
> > +  enum insn_form iform = address_to_insn_form (addr, mem_mode, non_prefixed);
> > +  if (iform != INSN_FORM_BASE_REG
> > +      && iform != INSN_FORM_D
> > +      && iform != INSN_FORM_DS
> > +      && iform != INSN_FORM_DQ)
> > +    return false;
> 
> Sounds like something for a utility function?  Do we use this elsewhere?

I will make a utility function.  But no, we currently do not use this anywhere
else.  It might be useful for fusion.

> > +  ++pcrel_opt_next_num;
> 
> Normal style is postfix increment.  Empty lines around this are nice, it
> indicates this is the point where we decided to do the thing.  Or add a
> comment even, maybe?
> 
> > +  unsigned int addr_regno = reg_or_subregno (addr_reg);
> > +  rtx label_num = GEN_INT (pcrel_opt_next_num);
> > +  rtx reg_di = gen_rtx_REG (DImode, reg_regno);
> > +
> > +  PATTERN (addr_insn)
> > +    = ((addr_regno != reg_regno)
> > +       ? gen_pcrel_opt_ld_addr (addr_reg, addr_symbol, label_num, reg_di)
> > +       : gen_pcrel_opt_ld_addr_same_reg (addr_reg, addr_symbol, label_num));
> 
> Please use if() instead of multi-line expressions.  The outer parens are
> unnecessary, too.

Ok.  However, the outer parens make it easier to format in emacs.

> > +  // Revalidate the insn, backing out of the optimization if the insn is not
> > +  // supported.
> 
> So the count will be incorrect then?

No, the count is correct, since we don't count the PCREL_OPTs until everything is validated.

> 
> > +  INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0);
> > +  if (INSN_CODE (addr_insn) < 0)
> > +    {
> > +      PATTERN (addr_insn) = addr_set;
> > +      INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0);
> > +      return false;
> > +    }
> 
> Can you use the normal undo mechanisms, instead?  apply_change_group
> etc.

I will think about it.

> > +static rtx_insn *
> > +next_active_insn_in_basic_block (rtx_insn *insn)
> > +{
> > +  insn = NEXT_INSN (insn);
> > +
> > +  while (insn != NULL_RTX)
> > +    {
> > +      /* If the basic block ends or there is a jump of some kind, exit the
> > +	 loop.  */
> > +      if (CALL_P (insn)
> > +	  || JUMP_P (insn)
> > +	  || JUMP_TABLE_DATA_P (insn)
> > +	  || LABEL_P (insn)
> > +	  || BARRIER_P (insn))
> > +	return NULL;
> > +
> > +      /* If this is a real insn, return it.  */
> > +      if (!insn->deleted ()
> > +	  && NONJUMP_INSN_P (insn)
> > +	  && GET_CODE (PATTERN (insn)) != USE
> > +	  && GET_CODE (PATTERN (insn)) != CLOBBER)
> > +	return insn;
> > +
> > +      /* Loop for USE, CLOBBER, DEBUG_INSN, NOTEs.  */
> > +      insn = NEXT_INSN (insn);
> > +    }
> > +
> > +  return NULL;
> > +}
> 
> There are things to do this.  Please don't write code manually parsing
> RTL unless you have to.

I will switch to using the DF flow in the next set of patches.

> > +// Validate that a load is actually a single instruction that can be optimized
> > +// with the PCREL_OPT optimization.
> > +
> > +static bool
> > +is_single_instruction (rtx_insn *insn, rtx reg)
> 
> Of course it is a single instruction!  A single RTL instruction...
> Clarify as "single machine instruction"?

My experience is some of the insns lie in terms of things like size.  The size
is set when it is split, but this pass may/may not run before those
instructions are split.  But I will change it to just testing if the length is
4, and hope for the best.

> > +{
> > +  if (!REG_P (reg) && !SUBREG_P (reg))
> > +    return false;
> 
> You need to check if is a subreg of reg, then.  There are subregs of
> other things.  Not that you should see those here, but still.

Good point, but in theory outside of register allocation, you should never see
it.

> 
> > +  // _Decimal128 and IBM extended double are always multiple instructions.
> > +  machine_mode mode = GET_MODE (reg);
> > +  if (mode == TFmode && !TARGET_IEEEQUAD)
> > +    return false;
> > +
> > +  if (mode == TDmode || mode == IFmode)
> > +    return false;
> 
> Don't we have a helper for this?  The ibm128 part at least.

Yes, but it doesn't cover TD.

> Maybe we should just have an attribute on the insns that *can* get this
> optimisation?
> 
> > +  return (base_reg_operand (XEXP (addr, 0), Pmode)
> > +	  && satisfies_constraint_I (XEXP (addr, 1)));
> 
> short_cint_operand.  But maybe just rs6000_legitimate_offset_address_p?
> 
> >  /* Flags that need to be turned off if -mno-power10.  */
> >  #define OTHER_POWER10_MASKS	(OPTION_MASK_MMA			\
> >  				 | OPTION_MASK_PCREL			\
> > +				 | OPTION_MASK_PCREL_OPT		\
> >  				 | OPTION_MASK_PREFIXED)
> 
> This isn't a CPU flag.  Instead, it is just an optimisation option.

But you want it turned off if somebody says
__attribute__((target("cpu=power8"))), which is what OTHER_POWER10_MASKS does.

> > @@ -8515,7 +8522,10 @@ rs6000_delegitimize_address (rtx orig_x)
> >  {
> >    rtx x, y, offset;
> >  
> > -  if (GET_CODE (orig_x) == UNSPEC && XINT (orig_x, 1) == UNSPEC_FUSION_GPR)
> > +  if (GET_CODE (orig_x) == UNSPEC
> > +      && (XINT (orig_x, 1) == UNSPEC_FUSION_GPR
> > +	  || XINT (orig_x, 1) == UNSPEC_PCREL_OPT_LD_ADDR
> > +	  || XINT (orig_x, 1) == UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG))
> >      orig_x = XVECEXP (orig_x, 0, 0);
> >  
> >    orig_x = delegitimize_mem_from_attrs (orig_x);
> 
> Why this?  It needs an explanation (in the code).

It is needed in some cases to get back the original SYMBOL_REF (particularly
for debugging).

> > @@ -13197,6 +13207,19 @@ print_operand (FILE *file, rtx x, int code)
> >  	fprintf (file, "%d", 128 >> (REGNO (x) - CR0_REGNO));
> >        return;
> >  
> > +    case 'r':
> > +      /* X is a label number for the PCREL_OPT optimization.  Emit the .reloc
> > +	 to enable this optimization, unless the value is 0.  */
> > +      gcc_assert (CONST_INT_P (x));
> > +      if (UINTVAL (x) != 0)
> > +	{
> > +	  unsigned int label_num = UINTVAL (x);
> > +	  fprintf (file,
> > +		   ".reloc .Lpcrel%u-8,R_PPC64_PCREL_OPT,.-(.Lpcrel%u-8)\n\t",
> > +		   label_num, label_num);
> > +	}
> > +      return;
> 
> Don't eat output modifiers please.  We have only so few left, and we
> cannot recycle any more without pain.
> 
> I don't see why we cannot just do this in the normal output (C) code of
> the one or few insns that want this?

Well it is quite a few insns in pcrel-opt.md.  I will look at changing it.
Part of it is historical reasons in that it was needed in the old version of
the patches that could be turned off.

> >  ;; The ISA we implement.
> > -(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
> > +(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10,pcrel_opt"
> >    (const_string "any"))
> 
> No.  Please read the heading.

Ok, I will just use p10.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-09-03 17:24 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-18  6:31 [PATCH 0/3] Power10 PCREL_OPT support Michael Meissner
2020-08-18  6:34 ` [PATCH 1/3] Power10: Add PCREL_OPT load support Michael Meissner
2020-08-21  2:09   ` Segher Boessenkool
2020-09-03 17:24     ` Michael Meissner
2020-08-18  6:35 ` [PATCH 2/3] Power10: Add PCREL_OPT store support Michael Meissner
2020-08-18  6:37 ` [PATCH 3/3] Power10: Add tests for PCREL_OPT support Michael Meissner
2020-08-20 23:33 ` [PATCH 0/3] Power10 " Segher Boessenkool
2020-08-23  0:05   ` Bill Schmidt
2020-08-25  4:01     ` Michael Meissner
2020-08-25  4:24       ` Bill Schmidt
2020-08-25  3:48   ` Michael Meissner
2020-08-25  4:09     ` Michael Meissner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).