* [PATCH 0/3] Power10 PCREL_OPT support (September 5th 2020) @ 2020-09-05 4:27 Michael Meissner 2020-09-05 4:31 ` [PATCH 1/3] power10: Add PCREL_OPT load support Michael Meissner ` (4 more replies) 0 siblings, 5 replies; 9+ messages in thread From: Michael Meissner @ 2020-09-05 4:27 UTC (permalink / raw) To: gcc-patches, Segher Boessenkool, David Edelsohn, Michael Meissner, Bill Schmidt, Peter Bergner The ELF-v2 ISA 3.1 support for Power10 has relocations to optimize cases where the code is references an external variable in only one location. This patch is similar to the optimizations that the linker already does to optimize TOC accesses. This patch is a revision of the patches last submitted on August 18th, 2020. Compared to those patches, the current patches do: 1) The dataflow (DF) functions are used to find the single defs and single ref to that def in the same basic block. Less parsing of the RTL is being done. 2) I switched to use the validate_change and apply_change_group functions instead of manually replacing the PATTERN if the value did not pan out. I tested this by temporarily disabling some of the patterns, and validated the the results were undone (and then I re-instated the changes to allow these patches to do the work). 3) I tried to clean up the comments, and added some more comments in places. 4) I verified that the patches could be moved earlier, and these patches move the code just before the 2nd scheduler pass (previously it was after the 2nd scheduler pass). I did try out an earlier set of patches, and I could move the pass to be just after the reload pass. In building Spec 2017, I noticed that moving the changes before the 2nd scheduler pass caught a for more cases than moving it after the register allocation or after the 2nd scheduler pass. 5) I removed the '%r' case to print_operand, and instead used a common function to create the .reloc. I will be submitting 3 patches as follow-ups to this message: * The first patch adds support for PCREL_OPT loads; * The second patch adds support for PCREL_OPT stores; (and) * The third patch adds the tests. I have built the compiler with/without the patches, and there were no regressions in the testsuite. Can I check these patches into the master branch? I do not antipate needing to backport these changes to GCC 10.3. If the program is compiled to be the main program, and the variable is defined in the main program, these relocations will convert loading up the address of the external variable and then doing a load or store using that address to be doing the prefixed load or store directly and converting the second instruction into a NOP. For example, consider the following program: extern int ext_variable; int ret_var (void) { return ext_variable; } void store_var (int i) { ext_variable = i; } Currently on power10, the compiler compiles this as: ret_var: pld 9,ext_variable@got@pcrel lwa 3,0(9) blr store_var: pld 9,ext_variable@got@pcrel stw 3,0(9) blr That is, it loads up the address of 'ext_variable' from the GOT table into register r9, and then uses r9 as a base register to reference the actual variable. The linker does optimize the case where you are compiling the main program, and the variable is also defined in the main program to be: ret_var: pla 9,ext_variable lwa 3,0(9) blr store_var: pla 9,ext_variable stw 3,0(9) blr These patches generate: ret_var: pld 9,ext_variable@got@pcrel .Lpcrel1: .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8) lwa 3,0(9) blr store_var: pld 9,ext_variable@got@pcrel .Lpcrel2: .reloc .Lpcrel2-8,R_PPC64_PCREL_OPT,.-(.Lpcrel2-8) stw 3,0(9) blr Note, the label for locating the PLD occurs after the PLD and not before it. This is so that if the assembler adds a NOP in front of the PLD to align it, the relocations will still work. If the linker can, it will convert the code into: ret_var: plwa 3,ext_variable@pcrel(0),1 nop blr store_var: pstw 3,ext_variable@pcrel(0),1 nop blr These patches allow the load of the address to not be physically adjacent to the actual load or store, which should allow for better code. For loads, there must no references to the register that is being loaded between the PLD and the actual load. For stores, it becomes a little trickier, in that the register being stored must be live at the time the PLD instruction is done, and it must continue to be live and unmodified between the PLD and the store. For both loads and stores, there must be only one reference to the address being loaded into a base register, and that base register must die at the point of the load/store. For reference, here is what the current compiler generates for a medium code model system targeting power9 with the TOC support: .section ".toc","aw" .LC0: .quad ext_variable .section ".text" ret_var: .LCF0: 0: addis 2,12,.TOC.-.LCF0@ha addi 2,2,.TOC.-.LCF0@l .localentry ret_var,.-ret_var addis 9,2,.LC0@toc@ha ld 9,.LC0@toc@l(9) lwa 3,0(9) blr .section ".toc","aw" .set .LC1,.LC0 .section ".text" store_var: .LCF1: 0: addis 2,12,.TOC.-.LCF1@ha addi 2,2,.TOC.-.LCF1@l .localentry store_var,.-store_var addis 9,2,.LC1@toc@ha ld 9,.LC1@toc@l(9) stw 3,0(9) blr And the linker optimizes this to: ret_var: lis 2,.TOC@ha addi 2,2,.TOC@l .localentry ret_var,.-ret_var nop ; addis eliminated due to small TOC addi 9,2,<offset> ; ld converted into addi lwa 3,0(9) ; actual load store_var: lis 2,.TOC@ha addi 2,2,.TOC@l .localentry store_var,.-store_var nop ; addis eliminated due to small TOC addi 9,2,<offset> ; ld converted into addi stw 3,0(9) ; actual store -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.ibm.com, phone: +1 (978) 899-4797 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/3] power10: Add PCREL_OPT load support. 2020-09-05 4:27 [PATCH 0/3] Power10 PCREL_OPT support (September 5th 2020) Michael Meissner @ 2020-09-05 4:31 ` Michael Meissner 2020-09-05 4:35 ` [PATCH 2/3] power10: Add PCREL_OPT store support Michael Meissner ` (3 subsequent siblings) 4 siblings, 0 replies; 9+ messages in thread From: Michael Meissner @ 2020-09-05 4:31 UTC (permalink / raw) To: Michael Meissner, gcc-patches, Segher Boessenkool, David Edelsohn, Bill Schmidt, Peter Bergner power10: Add PCREL_OPT load support. This patch adds support for optimizing power10 loads of an external variable to eliminate loading the address of the variable, and then doing a subsequent load using that address. The next patch will add the support for optimizing power10 stores to an external variable. The third patch will add the test suite for these patches. I have built compilers with and without these set of 3 patches doing a bootstrap build and make check. There were no regressions, and the new tests passed. Can I check these patches into the master branch for GCC? Because this is new functionality, I do not intend to back port these patches to GCC 10 at this time. gcc/ 2020-09-05 Michael Meissner <meissner@linux.ibm.com> * config.gcc (powerpc*-*-*): Add pcrel-opt.o. (rs6000*-*-*): Add pcrel-opt.o. * config/rs6000/pcrel-opt.c: New file. * config/rs6000/pcrel-opt.md: New file. * config/rs6000/predicates.md (d_form_memory): New predicate. * config/rs6000/rs6000-cpus.def (OTHER_POWER10_MASKS): Add -mpcrel-opt. (POWERPC_MASKS): Add -mpcrel-opt. * config/rs6000/rs6000-passes.def: Add comment for existing power8 swaps pass. Add PCREL_OPT pass. * config/rs6000/rs6000-protos.h (reg_to_non_prefixed): New declaration. (offsettable_non_prefixed_memory): New declaration. (output_pcrel_opt_reloc): New declaration. (make_pass_pcrel_opt): New declaration. * config/rs6000/rs6000.c (reg_to_non_prefixed): Make function globally visible. (rs6000_option_override_internal): Add support for -mpcrel-opt. (rs6000_delegitimize_address): Add support for the PCREL_OPT addresses. (rs6000_opt_masks): Add -mpcrel-opt. (offsettable_non_prefixed_memory): New helper function. (rs6000_asm_output_opcode): Reset prefixed flag after first use. (output_pcrel_opt_reloc): New function. * config/rs6000/rs6000.md (loads_extern_addr): New insn attribute. (pcrel_extern_addr): Set loads_extern_addr attribute. (toplevel): Include pcrel-opt.md. * config/rs6000/rs6000.opt (-mpcrel-opt): New option. * config/rs6000/t-rs6000 (pcrel-opt.o): Add build rules. (MD_INCLUDES): Add pcrel-opt.md --- gcc/config.gcc | 6 +- gcc/config/rs6000/pcrel-opt.c | 590 ++++++++++++++++++++++++++++ gcc/config/rs6000/pcrel-opt.md | 256 ++++++++++++ gcc/config/rs6000/predicates.md | 23 ++ gcc/config/rs6000/rs6000-cpus.def | 2 + gcc/config/rs6000/rs6000-passes.def | 8 + gcc/config/rs6000/rs6000-protos.h | 4 + gcc/config/rs6000/rs6000.c | 105 ++++- gcc/config/rs6000/rs6000.md | 8 +- gcc/config/rs6000/rs6000.opt | 4 + gcc/config/rs6000/t-rs6000 | 7 +- 11 files changed, 1004 insertions(+), 9 deletions(-) create mode 100644 gcc/config/rs6000/pcrel-opt.c create mode 100644 gcc/config/rs6000/pcrel-opt.md diff --git a/gcc/config.gcc b/gcc/config.gcc index 797f0ad5edd..c677e4145e3 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -505,7 +505,7 @@ or1k*-*-*) ;; powerpc*-*-*) cpu_type=rs6000 - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o" + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o pcrel-opt.o" extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h" extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h" extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h" @@ -520,6 +520,7 @@ powerpc*-*-*) esac extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt" target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c \$(srcdir)/config/rs6000/rs6000-call.c" + target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/pcrel-opt.c" ;; pru-*-*) cpu_type=pru @@ -531,8 +532,9 @@ riscv*) ;; rs6000*-*-*) extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt" - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o" + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o pcrel-opt.o" target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c \$(srcdir)/config/rs6000/rs6000-call.c" + target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/pcrel-opt.c" ;; sparc*-*-*) cpu_type=sparc diff --git a/gcc/config/rs6000/pcrel-opt.c b/gcc/config/rs6000/pcrel-opt.c new file mode 100644 index 00000000000..f831853c90b --- /dev/null +++ b/gcc/config/rs6000/pcrel-opt.c @@ -0,0 +1,590 @@ +/* Subroutines used support the pc-relative linker optimization. + Copyright (C) 2020 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 3, or (at your + option) any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + <http://www.gnu.org/licenses/>. */ + +/* This file implements a RTL pass that looks for pc-relative loads of the + address of an external variable using the PCREL_GOT relocation and a single + load that uses that external address. If that is found we create the + PCREL_OPT relocation to possibly convert: + + pld addr_reg,var@pcrel@got + + <possibly other insns that do not use 'addr_reg' or 'data_reg'> + + lwz data_reg,0(addr_reg) + + into: + + plwz data_reg,var@pcrel + + <possibly other insns that do not use 'addr_reg' or 'data_reg'> + + nop + + If the variable is not defined in the main program or the code using it is + not in the main program, the linker put the address in the .got section and + do: + + .section .got + .Lvar_got: + .dword var + + .section .text + pld addr_reg,.Lvar_got@pcrel + + <possibly other insns that do not use 'addr_reg' or 'data_reg'> + + lwz data_reg,0(addr_reg) + + We only look for a single usage in the basic block where the external + address is loaded. Multiple uses or references in another basic block will + force us to not use the PCREL_OPT relocation. */ + +#define IN_TARGET_CODE 1 + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "rtl.h" +#include "tree.h" +#include "memmodel.h" +#include "expmed.h" +#include "optabs.h" +#include "recog.h" +#include "df.h" +#include "tm_p.h" +#include "ira.h" +#include "print-tree.h" +#include "varasm.h" +#include "explow.h" +#include "expr.h" +#include "output.h" +#include "tree-pass.h" +#include "rtx-vector-builder.h" +#include "print-rtl.h" +#include "insn-attr.h" +#include "insn-codes.h" + +/* Various counters. */ +static struct { + unsigned long extern_addrs; + unsigned long loads; + unsigned long adjacent_loads; + unsigned long failed_loads; +} counters; + +/* Return a marker to identify the PCREL_OPT load address and load + instruction. We use a constant integer which is added to ".Lpcrel" to make + the label. */ + +static rtx +pcrel_opt_next_marker (void) +{ + static unsigned int pcrel_opt_next_num; + + pcrel_opt_next_num++; + return GEN_INT (pcrel_opt_next_num); +} +\f +/* Optimize a PC-relative load address to be used in a load. + + If the sequence of insns is safe to use the PCREL_OPT optimization (i.e. no + additional references to the address register, the address register dies at + the load, and no references to the load), convert insns of the form: + + (set (reg:DI addr) + (symbol_ref:DI "ext_symbol")) + + ... + + (set (reg:<MODE> value) + (mem:<MODE> (reg:DI addr))) + + into: + + (parallel [(set (reg:DI addr) + (unspec:<MODE> [(symbol_ref:DI "ext_symbol") + (const_int label_num) + (const_int 0)] + UNSPEC_PCREL_OPT_LD_ADDR)) + (set (reg:DI data) + (unspec:DI [(const_int 0)] + UNSPEC_PCREL_OPT_LD_ADDR))]) + + ... + + (parallel [(set (reg:<MODE>) + (unspec:<MODE> [(mem:<MODE> (reg:DI addr)) + (reg:DI data) + (const_int label_num)] + UNSPEC_PCREL_OPT_LD_RELOC)) + (clobber (reg:DI addr))]) + + If the register being loaded is the same register that was used to hold the + external address, we generate the following insn instead: + + (set (reg:DI data) + (unspec:DI [(symbol_ref:DI "ext_symbol") + (const_int label_num) + (const_int 1)] + UNSPEC_PCREL_OPT_LD_ADDR)) + + In the first insn, we set both the address of the external variable, and + mark that the variable being loaded both are created in that insn, and are + consumed in the second insn. It doesn't matter what mode the register that + we will ultimately do the load into, so we use DImode. We just need to mark + that both registers may be set in the first insn, and will be used in the + second insn. + + The UNSPEC_PCREL_OPT_LD_ADDR insn will generate the load address plus + a definition of a label (.Lpcrel<n>), while the UNSPEC_PCREL_OPT_LD_RELOC + insn will generate the .reloc to tell the linker to tie the load address and + load using that address together. + + pld b,ext_symbol@got@pcrel + .Lpcrel1: + + ... + + .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8) + lwz r,0(b) + + If ext_symbol is defined in another object file in the main program and we + are linking the main program, the linker will convert the above instructions + to: + + plwz r,ext_symbol@got@pcrel + + ... + + nop */ + +static void +pcrel_opt_load (rtx_insn *addr_insn, /* insn loading address. */ + rtx_insn *load_insn) /* insn using address. */ +{ + rtx addr_set = PATTERN (addr_insn); + gcc_assert (GET_CODE (addr_set) == SET); + + rtx addr_reg = SET_DEST (addr_set); + gcc_assert (base_reg_operand (addr_reg, Pmode)); + + rtx addr_symbol = SET_SRC (addr_set); + gcc_assert (pcrel_external_address (addr_symbol, Pmode)); + + rtx load_set = PATTERN (load_insn); + gcc_assert (GET_CODE (load_set) == SET); + + /* Make sure there are no references to the register being loaded inbetween + the two insns. */ + rtx reg = SET_DEST (load_set); + if (!register_operand (reg, GET_MODE (reg)) + || reg_used_between_p (reg, addr_insn, load_insn) + || reg_set_between_p (reg, addr_insn, load_insn)) + return; + + rtx mem = SET_SRC (load_set); + machine_mode reg_mode = GET_MODE (reg); + machine_mode mem_mode = GET_MODE (mem); + rtx mem_inner = mem; + unsigned int reg_regno = reg_or_subregno (reg); + + /* LWA is a DS format instruction, but LWZ is a D format instruction. We use + DImode for the mode to force checking whether the bottom 2 bits are 0. + However FPR and vector registers uses the LFIWAX/LXSIWAX instructions + which only have indexed forms. */ + if (GET_CODE (mem) == SIGN_EXTEND && GET_MODE (XEXP (mem, 0)) == SImode) + { + if (!INT_REGNO_P (reg_regno)) + return; + + mem_inner = XEXP (mem, 0); + mem_mode = DImode; + } + + else if (GET_CODE (mem) == SIGN_EXTEND + || GET_CODE (mem) == ZERO_EXTEND + || GET_CODE (mem) == FLOAT_EXTEND) + { + mem_inner = XEXP (mem, 0); + mem_mode = GET_MODE (mem_inner); + } + + if (!MEM_P (mem_inner)) + return; + + /* If the address isn't a non-prefixed offsettable instruction, we can't do + the optimization. */ + if (!offsettable_non_prefixed_memory (reg, mem_mode, mem_inner)) + return; + + /* Allocate a new PC-relative label, and update the load external address + insn. + + If the register being loaded is different from the address register, we + need to indicate both registers are set at the load of the address. + + (parallel [(set (reg load) + (unspec [(symbol_ref addr_symbol) + (const_int label_num)] + UNSPEC_PCREL_OPT_LD_ADDR)) + (set (reg addr) + (unspec [(const_int 0)] + UNSPEC_PCREL_OPT_LD_ADDR))]) + + If the register being loaded is the same as the address register, we use + an alternate form: + + (set (reg load) + (unspec [(symbol_ref addr_symbol) + (const_int label_num)] + UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG)) */ + unsigned int addr_regno = reg_or_subregno (addr_reg); + rtx label_num = pcrel_opt_next_marker (); + rtx reg_di = gen_rtx_REG (DImode, reg_regno); + rtx addr_pattern; + + /* Create the load address, either using the pattern with an explicit clobber + if the address register is not the same as the register being loaded, or + using the pattern that requires the address register to be the address + loaded. */ + if (addr_regno != reg_regno) + addr_pattern = gen_pcrel_opt_ld_addr (addr_reg, addr_symbol, label_num, + reg_di); + else + addr_pattern = gen_pcrel_opt_ld_addr_same_reg (addr_reg, addr_symbol, + label_num); + + validate_change (addr_insn, &PATTERN (addr_insn), addr_pattern, true); + + /* Update the load insn. If the mem had a sign/zero/float extend, add that + also after doing the UNSPEC. Add an explicit clobber of the external + address register just to make it clear that the address register dies. + + (parallel [(set (reg:<MODE> data) + (unspec:<MODE> [(mem (addr_reg) + (reg:DI data) + (const_int label_num)] + UNSPEC_PCREL_OPT_LD_RELOC)) + (clobber (reg:DI addr_reg))]) */ + rtvec v_load = gen_rtvec (3, mem_inner, reg_di, label_num); + rtx new_load = gen_rtx_UNSPEC (GET_MODE (mem_inner), v_load, + UNSPEC_PCREL_OPT_LD_RELOC); + + if (GET_CODE (mem) != GET_CODE (mem_inner)) + new_load = gen_rtx_fmt_e (GET_CODE (mem), reg_mode, new_load); + + rtx new_load_set = gen_rtx_SET (reg, new_load); + rtx load_clobber = gen_rtx_CLOBBER (VOIDmode, + (addr_regno == reg_regno + ? gen_rtx_SCRATCH (Pmode) + : addr_reg)); + rtx new_load_pattern + = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, new_load_set, load_clobber)); + + validate_change (load_insn, &PATTERN (load_insn), new_load_pattern, true); + + /* Note whether the changes were sucessful or not. */ + if (apply_change_group ()) + { + /* PCREL_OPT load optimization succeeded. */ + counters.loads++; + if (next_nonnote_insn (addr_insn) == load_insn) + counters.adjacent_loads++; + + if (dump_file) + fprintf (dump_file, + "PCREL_OPT load (addr insn = %d, use insn = %d).\n", + INSN_UID (addr_insn), + INSN_UID (load_insn)); + } + else + { + /* PCREL_OPT load optimization did not succeed. */ + counters.failed_loads++; + if (dump_file) + fprintf (dump_file, + "PCREL_OPT load failed (addr insn = %d, use insn = %d).\n", + INSN_UID (addr_insn), + INSN_UID (load_insn)); + } + + return; +} +\f +/* Given an insn with that loads up a base register with the address of an + external symbol, see if we can optimize it with the PCREL_OPT + optimization. */ + +static void +pcrel_opt_address (rtx_insn *addr_insn) +{ + counters.extern_addrs++; + + /* Do some basic validation. */ + rtx addr_set = PATTERN (addr_insn); + if (GET_CODE (addr_set) != SET) + return; + + rtx addr_reg = SET_DEST (addr_set); + rtx addr_symbol = SET_SRC (addr_set); + + if (!base_reg_operand (addr_reg, Pmode) + || !pcrel_external_address (addr_symbol, Pmode)) + return; + + /* The address register must have exactly one definition. */ + struct df_insn_info *insn_info = DF_INSN_INFO_GET (addr_insn); + if (!insn_info) + return; + + df_ref def = df_single_def (insn_info); + if (!def) + return; + + /* Is there only one use? */ + df_link *chain = DF_REF_CHAIN (def); + if (!chain || chain->next) + return; + + /* Get the insn of the possible load. */ + df_ref use = chain->ref; + if (!use) + return; + + rtx_insn *use_insn = DF_REF_INSN (use); + + /* The use instruction must be a single non-prefixed instruction. */ + if (get_attr_length (use_insn) != 4) + return; + + /* The address and the memory operation must be in the same basic block. */ + if (BLOCK_FOR_INSN (use_insn) != BLOCK_FOR_INSN (addr_insn)) + return; + + /* If this isn't a simple SET, skip doing the optimization. */ + if (GET_CODE (PATTERN (use_insn)) != SET) + return; + + /* Check the insns between loading the address and its use to classify what + type of insn it is. */ + bool load_insns_found = false; + bool store_insns_found = false; + bool do_pcrel_opt = true; + rtx_insn *insn; + rtx_insn *last_insn_in_bb = BB_END (BLOCK_FOR_INSN (use_insn)); + + + for (insn = NEXT_INSN (addr_insn); + insn != use_insn && do_pcrel_opt; + insn = NEXT_INSN (insn)) + { + /* If we see things like labels, calls, etc. don't do the PCREL_OPT + optimization. */ + if (!insn + || LABEL_P (insn) + || JUMP_P (insn) + || CALL_P (insn) + || BARRIER_P (insn)) + { + do_pcrel_opt = false; + break; + } + + /* For a normal insn, see if it is a load or store. */ + if (NONDEBUG_INSN_P (insn) + && GET_CODE (PATTERN (insn)) != USE + && GET_CODE (PATTERN (insn)) != CLOBBER) + { + switch (get_attr_type (insn)) + { + /* While load of the external address is a 'load' for scheduling + purposes, it should be safe to allow loading other external + addresses between the load of the external address we are + currently looking at and the load or store using that + address. */ + case TYPE_LOAD: + if (get_attr_loads_extern_addr (insn) != LOADS_EXTERN_ADDR_YES) + load_insns_found = true; + break; + + case TYPE_FPLOAD: + case TYPE_VECLOAD: + load_insns_found = true; + break; + + case TYPE_STORE: + case TYPE_FPSTORE: + case TYPE_VECSTORE: + store_insns_found = true; + break; + + /* Don't do the optimization through atomic operations. */ + case TYPE_LOAD_L: + case TYPE_STORE_C: + case TYPE_HTM: + case TYPE_HTMSIMPLE: + do_pcrel_opt = false; + break; + + default: + break; + } + } + + /* If this is the last insn in the basic block, and we haven't found the + load, exit. */ + if (insn == last_insn_in_bb) + { + do_pcrel_opt = false; + break; + } + } + + /* Don't do the optimization if something went wrong. */ + if (!do_pcrel_opt) + return; + + /* Is this a load? */ + switch (get_attr_type (use_insn)) + { + /* Don't do the PCREL_OPT load optimization if there was a store + operation. Perhaps the store might be to the global variable through + a pointer. */ + case TYPE_LOAD: + case TYPE_FPLOAD: + case TYPE_VECLOAD: + if (store_insns_found) + break; + + pcrel_opt_load (addr_insn, use_insn); + break; + + /* If the use is not a load, just skip the optimization. */ + default: + break; + } + + return; +} +\f +/* Optimize pcrel external variable references. */ + +static unsigned int +pcrel_opt_pass (function *fun) +{ + basic_block bb; + rtx_insn *insn, *curr_insn = 0; + + memset ((char *) &counters, '\0', sizeof (counters)); + + /* Dataflow analysis for use-def chains. */ + df_set_flags (DF_RD_PRUNE_DEAD_DEFS); + df_chain_add_problem (DF_DU_CHAIN); + df_note_add_problem (); + df_analyze (); + df_set_flags (DF_DEFER_INSN_RESCAN | DF_LR_RUN_DCE); + + if (dump_file) + fprintf (dump_file, "\n"); + + /* Look at each basic block to see if there is a load of an external + variable's external address, and a single load using that external + address. */ + FOR_ALL_BB_FN (bb, fun) + { + FOR_BB_INSNS_SAFE (bb, insn, curr_insn) + { + if (NONJUMP_INSN_P (insn) + && single_set (insn) + && get_attr_loads_extern_addr (insn) == LOADS_EXTERN_ADDR_YES) + pcrel_opt_address (insn); + } + } + + if (dump_file) + { + fprintf (dump_file, + "\n# of load(s) of an address of an external symbol = %lu\n", + counters.extern_addrs); + + fprintf (dump_file, "# of PCREL_OPT load(s) = %lu (adjacent %lu)\n", + counters.loads, counters.adjacent_loads); + + if (counters.failed_loads) + fprintf (dump_file, "# of failed PCREL_OPT load(s) = %lu\n", + counters.failed_loads); + + fprintf (dump_file, "\n"); + } + + df_remove_problem (df_chain); + df_process_deferred_rescans (); + df_set_flags (DF_RD_PRUNE_DEAD_DEFS | DF_LR_RUN_DCE); + df_analyze (); + return 0; +} +\f +/* Optimize pc-relative references for the new PCREL_OPT pass. */ +const pass_data pass_data_pcrel_opt = +{ + RTL_PASS, /* type. */ + "pcrel_opt", /* name. */ + OPTGROUP_NONE, /* optinfo_flags. */ + TV_NONE, /* tv_id. */ + 0, /* properties_required. */ + 0, /* properties_provided. */ + 0, /* properties_destroyed. */ + 0, /* todo_flags_start. */ + TODO_df_finish, /* todo_flags_finish. */ +}; + +/* Pass data structures. */ +class pcrel_opt : public rtl_opt_pass +{ +public: + pcrel_opt (gcc::context *ctxt) + : rtl_opt_pass (pass_data_pcrel_opt, ctxt) + {} + + ~pcrel_opt (void) + {} + + /* opt_pass methods: */ + virtual bool gate (function *) + { + return (TARGET_PCREL && TARGET_PCREL_OPT && optimize); + } + + virtual unsigned int execute (function *fun) + { + return pcrel_opt_pass (fun); + } + + opt_pass *clone () + { + return new pcrel_opt (m_ctxt); + } +}; + +rtl_opt_pass * +make_pass_pcrel_opt (gcc::context *ctxt) +{ + return new pcrel_opt (ctxt); +} diff --git a/gcc/config/rs6000/pcrel-opt.md b/gcc/config/rs6000/pcrel-opt.md new file mode 100644 index 00000000000..70d4d8911b6 --- /dev/null +++ b/gcc/config/rs6000/pcrel-opt.md @@ -0,0 +1,256 @@ +;; Machine description for the PCREL_OPT optimization. +;; Copyright (C) 2020 Free Software Foundation, Inc. +;; Contributed by Michael Meissner (meissner@linux.ibm.com) + +;; This file is part of GCC. + +;; GCC is free software; you can redistribute it and/or modify it +;; under the terms of the GNU General Public License as published +;; by the Free Software Foundation; either version 3, or (at your +;; option) any later version. + +;; GCC is distributed in the hope that it will be useful, but WITHOUT +;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY +;; or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public +;; License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; <http://www.gnu.org/licenses/>. + +;; Support for the PCREL_OPT optimization. PCREL_OPT looks for instances where +;; an external variable is used only once, either for reading or for writing. +;; +;; If we are optimizing a single read, normally the code would look like: +;; +;; (set (reg:DI <ptr>) +;; (symbol_ref:DI "<extern_addr>")) # <data> is currently dead +;; +;; ... # insns do not need to be adjacent +;; +;; (set (reg:SI <data>) +;; (mem:SI (reg:DI <xxx>))) # <ptr> dies with this insn +;; +;; We transform this into: +;; +;; (parallel [(set (reg:DI <ptr>) +;; (unspec:SI [(symbol_ref:DI <extern_addr>) +;; (const_int <marker>)] +;; UNSPEC_PCREL_OPT_LD_ADDR)) +;; (set (reg:DI <data>) +;; (unspec:DI [(const_int 0)] +;; UNSPEC_PCREL_OPT_LD_ADDR))]) +;; +;; ... +;; +;; (parallel [(set (reg:SI <data>) +;; (unspec:SI [(mem:SI (reg:DI <ptr>)) +;; (reg:DI <data>) +;; (const_int <marker>)] +;; UNSPEC_PCREL_OPT_LD)) +;; (clobber (reg:DI <ptr>))]) +;; +;; The marker is an integer constant that links the load of the external +;; address to the load of the actual variable. +;; +;; In the first insn, we set both the address of the external variable, and +;; mark that the variable being loaded both are created in that insn, and are +;; consumed in the second insn. It doesn't matter what mode the register that +;; we will ultimately do the load into, so we use DImode. We just need to mark +;; that both registers may be set in the first insn, and will be used in the +;; second insn. +;; +;; Since we use UNSPEC's and link both the the register holding the external +;; address and the value being loaded, it should prevent other passes from +;; modifying it. +;; +;; If the register being loaded is the same as the base register, we use an +;; alternate form of the insns. +;; +;; (set (reg:DI <data_ptr>) +;; (unspec:DI [(symbol_ref:DI <extern_addr>) +;; (const_int <marker>)] +;; UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG)) +;; +;; ... +;; +;; (parallel [(set (reg:SI <data>) +;; (unspec:SI [(mem:SI (reg:DI <ptr>)) +;; (reg:DI <data>) +;; (const_int <marker>)] +;; UNSPEC_PCREL_OPT_LD)) +;; (clobber (reg:DI <ptr>))]) + +(define_c_enum "unspec" + [UNSPEC_PCREL_OPT_LD_ADDR + UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG + UNSPEC_PCREL_OPT_LD_RELOC]) + +;; Modes that are supported for PCREL_OPT +(define_mode_iterator PO [QI HI SI DI TI SF DF KF + V1TI V2DI V4SI V8HI V16QI V2DF V4SF + (TF "TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD")]) + +;; Vector modes for PCREL_OPT +(define_mode_iterator PO_VECT [TI KF V1TI V2DI V4SI V8HI V16QI V2DF V4SF + (TF "TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD")]) + +;; Insn for loading the external address, where the register being loaded is not +;; the same as the register being loaded with the data. +(define_insn "pcrel_opt_ld_addr" + [(set (match_operand:DI 0 "base_reg_operand" "=&b,&b") + (unspec:DI [(match_operand:DI 1 "pcrel_external_address") + (match_operand 2 "const_int_operand" "n,n")] + UNSPEC_PCREL_OPT_LD_ADDR)) + (set (match_operand:DI 3 "gpc_reg_operand" "=r,wa") + (unspec:DI [(const_int 0)] + UNSPEC_PCREL_OPT_LD_ADDR))] + "TARGET_PCREL_OPT + && reg_or_subregno (operands[0]) != reg_or_subregno (operands[3])" + "ld %0,%a1\n.Lpcrel%2:" + [(set_attr "prefixed" "yes") + (set_attr "type" "load") + (set_attr "loads_extern_addr" "yes")]) + +;; Alternate form of loading up the external address that is the same register +;; as the final load. +(define_insn "pcrel_opt_ld_addr_same_reg" + [(set (match_operand:DI 0 "base_reg_operand" "=b") + (unspec:DI [(match_operand:DI 1 "pcrel_external_address") + (match_operand 2 "const_int_operand" "n")] + UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG))] + "TARGET_PCREL_OPT" + "ld %0,%a1\n.Lpcrel%2:" + [(set_attr "prefixed" "yes") + (set_attr "type" "load") + (set_attr "loads_extern_addr" "yes")]) + +;; PCREL_OPT modes that are optimized for loading or storing GPRs. +(define_mode_iterator PO_GPR [QI HI SI DI SF DF]) + +(define_mode_attr PO_GPR_LD [(QI "lbz") + (HI "lhz") + (SI "lwz") + (SF "lwz") + (DI "ld") + (DF "ld")]) + +;; PCREL_OPT load operation of GPRs. Operand 4 (the register used to hold the +;; address of the external symbol) is SCRATCH if the same register is used for +;; the normal load. +(define_insn "*pcrel_opt_ld<mode>_gpr" + [(parallel [(set (match_operand:PO_GPR 0 "int_reg_operand" "+r") + (unspec:PO_GPR [(match_operand:PO_GPR 1 "d_form_memory" "o") + (match_operand:DI 2 "int_reg_operand" "0") + (match_operand 3 "const_int_operand" "n")] + UNSPEC_PCREL_OPT_LD_RELOC)) + (clobber (match_scratch:DI 4 "=bX"))])] + "TARGET_PCREL_OPT + && (GET_CODE (operands[4]) == SCRATCH + || reg_mentioned_p (operands[4], operands[1]))" +{ + output_pcrel_opt_reloc (operands[3]); + return "<PO_GPR_LD> %0,%1"; +} + [(set_attr "type" "load")]) + +;; PCREL_OPT load with sign/zero extension +(define_insn "*pcrel_opt_ldsi_<u><mode>_gpr" + [(set (match_operand:EXTSI 0 "int_reg_operand" "+r") + (any_extend:EXTSI + (unspec:SI [(match_operand:SI 1 "d_form_memory" "o") + (match_operand:DI 2 "int_reg_operand" "0") + (match_operand 3 "const_int_operand" "n")] + UNSPEC_PCREL_OPT_LD_RELOC))) + (clobber (match_scratch:DI 4 "=bX"))] + "TARGET_PCREL_OPT" +{ + output_pcrel_opt_reloc (operands[3]); + return "lw<az> %0,%1"; +} + [(set_attr "type" "load")]) + +(define_insn "*pcrel_opt_ldhi_<u><mode>_gpr" + [(set (match_operand:EXTHI 0 "int_reg_operand" "+r") + (any_extend:EXTHI + (unspec:HI [(match_operand:HI 1 "d_form_memory" "o") + (match_operand:DI 2 "int_reg_operand" "0") + (match_operand 3 "const_int_operand" "n")] + UNSPEC_PCREL_OPT_LD_RELOC))) + (clobber (match_scratch:DI 4 "=bX"))] + "TARGET_PCREL_OPT" +{ + output_pcrel_opt_reloc (operands[3]); + return "lh<az> %0,%1"; +} + [(set_attr "type" "load")]) + +(define_insn "*pcrel_opt_ldqi_u<mode>_gpr" + [(set (match_operand:EXTQI 0 "int_reg_operand" "+r") + (zero_extend:EXTQI + (unspec:QI [(match_operand:QI 1 "d_form_memory" "o") + (match_operand:DI 2 "int_reg_operand" "0") + (match_operand 3 "const_int_operand" "n")] + UNSPEC_PCREL_OPT_LD_RELOC))) + (clobber (match_scratch:DI 4 "=bX"))] + "TARGET_PCREL_OPT" +{ + output_pcrel_opt_reloc (operands[3]); + return "lbz %0,%1"; +} + [(set_attr "type" "load")]) + +;; Scalar types that can be optimized by loading them into floating point +;; or Altivec registers. +(define_mode_iterator PO_FP [DI DF SF]) + +;; Load instructions to load up scalar floating point or 64-bit integer values +;; into floating point registers or Altivec registers. +(define_mode_attr PO_FPR_LD [(DI "lfd") (DF "lfd") (SF "lfs")]) +(define_mode_attr PO_AVX_LD [(DI "lxsd") (DF "lxsd") (SF "lxssp")]) + +;; PCREL_OPT load operation of scalar DF/DI/SF into vector registers. +(define_insn "*pcrel_opt_ld<mode>_vsx" + [(set (match_operand:PO_FP 0 "vsx_register_operand" "+d,v") + (unspec:PO_FP [(match_operand:PO_FP 1 "d_form_memory" "o,o") + (match_operand:DI 2 "vsx_register_operand" "0,0") + (match_operand 3 "const_int_operand" "n,n")] + UNSPEC_PCREL_OPT_LD_RELOC)) + (clobber (match_operand:DI 4 "base_reg_operand" "=b,b"))] + "TARGET_PCREL_OPT" +{ + output_pcrel_opt_reloc (operands[3]); + return which_alternative ? "<PO_AVX_LD> %0,%1" : "<PO_FPR_LD> %0,%1"; +} + [(set_attr "type" "fpload")]) + +;; PCREL_OPT optimization extending SFmode to DFmode via a load. +(define_insn "*pcrel_opt_ldsf_df" + [(set (match_operand:DF 0 "vsx_register_operand" "+d,v") + (float_extend:DF + (unspec:SF [(match_operand:SF 1 "d_form_memory" "o,o") + (match_operand:DI 2 "vsx_register_operand" "0,0") + (match_operand 3 "const_int_operand" "n,n")] + UNSPEC_PCREL_OPT_LD_RELOC))) + (clobber (match_operand:DI 4 "base_reg_operand" "=b,b"))] + "TARGET_PCREL_OPT" +{ + output_pcrel_opt_reloc (operands[3]); + return which_alternative ? "lxssp %0,%1" : "lfs %0,%1"; +} + [(set_attr "type" "fpload")]) + +;; PCREL_OPT load operation of vector/float128 types into vector registers. +(define_insn "*pcrel_opt_ld<mode>" + [(set (match_operand:PO_VECT 0 "vsx_register_operand" "+wa") + (unspec:PO_VECT [(match_operand:PO_VECT 1 "d_form_memory" "o") + (match_operand:DI 2 "vsx_register_operand" "0") + (match_operand 3 "const_int_operand" "n")] + UNSPEC_PCREL_OPT_LD_RELOC)) + (clobber (match_operand:DI 4 "base_reg_operand" "=b"))] + "TARGET_PCREL_OPT" +{ + output_pcrel_opt_reloc (operands[3]); + return "lxv %x0,%1"; +} + [(set_attr "type" "vecload")]) diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index 60b45601e9b..6e06571300a 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -1881,3 +1881,26 @@ (define_predicate "prefixed_memory" { return address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT); }) + +;; Return true if the operand is a valid memory operand with an offsettable +;; address that could be merged with the load of a PC-relative external address +;; with the PCREL_OPT optimization. We don't check here whether or not the +;; offset needs to be used in a DS-FORM (bottom 2 bits 0) or DQ-FORM (bottom 4 +;; bits 0) instruction. +(define_predicate "d_form_memory" + (match_code "mem") +{ + if (!memory_operand (op, mode)) + return false; + + rtx addr = XEXP (op, 0); + + if (REG_P (addr) || SUBREG_P (addr)) + return true; + + if (GET_CODE (addr) != PLUS) + return false; + + return (base_reg_operand (XEXP (addr, 0), Pmode) + && satisfies_constraint_I (XEXP (addr, 1))); +}) diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def index 8d2c1ffd6cf..d3f72d77a6e 100644 --- a/gcc/config/rs6000/rs6000-cpus.def +++ b/gcc/config/rs6000/rs6000-cpus.def @@ -78,6 +78,7 @@ /* Flags that need to be turned off if -mno-power10. */ #define OTHER_POWER10_MASKS (OPTION_MASK_MMA \ | OPTION_MASK_PCREL \ + | OPTION_MASK_PCREL_OPT \ | OPTION_MASK_PREFIXED) #define ISA_3_1_MASKS_SERVER (ISA_3_0_MASKS_SERVER \ @@ -142,6 +143,7 @@ | OPTION_MASK_P9_MISC \ | OPTION_MASK_P9_VECTOR \ | OPTION_MASK_PCREL \ + | OPTION_MASK_PCREL_OPT \ | OPTION_MASK_POPCNTB \ | OPTION_MASK_POPCNTD \ | OPTION_MASK_POWERPC64 \ diff --git a/gcc/config/rs6000/rs6000-passes.def b/gcc/config/rs6000/rs6000-passes.def index 5164c526e34..b62244b8df2 100644 --- a/gcc/config/rs6000/rs6000-passes.def +++ b/gcc/config/rs6000/rs6000-passes.def @@ -24,4 +24,12 @@ along with GCC; see the file COPYING3. If not see REPLACE_PASS (PASS, INSTANCE, TGT_PASS) */ + /* Pass to add the appropriate vector swaps on power8 little endian systems. + The power8 does not have instructions that automaticaly do the byte swaps + for loads and stores. */ INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps); + + /* Pass to do the PCREL_OPT optimization that combines the load of an + external symbol's address along with a single load or store using that + address as a base register. */ + INSERT_PASS_BEFORE (pass_sched2, 1, pass_pcrel_opt); diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h index 02e4d71922f..cfc832b10be 100644 --- a/gcc/config/rs6000/rs6000-protos.h +++ b/gcc/config/rs6000/rs6000-protos.h @@ -189,10 +189,13 @@ enum non_prefixed_form { extern enum insn_form address_to_insn_form (rtx, machine_mode, enum non_prefixed_form); +extern enum non_prefixed_form reg_to_non_prefixed (rtx, machine_mode); +extern bool offsettable_non_prefixed_memory (rtx, machine_mode, rtx); extern bool prefixed_load_p (rtx_insn *); extern bool prefixed_store_p (rtx_insn *); extern bool prefixed_paddi_p (rtx_insn *); extern void rs6000_asm_output_opcode (FILE *); +extern void output_pcrel_opt_reloc (rtx); extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int); extern int rs6000_adjust_insn_length (rtx_insn *, int); @@ -305,6 +308,7 @@ namespace gcc { class context; } class rtl_opt_pass; extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *); +extern rtl_opt_pass *make_pass_pcrel_opt (gcc::context *); extern bool rs6000_sum_of_two_registers_p (const_rtx expr); extern bool rs6000_quadword_masked_address_p (const_rtx exp); extern rtx rs6000_gen_lvx (enum machine_mode, rtx, rtx); diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 583aad56639..226b9ce67cb 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1175,7 +1175,6 @@ static bool rs6000_secondary_reload_move (enum rs6000_reg_type, machine_mode, secondary_reload_info *, bool); -static enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode); rtl_opt_pass *make_pass_analyze_swaps (gcc::context*); /* Hash table stuff for keeping track of TOC entries. */ @@ -4324,6 +4323,14 @@ rs6000_option_override_internal (bool global_init_p) rs6000_isa_flags &= ~OPTION_MASK_MMA; } + if (!TARGET_PCREL && TARGET_PCREL_OPT) + { + if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL_OPT) != 0) + error ("%qs requires %qs", "-mpcrel-opt", "-mpcrel"); + + rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT; + } + if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET) rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags); @@ -8523,8 +8530,46 @@ rs6000_delegitimize_address (rtx orig_x) { rtx x, y, offset; - if (GET_CODE (orig_x) == UNSPEC && XINT (orig_x, 1) == UNSPEC_FUSION_GPR) - orig_x = XVECEXP (orig_x, 0, 0); + /* UNSPEC_FUSION_GPR is created by the peephole2 for power8 fusion. It + encodes loading up the high part of the address of a TOC reference along + with a load of a GPR using the same base register used for the load. We + return the original SYMBOL_REF. + + (set (reg:INT1 <reg> + (unspec:INT1 [<combined-address>] UNSPEC_FUSION_GPR))) + + UNSPEC_PCREL_OPT_LD_ADDR is used by the power10 PCREL_OPT pass. These + UNSPECs include the external SYMBOL_REF along with the value being loaded. + We return the original SYMBOL_REF. + + (parallel [(set (reg:DI <base-reg>) + (unspec:DI [(symbol_ref <symbol>) + (const_int <marker>)] + UNSPEC_PCREL_OPT_LD_ADDR)) + (set (reg:DI <load-reg>) + (unspec:DI [(const_int 0)] + UNSPEC_PCREL_OPT_LD_ADDR))]) + + UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG is an alternative that is used if the + GPR being loaded is the same as the GPR used to hold the external address. + + (set (reg:DI <base-reg>) + (unspec:DI [(symbol_ref <symbol>) + (const_int <marker>)] + UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG)) */ + + if (GET_CODE (orig_x) == UNSPEC) + switch (XINT (orig_x, 1)) + { + case UNSPEC_FUSION_GPR: + case UNSPEC_PCREL_OPT_LD_ADDR: + case UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG: + orig_x = XVECEXP (orig_x, 0, 0); + break; + + default: + break; + } orig_x = delegitimize_mem_from_attrs (orig_x); @@ -23288,6 +23333,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] = { "mulhw", OPTION_MASK_MULHW, false, true }, { "multiple", OPTION_MASK_MULTIPLE, false, true }, { "pcrel", OPTION_MASK_PCREL, false, true }, + { "pcrel-opt", OPTION_MASK_PCREL_OPT, false, true }, { "popcntb", OPTION_MASK_POPCNTB, false, true }, { "popcntd", OPTION_MASK_POPCNTD, false, true }, { "power8-fusion", OPTION_MASK_P8_FUSION, false, true }, @@ -25369,6 +25415,32 @@ address_to_insn_form (rtx addr, return INSN_FORM_BAD; } +/* Return true if an REG with a given MODE is loaded from or stored into a MEM + location uses a non-prefixed offsettable address. This is used to validate + the load or store with the PCREL_OPT optimization to make sure it is an + instruction that can be optimized. + + We need to specify the MODE separately from the REG to allow for loads that + include zero/sign/float extension. */ + +bool +offsettable_non_prefixed_memory (rtx reg, machine_mode mode, rtx mem) +{ + /* If the instruction is indexed only like LFIWAX/LXSIWAX, it is not + offsettable. */ + enum non_prefixed_form non_prefixed = reg_to_non_prefixed (reg, mode); + if (non_prefixed == NON_PREFIXED_X) + return false; + + /* Check if this is a non-prefixed offsettable instruction. */ + rtx addr = XEXP (mem, 0); + enum insn_form iform = address_to_insn_form (addr, mode, non_prefixed); + return (iform == INSN_FORM_BASE_REG + || iform == INSN_FORM_D + || iform == INSN_FORM_DS + || iform == INSN_FORM_DQ); +} + /* Helper function to see if we're potentially looking at lfs/stfs. - PARALLEL containing a SET and a CLOBBER - stfs: @@ -25427,7 +25499,7 @@ is_lfs_stfs_insn (rtx_insn *insn) /* Helper function to take a REG and a MODE and turn it into the non-prefixed instruction format (D/DS/DQ) used for offset memory. */ -static enum non_prefixed_form +enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode) { /* If it isn't a register, use the defaults. */ @@ -25650,11 +25722,34 @@ void rs6000_asm_output_opcode (FILE *stream) { if (next_insn_prefixed_p) - fprintf (stream, "p"); + { + fprintf (stream, "p"); + + /* Reset flag in case there are separate insn lines in the sequence, so + the 'p' is only emited for the first line. This shows up when we are + doing the PCREL_OPT optimization, in that the label created with %r<n> + would have a leading 'p' printed. */ + next_insn_prefixed_p = false; + } return; } +/* Emit the relocation to tie the next instruction to a previous instruction + that loads up an external address. This is used to do the PCREL_OPT + optimization. Note, the label is generated after the PLD of the got + pc-relative address to allow for the assembler to insert NOPs before the PLD + instruction. The operand is a constant integer that is the label + number. */ + +void +output_pcrel_opt_reloc (rtx label_num) +{ + rtx operands[1] = { label_num }; + output_asm_insn (".reloc .Lpcrel%0-8,R_PPC64_PCREL_OPT,.-(.Lpcrel%0-8)", + operands); +} + /* Adjust the length of an INSN. LENGTH is the currently-computed length and should be adjusted to reflect any required changes. This macro is used when there is some systematic length adjustment required that would be difficult diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 147c635994c..21184932c19 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -292,6 +292,10 @@ (define_attr "prefixed" "no,yes" (const_string "no"))) +;; Whether an insn loads an external address for the PCREL_OPT optimizaton. +(define_attr "loads_extern_addr" "no,yes" + (const_string "no")) + ;; Return the number of real hardware instructions in a combined insn. If it ;; is 0, just use the length / 4. (define_attr "num_insns" "" (const_int 0)) @@ -10254,7 +10258,8 @@ (define_insn "*pcrel_extern_addr" "TARGET_PCREL" "ld %0,%a1" [(set_attr "prefixed" "yes") - (set_attr "type" "load")]) + (set_attr "type" "load") + (set_attr "loads_extern_addr" "yes")]) ;; TOC register handling. @@ -14928,3 +14933,4 @@ (define_insn "*cmpeqb_internal" (include "dfp.md") (include "crypto.md") (include "htm.md") +(include "pcrel-opt.md") diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt index b2a70e88ca8..d1719bfd2a0 100644 --- a/gcc/config/rs6000/rs6000.opt +++ b/gcc/config/rs6000/rs6000.opt @@ -586,6 +586,10 @@ mpcrel Target Report Mask(PCREL) Var(rs6000_isa_flags) Generate (do not generate) pc-relative memory addressing. +mpcrel-opt +Target Undocumented Mask(PCREL_OPT) Var(rs6000_isa_flags) +Generate (do not generate) pc-relative memory optimizations for externals. + mmma Target Report Mask(MMA) Var(rs6000_isa_flags) Generate (do not generate) MMA instructions. diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000 index 1ddb5729cb2..a617276484e 100644 --- a/gcc/config/rs6000/t-rs6000 +++ b/gcc/config/rs6000/t-rs6000 @@ -23,6 +23,10 @@ TM_H += $(srcdir)/config/rs6000/rs6000-cpus.def TM_H += $(srcdir)/config/rs6000/rs6000-modes.h PASSES_EXTRA += $(srcdir)/config/rs6000/rs6000-passes.def +pcrel-opt.o: $(srcdir)/config/rs6000/pcrel-opt.c + $(COMPILE) $< + $(POSTCOMPILE) + rs6000-c.o: $(srcdir)/config/rs6000/rs6000-c.c $(COMPILE) $< $(POSTCOMPILE) @@ -86,4 +90,5 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \ $(srcdir)/config/rs6000/mma.md \ $(srcdir)/config/rs6000/crypto.md \ $(srcdir)/config/rs6000/htm.md \ - $(srcdir)/config/rs6000/dfp.md + $(srcdir)/config/rs6000/dfp.md \ + $(srcdir)/config/rs6000/pcrel-opt.md -- 2.22.0 -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.ibm.com, phone: +1 (978) 899-4797 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/3] power10: Add PCREL_OPT store support. 2020-09-05 4:27 [PATCH 0/3] Power10 PCREL_OPT support (September 5th 2020) Michael Meissner 2020-09-05 4:31 ` [PATCH 1/3] power10: Add PCREL_OPT load support Michael Meissner @ 2020-09-05 4:35 ` Michael Meissner 2020-09-05 4:36 ` [PATCH 3/3] power10: Add tests for PCREL_OPT Michael Meissner ` (2 subsequent siblings) 4 siblings, 0 replies; 9+ messages in thread From: Michael Meissner @ 2020-09-05 4:35 UTC (permalink / raw) To: Michael Meissner, gcc-patches, Segher Boessenkool, David Edelsohn, Bill Schmidt, Peter Bergner power10: Add PCREL_OPT store support. This patch adds support for optimizing power10 stores to an external variable to eliminate loading the address of the variable, and then doing a subsequent load using that address. The previous patch added the support for optimizing power10 loads from an external variable. The third patch will add the test suite for these patches. I have built compilers with and without these set of 3 patches doing a bootstrap build and make check. There were no regressions, and the new tests passed. Can I check these patches into the master branch for GCC? Because this is new functionality, I do not intend to back port these patches to GCC 10 at this time. gcc/ 2020-09-04 Michael Meissner <meissner@linux.ibm.com> * config/rs6000/pcrel-opt.c (pcrel_opt_store): New function. (pcrel_opt_address): Add PCREL_OPT support for stores. (pcrel_opt_pass): Print PCREL_OPT store statistics. * config/rs6000/pcrel-opt.md (UNSPEC_PCREL_OPT_ST_ADDR): New unspec. (UNSPEC_PCREL_OPT_ST_RELOC): New unspec. (pcrel_opt_st_addr<mode>): New insns for PCREL_OPT store support. (pcrel_opt_st<mode>, QHSI iterator): New insns for PCREL_OPT store support. (pcrel_opt_stdi): New insn for PCREL_OPT store support. (pcrel_opt_stsf): New insn for PCREL_OPT store support. (pcrel_opt_stdf): New insns for PCREL_OPT store support. (pcrel_opt_st<mode>, PO_VECT iterator): New insns for PCREL_OPT store support. * config/rs6000/rs6000.c (rs6000_delegitimize_address): Add support for PCREL_OPT store. --- gcc/config/rs6000/pcrel-opt.c | 228 ++++++++++++++++++++++++++++++++- gcc/config/rs6000/pcrel-opt.md | 132 ++++++++++++++++++- gcc/config/rs6000/rs6000.c | 13 +- 3 files changed, 365 insertions(+), 8 deletions(-) diff --git a/gcc/config/rs6000/pcrel-opt.c b/gcc/config/rs6000/pcrel-opt.c index f831853c90b..291ee35b690 100644 --- a/gcc/config/rs6000/pcrel-opt.c +++ b/gcc/config/rs6000/pcrel-opt.c @@ -51,6 +51,43 @@ lwz data_reg,0(addr_reg) + We only look for a single usage in the basic block where the external + address is loaded. Multiple uses or references in another basic block will + force us to not use the PCREL_OPT relocation. + + We also optimize stores to the address of an external variable using the + PCREL_GOT relocation and a single store that uses that external address. If + that is found we create the PCREL_OPT relocation to possibly convert: + + pld addr_reg,var@pcrel@got + + <possibly other insns that do not use 'addr_reg' or 'data_reg'> + + stw data_reg,0(addr_reg) + + into: + + pstw data_reg,var@pcrel + + <possibly other insns that do not use 'addr_reg' or 'data_reg'> + + nop + + If the variable is not defined in the main program or the code using it is + not in the main program, the linker put the address in the .got section and + do: + + .section .got + .Lvar_got: + .dword var + + .section .text + pld addr_reg,.Lvar_got@pcrel + + <possibly other insns that do not use 'addr_reg' or 'data_reg'> + + stw data_reg,0(addr_reg) + We only look for a single usage in the basic block where the external address is loaded. Multiple uses or references in another basic block will force us to not use the PCREL_OPT relocation. */ @@ -87,9 +124,12 @@ static struct { unsigned long loads; unsigned long adjacent_loads; unsigned long failed_loads; + unsigned long stores; + unsigned long adjacent_stores; + unsigned long failed_stores; } counters; -/* Return a marker to identify the PCREL_OPT load address and load +/* Return a marker to identify the PCREL_OPT load address and load/store instruction. We use a constant integer which is added to ".Lpcrel" to make the label. */ @@ -328,6 +368,160 @@ pcrel_opt_load (rtx_insn *addr_insn, /* insn loading address. */ return; } \f +/* Optimize a PC-relative load address to be used in a store. + + If the sequence of insns is safe to use the PCREL_OPT optimization (i.e. no + additional references to the address register, the address register dies at + the load, and no references to the load), convert insns of the form: + + (set (reg:DI addr) + (symbol_ref:DI "ext_symbol")) + + ... + + (set (mem:<MODE> (reg:DI addr)) + (reg:<MODE> value)) + + into: + + (parallel [(set (reg:DI addr) + (unspec:DI [(symbol_ref:DI "ext_symbol") + (const_int label_num)] + UNSPEC_PCREL_OPT_ST_ADDR)) + (use (reg:<MODE> value))]) + + ... + + (parallel [(set (mem:<MODE> (reg:DI addr)) + (unspec:<MODE> [(reg:<MODE>) + (const_int label_num)] + UNSPEC_PCREL_OPT_ST_RELOC)) + (clobber (reg:DI addr))]) + + The UNSPEC_PCREL_OPT_ST_ADDR insn will generate the load address plus a + definition of a label (.Lpcrel<n>), while the UNSPEC_PCREL_OPT_ST_RELOC insn + will generate the .reloc to tell the linker to tie the load address and load + using that address together. + + pld b,ext_symbol@got@pcrel + .Lpcrel1: + + ... + + .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8) + stw r,0(b) + + If ext_symbol is defined in another object file in the main program and we + are linking the main program, the linker will convert the above instructions + to: + + pstwz r,ext_symbol@got@pcrel + + ... + + nop */ + +static void +pcrel_opt_store (rtx_insn *addr_insn, /* insn loading address. */ + rtx_insn *store_insn) /* insn using address. */ +{ + rtx addr_old_set = PATTERN (addr_insn); + gcc_assert (GET_CODE (addr_old_set) == SET); + + rtx addr_reg = SET_DEST (addr_old_set); + gcc_assert (base_reg_operand (addr_reg, Pmode)); + + rtx addr_symbol = SET_SRC (addr_old_set); + gcc_assert (pcrel_external_address (addr_symbol, Pmode)); + + rtx store_set = PATTERN (store_insn); + gcc_assert (GET_CODE (store_set) == SET); + + rtx mem = SET_DEST (store_set); + if (!MEM_P (mem)) + return; + + machine_mode mem_mode = GET_MODE (mem); + rtx reg = SET_SRC (store_set); + + /* Don't allow storing the address of the external variable. Make sure the + value being stored wasn't updated. */ + if (!register_operand (reg, GET_MODE (reg)) + && reg_or_subregno (reg) != reg_or_subregno (addr_reg) + && !reg_set_between_p (reg, addr_insn, store_insn)) + return; + + /* If the address isn't a non-prefixed offsettable instruction, we can't do + the optimization. */ + if (!offsettable_non_prefixed_memory (reg, mem_mode, mem)) + return; + + /* Allocate a new PC-relative label, and update the load address insn. + + (parallel [(set (reg addr) + (unspec [(symbol_ref symbol) + (const_int label_num)] + UNSPEC_PCREL_OPT_ST_ADDR)) + (use (reg store))]) */ + rtx label_num = pcrel_opt_next_marker (); + rtvec v_addr = gen_rtvec (2, addr_symbol, label_num); + rtx addr_unspec = gen_rtx_UNSPEC (Pmode, v_addr, + UNSPEC_PCREL_OPT_ST_ADDR); + rtx addr_new_set = gen_rtx_SET (addr_reg, addr_unspec); + rtx addr_use = gen_rtx_USE (VOIDmode, reg); + rtx addr_new_pattern + = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, addr_new_set, addr_use)); + + validate_change (addr_insn, &PATTERN (addr_insn), addr_new_pattern, true); + + /* Update the store insn. Add an explicit clobber of the external address + register just to be sure there are no additional uses of the address + register. + + (parallel [(set (mem (addr_reg) + (unspec:<MODE> [(reg) + (const_int label_num)] + UNSPEC_PCREL_OPT_ST_RELOC)) + (clobber (reg:DI addr_reg))]) */ + rtvec v_store = gen_rtvec (2, reg, label_num); + rtx new_store = gen_rtx_UNSPEC (mem_mode, v_store, + UNSPEC_PCREL_OPT_ST_RELOC); + + rtx new_store_set = gen_rtx_SET (mem, new_store); + rtx store_clobber = gen_rtx_CLOBBER (VOIDmode, addr_reg); + rtx new_store_pattern + = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, new_store_set, store_clobber)); + + validate_change (store_insn, &PATTERN (store_insn), new_store_pattern, true); + + /* Note whether changes succeeded or not. */ + if (apply_change_group ()) + { + /* PCREL_OPT store succeeded. */ + counters.stores++; + if (next_nonnote_insn (addr_insn) == store_insn) + counters.adjacent_stores++; + + if (dump_file) + fprintf (dump_file, + "PCREL_OPT store (addr insn = %d, use insn = %d).\n", + INSN_UID (addr_insn), + INSN_UID (store_insn)); + } + else + { + /* PCREL_OPT store failed. */ + counters.failed_stores++; + if (dump_file) + fprintf (dump_file, + "PCREL_OPT store failed (addr insn = %d, use insn = %d).\n", + INSN_UID (addr_insn), + INSN_UID (store_insn)); + } + + return; +} +\f /* Given an insn with that loads up a base register with the address of an external symbol, see if we can optimize it with the PCREL_OPT optimization. */ @@ -363,7 +557,7 @@ pcrel_opt_address (rtx_insn *addr_insn) if (!chain || chain->next) return; - /* Get the insn of the possible load. */ + /* Get the insn of the possible load or store. */ df_ref use = chain->ref; if (!use) return; @@ -449,7 +643,7 @@ pcrel_opt_address (rtx_insn *addr_insn) } /* If this is the last insn in the basic block, and we haven't found the - load, exit. */ + load or store, exit. */ if (insn == last_insn_in_bb) { do_pcrel_opt = false; @@ -461,7 +655,7 @@ pcrel_opt_address (rtx_insn *addr_insn) if (!do_pcrel_opt) return; - /* Is this a load? */ + /* Is this a load or a store? */ switch (get_attr_type (use_insn)) { /* Don't do the PCREL_OPT load optimization if there was a store @@ -476,7 +670,22 @@ pcrel_opt_address (rtx_insn *addr_insn) pcrel_opt_load (addr_insn, use_insn); break; - /* If the use is not a load, just skip the optimization. */ + /* Don't do the PCREL_OPT store optimization if there was a load or store + operation. For example, a load might be trying to load the value + being stored in between getting the address and doing the store. If + we do the PCREL_OPT store optimization, there is the potential for the + optimization to replace the load address with a store, which could + change the program. */ + case TYPE_STORE: + case TYPE_FPSTORE: + case TYPE_VECSTORE: + if (store_insns_found || load_insns_found) + break; + + pcrel_opt_store (addr_insn, use_insn); + break; + + /* If the use is not a load or store, just skip the optimization. */ default: break; } @@ -505,7 +714,7 @@ pcrel_opt_pass (function *fun) fprintf (dump_file, "\n"); /* Look at each basic block to see if there is a load of an external - variable's external address, and a single load using that external + variable's external address, and a single load/store using that external address. */ FOR_ALL_BB_FN (bb, fun) { @@ -531,6 +740,13 @@ pcrel_opt_pass (function *fun) fprintf (dump_file, "# of failed PCREL_OPT load(s) = %lu\n", counters.failed_loads); + fprintf (dump_file, "# of PCREL_OPT store(s) = %lu (adjacent %lu)\n", + counters.stores, counters.adjacent_stores); + + if (counters.failed_stores) + fprintf (dump_file, "# of failed PCREL_OPT store(s) = %lu\n", + counters.failed_stores); + fprintf (dump_file, "\n"); } diff --git a/gcc/config/rs6000/pcrel-opt.md b/gcc/config/rs6000/pcrel-opt.md index 70d4d8911b6..60f2faffbcb 100644 --- a/gcc/config/rs6000/pcrel-opt.md +++ b/gcc/config/rs6000/pcrel-opt.md @@ -84,7 +84,9 @@ (define_c_enum "unspec" [UNSPEC_PCREL_OPT_LD_ADDR UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG - UNSPEC_PCREL_OPT_LD_RELOC]) + UNSPEC_PCREL_OPT_LD_RELOC + UNSPEC_PCREL_OPT_ST_ADDR + UNSPEC_PCREL_OPT_ST_RELOC]) ;; Modes that are supported for PCREL_OPT (define_mode_iterator PO [QI HI SI DI TI SF DF KF @@ -254,3 +256,131 @@ (define_insn "*pcrel_opt_ld<mode>" return "lxv %x0,%1"; } [(set_attr "type" "vecload")]) + +\f +;; PCREL_OPT optimization for stores. We need to put the label after the PLD +;; instruction, because the assembler might insert a NOP before the PLD for +;; alignment. +;; +;; If we are optimizing a single write, normally the code would look like: +;; +;; (set (reg:DI <ptr>) +;; (symbol_ref:DI "<extern_addr>")) # <data> must be live here +;; +;; ... # insns do not need to be adjacent +;; +;; (set (mem:SI (reg:DI <xxx>)) +;; (reg:SI <data>)) # <ptr> dies with this insn +;; +;; We optimize this to be: +;; +;; (parallel [(set (reg:DI <ptr>) +;; (unspec:DI [(symbol_ref:DI "<extern_addr>") +;; (const_int <marker>)] +;; UNSPEC_PCREL_OPT_ST_ADDR)) +;; (use (reg:<MODE> <data>))]) +;; +;; ... # insns do not need to be adjacent +;; +;; (parallel [(set (mem:<MODE> (reg:DI <ptr>)) +;; (unspec:<MODE> [(reg:<MODE> <data>) +;; (const_int <marker>)] +;; UNSPEC_PCREL_OPT_ST_RELOC)) +;; (clobber (reg:DI <ptr>))]) + +(define_insn "*pcrel_opt_st_addr<mode>" + [(set (match_operand:DI 0 "gpc_reg_operand" "=b") + (unspec:DI [(match_operand:DI 1 "pcrel_external_address") + (match_operand 2 "const_int_operand" "n")] + UNSPEC_PCREL_OPT_ST_ADDR)) + (use (match_operand:PO 3 "gpc_reg_operand" "rwa"))] + "TARGET_PCREL_OPT" + "ld %0,%a1\n.Lpcrel%2:" + [(set_attr "prefixed" "yes") + (set_attr "type" "load") + (set_attr "loads_extern_addr" "yes")]) + +;; PCREL_OPT stores. +(define_insn "*pcrel_opt_st<mode>" + [(set (match_operand:QHSI 0 "d_form_memory" "=o") + (unspec:QHSI [(match_operand:QHSI 1 "gpc_reg_operand" "r") + (match_operand 2 "const_int_operand" "n")] + UNSPEC_PCREL_OPT_ST_RELOC)) + (clobber (match_operand:DI 3 "base_reg_operand" "=b"))] + "TARGET_PCREL_OPT" +{ + output_pcrel_opt_reloc (operands[2]); + return "st<wd> %1,%0"; +} + [(set_attr "type" "store")]) + +(define_insn "*pcrel_opt_stdi" + [(set (match_operand:DI 0 "d_form_memory" "=o,o,o") + (unspec:DI [(match_operand:DI 1 "gpc_reg_operand" "r,d,v") + (match_operand 2 "const_int_operand" "n,n,n")] + UNSPEC_PCREL_OPT_ST_RELOC)) + (clobber (match_operand:DI 3 "base_reg_operand" "=b,b,b"))] + "TARGET_PCREL_OPT && TARGET_POWERPC64" +{ + output_pcrel_opt_reloc (operands[2]); + switch (which_alternative) + { + case 0: return "std %1,%0"; + case 1: return "stfd %1,%0"; + case 2: return "stxsd %1,%0"; + default: gcc_unreachable (); + } +} + [(set_attr "type" "store,fpstore,fpstore")]) + +(define_insn "*pcrel_opt_stsf" + [(set (match_operand:SF 0 "d_form_memory" "=o,o,o") + (unspec:SF [(match_operand:SF 1 "gpc_reg_operand" "d,v,r") + (match_operand 2 "const_int_operand" "n,n,n")] + UNSPEC_PCREL_OPT_ST_RELOC)) + (clobber (match_operand:DI 3 "base_reg_operand" "=b,b,b"))] + "TARGET_PCREL_OPT" +{ + output_pcrel_opt_reloc (operands[2]); + switch (which_alternative) + { + case 0: return "stfs %1,%0"; + case 1: return "stxssp %1,%0"; + case 2: return "stw %1,%0"; + default: gcc_unreachable (); + } +} + [(set_attr "type" "fpstore,fpstore,store")]) + +(define_insn "*pcrel_opt_stdf" + [(set (match_operand:DF 0 "d_form_memory" "=o,o,o") + (unspec:DF [(match_operand:DF 1 "gpc_reg_operand" "d,v,r") + (match_operand 2 "const_int_operand" "n,n,n")] + UNSPEC_PCREL_OPT_ST_RELOC)) + (clobber (match_operand:DI 3 "base_reg_operand" "=b,b,b"))] + "TARGET_PCREL_OPT + && (TARGET_POWERPC64 || vsx_register_operand (operands[1], DFmode))" +{ + output_pcrel_opt_reloc (operands[2]); + switch (which_alternative) + { + case 0: return "stfd %1,%0"; + case 1: return "stxsd %1,%0"; + case 2: return "std %1,%0"; + default: gcc_unreachable (); + } +} + [(set_attr "type" "fpstore,fpstore,store")]) + +(define_insn "*pcrel_opt_st<mode>" + [(set (match_operand:PO_VECT 0 "d_form_memory" "=o") + (unspec:PO_VECT [(match_operand:PO_VECT 1 "gpc_reg_operand" "wa") + (match_operand 2 "const_int_operand" "n")] + UNSPEC_PCREL_OPT_ST_RELOC)) + (clobber (match_operand:DI 3 "base_reg_operand" "=b"))] + "TARGET_PCREL_OPT" +{ + output_pcrel_opt_reloc (operands[2]); + return "stxv %x1,%0"; +} + [(set_attr "type" "vecstore")]) diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 226b9ce67cb..6e74bfde3c7 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -8556,7 +8556,17 @@ rs6000_delegitimize_address (rtx orig_x) (set (reg:DI <base-reg>) (unspec:DI [(symbol_ref <symbol>) (const_int <marker>)] - UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG)) */ + UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG)) + + UNSPEC_PCREL_OPT_ST_ADDR is used by the power10 PCREL_OPT pass. This + UNSPEC include the external SYMBOL_REF along with the value being loaded. + We return the original SYMBOL_REF. + + (parallel [(set (reg:DI <base-reg>) + (unspec:DI [(symbol_ref <symbol>) + (const_int <marker>)] + UNSPEC_PCREL_OPT_ST_ADDR)) + (use (reg <store-reg>))]) */ if (GET_CODE (orig_x) == UNSPEC) switch (XINT (orig_x, 1)) @@ -8564,6 +8574,7 @@ rs6000_delegitimize_address (rtx orig_x) case UNSPEC_FUSION_GPR: case UNSPEC_PCREL_OPT_LD_ADDR: case UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG: + case UNSPEC_PCREL_OPT_ST_ADDR: orig_x = XVECEXP (orig_x, 0, 0); break; -- 2.22.0 -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.ibm.com, phone: +1 (978) 899-4797 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 3/3] power10: Add tests for PCREL_OPT. 2020-09-05 4:27 [PATCH 0/3] Power10 PCREL_OPT support (September 5th 2020) Michael Meissner 2020-09-05 4:31 ` [PATCH 1/3] power10: Add PCREL_OPT load support Michael Meissner 2020-09-05 4:35 ` [PATCH 2/3] power10: Add PCREL_OPT store support Michael Meissner @ 2020-09-05 4:36 ` Michael Meissner 2020-09-05 15:20 ` [PATCH 0/3] Power10 PCREL_OPT support (September 5th 2020) Michael Meissner 2020-09-22 3:21 ` Ping: " Michael Meissner 4 siblings, 0 replies; 9+ messages in thread From: Michael Meissner @ 2020-09-05 4:36 UTC (permalink / raw) To: Michael Meissner, gcc-patches, Segher Boessenkool, David Edelsohn, Bill Schmidt, Peter Bergner power10: Add tests for PCREL_OPT. This patch add the PCREL_OPT tests for the previous two patches. I have built compilers with and without these set of 3 patches doing a bootstrap build and make check. There were no regressions, and the new tests passed. Can I check these patches into the master branch for GCC? Because this is new functionality, I do not intend to back port these patches to GCC 10 at this time. gcc/testsuite/ 2020-09-04 Michael Meissner <meissner@linux.ibm.com> * gcc.target/powerpc/pcrel-opt-inc-di.c: New PCREL_OPT test. * gcc.target/powerpc/pcrel-opt-ld-df.c: New PCREL_OPT test. * gcc.target/powerpc/pcrel-opt-ld-di.c: New PCREL_OPT test. * gcc.target/powerpc/pcrel-opt-ld-hi.c: New PCREL_OPT test. * gcc.target/powerpc/pcrel-opt-ld-qi.c: New PCREL_OPT test. * gcc.target/powerpc/pcrel-opt-ld-sf.c: New PCREL_OPT test. * gcc.target/powerpc/pcrel-opt-ld-si.c: New PCREL_OPT test. * gcc.target/powerpc/pcrel-opt-ld-vector.c: New PCREL_OPT test. * gcc.target/powerpc/pcrel-opt-st-df.c: New PCREL_OPT test. * gcc.target/powerpc/pcrel-opt-st-di.c: New PCREL_OPT test. * gcc.target/powerpc/pcrel-opt-st-hi.c: New PCREL_OPT test. * gcc.target/powerpc/pcrel-opt-st-qi.c: New PCREL_OPT test. * gcc.target/powerpc/pcrel-opt-st-sf.c: New PCREL_OPT test. * gcc.target/powerpc/pcrel-opt-st-si.c: New PCREL_OPT test. * gcc.target/powerpc/pcrel-opt-st-vector.c: New PCREL_OPT test. --- .../gcc.target/powerpc/pcrel-opt-inc-di.c | 18 ++++++++ .../gcc.target/powerpc/pcrel-opt-ld-df.c | 36 ++++++++++++++++ .../gcc.target/powerpc/pcrel-opt-ld-di.c | 43 +++++++++++++++++++ .../gcc.target/powerpc/pcrel-opt-ld-hi.c | 42 ++++++++++++++++++ .../gcc.target/powerpc/pcrel-opt-ld-qi.c | 42 ++++++++++++++++++ .../gcc.target/powerpc/pcrel-opt-ld-sf.c | 42 ++++++++++++++++++ .../gcc.target/powerpc/pcrel-opt-ld-si.c | 41 ++++++++++++++++++ .../gcc.target/powerpc/pcrel-opt-ld-vector.c | 36 ++++++++++++++++ .../gcc.target/powerpc/pcrel-opt-st-df.c | 36 ++++++++++++++++ .../gcc.target/powerpc/pcrel-opt-st-di.c | 43 +++++++++++++++++++ .../gcc.target/powerpc/pcrel-opt-st-hi.c | 42 ++++++++++++++++++ .../gcc.target/powerpc/pcrel-opt-st-qi.c | 42 ++++++++++++++++++ .../gcc.target/powerpc/pcrel-opt-st-sf.c | 36 ++++++++++++++++ .../gcc.target/powerpc/pcrel-opt-st-si.c | 41 ++++++++++++++++++ .../gcc.target/powerpc/pcrel-opt-st-vector.c | 36 ++++++++++++++++ 15 files changed, 576 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-sf.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-si.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-vector.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-df.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-di.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-hi.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-qi.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-sf.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-si.c create mode 100644 gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-vector.c diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c new file mode 100644 index 00000000000..f165068e2be --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-inc-di.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */ + +#define TYPE unsigned int + +/* Test whether using an external variable twice (doing an increment) prevents + the PCREL_OPT optimization. */ +extern TYPE ext; + +void +inc (void) +{ + ext++; /* No PCREL_OPT (uses address twice). */ +} + +/* { dg-final { scan-assembler-not "R_PPC64_PCREL_OPT" } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c new file mode 100644 index 00000000000..d35862fcb6e --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-df.c @@ -0,0 +1,36 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */ + +#define TYPE double +#define LARGE 0x20000 + +/* Test whether we get the right number of PCREL_OPT optimizations for + double. */ +extern TYPE ext[]; + +TYPE +get (void) +{ + return ext[0]; /* PCREL_OPT relocation. */ +} + +TYPE +get2 (void) +{ + return ext[2]; /* PCREL_OPT relocation. */ +} + +TYPE +get_large (void) +{ + return ext[LARGE]; /* No PCREL_OPT (load is prefixed). */ +} + +TYPE +get_variable (unsigned long n) +{ + return ext[n]; /* No PCREL_OPT (load is indexed). */ +} + +/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT" 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c new file mode 100644 index 00000000000..12b51ab2e67 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-di.c @@ -0,0 +1,43 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */ + +#define TYPE long long +#define LARGE 0x20000 + +/* Test whether we get the right number of PCREL_OPT optimizations for long + long. */ +extern TYPE ext[]; + +TYPE +get (void) +{ + return ext[0]; /* PCREL_OPT relocation. */ +} + +TYPE +get2 (void) +{ + return ext[2]; /* PCREL_OPT relocation. */ +} + +TYPE +get_large (void) +{ + return ext[LARGE]; /* No PCREL_OPT (load is prefixed). */ +} + +TYPE +get_variable (unsigned long n) +{ + return ext[n]; /* No PCREL_OPT (load is indexed). */ +} + +double +get_double (void) +{ + return (double) ext[0]; /* PCREL_OPT relocation. */ +} + +/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT" 3 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c new file mode 100644 index 00000000000..4143aeb7371 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-hi.c @@ -0,0 +1,42 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */ + +#define TYPE unsigned short +#define LARGE 0x20000 + +/* Test whether we get the right number of PCREL_OPT optimizations for unsigned + short. */ +extern TYPE ext[]; + +TYPE +get (void) +{ + return ext[0]; /* PCREL_OPT relocation. */ +} + +TYPE +get2 (void) +{ + return ext[2]; /* PCREL_OPT relocation. */ +} + +TYPE +get_large (void) +{ + return ext[LARGE]; /* No PCREL_OPT (load is prefixed). */ +} + +TYPE +get_variable (unsigned long n) +{ + return ext[n]; /* No PCREL_OPT (load is indexed). */ +} + +double +get_double (void) +{ + return (double) ext[0]; /* No PCREL_OPT (LXSIHZX is indexed). */ +} + +/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT" 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c new file mode 100644 index 00000000000..30d3236f95c --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-qi.c @@ -0,0 +1,42 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */ + +#define TYPE unsigned char +#define LARGE 0x20000 + +/* Test whether we get the right number of PCREL_OPT optimizations for unsigned + char. */ +extern TYPE ext[]; + +TYPE +get (void) +{ + return ext[0]; /* PCREL_OPT relocation. */ +} + +TYPE +get2 (void) +{ + return ext[2]; /* PCREL_OPT relocation. */ +} + +TYPE +get_large (void) +{ + return ext[LARGE]; /* No PCREL_OPT (load is prefixed). */ +} + +TYPE +get_variable (unsigned long n) +{ + return ext[n]; /* No PCREL_OPT (load is indexed). */ +} + +double +get_double (void) +{ + return (double) ext[0]; /* No PCREL_OPT (LXSIBZX is indexed). */ +} + +/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT" 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-sf.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-sf.c new file mode 100644 index 00000000000..9d1e2a1956f --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-sf.c @@ -0,0 +1,42 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */ + +#define TYPE float +#define LARGE 0x20000 + +/* Test whether we get the right number of PCREL_OPT optimizations for + float. */ +extern TYPE ext[]; + +TYPE +get (void) +{ + return ext[0]; /* PCREL_OPT relocation. */ +} + +TYPE +get2 (void) +{ + return ext[2]; /* PCREL_OPT relocation. */ +} + +TYPE +get_large (void) +{ + return ext[LARGE]; /* No PCREL_OPT (load is prefixed). */ +} + +TYPE +get_variable (unsigned long n) +{ + return ext[n]; /* No PCREL_OPT (load is indexed). */ +} + +double +get_double (void) +{ + return (double) ext[0]; /* PCREL_OPT relocation. */ +} + +/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT" 3 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-si.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-si.c new file mode 100644 index 00000000000..17be6fa1778 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-si.c @@ -0,0 +1,41 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */ + +#define TYPE int +#define LARGE 0x20000 + +/* Test whether we get the right number of PCREL_OPT optimizations for int. */ +extern TYPE ext[]; + +TYPE +get (void) +{ + return ext[0]; /* PCREL_OPT relocation. */ +} + +TYPE +get2 (void) +{ + return ext[2]; /* PCREL_OPT relocation. */ +} + +TYPE +get_large (void) +{ + return ext[LARGE]; /* No PCREL_OPT (load is prefixed). */ +} + +TYPE +get_variable (unsigned long n) +{ + return ext[n]; /* No PCREL_OPT (load is indexed). */ +} + +double +get_double (void) +{ + return (double) ext[0]; /* No PCREL_OPT (LFIWAX is indexed). */ +} + +/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT" 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-vector.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-vector.c new file mode 100644 index 00000000000..8c12aea5acd --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-ld-vector.c @@ -0,0 +1,36 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */ + +#define TYPE vector double +#define LARGE 0x20000 + +/* Test whether we get the right number of PCREL_OPT optimizations for + vector double. */ +extern TYPE ext[]; + +TYPE +get (void) +{ + return ext[0]; /* PCREL_OPT relocation. */ +} + +TYPE +get2 (void) +{ + return ext[2]; /* PCREL_OPT relocation. */ +} + +TYPE +get_large (void) +{ + return ext[LARGE]; /* No PCREL_OPT (load is prefixed). */ +} + +TYPE +get_variable (unsigned long n) +{ + return ext[n]; /* No PCREL_OPT (load is indexed). */ +} + +/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT" 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-df.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-df.c new file mode 100644 index 00000000000..d795d35d8de --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-df.c @@ -0,0 +1,36 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */ + +#define TYPE double +#define LARGE 0x20000 + +/* Test whether we get the right number of PCREL_OPT optimizations for + double. */ +extern TYPE ext[]; + +void +store (TYPE a) +{ + ext[0] = a; /* PCREL_OPT relocation. */ +} + +void +store2 (TYPE a) +{ + ext[2] = a; /* PCREL_OPT relocation. */ +} + +void +store_large (TYPE a) +{ + ext[LARGE] = a; /* No PCREL_OPT (store is prefixed). */ +} + +void +store_variable (TYPE a, unsigned long n) +{ + ext[n] = a; /* No PCREL_OPT (store is indexed). */ +} + +/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT" 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-di.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-di.c new file mode 100644 index 00000000000..a065d83f6f2 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-di.c @@ -0,0 +1,43 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */ + +#define TYPE long long +#define LARGE 0x20000 + +/* Test whether we get the right number of PCREL_OPT optimizations for long + long. */ +extern TYPE ext[]; + +void +store (TYPE a) +{ + ext[0] = a; /* PCREL_OPT relocation. */ +} + +void +store2 (TYPE a) +{ + ext[2] = a; /* PCREL_OPT relocation. */ +} + +void +store_large (TYPE a) +{ + ext[LARGE] = a; /* No PCREL_OPT (store is prefixed). */ +} + +void +store_variable (TYPE a, unsigned long n) +{ + ext[n] = a; /* No PCREL_OPT (store is indexed). */ +} + +void +store_double (double a) +{ + ext[0] = (TYPE) a; /* PCREL_OPT relocation. */ +} + +/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT" 3 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-hi.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-hi.c new file mode 100644 index 00000000000..8822e767dfe --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-hi.c @@ -0,0 +1,42 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */ + +#define TYPE unsigned short +#define LARGE 0x20000 + +/* Test whether we get the right number of PCREL_OPT optimizations for unsigned + short. */ +extern TYPE ext[]; + +void +store (TYPE a) +{ + ext[0] = a; /* PCREL_OPT relocation. */ +} + +void +store2 (TYPE a) +{ + ext[2] = a; /* PCREL_OPT relocation. */ +} + +void +store_large (TYPE a) +{ + ext[LARGE] = a; /* No PCREL_OPT (store is prefixed). */ +} + +void +store_variable (TYPE a, unsigned long n) +{ + ext[n] = a; /* No PCREL_OPT (store is indexed). */ +} + +void +store_double (double a) +{ + ext[0] = (TYPE) a; /* No PCREL_OPT (STXIHZX is indexed). */ +} + +/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT" 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-qi.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-qi.c new file mode 100644 index 00000000000..2f756833717 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-qi.c @@ -0,0 +1,42 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */ + +#define TYPE unsigned char +#define LARGE 0x20000 + +/* Test whether we get the right number of PCREL_OPT optimizations for unsigned + char. */ +extern TYPE ext[]; + +void +store (TYPE a) +{ + ext[0] = a; /* PCREL_OPT relocation. */ +} + +void +store2 (TYPE a) +{ + ext[2] = a; /* PCREL_OPT relocation. */ +} + +void +store_large (TYPE a) +{ + ext[LARGE] = a; /* No PCREL_OPT (store is prefixed). */ +} + +void +store_variable (TYPE a, unsigned long n) +{ + ext[n] = a; /* No PCREL_OPT (store is indexed). */ +} + +void +store_double (double a) +{ + ext[0] = (TYPE) a; /* No PCREL_OPT (STXIBZX is indexed). */ +} + +/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT" 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-sf.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-sf.c new file mode 100644 index 00000000000..3dd88aad856 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-sf.c @@ -0,0 +1,36 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */ + +#define TYPE float +#define LARGE 0x20000 + +/* Test whether we get the right number of PCREL_OPT optimizations for + float. */ +extern TYPE ext[]; + +void +store (TYPE a) +{ + ext[0] = a; /* PCREL_OPT relocation. */ +} + +void +store2 (TYPE a) +{ + ext[2] = a; /* PCREL_OPT relocation. */ +} + +void +store_large (TYPE a) +{ + ext[LARGE] = a; /* No PCREL_OPT (store is prefixed). */ +} + +void +store_variable (TYPE a, unsigned long n) +{ + ext[n] = a; /* No PCREL_OPT (store is indexed). */ +} + +/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT" 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-si.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-si.c new file mode 100644 index 00000000000..78dc8120efe --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-si.c @@ -0,0 +1,41 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */ + +#define TYPE int +#define LARGE 0x20000 + +/* Test whether we get the right number of PCREL_OPT optimizations for int. */ +extern TYPE ext[]; + +void +store (TYPE a) +{ + ext[0] = a; /* PCREL_OPT relocation. */ +} + +void +store2 (TYPE a) +{ + ext[2] = a; /* PCREL_OPT relocation. */ +} + +void +store_large (TYPE a) +{ + ext[LARGE] = a; /* No PCREL_OPT (store is prefixed). */ +} + +void +store_variable (TYPE a, unsigned long n) +{ + ext[n] = a; /* No PCREL_OPT (store is indexed). */ +} + +void +store_double (double a) +{ + ext[0] = (TYPE) a; /* No PCREL_OPT (STFIWX is indexed). */ +} + +/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT" 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-vector.c b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-vector.c new file mode 100644 index 00000000000..2c602eb3103 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/pcrel-opt-st-vector.c @@ -0,0 +1,36 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target powerpc_pcrel } */ +/* { dg-options "-O2 -mdejagnu-cpu=power10" } */ + +#define TYPE vector double +#define LARGE 0x20000 + +/* Test whether we get the right number of PCREL_OPT optimizations for + vector double. */ +extern TYPE ext[]; + +void +store (TYPE a) +{ + ext[0] = a; /* PCREL_OPT relocation. */ +} + +void +store2 (TYPE a) +{ + ext[2] = a; /* PCREL_OPT relocation. */ +} + +void +store_large (TYPE a) +{ + ext[LARGE] = a; /* No PCREL_OPT (store is prefixed). */ +} + +void +store_variable (TYPE a, unsigned long n) +{ + ext[n] = a; /* No PCREL_OPT (store is indexed). */ +} + +/* { dg-final { scan-assembler-times "R_PPC64_PCREL_OPT" 2 } } */ -- 2.22.0 -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.ibm.com, phone: +1 (978) 899-4797 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] Power10 PCREL_OPT support (September 5th 2020) 2020-09-05 4:27 [PATCH 0/3] Power10 PCREL_OPT support (September 5th 2020) Michael Meissner ` (2 preceding siblings ...) 2020-09-05 4:36 ` [PATCH 3/3] power10: Add tests for PCREL_OPT Michael Meissner @ 2020-09-05 15:20 ` Michael Meissner 2020-09-22 3:21 ` Ping: " Michael Meissner 4 siblings, 0 replies; 9+ messages in thread From: Michael Meissner @ 2020-09-05 15:20 UTC (permalink / raw) To: Michael Meissner, gcc-patches, Segher Boessenkool, David Edelsohn, Bill Schmidt, Peter Bergner I forgot to add Bill Schmidt's name to the ChangeLog entry. He helped with the initial conversion to use the DF framework. I will add his name in my copy of the patch. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.ibm.com, phone: +1 (978) 899-4797 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Ping: [PATCH 0/3] Power10 PCREL_OPT support (September 5th 2020) 2020-09-05 4:27 [PATCH 0/3] Power10 PCREL_OPT support (September 5th 2020) Michael Meissner ` (3 preceding siblings ...) 2020-09-05 15:20 ` [PATCH 0/3] Power10 PCREL_OPT support (September 5th 2020) Michael Meissner @ 2020-09-22 3:21 ` Michael Meissner 4 siblings, 0 replies; 9+ messages in thread From: Michael Meissner @ 2020-09-22 3:21 UTC (permalink / raw) To: Michael Meissner, gcc-patches, Segher Boessenkool, David Edelsohn, Bill Schmidt, Peter Bergner In case this patch has gotten over-looked, the three patches in the set plus the explanation are the rewrite of the PCREL_OPT patches. It has been rewritten to use the data flow machinery as well as validate_change and apply_change_group from the last patch. The PCREL_OPT patches can go anywhere after register allocation. I have moved it to before sched2 because it seemed to generate more PCREL_OPTs than either immediately after register allocation or after sched2. The first patch does PCREL_OPT loads. The second patch does PCREL_OPT stores. The third patch adds the new tests. I have bootstraped compilers with these patches and run the test suite with no regressions. I have also built Spec 2017 for power10 with compilers built with these patches. Can I check these patches into the master branch? I do not anticipate needing to back port these changes to GCC 10, since this adds new functionality. https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553312.html https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553313.html https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553314.html https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553315.html -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.ibm.com, phone: +1 (978) 899-4797 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 0/3] Power10 PCREL_OPT support @ 2020-08-18 6:31 Michael Meissner 2020-08-18 6:34 ` [PATCH 1/3] Power10: Add PCREL_OPT load support Michael Meissner 0 siblings, 1 reply; 9+ messages in thread From: Michael Meissner @ 2020-08-18 6:31 UTC (permalink / raw) To: gcc-patches, Segher Boessenkool, David Edelsohn, Michael Meissner, Bill Schmidt, Peter Bergner, Alan Modra The ELF-v2 ISA 3.1 support for Power10 has relocations to optimize cases where the code is references an external variable in only one location. This patch is similar to the optimizations that the linker already does to optimize TOC accesses. I will be submitting 3 patches as follow-ups to this message: * The first patch adds support for PCREL_OPT loads; * The second patch adds support for PCREL_OPT stores; (and) * The third patch adds the tests. If the program is compiled to be the main program, and the variable is defined in the main program, these relocations will convert loading up the address of the external variable and then doing a load or store using that address to be doing the prefixed load or store directly and converting the second instruction into a NOP. For example, consider the following program: extern int ext_variable; int ret_var (void) { return ext_variable; } void store_var (int i) { ext_variable = i; } Currently on power10, the compiler compiles this as: ret_var: pld 9,ext_variable@got@pcrel lwa 3,0(9) blr store_var: pld 9,ext_variable@got@pcrel stw 3,0(9) blr That is, it loads up the address of 'ext_variable' from the GOT table into register r9, and then uses r9 as a base register to reference the actual variable. The linker does optimize the case where you are compiling the main program, and the variable is also defined in the main program to be: ret_var: pla 9,ext_variable,1 lwa 3,0(9) blr store_var: pla 9,ext_variable,1 stw 3,0(9) blr These patches generate: ret_var: pld 9,ext_variable@got@pcrel .Lpcrel1: .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8) lwa 3,0(9) blr store_var: pld 9,ext_variable@got@pcrel .Lpcrel2: .reloc .Lpcrel2-8,R_PPC64_PCREL_OPT,.-(.Lpcrel2-8) stw 3,0(9) blr Note, the label for locating the PLD occurs after the PLD and not before it. This is so that if the assembler adds a NOP in front of the PLD to align it, the relocations will still work. If the linker can, it will convert the code into: ret_var: plwa 3,ext_variable,1 nop blr store_var: pstw 3,ext_variable,1 nop blr These patches allow the load of the address to not be physically adjacent to the actual load or store, which should allow for better code. For loads, there must no references to the register that is being loaded between the PLD and the actual load. For stores, it becomes a little trickier, in that the register being stored must be live at the time the PLD instruction is done, and it must continue to be live and unmodified between the PLD and the store. For both loads and stores, there must be only one reference to the address being loaded into a base register, and that base register must die at the point of the load/store. In order to do this, the pass that converts the load address and load/store must occur late in the compilation cycle. In particular, the second scheduler pass will duplicate and optimize some of the references and it will produce an invalid program. In the past, Segher has said that we should be able to move it earlier. I have my doubts whether that is feasible. What I would like to do is put these patches into GCC 11, which will enable many of the cases that we want to optimize. Then somebody else can take a swing at doing the optimization to allow the code to do this optimization earlier. That way, even if we can't get the super optimized code to work, we at least will get the majority of cases to work. For reference, here is what the current compiler generates for a medium code model system targeting power9 with the TOC support: .section ".toc","aw" .LC0: .quad ext_variable .section ".text" ret_var: .LCF0: 0: addis 2,12,.TOC.-.LCF0@ha addi 2,2,.TOC.-.LCF0@l .localentry ret_var,.-ret_var addis 9,2,.LC0@toc@ha ld 9,.LC0@toc@l(9) lwa 3,0(9) blr .section ".toc","aw" .set .LC1,.LC0 .section ".text" store_var: .LCF1: 0: addis 2,12,.TOC.-.LCF1@ha addi 2,2,.TOC.-.LCF1@l .localentry store_var,.-store_var addis 9,2,.LC1@toc@ha ld 9,.LC1@toc@l(9) stw 3,0(9) blr And the linker optimizes this to: ret_var: lis 2,.TOC@ha addi 2,2,.TOC@l .localentry ret_var,.-ret_var nop ; addis eliminated due to small TOC addi 9,2,<offset> ; ld converted into addi lwa 3,0(9) ; actual load store_var: lis 2,.TOC@ha addi 2,2,.TOC@l .localentry store_var,.-store_var nop ; addis eliminated due to small TOC addi 9,2,<offset> ; ld converted into addi stw 3,0(9) ; actual store -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.ibm.com, phone: +1 (978) 899-4797 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/3] Power10: Add PCREL_OPT load support 2020-08-18 6:31 [PATCH 0/3] Power10 PCREL_OPT support Michael Meissner @ 2020-08-18 6:34 ` Michael Meissner 2020-08-21 2:09 ` Segher Boessenkool 0 siblings, 1 reply; 9+ messages in thread From: Michael Meissner @ 2020-08-18 6:34 UTC (permalink / raw) To: Michael Meissner, gcc-patches, Segher Boessenkool, David Edelsohn, Bill Schmidt, Peter Bergner, Alan Modra [PATCH 1/3] Power10: Add PCREL_OPT load support. This patch adds support for optimizing power10 loads of an external variable to eliminate loading the address of the variable, and then doing a subsequent load using that address. I have built compilers with and without these set of 3 patches doing a bootstrap build and make check. There were no regressions, and the new tests passed. Can I check these patches into the master branch for GCC? Because this is new functionality, I do not intend to back port these patches to GCC 10 at this time. gcc/ 2020-08-18 Michael Meissner <meissner@linux.ibm.com> * config.gcc (powerpc*-*-*): Add pcrel-opt.o. (rs6000*-*-*): Add pcrel-opt.o. * config/rs6000/pcrel-opt.c: New file. * config/rs6000/pcrel-opt.md: New file. * config/rs6000/predicates.md (d_form_memory): New predicate. * config/rs6000/rs6000-cpus.def (OTHER_POWER10_MASKS): Add -mpcrel-opt. (POWERPC_MASKS): Add -mpcrel-opt. * config/rs6000/rs6000-passes.def: Add PCREL_OPT pass. * config/rs6000/rs6000-protos.h (reg_to_non_prefixed): New declaration. (make_pass_pcrel_opt): New declaration. * config/rs6000/rs6000.c (rs6000_option_override_internal): Add support for -mpcrel-opt. (rs6000_delegitimize_address): Add support for PCREL_OPT addresses. (print_operand, 'r' case): New operand for PCREL_OPT. (rs6000_opt_masks): Add -mpcrel-opt. (rs6000_asm_output_opcode): Reset flag to emit the initial 'p' after use. * config/rs6000/rs6000.md (loads_extern_addr attribute): New attribute. (isa attribute): Add pcrel_opt sub-case. (enabled attribute): Add support for pcrel_opt. (pcrel_extern_addr): Set loads_extern_addr attribute. (toplevel): Include pcrel-opt.md. * config/rs6000/rs6000.opt (-mpcrel-opt): New debug option. * config/rs6000/t-rs6000 (pcrel-opt.o): Add build rule. (MD_INCLUDES): Add pcrel-opt.md. --- gcc/config.gcc | 6 +- gcc/config/rs6000/pcrel-opt.c | 656 ++++++++++++++++++++++++++++++++++++ gcc/config/rs6000/pcrel-opt.md | 248 ++++++++++++++ gcc/config/rs6000/predicates.md | 23 ++ gcc/config/rs6000/rs6000-cpus.def | 2 + gcc/config/rs6000/rs6000-passes.def | 8 + gcc/config/rs6000/rs6000-protos.h | 2 + gcc/config/rs6000/rs6000.c | 40 ++- gcc/config/rs6000/rs6000.md | 14 +- gcc/config/rs6000/rs6000.opt | 4 + gcc/config/rs6000/t-rs6000 | 7 +- 11 files changed, 1001 insertions(+), 9 deletions(-) create mode 100644 gcc/config/rs6000/pcrel-opt.c create mode 100644 gcc/config/rs6000/pcrel-opt.md diff --git a/gcc/config.gcc b/gcc/config.gcc index 2370368..605d743 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -505,7 +505,7 @@ or1k*-*-*) ;; powerpc*-*-*) cpu_type=rs6000 - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o" + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o pcrel-opt.o" extra_headers="ppc-asm.h altivec.h htmintrin.h htmxlintrin.h" extra_headers="${extra_headers} bmi2intrin.h bmiintrin.h" extra_headers="${extra_headers} xmmintrin.h mm_malloc.h emmintrin.h" @@ -520,6 +520,7 @@ powerpc*-*-*) esac extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt" target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c \$(srcdir)/config/rs6000/rs6000-call.c" + target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/pcrel-opt.c" ;; pru-*-*) cpu_type=pru @@ -531,8 +532,9 @@ riscv*) ;; rs6000*-*-*) extra_options="${extra_options} g.opt fused-madd.opt rs6000/rs6000-tables.opt" - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o" + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o rs6000-call.o pcrel-opt.o" target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/rs6000-logue.c \$(srcdir)/config/rs6000/rs6000-call.c" + target_gtfiles="$target_gtfiles \$(srcdir)/config/rs6000/pcrel-opt.c" ;; sparc*-*-*) cpu_type=sparc diff --git a/gcc/config/rs6000/pcrel-opt.c b/gcc/config/rs6000/pcrel-opt.c new file mode 100644 index 0000000..10b4bc4 --- /dev/null +++ b/gcc/config/rs6000/pcrel-opt.c @@ -0,0 +1,656 @@ +/* Subroutines used support the pc-relative linker optimization. + Copyright (C) 2020 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published + by the Free Software Foundation; either version 3, or (at your + option) any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + You should have received a copy of the GNU General Public License + along with GCC; see the file COPYING3. If not see + <http://www.gnu.org/licenses/>. */ + +/* This file implements a RTL pass that looks for pc-relative loads of the + address of an external variable using the PCREL_GOT relocation and a single + load that uses that external address. If that is found we create the + PCREL_OPT relocation to possibly convert: + + pld addr_reg,var@pcrel@got(0),1 + + <possibly other insns that do not use 'addr_reg' or 'data_reg'> + + lwz data_reg,0(addr_reg) + + into: + + plwz data_reg,var@pcrel(0),1 + + <possibly other insns that do not use 'addr_reg' or 'data_reg'> + + nop + + If the variable is not defined in the main program or the code using it is + not in the main program, the linker put the address in the .got section and + do: + + .section .got + .Lvar_got: + .dword var + + .section .text + pld addr_reg,.Lvar_got@pcrel(0),1 + + <possibly other insns that do not use 'addr_reg' or 'data_reg'> + + lwz data_reg,0(addr_reg) + + We only look for a single usage in the basic block where the external + address is loaded. Multiple uses or references in another basic block will + force us to not use the PCREL_OPT relocation. */ + +#define IN_TARGET_CODE 1 + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "backend.h" +#include "rtl.h" +#include "tree.h" +#include "memmodel.h" +#include "expmed.h" +#include "optabs.h" +#include "recog.h" +#include "df.h" +#include "tm_p.h" +#include "ira.h" +#include "print-tree.h" +#include "varasm.h" +#include "explow.h" +#include "expr.h" +#include "output.h" +#include "tree-pass.h" +#include "rtx-vector-builder.h" +#include "print-rtl.h" +#include "insn-attr.h" +#include "insn-codes.h" + +\f +// Maximum number of insns to scan between the load address and the load that +// uses that address. This can be bumped up if desired. If the insns are far +// enough away, the PCREL_OPT optimization probably does not help, since the +// load of the external address has probably completed by the time we do the +// load of the variable at that address. +const int MAX_PCREL_OPT_INSNS = 10; + +/* Next PCREL_OPT label number. */ +static unsigned int pcrel_opt_next_num; + +/* Various counters. */ +static struct { + unsigned long extern_addrs; + unsigned long loads; + unsigned long load_separation[MAX_PCREL_OPT_INSNS+1]; +} counters; + +\f +// Optimize a PC-relative load address to be used in a load. +// +// If the sequence of insns is safe to use the PCREL_OPT optimization (i.e. no +// additional references to the address register, the address register dies at +// the load, and no references to the load), convert insns of the form: +// +// (set (reg:DI addr) +// (symbol_ref:DI "ext_symbol")) +// +// ... +// +// (set (reg:<MODE> value) +// (mem:<MODE> (reg:DI addr))) +// +// into: +// +// (parallel [(set (reg:DI addr) +// (unspec:<MODE> [(symbol_ref:DI "ext_symbol") +// (const_int label_num) +// (const_int 0)] +// UNSPEC_PCREL_OPT_LD_ADDR)) +// (set (reg:DI data) +// (unspec:DI [(const_int 0)] +// UNSPEC_PCREL_OPT_LD_ADDR))]) +// +// ... +// +// (parallel [(set (reg:<MODE>) +// (unspec:<MODE> [(mem:<MODE> (reg:DI addr)) +// (reg:DI data) +// (const_int label_num)] +// UNSPEC_PCREL_OPT_LD_RELOC)) +// (clobber (reg:DI addr))]) +// +// If the register being loaded is the same register that was used to hold the +// external address, we generate the following insn instead: +// +// (set (reg:DI data) +// (unspec:DI [(symbol_ref:DI "ext_symbol") +// (const_int label_num) +// (const_int 1)] +// UNSPEC_PCREL_OPT_LD_ADDR)) +// +// In the first insn, we set both the address of the external variable, and +// mark that the variable being loaded both are created in that insn, and are +// consumed in the second insn. It doesn't matter what mode the register that +// we will ultimately do the load into, so we use DImode. We just need to mark +// that both registers may be set in the first insn, and will be used in the +// second insn. +// +// The UNSPEC_PCREL_OPT_LD_ADDR insn will generate the load address plus +// a definition of a label (.Lpcrel<n>), while the UNSPEC_PCREL_OPT_LD_RELOC +// insn will generate the .reloc to tell the linker to tie the load address and +// load using that address together. +// +// pld b,ext_symbol@got@pcrel(0),1 +// .Lpcrel1: +// +// ... +// +// .reloc .Lpcrel1-8,R_PPC64_PCREL_OPT,.-(.Lpcrel1-8) +// lwz r,0(b) +// +// If ext_symbol is defined in another object file in the main program and we +// are linking the main program, the linker will convert the above instructions +// to: +// +// plwz r,ext_symbol@got@pcrel(0),1 +// +// ... +// +// nop +// +// Return true if the PCREL_OPT load optimization succeeded. + +static bool +do_pcrel_opt_load (rtx_insn *addr_insn, // insn loading address + rtx_insn *load_insn) // insn using address +{ + rtx addr_set = PATTERN (addr_insn); + rtx addr_reg = SET_DEST (addr_set); + rtx addr_symbol = SET_SRC (addr_set); + rtx load_set = single_set (load_insn); + rtx reg = SET_DEST (load_set); + rtx mem = SET_SRC (load_set); + machine_mode reg_mode = GET_MODE (reg); + machine_mode mem_mode = GET_MODE (mem); + rtx mem_inner = mem; + unsigned int reg_regno = reg_or_subregno (reg); + + // LWA is a DS format instruction, but LWZ is a D format instruction. We use + // DImode for the mode to force checking whether the bottom 2 bits are 0. + // However FPR and vector registers uses the LFIWAX instruction which is + // indexed only. + if (GET_CODE (mem) == SIGN_EXTEND && GET_MODE (XEXP (mem, 0)) == SImode) + { + if (!INT_REGNO_P (reg_regno)) + return false; + + mem_inner = XEXP (mem, 0); + mem_mode = DImode; + } + + else if (GET_CODE (mem) == SIGN_EXTEND + || GET_CODE (mem) == ZERO_EXTEND + || GET_CODE (mem) == FLOAT_EXTEND) + { + mem_inner = XEXP (mem, 0); + mem_mode = GET_MODE (mem_inner); + } + + if (!MEM_P (mem_inner)) + return false; + + // If this is LFIWAX or similar instructions that are indexed only, we can't + // do the optimization. + enum non_prefixed_form non_prefixed = reg_to_non_prefixed (reg, mem_mode); + if (non_prefixed == NON_PREFIXED_X) + return false; + + // The optimization will only work on non-prefixed offsettable loads. + rtx addr = XEXP (mem_inner, 0); + enum insn_form iform = address_to_insn_form (addr, mem_mode, non_prefixed); + if (iform != INSN_FORM_BASE_REG + && iform != INSN_FORM_D + && iform != INSN_FORM_DS + && iform != INSN_FORM_DQ) + return false; + + // Allocate a new PC-relative label, and update the load external address + // insn. + // + // (parallel [(set (reg load) + // (unspec [(symbol_ref addr_symbol) + // (const_int label_num) + // (const_int 0)] + // UNSPEC_PCREL_OPT_LD_ADDR)) + // (set (reg addr) + // (unspec [(const_int 0)] + // UNSPEC_PCREL_OPT_LD_ADDR))]) + + ++pcrel_opt_next_num; + unsigned int addr_regno = reg_or_subregno (addr_reg); + rtx label_num = GEN_INT (pcrel_opt_next_num); + rtx reg_di = gen_rtx_REG (DImode, reg_regno); + + PATTERN (addr_insn) + = ((addr_regno != reg_regno) + ? gen_pcrel_opt_ld_addr (addr_reg, addr_symbol, label_num, reg_di) + : gen_pcrel_opt_ld_addr_same_reg (addr_reg, addr_symbol, label_num)); + + // Revalidate the insn, backing out of the optimization if the insn is not + // supported. + INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0); + if (INSN_CODE (addr_insn) < 0) + { + PATTERN (addr_insn) = addr_set; + INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0); + return false; + } + + // Update the load insn. If the mem had a sign/zero/float extend, add that + // also after doing the UNSPEC. Add an explicit clobber of the external + // address register just to make it clear that the address register dies. + // + // (parallel [(set (reg:<MODE> data) + // (unspec:<MODE> [(mem (addr_reg) + // (reg:DI data) + // (const_int label_num)] + // UNSPEC_PCREL_OPT_LD_RELOC)) + // (clobber (reg:DI addr_reg))]) + + rtvec v_load = gen_rtvec (3, mem_inner, reg_di, label_num); + rtx new_load = gen_rtx_UNSPEC (GET_MODE (mem_inner), v_load, + UNSPEC_PCREL_OPT_LD_RELOC); + + if (GET_CODE (mem) != GET_CODE (mem_inner)) + new_load = gen_rtx_fmt_e (GET_CODE (mem), reg_mode, new_load); + + rtx old_load_set = PATTERN (load_insn); + rtx new_load_set = gen_rtx_SET (reg, new_load); + rtx load_clobber = gen_rtx_CLOBBER (VOIDmode, + (addr_regno == reg_regno + ? gen_rtx_SCRATCH (Pmode) + : addr_reg)); + PATTERN (load_insn) + = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, new_load_set, load_clobber)); + + // Revalidate the insn, backing out of the optimization if the insn is not + // supported. + + INSN_CODE (load_insn) = recog (PATTERN (load_insn), load_insn, 0); + if (INSN_CODE (load_insn) < 0) + { + PATTERN (addr_insn) = addr_set; + INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0); + + PATTERN (load_insn) = old_load_set; + INSN_CODE (load_insn) = recog (PATTERN (load_insn), load_insn, 0); + return false; + } + + return true; +} + +\f +/* Given an insn, find the next insn in the basic block. Stop if we find a the + end of a basic block, such as a label, call or jump, and return NULL. */ + +static rtx_insn * +next_active_insn_in_basic_block (rtx_insn *insn) +{ + insn = NEXT_INSN (insn); + + while (insn != NULL_RTX) + { + /* If the basic block ends or there is a jump of some kind, exit the + loop. */ + if (CALL_P (insn) + || JUMP_P (insn) + || JUMP_TABLE_DATA_P (insn) + || LABEL_P (insn) + || BARRIER_P (insn)) + return NULL; + + /* If this is a real insn, return it. */ + if (!insn->deleted () + && NONJUMP_INSN_P (insn) + && GET_CODE (PATTERN (insn)) != USE + && GET_CODE (PATTERN (insn)) != CLOBBER) + return insn; + + /* Loop for USE, CLOBBER, DEBUG_INSN, NOTEs. */ + insn = NEXT_INSN (insn); + } + + return NULL; +} + +\f +// Validate that a load is actually a single instruction that can be optimized +// with the PCREL_OPT optimization. + +static bool +is_single_instruction (rtx_insn *insn, rtx reg) +{ + if (!REG_P (reg) && !SUBREG_P (reg)) + return false; + + if (get_attr_length (insn) != 4) + return false; + + // _Decimal128 and IBM extended double are always multiple instructions. + machine_mode mode = GET_MODE (reg); + if (mode == TFmode && !TARGET_IEEEQUAD) + return false; + + if (mode == TDmode || mode == IFmode) + return false; + + // Don't optimize PLQ/PSTQ instructions + unsigned int regno = reg_or_subregno (reg); + unsigned int size = GET_MODE_SIZE (mode); + if (size >= 16 && !VSX_REGNO_P (regno)) + return false; + + return true; +} + +\f +// Given an insn with that loads up a base register with the address of an +// external symbol, see if we can optimize it with the PCREL_OPT optimization. + +static void +do_pcrel_opt_addr (rtx_insn *addr_insn) +{ + int num_insns = 0; + + // Do some basic validation. + rtx addr_set = PATTERN (addr_insn); + if (GET_CODE (addr_set) != SET) + return; + + rtx addr_reg = SET_DEST (addr_set); + rtx addr_symbol = SET_SRC (addr_set); + + if (!base_reg_operand (addr_reg, Pmode) + || !pcrel_external_address (addr_symbol, Pmode)) + return; + + rtx_insn *insn = addr_insn; + bool looping = true; + bool had_load = false; // whether intermediate insns had a load + bool had_store = false; // whether intermediate insns had a store + bool is_load = false; // whether the current insn is a load + bool is_store = false; // whether the current insn is a store + + // Check the following insns and see if it is a load or store that uses the + // external address. If we can't do the optimization, just return. + while (looping) + { + is_load = is_store = false; + + // Don't allow too many insns between the load of the external address + // and the eventual load or store. + if (++num_insns >= MAX_PCREL_OPT_INSNS) + return; + + insn = next_active_insn_in_basic_block (insn); + if (!insn) + return; + + // See if the current insn is a load or store + switch (get_attr_type (insn)) + { + // While load of the external address is a 'load' for scheduling + // purposes, it should be safe to allow loading other external + // addresses between the load of the external address we are + // currently looking at and the load or store using that address. + case TYPE_LOAD: + if (get_attr_loads_extern_addr (insn) == LOADS_EXTERN_ADDR_YES) + break; + /* fall through */ + + case TYPE_FPLOAD: + case TYPE_VECLOAD: + is_load = true; + break; + + case TYPE_STORE: + case TYPE_FPSTORE: + case TYPE_VECSTORE: + is_store = true; + break; + + // Don't do the optimization through atomic operations. + case TYPE_LOAD_L: + case TYPE_STORE_C: + case TYPE_HTM: + case TYPE_HTMSIMPLE: + return; + + default: + break; + } + + // If the external addresss register was referenced, it must also die in + // the same insn. + if (reg_referenced_p (addr_reg, PATTERN (insn))) + { + if (!dead_or_set_p (insn, addr_reg)) + return; + + looping = false; + } + + // If it dies by being set without being referenced, exit. + else if (dead_or_set_p (insn, addr_reg)) + return; + + // If it isn't the insn we want, remember if there were loads or stores. + else + { + had_load |= is_load; + had_store |= is_store; + } + } + + // If the insn does not use the external address, or the external address + // register does not die at this insn, we can't do the optimization. + if (!reg_referenced_p (addr_reg, PATTERN (insn)) + || !dead_or_set_p (insn, addr_reg)) + return; + + rtx set = single_set (insn); + if (!set) + return; + + // Optimize loads + if (is_load) + { + // If there were any stores in the insns between loading the external + // address and doing the load, turn off the optimization. + if (had_store) + return; + + rtx reg = SET_DEST (set); + if (!is_single_instruction (insn, reg)) + return; + + rtx mem = SET_SRC (set); + switch (GET_CODE (mem)) + { + case MEM: + break; + + case SIGN_EXTEND: + case ZERO_EXTEND: + case FLOAT_EXTEND: + if (!MEM_P (XEXP (mem, 0))) + return; + break; + + default: + return; + } + + // If the register being loaded was used or set between the load of + // the external address and the load using the address, we can't do + // the optimization. + if (reg_used_between_p (reg, addr_insn, insn) + || reg_set_between_p (reg, addr_insn, insn)) + return; + + // Process the load in detail + if (do_pcrel_opt_load (addr_insn, insn)) + { + counters.loads++; + counters.load_separation[num_insns-1]++; + } + } + + return; +} + +\f +// Optimize pcrel external variable references + +static unsigned int +do_pcrel_opt_pass (function *fun) +{ + basic_block bb; + rtx_insn *insn, *curr_insn = 0; + + memset ((char *) &counters, '\0', sizeof (counters)); + + // Dataflow analysis for use-def chains. + df_set_flags (DF_RD_PRUNE_DEAD_DEFS); + df_chain_add_problem (DF_DU_CHAIN | DF_UD_CHAIN); + df_note_add_problem (); + df_analyze (); + df_set_flags (DF_DEFER_INSN_RESCAN | DF_LR_RUN_DCE); + + // Look at each basic block to see if there is a load of an external + // variable's external address, and a single load using that external + // address. + FOR_ALL_BB_FN (bb, fun) + { + FOR_BB_INSNS_SAFE (bb, insn, curr_insn) + { + if (NONJUMP_INSN_P (insn) && single_set (insn) + && get_attr_loads_extern_addr (insn) == LOADS_EXTERN_ADDR_YES) + { + counters.extern_addrs++; + do_pcrel_opt_addr (insn); + } + } + } + + df_remove_problem (df_chain); + df_process_deferred_rescans (); + df_set_flags (DF_RD_PRUNE_DEAD_DEFS | DF_LR_RUN_DCE); + df_chain_add_problem (DF_UD_CHAIN); + df_note_add_problem (); + df_analyze (); + + if (dump_file) + { + if (!counters.extern_addrs) + fprintf (dump_file, "\nNo external symbols were referenced\n"); + + else + { + fprintf (dump_file, + "\n# of loads of an address of an external symbol = %lu\n", + counters.extern_addrs); + + if (!counters.loads) + fprintf (dump_file, + "\nNo PCREL_OPT load optimizations were done\n"); + + else + { + fprintf (dump_file, "# of PCREL_OPT loads = %lu\n", + counters.loads); + + fprintf (dump_file, "# of adjacent PCREL_OPT loads = %lu\n", + counters.load_separation[0]); + + for (int i = 1; i < MAX_PCREL_OPT_INSNS; i++) + { + if (counters.load_separation[i]) + fprintf (dump_file, + "# of PCREL_OPT loads separated by %d insn%s = %lu\n", + i, (i == 1) ? "" : "s", + counters.load_separation[i]); + } + } + } + + fprintf (dump_file, "\n"); + } + + return 0; +} + +\f +// Optimize pc-relative references for the new PCREL_OPT pass +const pass_data pass_data_pcrel_opt = +{ + RTL_PASS, // type + "pcrel_opt", // name + OPTGROUP_NONE, // optinfo_flags + TV_NONE, // tv_id + 0, // properties_required + 0, // properties_provided + 0, // properties_destroyed + 0, // todo_flags_start + TODO_df_finish, // todo_flags_finish +}; + +// Pass data structures +class pcrel_opt : public rtl_opt_pass +{ +public: + pcrel_opt (gcc::context *ctxt) + : rtl_opt_pass (pass_data_pcrel_opt, ctxt) + {} + + ~pcrel_opt (void) + {} + + // opt_pass methods: + virtual bool gate (function *) + { + return (TARGET_PCREL && TARGET_PCREL_OPT && optimize); + } + + virtual unsigned int execute (function *fun) + { + return do_pcrel_opt_pass (fun); + } + + opt_pass *clone () + { + return new pcrel_opt (m_ctxt); + } +}; + +rtl_opt_pass * +make_pass_pcrel_opt (gcc::context *ctxt) +{ + return new pcrel_opt (ctxt); +} diff --git a/gcc/config/rs6000/pcrel-opt.md b/gcc/config/rs6000/pcrel-opt.md new file mode 100644 index 0000000..00a3bc4 --- /dev/null +++ b/gcc/config/rs6000/pcrel-opt.md @@ -0,0 +1,248 @@ +;; Machine description for the PCREL_OPT optimization. +;; Copyright (C) 2020 Free Software Foundation, Inc. +;; Contributed by Michael Meissner (meissner@linux.ibm.com) + +;; This file is part of GCC. + +;; GCC is free software; you can redistribute it and/or modify it +;; under the terms of the GNU General Public License as published +;; by the Free Software Foundation; either version 3, or (at your +;; option) any later version. + +;; GCC is distributed in the hope that it will be useful, but WITHOUT +;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY +;; or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public +;; License for more details. + +;; You should have received a copy of the GNU General Public License +;; along with GCC; see the file COPYING3. If not see +;; <http://www.gnu.org/licenses/>. + +;; Support for the PCREL_OPT optimization. PCREL_OPT looks for instances where +;; an external variable is used only once, either for reading or for writing. +;; +;; If we are optimizing a single read, normally the code would look like: +;; +;; (set (reg:DI <ptr>) +;; (symbol_ref:DI "<extern_addr>")) # <data> is currently dead +;; +;; ... # insns do not need to be adjacent +;; +;; (set (reg:SI <data>) +;; (mem:SI (reg:DI <xxx>))) # <ptr> dies with this insn +;; +;; We transform this into: +;; +;; (parallel [(set (reg:DI <ptr>) +;; (unspec:SI [(symbol_ref:DI <extern_addr>) +;; (const_int <marker>)] +;; UNSPEC_PCREL_OPT_LD_ADDR)) +;; (set (reg:DI <data>) +;; (unspec:DI [(const_int 0)] +;; UNSPEC_PCREL_OPT_LD_ADDR))]) +;; +;; ... +;; +;; (parallel [(set (reg:SI <data>) +;; (unspec:SI [(mem:SI (reg:DI <ptr>)) +;; (reg:DI <data>) +;; (const_int <marker>)] +;; UNSPEC_PCREL_OPT_LD)) +;; (clobber (reg:DI <ptr>))]) +;; +;; The marker is an integer constant that links the load of the external +;; address to the load of the actual variable. +;; +;; In the first insn, we set both the address of the external variable, and +;; mark that the variable being loaded both are created in that insn, and are +;; consumed in the second insn. It doesn't matter what mode the register that +;; we will ultimately do the load into, so we use DImode. We just need to mark +;; that both registers may be set in the first insn, and will be used in the +;; second insn. +;; +;; Since we use UNSPEC's and link both the the register holding the external +;; address and the value being loaded, it should prevent other passes from +;; modifying it. +;; +;; If the register being loaded is the same as the base register, we use an +;; alternate form of the insns. +;; +;; (set (reg:DI <data_ptr>) +;; (unspec:DI [(symbol_ref:DI <extern_addr>) +;; (const_int <marker>)] +;; UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG)) +;; +;; ... +;; +;; (parallel [(set (reg:SI <data>) +;; (unspec:SI [(mem:SI (reg:DI <ptr>)) +;; (reg:DI <data>) +;; (const_int <marker>)] +;; UNSPEC_PCREL_OPT_LD)) +;; (clobber (reg:DI <ptr>))]) + +(define_c_enum "unspec" + [UNSPEC_PCREL_OPT_LD_ADDR + UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG + UNSPEC_PCREL_OPT_LD_RELOC]) + +;; Modes that are supported for PCREL_OPT +(define_mode_iterator PO [QI HI SI DI TI SF DF KF + V1TI V2DI V4SI V8HI V16QI V2DF V4SF + (TF "TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD")]) + +;; Vector modes for PCREL_OPT +(define_mode_iterator PO_VECT [TI KF V1TI V2DI V4SI V8HI V16QI V2DF V4SF + (TF "TARGET_FLOAT128_TYPE && TARGET_IEEEQUAD")]) + +;; Insn for loading the external address, where the register being loaded is not +;; the same as the register being loaded with the data. +(define_insn "pcrel_opt_ld_addr" + [(set (match_operand:DI 0 "base_reg_operand" "=&b,&b") + (unspec:DI [(match_operand:DI 1 "pcrel_external_address") + (match_operand 2 "const_int_operand" "n,n")] + UNSPEC_PCREL_OPT_LD_ADDR)) + (set (match_operand:DI 3 "gpc_reg_operand" "=r,wa") + (unspec:DI [(const_int 0)] + UNSPEC_PCREL_OPT_LD_ADDR))] + "TARGET_PCREL_OPT + && reg_or_subregno (operands[0]) != reg_or_subregno (operands[3])" + "ld %0,%a1\n.Lpcrel%2:" + [(set_attr "prefixed" "yes") + (set_attr "type" "load") + (set_attr "isa" "pcrel_opt") + (set_attr "loads_extern_addr" "yes")]) + +;; Alternate form of loading up the external address that is the same register +;; as the final load. +(define_insn "pcrel_opt_ld_addr_same_reg" + [(set (match_operand:DI 0 "base_reg_operand" "=b") + (unspec:DI [(match_operand:DI 1 "pcrel_external_address") + (match_operand 2 "const_int_operand" "n")] + UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG))] + "TARGET_PCREL_OPT" + "ld %0,%a1\n.Lpcrel%2:" + [(set_attr "prefixed" "yes") + (set_attr "type" "load") + (set_attr "isa" "pcrel_opt") + (set_attr "loads_extern_addr" "yes")]) + +;; PCREL_OPT modes that are optimized for loading or storing GPRs. +(define_mode_iterator PO_GPR [QI HI SI DI SF DF]) + +(define_mode_attr PO_GPR_LD [(QI "lbz") + (HI "lhz") + (SI "lwz") + (SF "lwz") + (DI "ld") + (DF "ld")]) + +;; PCREL_OPT load operation of GPRs. Operand 4 (the register used to hold the +;; address of the external symbol) is SCRATCH if the same register is used for +;; the normal load. +(define_insn "*pcrel_opt_ld<mode>_gpr" + [(parallel [(set (match_operand:PO_GPR 0 "int_reg_operand" "+r") + (unspec:PO_GPR [(match_operand:PO_GPR 1 "d_form_memory" "o") + (match_operand:DI 2 "int_reg_operand" "0") + (match_operand 3 "const_int_operand" "n")] + UNSPEC_PCREL_OPT_LD_RELOC)) + (clobber (match_scratch:DI 4 "=bX"))])] + "TARGET_PCREL_OPT + && (GET_CODE (operands[4]) == SCRATCH + || reg_mentioned_p (operands[4], operands[1]))" + "%r3<PO_GPR_LD> %0,%1" + [(set_attr "type" "load") + (set_attr "isa" "pcrel_opt")]) + +;; PCREL_OPT load with sign/zero extension +(define_insn "*pcrel_opt_ldsi_<u><mode>_gpr" + [(set (match_operand:EXTSI 0 "int_reg_operand" "+r") + (any_extend:EXTSI + (unspec:SI [(match_operand:SI 1 "d_form_memory" "o") + (match_operand:DI 2 "int_reg_operand" "0") + (match_operand 3 "const_int_operand" "n")] + UNSPEC_PCREL_OPT_LD_RELOC))) + (clobber (match_scratch:DI 4 "=bX"))] + "TARGET_PCREL_OPT" + "%r3lw<az> %0,%1" + [(set_attr "type" "load") + (set_attr "isa" "pcrel_opt")]) + +(define_insn "*pcrel_opt_ldhi_<u><mode>_gpr" + [(set (match_operand:EXTHI 0 "int_reg_operand" "+r") + (any_extend:EXTHI + (unspec:HI [(match_operand:HI 1 "d_form_memory" "o") + (match_operand:DI 2 "int_reg_operand" "0") + (match_operand 3 "const_int_operand" "n")] + UNSPEC_PCREL_OPT_LD_RELOC))) + (clobber (match_scratch:DI 4 "=bX"))] + "TARGET_PCREL_OPT" + "%r3lh<az> %0,%1" + [(set_attr "type" "load") + (set_attr "isa" "pcrel_opt")]) + +(define_insn "*pcrel_opt_ldqi_u<mode>_gpr" + [(set (match_operand:EXTQI 0 "int_reg_operand" "+r") + (zero_extend:EXTQI + (unspec:QI [(match_operand:QI 1 "d_form_memory" "o") + (match_operand:DI 2 "int_reg_operand" "0") + (match_operand 3 "const_int_operand" "n")] + UNSPEC_PCREL_OPT_LD_RELOC))) + (clobber (match_scratch:DI 4 "=bX"))] + "TARGET_PCREL_OPT" + "%r3lbz %0,%1" + [(set_attr "type" "load") + (set_attr "isa" "pcrel_opt")]) + +;; Scalar types that can be optimized by loading them into floating point +;; or Altivec registers. +(define_mode_iterator PO_FP [DI DF SF]) + +;; Load instructions to load up scalar floating point or 64-bit integer values +;; into floating point registers or Altivec registers. +(define_mode_attr PO_FPR_LD [(DI "lfd") (DF "lfd") (SF "lfs")]) +(define_mode_attr PO_AVX_LD [(DI "lxsd") (DF "lxsd") (SF "lxssp")]) + +;; PCREL_OPT load operation of scalar DF/DI/SF into vector registers. +(define_insn "*pcrel_opt_ld<mode>_vsx" + [(set (match_operand:PO_FP 0 "vsx_register_operand" "+d,v") + (unspec:PO_FP [(match_operand:PO_FP 1 "d_form_memory" "o,o") + (match_operand:DI 2 "vsx_register_operand" "0,0") + (match_operand 3 "const_int_operand" "n,n")] + UNSPEC_PCREL_OPT_LD_RELOC)) + (clobber (match_operand:DI 4 "base_reg_operand" "=b,b"))] + "TARGET_PCREL_OPT" + "@ + %r3<PO_FPR_LD> %0,%1 + %r3<PO_AVX_LD> %0,%1" + [(set_attr "type" "fpload") + (set_attr "isa" "pcrel_opt")]) + +;; PCREL_OPT optimization extending SFmode to DFmode via a load. +(define_insn "*pcrel_opt_ldsf_df" + [(set (match_operand:DF 0 "vsx_register_operand" "+d,v") + (float_extend:DF + (unspec:SF [(match_operand:SF 1 "d_form_memory" "o,o") + (match_operand:DI 2 "vsx_register_operand" "0,0") + (match_operand 3 "const_int_operand" "n,n")] + UNSPEC_PCREL_OPT_LD_RELOC))) + (clobber (match_operand:DI 4 "base_reg_operand" "=b,b"))] + "TARGET_PCREL_OPT" + "@ + %r3lfs %0,%1 + %r3lxssp %0,%1" + [(set_attr "type" "fpload") + (set_attr "isa" "pcrel_opt")]) + +;; PCREL_OPT load operation of vector/float128 types into vector registers. +(define_insn "*pcrel_opt_ld<mode>" + [(set (match_operand:PO_VECT 0 "vsx_register_operand" "+wa") + (unspec:PO_VECT [(match_operand:PO_VECT 1 "d_form_memory" "o") + (match_operand:DI 2 "vsx_register_operand" "0") + (match_operand 3 "const_int_operand" "n")] + UNSPEC_PCREL_OPT_LD_RELOC)) + (clobber (match_operand:DI 4 "base_reg_operand" "=b"))] + "TARGET_PCREL_OPT" + "%r3lxv %x0,%1" + [(set_attr "type" "vecload") + (set_attr "isa" "pcrel_opt")]) diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index 2709e46..38ae9cd 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -1876,3 +1876,26 @@ (define_predicate "prefixed_memory" { return address_is_prefixed (XEXP (op, 0), mode, NON_PREFIXED_DEFAULT); }) + +;; Return true if the operand is a valid memory operand with an offsettable +;; address that could be merged with the load of a PC-relative external address +;; with the PCREL_OPT optimization. We don't check here whether or not the +;; offset needs to be used in a DS-FORM (bottom 2 bits 0) or DQ-FORM (bottom 4 +;; bits 0) instruction. +(define_predicate "d_form_memory" + (match_code "mem") +{ + if (!memory_operand (op, mode)) + return false; + + rtx addr = XEXP (op, 0); + + if (REG_P (addr) || SUBREG_P (addr)) + return true; + + if (GET_CODE (addr) != PLUS) + return false; + + return (base_reg_operand (XEXP (addr, 0), Pmode) + && satisfies_constraint_I (XEXP (addr, 1))); +}) diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def index 8d2c1ff..d3f72d7 100644 --- a/gcc/config/rs6000/rs6000-cpus.def +++ b/gcc/config/rs6000/rs6000-cpus.def @@ -78,6 +78,7 @@ /* Flags that need to be turned off if -mno-power10. */ #define OTHER_POWER10_MASKS (OPTION_MASK_MMA \ | OPTION_MASK_PCREL \ + | OPTION_MASK_PCREL_OPT \ | OPTION_MASK_PREFIXED) #define ISA_3_1_MASKS_SERVER (ISA_3_0_MASKS_SERVER \ @@ -142,6 +143,7 @@ | OPTION_MASK_P9_MISC \ | OPTION_MASK_P9_VECTOR \ | OPTION_MASK_PCREL \ + | OPTION_MASK_PCREL_OPT \ | OPTION_MASK_POPCNTB \ | OPTION_MASK_POPCNTD \ | OPTION_MASK_POWERPC64 \ diff --git a/gcc/config/rs6000/rs6000-passes.def b/gcc/config/rs6000/rs6000-passes.def index 5164c52..9b93fc7 100644 --- a/gcc/config/rs6000/rs6000-passes.def +++ b/gcc/config/rs6000/rs6000-passes.def @@ -24,4 +24,12 @@ along with GCC; see the file COPYING3. If not see REPLACE_PASS (PASS, INSTANCE, TGT_PASS) */ + /* Pass to add the appropriate vector swaps on power8 little endian systems. + The power8 does not have instructions that automaticaly do the byte swaps + for loads and stores. */ INSERT_PASS_BEFORE (pass_cse, 1, pass_analyze_swaps); + + /* Pass to do the PCREL_OPT optimization that combines the load of an + external symbol's address along with a single load or store using that + address as a base register. */ + INSERT_PASS_AFTER (pass_sched2, 1, pass_pcrel_opt); diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h index 28e859f..517713a 100644 --- a/gcc/config/rs6000/rs6000-protos.h +++ b/gcc/config/rs6000/rs6000-protos.h @@ -189,6 +189,7 @@ enum non_prefixed_form { extern enum insn_form address_to_insn_form (rtx, machine_mode, enum non_prefixed_form); +extern enum non_prefixed_form reg_to_non_prefixed (rtx, machine_mode); extern bool prefixed_load_p (rtx_insn *); extern bool prefixed_store_p (rtx_insn *); extern bool prefixed_paddi_p (rtx_insn *); @@ -305,6 +306,7 @@ namespace gcc { class context; } class rtl_opt_pass; extern rtl_opt_pass *make_pass_analyze_swaps (gcc::context *); +extern rtl_opt_pass *make_pass_pcrel_opt (gcc::context *); extern bool rs6000_sum_of_two_registers_p (const_rtx expr); extern bool rs6000_quadword_masked_address_p (const_rtx exp); extern rtx rs6000_gen_lvx (enum machine_mode, rtx, rtx); diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index fe93cf6..6877de5 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1175,7 +1175,6 @@ static bool rs6000_secondary_reload_move (enum rs6000_reg_type, machine_mode, secondary_reload_info *, bool); -static enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode); rtl_opt_pass *make_pass_analyze_swaps (gcc::context*); /* Hash table stuff for keeping track of TOC entries. */ @@ -4316,6 +4315,14 @@ rs6000_option_override_internal (bool global_init_p) rs6000_isa_flags &= ~OPTION_MASK_MMA; } + if (!TARGET_PCREL && TARGET_PCREL_OPT) + { + if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL_OPT) != 0) + error ("%qs requires %qs", "-mpcrel-opt", "-mpcrel"); + + rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT; + } + if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET) rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags); @@ -8515,7 +8522,10 @@ rs6000_delegitimize_address (rtx orig_x) { rtx x, y, offset; - if (GET_CODE (orig_x) == UNSPEC && XINT (orig_x, 1) == UNSPEC_FUSION_GPR) + if (GET_CODE (orig_x) == UNSPEC + && (XINT (orig_x, 1) == UNSPEC_FUSION_GPR + || XINT (orig_x, 1) == UNSPEC_PCREL_OPT_LD_ADDR + || XINT (orig_x, 1) == UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG)) orig_x = XVECEXP (orig_x, 0, 0); orig_x = delegitimize_mem_from_attrs (orig_x); @@ -13197,6 +13207,19 @@ print_operand (FILE *file, rtx x, int code) fprintf (file, "%d", 128 >> (REGNO (x) - CR0_REGNO)); return; + case 'r': + /* X is a label number for the PCREL_OPT optimization. Emit the .reloc + to enable this optimization, unless the value is 0. */ + gcc_assert (CONST_INT_P (x)); + if (UINTVAL (x) != 0) + { + unsigned int label_num = UINTVAL (x); + fprintf (file, + ".reloc .Lpcrel%u-8,R_PPC64_PCREL_OPT,.-(.Lpcrel%u-8)\n\t", + label_num, label_num); + } + return; + case 's': /* Low 5 bits of 32 - value */ if (! INT_P (x)) @@ -23244,6 +23267,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] = { "mulhw", OPTION_MASK_MULHW, false, true }, { "multiple", OPTION_MASK_MULTIPLE, false, true }, { "pcrel", OPTION_MASK_PCREL, false, true }, + { "pcrel-opt", OPTION_MASK_PCREL_OPT, false, true }, { "popcntb", OPTION_MASK_POPCNTB, false, true }, { "popcntd", OPTION_MASK_POPCNTD, false, true }, { "power8-fusion", OPTION_MASK_P8_FUSION, false, true }, @@ -25368,7 +25392,7 @@ is_lfs_stfs_insn (rtx_insn *insn) /* Helper function to take a REG and a MODE and turn it into the non-prefixed instruction format (D/DS/DQ) used for offset memory. */ -static enum non_prefixed_form +enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode) { /* If it isn't a register, use the defaults. */ @@ -25591,7 +25615,15 @@ void rs6000_asm_output_opcode (FILE *stream) { if (next_insn_prefixed_p) - fprintf (stream, "p"); + { + fprintf (stream, "p"); + + /* Reset flag in case there are separate insn lines in the sequence, so + the 'p' is only emited for the first line. This shows up when we are + doing the PCREL_OPT optimization, in that the label created with %r<n> + would have a leading 'p' printed. */ + next_insn_prefixed_p = false; + } return; } diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 43b620a..d9dd25f 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -292,6 +292,10 @@ (define_attr "prefixed" "no,yes" (const_string "no"))) +;; Whether an insn loads an external address for the PCREL_OPT optimizaton. +(define_attr "loads_extern_addr" "no,yes" + (const_string "no")) + ;; Return the number of real hardware instructions in a combined insn. If it ;; is 0, just use the length / 4. (define_attr "num_insns" "" (const_int 0)) @@ -323,7 +327,7 @@ (define_attr "cpu" (const (symbol_ref "(enum attr_cpu) rs6000_tune"))) ;; The ISA we implement. -(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10" +(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10,pcrel_opt" (const_string "any")) ;; Is this alternative enabled for the current CPU/ISA/etc.? @@ -371,6 +375,10 @@ (define_attr "enabled" "" (and (eq_attr "isa" "p10") (match_test "TARGET_POWER10")) (const_int 1) + + (and (eq_attr "isa" "pcrel_opt") + (match_test "TARGET_PCREL_OPT")) + (const_int 1) ] (const_int 0))) ;; If this instruction is microcoded on the CELL processor @@ -10226,7 +10234,8 @@ (define_insn "*pcrel_extern_addr" "TARGET_PCREL" "ld %0,%a1" [(set_attr "prefixed" "yes") - (set_attr "type" "load")]) + (set_attr "type" "load") + (set_attr "loads_extern_addr" "yes")]) ;; TOC register handling. @@ -14900,3 +14909,4 @@ (define_insn "*cmpeqb_internal" (include "dfp.md") (include "crypto.md") (include "htm.md") +(include "pcrel-opt.md") diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt index 9d3e740..22d3af4 100644 --- a/gcc/config/rs6000/rs6000.opt +++ b/gcc/config/rs6000/rs6000.opt @@ -582,6 +582,10 @@ mpcrel Target Report Mask(PCREL) Var(rs6000_isa_flags) Generate (do not generate) pc-relative memory addressing. +mpcrel-opt +Target Undocumented Mask(PCREL_OPT) Var(rs6000_isa_flags) +Generate (do not generate) pc-relative memory optimizations for externals. + mmma Target Report Mask(MMA) Var(rs6000_isa_flags) Generate (do not generate) MMA instructions. diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000 index 1ddb572..a617276 100644 --- a/gcc/config/rs6000/t-rs6000 +++ b/gcc/config/rs6000/t-rs6000 @@ -23,6 +23,10 @@ TM_H += $(srcdir)/config/rs6000/rs6000-cpus.def TM_H += $(srcdir)/config/rs6000/rs6000-modes.h PASSES_EXTRA += $(srcdir)/config/rs6000/rs6000-passes.def +pcrel-opt.o: $(srcdir)/config/rs6000/pcrel-opt.c + $(COMPILE) $< + $(POSTCOMPILE) + rs6000-c.o: $(srcdir)/config/rs6000/rs6000-c.c $(COMPILE) $< $(POSTCOMPILE) @@ -86,4 +90,5 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \ $(srcdir)/config/rs6000/mma.md \ $(srcdir)/config/rs6000/crypto.md \ $(srcdir)/config/rs6000/htm.md \ - $(srcdir)/config/rs6000/dfp.md + $(srcdir)/config/rs6000/dfp.md \ + $(srcdir)/config/rs6000/pcrel-opt.md -- 1.8.3.1 -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.ibm.com, phone: +1 (978) 899-4797 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/3] Power10: Add PCREL_OPT load support 2020-08-18 6:34 ` [PATCH 1/3] Power10: Add PCREL_OPT load support Michael Meissner @ 2020-08-21 2:09 ` Segher Boessenkool 2020-09-03 17:24 ` Michael Meissner 0 siblings, 1 reply; 9+ messages in thread From: Segher Boessenkool @ 2020-08-21 2:09 UTC (permalink / raw) To: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt, Peter Bergner, Alan Modra Hi! On Tue, Aug 18, 2020 at 02:34:01AM -0400, Michael Meissner wrote: > +// Maximum number of insns to scan between the load address and the load that Please don't mix /* and // comments. Just stick to /* comments, like all the rest of our backend? > +const int MAX_PCREL_OPT_INSNS = 10; "static const", and not tab please (just a space). > + // LWA is a DS format instruction, but LWZ is a D format instruction. We use > + // DImode for the mode to force checking whether the bottom 2 bits are 0. How so? Can't you just check if the resulting insn is accepted? That would be so much more robust (and can have a correct description more easily, too!) > + // However FPR and vector registers uses the LFIWAX instruction which is > + // indexed only. (vectors use lxsiwax) > + if (GET_CODE (mem) == SIGN_EXTEND && GET_MODE (XEXP (mem, 0)) == SImode) You're checking here whether the address of the MEM is SImode. > + { > + if (!INT_REGNO_P (reg_regno)) That should use base_reg_operand instead? > + // The optimization will only work on non-prefixed offsettable loads. > + rtx addr = XEXP (mem_inner, 0); > + enum insn_form iform = address_to_insn_form (addr, mem_mode, non_prefixed); > + if (iform != INSN_FORM_BASE_REG > + && iform != INSN_FORM_D > + && iform != INSN_FORM_DS > + && iform != INSN_FORM_DQ) > + return false; Sounds like something for a utility function? Do we use this elsewhere? > + ++pcrel_opt_next_num; Normal style is postfix increment. Empty lines around this are nice, it indicates this is the point where we decided to do the thing. Or add a comment even, maybe? > + unsigned int addr_regno = reg_or_subregno (addr_reg); > + rtx label_num = GEN_INT (pcrel_opt_next_num); > + rtx reg_di = gen_rtx_REG (DImode, reg_regno); > + > + PATTERN (addr_insn) > + = ((addr_regno != reg_regno) > + ? gen_pcrel_opt_ld_addr (addr_reg, addr_symbol, label_num, reg_di) > + : gen_pcrel_opt_ld_addr_same_reg (addr_reg, addr_symbol, label_num)); Please use if() instead of multi-line expressions. The outer parens are unnecessary, too. > + // Revalidate the insn, backing out of the optimization if the insn is not > + // supported. So the count will be incorrect then? > + INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0); > + if (INSN_CODE (addr_insn) < 0) > + { > + PATTERN (addr_insn) = addr_set; > + INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0); > + return false; > + } Can you use the normal undo mechanisms, instead? apply_change_group etc. > +static rtx_insn * > +next_active_insn_in_basic_block (rtx_insn *insn) > +{ > + insn = NEXT_INSN (insn); > + > + while (insn != NULL_RTX) > + { > + /* If the basic block ends or there is a jump of some kind, exit the > + loop. */ > + if (CALL_P (insn) > + || JUMP_P (insn) > + || JUMP_TABLE_DATA_P (insn) > + || LABEL_P (insn) > + || BARRIER_P (insn)) > + return NULL; > + > + /* If this is a real insn, return it. */ > + if (!insn->deleted () > + && NONJUMP_INSN_P (insn) > + && GET_CODE (PATTERN (insn)) != USE > + && GET_CODE (PATTERN (insn)) != CLOBBER) > + return insn; > + > + /* Loop for USE, CLOBBER, DEBUG_INSN, NOTEs. */ > + insn = NEXT_INSN (insn); > + } > + > + return NULL; > +} There are things to do this. Please don't write code manually parsing RTL unless you have to. > +// Validate that a load is actually a single instruction that can be optimized > +// with the PCREL_OPT optimization. > + > +static bool > +is_single_instruction (rtx_insn *insn, rtx reg) Of course it is a single instruction! A single RTL instruction... Clarify as "single machine instruction"? > +{ > + if (!REG_P (reg) && !SUBREG_P (reg)) > + return false; You need to check if is a subreg of reg, then. There are subregs of other things. Not that you should see those here, but still. > + // _Decimal128 and IBM extended double are always multiple instructions. > + machine_mode mode = GET_MODE (reg); > + if (mode == TFmode && !TARGET_IEEEQUAD) > + return false; > + > + if (mode == TDmode || mode == IFmode) > + return false; Don't we have a helper for this? The ibm128 part at least. Maybe we should just have an attribute on the insns that *can* get this optimisation? > + return (base_reg_operand (XEXP (addr, 0), Pmode) > + && satisfies_constraint_I (XEXP (addr, 1))); short_cint_operand. But maybe just rs6000_legitimate_offset_address_p? > /* Flags that need to be turned off if -mno-power10. */ > #define OTHER_POWER10_MASKS (OPTION_MASK_MMA \ > | OPTION_MASK_PCREL \ > + | OPTION_MASK_PCREL_OPT \ > | OPTION_MASK_PREFIXED) This isn't a CPU flag. Instead, it is just an optimisation option. > @@ -8515,7 +8522,10 @@ rs6000_delegitimize_address (rtx orig_x) > { > rtx x, y, offset; > > - if (GET_CODE (orig_x) == UNSPEC && XINT (orig_x, 1) == UNSPEC_FUSION_GPR) > + if (GET_CODE (orig_x) == UNSPEC > + && (XINT (orig_x, 1) == UNSPEC_FUSION_GPR > + || XINT (orig_x, 1) == UNSPEC_PCREL_OPT_LD_ADDR > + || XINT (orig_x, 1) == UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG)) > orig_x = XVECEXP (orig_x, 0, 0); > > orig_x = delegitimize_mem_from_attrs (orig_x); Why this? It needs an explanation (in the code). > @@ -13197,6 +13207,19 @@ print_operand (FILE *file, rtx x, int code) > fprintf (file, "%d", 128 >> (REGNO (x) - CR0_REGNO)); > return; > > + case 'r': > + /* X is a label number for the PCREL_OPT optimization. Emit the .reloc > + to enable this optimization, unless the value is 0. */ > + gcc_assert (CONST_INT_P (x)); > + if (UINTVAL (x) != 0) > + { > + unsigned int label_num = UINTVAL (x); > + fprintf (file, > + ".reloc .Lpcrel%u-8,R_PPC64_PCREL_OPT,.-(.Lpcrel%u-8)\n\t", > + label_num, label_num); > + } > + return; Don't eat output modifiers please. We have only so few left, and we cannot recycle any more without pain. I don't see why we cannot just do this in the normal output (C) code of the one or few insns that want this? > ;; The ISA we implement. > -(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10" > +(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10,pcrel_opt" > (const_string "any")) No. Please read the heading. Segher ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/3] Power10: Add PCREL_OPT load support 2020-08-21 2:09 ` Segher Boessenkool @ 2020-09-03 17:24 ` Michael Meissner 0 siblings, 0 replies; 9+ messages in thread From: Michael Meissner @ 2020-09-03 17:24 UTC (permalink / raw) To: Segher Boessenkool Cc: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt, Peter Bergner, Alan Modra On Thu, Aug 20, 2020 at 09:09:40PM -0500, Segher Boessenkool wrote: > Hi! > > On Tue, Aug 18, 2020 at 02:34:01AM -0400, Michael Meissner wrote: > > +// Maximum number of insns to scan between the load address and the load that > > Please don't mix /* and // comments. Just stick to /* comments, like > all the rest of our backend? > > > +const int MAX_PCREL_OPT_INSNS = 10; > > "static const", and not tab please (just a space). With the switch to using flow control, this will go away. > > + // LWA is a DS format instruction, but LWZ is a D format instruction. We use > > + // DImode for the mode to force checking whether the bottom 2 bits are 0. > > How so? Can't you just check if the resulting insn is accepted? That > would be so much more robust (and can have a correct description more > easily, too!) Unfortunately no. The prefixed form will be accepted, and the whole PCREL_OPT optimization requires a non-prefixed instruction. > > + // However FPR and vector registers uses the LFIWAX instruction which is > > + // indexed only. > > (vectors use lxsiwax) Yes. > > + if (GET_CODE (mem) == SIGN_EXTEND && GET_MODE (XEXP (mem, 0)) == SImode) > > You're checking here whether the address of the MEM is SImode. Whoops. > > + { > > + if (!INT_REGNO_P (reg_regno)) > > That should use base_reg_operand instead? > > > + // The optimization will only work on non-prefixed offsettable loads. > > + rtx addr = XEXP (mem_inner, 0); > > + enum insn_form iform = address_to_insn_form (addr, mem_mode, non_prefixed); > > + if (iform != INSN_FORM_BASE_REG > > + && iform != INSN_FORM_D > > + && iform != INSN_FORM_DS > > + && iform != INSN_FORM_DQ) > > + return false; > > Sounds like something for a utility function? Do we use this elsewhere? I will make a utility function. But no, we currently do not use this anywhere else. It might be useful for fusion. > > + ++pcrel_opt_next_num; > > Normal style is postfix increment. Empty lines around this are nice, it > indicates this is the point where we decided to do the thing. Or add a > comment even, maybe? > > > + unsigned int addr_regno = reg_or_subregno (addr_reg); > > + rtx label_num = GEN_INT (pcrel_opt_next_num); > > + rtx reg_di = gen_rtx_REG (DImode, reg_regno); > > + > > + PATTERN (addr_insn) > > + = ((addr_regno != reg_regno) > > + ? gen_pcrel_opt_ld_addr (addr_reg, addr_symbol, label_num, reg_di) > > + : gen_pcrel_opt_ld_addr_same_reg (addr_reg, addr_symbol, label_num)); > > Please use if() instead of multi-line expressions. The outer parens are > unnecessary, too. Ok. However, the outer parens make it easier to format in emacs. > > + // Revalidate the insn, backing out of the optimization if the insn is not > > + // supported. > > So the count will be incorrect then? No, the count is correct, since we don't count the PCREL_OPTs until everything is validated. > > > + INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0); > > + if (INSN_CODE (addr_insn) < 0) > > + { > > + PATTERN (addr_insn) = addr_set; > > + INSN_CODE (addr_insn) = recog (PATTERN (addr_insn), addr_insn, 0); > > + return false; > > + } > > Can you use the normal undo mechanisms, instead? apply_change_group > etc. I will think about it. > > +static rtx_insn * > > +next_active_insn_in_basic_block (rtx_insn *insn) > > +{ > > + insn = NEXT_INSN (insn); > > + > > + while (insn != NULL_RTX) > > + { > > + /* If the basic block ends or there is a jump of some kind, exit the > > + loop. */ > > + if (CALL_P (insn) > > + || JUMP_P (insn) > > + || JUMP_TABLE_DATA_P (insn) > > + || LABEL_P (insn) > > + || BARRIER_P (insn)) > > + return NULL; > > + > > + /* If this is a real insn, return it. */ > > + if (!insn->deleted () > > + && NONJUMP_INSN_P (insn) > > + && GET_CODE (PATTERN (insn)) != USE > > + && GET_CODE (PATTERN (insn)) != CLOBBER) > > + return insn; > > + > > + /* Loop for USE, CLOBBER, DEBUG_INSN, NOTEs. */ > > + insn = NEXT_INSN (insn); > > + } > > + > > + return NULL; > > +} > > There are things to do this. Please don't write code manually parsing > RTL unless you have to. I will switch to using the DF flow in the next set of patches. > > +// Validate that a load is actually a single instruction that can be optimized > > +// with the PCREL_OPT optimization. > > + > > +static bool > > +is_single_instruction (rtx_insn *insn, rtx reg) > > Of course it is a single instruction! A single RTL instruction... > Clarify as "single machine instruction"? My experience is some of the insns lie in terms of things like size. The size is set when it is split, but this pass may/may not run before those instructions are split. But I will change it to just testing if the length is 4, and hope for the best. > > +{ > > + if (!REG_P (reg) && !SUBREG_P (reg)) > > + return false; > > You need to check if is a subreg of reg, then. There are subregs of > other things. Not that you should see those here, but still. Good point, but in theory outside of register allocation, you should never see it. > > > + // _Decimal128 and IBM extended double are always multiple instructions. > > + machine_mode mode = GET_MODE (reg); > > + if (mode == TFmode && !TARGET_IEEEQUAD) > > + return false; > > + > > + if (mode == TDmode || mode == IFmode) > > + return false; > > Don't we have a helper for this? The ibm128 part at least. Yes, but it doesn't cover TD. > Maybe we should just have an attribute on the insns that *can* get this > optimisation? > > > + return (base_reg_operand (XEXP (addr, 0), Pmode) > > + && satisfies_constraint_I (XEXP (addr, 1))); > > short_cint_operand. But maybe just rs6000_legitimate_offset_address_p? > > > /* Flags that need to be turned off if -mno-power10. */ > > #define OTHER_POWER10_MASKS (OPTION_MASK_MMA \ > > | OPTION_MASK_PCREL \ > > + | OPTION_MASK_PCREL_OPT \ > > | OPTION_MASK_PREFIXED) > > This isn't a CPU flag. Instead, it is just an optimisation option. But you want it turned off if somebody says __attribute__((target("cpu=power8"))), which is what OTHER_POWER10_MASKS does. > > @@ -8515,7 +8522,10 @@ rs6000_delegitimize_address (rtx orig_x) > > { > > rtx x, y, offset; > > > > - if (GET_CODE (orig_x) == UNSPEC && XINT (orig_x, 1) == UNSPEC_FUSION_GPR) > > + if (GET_CODE (orig_x) == UNSPEC > > + && (XINT (orig_x, 1) == UNSPEC_FUSION_GPR > > + || XINT (orig_x, 1) == UNSPEC_PCREL_OPT_LD_ADDR > > + || XINT (orig_x, 1) == UNSPEC_PCREL_OPT_LD_ADDR_SAME_REG)) > > orig_x = XVECEXP (orig_x, 0, 0); > > > > orig_x = delegitimize_mem_from_attrs (orig_x); > > Why this? It needs an explanation (in the code). It is needed in some cases to get back the original SYMBOL_REF (particularly for debugging). > > @@ -13197,6 +13207,19 @@ print_operand (FILE *file, rtx x, int code) > > fprintf (file, "%d", 128 >> (REGNO (x) - CR0_REGNO)); > > return; > > > > + case 'r': > > + /* X is a label number for the PCREL_OPT optimization. Emit the .reloc > > + to enable this optimization, unless the value is 0. */ > > + gcc_assert (CONST_INT_P (x)); > > + if (UINTVAL (x) != 0) > > + { > > + unsigned int label_num = UINTVAL (x); > > + fprintf (file, > > + ".reloc .Lpcrel%u-8,R_PPC64_PCREL_OPT,.-(.Lpcrel%u-8)\n\t", > > + label_num, label_num); > > + } > > + return; > > Don't eat output modifiers please. We have only so few left, and we > cannot recycle any more without pain. > > I don't see why we cannot just do this in the normal output (C) code of > the one or few insns that want this? Well it is quite a few insns in pcrel-opt.md. I will look at changing it. Part of it is historical reasons in that it was needed in the old version of the patches that could be turned off. > > ;; The ISA we implement. > > -(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10" > > +(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10,pcrel_opt" > > (const_string "any")) > > No. Please read the heading. Ok, I will just use p10. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meissner@linux.ibm.com, phone: +1 (978) 899-4797 ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2020-09-22 3:22 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-09-05 4:27 [PATCH 0/3] Power10 PCREL_OPT support (September 5th 2020) Michael Meissner 2020-09-05 4:31 ` [PATCH 1/3] power10: Add PCREL_OPT load support Michael Meissner 2020-09-05 4:35 ` [PATCH 2/3] power10: Add PCREL_OPT store support Michael Meissner 2020-09-05 4:36 ` [PATCH 3/3] power10: Add tests for PCREL_OPT Michael Meissner 2020-09-05 15:20 ` [PATCH 0/3] Power10 PCREL_OPT support (September 5th 2020) Michael Meissner 2020-09-22 3:21 ` Ping: " Michael Meissner -- strict thread matches above, loose matches on Subject: below -- 2020-08-18 6:31 [PATCH 0/3] Power10 PCREL_OPT support Michael Meissner 2020-08-18 6:34 ` [PATCH 1/3] Power10: Add PCREL_OPT load support Michael Meissner 2020-08-21 2:09 ` Segher Boessenkool 2020-09-03 17:24 ` Michael Meissner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).