public inbox for gcc-cvs@sourceware.org
* [gcc(refs/users/acsawdey/heads/fusion-combine)] Combine patterns for p10 load-cmpi fusion
@ 2021-01-26 22:42 Aaron Sawdey
From: Aaron Sawdey @ 2021-01-26 22:42 UTC
To: gcc-cvs
https://gcc.gnu.org/g:d5e287320f15b5545a0d3033da070180695137dc
commit d5e287320f15b5545a0d3033da070180695137dc
Author: Aaron Sawdey <acsawdey@linux.ibm.com>
Date: Mon Sep 28 11:15:46 2020 -0500
Combine patterns for p10 load-cmpi fusion
This patch adds the first batch of patterns to support p10 fusion. These
allow combine to create a single insn for a pair of instructions that
power10 can fuse and execute together. These particular fusion pairs
require that only cr0 is used when a load is fused with a compare
immediate of -1/0/1 (if signed) or 0/1 (if unsigned). Combine therefore
imposes the cr0 requirement, and if that does not work out (the compare
does not end up in cr0, or the address form is unsuitable) the splitter
changes the insn back into 2 insns so that scheduling can move them apart.
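As a rough illustration (not part of the patch), the immediate constraint
described above can be modelled in plain C. The function names are
hypothetical; only the ranges come from the text and from the
const_m1_to_1_operand / const_0_to_1_operand predicates. Whether GCC
actually emits a fused pair for load_is_zero depends on the target, the
options in effect and register allocation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed compares qualify only for immediates -1, 0 or 1
   (the range accepted by const_m1_to_1_operand).  */
bool signed_imm_fusible (int64_t imm)
{
  return imm >= -1 && imm <= 1;
}

/* Unsigned compares qualify only for immediates 0 or 1
   (the range accepted by const_0_to_1_operand).  */
bool unsigned_imm_fusible (uint64_t imm)
{
  return imm <= 1;
}

/* Source-level shape of a candidate: a load whose value feeds a
   compare against a small immediate.  This is an assumed example,
   not a test case from the patch.  */
int load_is_zero (const long *p)
{
  return *p == 0;
}
```

A compare against 2, for example, falls outside both ranges and would
not match these patterns.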
The patterns are generated by the script genfusion.pl and live in the new
file fusion.md. The script will be expanded to generate more fusion
patterns.
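The enumeration that genfusion.pl performs can be sketched in C as
follows. This is a re-statement of the script's loop structure for
illustration only, not a replacement for it; the function name
fusion_pattern_names and the fixed 64-byte name buffers are assumptions
made for the sketch.

```c
#include <stdio.h>
#include <string.h>

/* Fill NAMES with the pattern names the generator loops would produce
   and return how many there are.  At most MAX names are stored.  */
int fusion_pattern_names (char names[][64], int max)
{
  static const char *const lmodes[] = { "DI", "SI", "HI", "QI" };
  static const char ldst[] = { 'd', 'w', 'h', 'b' };
  static const char *const ccmodes[] = { "CC", "CCUNS" };
  int count = 0;

  for (int m = 0; m < 4; m++)
    {
      /* HI/QI loads need a GPR-sized scratch, so they always extend.  */
      int narrow = (m >= 2);

      for (int r = 0; r < 3; r++)   /* clobber, lmode, EXT<lmode> */
        {
          char result[16];

          if (r == 0)
            strcpy (result, "clobber");
          else if (r == 1)
            {
              if (narrow)
                continue;           /* no direct HI/QI results */
              strcpy (result, lmodes[m]);
            }
          else
            {
              if (m == 0)
                continue;           /* EXTDI does not exist */
              if (m == 3)
                strcpy (result, "GPR");   /* EXTQI is replaced by GPR */
              else
                snprintf (result, sizeof result, "EXT%s", lmodes[m]);
            }

          for (int c = 0; c < 2; c++)
            {
              int is_signed = (c == 0);

              if (is_signed && m == 3)
                continue;           /* QI is skipped for signed compares */

              const char *echr = (m == 0) ? "" : (is_signed ? "a" : "z");
              const char *extend = (r == 2 || narrow)
                                   ? (is_signed ? "sign" : "zero") : "none";

              if (count < max)
                snprintf (names[count], 64, "*l%c%s_cmp%sdi_cr0_%s_%s_%s_%s",
                          ldst[m], echr, is_signed ? "" : "l",
                          lmodes[m], result, ccmodes[c], extend);
              count++;
            }
        }
    }
  return count;
}
```

Under these rules the loops yield 16 names: 4 for DI, 6 for SI, 4 for HI
and 2 for QI.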
This also adds option -mpower10-fusion which defaults on for power10 and
will gate all these fusion patterns. In addition I have added an
undocumented option -mpower10-fusion-ld-cmpi (which may be removed later)
that just controls the load+compare-immediate patterns. I have made
these default on for power10 but they are not disallowed for earlier
processors because it is still valid code. This allows us to test the
correctness of fusion code generation by turning it on explicitly.
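The defaulting rule described above (on for power10 unless the user set
the option explicitly, and still selectable on older processors) can be
modelled with two bitmasks. This is a minimal sketch: the SKETCH_* bit
values and both function names are hypothetical, and GCC's real
OPTION_MASK_* values are generated from rs6000.opt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only.  */
#define SKETCH_MASK_FUSION          (UINT64_C (1) << 0)
#define SKETCH_MASK_FUSION_LD_CMPI  (UINT64_C (1) << 1)

/* Turn each option on for power10 unless it was given explicitly
   (either -mfoo or -mno-foo sets the bit in EXPLICIT_FLAGS).  */
uint64_t apply_p10_fusion_defaults (uint64_t flags, uint64_t explicit_flags,
                                    bool target_power10)
{
  if (target_power10 && (explicit_flags & SKETCH_MASK_FUSION) == 0)
    flags |= SKETCH_MASK_FUSION;
  if (target_power10 && (explicit_flags & SKETCH_MASK_FUSION_LD_CMPI) == 0)
    flags |= SKETCH_MASK_FUSION_LD_CMPI;
  return flags;
}

/* The load-cmpi patterns are gated on both options being set.  */
bool ld_cmpi_patterns_enabled (uint64_t flags)
{
  return (flags & SKETCH_MASK_FUSION) != 0
         && (flags & SKETCH_MASK_FUSION_LD_CMPI) != 0;
}
```

In this model an explicit -mno-power10-fusion on power10 leaves the
first bit clear, while explicitly setting both bits on an older
processor still enables the patterns, which is what makes testing
possible there.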
gcc/ChangeLog:
* config/rs6000/genfusion.pl: New script to generate
define_insn_and_split patterns so combine can arrange fused
instructions next to each other.
* config/rs6000/fusion.md: New file, generated fused instruction
patterns for combine.
* config/rs6000/predicates.md (const_m1_to_1_operand): New predicate.
(non_update_memory_operand): New predicate.
* config/rs6000/rs6000-cpus.def: Add OPTION_MASK_P10_FUSION and
OPTION_MASK_P10_FUSION_LD_CMPI to ISA_3_1_MASKS_SERVER and
POWERPC_MASKS.
* config/rs6000/rs6000-protos.h (address_is_non_pfx_d_or_x): Add
prototype.
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Automatically set OPTION_MASK_P10_FUSION and
OPTION_MASK_P10_FUSION_LD_CMPI if target is power10.
(rs6000_opt_masks): Allow -mpower10-fusion
in function attributes.
(address_is_non_pfx_d_or_x): New function.
* config/rs6000/rs6000.h: Add MASK_P10_FUSION.
* config/rs6000/rs6000.md: Include fusion.md.
* config/rs6000/rs6000.opt: Add -mpower10-fusion
and -mpower10-fusion-ld-cmpi.
* config/rs6000/t-rs6000: Add dependencies involving fusion.md
but leave commented for now, so genfusion.pl must be run by hand.
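The D-form versus DS-form distinction that address_is_non_pfx_d_or_x is
asked about comes down to the displacement encoding: a D-form load
takes any signed 16-bit displacement, while a DS-form load (ld and lwa,
per the comments in genfusion.pl) also needs the low two bits to be
zero. The sketch below models only that displacement check; the names
disp_form and disp_fits are hypothetical, and the real function relies
on address_to_insn_form, which handles far more address shapes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two non-prefixed displacement encodings.  */
enum disp_form { DISP_D, DISP_DS };

/* Return true if OFFSET can be encoded as the displacement of a
   non-prefixed load of the given form.  */
bool disp_fits (int64_t offset, enum disp_form form)
{
  /* Both forms are limited to a signed 16-bit range.  */
  if (offset < -32768 || offset > 32767)
    return false;

  /* DS-form stores only the upper 14 bits; the low two must be zero.  */
  if (form == DISP_DS && ((uint64_t) offset & 3u) != 0)
    return false;

  return true;
}
```

For example, an offset of 6 is acceptable for a D-form load such as lwz
but not for a DS-form load such as ld, which is one reason the SI
patterns pass NON_PREFIXED_D for lwz and NON_PREFIXED_DS for lwa.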
Diff:
---
gcc/config/rs6000/fusion.md | 356 ++++++++++++++++++++++++++++++++++++++
gcc/config/rs6000/genfusion.pl | 148 ++++++++++++++++
gcc/config/rs6000/predicates.md | 14 ++
gcc/config/rs6000/rs6000-cpus.def | 6 +-
gcc/config/rs6000/rs6000-protos.h | 2 +
gcc/config/rs6000/rs6000.c | 50 ++++++
gcc/config/rs6000/rs6000.h | 1 +
gcc/config/rs6000/rs6000.md | 1 +
gcc/config/rs6000/rs6000.opt | 8 +
gcc/config/rs6000/t-rs6000 | 6 +-
10 files changed, 590 insertions(+), 2 deletions(-)
diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
new file mode 100644
index 00000000000..1fb6416675c
--- /dev/null
+++ b/gcc/config/rs6000/fusion.md
@@ -0,0 +1,356 @@
+;; Generated automatically by genfusion.pl
+
+;; Copyright (C) 2020,2021 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it under
+;; the terms of the GNU General Public License as published by the Free
+;; Software Foundation; either version 3, or (at your option) any later
+;; version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+;; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+;; for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is clobber compare mode is CC extend is none
+(define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:DI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpdi %2,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
+ DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0) (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
+(define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:DI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpldi %2,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
+ DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0) (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is DI compare mode is CC extend is none
+(define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpdi %2,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
+ DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0) (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is DI compare mode is CCUNS extend is none
+(define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpldi %2,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
+ DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0) (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is clobber compare mode is CC extend is none
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:SI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi %2,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
+ SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0) (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is clobber compare mode is CCUNS extend is none
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:SI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi %2,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
+ SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0) (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is SI compare mode is CC extend is none
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi %2,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
+ SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0) (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is SI compare mode is CCUNS extend is none
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi %2,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
+ SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0) (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi %2,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
+ SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (sign_extend:EXTSI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0) (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is EXTSI compare mode is CCUNS extend is zero
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (zero_extend:EXTSI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi %2,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
+ SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:EXTSI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0) (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is clobber compare mode is CC extend is sign
+(define_insn_and_split "*lha_cmpdi_cr0_HI_clobber_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lha%X1 %0,%1\;cmpdi %2,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
+ HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (sign_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0) (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is clobber compare mode is CCUNS extend is zero
+(define_insn_and_split "*lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lhz%X1 %0,%1\;cmpldi %2,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
+ HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0) (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is EXTHI compare mode is CC extend is sign
+(define_insn_and_split "*lha_cmpdi_cr0_HI_EXTHI_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (sign_extend:EXTHI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lha%X1 %0,%1\;cmpdi %2,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
+ HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (sign_extend:EXTHI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0) (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is EXTHI compare mode is CCUNS extend is zero
+(define_insn_and_split "*lhz_cmpldi_cr0_HI_EXTHI_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (zero_extend:EXTHI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lhz%X1 %0,%1\;cmpldi %2,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
+ HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:EXTHI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0) (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is QI result mode is clobber compare mode is CCUNS extend is zero
+(define_insn_and_split "*lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
+ (match_operand:QI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lbz%X1 %0,%1\;cmpldi %2,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
+ QImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0) (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is QI result mode is GPR compare mode is CCUNS extend is zero
+(define_insn_and_split "*lbz_cmpldi_cr0_QI_GPR_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
+ (match_operand:QI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:GPR 0 "gpc_reg_operand" "=r") (zero_extend:GPR (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lbz%X1 %0,%1\;cmpldi %2,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
+ QImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0) (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
new file mode 100755
index 00000000000..2b53c10e9e3
--- /dev/null
+++ b/gcc/config/rs6000/genfusion.pl
@@ -0,0 +1,148 @@
+#!/usr/bin/perl
+# Generate fusion.md
+#
+# Copyright (C) 2020,2021 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3. If not see
+# <http://www.gnu.org/licenses/>.
+
+use warnings;
+use strict;
+
+print <<'EOF';
+;; Generated automatically by genfusion.pl
+
+;; Copyright (C) 2020,2021 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it under
+;; the terms of the GNU General Public License as published by the Free
+;; Software Foundation; either version 3, or (at your option) any later
+;; version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+;; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+;; for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; <http://www.gnu.org/licenses/>.
+
+EOF
+
+sub mode_to_ldst_char
+{
+ my ($mode) = @_;
+ my %x = (DI => 'd', SI => 'w', HI => 'h', QI => 'b');
+ return $x{$mode} if exists $x{$mode};
+ return '?';
+}
+
+sub gen_ld_cmpi_p10
+{
+ my ($lmode, $ldst, $clobbermode, $result, $cmpl, $echr, $constpred,
+ $ccmode, $np, $extend, $resultmode);
+ LMODE: foreach $lmode ('DI','SI','HI','QI') {
+ $ldst = mode_to_ldst_char($lmode);
+ $clobbermode = $lmode;
+ # For clobber, we need a SI/DI reg in case we
+ # split because we have to sign/zero extend.
+ if ($lmode eq 'HI' || $lmode eq 'QI') { $clobbermode = "GPR"; }
+ RESULT: foreach $result ('clobber', $lmode, "EXT".$lmode) {
+ # EXTDI does not exist, and we cannot directly produce HI/QI results.
+ next RESULT if $result eq "EXTDI" || $result eq "HI" || $result eq "QI";
+ # Don't allow EXTQI because that would allow HI result which we can't do.
+ $result = "GPR" if $result eq "EXTQI";
+ CCMODE: foreach $ccmode ('CC','CCUNS') {
+ $np = "NON_PREFIXED_D";
+ if ( $ccmode eq 'CC' ) {
+ next CCMODE if $lmode eq 'QI';
+ if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
+ # ld and lwa are both DS-FORM.
+ $np = "NON_PREFIXED_DS";
+ }
+ $cmpl = "";
+ $echr = "a";
+ $constpred = "const_m1_to_1_operand";
+ } else {
+ if ( $lmode eq 'DI' ) {
+ # ld is DS-form, but lwz is not.
+ $np = "NON_PREFIXED_DS";
+ }
+ $cmpl = "l";
+ $echr = "z";
+ $constpred = "const_0_to_1_operand";
+ }
+ if ($lmode eq 'DI') { $echr = ""; }
+ if ($result =~ m/^EXT/ || $result eq 'GPR' || $clobbermode eq 'GPR') {
+ # We always need extension if result > lmode.
+ if ( $ccmode eq 'CC' ) {
+ $extend = "sign";
+ } else {
+ $extend = "zero";
+ }
+ } else {
+ # Result of SI/DI does not need sign extension.
+ $extend = "none";
+ }
+ print ";; load-cmpi fusion pattern generated by gen_ld_cmpi_p10\n";
+ print ";; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend\n";
+
+ print "(define_insn_and_split \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
+ print " [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
+ print " (compare:${ccmode} (match_operand:${lmode} 1 \"non_update_memory_operand\" \"m\")\n";
+ if ($ccmode eq 'CCUNS') { print " "; }
+ print " (match_operand:${lmode} 3 \"${constpred}\" \"n\")))\n";
+ if ($result eq 'clobber') {
+ print " (clobber (match_scratch:${clobbermode} 0 \"=r\"))]\n";
+ } elsif ($result eq $lmode) {
+ print " (set (match_operand:${result} 0 \"gpc_reg_operand\" \"=r\") (match_dup 1))]\n";
+ } else {
+ print " (set (match_operand:${result} 0 \"gpc_reg_operand\" \"=r\") (${extend}_extend:${result} (match_dup 1)))]\n";
+ }
+ print " \"(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)\"\n";
+ print " \"l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}di %2,%0,%3\"\n";
+ print " \"&& reload_completed\n";
+ print " && (cc_reg_not_cr0_operand (operands[2], CCmode)\n";
+ print " || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),\n";
+ print " ${lmode}mode, ${np}))\"\n";
+
+ if ($extend eq "none") {
+ print " [(set (match_dup 0) (match_dup 1))\n";
+ } else {
+ $resultmode = $result;
+ if ( $result eq 'clobber' ) { $resultmode = $clobbermode }
+ print " [(set (match_dup 0) (${extend}_extend:${resultmode} (match_dup 1)))\n";
+ }
+ print " (set (match_dup 2)\n";
+ print " (compare:${ccmode} (match_dup 0) (match_dup 3)))]\n";
+ print " \"\"\n";
+ print " [(set_attr \"type\" \"load\")\n";
+ print " (set_attr \"cost\" \"8\")\n";
+ print " (set_attr \"length\" \"8\")])\n";
+ print "\n";
+ }
+ }
+ }
+}
+
+
+gen_ld_cmpi_p10();
+
+exit(0);
+
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 5d1952e59d3..76328ecff3d 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -297,6 +297,11 @@
(and (match_code "const_int")
(match_test "IN_RANGE (INTVAL (op), 0, 1)")))
+;; Match op = -1, op = 0, or op = 1.
+(define_predicate "const_m1_to_1_operand"
+ (and (match_code "const_int")
+ (match_test "IN_RANGE (INTVAL (op), -1, 1)")))
+
;; Match op = 0..3.
(define_predicate "const_0_to_3_operand"
(and (match_code "const_int")
@@ -847,6 +852,15 @@
|| GET_CODE (XEXP (op, 0)) == PRE_DEC
|| GET_CODE (XEXP (op, 0)) == PRE_MODIFY))"))
+;; Anything that matches memory_operand but does not update the address.
+(define_predicate "non_update_memory_operand"
+ (match_code "mem")
+{
+ if (update_address_mem (op, mode))
+ return 0;
+ return memory_operand (op, mode);
+})
+
;; Return 1 if the operand is a MEM with an indexed-form address.
(define_special_predicate "indexed_address_mem"
(match_test "(MEM_P (op)
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index fa5c75bb49c..fc9376db3f4 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -81,7 +81,9 @@
#define ISA_3_1_MASKS_SERVER (ISA_3_0_MASKS_SERVER \
| OPTION_MASK_POWER10 \
- | OTHER_POWER10_MASKS)
+ | OTHER_POWER10_MASKS \
+ | OPTION_MASK_P10_FUSION \
+ | OPTION_MASK_P10_FUSION_LD_CMPI)
/* Flags that need to be turned off if -mno-power9-vector. */
#define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW \
@@ -128,6 +130,8 @@
| OPTION_MASK_FLOAT128_KEYWORD \
| OPTION_MASK_FPRND \
| OPTION_MASK_POWER10 \
+ | OPTION_MASK_P10_FUSION \
+ | OPTION_MASK_P10_FUSION_LD_CMPI \
| OPTION_MASK_HTM \
| OPTION_MASK_ISEL \
| OPTION_MASK_MFCRF \
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 9cca7325d0d..d9d44fe9821 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -191,6 +191,8 @@ enum non_prefixed_form {
extern enum insn_form address_to_insn_form (rtx, machine_mode,
enum non_prefixed_form);
+extern bool address_is_non_pfx_d_or_x (rtx addr, machine_mode mode,
+ enum non_prefixed_form non_prefix_format);
extern bool prefixed_load_p (rtx_insn *);
extern bool prefixed_store_p (rtx_insn *);
extern bool prefixed_paddi_p (rtx_insn *);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b9e90ae0468..48bbef6a5e2 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4430,6 +4430,13 @@ rs6000_option_override_internal (bool global_init_p)
if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_MMA) == 0)
rs6000_isa_flags |= OPTION_MASK_MMA;
+ if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION) == 0)
+ rs6000_isa_flags |= OPTION_MASK_P10_FUSION;
+
+ if (TARGET_POWER10 &&
+ (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_LD_CMPI) == 0)
+ rs6000_isa_flags |= OPTION_MASK_P10_FUSION_LD_CMPI;
+
/* Turn off vector pair/mma options on non-power10 systems. */
else if (!TARGET_POWER10 && TARGET_MMA)
{
@@ -23623,6 +23630,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
{ "power9-minmax", OPTION_MASK_P9_MINMAX, false, true },
{ "power9-misc", OPTION_MASK_P9_MISC, false, true },
{ "power9-vector", OPTION_MASK_P9_VECTOR, false, true },
+ { "power10-fusion", OPTION_MASK_P10_FUSION, false, true },
{ "powerpc-gfxopt", OPTION_MASK_PPC_GFXOPT, false, true },
{ "powerpc-gpopt", OPTION_MASK_PPC_GPOPT, false, true },
{ "prefixed", OPTION_MASK_PREFIXED, false, true },
@@ -25714,6 +25722,48 @@ address_to_insn_form (rtx addr,
return INSN_FORM_BAD;
}
+/* Given address rtx ADDR for a load of MODE, is this legitimate for a
+ non-prefixed D-form or X-form instruction? NON_PREFIXED_FORMAT is
+ given NON_PREFIXED_D or NON_PREFIXED_DS to indicate whether we want
+ a D-form or DS-form instruction. X-form and base_reg are always
+ allowed. */
+bool
+address_is_non_pfx_d_or_x (rtx addr, machine_mode mode,
+ enum non_prefixed_form non_prefixed_format)
+{
+ enum insn_form result_form;
+
+ result_form = address_to_insn_form (addr, mode, non_prefixed_format);
+
+ switch (non_prefixed_format)
+ {
+ case NON_PREFIXED_D:
+ switch (result_form)
+ {
+ case INSN_FORM_X:
+ case INSN_FORM_D:
+ case INSN_FORM_DS:
+ case INSN_FORM_BASE_REG:
+ return true;
+ default:
+ return false;
+ }
+ break;
+ case NON_PREFIXED_DS:
+ switch (result_form)
+ {
+ case INSN_FORM_X:
+ case INSN_FORM_DS:
+ case INSN_FORM_BASE_REG:
+ return true;
+ default:
+ return false;
+ }
+ break;
+ }
+ return false;
+}
+
/* Helper function to see if we're potentially looking at lfs/stfs.
- PARALLEL containing a SET and a CLOBBER
- stfs:
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index b05dd827b13..233a92baf3c 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -539,6 +539,7 @@ extern int rs6000_vector_align[];
#define MASK_UPDATE OPTION_MASK_UPDATE
#define MASK_VSX OPTION_MASK_VSX
#define MASK_POWER10 OPTION_MASK_POWER10
+#define MASK_P10_FUSION OPTION_MASK_P10_FUSION
#ifndef IN_LIBGCC2
#define MASK_POWERPC64 OPTION_MASK_POWERPC64
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 976425361d9..a1315523fec 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -14927,3 +14927,4 @@
(include "dfp.md")
(include "crypto.md")
(include "htm.md")
+(include "fusion.md")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index d128e52ff8b..6240f779694 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -479,6 +479,14 @@ mpower8-vector
Target Mask(P8_VECTOR) Var(rs6000_isa_flags)
Use vector and scalar instructions added in ISA 2.07.
+mpower10-fusion
+Target Mask(P10_FUSION) Var(rs6000_isa_flags)
+Fuse certain integer operations together for better performance on power10.
+
+mpower10-fusion-ld-cmpi
+Target Undocumented Mask(P10_FUSION_LD_CMPI) Var(rs6000_isa_flags)
+Fuse certain integer operations together for better performance on power10.
+
mcrypto
Target Mask(CRYPTO) Var(rs6000_isa_flags)
Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions.
diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index af96b21667b..1541a653738 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -47,6 +47,9 @@ rs6000-call.o: $(srcdir)/config/rs6000/rs6000-call.c
$(COMPILE) $<
$(POSTCOMPILE)
+#$(srcdir)/config/rs6000/fusion.md: $(srcdir)/config/rs6000/genfusion.pl
+# $(srcdir)/config/rs6000/genfusion.pl > $(srcdir)/config/rs6000/fusion.md
+
$(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \
$(srcdir)/config/rs6000/rs6000-cpus.def
$(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
@@ -86,4 +89,5 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \
$(srcdir)/config/rs6000/mma.md \
$(srcdir)/config/rs6000/crypto.md \
$(srcdir)/config/rs6000/htm.md \
- $(srcdir)/config/rs6000/dfp.md
+ $(srcdir)/config/rs6000/dfp.md \
+ $(srcdir)/config/rs6000/fusion.md
* [gcc(refs/users/acsawdey/heads/fusion-combine)] Combine patterns for p10 load-cmpi fusion
@ 2021-01-26 22:48 Aaron Sawdey
From: Aaron Sawdey @ 2021-01-26 22:48 UTC
To: gcc-cvs
https://gcc.gnu.org/g:e34ac01d38dd4515f39962cb818c6b7d5406b095
commit e34ac01d38dd4515f39962cb818c6b7d5406b095
Author: Aaron Sawdey <acsawdey@linux.ibm.com>
Date: Mon Sep 28 11:15:46 2020 -0500
Combine patterns for p10 load-cmpi fusion
This patch adds the first batch of patterns to support p10 fusion. These
allow combine to create a single insn for a pair of instructions that
power10 can fuse and execute together. These particular fusion pairs
require that only cr0 is used when a load is fused with a compare
immediate of -1/0/1 (if signed) or 0/1 (if unsigned). Combine therefore
imposes the cr0 requirement, and if that does not work out (the compare
does not end up in cr0, or the address form is unsuitable) the splitter
changes the insn back into 2 insns so that scheduling can move them apart.
The patterns are generated by a script genfusion.pl and live in new file
fusion.md. This script will be expanded to generate more patterns for
fusion.
This also adds option -mpower10-fusion which defaults on for power10 and
will gate all these fusion patterns. In addition I have added an
undocumented option -mpower10-fusion-ld-cmpi (which may be removed later)
that just controls the load+compare-immediate patterns. I have made
these default on for power10 but they are not disallowed for earlier
processors because it is still valid code. This allows us to test the
correctness of fusion code generation by turning it on explicitly.
gcc/ChangeLog:
* config/rs6000/genfusion.pl: New script to generate
define_insn_and_split patterns so combine can arrange fused
instructions next to each other.
* config/rs6000/fusion.md: New file, generated fused instruction
patterns for combine.
* config/rs6000/predicates.md (const_m1_to_1_operand): New predicate.
(non_update_memory_operand): New predicate.
* config/rs6000/rs6000-cpus.def: Add OPTION_MASK_P10_FUSION and
OPTION_MASK_P10_FUSION_LD_CMPI to ISA_3_1_MASKS_SERVER and
POWERPC_MASKS.
* config/rs6000/rs6000-protos.h (address_is_non_pfx_d_or_x): Add
prototype.
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Automatically set OPTION_MASK_P10_FUSION and
OPTION_MASK_P10_FUSION_LD_CMPI if target is power10.
(rs6000_opt_masks): Allow -mpower10-fusion
in function attributes.
(address_is_non_pfx_d_or_x): New function.
* config/rs6000/rs6000.h: Add MASK_P10_FUSION.
* config/rs6000/rs6000.md: Include fusion.md.
* config/rs6000/rs6000.opt: Add -mpower10-fusion
and -mpower10-fusion-ld-cmpi.
* config/rs6000/t-rs6000: Add dependencies involving fusion.md.
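
For orientation, here is a minimal C sketch of the kind of source these patterns are aimed at: a load from memory that is immediately compared with -1, 0, or 1 (signed) or with 0 or 1 (unsigned). The function names are hypothetical and do not come from the patch. Whether combine actually forms the fused insn for any of them depends on the surrounding code and on cr0 being available, as the commit message describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed 64-bit load compared with -1: a candidate for ld + cmpdi.  */
bool
is_minus_one (const int64_t *p)
{
  return *p == -1;
}

/* Unsigned 32-bit load compared with 1: a candidate for lwz + cmpldi.  */
bool
above_one (const uint32_t *p)
{
  return *p > 1;
}

/* Unsigned byte load compared with 1: a candidate for lbz + cmpldi.  */
bool
byte_above_one (const uint8_t *p)
{
  return *p > 1;
}
```

Per the commit message, the patterns default on for power10 and can also be requested explicitly with -mpower10-fusion and -mpower10-fusion-ld-cmpi when testing code generation for other processors.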
Diff:
---
gcc/config/rs6000/fusion.md | 357 ++++++++++++++++++++++++++++++++++++++
gcc/config/rs6000/genfusion.pl | 148 ++++++++++++++++
gcc/config/rs6000/predicates.md | 14 ++
gcc/config/rs6000/rs6000-cpus.def | 6 +-
gcc/config/rs6000/rs6000-protos.h | 2 +
gcc/config/rs6000/rs6000.c | 50 ++++++
gcc/config/rs6000/rs6000.h | 1 +
gcc/config/rs6000/rs6000.md | 1 +
gcc/config/rs6000/rs6000.opt | 8 +
gcc/config/rs6000/t-rs6000 | 6 +-
10 files changed, 591 insertions(+), 2 deletions(-)
diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
new file mode 100644
index 00000000000..a4d3a6ae7f3
--- /dev/null
+++ b/gcc/config/rs6000/fusion.md
@@ -0,0 +1,357 @@
+;; -*- buffer-read-only: t -*-
+;; Generated automatically by genfusion.pl
+
+;; Copyright (C) 2020 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it under
+;; the terms of the GNU General Public License as published by the Free
+;; Software Foundation; either version 3, or (at your option) any later
+;; version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+;; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+;; for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is clobber compare mode is CC extend is none
+(define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:DI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
+(define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:DI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is DI compare mode is CC extend is none
+(define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is DI compare mode is CCUNS extend is none
+(define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is clobber compare mode is CC extend is none
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:SI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is clobber compare mode is CCUNS extend is none
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:SI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is SI compare mode is CC extend is none
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is SI compare mode is CCUNS extend is none
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (sign_extend:EXTSI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is EXTSI compare mode is CCUNS extend is zero
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (zero_extend:EXTSI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:EXTSI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is clobber compare mode is CC extend is sign
+(define_insn_and_split "*lha_cmpdi_cr0_HI_clobber_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lha%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (sign_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is clobber compare mode is CCUNS extend is zero
+(define_insn_and_split "*lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lhz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is EXTHI compare mode is CC extend is sign
+(define_insn_and_split "*lha_cmpdi_cr0_HI_EXTHI_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (sign_extend:EXTHI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lha%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (sign_extend:EXTHI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is EXTHI compare mode is CCUNS extend is zero
+(define_insn_and_split "*lhz_cmpldi_cr0_HI_EXTHI_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (zero_extend:EXTHI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lhz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:EXTHI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is QI result mode is clobber compare mode is CCUNS extend is zero
+(define_insn_and_split "*lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
+ (match_operand:QI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lbz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), QImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is QI result mode is GPR compare mode is CCUNS extend is zero
+(define_insn_and_split "*lbz_cmpldi_cr0_QI_GPR_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
+ (match_operand:QI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:GPR 0 "gpc_reg_operand" "=r") (zero_extend:GPR (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lbz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), QImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
new file mode 100755
index 00000000000..2b53c10e9e3
--- /dev/null
+++ b/gcc/config/rs6000/genfusion.pl
@@ -0,0 +1,148 @@
+#!/usr/bin/perl
+# Generate fusion.md
+#
+# Copyright (C) 2020,2021 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3. If not see
+# <http://www.gnu.org/licenses/>.
+
+use warnings;
+use strict;
+
+print <<'EOF';
+;; Generated automatically by genfusion.pl
+
+;; Copyright (C) 2020,2021 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it under
+;; the terms of the GNU General Public License as published by the Free
+;; Software Foundation; either version 3, or (at your option) any later
+;; version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+;; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+;; for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; <http://www.gnu.org/licenses/>.
+
+EOF
+
+sub mode_to_ldst_char
+{
+ my ($mode) = @_;
+ my %x = (DI => 'd', SI => 'w', HI => 'h', QI => 'b');
+ return $x{$mode} if exists $x{$mode};
+ return '?';
+}
+
+sub gen_ld_cmpi_p10
+{
+ my ($lmode, $ldst, $clobbermode, $result, $cmpl, $echr, $constpred,
+ $ccmode, $np, $extend, $resultmode);
+ LMODE: foreach $lmode ('DI','SI','HI','QI') {
+ $ldst = mode_to_ldst_char($lmode);
+ $clobbermode = $lmode;
+ # For clobber, we need a SI/DI reg in case we
+ # split because we have to sign/zero extend.
+ if ($lmode eq 'HI' || $lmode eq 'QI') { $clobbermode = "GPR"; }
+ RESULT: foreach $result ('clobber', $lmode, "EXT".$lmode) {
+ # EXTDI does not exist, and we cannot directly produce HI/QI results.
+ next RESULT if $result eq "EXTDI" || $result eq "HI" || $result eq "QI";
+ # Don't allow EXTQI because that would allow HI result which we can't do.
+ $result = "GPR" if $result eq "EXTQI";
+ CCMODE: foreach $ccmode ('CC','CCUNS') {
+ $np = "NON_PREFIXED_D";
+ if ( $ccmode eq 'CC' ) {
+ next CCMODE if $lmode eq 'QI';
+ if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
+ # ld and lwa are both DS-FORM.
+ $np = "NON_PREFIXED_DS";
+ }
+ $cmpl = "";
+ $echr = "a";
+ $constpred = "const_m1_to_1_operand";
+ } else {
+ if ( $lmode eq 'DI' ) {
+ # ld is DS-form, but lwz is not.
+ $np = "NON_PREFIXED_DS";
+ }
+ $cmpl = "l";
+ $echr = "z";
+ $constpred = "const_0_to_1_operand";
+ }
+ if ($lmode eq 'DI') { $echr = ""; }
+ if ($result =~ m/^EXT/ || $result eq 'GPR' || $clobbermode eq 'GPR') {
+ # We always need extension if result > lmode.
+ if ( $ccmode eq 'CC' ) {
+ $extend = "sign";
+ } else {
+ $extend = "zero";
+ }
+ } else {
+ # Result of SI/DI does not need sign extension.
+ $extend = "none";
+ }
+ print ";; load-cmpi fusion pattern generated by gen_ld_cmpi_p10\n";
+ print ";; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend\n";
+
+ print "(define_insn_and_split \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
+ print " [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
+ print " (compare:${ccmode} (match_operand:${lmode} 1 \"non_update_memory_operand\" \"m\")\n";
+ if ($ccmode eq 'CCUNS') { print " "; }
+ print " (match_operand:${lmode} 3 \"${constpred}\" \"n\")))\n";
+ if ($result eq 'clobber') {
+ print " (clobber (match_scratch:${clobbermode} 0 \"=r\"))]\n";
+ } elsif ($result eq $lmode) {
+ print " (set (match_operand:${result} 0 \"gpc_reg_operand\" \"=r\") (match_dup 1))]\n";
+ } else {
+ print " (set (match_operand:${result} 0 \"gpc_reg_operand\" \"=r\") (${extend}_extend:${result} (match_dup 1)))]\n";
+ }
+ print " \"(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)\"\n";
+ print " \"l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}di %2,%0,%3\"\n";
+ print " \"&& reload_completed\n";
+ print " && (cc_reg_not_cr0_operand (operands[2], CCmode)\n";
+ print " || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),\n";
+ print " ${lmode}mode, ${np}))\"\n";
+
+ if ($extend eq "none") {
+ print " [(set (match_dup 0) (match_dup 1))\n";
+ } else {
+ $resultmode = $result;
+ if ( $result eq 'clobber' ) { $resultmode = $clobbermode }
+ print " [(set (match_dup 0) (${extend}_extend:${resultmode} (match_dup 1)))\n";
+ }
+ print " (set (match_dup 2)\n";
+ print " (compare:${ccmode} (match_dup 0) (match_dup 3)))]\n";
+ print " \"\"\n";
+ print " [(set_attr \"type\" \"load\")\n";
+ print " (set_attr \"cost\" \"8\")\n";
+ print " (set_attr \"length\" \"8\")])\n";
+ print "\n";
+ }
+ }
+ }
+}
+
+
+gen_ld_cmpi_p10();
+
+exit(0);
+
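
As a reading aid for the nested loops in gen_ld_cmpi_p10, the following C sketch re-enacts only the enumeration and naming logic, not the pattern bodies. The function name list_ld_cmpi_names and its interface are hypothetical; the skip rules and the name format are transcribed from the Perl above. It is meant to show which combinations of load mode, result mode, and compare mode survive the `next` statements.

```c
#include <stdio.h>
#include <string.h>

/* Fill NAMES (at most MAX entries) with the pattern names the generator
   loops would emit, in order.  Returns the total number of patterns.  */
int
list_ld_cmpi_names (char names[][64], int max)
{
  static const char *const lmodes[] = { "DI", "SI", "HI", "QI" };
  static const char ldst_chars[] = { 'd', 'w', 'h', 'b' };
  int n = 0;

  for (int m = 0; m < 4; m++)
    {
      const char *lmode = lmodes[m];
      int gpr_clobber = (m >= 2);       /* HI and QI clobber a GPR.  */
      char ext[8];
      snprintf (ext, sizeof ext, "EXT%s", lmode);
      const char *results[3] = { "clobber", lmode, ext };

      for (int r = 0; r < 3; r++)
        {
          const char *result = results[r];
          /* EXTDI does not exist; HI/QI results cannot be produced.  */
          if (!strcmp (result, "EXTDI") || !strcmp (result, "HI")
              || !strcmp (result, "QI"))
            continue;
          if (!strcmp (result, "EXTQI"))
            result = "GPR";

          for (int uns = 0; uns < 2; uns++)     /* 0 = CC, 1 = CCUNS.  */
            {
              if (!uns && m == 3)
                continue;               /* No signed QI pattern.  */
              const char *echr = (m == 0) ? "" : (uns ? "z" : "a");
              int widened = !strncmp (result, "EXT", 3)
                            || !strcmp (result, "GPR") || gpr_clobber;
              const char *extend = widened ? (uns ? "zero" : "sign") : "none";
              if (n < max)
                snprintf (names[n], 64, "*l%c%s_cmp%sdi_cr0_%s_%s_%s_%s",
                          ldst_chars[m], echr, uns ? "l" : "", lmode,
                          result, uns ? "CCUNS" : "CC", extend);
              n++;
            }
        }
    }
  return n;
}
```

Called with a buffer of 16 entries, this yields the same 16 names that appear in the generated fusion.md in this patch, from *ld_cmpdi_cr0_DI_clobber_CC_none through *lbz_cmpldi_cr0_QI_GPR_CCUNS_zero.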
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 5d1952e59d3..76328ecff3d 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -297,6 +297,11 @@
(and (match_code "const_int")
(match_test "IN_RANGE (INTVAL (op), 0, 1)")))
+;; Match op = -1, op = 0, or op = 1.
+(define_predicate "const_m1_to_1_operand"
+ (and (match_code "const_int")
+ (match_test "IN_RANGE (INTVAL (op), -1, 1)")))
+
;; Match op = 0..3.
(define_predicate "const_0_to_3_operand"
(and (match_code "const_int")
@@ -847,6 +852,15 @@
|| GET_CODE (XEXP (op, 0)) == PRE_DEC
|| GET_CODE (XEXP (op, 0)) == PRE_MODIFY))"))
+;; Anything that matches memory_operand but does not update the address.
+(define_predicate "non_update_memory_operand"
+ (match_code "mem")
+{
+ if (update_address_mem (op, mode))
+ return 0;
+ return memory_operand (op, mode);
+})
+
;; Return 1 if the operand is a MEM with an indexed-form address.
(define_special_predicate "indexed_address_mem"
(match_test "(MEM_P (op)
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index fa5c75bb49c..fc9376db3f4 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -81,7 +81,9 @@
#define ISA_3_1_MASKS_SERVER (ISA_3_0_MASKS_SERVER \
| OPTION_MASK_POWER10 \
- | OTHER_POWER10_MASKS)
+ | OTHER_POWER10_MASKS \
+ | OPTION_MASK_P10_FUSION \
+ | OPTION_MASK_P10_FUSION_LD_CMPI)
/* Flags that need to be turned off if -mno-power9-vector. */
#define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW \
@@ -128,6 +130,8 @@
| OPTION_MASK_FLOAT128_KEYWORD \
| OPTION_MASK_FPRND \
| OPTION_MASK_POWER10 \
+ | OPTION_MASK_P10_FUSION \
+ | OPTION_MASK_P10_FUSION_LD_CMPI \
| OPTION_MASK_HTM \
| OPTION_MASK_ISEL \
| OPTION_MASK_MFCRF \
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 9cca7325d0d..d9d44fe9821 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -191,6 +191,8 @@ enum non_prefixed_form {
extern enum insn_form address_to_insn_form (rtx, machine_mode,
enum non_prefixed_form);
+extern bool address_is_non_pfx_d_or_x (rtx addr, machine_mode mode,
+ enum non_prefixed_form non_prefix_format);
extern bool prefixed_load_p (rtx_insn *);
extern bool prefixed_store_p (rtx_insn *);
extern bool prefixed_paddi_p (rtx_insn *);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b9e90ae0468..48bbef6a5e2 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4430,6 +4430,13 @@ rs6000_option_override_internal (bool global_init_p)
if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_MMA) == 0)
rs6000_isa_flags |= OPTION_MASK_MMA;
+ if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION) == 0)
+ rs6000_isa_flags |= OPTION_MASK_P10_FUSION;
+
+ if (TARGET_POWER10 &&
+ (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_LD_CMPI) == 0)
+ rs6000_isa_flags |= OPTION_MASK_P10_FUSION_LD_CMPI;
+
/* Turn off vector pair/mma options on non-power10 systems. */
else if (!TARGET_POWER10 && TARGET_MMA)
{
@@ -23623,6 +23630,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
{ "power9-minmax", OPTION_MASK_P9_MINMAX, false, true },
{ "power9-misc", OPTION_MASK_P9_MISC, false, true },
{ "power9-vector", OPTION_MASK_P9_VECTOR, false, true },
+ { "power10-fusion", OPTION_MASK_P10_FUSION, false, true },
{ "powerpc-gfxopt", OPTION_MASK_PPC_GFXOPT, false, true },
{ "powerpc-gpopt", OPTION_MASK_PPC_GPOPT, false, true },
{ "prefixed", OPTION_MASK_PREFIXED, false, true },
@@ -25714,6 +25722,48 @@ address_to_insn_form (rtx addr,
return INSN_FORM_BAD;
}
+/* Given address rtx ADDR for a load of MODE, is this legitimate for a
+ non-prefixed D-form or X-form instruction? NON_PREFIXED_FORMAT is
+ given NON_PREFIXED_D or NON_PREFIXED_DS to indicate whether we want
+ a D-form or DS-form instruction. X-form and base_reg are always
+ allowed. */
+bool
+address_is_non_pfx_d_or_x (rtx addr, machine_mode mode,
+ enum non_prefixed_form non_prefixed_format)
+{
+ enum insn_form result_form;
+
+ result_form = address_to_insn_form (addr, mode, non_prefixed_format);
+
+ switch (non_prefixed_format)
+ {
+ case NON_PREFIXED_D:
+ switch (result_form)
+ {
+ case INSN_FORM_X:
+ case INSN_FORM_D:
+ case INSN_FORM_DS:
+ case INSN_FORM_BASE_REG:
+ return true;
+ default:
+ return false;
+ }
+ break;
+ case NON_PREFIXED_DS:
+ switch (result_form)
+ {
+ case INSN_FORM_X:
+ case INSN_FORM_DS:
+ case INSN_FORM_BASE_REG:
+ return true;
+ default:
+ return false;
+ }
+ break;
+ }
+ return false;
+}
+
/* Helper function to see if we're potentially looking at lfs/stfs.
- PARALLEL containing a SET and a CLOBBER
- stfs:
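
The split conditions in fusion.md pass NON_PREFIXED_DS for ld and lwa and NON_PREFIXED_D for lwz, lha, lhz, and lbz. The C sketch below models only the displacement part of that distinction, under the usual Power ISA encoding rules: a D-form load takes a signed 16-bit displacement, while a DS-form load additionally requires the displacement to be a multiple of 4. The enum and function names are hypothetical stand-ins; the real address_to_insn_form also handles indexed, prefixed, and PC-relative addresses, which this model ignores.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for NON_PREFIXED_D and NON_PREFIXED_DS.  */
enum nonpfx_form { FORM_D, FORM_DS };

/* Return true when OFFSET can be encoded in the displacement field of a
   non-prefixed load of the given FORM.  */
bool
offset_fits_form (long offset, enum nonpfx_form form)
{
  if (offset < -32768 || offset > 32767)
    return false;               /* Does not fit a signed 16-bit field.  */
  if (form == FORM_DS && offset % 4 != 0)
    return false;               /* DS-form: low two bits must be zero.  */
  return true;
}
```

Under this model, a displacement of 6 is acceptable for lwz (D-form) but not for ld or lwa (DS-form), which is the kind of case where the splitter would break the fused pattern back into two insns.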
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index b05dd827b13..233a92baf3c 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -539,6 +539,7 @@ extern int rs6000_vector_align[];
#define MASK_UPDATE OPTION_MASK_UPDATE
#define MASK_VSX OPTION_MASK_VSX
#define MASK_POWER10 OPTION_MASK_POWER10
+#define MASK_P10_FUSION OPTION_MASK_P10_FUSION
#ifndef IN_LIBGCC2
#define MASK_POWERPC64 OPTION_MASK_POWERPC64
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 976425361d9..a1315523fec 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -14927,3 +14927,4 @@
(include "dfp.md")
(include "crypto.md")
(include "htm.md")
+(include "fusion.md")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index d128e52ff8b..6240f779694 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -479,6 +479,14 @@ mpower8-vector
Target Mask(P8_VECTOR) Var(rs6000_isa_flags)
Use vector and scalar instructions added in ISA 2.07.
+mpower10-fusion
+Target Mask(P10_FUSION) Var(rs6000_isa_flags)
+Fuse certain integer operations together for better performance on power10.
+
+mpower10-fusion-ld-cmpi
+Target Undocumented Mask(P10_FUSION_LD_CMPI) Var(rs6000_isa_flags)
+Fuse certain integer operations together for better performance on power10.
+
mcrypto
Target Mask(CRYPTO) Var(rs6000_isa_flags)
Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions.
diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index af96b21667b..e3a58bf31bf 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -47,6 +47,9 @@ rs6000-call.o: $(srcdir)/config/rs6000/rs6000-call.c
$(COMPILE) $<
$(POSTCOMPILE)
+$(srcdir)/config/rs6000/fusion.md: $(srcdir)/config/rs6000/genfusion.pl
+ $(srcdir)/config/rs6000/genfusion.pl > $(srcdir)/config/rs6000/fusion.md
+
$(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \
$(srcdir)/config/rs6000/rs6000-cpus.def
$(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
@@ -86,4 +89,5 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \
$(srcdir)/config/rs6000/mma.md \
$(srcdir)/config/rs6000/crypto.md \
$(srcdir)/config/rs6000/htm.md \
- $(srcdir)/config/rs6000/dfp.md
+ $(srcdir)/config/rs6000/dfp.md \
+ $(srcdir)/config/rs6000/fusion.md
* [gcc(refs/users/acsawdey/heads/fusion-combine)] Combine patterns for p10 load-cmpi fusion
@ 2021-01-19 18:30 Aaron Sawdey
From: Aaron Sawdey @ 2021-01-19 18:30 UTC
To: gcc-cvs
https://gcc.gnu.org/g:6837dbc11c161b1399c1c15fd303c404f97e7f51
commit 6837dbc11c161b1399c1c15fd303c404f97e7f51
Author: Aaron Sawdey <acsawdey@linux.ibm.com>
Date: Mon Sep 28 11:15:46 2020 -0500
Combine patterns for p10 load-cmpi fusion
This patch adds the first batch of patterns to support p10 fusion. These
will allow combine to create a single insn for a pair of instructions
that power10 can fuse and execute. These particular fusion pairs have the
requirement that only cr0 can be used when fusing a load with a compare
immediate of -1/0/1 (if signed) or 0/1 (if unsigned), so we want combine
to put that requirement in, and if it doesn't work out the splitter
can change it back into 2 insns so scheduling can move them apart.
The patterns are generated by a script genfusion.pl and live in new file
fusion.md. This script will be expanded to generate more patterns for
fusion.
This also adds option -mpower10-fusion which defaults on for power10 and
will gate all these fusion patterns. In addition I have added an
undocumented option -mpower10-fusion-ld-cmpi (which may be removed later)
that just controls the load+compare-immediate patterns. I have made
these default on for power10 but they are not disallowed for earlier
processors because it is still valid code. This allows us to test the
correctness of fusion code generation by turning it on explicitly.
gcc/ChangeLog:
* config/rs6000/genfusion.pl: New script to generate
define_insn_and_split patterns so combine can arrange fused
instructions next to each other.
* config/rs6000/fusion.md: New file, generated fused instruction
patterns for combine.
* config/rs6000/predicates.md (const_m1_to_1_operand): New predicate.
(non_update_memory_operand): New predicate.
* config/rs6000/rs6000-cpus.def: Add OPTION_MASK_P10_FUSION and
OPTION_MASK_P10_FUSION_LD_CMPI to ISA_3_1_MASKS_SERVER and
POWERPC_MASKS.
* config/rs6000/rs6000-protos.h (address_is_non_pfx_d_or_x): Add
prototype.
* config/rs6000/rs6000.c (rs6000_option_override_internal):
automatically set -mpower10-fusion and -mpower10-fusion-ld-cmpi
if target is power10. (rs6000_opt_masks): Allow -mpower10-fusion
in function attributes. (address_is_non_pfx_d_or_x): New function.
* config/rs6000/rs6000.h: Add MASK_P10_FUSION.
* config/rs6000/rs6000.md: Include fusion.md.
* config/rs6000/rs6000.opt: Add -mpower10-fusion
and -mpower10-fusion-ld-cmpi.
* config/rs6000/t-rs6000: Add dependencies involving fusion.md.
Diff:
---
gcc/config/rs6000/fusion.md | 357 ++++++++++++++++++++++++++++++++++++++
gcc/config/rs6000/genfusion.pl | 144 +++++++++++++++
gcc/config/rs6000/predicates.md | 14 ++
gcc/config/rs6000/rs6000-cpus.def | 6 +-
gcc/config/rs6000/rs6000-protos.h | 2 +
gcc/config/rs6000/rs6000.c | 51 ++++++
gcc/config/rs6000/rs6000.h | 1 +
gcc/config/rs6000/rs6000.md | 1 +
gcc/config/rs6000/rs6000.opt | 8 +
gcc/config/rs6000/t-rs6000 | 6 +-
10 files changed, 588 insertions(+), 2 deletions(-)
diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
new file mode 100644
index 00000000000..a4d3a6ae7f3
--- /dev/null
+++ b/gcc/config/rs6000/fusion.md
@@ -0,0 +1,357 @@
+;; -*- buffer-read-only: t -*-
+;; Generated automatically by genfusion.pl
+
+;; Copyright (C) 2020 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it under
+;; the terms of the GNU General Public License as published by the Free
+;; Software Foundation; either version 3, or (at your option) any later
+;; version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+;; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+;; for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is clobber compare mode is CC extend is none
+(define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:DI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
+(define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:DI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is DI compare mode is CC extend is none
+(define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is DI compare mode is CCUNS extend is none
+(define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is clobber compare mode is CC extend is none
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:SI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is clobber compare mode is CCUNS extend is none
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:SI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is SI compare mode is CC extend is none
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is SI compare mode is CCUNS extend is none
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (sign_extend:EXTSI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is EXTSI compare mode is CCUNS extend is zero
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (zero_extend:EXTSI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:EXTSI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is clobber compare mode is CC extend is sign
+(define_insn_and_split "*lha_cmpdi_cr0_HI_clobber_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lha%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (sign_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is clobber compare mode is CCUNS extend is zero
+(define_insn_and_split "*lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lhz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is EXTHI compare mode is CC extend is sign
+(define_insn_and_split "*lha_cmpdi_cr0_HI_EXTHI_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (sign_extend:EXTHI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lha%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (sign_extend:EXTHI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is EXTHI compare mode is CCUNS extend is zero
+(define_insn_and_split "*lhz_cmpldi_cr0_HI_EXTHI_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (zero_extend:EXTHI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lhz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:EXTHI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is QI result mode is clobber compare mode is CCUNS extend is zero
+(define_insn_and_split "*lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
+ (match_operand:QI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lbz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), QImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is QI result mode is GPR compare mode is CCUNS extend is zero
+(define_insn_and_split "*lbz_cmpldi_cr0_QI_GPR_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
+ (match_operand:QI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:GPR 0 "gpc_reg_operand" "=r") (zero_extend:GPR (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lbz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), QImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
new file mode 100755
index 00000000000..494537c9439
--- /dev/null
+++ b/gcc/config/rs6000/genfusion.pl
@@ -0,0 +1,144 @@
+#!/usr/bin/perl -w
+# Generate fusion.md
+# Copyright (C) 2020 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3. If not see
+# <http://www.gnu.org/licenses/>.
+
+my $copyright = <<'EOF';
+;; -*- buffer-read-only: t -*-
+;; Generated automatically by genfusion.pl
+
+;; Copyright (C) 2020 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it under
+;; the terms of the GNU General Public License as published by the Free
+;; Software Foundation; either version 3, or (at your option) any later
+;; version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+;; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+;; for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; <http://www.gnu.org/licenses/>.
+
+EOF
+
+print $copyright;
+
+sub mode_to_ldst_char
+{
+ my ($mode) = @_;
+ if ($mode eq 'DI') { return 'd'; }
+ if ($mode eq 'SI') { return 'w'; }
+ if ($mode eq 'HI') { return 'h'; }
+ if ($mode eq 'QI') { return 'b'; }
+ return '?';
+}
+
+sub gen_ld_cmpi_p10
+{
+ LMODE: foreach $lmode ('DI','SI','HI','QI') {
+ $ldst = mode_to_ldst_char($lmode);
+ $clobbermode = $lmode;
+ # For clobber, we need a SI/DI reg in case we split because we have to sign/zero extend.
+ if ( $lmode eq 'HI' || $lmode eq 'QI' ) { $clobbermode = "GPR"; }
+ RESULT: foreach $result ('clobber', $lmode, "EXT".$lmode) {
+ # EXTDI does not exist, and we cannot directly produce HI/QI results.
+ next RESULT if $result eq "EXTDI" || $result eq "HI" || $result eq "QI";
+ # Don't allow EXTQI because that would allow HI result which we can't do.
+ if ( $result eq "EXTQI" ) { $result = "GPR"; }
+ CCMODE: foreach $ccmode ('CC','CCUNS') {
+ $np = "NON_PREFIXED_D";
+ if ( $ccmode eq 'CC' ) {
+ next CCMODE if $lmode eq 'QI';
+ if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
+ # ld and lwa are both DS-FORM.
+ $np = "NON_PREFIXED_DS";
+ }
+ $cmpl = "";
+ $echr = "a";
+ $constpred = "const_m1_to_1_operand";
+ } else {
+ if ( $lmode eq 'DI' ) {
+ # ld is DS-form, but lwz is not.
+ $np = "NON_PREFIXED_DS";
+ }
+ $cmpl = "l";
+ $echr = "z";
+ $constpred = "const_0_to_1_operand";
+ }
+ if ($lmode eq 'DI') { $echr = ""; }
+ if ($result =~ m/EXT/ || $result eq 'GPR' || $clobbermode eq 'GPR') {
+ # We always need extension if result > lmode.
+ if ( $ccmode eq 'CC' ) {
+ $extend = "sign";
+ } else {
+ $extend = "zero";
+ }
+ } else {
+ # Result of SI/DI does not need sign extension.
+ $extend = "none";
+ }
+ print ";; load-cmpi fusion pattern generated by gen_ld_cmpi_p10\n";
+ print ";; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend\n";
+
+ print "(define_insn_and_split \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
+ print " [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
+ print " (compare:${ccmode} (match_operand:${lmode} 1 \"non_update_memory_operand\" \"m\")\n";
+ print " (match_operand:${lmode} 3 \"${constpred}\" \"n\")))\n";
+ if ($result eq 'clobber') {
+ print " (clobber (match_scratch:${clobbermode} 0 \"=r\"))]\n";
+ } elsif ($result eq $lmode) {
+ print " (set (match_operand:${result} 0 \"gpc_reg_operand\" \"=r\") (match_dup 1))]\n";
+ } else {
+ print " (set (match_operand:${result} 0 \"gpc_reg_operand\" \"=r\") (${extend}_extend:${result} (match_dup 1)))]\n";
+ }
+ print " \"(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)\"\n";
+ print " \"l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}di 0,%0,%3\"\n";
+ print " \"&& reload_completed\n";
+ print " && (cc_reg_not_cr0_operand (operands[2], CCmode)\n";
+ print " || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), ${lmode}mode, ${np}))\"\n";
+ if ($extend eq "none") {
+ print " [(set (match_dup 0) (match_dup 1))\n";
+ } else {
+ $resultmode = $result;
+ if ( $result eq 'clobber' ) { $resultmode = $clobbermode }
+ print " [(set (match_dup 0) (${extend}_extend:${resultmode} (match_dup 1)))\n";
+ }
+ print " (set (match_dup 2)\n";
+ print " (compare:${ccmode} (match_dup 0)\n";
+ print " (match_dup 3)))]\n";
+ print " \"\"\n";
+ print " [(set_attr \"type\" \"load\")\n";
+ print " (set_attr \"cost\" \"8\")\n";
+ print " (set_attr \"length\" \"8\")])\n";
+ print "\n";
+ }
+ }
+ }
+}
+
+
+gen_ld_cmpi_p10();
+
+exit(0);
+
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 5d1952e59d3..76328ecff3d 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -297,6 +297,11 @@
(and (match_code "const_int")
(match_test "IN_RANGE (INTVAL (op), 0, 1)")))
+;; Match op = -1, op = 0, or op = 1.
+(define_predicate "const_m1_to_1_operand"
+ (and (match_code "const_int")
+ (match_test "IN_RANGE (INTVAL (op), -1, 1)")))
+
;; Match op = 0..3.
(define_predicate "const_0_to_3_operand"
(and (match_code "const_int")
@@ -847,6 +852,15 @@
|| GET_CODE (XEXP (op, 0)) == PRE_DEC
|| GET_CODE (XEXP (op, 0)) == PRE_MODIFY))"))
+;; Anything that matches memory_operand but does not update the address.
+(define_predicate "non_update_memory_operand"
+ (match_code "mem")
+{
+ if (update_address_mem (op, mode))
+ return 0;
+ return memory_operand (op, mode);
+})
+
;; Return 1 if the operand is a MEM with an indexed-form address.
(define_special_predicate "indexed_address_mem"
(match_test "(MEM_P (op)
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index fa5c75bb49c..fc9376db3f4 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -81,7 +81,9 @@
#define ISA_3_1_MASKS_SERVER (ISA_3_0_MASKS_SERVER \
| OPTION_MASK_POWER10 \
- | OTHER_POWER10_MASKS)
+ | OTHER_POWER10_MASKS \
+ | OPTION_MASK_P10_FUSION \
+ | OPTION_MASK_P10_FUSION_LD_CMPI)
/* Flags that need to be turned off if -mno-power9-vector. */
#define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW \
@@ -128,6 +130,8 @@
| OPTION_MASK_FLOAT128_KEYWORD \
| OPTION_MASK_FPRND \
| OPTION_MASK_POWER10 \
+ | OPTION_MASK_P10_FUSION \
+ | OPTION_MASK_P10_FUSION_LD_CMPI \
| OPTION_MASK_HTM \
| OPTION_MASK_ISEL \
| OPTION_MASK_MFCRF \
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 9cca7325d0d..d9d44fe9821 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -191,6 +191,8 @@ enum non_prefixed_form {
extern enum insn_form address_to_insn_form (rtx, machine_mode,
enum non_prefixed_form);
+extern bool address_is_non_pfx_d_or_x (rtx addr, machine_mode mode,
+ enum non_prefixed_form non_prefix_format);
extern bool prefixed_load_p (rtx_insn *);
extern bool prefixed_store_p (rtx_insn *);
extern bool prefixed_paddi_p (rtx_insn *);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b9e90ae0468..d35986d88c1 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4430,6 +4430,12 @@ rs6000_option_override_internal (bool global_init_p)
if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_MMA) == 0)
rs6000_isa_flags |= OPTION_MASK_MMA;
+ if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION) == 0)
+ rs6000_isa_flags |= OPTION_MASK_P10_FUSION;
+
+ if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_LD_CMPI) == 0)
+ rs6000_isa_flags |= OPTION_MASK_P10_FUSION_LD_CMPI;
+
/* Turn off vector pair/mma options on non-power10 systems. */
else if (!TARGET_POWER10 && TARGET_MMA)
{
@@ -23623,6 +23629,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
{ "power9-minmax", OPTION_MASK_P9_MINMAX, false, true },
{ "power9-misc", OPTION_MASK_P9_MISC, false, true },
{ "power9-vector", OPTION_MASK_P9_VECTOR, false, true },
+ { "power10-fusion", OPTION_MASK_P10_FUSION, false, true },
{ "powerpc-gfxopt", OPTION_MASK_PPC_GFXOPT, false, true },
{ "powerpc-gpopt", OPTION_MASK_PPC_GPOPT, false, true },
{ "prefixed", OPTION_MASK_PREFIXED, false, true },
@@ -25714,6 +25721,50 @@ address_to_insn_form (rtx addr,
return INSN_FORM_BAD;
}
+/* Given address rtx ADDR for a load of MODE, is this legitimate for a
+ non-prefixed D-form or X-form instruction? NON_PREFIXED_FORMAT is
+ given NON_PREFIXED_D or NON_PREFIXED_DS to indicate whether we want
+ a D-form or DS-form instruction. X-form and base_reg are always
+ allowed. */
+bool
+address_is_non_pfx_d_or_x (rtx addr, machine_mode mode,
+ enum non_prefixed_form non_prefixed_format)
+{
+ enum insn_form result_form;
+
+ result_form = address_to_insn_form (addr, mode, non_prefixed_format);
+
+ switch (non_prefixed_format)
+ {
+ case NON_PREFIXED_D:
+ switch (result_form)
+ {
+ case INSN_FORM_X:
+ case INSN_FORM_D:
+ case INSN_FORM_DS:
+ case INSN_FORM_BASE_REG:
+ return true;
+ default:
+ break;
+ }
+ break;
+ case NON_PREFIXED_DS:
+ switch (result_form)
+ {
+ case INSN_FORM_X:
+ case INSN_FORM_DS:
+ case INSN_FORM_BASE_REG:
+ return true;
+ default:
+ break;
+ }
+ break;
+ default:
+ break;
+ }
+ return false;
+}
+
/* Helper function to see if we're potentially looking at lfs/stfs.
- PARALLEL containing a SET and a CLOBBER
- stfs:
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index b05dd827b13..233a92baf3c 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -539,6 +539,7 @@ extern int rs6000_vector_align[];
#define MASK_UPDATE OPTION_MASK_UPDATE
#define MASK_VSX OPTION_MASK_VSX
#define MASK_POWER10 OPTION_MASK_POWER10
+#define MASK_P10_FUSION OPTION_MASK_P10_FUSION
#ifndef IN_LIBGCC2
#define MASK_POWERPC64 OPTION_MASK_POWERPC64
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 976425361d9..a1315523fec 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -14927,3 +14927,4 @@
(include "dfp.md")
(include "crypto.md")
(include "htm.md")
+(include "fusion.md")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index d128e52ff8b..6240f779694 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -479,6 +479,14 @@ mpower8-vector
Target Mask(P8_VECTOR) Var(rs6000_isa_flags)
Use vector and scalar instructions added in ISA 2.07.
+mpower10-fusion
+Target Mask(P10_FUSION) Var(rs6000_isa_flags)
+Fuse certain integer operations together for better performance on power10.
+
+mpower10-fusion-ld-cmpi
+Target Undocumented Mask(P10_FUSION_LD_CMPI) Var(rs6000_isa_flags)
+Fuse certain integer operations together for better performance on power10.
+
mcrypto
Target Mask(CRYPTO) Var(rs6000_isa_flags)
Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions.
diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index af96b21667b..e3a58bf31bf 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -47,6 +47,9 @@ rs6000-call.o: $(srcdir)/config/rs6000/rs6000-call.c
$(COMPILE) $<
$(POSTCOMPILE)
+$(srcdir)/config/rs6000/fusion.md: $(srcdir)/config/rs6000/genfusion.pl
+ $(srcdir)/config/rs6000/genfusion.pl > $(srcdir)/config/rs6000/fusion.md
+
$(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \
$(srcdir)/config/rs6000/rs6000-cpus.def
$(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
@@ -86,4 +89,5 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \
$(srcdir)/config/rs6000/mma.md \
$(srcdir)/config/rs6000/crypto.md \
$(srcdir)/config/rs6000/htm.md \
- $(srcdir)/config/rs6000/dfp.md
+ $(srcdir)/config/rs6000/dfp.md \
+ $(srcdir)/config/rs6000/fusion.md
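The acceptance rule in address_is_non_pfx_d_or_x (the rs6000.c hunk of this patch) can be modelled outside GCC as a small table. This is only a sketch: the enum and function names below are hypothetical stand-ins, and the model takes an already classified instruction form as input, whereas the real function first calls address_to_insn_form on an address rtx.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the insn forms the patch switches on.  */
enum model_insn_form {
  MODEL_FORM_BASE_REG,
  MODEL_FORM_D,
  MODEL_FORM_DS,
  MODEL_FORM_X,
  MODEL_FORM_OTHER   /* prefixed, bad, or any other form */
};

/* Hypothetical stand-ins for NON_PREFIXED_D and NON_PREFIXED_DS.  */
enum model_non_prefixed { MODEL_NP_D, MODEL_NP_DS };

/* Mirrors the nested switch in the patch: X-form, DS-form and a bare
   base register are accepted for both formats; plain D-form is
   accepted only when a D-form instruction was requested.  */
bool model_is_non_pfx_d_or_x(enum model_insn_form form,
                             enum model_non_prefixed want)
{
  switch (form) {
  case MODEL_FORM_X:
  case MODEL_FORM_DS:
  case MODEL_FORM_BASE_REG:
    return true;
  case MODEL_FORM_D:
    return want == MODEL_NP_D;
  default:
    return false;
  }
}
```

In the generated patterns the split condition negates this check, so an address that fails it causes the fused insn to be split back into a separate load and compare after reload.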
* [gcc(refs/users/acsawdey/heads/fusion-combine)] Combine patterns for p10 load-cmpi fusion
@ 2021-01-05 18:54 Aaron Sawdey
From: Aaron Sawdey @ 2021-01-05 18:54 UTC (permalink / raw)
To: gcc-cvs
https://gcc.gnu.org/g:f1baa46f44166ad6a4f9184572bed07d003ac322
commit f1baa46f44166ad6a4f9184572bed07d003ac322
Author: Aaron Sawdey <acsawdey@linux.ibm.com>
Date: Mon Sep 28 11:15:46 2020 -0500
Combine patterns for p10 load-cmpi fusion
This patch adds the first batch of patterns to support p10 fusion. These
will allow combine to create a single insn for a pair of instructions
that power10 can fuse and execute. These particular fusion pairs have the
requirement that only cr0 can be used when fusing a load with a compare
immediate of -1/0/1 (if signed) or 0/1 (if unsigned), so we want combine
to put that requirement in, and if it doesn't work out the splitter
can change it back into 2 insns so scheduling can move them apart.
The patterns are generated by the script genfusion.pl and live in the new
file fusion.md. This script will be expanded to generate more patterns for
fusion.
This also adds the option -mpower10-fusion, which defaults to on for power10
and gates all of these fusion patterns. In addition I have added an
undocumented option -mpower10-fusion-ld-cmpi (which may be removed later)
that controls only the load+compare-immediate patterns. Both options
default to on for power10, but they are not disallowed for earlier
processors because the generated code is still valid there. This allows us
to test the correctness of fusion code generation by turning it on explicitly.
gcc/ChangeLog:
* config/rs6000/genfusion.pl: New script to generate
define_insn_and_split patterns so combine can arrange fused
instructions next to each other.
* config/rs6000/fusion.md: New file, generated fused instruction
patterns for combine.
* config/rs6000/predicates.md (const_m1_to_1_operand): New predicate.
(non_update_memory_operand): New predicate.
* config/rs6000/rs6000-cpus.def: Add OPTION_MASK_P10_FUSION and
OPTION_MASK_P10_FUSION_LD_CMPI to ISA_3_1_MASKS_SERVER and
POWERPC_MASKS.
* config/rs6000/rs6000-protos.h (address_is_non_pfx_d_or_x): Add
prototype.
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Automatically set -mpower10-fusion and -mpower10-fusion-ld-cmpi
if target is power10. (rs6000_opt_masks): Allow -mpower10-fusion
in function attributes. (address_is_non_pfx_d_or_x): New function.
* config/rs6000/rs6000.h: Add MASK_P10_FUSION.
* config/rs6000/rs6000.md: Include fusion.md.
* config/rs6000/rs6000.opt: Add -mpower10-fusion
and -mpower10-fusion-ld-cmpi.
* config/rs6000/t-rs6000: Add dependencies involving fusion.md.
Diff:
---
gcc/config/rs6000/fusion.md | 357 ++++++++++++++++++++++++++++++++++++++
gcc/config/rs6000/genfusion.pl | 144 +++++++++++++++
gcc/config/rs6000/predicates.md | 14 ++
gcc/config/rs6000/rs6000-cpus.def | 6 +-
gcc/config/rs6000/rs6000-protos.h | 2 +
gcc/config/rs6000/rs6000.c | 51 ++++++
gcc/config/rs6000/rs6000.h | 1 +
gcc/config/rs6000/rs6000.md | 1 +
gcc/config/rs6000/rs6000.opt | 8 +
gcc/config/rs6000/t-rs6000 | 6 +-
10 files changed, 588 insertions(+), 2 deletions(-)
diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
new file mode 100644
index 00000000000..a4d3a6ae7f3
--- /dev/null
+++ b/gcc/config/rs6000/fusion.md
@@ -0,0 +1,357 @@
+;; -*- buffer-read-only: t -*-
+;; Generated automatically by genfusion.pl
+
+;; Copyright (C) 2020 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it under
+;; the terms of the GNU General Public License as published by the Free
+;; Software Foundation; either version 3, or (at your option) any later
+;; version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+;; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+;; for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is clobber compare mode is CC extend is none
+(define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:DI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
+(define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:DI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is DI compare mode is CC extend is none
+(define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is DI compare mode is CCUNS extend is none
+(define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is clobber compare mode is CC extend is none
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:SI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is clobber compare mode is CCUNS extend is none
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:SI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is SI compare mode is CC extend is none
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is SI compare mode is CCUNS extend is none
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (sign_extend:EXTSI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is EXTSI compare mode is CCUNS extend is zero
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (zero_extend:EXTSI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:EXTSI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is clobber compare mode is CC extend is sign
+(define_insn_and_split "*lha_cmpdi_cr0_HI_clobber_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lha%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (sign_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is clobber compare mode is CCUNS extend is zero
+(define_insn_and_split "*lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lhz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is EXTHI compare mode is CC extend is sign
+(define_insn_and_split "*lha_cmpdi_cr0_HI_EXTHI_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (sign_extend:EXTHI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lha%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (sign_extend:EXTHI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is EXTHI compare mode is CCUNS extend is zero
+(define_insn_and_split "*lhz_cmpldi_cr0_HI_EXTHI_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (zero_extend:EXTHI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lhz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:EXTHI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is QI result mode is clobber compare mode is CCUNS extend is zero
+(define_insn_and_split "*lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
+ (match_operand:QI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lbz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), QImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is QI result mode is GPR compare mode is CCUNS extend is zero
+(define_insn_and_split "*lbz_cmpldi_cr0_QI_GPR_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
+ (match_operand:QI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:GPR 0 "gpc_reg_operand" "=r") (zero_extend:GPR (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lbz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), QImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
new file mode 100755
index 00000000000..494537c9439
--- /dev/null
+++ b/gcc/config/rs6000/genfusion.pl
@@ -0,0 +1,144 @@
+#!/usr/bin/perl -w
+# Generate fusion.md
+# Copyright (C) 2020 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3. If not see
+# <http://www.gnu.org/licenses/>.
+
+my $copyright = <<'EOF';
+;; -*- buffer-read-only: t -*-
+;; Generated automatically by genfusion.pl
+
+;; Copyright (C) 2020 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it under
+;; the terms of the GNU General Public License as published by the Free
+;; Software Foundation; either version 3, or (at your option) any later
+;; version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+;; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+;; for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; <http://www.gnu.org/licenses/>.
+
+EOF
+
+print $copyright;
+
+sub mode_to_ldst_char
+{
+ my ($mode) = @_;
+ if ($mode eq 'DI') { return 'd'; }
+ if ($mode eq 'SI') { return 'w'; }
+ if ($mode eq 'HI') { return 'h'; }
+ if ($mode eq 'QI') { return 'b'; }
+ return '?';
+}
+
+sub gen_ld_cmpi_p10
+{
+ LMODE: foreach $lmode ('DI','SI','HI','QI') {
+ $ldst = mode_to_ldst_char($lmode);
+ $clobbermode = $lmode;
+ # For clobber, we need a SI/DI reg in case we split because we have to sign/zero extend.
+ if ( $lmode eq 'HI' || $lmode eq 'QI' ) { $clobbermode = "GPR"; }
+ RESULT: foreach $result ('clobber', $lmode, "EXT".$lmode) {
+ # EXTDI does not exist, and we cannot directly produce HI/QI results.
+ next RESULT if $result eq "EXTDI" || $result eq "HI" || $result eq "QI";
+ # Don't allow EXTQI because that would allow HI result which we can't do.
+ if ( $result eq "EXTQI" ) { $result = "GPR"; }
+ CCMODE: foreach $ccmode ('CC','CCUNS') {
+ $np = "NON_PREFIXED_D";
+ if ( $ccmode eq 'CC' ) {
+ next CCMODE if $lmode eq 'QI';
+ if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
+ # ld and lwa are both DS-FORM.
+ $np = "NON_PREFIXED_DS";
+ }
+ $cmpl = "";
+ $echr = "a";
+ $constpred = "const_m1_to_1_operand";
+ } else {
+ if ( $lmode eq 'DI' ) {
+ # ld is DS-form, but lwz is not.
+ $np = "NON_PREFIXED_DS";
+ }
+ $cmpl = "l";
+ $echr = "z";
+ $constpred = "const_0_to_1_operand";
+ }
+ if ($lmode eq 'DI') { $echr = ""; }
+ if ($result =~ m/EXT/ || $result eq 'GPR' || $clobbermode eq 'GPR') {
+ # We always need extension if result > lmode.
+ if ( $ccmode eq 'CC' ) {
+ $extend = "sign";
+ } else {
+ $extend = "zero";
+ }
+ } else {
+ # Result of SI/DI does not need sign extension.
+ $extend = "none";
+ }
+ print ";; load-cmpi fusion pattern generated by gen_ld_cmpi_p10\n";
+ print ";; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend\n";
+
+ print "(define_insn_and_split \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
+ print " [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
+ print " (compare:${ccmode} (match_operand:${lmode} 1 \"non_update_memory_operand\" \"m\")\n";
+ print " (match_operand:${lmode} 3 \"${constpred}\" \"n\")))\n";
+ if ($result eq 'clobber') {
+ print " (clobber (match_scratch:${clobbermode} 0 \"=r\"))]\n";
+ } elsif ($result eq $lmode) {
+ print " (set (match_operand:${result} 0 \"gpc_reg_operand\" \"=r\") (match_dup 1))]\n";
+ } else {
+ print " (set (match_operand:${result} 0 \"gpc_reg_operand\" \"=r\") (${extend}_extend:${result} (match_dup 1)))]\n";
+ }
+ print " \"(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)\"\n";
+ print " \"l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}di 0,%0,%3\"\n";
+ print " \"&& reload_completed\n";
+ print " && (cc_reg_not_cr0_operand (operands[2], CCmode)\n";
+ print " || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), ${lmode}mode, ${np}))\"\n";
+ if ($extend eq "none") {
+ print " [(set (match_dup 0) (match_dup 1))\n";
+ } else {
+ $resultmode = $result;
+ if ( $result eq 'clobber' ) { $resultmode = $clobbermode }
+ print " [(set (match_dup 0) (${extend}_extend:${resultmode} (match_dup 1)))\n";
+ }
+ print " (set (match_dup 2)\n";
+ print " (compare:${ccmode} (match_dup 0)\n";
+ print " (match_dup 3)))]\n";
+ print " \"\"\n";
+ print " [(set_attr \"type\" \"load\")\n";
+ print " (set_attr \"cost\" \"8\")\n";
+ print " (set_attr \"length\" \"8\")])\n";
+ print "\n";
+ }
+ }
+ }
+}
+
+
+gen_ld_cmpi_p10();
+
+exit(0);
+
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 5d1952e59d3..76328ecff3d 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -297,6 +297,11 @@
(and (match_code "const_int")
(match_test "IN_RANGE (INTVAL (op), 0, 1)")))
+;; Match op = -1, op = 0, or op = 1.
+(define_predicate "const_m1_to_1_operand"
+ (and (match_code "const_int")
+ (match_test "IN_RANGE (INTVAL (op), -1, 1)")))
+
;; Match op = 0..3.
(define_predicate "const_0_to_3_operand"
(and (match_code "const_int")
@@ -847,6 +852,15 @@
|| GET_CODE (XEXP (op, 0)) == PRE_DEC
|| GET_CODE (XEXP (op, 0)) == PRE_MODIFY))"))
+;; Anything that matches memory_operand but does not update the address.
+(define_predicate "non_update_memory_operand"
+ (match_code "mem")
+{
+ if (update_address_mem (op, mode))
+ return 0;
+ return memory_operand (op, mode);
+})
+
;; Return 1 if the operand is a MEM with an indexed-form address.
(define_special_predicate "indexed_address_mem"
(match_test "(MEM_P (op)
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index fa5c75bb49c..fc9376db3f4 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -81,7 +81,9 @@
#define ISA_3_1_MASKS_SERVER (ISA_3_0_MASKS_SERVER \
| OPTION_MASK_POWER10 \
- | OTHER_POWER10_MASKS)
+ | OTHER_POWER10_MASKS \
+ | OPTION_MASK_P10_FUSION \
+ | OPTION_MASK_P10_FUSION_LD_CMPI)
/* Flags that need to be turned off if -mno-power9-vector. */
#define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW \
@@ -128,6 +130,8 @@
| OPTION_MASK_FLOAT128_KEYWORD \
| OPTION_MASK_FPRND \
| OPTION_MASK_POWER10 \
+ | OPTION_MASK_P10_FUSION \
+ | OPTION_MASK_P10_FUSION_LD_CMPI \
| OPTION_MASK_HTM \
| OPTION_MASK_ISEL \
| OPTION_MASK_MFCRF \
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 9cca7325d0d..d9d44fe9821 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -191,6 +191,8 @@ enum non_prefixed_form {
extern enum insn_form address_to_insn_form (rtx, machine_mode,
enum non_prefixed_form);
+extern bool address_is_non_pfx_d_or_x (rtx addr, machine_mode mode,
+ enum non_prefixed_form non_prefix_format);
extern bool prefixed_load_p (rtx_insn *);
extern bool prefixed_store_p (rtx_insn *);
extern bool prefixed_paddi_p (rtx_insn *);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 67681d18150..f810051baf3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4430,6 +4430,12 @@ rs6000_option_override_internal (bool global_init_p)
if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_MMA) == 0)
rs6000_isa_flags |= OPTION_MASK_MMA;
+ if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION) == 0)
+ rs6000_isa_flags |= OPTION_MASK_P10_FUSION;
+
+ if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_LD_CMPI) == 0)
+ rs6000_isa_flags |= OPTION_MASK_P10_FUSION_LD_CMPI;
+
/* Turn off vector pair/mma options on non-power10 systems. */
else if (!TARGET_POWER10 && TARGET_MMA)
{
@@ -23613,6 +23619,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
{ "power9-minmax", OPTION_MASK_P9_MINMAX, false, true },
{ "power9-misc", OPTION_MASK_P9_MISC, false, true },
{ "power9-vector", OPTION_MASK_P9_VECTOR, false, true },
+ { "power10-fusion", OPTION_MASK_P10_FUSION, false, true },
{ "powerpc-gfxopt", OPTION_MASK_PPC_GFXOPT, false, true },
{ "powerpc-gpopt", OPTION_MASK_PPC_GPOPT, false, true },
{ "prefixed", OPTION_MASK_PREFIXED, false, true },
@@ -25704,6 +25711,50 @@ address_to_insn_form (rtx addr,
return INSN_FORM_BAD;
}
+/* Given address rtx ADDR for a load of MODE, is this legitimate for a
+ non-prefixed D-form or X-form instruction? NON_PREFIXED_FORMAT is
+ given NON_PREFIXED_D or NON_PREFIXED_DS to indicate whether we want
+ a D-form or DS-form instruction. X-form and base_reg are always
+ allowed. */
+bool
+address_is_non_pfx_d_or_x (rtx addr, machine_mode mode,
+ enum non_prefixed_form non_prefixed_format)
+{
+ enum insn_form result_form;
+
+ result_form = address_to_insn_form (addr, mode, non_prefixed_format);
+
+ switch (non_prefixed_format)
+ {
+ case NON_PREFIXED_D:
+ switch (result_form)
+ {
+ case INSN_FORM_X:
+ case INSN_FORM_D:
+ case INSN_FORM_DS:
+ case INSN_FORM_BASE_REG:
+ return true;
+ default:
+ break;
+ }
+ break;
+ case NON_PREFIXED_DS:
+ switch (result_form)
+ {
+ case INSN_FORM_X:
+ case INSN_FORM_DS:
+ case INSN_FORM_BASE_REG:
+ return true;
+ default:
+ break;
+ }
+ break;
+ default:
+ break;
+ }
+ return false;
+}
+
/* Helper function to see if we're potentially looking at lfs/stfs.
- PARALLEL containing a SET and a CLOBBER
- stfs:
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index b05dd827b13..233a92baf3c 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -539,6 +539,7 @@ extern int rs6000_vector_align[];
#define MASK_UPDATE OPTION_MASK_UPDATE
#define MASK_VSX OPTION_MASK_VSX
#define MASK_POWER10 OPTION_MASK_POWER10
+#define MASK_P10_FUSION OPTION_MASK_P10_FUSION
#ifndef IN_LIBGCC2
#define MASK_POWERPC64 OPTION_MASK_POWERPC64
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bb9fb42f82a..1fa3277f1e6 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -14926,3 +14926,4 @@
(include "dfp.md")
(include "crypto.md")
(include "htm.md")
+(include "fusion.md")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index d128e52ff8b..6240f779694 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -479,6 +479,14 @@ mpower8-vector
Target Mask(P8_VECTOR) Var(rs6000_isa_flags)
Use vector and scalar instructions added in ISA 2.07.
+mpower10-fusion
+Target Mask(P10_FUSION) Var(rs6000_isa_flags)
+Fuse certain integer operations together for better performance on power10.
+
+mpower10-fusion-ld-cmpi
+Target Undocumented Mask(P10_FUSION_LD_CMPI) Var(rs6000_isa_flags)
+Fuse certain integer operations together for better performance on power10.
+
mcrypto
Target Mask(CRYPTO) Var(rs6000_isa_flags)
Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions.
diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index af96b21667b..e3a58bf31bf 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -47,6 +47,9 @@ rs6000-call.o: $(srcdir)/config/rs6000/rs6000-call.c
$(COMPILE) $<
$(POSTCOMPILE)
+$(srcdir)/config/rs6000/fusion.md: $(srcdir)/config/rs6000/genfusion.pl
+ $(srcdir)/config/rs6000/genfusion.pl > $(srcdir)/config/rs6000/fusion.md
+
$(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \
$(srcdir)/config/rs6000/rs6000-cpus.def
$(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
@@ -86,4 +89,5 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \
$(srcdir)/config/rs6000/mma.md \
$(srcdir)/config/rs6000/crypto.md \
$(srcdir)/config/rs6000/htm.md \
- $(srcdir)/config/rs6000/dfp.md
+ $(srcdir)/config/rs6000/dfp.md \
+ $(srcdir)/config/rs6000/fusion.md
* [gcc(refs/users/acsawdey/heads/fusion-combine)] Combine patterns for p10 load-cmpi fusion
@ 2021-01-05 18:33 Aaron Sawdey
From: Aaron Sawdey @ 2021-01-05 18:33 UTC (permalink / raw)
To: gcc-cvs
https://gcc.gnu.org/g:b4b8444fe6d68b18ae2ae9a465184512cf7c307d
commit b4b8444fe6d68b18ae2ae9a465184512cf7c307d
Author: Aaron Sawdey <acsawdey@linux.ibm.com>
Date: Mon Sep 28 11:15:46 2020 -0500
Combine patterns for p10 load-cmpi fusion
This patch adds the first batch of patterns to support p10 fusion. These
allow combine to create a single insn for a pair of instructions
that power10 can fuse and execute. These particular fusion pairs require
that only cr0 be used when fusing a load with a compare
immediate of -1/0/1 (if signed) or 0/1 (if unsigned), so we want combine
to impose that requirement. If it cannot be met, the splitter
changes the insn back into 2 insns so scheduling can move them apart.
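The constraint described above can be restated as a small predicate. This is a sketch in standalone C under the assumptions stated in the paragraph; the names are hypothetical and are not GCC identifiers. In the patch itself the immediate range is enforced by the const_m1_to_1_operand and const_0_to_1_operand predicates, and cr0 by the "x" constraint.

```c
#include <stdbool.h>

/* Hypothetical tag for the kind of compare being fused.  */
enum cmp_kind { CMP_SIGNED, CMP_UNSIGNED };

/* Return true if IMM is an immediate the fused load+compare accepts:
   -1, 0 or 1 for a signed compare, 0 or 1 for an unsigned compare.  */
bool
fusion_imm_ok (enum cmp_kind kind, long imm)
{
  if (kind == CMP_SIGNED)
    return imm >= -1 && imm <= 1;
  return imm >= 0 && imm <= 1;
}

/* Return true if the pair can stay fused: the compare result must go
   to CR field 0 and the immediate must be in range.  Otherwise the
   splitter would turn the insn back into two separate insns.  */
bool
can_stay_fused (enum cmp_kind kind, long imm, int cr_field)
{
  return cr_field == 0 && fusion_imm_ok (kind, imm);
}
```

For example, a signed compare against -1 targeting cr0 qualifies, while the same compare targeting cr1, or an unsigned compare against -1, does not.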
The patterns are generated by a script genfusion.pl and live in new file
fusion.md. This script will be expanded to generate more patterns for
fusion.
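As an illustration of what the script decides for each pattern, the choice of load mnemonic in gen_ld_cmpi_p10 can be restated in C. This is a sketch only: the function name is hypothetical, and the real script is Perl that builds the name from a size letter and an extension letter rather than returning whole strings.

```c
#include <stddef.h>
#include <string.h>

/* Return the load mnemonic used in the fused pattern for a load of
   MODE ("DI", "SI", "HI" or "QI") feeding a signed (IS_SIGNED != 0)
   or unsigned compare.  Signed loads use the algebraic forms, unsigned
   loads the zero-extending forms, and ld has no suffix.  Returns NULL
   for combinations the script does not generate, such as a signed QI
   load.  */
const char *
demo_fused_load_mnemonic (const char *mode, int is_signed)
{
  if (strcmp (mode, "DI") == 0)
    return "ld";
  if (strcmp (mode, "SI") == 0)
    return is_signed ? "lwa" : "lwz";
  if (strcmp (mode, "HI") == 0)
    return is_signed ? "lha" : "lhz";
  if (strcmp (mode, "QI") == 0)
    return is_signed ? NULL : "lbz";
  return NULL;
}
```

The compare half follows the same split: cmpdi for the signed (CC) patterns and cmpldi for the unsigned (CCUNS) ones.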
This also adds option -mpower10-fusion which defaults on for power10 and
will gate all these fusion patterns. In addition I have added an
undocumented option -mpower10-fusion-ld-cmpi (which may be removed later)
that just controls the load+compare-immediate patterns. I have made
these default on for power10 but they are not disallowed for earlier
processors because it is still valid code. This allows us to test the
correctness of fusion code generation by turning it on explicitly.
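The defaulting rule described above can be modelled with plain bit masks. This is a sketch, not the rs6000.c code: the mask values and the function name are hypothetical, and it assumes, as in the rs6000_option_override_internal hunk of this patch, that a second mask records which options were given explicitly.

```c
#include <stdint.h>

/* Hypothetical bit values; the real OPTION_MASK_* values come from
   rs6000.opt.  */
#define DEMO_MASK_POWER10            (UINT64_C (1) << 0)
#define DEMO_MASK_P10_FUSION         (UINT64_C (1) << 1)
#define DEMO_MASK_P10_FUSION_LD_CMPI (UINT64_C (1) << 2)

/* Return FLAGS with the fusion options defaulted on for power10.
   A bit set in EXPLICIT_FLAGS means the option was given explicitly,
   so its value in FLAGS is left alone.  Nothing is cleared on older
   processors, which matches "not disallowed for earlier processors".  */
uint64_t
demo_apply_fusion_defaults (uint64_t flags, uint64_t explicit_flags)
{
  if (flags & DEMO_MASK_POWER10)
    {
      if ((explicit_flags & DEMO_MASK_P10_FUSION) == 0)
        flags |= DEMO_MASK_P10_FUSION;
      if ((explicit_flags & DEMO_MASK_P10_FUSION_LD_CMPI) == 0)
        flags |= DEMO_MASK_P10_FUSION_LD_CMPI;
    }
  return flags;
}
```

With this model, an explicit -mno-power10-fusion on power10 leaves the fusion bit clear, and an explicit -mpower10-fusion on an earlier processor leaves it set, which is the case used to test code generation.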
gcc/ChangeLog:
* config/rs6000/genfusion.pl: New script to generate
define_insn_and_split patterns so combine can arrange fused
instructions next to each other.
* config/rs6000/fusion.md: New file, generated fused instruction
patterns for combine.
* config/rs6000/predicates.md (const_m1_to_1_operand): New predicate.
(non_update_memory_operand): New predicate.
* config/rs6000/rs6000-cpus.def: Add OPTION_MASK_P10_FUSION and
OPTION_MASK_P10_FUSION_LD_CMPI to ISA_3_1_MASKS_SERVER and
POWERPC_MASKS.
* config/rs6000/rs6000-protos.h (address_is_non_pfx_d_or_x): Add
prototype.
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Automatically set -mpower10-fusion and -mpower10-fusion-ld-cmpi
if target is power10. (rs6000_opt_masks): Allow -mpower10-fusion
in function attributes. (address_is_non_pfx_d_or_x): New function.
* config/rs6000/rs6000.h: Add MASK_P10_FUSION.
* config/rs6000/rs6000.md: Include fusion.md.
* config/rs6000/rs6000.opt: Add -mpower10-fusion
and -mpower10-fusion-ld-cmpi.
* config/rs6000/t-rs6000: Add dependencies involving fusion.md.
Diff:
---
gcc/config/rs6000/fusion.md | 357 ++++++++++++++++++++++++++++++++++++++
gcc/config/rs6000/genfusion.pl | 144 +++++++++++++++
gcc/config/rs6000/predicates.md | 14 ++
gcc/config/rs6000/rs6000-cpus.def | 6 +-
gcc/config/rs6000/rs6000-protos.h | 2 +
gcc/config/rs6000/rs6000.c | 51 ++++++
gcc/config/rs6000/rs6000.h | 1 +
gcc/config/rs6000/rs6000.md | 1 +
gcc/config/rs6000/rs6000.opt | 8 +
gcc/config/rs6000/t-rs6000 | 6 +-
10 files changed, 588 insertions(+), 2 deletions(-)
diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
new file mode 100644
index 00000000000..a4d3a6ae7f3
--- /dev/null
+++ b/gcc/config/rs6000/fusion.md
@@ -0,0 +1,357 @@
+;; -*- buffer-read-only: t -*-
+;; Generated automatically by genfusion.pl
+
+;; Copyright (C) 2020 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it under
+;; the terms of the GNU General Public License as published by the Free
+;; Software Foundation; either version 3, or (at your option) any later
+;; version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+;; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+;; for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is clobber compare mode is CC extend is none
+(define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:DI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
+(define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:DI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is DI compare mode is CC extend is none
+(define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is DI compare mode is CCUNS extend is none
+(define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is clobber compare mode is CC extend is none
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:SI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is clobber compare mode is CCUNS extend is none
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:SI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is SI compare mode is CC extend is none
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is SI compare mode is CCUNS extend is none
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (sign_extend:EXTSI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is EXTSI compare mode is CCUNS extend is zero
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (zero_extend:EXTSI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:EXTSI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is clobber compare mode is CC extend is sign
+(define_insn_and_split "*lha_cmpdi_cr0_HI_clobber_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lha%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (sign_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is clobber compare mode is CCUNS extend is zero
+(define_insn_and_split "*lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lhz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is EXTHI compare mode is CC extend is sign
+(define_insn_and_split "*lha_cmpdi_cr0_HI_EXTHI_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (sign_extend:EXTHI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lha%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (sign_extend:EXTHI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is EXTHI compare mode is CCUNS extend is zero
+(define_insn_and_split "*lhz_cmpldi_cr0_HI_EXTHI_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (zero_extend:EXTHI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lhz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:EXTHI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is QI result mode is clobber compare mode is CCUNS extend is zero
+(define_insn_and_split "*lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
+ (match_operand:QI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lbz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), QImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is QI result mode is GPR compare mode is CCUNS extend is zero
+(define_insn_and_split "*lbz_cmpldi_cr0_QI_GPR_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
+ (match_operand:QI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:GPR 0 "gpc_reg_operand" "=r") (zero_extend:GPR (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lbz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), QImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
new file mode 100755
index 00000000000..494537c9439
--- /dev/null
+++ b/gcc/config/rs6000/genfusion.pl
@@ -0,0 +1,144 @@
+#!/usr/bin/perl -w
+# Generate fusion.md
+# Copyright (C) 2020 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3. If not see
+# <http://www.gnu.org/licenses/>.
+
+my $copyright = <<'EOF';
+;; -*- buffer-read-only: t -*-
+;; Generated automatically by genfusion.pl
+
+;; Copyright (C) 2020 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it under
+;; the terms of the GNU General Public License as published by the Free
+;; Software Foundation; either version 3, or (at your option) any later
+;; version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+;; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+;; for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; <http://www.gnu.org/licenses/>.
+
+EOF
+
+print $copyright;
+
+sub mode_to_ldst_char
+{
+ my ($mode) = @_;
+ if ($mode eq 'DI') { return 'd'; }
+ if ($mode eq 'SI') { return 'w'; }
+ if ($mode eq 'HI') { return 'h'; }
+ if ($mode eq 'QI') { return 'b'; }
+ return '?';
+}
+
+sub gen_ld_cmpi_p10
+{
+ LMODE: foreach $lmode ('DI','SI','HI','QI') {
+ $ldst = mode_to_ldst_char($lmode);
+ $clobbermode = $lmode;
+ # For clobber, we need a SI/DI reg in case we split because we have to sign/zero extend.
+ if ( $lmode eq 'HI' || $lmode eq 'QI' ) { $clobbermode = "GPR"; }
+ RESULT: foreach $result ('clobber', $lmode, "EXT".$lmode) {
+ # EXTDI does not exist, and we cannot directly produce HI/QI results.
+ next RESULT if $result eq "EXTDI" || $result eq "HI" || $result eq "QI";
+ # Don't allow EXTQI because that would allow HI result which we can't do.
+ if ( $result eq "EXTQI" ) { $result = "GPR"; }
+ CCMODE: foreach $ccmode ('CC','CCUNS') {
+ $np = "NON_PREFIXED_D";
+ if ( $ccmode eq 'CC' ) {
+ next CCMODE if $lmode eq 'QI';
+ if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
+ # ld and lwa are both DS-FORM.
+ $np = "NON_PREFIXED_DS";
+ }
+ $cmpl = "";
+ $echr = "a";
+ $constpred = "const_m1_to_1_operand";
+ } else {
+ if ( $lmode eq 'DI' ) {
+ # ld is DS-form, but lwz is not.
+ $np = "NON_PREFIXED_DS";
+ }
+ $cmpl = "l";
+ $echr = "z";
+ $constpred = "const_0_to_1_operand";
+ }
+ if ($lmode eq 'DI') { $echr = ""; }
+ if ($result =~ m/EXT/ || $result eq 'GPR' || $clobbermode eq 'GPR') {
+ # We always need extension if result > lmode.
+ if ( $ccmode eq 'CC' ) {
+ $extend = "sign";
+ } else {
+ $extend = "zero";
+ }
+ } else {
+ # Result of SI/DI does not need sign extension.
+ $extend = "none";
+ }
+ print ";; load-cmpi fusion pattern generated by gen_ld_cmpi_p10\n";
+ print ";; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend\n";
+
+ print "(define_insn_and_split \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
+ print " [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
+ print " (compare:${ccmode} (match_operand:${lmode} 1 \"non_update_memory_operand\" \"m\")\n";
+ print " (match_operand:${lmode} 3 \"${constpred}\" \"n\")))\n";
+ if ($result eq 'clobber') {
+ print " (clobber (match_scratch:${clobbermode} 0 \"=r\"))]\n";
+ } elsif ($result eq $lmode) {
+ print " (set (match_operand:${result} 0 \"gpc_reg_operand\" \"=r\") (match_dup 1))]\n";
+ } else {
+ print " (set (match_operand:${result} 0 \"gpc_reg_operand\" \"=r\") (${extend}_extend:${result} (match_dup 1)))]\n";
+ }
+ print " \"(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)\"\n";
+ print " \"l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}di 0,%0,%3\"\n";
+ print " \"&& reload_completed\n";
+ print " && (cc_reg_not_cr0_operand (operands[2], CCmode)\n";
+ print " || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), ${lmode}mode, ${np}))\"\n";
+ if ($extend eq "none") {
+ print " [(set (match_dup 0) (match_dup 1))\n";
+ } else {
+ $resultmode = $result;
+ if ( $result eq 'clobber' ) { $resultmode = $clobbermode }
+ print " [(set (match_dup 0) (${extend}_extend:${resultmode} (match_dup 1)))\n";
+ }
+ print " (set (match_dup 2)\n";
+ print " (compare:${ccmode} (match_dup 0)\n";
+ print " (match_dup 3)))]\n";
+ print " \"\"\n";
+ print " [(set_attr \"type\" \"load\")\n";
+ print " (set_attr \"cost\" \"8\")\n";
+ print " (set_attr \"length\" \"8\")])\n";
+ print "\n";
+ }
+ }
+ }
+}
+
+
+gen_ld_cmpi_p10();
+
+exit(0);
+
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 5d1952e59d3..76328ecff3d 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -297,6 +297,11 @@
(and (match_code "const_int")
(match_test "IN_RANGE (INTVAL (op), 0, 1)")))
+;; Match op = -1, op = 0, or op = 1.
+(define_predicate "const_m1_to_1_operand"
+ (and (match_code "const_int")
+ (match_test "IN_RANGE (INTVAL (op), -1, 1)")))
+
;; Match op = 0..3.
(define_predicate "const_0_to_3_operand"
(and (match_code "const_int")
@@ -847,6 +852,15 @@
|| GET_CODE (XEXP (op, 0)) == PRE_DEC
|| GET_CODE (XEXP (op, 0)) == PRE_MODIFY))"))
+;; Anything that matches memory_operand but does not update the address.
+(define_predicate "non_update_memory_operand"
+ (match_code "mem")
+{
+ if (update_address_mem (op, mode))
+ return 0;
+ return memory_operand (op, mode);
+})
+
;; Return 1 if the operand is a MEM with an indexed-form address.
(define_special_predicate "indexed_address_mem"
(match_test "(MEM_P (op)
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index fa5c75bb49c..fc9376db3f4 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -81,7 +81,9 @@
#define ISA_3_1_MASKS_SERVER (ISA_3_0_MASKS_SERVER \
| OPTION_MASK_POWER10 \
- | OTHER_POWER10_MASKS)
+ | OTHER_POWER10_MASKS \
+ | OPTION_MASK_P10_FUSION \
+ | OPTION_MASK_P10_FUSION_LD_CMPI)
/* Flags that need to be turned off if -mno-power9-vector. */
#define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW \
@@ -128,6 +130,8 @@
| OPTION_MASK_FLOAT128_KEYWORD \
| OPTION_MASK_FPRND \
| OPTION_MASK_POWER10 \
+ | OPTION_MASK_P10_FUSION \
+ | OPTION_MASK_P10_FUSION_LD_CMPI \
| OPTION_MASK_HTM \
| OPTION_MASK_ISEL \
| OPTION_MASK_MFCRF \
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 9cca7325d0d..d9d44fe9821 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -191,6 +191,8 @@ enum non_prefixed_form {
extern enum insn_form address_to_insn_form (rtx, machine_mode,
enum non_prefixed_form);
+extern bool address_is_non_pfx_d_or_x (rtx addr, machine_mode mode,
+ enum non_prefixed_form non_prefix_format);
extern bool prefixed_load_p (rtx_insn *);
extern bool prefixed_store_p (rtx_insn *);
extern bool prefixed_paddi_p (rtx_insn *);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 67681d18150..f810051baf3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4430,6 +4430,12 @@ rs6000_option_override_internal (bool global_init_p)
if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_MMA) == 0)
rs6000_isa_flags |= OPTION_MASK_MMA;
+ if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION) == 0)
+ rs6000_isa_flags |= OPTION_MASK_P10_FUSION;
+
+ if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_LD_CMPI) == 0)
+ rs6000_isa_flags |= OPTION_MASK_P10_FUSION_LD_CMPI;
+
/* Turn off vector pair/mma options on non-power10 systems. */
else if (!TARGET_POWER10 && TARGET_MMA)
{
@@ -23613,6 +23619,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
{ "power9-minmax", OPTION_MASK_P9_MINMAX, false, true },
{ "power9-misc", OPTION_MASK_P9_MISC, false, true },
{ "power9-vector", OPTION_MASK_P9_VECTOR, false, true },
+ { "power10-fusion", OPTION_MASK_P10_FUSION, false, true },
{ "powerpc-gfxopt", OPTION_MASK_PPC_GFXOPT, false, true },
{ "powerpc-gpopt", OPTION_MASK_PPC_GPOPT, false, true },
{ "prefixed", OPTION_MASK_PREFIXED, false, true },
@@ -25704,6 +25711,50 @@ address_to_insn_form (rtx addr,
return INSN_FORM_BAD;
}
+/* Given address rtx ADDR for a load of MODE, is this legitimate for a
+ non-prefixed D-form or X-form instruction? NON_PREFIXED_FORMAT is
+ given NON_PREFIXED_D or NON_PREFIXED_DS to indicate whether we want
+ a D-form or DS-form instruction. X-form and base_reg are always
+ allowed. */
+bool
+address_is_non_pfx_d_or_x (rtx addr, machine_mode mode,
+ enum non_prefixed_form non_prefixed_format)
+{
+ enum insn_form result_form;
+
+ result_form = address_to_insn_form (addr, mode, non_prefixed_format);
+
+ switch (non_prefixed_format)
+ {
+ case NON_PREFIXED_D:
+ switch (result_form)
+ {
+ case INSN_FORM_X:
+ case INSN_FORM_D:
+ case INSN_FORM_DS:
+ case INSN_FORM_BASE_REG:
+ return true;
+ default:
+ break;
+ }
+ break;
+ case NON_PREFIXED_DS:
+ switch (result_form)
+ {
+ case INSN_FORM_X:
+ case INSN_FORM_DS:
+ case INSN_FORM_BASE_REG:
+ return true;
+ default:
+ break;
+ }
+ break;
+ default:
+ break;
+ }
+ return false;
+}
+
/* Helper function to see if we're potentially looking at lfs/stfs.
- PARALLEL containing a SET and a CLOBBER
- stfs:
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index b05dd827b13..233a92baf3c 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -539,6 +539,7 @@ extern int rs6000_vector_align[];
#define MASK_UPDATE OPTION_MASK_UPDATE
#define MASK_VSX OPTION_MASK_VSX
#define MASK_POWER10 OPTION_MASK_POWER10
+#define MASK_P10_FUSION OPTION_MASK_P10_FUSION
#ifndef IN_LIBGCC2
#define MASK_POWERPC64 OPTION_MASK_POWERPC64
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index bb9fb42f82a..1fa3277f1e6 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -14926,3 +14926,4 @@
(include "dfp.md")
(include "crypto.md")
(include "htm.md")
+(include "fusion.md")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index d128e52ff8b..66c9b262b3a 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -479,6 +479,14 @@ mpower8-vector
Target Mask(P8_VECTOR) Var(rs6000_isa_flags)
Use vector and scalar instructions added in ISA 2.07.
+mpower10-fusion
+Target Report Mask(P10_FUSION) Var(rs6000_isa_flags)
+Fuse certain integer operations together for better performance on power10.
+
+mpower10-fusion-ld-cmpi
+Target Undocumented Mask(P10_FUSION_LD_CMPI) Var(rs6000_isa_flags)
+Fuse load and compare-immediate instructions for better performance on power10.
+
mcrypto
Target Mask(CRYPTO) Var(rs6000_isa_flags)
Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions.
diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index af96b21667b..e3a58bf31bf 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -47,6 +47,9 @@ rs6000-call.o: $(srcdir)/config/rs6000/rs6000-call.c
$(COMPILE) $<
$(POSTCOMPILE)
+$(srcdir)/config/rs6000/fusion.md: $(srcdir)/config/rs6000/genfusion.pl
+ $(srcdir)/config/rs6000/genfusion.pl > $(srcdir)/config/rs6000/fusion.md
+
$(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \
$(srcdir)/config/rs6000/rs6000-cpus.def
$(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
@@ -86,4 +89,5 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \
$(srcdir)/config/rs6000/mma.md \
$(srcdir)/config/rs6000/crypto.md \
$(srcdir)/config/rs6000/htm.md \
- $(srcdir)/config/rs6000/dfp.md
+ $(srcdir)/config/rs6000/dfp.md \
+ $(srcdir)/config/rs6000/fusion.md
* [gcc(refs/users/acsawdey/heads/fusion-combine)] Combine patterns for p10 load-cmpi fusion
@ 2020-12-11 2:51 Aaron Sawdey
From: Aaron Sawdey @ 2020-12-11 2:51 UTC (permalink / raw)
To: gcc-cvs
https://gcc.gnu.org/g:bc9785ead155998d9e2460679d2d052289e120af
commit bc9785ead155998d9e2460679d2d052289e120af
Author: Aaron Sawdey <acsawdey@linux.ibm.com>
Date: Mon Sep 28 11:15:46 2020 -0500
Combine patterns for p10 load-cmpi fusion
This patch adds the first batch of patterns to support p10 fusion. These
will allow combine to create a single insn for a pair of instructions
that power10 can fuse and execute. These particular fusion pairs have the
requirement that only cr0 can be used when fusing a load with a compare
immediate of -1/0/1 (if signed) or 0/1 (if unsigned), so we want combine
to impose that requirement; if it cannot be met, the splitter
changes the fused insn back into two insns so scheduling can move them apart.
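As a rough illustration of the constraints described above, the following standalone C sketch models the immediate ranges and the decision to keep the pair fused. It is not code from the patch: the function names and the boolean address argument are hypothetical stand-ins for the RTL predicates and the address check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two immediate predicates used by the patterns:
   signed compares accept -1, 0 or 1; unsigned compares accept 0 or 1.  */
static bool
imm_ok_for_fusion (long imm, bool is_unsigned)
{
  return is_unsigned ? (imm >= 0 && imm <= 1) : (imm >= -1 && imm <= 1);
}

/* Hypothetical model of the split condition: after register allocation the
   pair stays fused only if the compare writes cr0 and the load address is
   usable by a non-prefixed D-form or X-form load.  Otherwise the insn is
   split back into a separate load and compare.  */
static bool
keep_fused (int cr_regno, bool addr_is_d_or_x)
{
  assert (cr_regno >= 0 && cr_regno <= 7);	/* cr0..cr7 */
  return cr_regno == 0 && addr_is_d_or_x;
}
```

For example, keep_fused (0, true) models a pair that stays fused, while keep_fused (1, true) models one that would be split.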
The patterns are generated by a script genfusion.pl and live in new file
fusion.md. This script will be expanded to generate more patterns for
fusion.
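To show which patterns the generator loop produces, here is a sketch that mirrors the nested loops of gen_ld_cmpi_p10 in C instead of Perl and only builds the pattern names. The function name, buffer sizes and array layout are assumptions made for this illustration; the authoritative logic is the Perl script in the diff.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PATTERNS 32
#define NAME_LEN 64

/* Fill NAMES with the pattern names the generator loop would print and
   return how many there are.  */
static int
list_ld_cmpi_patterns (char names[MAX_PATTERNS][NAME_LEN])
{
  static const char *const lmodes[] = { "DI", "SI", "HI", "QI" };
  static const char ldst[] = { 'd', 'w', 'h', 'b' };
  int n = 0;

  for (int m = 0; m < 4; m++)
    {
      const char *lmode = lmodes[m];
      int narrow = (m >= 2);	/* HI and QI loads always extend.  */
      char ext[8];
      snprintf (ext, sizeof ext, "EXT%s", lmode);
      const char *results[] = { "clobber", lmode, ext };

      for (int r = 0; r < 3; r++)
	{
	  const char *result = results[r];
	  /* EXTDI does not exist, and HI/QI results cannot be produced.  */
	  if (!strcmp (result, "EXTDI") || !strcmp (result, "HI")
	      || !strcmp (result, "QI"))
	    continue;
	  if (!strcmp (result, "EXTQI"))
	    result = "GPR";

	  for (int u = 0; u < 2; u++)	/* 0 = CC (signed), 1 = CCUNS.  */
	    {
	      if (!u && m == 3)
		continue;		/* The script skips CC for QI.  */
	      const char *echr = (m == 0) ? "" : (u ? "z" : "a");
	      int extends = narrow || r == 2;
	      const char *extend = !extends ? "none" : (u ? "zero" : "sign");
	      assert (n < MAX_PATTERNS);
	      snprintf (names[n++], NAME_LEN,
			"*l%c%s_cmp%sdi_cr0_%s_%s_%s_%s",
			ldst[m], echr, u ? "l" : "", lmode, result,
			u ? "CCUNS" : "CC", extend);
	    }
	}
    }
  return n;
}
```

Calling list_ld_cmpi_patterns with a caller-provided names array yields the same names, in the same order, as the define_insn_and_split headers in fusion.md.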
This also adds option -mpower10-fusion which defaults on for power10 and
will gate all these fusion patterns. In addition I have added an
undocumented option -mpower10-fusion-ld-cmpi (which may be removed later)
that just controls the load+compare-immediate patterns. I have made
these default on for power10 but they are not disallowed for earlier
processors because it is still valid code. This allows us to test the
correctness of fusion code generation by turning it on explicitly.
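The defaulting behaviour described above can be sketched as a small standalone C model. The DEMO_ mask values and function names are invented for this illustration; the real OPTION_MASK_* bits are generated from rs6000.opt and the real logic lives in rs6000_option_override_internal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; the real OPTION_MASK_* bits come from
   rs6000.opt.  */
#define DEMO_MASK_POWER10            (UINT64_C (1) << 0)
#define DEMO_MASK_P10_FUSION         (UINT64_C (1) << 1)
#define DEMO_MASK_P10_FUSION_LD_CMPI (UINT64_C (1) << 2)

/* Return FLAGS with both fusion options defaulted on for power10, unless
   the user set them explicitly (recorded in EXPLICIT_FLAGS).  */
static uint64_t
demo_default_fusion_flags (uint64_t flags, uint64_t explicit_flags)
{
  static const uint64_t fusion_masks[] = {
    DEMO_MASK_P10_FUSION, DEMO_MASK_P10_FUSION_LD_CMPI
  };

  if ((flags & DEMO_MASK_POWER10) == 0)
    return flags;		/* Not power10: leave the user's choice alone.  */

  for (unsigned i = 0; i < sizeof fusion_masks / sizeof fusion_masks[0]; i++)
    if ((explicit_flags & fusion_masks[i]) == 0)
      flags |= fusion_masks[i];
  return flags;
}

/* The load-cmpi patterns are gated on both options being set.  */
static bool
demo_ld_cmpi_patterns_enabled (uint64_t flags)
{
  const uint64_t need = DEMO_MASK_P10_FUSION | DEMO_MASK_P10_FUSION_LD_CMPI;
  return (flags & need) == need;
}
```

In this model an explicit -mno-power10-fusion on power10 leaves the patterns disabled, while setting both options explicitly on an earlier processor enables them, which is what allows testing without power10.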
gcc/ChangeLog:
* config/rs6000/genfusion.pl: New script to generate
define_insn_and_split patterns so combine can arrange fused
instructions next to each other.
* config/rs6000/fusion.md: New file, generated fused instruction
patterns for combine.
* config/rs6000/predicates.md (const_m1_to_1_operand): New predicate.
(non_update_memory_operand): New predicate.
* config/rs6000/rs6000-cpus.def: Add OPTION_MASK_P10_FUSION and
OPTION_MASK_P10_FUSION_LD_CMPI to ISA_3_1_MASKS_SERVER and
POWERPC_MASKS.
* config/rs6000/rs6000-protos.h (address_is_non_pfx_d_or_x): Add
prototype.
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Automatically set -mpower10-fusion and -mpower10-fusion-ld-cmpi
if target is power10. (rs6000_opt_masks): Allow -mpower10-fusion
in function attributes. (address_is_non_pfx_d_or_x): New function.
* config/rs6000/rs6000.h: Add MASK_P10_FUSION.
* config/rs6000/rs6000.md: Include fusion.md.
* config/rs6000/rs6000.opt: Add -mpower10-fusion
and -mpower10-fusion-ld-cmpi.
* config/rs6000/t-rs6000: Add dependencies involving fusion.md.
Diff:
---
gcc/config/rs6000/fusion.md | 357 ++++++++++++++++++++++++++++++++++++++
gcc/config/rs6000/genfusion.pl | 144 +++++++++++++++
gcc/config/rs6000/predicates.md | 14 ++
gcc/config/rs6000/rs6000-cpus.def | 6 +-
gcc/config/rs6000/rs6000-protos.h | 2 +
gcc/config/rs6000/rs6000.c | 51 ++++++
gcc/config/rs6000/rs6000.h | 1 +
gcc/config/rs6000/rs6000.md | 1 +
gcc/config/rs6000/rs6000.opt | 8 +
gcc/config/rs6000/t-rs6000 | 6 +-
10 files changed, 588 insertions(+), 2 deletions(-)
diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
new file mode 100644
index 00000000000..a4d3a6ae7f3
--- /dev/null
+++ b/gcc/config/rs6000/fusion.md
@@ -0,0 +1,357 @@
+;; -*- buffer-read-only: t -*-
+;; Generated automatically by genfusion.pl
+
+;; Copyright (C) 2020 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it under
+;; the terms of the GNU General Public License as published by the Free
+;; Software Foundation; either version 3, or (at your option) any later
+;; version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+;; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+;; for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is clobber compare mode is CC extend is none
+(define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:DI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
+(define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:DI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is DI compare mode is CC extend is none
+(define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is DI compare mode is CCUNS extend is none
+(define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is clobber compare mode is CC extend is none
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:SI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is clobber compare mode is CCUNS extend is none
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:SI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is SI compare mode is CC extend is none
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is SI compare mode is CCUNS extend is none
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (sign_extend:EXTSI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is EXTSI compare mode is CCUNS extend is zero
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (zero_extend:EXTSI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:EXTSI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is clobber compare mode is CC extend is sign
+(define_insn_and_split "*lha_cmpdi_cr0_HI_clobber_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lha%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (sign_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is clobber compare mode is CCUNS extend is zero
+(define_insn_and_split "*lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lhz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is EXTHI compare mode is CC extend is sign
+(define_insn_and_split "*lha_cmpdi_cr0_HI_EXTHI_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (sign_extend:EXTHI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lha%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (sign_extend:EXTHI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is EXTHI compare mode is CCUNS extend is zero
+(define_insn_and_split "*lhz_cmpldi_cr0_HI_EXTHI_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (zero_extend:EXTHI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lhz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:EXTHI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is QI result mode is clobber compare mode is CCUNS extend is zero
+(define_insn_and_split "*lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
+ (match_operand:QI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lbz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), QImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is QI result mode is GPR compare mode is CCUNS extend is zero
+(define_insn_and_split "*lbz_cmpldi_cr0_QI_GPR_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
+ (match_operand:QI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:GPR 0 "gpc_reg_operand" "=r") (zero_extend:GPR (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lbz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), QImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
new file mode 100755
index 00000000000..494537c9439
--- /dev/null
+++ b/gcc/config/rs6000/genfusion.pl
@@ -0,0 +1,144 @@
+#!/usr/bin/perl -w
+# Generate fusion.md
+# Copyright (C) 2020 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3. If not see
+# <http://www.gnu.org/licenses/>.
+
+my $copyright = <<'EOF';
+;; -*- buffer-read-only: t -*-
+;; Generated automatically by genfusion.pl
+
+;; Copyright (C) 2020 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it under
+;; the terms of the GNU General Public License as published by the Free
+;; Software Foundation; either version 3, or (at your option) any later
+;; version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+;; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+;; for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; <http://www.gnu.org/licenses/>.
+
+EOF
+
+print $copyright;
+
+sub mode_to_ldst_char
+{
+ my ($mode) = @_;
+ if ($mode eq 'DI') { return 'd'; }
+ if ($mode eq 'SI') { return 'w'; }
+ if ($mode eq 'HI') { return 'h'; }
+ if ($mode eq 'QI') { return 'b'; }
+ return '?';
+}
+
+sub gen_ld_cmpi_p10
+{
+ LMODE: foreach $lmode ('DI','SI','HI','QI') {
+ $ldst = mode_to_ldst_char($lmode);
+ $clobbermode = $lmode;
+ # For clobber, we need a SI/DI reg in case we split because we have to sign/zero extend.
+ if ( $lmode eq 'HI' || $lmode eq 'QI' ) { $clobbermode = "GPR"; }
+ RESULT: foreach $result ('clobber', $lmode, "EXT".$lmode) {
+ # EXTDI does not exist, and we cannot directly produce HI/QI results.
+ next RESULT if $result eq "EXTDI" || $result eq "HI" || $result eq "QI";
+ # Don't allow EXTQI because that would allow HI result which we can't do.
+ if ( $result eq "EXTQI" ) { $result = "GPR"; }
+ CCMODE: foreach $ccmode ('CC','CCUNS') {
+ $np = "NON_PREFIXED_D";
+ if ( $ccmode eq 'CC' ) {
+ next CCMODE if $lmode eq 'QI';
+ if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
+ # ld and lwa are both DS-FORM.
+ $np = "NON_PREFIXED_DS";
+ }
+ $cmpl = "";
+ $echr = "a";
+ $constpred = "const_m1_to_1_operand";
+ } else {
+ if ( $lmode eq 'DI' ) {
+ # ld is DS-form, but lwz is not.
+ $np = "NON_PREFIXED_DS";
+ }
+ $cmpl = "l";
+ $echr = "z";
+ $constpred = "const_0_to_1_operand";
+ }
+ if ($lmode eq 'DI') { $echr = ""; }
+ if ($result =~ m/EXT/ || $result eq 'GPR' || $clobbermode eq 'GPR') {
+ # We always need extension if result > lmode.
+ if ( $ccmode eq 'CC' ) {
+ $extend = "sign";
+ } else {
+ $extend = "zero";
+ }
+ } else {
+ # Result of SI/DI does not need sign extension.
+ $extend = "none";
+ }
+ print ";; load-cmpi fusion pattern generated by gen_ld_cmpi_p10\n";
+ print ";; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend\n";
+
+ print "(define_insn_and_split \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
+ print " [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
+ print " (compare:${ccmode} (match_operand:${lmode} 1 \"non_update_memory_operand\" \"m\")\n";
+ print " (match_operand:${lmode} 3 \"${constpred}\" \"n\")))\n";
+ if ($result eq 'clobber') {
+ print " (clobber (match_scratch:${clobbermode} 0 \"=r\"))]\n";
+ } elsif ($result eq $lmode) {
+ print " (set (match_operand:${result} 0 \"gpc_reg_operand\" \"=r\") (match_dup 1))]\n";
+ } else {
+ print " (set (match_operand:${result} 0 \"gpc_reg_operand\" \"=r\") (${extend}_extend:${result} (match_dup 1)))]\n";
+ }
+ print " \"(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)\"\n";
+ print " \"l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}di 0,%0,%3\"\n";
+ print " \"&& reload_completed\n";
+ print " && (cc_reg_not_cr0_operand (operands[2], CCmode)\n";
+ print " || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), ${lmode}mode, ${np}))\"\n";
+ if ($extend eq "none") {
+ print " [(set (match_dup 0) (match_dup 1))\n";
+ } else {
+ $resultmode = $result;
+ if ( $result eq 'clobber' ) { $resultmode = $clobbermode }
+ print " [(set (match_dup 0) (${extend}_extend:${resultmode} (match_dup 1)))\n";
+ }
+ print " (set (match_dup 2)\n";
+ print " (compare:${ccmode} (match_dup 0)\n";
+ print " (match_dup 3)))]\n";
+ print " \"\"\n";
+ print " [(set_attr \"type\" \"load\")\n";
+ print " (set_attr \"cost\" \"8\")\n";
+ print " (set_attr \"length\" \"8\")])\n";
+ print "\n";
+ }
+ }
+ }
+}
+
+
+gen_ld_cmpi_p10();
+
+exit(0);
+
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 9ad5ae67302..78de8102f44 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -297,6 +297,11 @@
(and (match_code "const_int")
(match_test "IN_RANGE (INTVAL (op), 0, 1)")))
+;; Match op = -1, op = 0, or op = 1.
+(define_predicate "const_m1_to_1_operand"
+ (and (match_code "const_int")
+ (match_test "IN_RANGE (INTVAL (op), -1, 1)")))
+
;; Match op = 0..3.
(define_predicate "const_0_to_3_operand"
(and (match_code "const_int")
@@ -847,6 +852,15 @@
|| GET_CODE (XEXP (op, 0)) == PRE_DEC
|| GET_CODE (XEXP (op, 0)) == PRE_MODIFY))"))
+;; Anything that matches memory_operand but does not update the address.
+(define_predicate "non_update_memory_operand"
+ (match_code "mem")
+{
+ if (update_address_mem (op, mode))
+ return 0;
+ return memory_operand (op, mode);
+})
+
;; Return 1 if the operand is a MEM with an indexed-form address.
(define_special_predicate "indexed_address_mem"
(match_test "(MEM_P (op)
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index 8d2c1ffd6cf..3e65289d8df 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -82,7 +82,9 @@
#define ISA_3_1_MASKS_SERVER (ISA_3_0_MASKS_SERVER \
| OPTION_MASK_POWER10 \
- | OTHER_POWER10_MASKS)
+ | OTHER_POWER10_MASKS \
+ | OPTION_MASK_P10_FUSION \
+ | OPTION_MASK_P10_FUSION_LD_CMPI)
/* Flags that need to be turned off if -mno-power9-vector. */
#define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW \
@@ -129,6 +131,8 @@
| OPTION_MASK_FLOAT128_KEYWORD \
| OPTION_MASK_FPRND \
| OPTION_MASK_POWER10 \
+ | OPTION_MASK_P10_FUSION \
+ | OPTION_MASK_P10_FUSION_LD_CMPI \
| OPTION_MASK_HTM \
| OPTION_MASK_ISEL \
| OPTION_MASK_MFCRF \
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 3c4682b0e26..cd644083558 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -191,6 +191,8 @@ enum non_prefixed_form {
extern enum insn_form address_to_insn_form (rtx, machine_mode,
enum non_prefixed_form);
+extern bool address_is_non_pfx_d_or_x (rtx addr, machine_mode mode,
+ enum non_prefixed_form non_prefix_format);
extern bool prefixed_load_p (rtx_insn *);
extern bool prefixed_store_p (rtx_insn *);
extern bool prefixed_paddi_p (rtx_insn *);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 517467ebc63..759551d07ec 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4423,6 +4423,12 @@ rs6000_option_override_internal (bool global_init_p)
if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_MMA) == 0)
rs6000_isa_flags |= OPTION_MASK_MMA;
+ if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION) == 0)
+ rs6000_isa_flags |= OPTION_MASK_P10_FUSION;
+
+ if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_LD_CMPI) == 0)
+ rs6000_isa_flags |= OPTION_MASK_P10_FUSION_LD_CMPI;
+
/* Turn off vector pair/mma options on non-power10 systems. */
else if (!TARGET_POWER10 && TARGET_MMA)
{
@@ -23614,6 +23620,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
{ "power9-minmax", OPTION_MASK_P9_MINMAX, false, true },
{ "power9-misc", OPTION_MASK_P9_MISC, false, true },
{ "power9-vector", OPTION_MASK_P9_VECTOR, false, true },
+ { "power10-fusion", OPTION_MASK_P10_FUSION, false, true },
{ "powerpc-gfxopt", OPTION_MASK_PPC_GFXOPT, false, true },
{ "powerpc-gpopt", OPTION_MASK_PPC_GPOPT, false, true },
{ "prefixed", OPTION_MASK_PREFIXED, false, true },
@@ -25705,6 +25712,50 @@ address_to_insn_form (rtx addr,
return INSN_FORM_BAD;
}
+/* Given address rtx ADDR for a load of MODE, is this legitimate for a
+ non-prefixed D-form or X-form instruction? NON_PREFIXED_FORMAT is
+ given NON_PREFIXED_D or NON_PREFIXED_DS to indicate whether we want
+ a D-form or DS-form instruction. X-form and base_reg are always
+ allowed. */
+bool
+address_is_non_pfx_d_or_x (rtx addr, machine_mode mode,
+ enum non_prefixed_form non_prefixed_format)
+{
+ enum insn_form result_form;
+
+ result_form = address_to_insn_form (addr, mode, non_prefixed_format);
+
+ switch (non_prefixed_format)
+ {
+ case NON_PREFIXED_D:
+ switch (result_form)
+ {
+ case INSN_FORM_X:
+ case INSN_FORM_D:
+ case INSN_FORM_DS:
+ case INSN_FORM_BASE_REG:
+ return true;
+ default:
+ break;
+ }
+ break;
+ case NON_PREFIXED_DS:
+ switch (result_form)
+ {
+ case INSN_FORM_X:
+ case INSN_FORM_DS:
+ case INSN_FORM_BASE_REG:
+ return true;
+ default:
+ break;
+ }
+ break;
+ default:
+ break;
+ }
+ return false;
+}
+
/* Helper function to see if we're potentially looking at lfs/stfs.
- PARALLEL containing a SET and a CLOBBER
- stfs:
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 5bf9c83fc1e..307c0b200bd 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -539,6 +539,7 @@ extern int rs6000_vector_align[];
#define MASK_UPDATE OPTION_MASK_UPDATE
#define MASK_VSX OPTION_MASK_VSX
#define MASK_POWER10 OPTION_MASK_POWER10
+#define MASK_P10_FUSION OPTION_MASK_P10_FUSION
#ifndef IN_LIBGCC2
#define MASK_POWERPC64 OPTION_MASK_POWERPC64
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b89990f46bf..c39b7098978 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -14926,3 +14926,4 @@
(include "dfp.md")
(include "crypto.md")
(include "htm.md")
+(include "fusion.md")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 2888172cb27..008a318b98d 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -479,6 +479,14 @@ mpower8-vector
Target Report Mask(P8_VECTOR) Var(rs6000_isa_flags)
Use vector and scalar instructions added in ISA 2.07.
+mpower10-fusion
+Target Report Mask(P10_FUSION) Var(rs6000_isa_flags)
+Fuse certain integer operations together for better performance on power10.
+
+mpower10-fusion-ld-cmpi
+Target Undocumented Mask(P10_FUSION_LD_CMPI) Var(rs6000_isa_flags)
+Fuse certain integer operations together for better performance on power10.
+
mcrypto
Target Report Mask(CRYPTO) Var(rs6000_isa_flags)
Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions.
diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index 1ddb5729cb2..bcc71a9e21b 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -47,6 +47,9 @@ rs6000-call.o: $(srcdir)/config/rs6000/rs6000-call.c
$(COMPILE) $<
$(POSTCOMPILE)
+$(srcdir)/config/rs6000/fusion.md: $(srcdir)/config/rs6000/genfusion.pl
+ $(srcdir)/config/rs6000/genfusion.pl > $(srcdir)/config/rs6000/fusion.md
+
$(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \
$(srcdir)/config/rs6000/rs6000-cpus.def
$(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
@@ -86,4 +89,5 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \
$(srcdir)/config/rs6000/mma.md \
$(srcdir)/config/rs6000/crypto.md \
$(srcdir)/config/rs6000/htm.md \
- $(srcdir)/config/rs6000/dfp.md
+ $(srcdir)/config/rs6000/dfp.md \
+ $(srcdir)/config/rs6000/fusion.md
* [gcc(refs/users/acsawdey/heads/fusion-combine)] Combine patterns for p10 load-cmpi fusion
@ 2020-12-04 19:16 Aaron Sawdey
From: Aaron Sawdey @ 2020-12-04 19:16 UTC (permalink / raw)
To: gcc-cvs
https://gcc.gnu.org/g:77e48392285c15c85a95f116f5b32cd051086710
commit 77e48392285c15c85a95f116f5b32cd051086710
Author: Aaron Sawdey <acsawdey@linux.ibm.com>
Date: Mon Sep 28 11:15:46 2020 -0500
Combine patterns for p10 load-cmpi fusion
This patch adds the first batch of patterns to support p10 fusion. These
will allow combine to create a single insn for a pair of instructions
that power10 can fuse and execute. These particular fusion pairs have the
requirement that only cr0 can be used when fusing a load with a compare
immediate of -1/0/1 (if signed) or 0/1 (if unsigned), so we want combine
to put that requirement in, and if it doesn't work out the splitter
can change it back into 2 insns so scheduling can move them apart.
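As a rough illustration of the constraint described above, the following C
sketch models which load/compare-immediate pairs qualify for fusion. The
function name ld_cmpi_fusable and its parameters are hypothetical and do not
appear in the patch. In the patch itself the immediate range is enforced by
the operand predicates when combine forms the insn, and the cr0 requirement
is checked after reload by the split condition, which also looks at the
address form (left out here).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not code from the patch.  IS_SIGNED selects a
   signed compare (cmpdi) or an unsigned one (cmpldi), IMM is the compare
   immediate, and CR_FIELD is the condition register field (0..7) that
   register allocation picked.  */
static bool
ld_cmpi_fusable (bool is_signed, long imm, int cr_field)
{
  assert (cr_field >= 0 && cr_field <= 7);

  /* Only cr0 can be the target of a fused compare.  */
  if (cr_field != 0)
    return false;

  /* Signed compares accept -1/0/1, unsigned compares accept 0/1.  */
  if (is_signed)
    return imm >= -1 && imm <= 1;
  return imm >= 0 && imm <= 1;
}
```

Under this model a signed compare against -1 in cr0 qualifies, while the
same compare allocated to cr1 would be split back into a separate load and
compare.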
The patterns are generated by a script genfusion.pl and live in new file
fusion.md. This script will be expanded to generate more patterns for
fusion.
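To make the shape of the generated file easier to see, here is a sketch in C
that restates the loop nest of gen_ld_cmpi_p10 from genfusion.pl and
reproduces only the pattern names, not the RTL. The function name
list_ld_cmpi_pattern_names and the 64-byte name buffers are assumptions made
for this illustration; the Perl script in the diff remains the actual
generator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical restatement of the genfusion.pl loops.  Writes up to
   MAX_NAMES pattern names into NAMES and returns how many combinations
   the loops produce.  */
static int
list_ld_cmpi_pattern_names (char names[][64], int max_names)
{
  static const char *const lmodes[] = { "DI", "SI", "HI", "QI" };
  static const char ldst_chars[] = { 'd', 'w', 'h', 'b' };
  static const char *const ccmodes[] = { "CC", "CCUNS" };
  int count = 0;

  assert (names != NULL && max_names >= 0);

  for (int m = 0; m < 4; m++)
    {
      int is_di = (m == 0);
      int is_qi = (m == 3);
      int is_narrow = (m >= 2);	/* HI and QI always need an extend.  */

      for (int r = 0; r < 3; r++)
	{
	  char result[16];

	  if (r == 0)
	    strcpy (result, "clobber");
	  else if (r == 1)
	    {
	      /* HI/QI results cannot be produced directly.  */
	      if (is_narrow)
		continue;
	      strcpy (result, lmodes[m]);
	    }
	  else
	    {
	      /* EXTDI does not exist; EXTQI is replaced by GPR.  */
	      if (is_di)
		continue;
	      if (is_qi)
		strcpy (result, "GPR");
	      else
		snprintf (result, sizeof result, "EXT%s", lmodes[m]);
	    }

	  for (int c = 0; c < 2; c++)
	    {
	      int is_signed = (c == 0);

	      /* The script emits no signed (CC) variant for QI loads.  */
	      if (is_signed && is_qi)
		continue;

	      const char *echr = is_di ? "" : (is_signed ? "a" : "z");
	      const char *cmpl = is_signed ? "" : "l";
	      const char *extend = "none";
	      if (r == 2 || is_narrow)
		extend = is_signed ? "sign" : "zero";

	      if (count < max_names)
		snprintf (names[count], 64, "l%c%s_cmp%sdi_cr0_%s_%s_%s_%s",
			  ldst_chars[m], echr, cmpl, lmodes[m], result,
			  ccmodes[c], extend);
	      count++;
	    }
	}
    }
  return count;
}
```

The names come out in the same order as the define_insn_and_split patterns
in fusion.md, which makes the sketch usable as a quick cross-check when the
script is extended.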
This also adds option -mpower10-fusion which defaults on for power10 and
will gate all these fusion patterns. In addition I have added an
undocumented option -mpower10-fusion-ld-cmpi (which may be removed later)
that just controls the load+compare-immediate patterns. I have made
these default on for power10 but they are not disallowed for earlier
processors because it is still valid code. This allows us to test the
correctness of fusion code generation by turning it on explicitly.
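The defaulting behaviour described above can be sketched in C as follows.
This is a simplified model of what the patch does in
rs6000_option_override_internal, not the code itself. The SKETCH_* bit
values and both function names are hypothetical stand-ins for
OPTION_MASK_P10_FUSION, OPTION_MASK_P10_FUSION_LD_CMPI and the
TARGET_P10_FUSION tests used in the patterns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the real masks are generated from rs6000.opt.  */
enum
{
  SKETCH_P10_FUSION = 1,
  SKETCH_P10_FUSION_LD_CMPI = 2,
  SKETCH_ALL = 3
};

/* Turn each fusion flag on for power10 unless the user set it explicitly.
   EXPLICIT_FLAGS marks the options given on the command line, whether as
   -mfoo or -mno-foo.  */
static unsigned
apply_p10_fusion_defaults (unsigned flags, unsigned explicit_flags,
			   bool is_power10)
{
  assert ((explicit_flags & ~(unsigned) SKETCH_ALL) == 0);

  if (is_power10 && (explicit_flags & SKETCH_P10_FUSION) == 0)
    flags |= SKETCH_P10_FUSION;
  if (is_power10 && (explicit_flags & SKETCH_P10_FUSION_LD_CMPI) == 0)
    flags |= SKETCH_P10_FUSION_LD_CMPI;
  return flags;
}

/* The load-cmpi patterns are gated on both flags being set.  */
static bool
ld_cmpi_patterns_enabled (unsigned flags)
{
  return (flags & SKETCH_P10_FUSION) != 0
	 && (flags & SKETCH_P10_FUSION_LD_CMPI) != 0;
}
```

Note that nothing in this model clears the flags on an older processor, so
passing both options explicitly with an earlier -mcpu still enables the
patterns, which is what makes the explicit testing mentioned above possible.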
gcc/ChangeLog:
* config/rs6000/genfusion.pl: New file, script to generate
define_insn_and_split patterns so combine can arrange fused
instructions next to each other.
* config/rs6000/fusion.md: New file, generated fused instruction
patterns for combine.
* config/rs6000/predicates.md (const_m1_to_1_operand): New predicate.
(non_update_memory_operand): New predicate.
* config/rs6000/rs6000-cpus.def: Add OPTION_MASK_P10_FUSION and
OPTION_MASK_P10_FUSION_LD_CMPI to ISA_3_1_MASKS_SERVER and
POWERPC_MASKS.
* config/rs6000/rs6000-protos.h (address_is_non_pfx_d_or_x): Add
prototype.
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Automatically set -mpower10-fusion and -mpower10-fusion-ld-cmpi
if target is power10.
(rs6000_opt_masks): Allow -mpower10-fusion in function attributes.
(address_is_non_pfx_d_or_x): New function.
* config/rs6000/rs6000.h: Add MASK_P10_FUSION.
* config/rs6000/rs6000.md: Include fusion.md.
* config/rs6000/rs6000.opt: Add -mpower10-fusion
and -mpower10-fusion-ld-cmpi.
* config/rs6000/t-rs6000: Add dependencies involving fusion.md.
Diff:
---
gcc/config/rs6000/fusion.md | 357 ++++++++++++++++++++++++++++++++++++++
gcc/config/rs6000/genfusion.pl | 144 +++++++++++++++
gcc/config/rs6000/predicates.md | 14 ++
gcc/config/rs6000/rs6000-cpus.def | 6 +-
gcc/config/rs6000/rs6000-protos.h | 2 +
gcc/config/rs6000/rs6000.c | 51 ++++++
gcc/config/rs6000/rs6000.h | 1 +
gcc/config/rs6000/rs6000.md | 1 +
gcc/config/rs6000/rs6000.opt | 8 +
gcc/config/rs6000/t-rs6000 | 6 +-
10 files changed, 588 insertions(+), 2 deletions(-)
diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
new file mode 100644
index 00000000000..a4d3a6ae7f3
--- /dev/null
+++ b/gcc/config/rs6000/fusion.md
@@ -0,0 +1,357 @@
+;; -*- buffer-read-only: t -*-
+;; Generated automatically by genfusion.pl
+
+;; Copyright (C) 2020 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it under
+;; the terms of the GNU General Public License as published by the Free
+;; Software Foundation; either version 3, or (at your option) any later
+;; version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+;; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+;; for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is clobber compare mode is CC extend is none
+(define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:DI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
+(define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:DI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is DI compare mode is CC extend is none
+(define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is DI result mode is DI compare mode is CCUNS extend is none
+(define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "m")
+ (match_operand:DI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "ld%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), DImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is clobber compare mode is CC extend is none
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:SI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is clobber compare mode is CCUNS extend is none
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:SI 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is SI compare mode is CC extend is none
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is SI compare mode is CCUNS extend is none
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (match_dup 1))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwa%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_DS))"
+ [(set (match_dup 0) (sign_extend:EXTSI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is SI result mode is EXTSI compare mode is CCUNS extend is zero
+(define_insn_and_split "*lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
+ (match_operand:SI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (zero_extend:EXTSI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lwz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), SImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:EXTSI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is clobber compare mode is CC extend is sign
+(define_insn_and_split "*lha_cmpdi_cr0_HI_clobber_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_m1_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lha%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (sign_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is clobber compare mode is CCUNS extend is zero
+(define_insn_and_split "*lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lhz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is EXTHI compare mode is CC extend is sign
+(define_insn_and_split "*lha_cmpdi_cr0_HI_EXTHI_CC_sign"
+ [(set (match_operand:CC 2 "cc_reg_operand" "=x")
+ (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_m1_to_1_operand" "n")))
+ (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (sign_extend:EXTHI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lha%X1 %0,%1\;cmpdi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (sign_extend:EXTHI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CC (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is HI result mode is EXTHI compare mode is CCUNS extend is zero
+(define_insn_and_split "*lhz_cmpldi_cr0_HI_EXTHI_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
+ (match_operand:HI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:EXTHI 0 "gpc_reg_operand" "=r") (zero_extend:EXTHI (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lhz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), HImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:EXTHI (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is QI result mode is clobber compare mode is CCUNS extend is zero
+(define_insn_and_split "*lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
+ (match_operand:QI 3 "const_0_to_1_operand" "n")))
+ (clobber (match_scratch:GPR 0 "=r"))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lbz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), QImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
+;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
+;; load mode is QI result mode is GPR compare mode is CCUNS extend is zero
+(define_insn_and_split "*lbz_cmpldi_cr0_QI_GPR_CCUNS_zero"
+ [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
+ (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
+ (match_operand:QI 3 "const_0_to_1_operand" "n")))
+ (set (match_operand:GPR 0 "gpc_reg_operand" "=r") (zero_extend:GPR (match_dup 1)))]
+ "(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)"
+ "lbz%X1 %0,%1\;cmpldi 0,%0,%3"
+ "&& reload_completed
+ && (cc_reg_not_cr0_operand (operands[2], CCmode)
+ || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), QImode, NON_PREFIXED_D))"
+ [(set (match_dup 0) (zero_extend:GPR (match_dup 1)))
+ (set (match_dup 2)
+ (compare:CCUNS (match_dup 0)
+ (match_dup 3)))]
+ ""
+ [(set_attr "type" "load")
+ (set_attr "cost" "8")
+ (set_attr "length" "8")])
+
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
new file mode 100755
index 00000000000..494537c9439
--- /dev/null
+++ b/gcc/config/rs6000/genfusion.pl
@@ -0,0 +1,144 @@
+#!/usr/bin/perl -w
+# Generate fusion.md
+# Copyright (C) 2020 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+#
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3. If not see
+# <http://www.gnu.org/licenses/>.
+
+my $copyright = <<'EOF';
+;; -*- buffer-read-only: t -*-
+;; Generated automatically by genfusion.pl
+
+;; Copyright (C) 2020 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it under
+;; the terms of the GNU General Public License as published by the Free
+;; Software Foundation; either version 3, or (at your option) any later
+;; version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+;; WARRANTY; without even the implied warranty of MERCHANTABILITY or
+;; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+;; for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3. If not see
+;; <http://www.gnu.org/licenses/>.
+
+EOF
+
+print $copyright;
+
+sub mode_to_ldst_char
+{
+ my ($mode) = @_;
+ if ($mode eq 'DI') { return 'd'; }
+ if ($mode eq 'SI') { return 'w'; }
+ if ($mode eq 'HI') { return 'h'; }
+ if ($mode eq 'QI') { return 'b'; }
+ return '?';
+}
+
+sub gen_ld_cmpi_p10
+{
+ LMODE: foreach $lmode ('DI','SI','HI','QI') {
+ $ldst = mode_to_ldst_char($lmode);
+ $clobbermode = $lmode;
+ # For clobber, we need a SI/DI reg in case we split because we have to sign/zero extend.
+ if ( $lmode eq 'HI' || $lmode eq 'QI' ) { $clobbermode = "GPR"; }
+ RESULT: foreach $result ('clobber', $lmode, "EXT".$lmode) {
+ # EXTDI does not exist, and we cannot directly produce HI/QI results.
+ next RESULT if $result eq "EXTDI" || $result eq "HI" || $result eq "QI";
+ # Don't allow EXTQI because that would allow HI result which we can't do.
+ if ( $result eq "EXTQI" ) { $result = "GPR"; }
+ CCMODE: foreach $ccmode ('CC','CCUNS') {
+ $np = "NON_PREFIXED_D";
+ if ( $ccmode eq 'CC' ) {
+ next CCMODE if $lmode eq 'QI';
+ if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
+ # ld and lwa are both DS-FORM.
+ $np = "NON_PREFIXED_DS";
+ }
+ $cmpl = "";
+ $echr = "a";
+ $constpred = "const_m1_to_1_operand";
+ } else {
+ if ( $lmode eq 'DI' ) {
+ # ld is DS-form, but lwz is not.
+ $np = "NON_PREFIXED_DS";
+ }
+ $cmpl = "l";
+ $echr = "z";
+ $constpred = "const_0_to_1_operand";
+ }
+ if ($lmode eq 'DI') { $echr = ""; }
+ if ($result =~ m/EXT/ || $result eq 'GPR' || $clobbermode eq 'GPR') {
+ # We always need extension if result > lmode.
+ if ( $ccmode eq 'CC' ) {
+ $extend = "sign";
+ } else {
+ $extend = "zero";
+ }
+ } else {
+ # Result of SI/DI does not need sign extension.
+ $extend = "none";
+ }
+ print ";; load-cmpi fusion pattern generated by gen_ld_cmpi_p10\n";
+ print ";; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend\n";
+
+ print "(define_insn_and_split \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
+ print " [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
+ print " (compare:${ccmode} (match_operand:${lmode} 1 \"non_update_memory_operand\" \"m\")\n";
+ print " (match_operand:${lmode} 3 \"${constpred}\" \"n\")))\n";
+ if ($result eq 'clobber') {
+ print " (clobber (match_scratch:${clobbermode} 0 \"=r\"))]\n";
+ } elsif ($result eq $lmode) {
+ print " (set (match_operand:${result} 0 \"gpc_reg_operand\" \"=r\") (match_dup 1))]\n";
+ } else {
+ print " (set (match_operand:${result} 0 \"gpc_reg_operand\" \"=r\") (${extend}_extend:${result} (match_dup 1)))]\n";
+ }
+ print " \"(TARGET_P10_FUSION && TARGET_P10_FUSION_LD_CMPI)\"\n";
+ print " \"l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}di 0,%0,%3\"\n";
+ print " \"&& reload_completed\n";
+ print " && (cc_reg_not_cr0_operand (operands[2], CCmode)\n";
+ print " || !address_is_non_pfx_d_or_x (XEXP (operands[1],0), ${lmode}mode, ${np}))\"\n";
+ if ($extend eq "none") {
+ print " [(set (match_dup 0) (match_dup 1))\n";
+ } else {
+ $resultmode = $result;
+ if ( $result eq 'clobber' ) { $resultmode = $clobbermode }
+ print " [(set (match_dup 0) (${extend}_extend:${resultmode} (match_dup 1)))\n";
+ }
+ print " (set (match_dup 2)\n";
+ print " (compare:${ccmode} (match_dup 0)\n";
+ print " (match_dup 3)))]\n";
+ print " \"\"\n";
+ print " [(set_attr \"type\" \"load\")\n";
+ print " (set_attr \"cost\" \"8\")\n";
+ print " (set_attr \"length\" \"8\")])\n";
+ print "\n";
+ }
+ }
+ }
+}
+
+
+gen_ld_cmpi_p10();
+
+exit(0);
+
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 9ad5ae67302..78de8102f44 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -297,6 +297,11 @@
(and (match_code "const_int")
(match_test "IN_RANGE (INTVAL (op), 0, 1)")))
+;; Match op = -1, op = 0, or op = 1.
+(define_predicate "const_m1_to_1_operand"
+ (and (match_code "const_int")
+ (match_test "IN_RANGE (INTVAL (op), -1, 1)")))
+
;; Match op = 0..3.
(define_predicate "const_0_to_3_operand"
(and (match_code "const_int")
@@ -847,6 +852,15 @@
|| GET_CODE (XEXP (op, 0)) == PRE_DEC
|| GET_CODE (XEXP (op, 0)) == PRE_MODIFY))"))
+;; Anything that matches memory_operand but does not update the address.
+(define_predicate "non_update_memory_operand"
+ (match_code "mem")
+{
+ if (update_address_mem (op, mode))
+ return 0;
+ return memory_operand (op, mode);
+})
+
;; Return 1 if the operand is a MEM with an indexed-form address.
(define_special_predicate "indexed_address_mem"
(match_test "(MEM_P (op)
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index 8d2c1ffd6cf..3e65289d8df 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -82,7 +82,9 @@
#define ISA_3_1_MASKS_SERVER (ISA_3_0_MASKS_SERVER \
| OPTION_MASK_POWER10 \
- | OTHER_POWER10_MASKS)
+ | OTHER_POWER10_MASKS \
+ | OPTION_MASK_P10_FUSION \
+ | OPTION_MASK_P10_FUSION_LD_CMPI)
/* Flags that need to be turned off if -mno-power9-vector. */
#define OTHER_P9_VECTOR_MASKS (OPTION_MASK_FLOAT128_HW \
@@ -129,6 +131,8 @@
| OPTION_MASK_FLOAT128_KEYWORD \
| OPTION_MASK_FPRND \
| OPTION_MASK_POWER10 \
+ | OPTION_MASK_P10_FUSION \
+ | OPTION_MASK_P10_FUSION_LD_CMPI \
| OPTION_MASK_HTM \
| OPTION_MASK_ISEL \
| OPTION_MASK_MFCRF \
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 3c4682b0e26..cd644083558 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -191,6 +191,8 @@ enum non_prefixed_form {
extern enum insn_form address_to_insn_form (rtx, machine_mode,
enum non_prefixed_form);
+extern bool address_is_non_pfx_d_or_x (rtx addr, machine_mode mode,
+ enum non_prefixed_form non_prefixed_format);
extern bool prefixed_load_p (rtx_insn *);
extern bool prefixed_store_p (rtx_insn *);
extern bool prefixed_paddi_p (rtx_insn *);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 517467ebc63..759551d07ec 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4423,6 +4423,12 @@ rs6000_option_override_internal (bool global_init_p)
if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_MMA) == 0)
rs6000_isa_flags |= OPTION_MASK_MMA;
+ if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION) == 0)
+ rs6000_isa_flags |= OPTION_MASK_P10_FUSION;
+
+ if (TARGET_POWER10 && (rs6000_isa_flags_explicit & OPTION_MASK_P10_FUSION_LD_CMPI) == 0)
+ rs6000_isa_flags |= OPTION_MASK_P10_FUSION_LD_CMPI;
+
/* Turn off vector pair/mma options on non-power10 systems. */
else if (!TARGET_POWER10 && TARGET_MMA)
{
@@ -23614,6 +23620,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
{ "power9-minmax", OPTION_MASK_P9_MINMAX, false, true },
{ "power9-misc", OPTION_MASK_P9_MISC, false, true },
{ "power9-vector", OPTION_MASK_P9_VECTOR, false, true },
+ { "power10-fusion", OPTION_MASK_P10_FUSION, false, true },
{ "powerpc-gfxopt", OPTION_MASK_PPC_GFXOPT, false, true },
{ "powerpc-gpopt", OPTION_MASK_PPC_GPOPT, false, true },
{ "prefixed", OPTION_MASK_PREFIXED, false, true },
@@ -25705,6 +25712,50 @@ address_to_insn_form (rtx addr,
return INSN_FORM_BAD;
}
+/* Given address rtx ADDR for a load of MODE, is this legitimate for a
+ non-prefixed D-form or X-form instruction? NON_PREFIXED_FORMAT is
+ given NON_PREFIXED_D or NON_PREFIXED_DS to indicate whether we want
+ a D-form or DS-form instruction. X-form and base_reg are always
+ allowed. */
+bool
+address_is_non_pfx_d_or_x (rtx addr, machine_mode mode,
+ enum non_prefixed_form non_prefixed_format)
+{
+ enum insn_form result_form;
+
+ result_form = address_to_insn_form (addr, mode, non_prefixed_format);
+
+ switch (non_prefixed_format)
+ {
+ case NON_PREFIXED_D:
+ switch (result_form)
+ {
+ case INSN_FORM_X:
+ case INSN_FORM_D:
+ case INSN_FORM_DS:
+ case INSN_FORM_BASE_REG:
+ return true;
+ default:
+ break;
+ }
+ break;
+ case NON_PREFIXED_DS:
+ switch (result_form)
+ {
+ case INSN_FORM_X:
+ case INSN_FORM_DS:
+ case INSN_FORM_BASE_REG:
+ return true;
+ default:
+ break;
+ }
+ break;
+ default:
+ break;
+ }
+ return false;
+}
+
/* Helper function to see if we're potentially looking at lfs/stfs.
- PARALLEL containing a SET and a CLOBBER
- stfs:
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 5bf9c83fc1e..307c0b200bd 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -539,6 +539,7 @@ extern int rs6000_vector_align[];
#define MASK_UPDATE OPTION_MASK_UPDATE
#define MASK_VSX OPTION_MASK_VSX
#define MASK_POWER10 OPTION_MASK_POWER10
+#define MASK_P10_FUSION OPTION_MASK_P10_FUSION
#ifndef IN_LIBGCC2
#define MASK_POWERPC64 OPTION_MASK_POWERPC64
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b89990f46bf..c39b7098978 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -14926,3 +14926,4 @@
(include "dfp.md")
(include "crypto.md")
(include "htm.md")
+(include "fusion.md")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 2888172cb27..008a318b98d 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -479,6 +479,14 @@ mpower8-vector
Target Report Mask(P8_VECTOR) Var(rs6000_isa_flags)
Use vector and scalar instructions added in ISA 2.07.
+mpower10-fusion
+Target Report Mask(P10_FUSION) Var(rs6000_isa_flags)
+Fuse certain integer operations together for better performance on power10.
+
+mpower10-fusion-ld-cmpi
+Target Undocumented Mask(P10_FUSION_LD_CMPI) Var(rs6000_isa_flags)
+Fuse load and compare-immediate instruction pairs for better performance on power10.
+
mcrypto
Target Report Mask(CRYPTO) Var(rs6000_isa_flags)
Use ISA 2.07 Category:Vector.AES and Category:Vector.SHA2 instructions.
diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index 1ddb5729cb2..bcc71a9e21b 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -47,6 +47,9 @@ rs6000-call.o: $(srcdir)/config/rs6000/rs6000-call.c
$(COMPILE) $<
$(POSTCOMPILE)
+$(srcdir)/config/rs6000/fusion.md: $(srcdir)/config/rs6000/genfusion.pl
+ $(srcdir)/config/rs6000/genfusion.pl > $(srcdir)/config/rs6000/fusion.md
+
$(srcdir)/config/rs6000/rs6000-tables.opt: $(srcdir)/config/rs6000/genopt.sh \
$(srcdir)/config/rs6000/rs6000-cpus.def
$(SHELL) $(srcdir)/config/rs6000/genopt.sh $(srcdir)/config/rs6000 > \
@@ -86,4 +89,5 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \
$(srcdir)/config/rs6000/mma.md \
$(srcdir)/config/rs6000/crypto.md \
$(srcdir)/config/rs6000/htm.md \
- $(srcdir)/config/rs6000/dfp.md
+ $(srcdir)/config/rs6000/dfp.md \
+ $(srcdir)/config/rs6000/fusion.md