public inbox for gcc-cvs@sourceware.org
help / color / mirror / Atom feed
* [gcc(refs/users/meissner/heads/work122)] Fix power10 fusion and -fstack-protector, PR target/105325
@ 2023-06-08 20:23 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2023-06-08 20:23 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:0abc9422389b07deedc88e7e676b927e524d0efe

commit 0abc9422389b07deedc88e7e676b927e524d0efe
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu Jun 8 16:23:19 2023 -0400

    Fix power10 fusion and -fstack-protector, PR target/105325
    
    This patch fixes an issue where if you use the -fstack-protector and
    -mcpu=power10 options and you have a large stack frame, the GCC compiler will
    generate a LWA instruction with a large offset.
    
    There are several problems with the current GCC:
    
        1)  The prefixed attribute was not checking insns with the type
            fused_load_cmpi for being load insns.  This meant if the load + compare
            fusion was not split, it would not know that the load should be
            prefixed.
    
        2)  The recognition of LWA for being prefixed looks at the "sign_extend"
            attribute and whether the register mode was different than the memory
            load (i.e. does it have a sign_extend wrapper around the load).
    
        3)  The constraints in fusion.md (generated by genfusion.pl) use "m" for LWA
            and LD, when they should use "YZ".
    
        4)  There is a lwa_operand predicate that should be used instead of
            ds_form_mem_operand for the LWA instruction.
    
    The main fix is to modify genfusion.pl that it sets the appropriate predicates
    and constraints.
    
    I also added support in genfusion.md so that if we are doing a LWA operation and
    just setting the CC bits (throwing away the result of the load after the
    comparison), it generates a LWZ instruction and does a CMPWI instead of CMPDI.
    This way those loads can use normal D-FORM restrictions instead of DS-form.
    
    I set the "sign_extend" attribute on the cases that generate LWA.
    
    I modified the "prefixed" attribute so that it also checks fused_load_cmpi.
    
    2023-06-07   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix constraints and
            predicates for LD and LWA.  Optimize LWA/CMPDI to generate LWZ/CMPWI if
            we don't need the result of the load after the comparison.  Set the
            sign_extend attribute for LWA instruction.  For LWA where we do not need
            the result of the load after the comparison, use LWZ and CMPWI.
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/rs6000.md (prefixed attribute): Treat fused_load_cmpi
            insns as being load insns.
    
    gcc/testsuite/
    
            * g++.target/powerpc/pr105325.C: New test.
            * gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust names and insn counts
            for PR target/105325 fix.

Diff:
---
 gcc/config/rs6000/fusion.md                        | 22 ++++++++-------
 gcc/config/rs6000/genfusion.pl                     | 33 ++++++++++++++++------
 gcc/config/rs6000/rs6000.md                        |  2 +-
 gcc/testsuite/g++.target/powerpc/pr105325.C        | 26 +++++++++++++++++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c         | 15 ++++++----
 5 files changed, 73 insertions(+), 25 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..5b82c61e959 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -104,17 +104,17 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is clobber compare mode is CC extend is none
-(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
+(define_insn_and_split "*lwz_cmpwi_cr0_SI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:SI 0 "=r"))]
   "(TARGET_P10_FUSION)"
-  "lwa%X1 %0,%1\;cmpdi %2,%0,%3"
+  "lwz%X1 %0,%1\;cmpwi %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
+                                      SImode, NON_PREFIXED_D))"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
@@ -148,7 +148,7 @@
 ;; load mode is SI result mode is SI compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -163,6 +163,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -190,7 +191,7 @@
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -205,6 +206,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 82e8f863b02..310ac5f359a 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -57,14 +57,26 @@ sub gen_ld_cmpi_p10_one
 {
   my ($lmode, $result, $ccmode) = @_;
 
+  my $cmp_size = "d";
+  my $echr = ($ccmode eq "CC") ? "a" : "z";
+  if ($lmode eq "DI") { $echr = ""; }
   my $np = "NON_PREFIXED_D";
   my $mempred = "non_update_memory_operand";
   my $extend;
 
   if ($ccmode eq "CC") {
-    # ld and lwa are both DS-FORM.
-    ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
-    ($lmode =~ /^[SD]I$/) and $mempred = "ds_form_mem_operand";
+    # if we would generate lwa and just want the CC value, generate lwz instead
+    # and use cmpwi.  This allows us to avoid the ds-form restrictions on lwa.
+    if ($lmode eq "SI" && $result eq "clobber") {
+      $echr = "z";
+      $cmp_size = "w";
+
+    } else {
+      # ld and lwa are both DS-FORM, use LWA_OPERAND for LWA
+      ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
+      ($lmode eq "DI") and $mempred = "ds_form_mem_operand";
+      ($lmode eq "SI") and $mempred = "lwa_operand";
+    }
   } else {
     if ($lmode eq "DI") {
       # ld is DS-form, but lwz is not.
@@ -74,8 +86,6 @@ sub gen_ld_cmpi_p10_one
   }
 
   my $cmpl = ($ccmode eq "CC") ? "" : "l";
-  my $echr = ($ccmode eq "CC") ? "a" : "z";
-  if ($lmode eq "DI") { $echr = ""; }
   my $constpred = ($ccmode eq "CC") ? "const_m1_to_1_operand"
   				    : "const_0_to_1_operand";
 
@@ -91,12 +101,13 @@ sub gen_ld_cmpi_p10_one
   }
 
   my $ldst = mode_to_ldst_char($lmode);
+  my $constraint = ($np eq "NON_PREFIXED_DS") ? "YZ" : "m";
   print <<HERE;
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend
-(define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}"
+(define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}${cmp_size}i_cr0_${lmode}_${result}_${ccmode}_${extend}"
   [(set (match_operand:${ccmode} 2 "cc_reg_operand" "=x")
-        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "m")
+        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "${constraint}")
 HERE
   print "   " if $ccmode eq "CCUNS";
 print <<HERE;
@@ -119,7 +130,7 @@ HERE
 
   print <<HERE;
   "(TARGET_P10_FUSION)"
-  "l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}di %2,%0,%3"
+  "l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}${cmp_size}i %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
@@ -140,6 +151,12 @@ HERE
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+HERE
+  # prefixed_load_p looks at sign_extend to deal with lwa.
+  if ($mempred eq "lwa_operand") {
+    print "   (set_attr \"sign_extend\" \"yes\")\n";
+  }
+  print <<HERE;
    (set_attr "length" "8")])
 
 HERE
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..ed2f06e56ac 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -302,7 +302,7 @@
 	      (eq_attr "maybe_prefixed" "no"))
 	 (const_string "no")
 
-	 (eq_attr "type" "load,fpload,vecload")
+	 (eq_attr "type" "load,fpload,vecload,fused_load_cmpi")
 	 (if_then_else (match_test "prefixed_load_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..d0e66a0b897
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,26 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  The -fstack-protector option is needed to show the
+   bug.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
index 526a026d874..29762ba3a47 100644
--- a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
+++ b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
@@ -60,19 +60,22 @@ TEST(int8_t)
 /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_clobber_CCUNS_none"    1 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"      16 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_SI_CC_none"            8 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_SI_CCUNS_none"        2 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpwi_cr0_SI_clobber_CC_none"       8 { target lp64 } } } */
 
 /* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_GPR_CCUNS_zero"       2 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_DI_CC_none"             0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_clobber_CC_none"        0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_DI_CCUNS_none"         0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_clobber_CCUNS_none"    0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"       8 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_EXTHI_CC_sign"         4 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       9 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_EXTHI_CCUNS_zero"     2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_SI_CC_none"           36 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpwi_cr0_SI_clobber_CC_none"       8 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   6 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_SI_CCUNS_none"        2 { target ilp32 } } } */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [gcc(refs/users/meissner/heads/work122)] Fix power10 fusion and -fstack-protector, PR target/105325
@ 2023-06-13 23:24 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2023-06-13 23:24 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:4070564c15613cbb757ebb9b2debd3508458aa0c

commit 4070564c15613cbb757ebb9b2debd3508458aa0c
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Jun 13 19:24:28 2023 -0400

    Fix power10 fusion and -fstack-protector, PR target/105325
    
    This patch fixes an issue where if you use the -fstack-protector and
    -mcpu=power10 options and you have a large stack frame, the GCC compiler will
    generate a LWA instruction with a large offset.
    
    The important thing in the bug is that -fstack-protector is used, but it could
    potentially happen with fused load-compare to any stack location when the stack
    frame is larger than 32K without -fstack-protector.
    
    What happens is the initial insn that is created is:
    
    (insn 6 5 7 2 (parallel [
                (set (reg:CC 119)
                     (compare:CC (mem/c:SI (plus:DI (reg/f:DI 110 sfp)
                                                    (const_int -4))
                                 (const_int 0 [0])))
                (clobber (scratch:DI))
            ])
         (nil))
    
    After the stack size is finalized, the frame pointer removed, and the post
    reload phase is run, the insn is now:
    
    (insn 6 5 7 2 (parallel [
                (set (reg:CC 100 0 [119])
                     (compare:CC (mem/c:SI (plus:DI (reg/f:DI 1 1)
                                                    (const_int 40044))
                                 (const_int 0 [0])))
                (clobber (reg:DI 9 9 [120]))
            ])
         (nil))
    
    When the split2 pass is run after reload has finished the ds_form_mem_operand
    predicate that used for lwa and ld no longer returns true.  This means that
    since the operand predicates aren't recognized, it won't be split.
    
    The solution involves:
    
        1)  Don't use ds_form_mem_operand for ld and lwa, always use
            non_update_memory_operand.
    
        2)  Delete ds_form_mem_operand since it is no longer used.
    
        3)  Use the "YZ" constraints for ld/lwa instead of "m".
    
        4)  If we don't need to sign extend the lwa, convert it to lwz, and use
            cmpwi instead of cmpdi.  Adjust the insn name to reflect the code
            generate.
    
        5)  Insure that the insn using lwa will be recognized as having a prefixed
            operand (and hence the instruction length is 16 bytes instead of 8
            bytes).
    
            5a) Set the prefixed and maybe_prefix attributes to know that
                fused_load_cmpi are also load insns;
    
            5b) In the case where we are just setting CC and not using the memory
                afterward, set the clobber to use a DI register, and put an
                explicit sign_extend operation in the split;
    
            5c) Set the sign_extend attribute to "yes".
    
            5d) 5a-5c are the things that prefixed_load_p in rs6000.cc checks to
                ensure that lwa is treated as a ds-form instruction and not as
                a d-form instruction (i.e. lwz).
    
        6)  Add a new test case for this case.
    
        7)  Adjust the insn counts in fusion-p10-ldcmpi.c.  Because we are no
            longer using ds_form_mem_operand, the ld and lwa instructions will fuse
            x-form (reg+reg) addresses in addition ds-form (reg+offset or reg).
    
    2023-06-12   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems that
            allowed prefixed lwa to be generated.
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/predicates.md (ds_form_mem_operand): Delete.
            * config/rs6000/rs6000.md (prefixed attribute): Add support for load
            plus compare immediate fused insns.
            (maybe_prefixed): Likewise.
    
    gcc/testsuite/
    
            * g++.target/powerpc/pr105325.C: New test.
            * gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c: Update insn
            counts.

Diff:
---
 gcc/config/rs6000/fusion.md                        | 27 ++++++++--------
 gcc/config/rs6000/genfusion.pl                     | 36 +++++++++++++++++-----
 gcc/config/rs6000/predicates.md                    | 14 ---------
 gcc/config/rs6000/rs6000.md                        |  4 +--
 gcc/testsuite/g++.target/powerpc/pr105325.C        | 26 ++++++++++++++++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c         | 16 ++++++----
 6 files changed, 81 insertions(+), 42 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..e286bf526a2 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "non_update_memory_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "non_update_memory_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -104,17 +104,17 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is clobber compare mode is CC extend is none
-(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
+(define_insn_and_split "*lwz_cmpwi_cr0_SI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:SI 0 "=r"))]
   "(TARGET_P10_FUSION)"
-  "lwa%X1 %0,%1\;cmpdi %2,%0,%3"
+  "lwz%X1 %0,%1\;cmpwi %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
+                                      SImode, NON_PREFIXED_D))"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
@@ -146,17 +146,17 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is SI compare mode is CC extend is none
-(define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
+(define_insn_and_split "*lwz_cmpwi_cr0_SI_SI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
-  "lwa%X1 %0,%1\;cmpdi %2,%0,%3"
+  "lwz%X1 %0,%1\;cmpwi %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
+                                      SImode, NON_PREFIXED_D))"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
@@ -190,7 +190,7 @@
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -205,6 +205,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 82e8f863b02..c8b232847a8 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -61,20 +61,30 @@ sub gen_ld_cmpi_p10_one
   my $mempred = "non_update_memory_operand";
   my $extend;
 
+  # We need to special case lwa.  The prefixed_load_p function in rs6000.cc
+  # (which determines if a load instruction is prefixed) uses the fact that the
+  # register mode is different from the memory mode, and that the sign_extend
+  # attribute is set to use DS-form rules for the address instead of D-form.
+  # If the register size is the same, prefixed_load_p assumes we are doing a
+  # lwz.  We change to use an lwz and word compare if we don't need to sign
+  # extend the SImode value.  Otherwise if we need the value, we need to
+  # make sure the insn is marked as ds-form.
+  my $lwa_insn = ($lmode eq "SI" && $ccmode eq "CC");
+  my $cmp_size = ($lwa_insn && $result !~ /^EXT|^DI$/) ? "w" : "d";
+
   if ($ccmode eq "CC") {
     # ld and lwa are both DS-FORM.
-    ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
-    ($lmode =~ /^[SD]I$/) and $mempred = "ds_form_mem_operand";
+    ($lmode eq "DI") and $np = "NON_PREFIXED_DS";
+    ($lmode eq "SI" && $cmp_size eq "d") and $np = "NON_PREFIXED_DS";
   } else {
     if ($lmode eq "DI") {
       # ld is DS-form, but lwz is not.
       $np = "NON_PREFIXED_DS";
-      $mempred = "ds_form_mem_operand";
     }
   }
 
   my $cmpl = ($ccmode eq "CC") ? "" : "l";
-  my $echr = ($ccmode eq "CC") ? "a" : "z";
+  my $echr = ($ccmode eq "CC" && $cmp_size eq "d") ? "a" : "z";
   if ($lmode eq "DI") { $echr = ""; }
   my $constpred = ($ccmode eq "CC") ? "const_m1_to_1_operand"
   				    : "const_0_to_1_operand";
@@ -91,12 +101,15 @@ sub gen_ld_cmpi_p10_one
   }
 
   my $ldst = mode_to_ldst_char($lmode);
+
+  # DS-form addresses need YZ, and not m.
+  my $constraint = ($np eq "NON_PREFIXED_DS") ? "YZ" : "m";
   print <<HERE;
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend
-(define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}"
+(define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}${cmp_size}i_cr0_${lmode}_${result}_${ccmode}_${extend}"
   [(set (match_operand:${ccmode} 2 "cc_reg_operand" "=x")
-        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "m")
+        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "${constraint}")
 HERE
   print "   " if $ccmode eq "CCUNS";
 print <<HERE;
@@ -119,7 +132,7 @@ HERE
 
   print <<HERE;
   "(TARGET_P10_FUSION)"
-  "l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}di %2,%0,%3"
+  "l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}${cmp_size}i %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
@@ -140,6 +153,15 @@ HERE
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+HERE
+
+  if ($lwa_insn && $cmp_size eq "d") {
+    # prefixed_load_p needs the sign_extend attribute to validate lwa as a
+    # DS-form instruction instead of D-form.
+    print "   (set_attr \"sign_extend\" \"yes\")\n";
+  }
+
+  print <<HERE
    (set_attr "length" "8")])
 
 HERE
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index a16ee30f0c0..6b564837c6e 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1125,20 +1125,6 @@
   return INTVAL (offset) % 4 == 0;
 })
 
-;; Return 1 if the operand is a memory operand that has a valid address for
-;; a DS-form instruction. I.e. the address has to be either just a register,
-;; or register + const where the two low order bits of const are zero.
-(define_predicate "ds_form_mem_operand"
-  (match_code "subreg,mem")
-{
-  if (!any_memory_operand (op, mode))
-    return false;
-
-  rtx addr = XEXP (op, 0);
-
-  return address_to_insn_form (addr, mode, NON_PREFIXED_DS) == INSN_FORM_DS;
-})
-
 ;; Return 1 if the operand, used inside a MEM, is a SYMBOL_REF.
 (define_predicate "symbol_ref_operand"
   (and (match_code "symbol_ref")
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..75c5e5fc93d 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -287,7 +287,7 @@
 ;; Whether this insn has a prefixed form and a non-prefixed form.
 (define_attr "maybe_prefixed" "no,yes"
   (if_then_else (eq_attr "type" "load,fpload,vecload,store,fpstore,vecstore,
-  				 integer,add")
+  				 integer,add,fused_load_cmpi")
 		(const_string "yes")
 		(const_string "no")))
 
@@ -302,7 +302,7 @@
 	      (eq_attr "maybe_prefixed" "no"))
 	 (const_string "no")
 
-	 (eq_attr "type" "load,fpload,vecload")
+	 (eq_attr "type" "load,fpload,vecload,fused_load_cmpi")
 	 (if_then_else (match_test "prefixed_load_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..d0e66a0b897
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,26 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  The -fstack-protector option is needed to show the
+   bug.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
index 526a026d874..165bd9a07ad 100644
--- a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
+++ b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
@@ -54,15 +54,17 @@ TEST(uint8_t)
 TEST(int8_t)
 
 /* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"   4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_DI_CC_none"             4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_clobber_CC_none"        4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_DI_CCUNS_none"         1 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_clobber_CCUNS_none"    1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_DI_CC_none"            24 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_clobber_CC_none"        8 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_DI_CCUNS_none"         2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_clobber_CCUNS_none"    2 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"      16 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   4 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpwi_cr0_SI_clobber_CC_none"       8 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpwi_cr0_SI_SI_CC_none"            8 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_SI_CCUNS_none"        2 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   2 { target lp64 } } } */
 
 /* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
@@ -73,6 +75,8 @@ TEST(int8_t)
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"       8 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       9 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpwi_cr0_SI_SI_CC_none"           36 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpwi_cr0_SI_clobber_CC_none"      16 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   6 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_SI_CCUNS_none"        2 { target ilp32 } } } */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [gcc(refs/users/meissner/heads/work122)] Fix power10 fusion and -fstack-protector, PR target/105325
@ 2023-06-13 17:21 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2023-06-13 17:21 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:816ce136d5776e4d82ac3beca98989ce58e77f1a

commit 816ce136d5776e4d82ac3beca98989ce58e77f1a
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Jun 13 13:21:15 2023 -0400

    Fix power10 fusion and -fstack-protector, PR target/105325
    
    This patch fixes an issue where if you use the -fstack-protector and
    -mcpu=power10 options and you have a large stack frame, the GCC compiler will
    generate a LWA instruction with a large offset.
    
    The important thing in the bug is that -fstack-protector is used, but it could
    potentially happen with fused load-compare to any stack location when the stack
    frame is larger than 32K without -fstack-protector.
    
    What happens is the initial insn that is created is:
    
    (insn 6 5 7 2 (parallel [
                (set (reg:CC 119)
                     (compare:CC (mem/c:SI (plus:DI (reg/f:DI 110 sfp)
                                                    (const_int -4))
                                 (const_int 0 [0])))
                (clobber (scratch:DI))
            ])
         (nil))
    
    After the stack size is finalized, the frame pointer removed, and the post
    reload phase is run, the insn is now:
    
    (insn 6 5 7 2 (parallel [
                (set (reg:CC 100 0 [119])
                     (compare:CC (mem/c:SI (plus:DI (reg/f:DI 1 1)
                                                    (const_int 40044))
                                 (const_int 0 [0])))
                (clobber (reg:DI 9 9 [120]))
            ])
         (nil))
    
    When the split2 pass is run after reload has finished the ds_form_mem_operand
    predicate that used for lwa and ld no longer returns true.  This means that
    since the operand predicates aren't recognized, it won't be split.
    
    The solution involves:
    
        1)  Don't use ds_form_mem_operand for ld and lwa, always use
            non_update_memory_operand.
    
        2)  Delete ds_form_mem_operand since it is no longer use.
    
        3)  Use the "YZ" constraints for ld/lwa instead of "m".
    
        4)  Insure that the insn will be recognized as having a prefixed operand
            (and hence the instruction length is 16 bytes instead of 8 bytes).
    
            4a) Set the prefixed and maybe_prefix attributes to know that
                fused_load_cmpi are also load insns;
    
            4b) In the case where we are just setting CC and not using the memory
                afterward, set the clobber to use a DI register, and put an
                explicit sign_extend operation in the split;
    
            4c) Set the sign_extend attribute to "yes".
    
            4d) 4a-4c are the things that prefixed_load_p in rs6000.cc checks to
                ensure that lwa is treated as a ds-form instruction and not as
                a d-form instruction (i.e. lwz).
    
        5)  Add a new test case for this case.
    
        6)  Adjust the insn counts in fusion-p10-ldcmpi.c.  Because we are no
            longer using ds_form_mem_operand, the ld and lwa instructions will fuse
            x-form (reg+reg) addresses in addition ds-form (reg+offset or reg).
    
    2023-06-12   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems that
            allowed prefixed lwa to be generated.
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/predicates.md (ds_form_mem_operand): Delete.
            * config/rs6000/rs6000.md (prefixed attribute): Add support for load
            plus compare immediate fused insns.
            (maybe_prefixed): Likewise.
    
    gcc/testsuite/
    
            * g++.target/powerpc/pr105325.C: New test.
            * gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c: Update insn
            counts.

Diff:
---
 gcc/config/rs6000/fusion.md                        | 23 ++++++++------
 gcc/config/rs6000/genfusion.pl                     | 37 +++++++++++++++++++---
 gcc/config/rs6000/predicates.md                    | 14 --------
 gcc/config/rs6000/rs6000.md                        |  4 +--
 gcc/testsuite/g++.target/powerpc/pr105325.C        | 26 +++++++++++++++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c         | 14 ++++----
 6 files changed, 81 insertions(+), 37 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..9eefae22a1a 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "non_update_memory_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "non_update_memory_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -106,21 +106,22 @@
 ;; load mode is SI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
-   (clobber (match_scratch:SI 0 "=r"))]
+   (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
   "lwa%X1 %0,%1\;cmpdi %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
                                       SImode, NON_PREFIXED_DS))"
-  [(set (match_dup 0) (match_dup 1))
+  [(set (match_dup 0) (sign_extend:DI (match_dup 1)))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -148,7 +149,7 @@
 ;; load mode is SI result mode is SI compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -157,12 +158,13 @@
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
                                       SImode, NON_PREFIXED_DS))"
-  [(set (match_dup 0) (match_dup 1))
+  [(set (match_dup 0) (sign_extend:DI (match_dup 1)))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -190,7 +192,7 @@
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -205,6 +207,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 82e8f863b02..31ee54aea93 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -61,15 +61,23 @@ sub gen_ld_cmpi_p10_one
   my $mempred = "non_update_memory_operand";
   my $extend;
 
+  # We need to special case lwa.  The prefixed_load_p function in rs6000.cc
+  # (which determines if a load instruction is prefixed) uses the fact that the
+  # register mode is different from the memory mode, and that the sign_extend
+  # attribute is set to use DS-form rules for the address instead of D-form.
+  # If the register size is the same, prefixed_load_p assumes we are doing a
+  # lwz.
+  my $lwa_insn = ($lmode eq "SI" && $ccmode eq "CC");
+
   if ($ccmode eq "CC") {
     # ld and lwa are both DS-FORM.
     ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
-    ($lmode =~ /^[SD]I$/) and $mempred = "ds_form_mem_operand";
+#   ($lmode =~ /^[SD]I$/) and $mempred = "ds_form_mem_operand";
   } else {
     if ($lmode eq "DI") {
       # ld is DS-form, but lwz is not.
       $np = "NON_PREFIXED_DS";
-      $mempred = "ds_form_mem_operand";
+ #    $mempred = "ds_form_mem_operand";
     }
   }
 
@@ -81,7 +89,9 @@ sub gen_ld_cmpi_p10_one
 
   # For clobber, we need a SI/DI reg in case we
   # split because we have to sign/zero extend.
-  my $clobbermode = ($lmode =~ /^[QH]I$/) ? "GPR" : $lmode;
+  my $clobbermode = (($lmode =~ /^[QH]I$/)
+		     ? "GPR"
+		     : ($lwa_insn ? "DI" : $lmode));
   if ($result =~ /^EXT/ || $result eq "GPR" || $clobbermode eq "GPR") {
     # We always need extension if result > lmode.
     $extend = ($ccmode eq "CC") ? "sign" : "zero";
@@ -91,12 +101,15 @@ sub gen_ld_cmpi_p10_one
   }
 
   my $ldst = mode_to_ldst_char($lmode);
+
+  # DS-form addresses need YZ, and not m.
+  my $constraint = ($np eq "NON_PREFIXED_DS") ? "YZ" : "m";
   print <<HERE;
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend
 (define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}"
   [(set (match_operand:${ccmode} 2 "cc_reg_operand" "=x")
-        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "m")
+        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "${constraint}")
 HERE
   print "   " if $ccmode eq "CCUNS";
 print <<HERE;
@@ -126,7 +139,12 @@ HERE
                                       ${lmode}mode, ${np}))"
 HERE
 
-  if ($extend eq "none") {
+  # prefixed_load_p needs to see the register mode being different than the
+  # memory insn in order to validate lwa as a DS-form instruction and not a
+  # D-form instruction.
+  if ($lwa_insn && $extend eq "none") {
+    print "  [(set (match_dup 0) (sign_extend:${clobbermode} (match_dup 1)))\n";
+  } elsif ($extend eq "none") {
     print "  [(set (match_dup 0) (match_dup 1))\n";
   } elsif ($result eq "clobber") {
     print "  [(set (match_dup 0) (${extend}_extend:${clobbermode} (match_dup 1)))\n";
@@ -140,6 +158,15 @@ HERE
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+HERE
+
+  if ($lwa_insn) {
+    # prefixed_load_p needs the sign_extend attribute to validate lwa as a
+    # DS-form instruction instead of D-form.
+    print "   (set_attr \"sign_extend\" \"yes\")\n";
+  }
+
+  print <<HERE
    (set_attr "length" "8")])
 
 HERE
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index a16ee30f0c0..6b564837c6e 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1125,20 +1125,6 @@
   return INTVAL (offset) % 4 == 0;
 })
 
-;; Return 1 if the operand is a memory operand that has a valid address for
-;; a DS-form instruction. I.e. the address has to be either just a register,
-;; or register + const where the two low order bits of const are zero.
-(define_predicate "ds_form_mem_operand"
-  (match_code "subreg,mem")
-{
-  if (!any_memory_operand (op, mode))
-    return false;
-
-  rtx addr = XEXP (op, 0);
-
-  return address_to_insn_form (addr, mode, NON_PREFIXED_DS) == INSN_FORM_DS;
-})
-
 ;; Return 1 if the operand, used inside a MEM, is a SYMBOL_REF.
 (define_predicate "symbol_ref_operand"
   (and (match_code "symbol_ref")
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..75c5e5fc93d 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -287,7 +287,7 @@
 ;; Whether this insn has a prefixed form and a non-prefixed form.
 (define_attr "maybe_prefixed" "no,yes"
   (if_then_else (eq_attr "type" "load,fpload,vecload,store,fpstore,vecstore,
-  				 integer,add")
+  				 integer,add,fused_load_cmpi")
 		(const_string "yes")
 		(const_string "no")))
 
@@ -302,7 +302,7 @@
 	      (eq_attr "maybe_prefixed" "no"))
 	 (const_string "no")
 
-	 (eq_attr "type" "load,fpload,vecload")
+	 (eq_attr "type" "load,fpload,vecload,fused_load_cmpi")
 	 (if_then_else (match_test "prefixed_load_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..d0e66a0b897
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,26 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  The -fstack-protector option is needed to show the
+   bug.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
index 526a026d874..3efbb34f2b4 100644
--- a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
+++ b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
@@ -54,14 +54,14 @@ TEST(uint8_t)
 TEST(int8_t)
 
 /* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"   4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_DI_CC_none"             4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_clobber_CC_none"        4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_DI_CCUNS_none"         1 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_clobber_CCUNS_none"    1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_DI_CC_none"            24 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_clobber_CC_none"        8 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_DI_CCUNS_none"         2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_clobber_CCUNS_none"    2 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"      16 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   4 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       8 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   2 { target lp64 } } } */
 
@@ -73,6 +73,8 @@ TEST(int8_t)
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"       8 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       9 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_SI_CC_none"           36 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"      16 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   6 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_SI_CCUNS_none"        0 { target ilp32 } } } */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [gcc(refs/users/meissner/heads/work122)] Fix power10 fusion and -fstack-protector, PR target/105325
@ 2023-06-13  2:57 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2023-06-13  2:57 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:1e79bbf02e6321874f28e84fdcaa8a2cf94c514d

commit 1e79bbf02e6321874f28e84fdcaa8a2cf94c514d
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Mon Jun 12 22:56:55 2023 -0400

    Fix power10 fusion and -fstack-protector, PR target/105325
    
    This patch fixes an issue where if you use the -fstack-protector and
    -mcpu=power10 options and you have a large stack frame, the GCC compiler will
    generate a LWA instruction with a large offset.
    
    There are several problems with the current GCC:
    
        1)  The constraints in fusion.md (generated by genfusion.pl) use "m" for LWA
            and LD, when they should use "YZ".
    
        2)  The rules for automatically setting the prefixed attribute were not
            checking that these fused load and compare immediate fusion operations
            might have prefixed addresses.
    
    The fix is to modify genfusion.pl that it sets the "YZ" constraint instead of
    "m" for the ld and lwa instructions.
    
    This patch also modifies the prefixed and maybe_prefixed attributes so that they
    check load + compare immediate instructions for be a load instruction.  The
    patch also modifies genfusion.pl so that the lwa_cmp* insns also sets things up
    so that the prefixed_load_p function declares the address to be prefixed.  These
    modifications include using a DImode scratch register instead of SImode, and
    setting the "sign_extend" attribute.
    
    I also added a test case for this condition.
    
    2023-06-12   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems that
            allowed prefixed lwa to be generated.
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/rs6000.md (prefixed attribute): Add support for load
            plus compare immediate fused insns.
            (maybe_prefixed): Likewise.
    
    gcc/testsuite/
    
            * g++.target/powerpc/pr105325.C: New test.

Diff:
---
 gcc/config/rs6000/fusion.md                 | 21 ++++++++++++---------
 gcc/config/rs6000/genfusion.pl              | 29 +++++++++++++++++++++++++++--
 gcc/config/rs6000/rs6000.md                 |  4 ++--
 gcc/testsuite/g++.target/powerpc/pr105325.C | 26 ++++++++++++++++++++++++++
 4 files changed, 67 insertions(+), 13 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..fdf710bdfc7 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -106,7 +106,7 @@
 ;; load mode is SI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:SI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -115,12 +115,13 @@
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
                                       SImode, NON_PREFIXED_DS))"
-  [(set (match_dup 0) (match_dup 1))
+  [(set (match_dup 0) (sign_extend:SI (match_dup 1)))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -148,7 +149,7 @@
 ;; load mode is SI result mode is SI compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -157,12 +158,13 @@
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
                                       SImode, NON_PREFIXED_DS))"
-  [(set (match_dup 0) (match_dup 1))
+  [(set (match_dup 0) (sign_extend:SI (match_dup 1)))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -190,7 +192,7 @@
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -205,6 +207,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 82e8f863b02..1a8419d41ef 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -61,6 +61,14 @@ sub gen_ld_cmpi_p10_one
   my $mempred = "non_update_memory_operand";
   my $extend;
 
+  # We need to special case lwa.  The prefixed_load_p function in rs6000.cc
+  # (which determines if a load instruction is prefixed) uses the fact that the
+  # register mode is different from the memory mode, and that the sign_extend
+  # attribute is set to use DS-form rules for the address instead of D-form.
+  # If the register size is the same, prefixed_load_p assumes we are doing a
+  # lwz.
+  my $lwa_insn = ($lmode eq "SI" && $ccmode eq "CC");
+
   if ($ccmode eq "CC") {
     # ld and lwa are both DS-FORM.
     ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
@@ -91,12 +99,15 @@ sub gen_ld_cmpi_p10_one
   }
 
   my $ldst = mode_to_ldst_char($lmode);
+
+  # DS-form addresses need YZ, and not m.
+  my $constraint = ($mempred eq "ds_form_mem_operand") ? "YZ" : "m";
   print <<HERE;
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend
 (define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}"
   [(set (match_operand:${ccmode} 2 "cc_reg_operand" "=x")
-        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "m")
+        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "${constraint}")
 HERE
   print "   " if $ccmode eq "CCUNS";
 print <<HERE;
@@ -126,7 +137,12 @@ HERE
                                       ${lmode}mode, ${np}))"
 HERE
 
-  if ($extend eq "none") {
+  # prefixed_load_p needs to see the register mode being different than the
+  # memory insn in order to validate lwa as a DS-form instruction and not a
+  # D-form instruction.
+  if ($lwa_insn && $extend eq "none") {
+    print "  [(set (match_dup 0) (sign_extend:${clobbermode} (match_dup 1)))\n";
+  } elsif ($extend eq "none") {
     print "  [(set (match_dup 0) (match_dup 1))\n";
   } elsif ($result eq "clobber") {
     print "  [(set (match_dup 0) (${extend}_extend:${clobbermode} (match_dup 1)))\n";
@@ -140,6 +156,15 @@ HERE
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+HERE
+
+  if ($lwa_insn) {
+    # prefixed_load_p needs the sign_extend attribute to validate lwa as a
+    # DS-form instruction instead of D-form.
+    print "   (set_attr \"sign_extend\" \"yes\")\n";
+  }
+
+  print <<HERE
    (set_attr "length" "8")])
 
 HERE
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..75c5e5fc93d 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -287,7 +287,7 @@
 ;; Whether this insn has a prefixed form and a non-prefixed form.
 (define_attr "maybe_prefixed" "no,yes"
   (if_then_else (eq_attr "type" "load,fpload,vecload,store,fpstore,vecstore,
-  				 integer,add")
+  				 integer,add,fused_load_cmpi")
 		(const_string "yes")
 		(const_string "no")))
 
@@ -302,7 +302,7 @@
 	      (eq_attr "maybe_prefixed" "no"))
 	 (const_string "no")
 
-	 (eq_attr "type" "load,fpload,vecload")
+	 (eq_attr "type" "load,fpload,vecload,fused_load_cmpi")
 	 (if_then_else (match_test "prefixed_load_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..d0e66a0b897
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,26 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  The -fstack-protector option is needed to show the
+   bug.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [gcc(refs/users/meissner/heads/work122)] Fix power10 fusion and -fstack-protector, PR target/105325
@ 2023-06-09 21:04 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2023-06-09 21:04 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:838ca32dc66c8587ff3730fdb71b07392ed046f6

commit 838ca32dc66c8587ff3730fdb71b07392ed046f6
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Fri Jun 9 17:03:42 2023 -0400

    Fix power10 fusion and -fstack-protector, PR target/105325
    
    This patch fixes an issue where if you use the -fstack-protector and
    -mcpu=power10 options and you have a large stack frame, the GCC compiler will
    generate a LWA instruction with a large offset.
    
    There are several problems with the current GCC:
    
        1)  The constraints in fusion.md (generated by genfusion.pl) use "m" for LWA
            and LD, when they should use "YZ".
    
        2)  The calls to address_is_non_pfx_d_or_x doesn't work with lwa using
            SImode as the mode.  You need to pass in DImode instead of SImode.  This
            is to allow lwz to be treated differently than lwa.
    
        3)  The rules for automatically setting the prefixed attribute were not
            checking that these fused load and compare immediate fusion operations
            might have prefixed addresses.
    
        4)  We should use lwa_operand for lwa instead of ds_form_mem_operand.  The
            lwa_operand has some additional checks for the lwa instruction.
    
    The fix is to modify genfusion.pl that it sets the "YZ" constraint instead of
    "m" for the ld and lwa instructions.
    
    This patch also passes DImode to the address_is_non_pfx_d_or_x function for the
    lwa instruction so that it properly cheks for DS form restrictions when
    splitting the insn because it won't fuse.
    
    This patch also modifies the prefixed attribute so that it checks load + compare
    immediate instructions for be a load instruction.  This means that the
    lwa_cmpdi_cr0_SI_clobber_CC_none insn must ask for a DImode scratch register
    instead of SImode scratch register and also set the sign_extend attribute.  The
    code for checking if a load is a lwa expects that the register is DImode, the
    memory is SImode, and the sign_extend attribute is set.
    
    I also added a test case for this condition.
    
    2023-06-08   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems that
            allowed prefixed lwa to be generated.
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/rs6000.md (prefixed attribute): Add support for load
            plus compare immediate fused insns.
    
    gcc/testsuite/
    
            * g++.target/powerpc/pr105325.C: New test.

Diff:
---
 gcc/config/rs6000/fusion.md                 | 39 ++++++++++++++++-------------
 gcc/config/rs6000/genfusion.pl              | 26 ++++++++++++++++---
 gcc/config/rs6000/rs6000.md                 |  2 +-
 gcc/testsuite/g++.target/powerpc/pr105325.C | 26 +++++++++++++++++++
 4 files changed, 71 insertions(+), 22 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..be1f6384c5d 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -103,24 +103,25 @@
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
-;; load mode is SI result mode is clobber compare mode is CC extend is none
-(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
+;; load mode is SI result mode is clobber compare mode is CC extend is sign
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
-   (clobber (match_scratch:SI 0 "=r"))]
+   (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
   "lwa%X1 %0,%1\;cmpdi %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
-  [(set (match_dup 0) (match_dup 1))
+                                      DImode, NON_PREFIXED_DS))"
+  [(set (match_dup 0) (sign_extend:DI (match_dup 1)))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+    (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -145,10 +146,10 @@
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
-;; load mode is SI result mode is SI compare mode is CC extend is none
-(define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
+;; load mode is SI result mode is SI compare mode is CC extend is sign
+(define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -156,13 +157,14 @@
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
-  [(set (match_dup 0) (match_dup 1))
+                                      DImode, NON_PREFIXED_DS))"
+  [(set (match_dup 0) (sign_extend:SI (match_dup 1)))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+    (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -190,7 +192,7 @@
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -198,13 +200,14 @@
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
+                                      DImode, NON_PREFIXED_DS))"
   [(set (match_dup 0) (sign_extend:EXTSI (match_dup 1)))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+    (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -247,6 +250,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+    (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -289,6 +293,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+    (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 82e8f863b02..7016f9c7512 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -59,12 +59,17 @@ sub gen_ld_cmpi_p10_one
 
   my $np = "NON_PREFIXED_D";
   my $mempred = "non_update_memory_operand";
+  my $lmode_prefix_call = $lmode;
   my $extend;
 
   if ($ccmode eq "CC") {
-    # ld and lwa are both DS-FORM.
+    # ld and lwa are both DS-FORM.  lwa needs to pass DImode to
+    # address_is_non_pfx_d_or_x so that it is properly treated as a DS mode
+    # instruction.
     ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
-    ($lmode =~ /^[SD]I$/) and $mempred = "ds_form_mem_operand";
+    ($lmode eq "DI") and $mempred = "ds_form_mem_operand";
+    ($lmode eq "SI") and $mempred = "lwa_operand";
+    ($lmode eq "SI") and $lmode_prefix_call = "DI";
   } else {
     if ($lmode eq "DI") {
       # ld is DS-form, but lwz is not.
@@ -81,22 +86,28 @@ sub gen_ld_cmpi_p10_one
 
   # For clobber, we need a SI/DI reg in case we
   # split because we have to sign/zero extend.
+  # for lwa, we need a DImode temporary to properly signal that
+  # lwa is a ds-form instruction.
   my $clobbermode = ($lmode =~ /^[QH]I$/) ? "GPR" : $lmode;
   if ($result =~ /^EXT/ || $result eq "GPR" || $clobbermode eq "GPR") {
     # We always need extension if result > lmode.
     $extend = ($ccmode eq "CC") ? "sign" : "zero";
+  } elsif ($ccmode eq "CC" && $lmode eq "SI") {
+     $clobbermode = "DI";
+     $extend = "sign";
   } else {
     # Result of SI/DI does not need sign extension.
     $extend = "none";
   }
 
   my $ldst = mode_to_ldst_char($lmode);
+  my $constraint = ($np eq "NON_PREFIXED_DS") ? "YZ" : "m";
   print <<HERE;
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend
 (define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}"
   [(set (match_operand:${ccmode} 2 "cc_reg_operand" "=x")
-        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "m")
+        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "${constraint}")
 HERE
   print "   " if $ccmode eq "CCUNS";
 print <<HERE;
@@ -123,7 +134,7 @@ HERE
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      ${lmode}mode, ${np}))"
+                                      ${lmode_prefix_call}mode, ${np}))"
 HERE
 
   if ($extend eq "none") {
@@ -140,6 +151,13 @@ HERE
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+HERE
+
+  if ($ccmode eq "CC" && $lmode ne "DI") {
+    print "    (set_attr \"sign_extend\" \"yes\")\n";
+  }
+
+  print <<HERE
    (set_attr "length" "8")])
 
 HERE
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..ed2f06e56ac 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -302,7 +302,7 @@
 	      (eq_attr "maybe_prefixed" "no"))
 	 (const_string "no")
 
-	 (eq_attr "type" "load,fpload,vecload")
+	 (eq_attr "type" "load,fpload,vecload,fused_load_cmpi")
 	 (if_then_else (match_test "prefixed_load_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..d0e66a0b897
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,26 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  The -fstack-protector option is needed to show the
+   bug.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [gcc(refs/users/meissner/heads/work122)] Fix power10 fusion and -fstack-protector, PR target/105325
@ 2023-06-09  6:08 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2023-06-09  6:08 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:7c642fe4c35062120128089a5665796b7e64cb84

commit 7c642fe4c35062120128089a5665796b7e64cb84
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Fri Jun 9 02:08:24 2023 -0400

    Fix power10 fusion and -fstack-protector, PR target/105325
    
    This patch fixes an issue where if you use the -fstack-protector and
    -mcpu=power10 options and you have a large stack frame, the GCC compiler will
    generate a LWA instruction with a large offset.
    
    There are several problems with the current GCC:
    
        1)  The constraints in fusion.md (generated by genfusion.pl) use "m" for LWA
            and LD, when they should use "YZ".
    
        2)  The calls to address_is_non_pfx_d_or_x doesn't work with lwa using
            SImode as the mode.  You need to pass in DImode instead of SImode.  This
            is to allow lwz to be treated differently than lwa.
    
    The fix is to modify genfusion.pl that it sets the "YZ" constraint instead of
    "m" for the ld and lwa instructions.  And to pass DImode to the
    address_is_non_pfx_d_or_x function for lwa.
    
    2023-06-08   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems that
            allowed prefixed lwa to be generated.
            * config/rs6000/fusion.md: Regenerate.
    
    gcc/testsuite/
    
            * g++.target/powerpc/pr105325.C: New test.

Diff:
---
 gcc/config/rs6000/fusion.md                 | 25 +++++++++++++++----------
 gcc/config/rs6000/genfusion.pl              | 17 ++++++++++++++---
 gcc/testsuite/g++.target/powerpc/pr105325.C | 26 ++++++++++++++++++++++++++
 3 files changed, 55 insertions(+), 13 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..20af9b66fcf 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -106,7 +106,7 @@
 ;; load mode is SI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:SI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -114,13 +114,14 @@
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
+                                      DImode, NON_PREFIXED_DS))"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -148,7 +149,7 @@
 ;; load mode is SI result mode is SI compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -156,13 +157,14 @@
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
+                                      DImode, NON_PREFIXED_DS))"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -190,7 +192,7 @@
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -198,13 +200,14 @@
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
+                                      DImode, NON_PREFIXED_DS))"
   [(set (match_dup 0) (sign_extend:EXTSI (match_dup 1)))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -247,6 +250,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -289,6 +293,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 82e8f863b02..622ef211b4c 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -59,12 +59,15 @@ sub gen_ld_cmpi_p10_one
 
   my $np = "NON_PREFIXED_D";
   my $mempred = "non_update_memory_operand";
+  my $lmode_prefix_call = $lmode;
   my $extend;
 
   if ($ccmode eq "CC") {
-    # ld and lwa are both DS-FORM.
+    # ld and lwa are both DS-FORM.  lwa needs to use lwa_operand to properly
+    # reject prefixed addresses.
     ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
     ($lmode =~ /^[SD]I$/) and $mempred = "ds_form_mem_operand";
+    ($lmode eq "SI") and $lmode_prefix_call = "DI";
   } else {
     if ($lmode eq "DI") {
       # ld is DS-form, but lwz is not.
@@ -91,12 +94,13 @@ sub gen_ld_cmpi_p10_one
   }
 
   my $ldst = mode_to_ldst_char($lmode);
+  my $constraint = ($np eq "NON_PREFIXED_DS") ? "YZ" : "m";
   print <<HERE;
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend
 (define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}"
   [(set (match_operand:${ccmode} 2 "cc_reg_operand" "=x")
-        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "m")
+        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "${constraint}")
 HERE
   print "   " if $ccmode eq "CCUNS";
 print <<HERE;
@@ -123,7 +127,7 @@ HERE
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      ${lmode}mode, ${np}))"
+                                      ${lmode_prefix_call}mode, ${np}))"
 HERE
 
   if ($extend eq "none") {
@@ -140,6 +144,13 @@ HERE
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+HERE
+
+  if ($ccmode eq "CC" && $lmode ne "DI") {
+    print "   (set_attr \"sign_extend\" \"yes\")\n";
+  }
+
+  print <<HERE;
    (set_attr "length" "8")])
 
 HERE
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..d0e66a0b897
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,26 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  The -fstack-protector option is needed to show the
+   bug.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [gcc(refs/users/meissner/heads/work122)] Fix power10 fusion and -fstack-protector, PR target/105325
@ 2023-06-09  4:17 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2023-06-09  4:17 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:5a6a3d7f0a6ccc1ddf64f76ff68329d2be949da1

commit 5a6a3d7f0a6ccc1ddf64f76ff68329d2be949da1
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Fri Jun 9 00:16:48 2023 -0400

    Fix power10 fusion and -fstack-protector, PR target/105325
    
    This patch fixes an issue where if you use the -fstack-protector and
    -mcpu=power10 options and you have a large stack frame, the GCC compiler will
    generate a LWA instruction with a large offset.
    
    There are several problems with the current GCC:
    
        1)  The constraints in fusion.md (generated by genfusion.pl) use "m" for LWA
            and LD, when they should use "YZ".
    
        2)  The normal calls to address_is_non_pfx_d_or_x doesn't work with lwa,
            because you need to pass in DImode instead of SImode.  It is simplified
            if you use the lwa_operand predicate instead of ds_form_mem_operand.
    
    The fix is to modify genfusion.pl that it sets the "YZ" constraint instead of
    "m" for the ld and lwa instructions.
    
    I modified the genfusion.pl code use to use lwa_operand for the lwa instruction
    to prevent prefixed addresses at combine time.
    
    2023-06-08   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems that
            allowed prefixed lwa to be generated.
            * config/rs6000/fusion.md: Regenerate.
    
    gcc/testsuite/
    
            * g++.target/powerpc/pr105325.C: New test.

Diff:
---
 gcc/config/rs6000/fusion.md                 | 19 ++++++++++++-------
 gcc/config/rs6000/genfusion.pl              | 16 +++++++++++++---
 gcc/testsuite/g++.target/powerpc/pr105325.C | 26 ++++++++++++++++++++++++++
 3 files changed, 51 insertions(+), 10 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..bdcb2aba4fe 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -106,7 +106,7 @@
 ;; load mode is SI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:SI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -121,6 +121,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -148,7 +149,7 @@
 ;; load mode is SI result mode is SI compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -163,6 +164,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -190,7 +192,7 @@
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -205,6 +207,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -247,6 +250,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -289,6 +293,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 82e8f863b02..44a2e2ab4c3 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -62,9 +62,11 @@ sub gen_ld_cmpi_p10_one
   my $extend;
 
   if ($ccmode eq "CC") {
-    # ld and lwa are both DS-FORM.
+    # ld and lwa are both DS-FORM.  lwa needs to use lwa_operand to properly
+    # reject prefixed addresses.
     ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
-    ($lmode =~ /^[SD]I$/) and $mempred = "ds_form_mem_operand";
+    ($lmode eq "DI") and $mempred = "ds_form_mem_operand";
+    ($lmode eq "SI") and $mempred = "lwa_operand";
   } else {
     if ($lmode eq "DI") {
       # ld is DS-form, but lwz is not.
@@ -91,12 +93,13 @@ sub gen_ld_cmpi_p10_one
   }
 
   my $ldst = mode_to_ldst_char($lmode);
+  my $constraint = ($np eq "NON_PREFIXED_DS") ? "YZ" : "m";
   print <<HERE;
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend
 (define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}"
   [(set (match_operand:${ccmode} 2 "cc_reg_operand" "=x")
-        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "m")
+        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "${constraint}")
 HERE
   print "   " if $ccmode eq "CCUNS";
 print <<HERE;
@@ -140,6 +143,13 @@ HERE
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+HERE
+
+  if ($ccmode eq "CC" && $lmode ne "DI") {
+    print "   (set_attr \"sign_extend\" \"yes\")\n";
+  }
+
+  print <<HERE;
    (set_attr "length" "8")])
 
 HERE
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..d0e66a0b897
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,26 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  The -fstack-protector option is needed to show the
+   bug.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [gcc(refs/users/meissner/heads/work122)] Fix power10 fusion and -fstack-protector, PR target/105325
@ 2023-06-09  1:32 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2023-06-09  1:32 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:58463f8c025a5473e187cb1d9149cd27f71fb4eb

commit 58463f8c025a5473e187cb1d9149cd27f71fb4eb
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu Jun 8 21:32:01 2023 -0400

    Fix power10 fusion and -fstack-protector, PR target/105325
    
    This patch fixes an issue where if you use the -fstack-protector and
    -mcpu=power10 options and you have a large stack frame, the GCC compiler will
    generate a LWA instruction with a large offset.
    
    There are several problems with the current GCC:
    
        1)  The prefixed attribute was not checking insns with the type
            fused_load_cmpi for being load insns.  This meant if the load + compare
            fusion was not split, the insn size might be off, and the wrong
            instruction might be generated.
    
        2)  The code in prefixed_load_p looks at the "sign_extend" attribute and
            whether the register mode was different than the memory for SImode loads
            to detect LWA instructions.
    
        3)  The constraints in fusion.md (generated by genfusion.pl) use "m" for LWA
            and LD, when they should use "YZ".
    
        4)  The code to determine whether to split the insn if it has a prefixed
            address was passing in SImode.  For LWA, we need to pass in DImode and
            not SImode to properly determine if the address is prefixed.
    
    The fix is to modify genfusion.pl that it sets the "YZ" constraint instead of
    "m" for the ld and lwa instructions.
    
    I also set the signed attribute on the HI/SImode loads that do sign extension.
    
    I modified the predicate for lwa to use lwa_operand instead of the predicate
    ds_form_mem_operand.  This predicate has direct checks to prevent a prefixed lwa
    from being generated.  I also modified the condition for the insn splitter so
    that it passes DImode for the lwa instructions and not SImode.
    
    I modified the "prefixed" attribute so that it also checks fused_load_cmpi.
    
    2023-06-07   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems that
            allowed prefixed lwa to be generated.
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/rs6000.md (prefixed attribute): Treat fused_load_cmpi
            insns as being load insns.
    
    gcc/testsuite/
    
            * g++.target/powerpc/pr105325.C: New test.

Diff:
---
 gcc/config/rs6000/fusion.md                 | 25 +++++++++++++++----------
 gcc/config/rs6000/genfusion.pl              | 27 ++++++++++++++++++++++-----
 gcc/config/rs6000/rs6000.md                 |  2 +-
 gcc/testsuite/g++.target/powerpc/pr105325.C | 26 ++++++++++++++++++++++++++
 4 files changed, 64 insertions(+), 16 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..b7c9035e882 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -106,7 +106,7 @@
 ;; load mode is SI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:SI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -114,13 +114,14 @@
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
+                                      DImode, NON_PREFIXED_DS))"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -148,7 +149,7 @@
 ;; load mode is SI result mode is SI compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -156,13 +157,14 @@
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
+                                      DImode, NON_PREFIXED_DS))"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -190,7 +192,7 @@
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -198,13 +200,14 @@
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
+                                      DImode, NON_PREFIXED_DS))"
   [(set (match_dup 0) (sign_extend:EXTSI (match_dup 1)))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -247,6 +250,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -289,6 +293,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 82e8f863b02..599bd3ad19b 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -57,14 +57,23 @@ sub gen_ld_cmpi_p10_one
 {
   my ($lmode, $result, $ccmode) = @_;
 
+  my $lmode_prefix_call = $lmode;
   my $np = "NON_PREFIXED_D";
   my $mempred = "non_update_memory_operand";
   my $extend;
 
   if ($ccmode eq "CC") {
-    # ld and lwa are both DS-FORM.
-    ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
-    ($lmode =~ /^[SD]I$/) and $mempred = "ds_form_mem_operand";
+    # ld and lwa are both DS-FORM.  lwa needs to use lwa_operand and also
+    # call address_is_non_pfx_d_or_x with DImode, not SImode to properly
+    # handle recognizing whether the address is prefixed.
+    if ($lmode eq "DI") {
+      $np = "NON_PREFIXED_DS";
+      $mempred = "ds_form_mem_operand";
+    } elsif ($lmode eq "SI") {
+      $np = "NON_PREFIXED_DS";
+      $mempred = "lwa_operand";
+      $lmode_prefix_call = "DI";
+    }
   } else {
     if ($lmode eq "DI") {
       # ld is DS-form, but lwz is not.
@@ -91,12 +100,13 @@ sub gen_ld_cmpi_p10_one
   }
 
   my $ldst = mode_to_ldst_char($lmode);
+  my $constraint = ($np eq "NON_PREFIXED_DS") ? "YZ" : "m";
   print <<HERE;
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend
 (define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}"
   [(set (match_operand:${ccmode} 2 "cc_reg_operand" "=x")
-        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "m")
+        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "${constraint}")
 HERE
   print "   " if $ccmode eq "CCUNS";
 print <<HERE;
@@ -123,7 +133,7 @@ HERE
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      ${lmode}mode, ${np}))"
+                                      ${lmode_prefix_call}mode, ${np}))"
 HERE
 
   if ($extend eq "none") {
@@ -140,6 +150,13 @@ HERE
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+HERE
+
+  if ($ccmode eq "CC" && $lmode ne "DI") {
+    print "   (set_attr \"sign_extend\" \"yes\")\n";
+  }
+
+  print <<HERE;
    (set_attr "length" "8")])
 
 HERE
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..ed2f06e56ac 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -302,7 +302,7 @@
 	      (eq_attr "maybe_prefixed" "no"))
 	 (const_string "no")
 
-	 (eq_attr "type" "load,fpload,vecload")
+	 (eq_attr "type" "load,fpload,vecload,fused_load_cmpi")
 	 (if_then_else (match_test "prefixed_load_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..d0e66a0b897
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,26 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  The -fstack-protector option is needed to show the
+   bug.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [gcc(refs/users/meissner/heads/work122)] Fix power10 fusion and -fstack-protector, PR target/105325
@ 2023-06-08 21:20 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2023-06-08 21:20 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:2b5e166e37b11de781204a8372b7d01a8c5b49c9

commit 2b5e166e37b11de781204a8372b7d01a8c5b49c9
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu Jun 8 17:20:08 2023 -0400

    Fix power10 fusion and -fstack-protector, PR target/105325
    
    This patch fixes an issue where if you use the -fstack-protector and
    -mcpu=power10 options and you have a large stack frame, the GCC compiler will
    generate a LWA instruction with a large offset.
    
    There are several problems with the current GCC:
    
        1)  The prefixed attribute was not checking insns with the type
            fused_load_cmpi for being load insns.  This meant if the load + compare
            fusion was not split, it would not know that the load should be
            prefixed.
    
        2)  The recognition of LWA for being prefixed looks at the "sign_extend"
            attribute and whether the register mode was different than the memory
            load (i.e. does it have a sign_extend wrapper around the load).
    
        3)  The constraints in fusion.md (generated by genfusion.pl) use "m" for LWA
            and LD, when they should use "YZ".
    
        4)  There is a lwa_operand predicate that should be used instead of
            ds_form_mem_operand for the LWA instruction.
    
    The main fix is to modify genfusion.pl that it sets the appropriate predicates
    and constraints.
    
    I also added support in genfusion.md so that if we are doing a LWA operation and
    just setting the CC bits (throwing away the result of the load after the
    comparison), it generates a LWZ instruction and does a CMPWI instead of CMPDI.
    This way those loads can use normal D-FORM restrictions instead of DS-form.
    
    I set the "sign_extend" attribute on the cases that generate LWA.
    
    I modified the "prefixed" attribute so that it also checks fused_load_cmpi.
    
    2023-06-07   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix constraints and
            predicates for LD and LWA.  Optimize LWA/CMPDI to generate LWZ/CMPWI if
            we don't need the result of the load after the comparison.  Set the
            sign_extend attribute for LWA instruction.  For LWA where we do not need
            the result of the load after the comparison, use LWZ and CMPWI.
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/rs6000.md (prefixed attribute): Treat fused_load_cmpi
            insns as being load insns.
    
    gcc/testsuite/
    
            * g++.target/powerpc/pr105325.C: New test.
            * gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust names and insn counts
            for PR target/105325 fix.

Diff:
---
 gcc/config/rs6000/fusion.md                        | 22 ++++++++-------
 gcc/config/rs6000/genfusion.pl                     | 33 ++++++++++++++++------
 gcc/config/rs6000/rs6000.md                        |  2 +-
 gcc/testsuite/g++.target/powerpc/pr105325.C        | 26 +++++++++++++++++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c         | 15 ++++++----
 5 files changed, 73 insertions(+), 25 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..5b82c61e959 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -104,17 +104,17 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is clobber compare mode is CC extend is none
-(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
+(define_insn_and_split "*lwz_cmpwi_cr0_SI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:SI 0 "=r"))]
   "(TARGET_P10_FUSION)"
-  "lwa%X1 %0,%1\;cmpdi %2,%0,%3"
+  "lwz%X1 %0,%1\;cmpwi %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
+                                      SImode, NON_PREFIXED_D))"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
@@ -148,7 +148,7 @@
 ;; load mode is SI result mode is SI compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -163,6 +163,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -190,7 +191,7 @@
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -205,6 +206,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 82e8f863b02..310ac5f359a 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -57,14 +57,26 @@ sub gen_ld_cmpi_p10_one
 {
   my ($lmode, $result, $ccmode) = @_;
 
+  my $cmp_size = "d";
+  my $echr = ($ccmode eq "CC") ? "a" : "z";
+  if ($lmode eq "DI") { $echr = ""; }
   my $np = "NON_PREFIXED_D";
   my $mempred = "non_update_memory_operand";
   my $extend;
 
   if ($ccmode eq "CC") {
-    # ld and lwa are both DS-FORM.
-    ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
-    ($lmode =~ /^[SD]I$/) and $mempred = "ds_form_mem_operand";
+    # if we would generate lwa and just want the CC value, generate lwz instead
+    # and use cmpwi.  This allows us to avoid the ds-form restrictions on lwa.
+    if ($lmode eq "SI" && $result eq "clobber") {
+      $echr = "z";
+      $cmp_size = "w";
+
+    } else {
+      # ld and lwa are both DS-FORM, use LWA_OPERAND for LWA
+      ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
+      ($lmode eq "DI") and $mempred = "ds_form_mem_operand";
+      ($lmode eq "SI") and $mempred = "lwa_operand";
+    }
   } else {
     if ($lmode eq "DI") {
       # ld is DS-form, but lwz is not.
@@ -74,8 +86,6 @@ sub gen_ld_cmpi_p10_one
   }
 
   my $cmpl = ($ccmode eq "CC") ? "" : "l";
-  my $echr = ($ccmode eq "CC") ? "a" : "z";
-  if ($lmode eq "DI") { $echr = ""; }
   my $constpred = ($ccmode eq "CC") ? "const_m1_to_1_operand"
   				    : "const_0_to_1_operand";
 
@@ -91,12 +101,13 @@ sub gen_ld_cmpi_p10_one
   }
 
   my $ldst = mode_to_ldst_char($lmode);
+  my $constraint = ($np eq "NON_PREFIXED_DS") ? "YZ" : "m";
   print <<HERE;
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend
-(define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}"
+(define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}${cmp_size}i_cr0_${lmode}_${result}_${ccmode}_${extend}"
   [(set (match_operand:${ccmode} 2 "cc_reg_operand" "=x")
-        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "m")
+        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "${constraint}")
 HERE
   print "   " if $ccmode eq "CCUNS";
 print <<HERE;
@@ -119,7 +130,7 @@ HERE
 
   print <<HERE;
   "(TARGET_P10_FUSION)"
-  "l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}di %2,%0,%3"
+  "l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}${cmp_size}i %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
@@ -140,6 +151,12 @@ HERE
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+HERE
+  # prefixed_load_p looks at sign_extend to deal with lwa.
+  if ($mempred eq "lwa_operand") {
+    print "   (set_attr \"sign_extend\" \"yes\")\n";
+  }
+  print <<HERE;
    (set_attr "length" "8")])
 
 HERE
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..ed2f06e56ac 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -302,7 +302,7 @@
 	      (eq_attr "maybe_prefixed" "no"))
 	 (const_string "no")
 
-	 (eq_attr "type" "load,fpload,vecload")
+	 (eq_attr "type" "load,fpload,vecload,fused_load_cmpi")
 	 (if_then_else (match_test "prefixed_load_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..d0e66a0b897
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,26 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  The -fstack-protector option is needed to show the
+   bug.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
index 526a026d874..9902a340e0f 100644
--- a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
+++ b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
@@ -60,19 +60,22 @@ TEST(int8_t)
 /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_clobber_CCUNS_none"    1 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"      16 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_SI_CC_none"            8 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_SI_CCUNS_none"        2 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpwi_cr0_SI_clobber_CC_none"       8 { target lp64 } } } */
 
 /* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_GPR_CCUNS_zero"       2 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_DI_CC_none"             0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_clobber_CC_none"        0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_DI_CCUNS_none"         0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_clobber_CCUNS_none"    0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"       8 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_EXTHI_CC_sign"         4 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       9 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_EXTHI_CCUNS_zero"     2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_SI_CC_none"           36 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpwi_cr0_SI_clobber_CC_none"      16 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   6 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_SI_CCUNS_none"        2 { target ilp32 } } } */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [gcc(refs/users/meissner/heads/work122)] Fix power10 fusion and -fstack-protector, PR target/105325
@ 2023-06-08 16:53 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2023-06-08 16:53 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:3d58f15f98a3f7a0dc88a88d8a78f9de27a6260e

commit 3d58f15f98a3f7a0dc88a88d8a78f9de27a6260e
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu Jun 8 12:53:23 2023 -0400

    Fix power10 fusion and -fstack-protector, PR target/105325
    
    This patch fixes an issue where if you use the -fstack-protector and
    -mcpu=power10 options and you have a large stack frame, the GCC compiler will
    generate a LWA instruction with a large offset.
    
    There are several problems with the current GCC:
    
        1)  The prefixed attribute was not checking insns with the type
            fused_load_cmpi for being load insns.  This meant if the load + compare
            fusion was not split, it would not know that the load should be
            prefixed.
    
        2)  The recognition of LWA for being prefixed looks at the "sign_extend"
            attribute and whether the register mode was different than the memory
            load (i.e. does it have a sign_extend wrapper around the load).
    
        3)  The constraints in fusion.md (generated by genfusion.pl) use "m" for LWA
            and LD, when they should use "YZ".
    
        4)  There is a lwa_operand predicate that should be used instead of
            ds_form_mem_operand for the LWA instruction.
    
    The main fix is to modify genfusion.pl that it sets the appropriate predicates
    and constraints.
    
    I also added support in genfusion.md so that if we are doing a LWA operation and
    just setting the CC bits (throwing away the result of the load after the
    comparison), it generates a LWZ instruction and does a CMPWI instead of CMPDI.
    This way those loads can use normal D-FORM restrictions instead of DS-form.
    
    I set the "sign_extend" attribute on the cases that generate LWA.
    
    I modified the "prefixed" attribute so that it also checks fused_load_cmpi.
    
    2023-06-07   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix constraints and
            predicates for LD and LWA.  Optimize LWA/CMPDI to generate LWZ/CMPWI if
            we don't need the result of the load after the comparison.  Set the
            sign_extend attribute for LWA instruction.  For LWA where we do not need
            the result of the load after the comparison, use LWZ and CMPWI.
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/rs6000.md (prefixed attribute): Treat fused_load_cmpi
            insns as being load insns.
    
    gcc/testsuite/
    
            * g++.target/powerpc/pr105325.C: New test.
            * gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust names and insn counts
            for PR target/105325 fix.

Diff:
---
 gcc/config/rs6000/fusion.md                        | 22 ++++++++-------
 gcc/config/rs6000/genfusion.pl                     | 33 ++++++++++++++++------
 gcc/config/rs6000/rs6000.md                        |  2 +-
 gcc/testsuite/g++.target/powerpc/pr105325.C        | 26 +++++++++++++++++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c         | 15 ++++++----
 5 files changed, 73 insertions(+), 25 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..5b82c61e959 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -104,17 +104,17 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is clobber compare mode is CC extend is none
-(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
+(define_insn_and_split "*lwz_cmpwi_cr0_SI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:SI 0 "=r"))]
   "(TARGET_P10_FUSION)"
-  "lwa%X1 %0,%1\;cmpdi %2,%0,%3"
+  "lwz%X1 %0,%1\;cmpwi %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
+                                      SImode, NON_PREFIXED_D))"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
@@ -148,7 +148,7 @@
 ;; load mode is SI result mode is SI compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -163,6 +163,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -190,7 +191,7 @@
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -205,6 +206,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 82e8f863b02..310ac5f359a 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -57,14 +57,26 @@ sub gen_ld_cmpi_p10_one
 {
   my ($lmode, $result, $ccmode) = @_;
 
+  my $cmp_size = "d";
+  my $echr = ($ccmode eq "CC") ? "a" : "z";
+  if ($lmode eq "DI") { $echr = ""; }
   my $np = "NON_PREFIXED_D";
   my $mempred = "non_update_memory_operand";
   my $extend;
 
   if ($ccmode eq "CC") {
-    # ld and lwa are both DS-FORM.
-    ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
-    ($lmode =~ /^[SD]I$/) and $mempred = "ds_form_mem_operand";
+    # if we would generate lwa and just want the CC value, generate lwz instead
+    # and use cmpwi.  This allows us to avoid the ds-form restrictions on lwa.
+    if ($lmode eq "SI" && $result eq "clobber") {
+      $echr = "z";
+      $cmp_size = "w";
+
+    } else {
+      # ld and lwa are both DS-FORM, use LWA_OPERAND for LWA
+      ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
+      ($lmode eq "DI") and $mempred = "ds_form_mem_operand";
+      ($lmode eq "SI") and $mempred = "lwa_operand";
+    }
   } else {
     if ($lmode eq "DI") {
       # ld is DS-form, but lwz is not.
@@ -74,8 +86,6 @@ sub gen_ld_cmpi_p10_one
   }
 
   my $cmpl = ($ccmode eq "CC") ? "" : "l";
-  my $echr = ($ccmode eq "CC") ? "a" : "z";
-  if ($lmode eq "DI") { $echr = ""; }
   my $constpred = ($ccmode eq "CC") ? "const_m1_to_1_operand"
   				    : "const_0_to_1_operand";
 
@@ -91,12 +101,13 @@ sub gen_ld_cmpi_p10_one
   }
 
   my $ldst = mode_to_ldst_char($lmode);
+  my $constraint = ($np eq "NON_PREFIXED_DS") ? "YZ" : "m";
   print <<HERE;
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend
-(define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}"
+(define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}${cmp_size}i_cr0_${lmode}_${result}_${ccmode}_${extend}"
   [(set (match_operand:${ccmode} 2 "cc_reg_operand" "=x")
-        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "m")
+        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "${constraint}")
 HERE
   print "   " if $ccmode eq "CCUNS";
 print <<HERE;
@@ -119,7 +130,7 @@ HERE
 
   print <<HERE;
   "(TARGET_P10_FUSION)"
-  "l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}di %2,%0,%3"
+  "l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}${cmp_size}i %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
@@ -140,6 +151,12 @@ HERE
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+HERE
+  # prefixed_load_p looks at sign_extend to deal with lwa.
+  if ($mempred eq "lwa_operand") {
+    print "   (set_attr \"sign_extend\" \"yes\")\n";
+  }
+  print <<HERE;
    (set_attr "length" "8")])
 
 HERE
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..ed2f06e56ac 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -302,7 +302,7 @@
 	      (eq_attr "maybe_prefixed" "no"))
 	 (const_string "no")
 
-	 (eq_attr "type" "load,fpload,vecload")
+	 (eq_attr "type" "load,fpload,vecload,fused_load_cmpi")
 	 (if_then_else (match_test "prefixed_load_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..d0e66a0b897
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,26 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  The -fstack-protector option is needed to show the
+   bug.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
index 526a026d874..4f655f0b90c 100644
--- a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
+++ b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
@@ -60,19 +60,22 @@ TEST(int8_t)
 /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_clobber_CCUNS_none"    1 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"      16 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_SI_CC_none"            8 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_SI_CCUNS_none"        2 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpwi_cr0_SI_clobber_CC_none"       8 { target lp64 } } } */
 
 /* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_GPR_CCUNS_zero"       2 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_DI_CC_none"             0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_clobber_CC_none"        0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_DI_CCUNS_none"         0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_clobber_CCUNS_none"    0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"       8 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_EXTHI_CC_sign"         4 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       9 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_EXTHI_CCUNS_zero"     2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_SI_CC_none"           36 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpdi_cr0_SI_clobber_CC_none"      16 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   6 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_SI_CCUNS_none"        2 { target ilp32 } } } */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [gcc(refs/users/meissner/heads/work122)] Fix power10 fusion and -fstack-protector, PR target/105325
@ 2023-06-08 14:38 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2023-06-08 14:38 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:6c1d18a1db648ccce383a6c1ecdb49f08c5e2136

commit 6c1d18a1db648ccce383a6c1ecdb49f08c5e2136
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu Jun 8 10:38:25 2023 -0400

    Fix power10 fusion and -fstack-protector, PR target/105325
    
    This patch fixes an issue where if you use the -fstack-protector and
    -mcpu=power10 options and you have a large stack frame, the GCC compiler will
    generate a LWA instruction with a large offset.
    
    There are several problems:
    
        1)  The prefixed attribute was not checking insns with the type
            fused_load_cmpi for being load insns.
    
        2)  The recognition of LWA for being prefixed looks at the "sign_extend"
            attribute and whether the register mode was different than the memory
            load (i.e. does it have a sign_extend wrapper around the load).
    
        3)  The constraints in fusion.md (generated by genfusion.pl) use "m" for LWA
            and LD, when they should use "YZ".
    
        4)  There is a lwa_operand that should be used instead of
            ds_form_mem_operand for the LWA instruction.
    
    The main fix is to modify genfusion.pl that it sets the appropriate predicates
    and constraints.
    
    I also added support in genfusion.md so that if we are doing a LWA operation and
    just setting the CC bits (throwing away the result of the load after the
    comparison), it generates a LWZ instruction and does a CMPWI instead of CMPDI.
    This way those loads can use normal D-FORM restrictions instead of DS-form.
    
    I set the "sign_extend" attribute on the cases that generate LWA.
    
    I modified the "prefixed" attribute so that it also checks fused_load_cmpi.
    
    2023-06-07   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix constraints and
            predicates for LD and LWA.  Optimize LWA/CMPDI to generate LWZ/CMPWI if
            we don't need the result of the load after the comparison.  Change the
            name of the insn pattern to reflect whether a DImode or SImode register
            is loaded.  Set sign_extend attribute for LWA instruction.
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/rs6000.md (prefixed attribute): Treat fused_load_cmpi
            insns as being load insns.
    
    gcc/testsuite/
    
            * g++.target/powerpc/pr105325.C: New test.
            * gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust names and insn counts
            for PR target/105325 fix.

Diff:
---
 gcc/config/rs6000/fusion.md                        | 22 ++++++++-------
 gcc/config/rs6000/genfusion.pl                     | 33 ++++++++++++++++------
 gcc/config/rs6000/rs6000.md                        |  2 +-
 gcc/testsuite/g++.target/powerpc/pr105325.C        | 26 +++++++++++++++++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c         | 15 ++++++----
 5 files changed, 73 insertions(+), 25 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..5b82c61e959 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -104,17 +104,17 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is clobber compare mode is CC extend is none
-(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
+(define_insn_and_split "*lwz_cmpwi_cr0_SI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:SI 0 "=r"))]
   "(TARGET_P10_FUSION)"
-  "lwa%X1 %0,%1\;cmpdi %2,%0,%3"
+  "lwz%X1 %0,%1\;cmpwi %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
+                                      SImode, NON_PREFIXED_D))"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
@@ -148,7 +148,7 @@
 ;; load mode is SI result mode is SI compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -163,6 +163,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -190,7 +191,7 @@
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -205,6 +206,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 82e8f863b02..310ac5f359a 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -57,14 +57,26 @@ sub gen_ld_cmpi_p10_one
 {
   my ($lmode, $result, $ccmode) = @_;
 
+  my $cmp_size = "d";
+  my $echr = ($ccmode eq "CC") ? "a" : "z";
+  if ($lmode eq "DI") { $echr = ""; }
   my $np = "NON_PREFIXED_D";
   my $mempred = "non_update_memory_operand";
   my $extend;
 
   if ($ccmode eq "CC") {
-    # ld and lwa are both DS-FORM.
-    ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
-    ($lmode =~ /^[SD]I$/) and $mempred = "ds_form_mem_operand";
+    # if we would generate lwa and just want the CC value, generate lwz instead
+    # and use cmpwi.  This allows us to avoid the ds-form restrictions on lwa.
+    if ($lmode eq "SI" && $result eq "clobber") {
+      $echr = "z";
+      $cmp_size = "w";
+
+    } else {
+      # ld and lwa are both DS-FORM, use LWA_OPERAND for LWA
+      ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
+      ($lmode eq "DI") and $mempred = "ds_form_mem_operand";
+      ($lmode eq "SI") and $mempred = "lwa_operand";
+    }
   } else {
     if ($lmode eq "DI") {
       # ld is DS-form, but lwz is not.
@@ -74,8 +86,6 @@ sub gen_ld_cmpi_p10_one
   }
 
   my $cmpl = ($ccmode eq "CC") ? "" : "l";
-  my $echr = ($ccmode eq "CC") ? "a" : "z";
-  if ($lmode eq "DI") { $echr = ""; }
   my $constpred = ($ccmode eq "CC") ? "const_m1_to_1_operand"
   				    : "const_0_to_1_operand";
 
@@ -91,12 +101,13 @@ sub gen_ld_cmpi_p10_one
   }
 
   my $ldst = mode_to_ldst_char($lmode);
+  my $constraint = ($np eq "NON_PREFIXED_DS") ? "YZ" : "m";
   print <<HERE;
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend
-(define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}"
+(define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}${cmp_size}i_cr0_${lmode}_${result}_${ccmode}_${extend}"
   [(set (match_operand:${ccmode} 2 "cc_reg_operand" "=x")
-        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "m")
+        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "${constraint}")
 HERE
   print "   " if $ccmode eq "CCUNS";
 print <<HERE;
@@ -119,7 +130,7 @@ HERE
 
   print <<HERE;
   "(TARGET_P10_FUSION)"
-  "l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}di %2,%0,%3"
+  "l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}${cmp_size}i %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
@@ -140,6 +151,12 @@ HERE
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+HERE
+  # prefixed_load_p looks at sign_extend to deal with lwa.
+  if ($mempred eq "lwa_operand") {
+    print "   (set_attr \"sign_extend\" \"yes\")\n";
+  }
+  print <<HERE;
    (set_attr "length" "8")])
 
 HERE
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..ed2f06e56ac 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -302,7 +302,7 @@
 	      (eq_attr "maybe_prefixed" "no"))
 	 (const_string "no")
 
-	 (eq_attr "type" "load,fpload,vecload")
+	 (eq_attr "type" "load,fpload,vecload,fused_load_cmpi")
 	 (if_then_else (match_test "prefixed_load_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..d0e66a0b897
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,26 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  The -fstack-protector option is needed to show the
+   bug.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
index 526a026d874..3a7a16be5d0 100644
--- a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
+++ b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
@@ -60,19 +60,22 @@ TEST(int8_t)
 /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_clobber_CCUNS_none"    1 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"      16 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_SI_CC_none"            8 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_SI_CCUNS_none"        2 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpdi_cr0_SI_clobber_CC_none"       8 { target lp64 } } } */
 
 /* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_GPR_CCUNS_zero"       2 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_DI_CC_none"             0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_clobber_CC_none"        0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_DI_CCUNS_none"         0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_clobber_CCUNS_none"    0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"       8 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_EXTHI_CC_sign"         4 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       9 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_EXTHI_CCUNS_zero"     2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_SI_CC_none"           36 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpdi_cr0_SI_clobber_CC_none"      16 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   6 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_SI_CCUNS_none"        2 { target ilp32 } } } */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [gcc(refs/users/meissner/heads/work122)] Fix power10 fusion and -fstack-protector, PR target/105325
@ 2023-06-07 20:58 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2023-06-07 20:58 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:8c02d7e251f65cc70ec3d0ac81aa4145dea5943e

commit 8c02d7e251f65cc70ec3d0ac81aa4145dea5943e
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Jun 7 16:57:48 2023 -0400

    Fix power10 fusion and -fstack-protector, PR target/105325
    
    This patch fixes an issue where if you use the -fstack-protector and
    -mcpu=power10 options and you have a large stack frame, the GCC compiler will
    generate a LWA instruction with a large offset.
    
    There are several problems:
    
        1)  The prefixed attribute was not checking insns with the type
            fused_load_cmpi for being load insns.
    
        2)  The recognition of LWA for being prefixed looks at the "sign_extend"
            attribute and whether the register mode was different than the memory
            load (i.e. does it have a sign_extend wrapper around the load).
    
        3)  The constraints in fusion.md (generated by genfusion.pl) use "m" for LWA
            and LD, when they should use "YZ".
    
        4)  There is a lwa_operand that should be used instead of
            ds_form_mem_operand for the LWA instruction.
    
    The main fix is to modify genfusion.pl that it sets the appropriate predicates
    and constraints.
    
    I also added support in genfusion.md so that if we are doing a LWA operation and
    just setting the CC bits (throwing away the result of the load after the
    comparison), it generates a LWZ instruction and does a CMPWI instead of CMPDI.
    This way those loads can use normal D-FORM restrictions instead of DS-form.
    
    I set the "sign_extend" attribute on the cases that generate LWA.
    
    I modified the "prefixed" attribute so that it also checks fused_load_cmpi.
    
    2023-06-06   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix constraints and
            predicates for LD and LWA.  Optimize LWA/CMPDI to generate LWZ/CMPWI if
            we don't need the result of the load after the comparison.  Change the
            name of the insn pattern to reflect whether a DImode or SImode register
            is loaded.  Set sign_extend attribute for LWA instruction.
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/rs6000.md (prefixed attribute): Treat fused_load_cmpi
            insns as being load insns.
    
    gcc/testsuite/
    
            * g++.target/powerpc/pr105325.C: New test.
            * gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust names and insn counts
            for PR target/105325 fix.

Diff:
---
 gcc/config/rs6000/fusion.md                        | 44 +++++++++++-----------
 gcc/config/rs6000/genfusion.pl                     | 36 ++++++++++++++----
 gcc/config/rs6000/rs6000.md                        |  2 +-
 gcc/testsuite/g++.target/powerpc/pr105325.C        | 26 +++++++++++++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c         | 31 ++++++++-------
 5 files changed, 95 insertions(+), 44 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..4444e3315dd 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -104,17 +104,17 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is clobber compare mode is CC extend is none
-(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
+(define_insn_and_split "*lwz_cmpsi_cr0_SI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:SI 0 "=r"))]
   "(TARGET_P10_FUSION)"
-  "lwa%X1 %0,%1\;cmpdi %2,%0,%3"
+  "lwz%X1 %0,%1\;cmpwi %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
+                                      SImode, NON_PREFIXED_D))"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
@@ -125,7 +125,7 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is clobber compare mode is CCUNS extend is none
-(define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none"
+(define_insn_and_split "*lwz_cmplsi_cr0_SI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
         (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
                        (match_operand:SI 3 "const_0_to_1_operand" "n")))
@@ -146,9 +146,9 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is SI compare mode is CC extend is none
-(define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
+(define_insn_and_split "*lwa_cmpsi_cr0_SI_SI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -163,11 +163,12 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is SI compare mode is CCUNS extend is none
-(define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none"
+(define_insn_and_split "*lwz_cmplsi_cr0_SI_SI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
         (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
                        (match_operand:SI 3 "const_0_to_1_operand" "n")))
@@ -188,9 +189,9 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
-(define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
+(define_insn_and_split "*lwa_cmp<mode>_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -205,11 +206,12 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is EXTSI compare mode is CCUNS extend is zero
-(define_insn_and_split "*lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"
+(define_insn_and_split "*lwz_cmpl<mode>_cr0_SI_EXTSI_CCUNS_zero"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
         (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
                        (match_operand:SI 3 "const_0_to_1_operand" "n")))
@@ -230,7 +232,7 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is HI result mode is clobber compare mode is CC extend is sign
-(define_insn_and_split "*lha_cmpdi_cr0_HI_clobber_CC_sign"
+(define_insn_and_split "*lha_cmp<mode>_cr0_HI_clobber_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
         (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
                     (match_operand:HI 3 "const_m1_to_1_operand" "n")))
@@ -251,7 +253,7 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is HI result mode is clobber compare mode is CCUNS extend is zero
-(define_insn_and_split "*lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"
+(define_insn_and_split "*lhz_cmpl<mode>_cr0_HI_clobber_CCUNS_zero"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
         (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
                        (match_operand:HI 3 "const_0_to_1_operand" "n")))
@@ -272,7 +274,7 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is HI result mode is EXTHI compare mode is CC extend is sign
-(define_insn_and_split "*lha_cmpdi_cr0_HI_EXTHI_CC_sign"
+(define_insn_and_split "*lha_cmp<mode>_cr0_HI_EXTHI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
         (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
                     (match_operand:HI 3 "const_m1_to_1_operand" "n")))
@@ -293,7 +295,7 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is HI result mode is EXTHI compare mode is CCUNS extend is zero
-(define_insn_and_split "*lhz_cmpldi_cr0_HI_EXTHI_CCUNS_zero"
+(define_insn_and_split "*lhz_cmpl<mode>_cr0_HI_EXTHI_CCUNS_zero"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
         (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
                        (match_operand:HI 3 "const_0_to_1_operand" "n")))
@@ -314,7 +316,7 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is QI result mode is clobber compare mode is CCUNS extend is zero
-(define_insn_and_split "*lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"
+(define_insn_and_split "*lbz_cmpl<mode>_cr0_QI_clobber_CCUNS_zero"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
         (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
                        (match_operand:QI 3 "const_0_to_1_operand" "n")))
@@ -335,7 +337,7 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is QI result mode is GPR compare mode is CCUNS extend is zero
-(define_insn_and_split "*lbz_cmpldi_cr0_QI_GPR_CCUNS_zero"
+(define_insn_and_split "*lbz_cmpl<mode>_cr0_QI_GPR_CCUNS_zero"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
         (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
                        (match_operand:QI 3 "const_0_to_1_operand" "n")))
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 82e8f863b02..42517b3bce7 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -57,14 +57,26 @@ sub gen_ld_cmpi_p10_one
 {
   my ($lmode, $result, $ccmode) = @_;
 
+  my $cmp_size = "d";
+  my $echr = ($ccmode eq "CC") ? "a" : "z";
+  if ($lmode eq "DI") { $echr = ""; }
   my $np = "NON_PREFIXED_D";
   my $mempred = "non_update_memory_operand";
   my $extend;
 
   if ($ccmode eq "CC") {
-    # ld and lwa are both DS-FORM.
-    ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
-    ($lmode =~ /^[SD]I$/) and $mempred = "ds_form_mem_operand";
+    # if we would generate lwa and just want the CC value, generate lwz instead
+    # and use cmpwi.  This allows us to avoid the ds-form restrictions on lwa.
+    if ($lmode eq "SI" && $result eq "clobber") {
+      $echr = "z";
+      $cmp_size = "w";
+
+    } else {
+      # ld and lwa are both DS-FORM, use LWA_OPERAND for LWA
+      ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
+      ($lmode eq "DI") and $mempred = "ds_form_mem_operand";
+      ($lmode eq "SI") and $mempred = "lwa_operand";
+    }
   } else {
     if ($lmode eq "DI") {
       # ld is DS-form, but lwz is not.
@@ -74,8 +86,6 @@ sub gen_ld_cmpi_p10_one
   }
 
   my $cmpl = ($ccmode eq "CC") ? "" : "l";
-  my $echr = ($ccmode eq "CC") ? "a" : "z";
-  if ($lmode eq "DI") { $echr = ""; }
   my $constpred = ($ccmode eq "CC") ? "const_m1_to_1_operand"
   				    : "const_0_to_1_operand";
 
@@ -90,13 +100,17 @@ sub gen_ld_cmpi_p10_one
     $extend = "none";
   }
 
+  my $load_mode = (($clobbermode eq "GPR" || $result =~ /^EXT/)
+		   ? "<mode>"
+		   : lc ($clobbermode));
   my $ldst = mode_to_ldst_char($lmode);
+  my $constraint = ($np eq "NON_PREFIXED_DS") ? "YZ" : "m";
   print <<HERE;
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend
-(define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}"
+(define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}${load_mode}_cr0_${lmode}_${result}_${ccmode}_${extend}"
   [(set (match_operand:${ccmode} 2 "cc_reg_operand" "=x")
-        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "m")
+        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "${constraint}")
 HERE
   print "   " if $ccmode eq "CCUNS";
 print <<HERE;
@@ -119,7 +133,7 @@ HERE
 
   print <<HERE;
   "(TARGET_P10_FUSION)"
-  "l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}di %2,%0,%3"
+  "l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}${cmp_size}i %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
@@ -140,6 +154,12 @@ HERE
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+HERE
+  # prefixed_load_p looks at sign_extend to deal with lwa.
+  if ($mempred eq "lwa_operand") {
+    print "   (set_attr \"sign_extend\" \"yes\")\n";
+  }
+  print <<HERE;
    (set_attr "length" "8")])
 
 HERE
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..ed2f06e56ac 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -302,7 +302,7 @@
 	      (eq_attr "maybe_prefixed" "no"))
 	 (const_string "no")
 
-	 (eq_attr "type" "load,fpload,vecload")
+	 (eq_attr "type" "load,fpload,vecload,fused_load_cmpi")
 	 (if_then_else (match_test "prefixed_load_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..d0e66a0b897
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,26 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  The -fstack-protector option is needed to show the
+   bug.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
index 526a026d874..0dad2c10c2f 100644
--- a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
+++ b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
@@ -53,26 +53,29 @@ TEST(int16_t)
 TEST(uint8_t)
 TEST(int8_t)
 
-/* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"   4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lbz_cmplsi_cr0_QI_clobber_CCUNS_zero"   4 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_DI_CC_none"             4 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_clobber_CC_none"        4 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_DI_CCUNS_none"         1 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_clobber_CCUNS_none"    1 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"      16 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lha_cmpsi_cr0_HI_clobber_CC_sign"      16 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lhz_cmplsi_cr0_HI_clobber_CCUNS_zero"   4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpsi_cr0_SI_SI_CC_none"            8 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmplsi_cr0_SI_SI_CCUNS_none"        2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmplsi_cr0_SI_clobber_CCUNS_none"   2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpsi_cr0_SI_clobber_CC_none"       8 { target lp64 } } } */
 
-/* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lbz_cmplsi_cr0_QI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lbz_cmplsi_cr0_QI_GPR_CCUNS_zero"       2 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_DI_CC_none"             0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_clobber_CC_none"        0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_DI_CCUNS_none"         0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_clobber_CCUNS_none"    0 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"       8 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       9 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   6 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lha_cmpsi_cr0_HI_clobber_CC_sign"       8 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lha_cmpsi_cr0_HI_EXTHI_CC_sign"         4 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lhz_cmplsi_cr0_HI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lhz_cmplsi_cr0_HI_EXTHI_CCUNS_zero"     2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpsi_cr0_SI_SI_CC_none"           36 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpsi_cr0_SI_clobber_CC_none"      16 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmplsi_cr0_SI_clobber_CCUNS_none"   6 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmplsi_cr0_SI_SI_CCUNS_none"        2 { target ilp32 } } } */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [gcc(refs/users/meissner/heads/work122)] Fix power10 fusion and -fstack-protector, PR target/105325
@ 2023-06-07 18:56 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2023-06-07 18:56 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:dc15fa44d7cf7eaab9dbefe81de828ae32d20589

commit dc15fa44d7cf7eaab9dbefe81de828ae32d20589
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Jun 7 14:56:05 2023 -0400

    Fix power10 fusion and -fstack-protector, PR target/105325
    
    This patch fixes an issue where if you use the -fstack-protector and
    -mcpu=power10 options and you have a large stack frame, the GCC compiler will
    generate a LWA instruction with a large offset.
    
    There are several problems:
    
        1)  The prefixed attribute was not checking insns with the type
            fused_load_cmpi for being load insns.
    
        2)  The recognition of LWA for being prefixed looks at the "sign_extend"
            attribute and whether the register mode was different than the memory
            load (i.e. does it have a sign_extend wrapper around the load).
    
        3)  The constraints in fusion.md (generated by genfusion.pl) use "m" for LWA
            and LD, when they should use "YZ".
    
        4)  There is a lwa_operand that should be used instead of
            ds_form_mem_operand for the LWA instruction.
    
    The main fix is to modify genfusion.pl that it sets the appropriate predicates
    and constraints.
    
    I also added support in genfusion.md so that if we are doing a LWA operation and
    just setting the CC bits (throwing away the result of the load after the
    comparison), it generates a LWZ instruction and does a CMPWI instead of CMPDI.
    This way those loads can use normal D-FORM restrictions instead of DS-form.
    
    I set the "sign_extend" attribute on the cases that generate LWA.
    
    I modified the "prefixed" attribute so that it also checks fused_load_cmpi.
    
    2023-06-06   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix constraints and
            predicates for LD and LWA.  Optimize LWA/CMPDI to generate LWZ/CMPWI if
            we don't need the result of the load after the comparison.  Change the
            name of the insn pattern to reflect whether a DImode or SImode register
            is loaded.  Set sign_extend attribute for LWA instruction.
            * config/rs6000/fusion.md: Regenerate.
            * config/rs6000/rs6000.md (prefixed attribute): Treat fused_load_cmpi
            insns as being load insns.
    
    gcc/testsuite/
    
            * g++.target/powerpc/pr105325.C: New test.

Diff:
---
 gcc/config/rs6000/fusion.md                 | 44 +++++++++++++++--------------
 gcc/config/rs6000/genfusion.pl              | 36 +++++++++++++++++------
 gcc/config/rs6000/rs6000.md                 |  2 +-
 gcc/testsuite/g++.target/powerpc/pr105325.C | 26 +++++++++++++++++
 4 files changed, 78 insertions(+), 30 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..4444e3315dd 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -104,17 +104,17 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is clobber compare mode is CC extend is none
-(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
+(define_insn_and_split "*lwz_cmpsi_cr0_SI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:SI 0 "=r"))]
   "(TARGET_P10_FUSION)"
-  "lwa%X1 %0,%1\;cmpdi %2,%0,%3"
+  "lwz%X1 %0,%1\;cmpwi %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
+                                      SImode, NON_PREFIXED_D))"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
@@ -125,7 +125,7 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is clobber compare mode is CCUNS extend is none
-(define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none"
+(define_insn_and_split "*lwz_cmplsi_cr0_SI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
         (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
                        (match_operand:SI 3 "const_0_to_1_operand" "n")))
@@ -146,9 +146,9 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is SI compare mode is CC extend is none
-(define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
+(define_insn_and_split "*lwa_cmpsi_cr0_SI_SI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -163,11 +163,12 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is SI compare mode is CCUNS extend is none
-(define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none"
+(define_insn_and_split "*lwz_cmplsi_cr0_SI_SI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
         (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
                        (match_operand:SI 3 "const_0_to_1_operand" "n")))
@@ -188,9 +189,9 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
-(define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
+(define_insn_and_split "*lwa_cmp<mode>_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -205,11 +206,12 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is EXTSI compare mode is CCUNS extend is zero
-(define_insn_and_split "*lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"
+(define_insn_and_split "*lwz_cmpl<mode>_cr0_SI_EXTSI_CCUNS_zero"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
         (compare:CCUNS (match_operand:SI 1 "non_update_memory_operand" "m")
                        (match_operand:SI 3 "const_0_to_1_operand" "n")))
@@ -230,7 +232,7 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is HI result mode is clobber compare mode is CC extend is sign
-(define_insn_and_split "*lha_cmpdi_cr0_HI_clobber_CC_sign"
+(define_insn_and_split "*lha_cmp<mode>_cr0_HI_clobber_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
         (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
                     (match_operand:HI 3 "const_m1_to_1_operand" "n")))
@@ -251,7 +253,7 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is HI result mode is clobber compare mode is CCUNS extend is zero
-(define_insn_and_split "*lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"
+(define_insn_and_split "*lhz_cmpl<mode>_cr0_HI_clobber_CCUNS_zero"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
         (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
                        (match_operand:HI 3 "const_0_to_1_operand" "n")))
@@ -272,7 +274,7 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is HI result mode is EXTHI compare mode is CC extend is sign
-(define_insn_and_split "*lha_cmpdi_cr0_HI_EXTHI_CC_sign"
+(define_insn_and_split "*lha_cmp<mode>_cr0_HI_EXTHI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
         (compare:CC (match_operand:HI 1 "non_update_memory_operand" "m")
                     (match_operand:HI 3 "const_m1_to_1_operand" "n")))
@@ -293,7 +295,7 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is HI result mode is EXTHI compare mode is CCUNS extend is zero
-(define_insn_and_split "*lhz_cmpldi_cr0_HI_EXTHI_CCUNS_zero"
+(define_insn_and_split "*lhz_cmpl<mode>_cr0_HI_EXTHI_CCUNS_zero"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
         (compare:CCUNS (match_operand:HI 1 "non_update_memory_operand" "m")
                        (match_operand:HI 3 "const_0_to_1_operand" "n")))
@@ -314,7 +316,7 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is QI result mode is clobber compare mode is CCUNS extend is zero
-(define_insn_and_split "*lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"
+(define_insn_and_split "*lbz_cmpl<mode>_cr0_QI_clobber_CCUNS_zero"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
         (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
                        (match_operand:QI 3 "const_0_to_1_operand" "n")))
@@ -335,7 +337,7 @@
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is QI result mode is GPR compare mode is CCUNS extend is zero
-(define_insn_and_split "*lbz_cmpldi_cr0_QI_GPR_CCUNS_zero"
+(define_insn_and_split "*lbz_cmpl<mode>_cr0_QI_GPR_CCUNS_zero"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
         (compare:CCUNS (match_operand:QI 1 "non_update_memory_operand" "m")
                        (match_operand:QI 3 "const_0_to_1_operand" "n")))
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 82e8f863b02..42517b3bce7 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -57,14 +57,26 @@ sub gen_ld_cmpi_p10_one
 {
   my ($lmode, $result, $ccmode) = @_;
 
+  my $cmp_size = "d";
+  my $echr = ($ccmode eq "CC") ? "a" : "z";
+  if ($lmode eq "DI") { $echr = ""; }
   my $np = "NON_PREFIXED_D";
   my $mempred = "non_update_memory_operand";
   my $extend;
 
   if ($ccmode eq "CC") {
-    # ld and lwa are both DS-FORM.
-    ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
-    ($lmode =~ /^[SD]I$/) and $mempred = "ds_form_mem_operand";
+    # if we would generate lwa and just want the CC value, generate lwz instead
+    # and use cmpwi.  This allows us to avoid the ds-form restrictions on lwa.
+    if ($lmode eq "SI" && $result eq "clobber") {
+      $echr = "z";
+      $cmp_size = "w";
+
+    } else {
+      # ld and lwa are both DS-FORM, use LWA_OPERAND for LWA
+      ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
+      ($lmode eq "DI") and $mempred = "ds_form_mem_operand";
+      ($lmode eq "SI") and $mempred = "lwa_operand";
+    }
   } else {
     if ($lmode eq "DI") {
       # ld is DS-form, but lwz is not.
@@ -74,8 +86,6 @@ sub gen_ld_cmpi_p10_one
   }
 
   my $cmpl = ($ccmode eq "CC") ? "" : "l";
-  my $echr = ($ccmode eq "CC") ? "a" : "z";
-  if ($lmode eq "DI") { $echr = ""; }
   my $constpred = ($ccmode eq "CC") ? "const_m1_to_1_operand"
   				    : "const_0_to_1_operand";
 
@@ -90,13 +100,17 @@ sub gen_ld_cmpi_p10_one
     $extend = "none";
   }
 
+  my $load_mode = (($clobbermode eq "GPR" || $result =~ /^EXT/)
+		   ? "<mode>"
+		   : lc ($clobbermode));
   my $ldst = mode_to_ldst_char($lmode);
+  my $constraint = ($np eq "NON_PREFIXED_DS") ? "YZ" : "m";
   print <<HERE;
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend
-(define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}"
+(define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}${load_mode}_cr0_${lmode}_${result}_${ccmode}_${extend}"
   [(set (match_operand:${ccmode} 2 "cc_reg_operand" "=x")
-        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "m")
+        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "${constraint}")
 HERE
   print "   " if $ccmode eq "CCUNS";
 print <<HERE;
@@ -119,7 +133,7 @@ HERE
 
   print <<HERE;
   "(TARGET_P10_FUSION)"
-  "l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}di %2,%0,%3"
+  "l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}${cmp_size}i %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
@@ -140,6 +154,12 @@ HERE
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+HERE
+  # prefixed_load_p looks at sign_extend to deal with lwa.
+  if ($mempred eq "lwa_operand") {
+    print "   (set_attr \"sign_extend\" \"yes\")\n";
+  }
+  print <<HERE;
    (set_attr "length" "8")])
 
 HERE
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..ed2f06e56ac 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -302,7 +302,7 @@
 	      (eq_attr "maybe_prefixed" "no"))
 	 (const_string "no")
 
-	 (eq_attr "type" "load,fpload,vecload")
+	 (eq_attr "type" "load,fpload,vecload,fused_load_cmpi")
 	 (if_then_else (match_test "prefixed_load_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..d0e66a0b897
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,26 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  The -fstack-protector option is needed to show the
+   bug.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2023-06-13 23:24 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-08 20:23 [gcc(refs/users/meissner/heads/work122)] Fix power10 fusion and -fstack-protector, PR target/105325 Michael Meissner
  -- strict thread matches above, loose matches on Subject: below --
2023-06-13 23:24 Michael Meissner
2023-06-13 17:21 Michael Meissner
2023-06-13  2:57 Michael Meissner
2023-06-09 21:04 Michael Meissner
2023-06-09  6:08 Michael Meissner
2023-06-09  4:17 Michael Meissner
2023-06-09  1:32 Michael Meissner
2023-06-08 21:20 Michael Meissner
2023-06-08 16:53 Michael Meissner
2023-06-08 14:38 Michael Meissner
2023-06-07 20:58 Michael Meissner
2023-06-07 18:56 Michael Meissner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).