public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, V6] Fix power10 fusion and -fstack-protector, PR target/105325
@ 2023-06-14  2:14 Michael Meissner
  2023-06-20 20:05 ` Segher Boessenkool
  0 siblings, 1 reply; 2+ messages in thread
From: Michael Meissner @ 2023-06-14  2:14 UTC (permalink / raw)
  To: gcc-patches, Michael Meissner, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner

This patch fixes an issue where if you use the -fstack-protector and
-mcpu=power10 options and you have a large stack frame, the GCC compiler will
generate a LWA instruction with a large offset.

Unlike the previous versions of this patch, I dug into it, and I found it was
much more complex that I originally thought.

The important thing in the bug is that -fstack-protector is used, but it could
potentially happen with fused load-compare to any stack location when the stack
frame is larger than 32K without -fstack-protector.

Here is the initial fused initial insn that was created.  It refers to the
stack location based off of the virtrual frame pointer:

(insn 6 5 7 2 (parallel [
            (set (reg:CC 119)
                 (compare:CC (mem/c:SI (plus:DI (reg/f:DI 110 sfp)
                                                (const_int -4))
                             (const_int 0 [0])))
            (clobber (scratch:DI))
        ])
     (nil))

After the stack size is finalized, the frame pointer removed, and the post
reload phase is run, the insn is now:

(insn 6 5 7 2 (parallel [
            (set (reg:CC 100 0 [119])
                 (compare:CC (mem/c:SI (plus:DI (reg/f:DI 1 1)
                                                (const_int 40044))
                             (const_int 0 [0])))
            (clobber (reg:DI 9 9 [120]))
        ])
     (nil))

When the split2 pass is run after reload has finished the ds_form_mem_operand
predicate that was used for lwa and ld no longer returns true.  This means that
since the operand predicates aren't recognized, it won't be split.  Thus, it
goes all of the way to final.  The automatic prefix instruction support was not
run because the type was changed from "load" to "fused_load_cmpi".  This meant
that it was assume that the insn was only 8 bytes, and that we did not need to
prefer the lwa with a 'p'.

The solution involves:

    1)	Don't use ds_form_mem_operand for ld and lwa, always use
	non_update_memory_operand.

    2)	Delete ds_form_mem_operand since it is no longer used.

    3)	Use the "YZ" constraints for ld/lwa instead of "m".

    4)	If we don't need to sign extend the lwa, convert it to lwz, and use
	cmpwi instead of cmpdi.  Adjust the insn name to reflect the code
	generate.

    5)	Insure that the insn using lwa will be recognized as having a prefixed
	operand (and hence the instruction length is 16 bytes instead of 8
	bytes).

	5a) Set the prefixed and maybe_prefix attributes to know that
	    fused_load_cmpi are also load insns;

	5b) In the case where we are just setting CC and not using the memory
	    afterward, set the clobber to use a DI register, and put an
	    explicit sign_extend operation in the split;

	5c) Set the sign_extend attribute to "yes".

	5d) 5a-5c are the things that prefixed_load_p in rs6000.cc checks to
	    ensure that lwa is treated as a ds-form instruction and not as
	    a d-form instruction (i.e. lwz).

    6)	Add a new test case for this case.

    7)	Adjust the insn counts in fusion-p10-ldcmpi.c.  Because we are no
	longer using ds_form_mem_operand, the ld and lwa instructions will fuse
	x-form (reg+reg) addresses in addition ds-form (reg+offset or reg).

I have built bootstrap compilers and tested them on the following environments.
There were no regressions in any of the runs.

	Little endian power10, long double is IBM 128-bit
	Little endian power9, long double is IBM 128-bit
	Little endian power9, long double is IEEE 128-bit
	Big endian power8, long double is IBM 128-bit (32/64-bit tests run)

Can I check this patch into the master GCC branch?  After a waiting period, once
the previous changes to genfusion.pl are checked in, can I install this patch in
previous GCC compilers?

2023-06-12   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	* config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems that
	allowed prefixed lwa to be generated.
	* config/rs6000/fusion.md: Regenerate.
	* config/rs6000/predicates.md (ds_form_mem_operand): Delete.
	* config/rs6000/rs6000.md (prefixed attribute): Add support for load
	plus compare immediate fused insns.
	(maybe_prefixed): Likewise.

gcc/testsuite/

	* g++.target/powerpc/pr105325.C: New test.
	* gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c: Update insn
	counts.
---
 gcc/config/rs6000/fusion.md                   | 27 +++++++-------
 gcc/config/rs6000/genfusion.pl                | 36 +++++++++++++++----
 gcc/config/rs6000/predicates.md               | 14 --------
 gcc/config/rs6000/rs6000.md                   |  4 +--
 gcc/testsuite/g++.target/powerpc/pr105325.C   | 26 ++++++++++++++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c    | 16 +++++----
 6 files changed, 81 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr105325.C

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..e286bf526a2 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "non_update_memory_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "non_update_memory_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "non_update_memory_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -104,17 +104,17 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is clobber compare mode is CC extend is none
-(define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
+(define_insn_and_split "*lwz_cmpwi_cr0_SI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:SI 0 "=r"))]
   "(TARGET_P10_FUSION)"
-  "lwa%X1 %0,%1\;cmpdi %2,%0,%3"
+  "lwz%X1 %0,%1\;cmpwi %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
+                                      SImode, NON_PREFIXED_D))"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
@@ -146,17 +146,17 @@ (define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none"
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is SI result mode is SI compare mode is CC extend is none
-(define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
+(define_insn_and_split "*lwz_cmpwi_cr0_SI_SI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "m")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
-  "lwa%X1 %0,%1\;cmpdi %2,%0,%3"
+  "lwz%X1 %0,%1\;cmpwi %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
-                                      SImode, NON_PREFIXED_DS))"
+                                      SImode, NON_PREFIXED_D))"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2)
         (compare:CC (match_dup 0) (match_dup 3)))]
@@ -190,7 +190,7 @@ (define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none"
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "non_update_memory_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -205,6 +205,7 @@ (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 82e8f863b02..c8b232847a8 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -61,20 +61,30 @@ sub gen_ld_cmpi_p10_one
   my $mempred = "non_update_memory_operand";
   my $extend;
 
+  # We need to special case lwa.  The prefixed_load_p function in rs6000.cc
+  # (which determines if a load instruction is prefixed) uses the fact that the
+  # register mode is different from the memory mode, and that the sign_extend
+  # attribute is set to use DS-form rules for the address instead of D-form.
+  # If the register size is the same, prefixed_load_p assumes we are doing a
+  # lwz.  We change to use an lwz and word compare if we don't need to sign
+  # extend the SImode value.  Otherwise if we need the value, we need to
+  # make sure the insn is marked as ds-form.
+  my $lwa_insn = ($lmode eq "SI" && $ccmode eq "CC");
+  my $cmp_size = ($lwa_insn && $result !~ /^EXT|^DI$/) ? "w" : "d";
+
   if ($ccmode eq "CC") {
     # ld and lwa are both DS-FORM.
-    ($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
-    ($lmode =~ /^[SD]I$/) and $mempred = "ds_form_mem_operand";
+    ($lmode eq "DI") and $np = "NON_PREFIXED_DS";
+    ($lmode eq "SI" && $cmp_size eq "d") and $np = "NON_PREFIXED_DS";
   } else {
     if ($lmode eq "DI") {
       # ld is DS-form, but lwz is not.
       $np = "NON_PREFIXED_DS";
-      $mempred = "ds_form_mem_operand";
     }
   }
 
   my $cmpl = ($ccmode eq "CC") ? "" : "l";
-  my $echr = ($ccmode eq "CC") ? "a" : "z";
+  my $echr = ($ccmode eq "CC" && $cmp_size eq "d") ? "a" : "z";
   if ($lmode eq "DI") { $echr = ""; }
   my $constpred = ($ccmode eq "CC") ? "const_m1_to_1_operand"
   				    : "const_0_to_1_operand";
@@ -91,12 +101,15 @@ sub gen_ld_cmpi_p10_one
   }
 
   my $ldst = mode_to_ldst_char($lmode);
+
+  # DS-form addresses need YZ, and not m.
+  my $constraint = ($np eq "NON_PREFIXED_DS") ? "YZ" : "m";
   print <<HERE;
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
 ;; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend
-(define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}"
+(define_insn_and_split "*l${ldst}${echr}_cmp${cmpl}${cmp_size}i_cr0_${lmode}_${result}_${ccmode}_${extend}"
   [(set (match_operand:${ccmode} 2 "cc_reg_operand" "=x")
-        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "m")
+        (compare:${ccmode} (match_operand:${lmode} 1 "${mempred}" "${constraint}")
 HERE
   print "   " if $ccmode eq "CCUNS";
 print <<HERE;
@@ -119,7 +132,7 @@ HERE
 
   print <<HERE;
   "(TARGET_P10_FUSION)"
-  "l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}di %2,%0,%3"
+  "l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}${cmp_size}i %2,%0,%3"
   "&& reload_completed
    && (cc_reg_not_cr0_operand (operands[2], CCmode)
        || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),
@@ -140,6 +153,15 @@ HERE
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+HERE
+
+  if ($lwa_insn && $cmp_size eq "d") {
+    # prefixed_load_p needs the sign_extend attribute to validate lwa as a
+    # DS-form instruction instead of D-form.
+    print "   (set_attr \"sign_extend\" \"yes\")\n";
+  }
+
+  print <<HERE
    (set_attr "length" "8")])
 
 HERE
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index a16ee30f0c0..6b564837c6e 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1125,20 +1125,6 @@ (define_predicate "lwa_operand"
   return INTVAL (offset) % 4 == 0;
 })
 
-;; Return 1 if the operand is a memory operand that has a valid address for
-;; a DS-form instruction. I.e. the address has to be either just a register,
-;; or register + const where the two low order bits of const are zero.
-(define_predicate "ds_form_mem_operand"
-  (match_code "subreg,mem")
-{
-  if (!any_memory_operand (op, mode))
-    return false;
-
-  rtx addr = XEXP (op, 0);
-
-  return address_to_insn_form (addr, mode, NON_PREFIXED_DS) == INSN_FORM_DS;
-})
-
 ;; Return 1 if the operand, used inside a MEM, is a SYMBOL_REF.
 (define_predicate "symbol_ref_operand"
   (and (match_code "symbol_ref")
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..75c5e5fc93d 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -287,7 +287,7 @@ (define_attr "cannot_copy" "no,yes" (const_string "no"))
 ;; Whether this insn has a prefixed form and a non-prefixed form.
 (define_attr "maybe_prefixed" "no,yes"
   (if_then_else (eq_attr "type" "load,fpload,vecload,store,fpstore,vecstore,
-  				 integer,add")
+  				 integer,add,fused_load_cmpi")
 		(const_string "yes")
 		(const_string "no")))
 
@@ -302,7 +302,7 @@ (define_attr "prefixed" "no,yes"
 	      (eq_attr "maybe_prefixed" "no"))
 	 (const_string "no")
 
-	 (eq_attr "type" "load,fpload,vecload")
+	 (eq_attr "type" "load,fpload,vecload,fused_load_cmpi")
 	 (if_then_else (match_test "prefixed_load_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..d0e66a0b897
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,26 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  The -fstack-protector option is needed to show the
+   bug.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
index 526a026d874..165bd9a07ad 100644
--- a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
+++ b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
@@ -54,15 +54,17 @@ TEST(uint8_t)
 TEST(int8_t)
 
 /* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"   4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_DI_CC_none"             4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_clobber_CC_none"        4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_DI_CCUNS_none"         1 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_clobber_CCUNS_none"    1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_DI_CC_none"            24 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "ld_cmpdi_cr0_DI_clobber_CC_none"        8 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_DI_CCUNS_none"         2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "ld_cmpldi_cr0_DI_clobber_CCUNS_none"    2 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"      16 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   4 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpwi_cr0_SI_clobber_CC_none"       8 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpwi_cr0_SI_SI_CC_none"            8 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_SI_CCUNS_none"        2 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   2 { target lp64 } } } */
 
 /* { dg-final { scan-assembler-times "lbz_cmpldi_cr0_QI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
@@ -73,6 +75,8 @@ TEST(int8_t)
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"       8 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       9 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpwi_cr0_SI_SI_CC_none"           36 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpwi_cr0_SI_clobber_CC_none"      16 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   6 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_SI_CCUNS_none"        2 { target ilp32 } } } */
-- 
2.40.1


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH, V6] Fix power10 fusion and -fstack-protector, PR target/105325
  2023-06-14  2:14 [PATCH, V6] Fix power10 fusion and -fstack-protector, PR target/105325 Michael Meissner
@ 2023-06-20 20:05 ` Segher Boessenkool
  0 siblings, 0 replies; 2+ messages in thread
From: Segher Boessenkool @ 2023-06-20 20:05 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Kewen.Lin, David Edelsohn, Peter Bergner

Hi!

The patch looks great now, thanks you!

But the commit message needs some work:

First off, the subject, which is a short (50 character max!) summary of
what the patch is about.
Fix power10 fusion and -fstack-protector, PR target/105325

There is absolutely nothing to do with stack protector, it does not
belong in the commit message at all, and certainly not in the subject!

On Tue, Jun 13, 2023 at 10:14:02PM -0400, Michael Meissner wrote:
> This patch fixes an issue where if you use the -fstack-protector and
> -mcpu=power10 options and you have a large stack frame, the GCC compiler will
> generate a LWA instruction with a large offset.

That is not the core issue, it is just one example where things went
wrong.  That prompted this patch, sure, so you can talk about that ten
or so lines down if you think it is important (I don't fwiw), but not at
the start here.  You should just say what was wrong, so people with a
short attention span can just skip this patch when looking through git
log (and even earlier if the subject would be good).

Commit messages are for *future* users.  Not for getting your patch
approved.

> Here is the initial fused initial insn that was created.  It refers to the
> stack location based off of the virtrual frame pointer:

The soft frame pointer, not a virtual one.  For PowerPC this is not a
real register and LRA will eventually replace it, sure.  "Virtual" here
in GCC has a very specific meaning; virtual things are replaced very
soon after expand.

> When the split2 pass is run after reload has finished the ds_form_mem_operand
> predicate that was used for lwa and ld no longer returns true.

Yes.  It is the wrong predicate to use here.  *That* is the problem.

>     2)	Delete ds_form_mem_operand since it is no longer used.

... and we don't expect to use it any time soon.

>     3)	Use the "YZ" constraints for ld/lwa instead of "m".

Yes, constraints and predicates.

> 	* config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems that
> 	allowed prefixed lwa to be generated.

You should not say what the *old* code did, in the changelog!

> --- a/gcc/config/rs6000/genfusion.pl
> +++ b/gcc/config/rs6000/genfusion.pl
> @@ -61,20 +61,30 @@ sub gen_ld_cmpi_p10_one
>    my $mempred = "non_update_memory_operand";
>    my $extend;
>  
> +  # We need to special case lwa.  The prefixed_load_p function in rs6000.cc
> +  # (which determines if a load instruction is prefixed) uses the fact that the
> +  # register mode is different from the memory mode, and that the sign_extend
> +  # attribute is set to use DS-form rules for the address instead of D-form.
> +  # If the register size is the same, prefixed_load_p assumes we are doing a
> +  # lwz.  We change to use an lwz and word compare if we don't need to sign
> +  # extend the SImode value.  Otherwise if we need the value, we need to
> +  # make sure the insn is marked as ds-form.
> +  my $lwa_insn = ($lmode eq "SI" && $ccmode eq "CC");

That is a pretty bad name, the variable does not hold an "insn" in any
way, shape, or form.  It is hardish to give it a good name because it
mixes two questions into one variable?  You can just repeat the tiny
conditions wherever you use them, and the code would be more readable
(and less cryptic!)

> +  if ($lwa_insn && $cmp_size eq "d") {

Name it "cmp_size_char" maybe?  "cmp_size" suggests a number.

> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
> @@ -0,0 +1,26 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target lp64 } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-require-effective-target powerpc_prefixed_addr } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
> +
> +/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
> +   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
> +   load + compare -1/0/1 patterns did not handle the possibility that the load
> +   might be prefixed.  The -fstack-protector option is needed to show the
> +   bug.  */

Mention the PR number somewhere in the text as well?  For grep etc.

Okay for trunk, with some more reasonable commmit message.  Thank you!
Also okay for all backports.


Segher

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2023-06-20 20:06 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-14  2:14 [PATCH, V6] Fix power10 fusion and -fstack-protector, PR target/105325 Michael Meissner
2023-06-20 20:05 ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).