public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH V5, 0/2] PR target/105325: Fix constraint issue with power10 fusion
@ 2023-05-10 15:37 Michael Meissner
  2023-05-10 15:38 ` [PATCH V5, 1/2] PR target/105325: Rewrite genfusion.pl's gen_ld_cmpi_p10 function Michael Meissner
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Michael Meissner @ 2023-05-10 15:37 UTC (permalink / raw)
  To: gcc-patches, Michael Meissner, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner

I have posted 4 previous versions of this patch (April 26th, March 28th, March
24th, and March 21st).

In this patch, rather than just add changes to the existing code in
genfusion.pl, I rewrote the function completely.  There are two patches within
this patch set:

    * The first patch rewrites the perl function to be more readable.  This
      patch produces the same output for fusion.md that the current version
      generates.

    * The second patch then using the rewrite in the first patch adds the
      changes to fix the problem.

The issue with the original bug is the power10 load GPR + cmpi -1/0/1 fusion
optimization generates illegal assembler code when the -fstack-protector option
is used.

Ultimately the code was dying because the fusion load + compare -1/0/1 patterns
did not handle the possibility that the load might be prefixed.

The main cause is the constraints for the individual loads in the fusion did not
match the machine.  In particular, LWA is a ds format instruction when it is
unprefixed.  The code did not also set the prefixed attribute correctly.

These patch hav been tested on:

    * Little endian power9 with both IEEE and IBM long double
    * Little endian power10
    * Big endian power8 using both 32-bit and 64-bit code generation.

Can I check these into the master branch?  Assuming I can check this in, I will
also commit to the active GCC branches after a burn-in period.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH V5, 1/2] PR target/105325: Rewrite genfusion.pl's gen_ld_cmpi_p10 function.
  2023-05-10 15:37 [PATCH V5, 0/2] PR target/105325: Fix constraint issue with power10 fusion Michael Meissner
@ 2023-05-10 15:38 ` Michael Meissner
  2023-05-26 14:35   ` Segher Boessenkool
  2023-05-10 15:40 ` [PATCH V5, 2/2] PR target/105325: Fix memory constraints for power10 fusion Michael Meissner
  2023-05-15 18:22 ` Ping: [PATCH V5] PR target/105325: Fix constraint issue with " Michael Meissner
  2 siblings, 1 reply; 6+ messages in thread
From: Michael Meissner @ 2023-05-10 15:38 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner

This patch rewrites the gen_ld_cmpi_p10 function in genfusion.pl to be clearer.
The resulting fusion.md file that this patch generates is exactly the same
output that the previous version of genfusion.pl generated.  The next patch in
this series will fix PR target/105325 (provide correct predicates and
constraints for power10 fusion of load and compare immediate).

This patch has been tested on:

    * Little endian power9 with both IEEE and IBM long double
    * Little endian power10
    * Big endian power8 using both 32-bit and 64-bit code generation.

Can I check this into the master branch?  Assuming I can check this in, I will
also commit to the active GCC branches after a burn-in period.

2023-05-10   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	PR target/105325
	* config/rs6000/genfusion.pl (mode_to_ldst_char): Delete.
	(print_ld_cmpi_p10): New function, split off from gen_ld_cmpi_p10.
	(gen_ld_cmpi_p10): Rewrite completely.
---
 gcc/config/rs6000/genfusion.pl | 248 +++++++++++++++++++++------------
 1 file changed, 157 insertions(+), 91 deletions(-)

diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index e4db352e0ce..81ba4b33940 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -45,103 +45,169 @@ print <<'EOF';
 
 EOF
 
-sub mode_to_ldst_char
+# Print the insns for load and compare with -1/0/1.
+# Arguments:
+# lmode      -- Integer mode ("DI", "SI", "HI", or "QI").
+# result     -- "clobber", "GPR", or $lmode
+# ccmode     -- Sign vs. unsigned ("CC" or "CCUNS").
+# mem_format -- Memory format ("d" or "ds").
+# cmpl       -- Suffix for compare ("l" or "")
+# const_pred -- Predicate for constant (i.e. -1/0/1 or 0/1).
+# extend     -- "sign", "zero", or "none".
+# echr       -- Suffix for load ("a", "z", or "").
+# load       -- Load instruction (i.e. "ld", "lwa", "lwz", etc.)
+# np         -- enum non_prefixed_form for memory type
+# constraint -- constraint to use
+# mem_pred   -- predicate for the memory operation
+
+sub print_ld_cmpi_p10
 {
-    my ($mode) = @_;
-    my %x = (DI => 'd', SI => 'w', HI => 'h', QI => 'b');
-    return $x{$mode} if exists $x{$mode};
-    return '?';
+  my ($lmode, $result, $ccmode, $cmpl, $const_pred,
+      $extend, $load, $np, $constraint, $mem_pred) = @_;
+
+  # For clobber, we need a SI/DI reg in case we split because we have to
+  # sign/zero extend.
+  my $clobbermode = ($lmode =~ m/^[HQ]I$/) ? "GPR" : $lmode;
+
+  # Break long print statements into smaller lines.
+  my $info = join (" ",
+		   "load mode is ${lmode} result mode is ${result}",
+		   "compare mode is ${ccmode} extend is ${extend}");
+
+  my $name = join ("",
+		   "${load}_cmp${cmpl}di_cr0_${lmode}",
+		   "_${result}_${ccmode}_${extend}");
+
+  my $cmp_op1 = "(match_operand:${lmode} 1 \"${mem_pred}\" \"${constraint}\")";
+
+  my $spaces = " " x (length ($ccmode) + 18);
+
+  print ";; load-cmpi fusion pattern generated by gen_ld_cmpi_p10\n";
+  print ";; ${info}\n";
+  print "(define_insn_and_split \"*${name}\"\n";
+  print "  [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
+  print "        (compare:${ccmode} ${cmp_op1}\n";
+  print "${spaces}(match_operand:${lmode} 3 \"${const_pred}\" \"n\")))\n";
+
+  if ($result eq "clobber")
+    {
+      print "   (clobber (match_scratch:${clobbermode} 0 \"=r\"))]\n";
+    }
+
+  else
+    {
+      my $load_op0 = "(match_operand:${result} 0 \"gpc_reg_operand\" \"=r\")";
+      my $load_op1 = (($result eq $lmode)
+		      ? "(match_dup 1)"
+		      : "(${extend}_extend:${result} (match_dup 1))");
+      print "   (set ${load_op0} ${load_op1})]\n";
+    }
+
+  # Do not match prefixed loads.  The machine only fuses non-prefixed loads
+  # with compare immediate.  Take into account whether the load is a ds-form
+  # or a d-form instruction.
+  print "  \"(TARGET_P10_FUSION)\"\n";
+  print "  \"${load}%X1 %0,%1\\;cmp${cmpl}di %2,%0,%3\"\n";
+  print "  \"&& reload_completed\n";
+  print "   && (cc_reg_not_cr0_operand (operands[2], CCmode)\n";
+  print "       || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),\n";
+  print "                                      ${lmode}mode, ${np}))\"\n";
+
+  if ($extend eq "none")
+    {
+      print "  [(set (match_dup 0) (match_dup 1))\n";
+    }
+
+  else
+    {
+      my $resultmode = ($result eq "clobber") ? $clobbermode : $result;
+      print "  [(set (match_dup 0) (${extend}_extend:${resultmode} (match_dup 1)))\n";
+    }
+
+  print "   (set (match_dup 2)\n";
+  print "        (compare:${ccmode} (match_dup 0) (match_dup 3)))]\n";
+  print "  \"\"\n";
+  print "  [(set_attr \"type\" \"fused_load_cmpi\")\n";
+  print "   (set_attr \"cost\" \"8\")\n";
+  print "   (set_attr \"length\" \"8\")])\n";
+  print "\n";
 }
 
 sub gen_ld_cmpi_p10
 {
-    my ($lmode, $ldst, $clobbermode, $result, $cmpl, $echr, $constpred,
-	$mempred, $ccmode, $np, $extend, $resultmode);
-  LMODE: foreach $lmode ('DI','SI','HI','QI') {
-      $ldst = mode_to_ldst_char($lmode);
-      $clobbermode = $lmode;
-      # For clobber, we need a SI/DI reg in case we
-      # split because we have to sign/zero extend.
-      if ($lmode eq 'HI' || $lmode eq 'QI') { $clobbermode = "GPR"; }
-    RESULT: foreach $result ('clobber', $lmode,  "EXT".$lmode) {
-	# EXTDI does not exist, and we cannot directly produce HI/QI results.
-	next RESULT if $result eq "EXTDI" || $result eq "HI" || $result eq "QI";
-	# Don't allow EXTQI because that would allow HI result which we can't do.
-	$result = "GPR" if $result eq "EXTQI";
-      CCMODE: foreach $ccmode ('CC','CCUNS') {
-	  $np = "NON_PREFIXED_D";
-	  $mempred = "non_update_memory_operand";
-	  if ( $ccmode eq 'CC' ) {
-	      next CCMODE if $lmode eq 'QI';
-	      if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
-		  # ld and lwa are both DS-FORM.
-		  $np = "NON_PREFIXED_DS";
-		  $mempred = "ds_form_mem_operand";
-	      }
-	      $cmpl = "";
-	      $echr = "a";
-	      $constpred = "const_m1_to_1_operand";
-	  } else {
-	      if ( $lmode eq 'DI' ) {
-		  # ld is DS-form, but lwz is not.
-		  $np = "NON_PREFIXED_DS";
-		  $mempred = "ds_form_mem_operand";
-	      }
-	      $cmpl = "l";
-	      $echr = "z";
-	      $constpred = "const_0_to_1_operand";
-	  }
-	  if ($lmode eq 'DI') { $echr = ""; }
-	  if ($result =~ m/^EXT/ || $result eq 'GPR' || $clobbermode eq 'GPR') {
-	      # We always need extension if result > lmode.
-	      if ( $ccmode eq 'CC' ) {
-		  $extend = "sign";
-	      } else {
-		  $extend = "zero";
-	      }
-	  } else {
-	      # Result of SI/DI does not need sign extension.
-	      $extend = "none";
-	  }
-	  print ";; load-cmpi fusion pattern generated by gen_ld_cmpi_p10\n";
-	  print ";; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend\n";
-
-	  print "(define_insn_and_split \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
-	  print "  [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
-	  print "        (compare:${ccmode} (match_operand:${lmode} 1 \"${mempred}\" \"m\")\n";
-	  if ($ccmode eq 'CCUNS') { print "   "; }
-	  print "                    (match_operand:${lmode} 3 \"${constpred}\" \"n\")))\n";
-	  if ($result eq 'clobber') {
-	      print "   (clobber (match_scratch:${clobbermode} 0 \"=r\"))]\n";
-	  } elsif ($result eq $lmode) {
-	      print "   (set (match_operand:${result} 0 \"gpc_reg_operand\" \"=r\") (match_dup 1))]\n";
-	  } else {
-	      print "   (set (match_operand:${result} 0 \"gpc_reg_operand\" \"=r\") (${extend}_extend:${result} (match_dup 1)))]\n";
-	  }
-	  print "  \"(TARGET_P10_FUSION)\"\n";
-	  print "  \"l${ldst}${echr}%X1 %0,%1\\;cmp${cmpl}di %2,%0,%3\"\n";
-	  print "  \"&& reload_completed\n";
-	  print "   && (cc_reg_not_cr0_operand (operands[2], CCmode)\n";
-	  print "       || !address_is_non_pfx_d_or_x (XEXP (operands[1], 0),\n";
-	  print "                                      ${lmode}mode, ${np}))\"\n";
-
-	  if ($extend eq "none") {
-	      print "  [(set (match_dup 0) (match_dup 1))\n";
-	  } else {
-	      $resultmode = $result;
-	      if ( $result eq 'clobber' ) { $resultmode = $clobbermode }
-	      print "  [(set (match_dup 0) (${extend}_extend:${resultmode} (match_dup 1)))\n";
-	  }
-	  print "   (set (match_dup 2)\n";
-	  print "        (compare:${ccmode} (match_dup 0) (match_dup 3)))]\n";
-	  print "  \"\"\n";
-	  print "  [(set_attr \"type\" \"fused_load_cmpi\")\n";
-	  print "   (set_attr \"cost\" \"8\")\n";
-	  print "   (set_attr \"length\" \"8\")])\n";
-	  print "\n";
-      }
+  my ($lmode, $result, $mem_format, $extend);
+
+  # Map mode to load instruction
+  my %signed_load = ("DI" => "ld",
+		     "SI" => "lwa",
+		     "HI" => "lha");
+
+  my %unsigned_load = ("DI" => "ld",
+		       "SI" => "lwz",
+		       "HI" => "lhz",
+		       "QI" => "lbz");
+
+  # Memory predicate to use.
+  my %signed_memory_predicate = ("DI" => "ds_form_mem_operand",
+				 "SI" => "ds_form_mem_operand",
+				 "HI" => "non_update_memory_operand");
+
+  my %unsigned_memory_predicate = ("DI" => "ds_form_mem_operand",
+				   "SI" => "non_update_memory_operand",
+				   "HI" => "non_update_memory_operand",
+				   "QI" => "non_update_memory_operand");
+
+  # Internal format of the memory instruction (enum non_prefixed_form) to use.
+  my %np = ("ds" => "NON_PREFIXED_DS",
+	    "d"  => "NON_PREFIXED_D");
+
+  # Result modes to use. Clobber is used when you are comparing the load to
+  # -1/0/1, but you are not using it otherwise.  EXTDI does not exist. We
+  # cannot directly use HI/QI results because we only have word and double word
+  # compared.  For promotion, don't allow EXTQI because that would allow HI
+  # results which we can't do (use GPR instead).
+  my %result_modes = ("DI" => ["clobber", "DI"],
+		      "SI" => ["clobber", "SI", "EXTSI" ],
+		      "HI" => ["clobber", "EXTHI" ],
+		      "QI" => ["clobber", "GPR" ]);
+
+  foreach $lmode ("DI", "SI", "HI", "QI")
+    {
+      foreach $result (@{ $result_modes{$lmode} })
+	{
+	  # Handle CCmode (sign extended compares to -1, 0, or 1).  We don't
+	  # have  a LBA instruction, so skip QImode.  Both LD and LWA are
+	  # DS-form instructions for signed loads.
+	  if ($lmode ne "QI")
+	    {
+	      $mem_format = ($lmode =~ m/^[DS]I$/) ? "ds" : "d";
+	      $extend = (($lmode eq "DI"
+			  || $lmode eq $result
+			  || ($lmode eq "SI" && $result eq "clobber"))
+			 ? "none"
+			 : "sign");
+
+	      print_ld_cmpi_p10 ($lmode, $result, "CC", "",
+				 "const_m1_to_1_operand", $extend,
+				 $signed_load{$lmode}, $np{$mem_format}, "m",
+				 $signed_memory_predicate{$lmode});
+	    }
+
+	  # Handle CCUNS mode (zero extended compares to 0 or 1.
+	  # LD is DS-form, but LWZ is not for unsigned loads.
+	  $mem_format = ($lmode eq "DI") ? "ds" : "d";
+	  $extend = (($lmode eq "DI"
+		      || $lmode eq $result
+		      || ($lmode eq "SI" && $result eq "clobber"))
+		     ? "none"
+		     : "zero");
+
+	  print_ld_cmpi_p10 ($lmode, $result, "CCUNS", "l",
+			     "const_0_to_1_operand", $extend,
+			     $unsigned_load{$lmode}, $np{$mem_format}, "m",
+			     $unsigned_memory_predicate{$lmode});
+	}
     }
-  }
 }
 
 sub gen_logical_addsubf
-- 
2.40.0


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH V5, 2/2] PR target/105325: Fix memory constraints for power10 fusion.
  2023-05-10 15:37 [PATCH V5, 0/2] PR target/105325: Fix constraint issue with power10 fusion Michael Meissner
  2023-05-10 15:38 ` [PATCH V5, 1/2] PR target/105325: Rewrite genfusion.pl's gen_ld_cmpi_p10 function Michael Meissner
@ 2023-05-10 15:40 ` Michael Meissner
  2023-05-26 15:02   ` Segher Boessenkool
  2023-05-15 18:22 ` Ping: [PATCH V5] PR target/105325: Fix constraint issue with " Michael Meissner
  2 siblings, 1 reply; 6+ messages in thread
From: Michael Meissner @ 2023-05-10 15:40 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner

This patch applies stricter predicates and constraints for LD and LWA
instructions with power10 fusion.  These instructions are DS-form instructions,
which means that the bottom 2 bits of the address must be 0.  In the past, we
did not use the stricter predicates and constraints, and if the user used the
-fstack-protector option, it would generate a non-prefixed load instruction
whose offset was too big if the stack is large.

This patch has been tested on:

    * Little endian power9 with both IEEE and IBM long double
    * Little endian power10
    * Big endian power8 using both 32-bit and 64-bit code generation.

Can I check this into the master branch?  Assuming I can check this in, I will
also commit to the active GCC branches after a burn-in period.

2023-05-10   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	PR target/105325
	* config/rs6000/genfusion.pl (print_ld_cmpi_p10): Use "YZ" constraints
	for DS-form loads.  Set the sign_extend attribute for loads that do sign
	extension.  Use the lwa_operand predicate for the LWA instruction.
	* config/rs6000/fusion.md: Regenerate.

gcc/testsuite/

	PR target/105325
	* g++.target/powerpc/pr105325.C: New test.
	* gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.
---
 gcc/config/rs6000/fusion.md                   | 17 +++++++-----
 gcc/config/rs6000/genfusion.pl                | 20 +++++++++++---
 gcc/testsuite/g++.target/powerpc/pr105325.C   | 26 +++++++++++++++++++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c    |  4 +--
 4 files changed, 54 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr105325.C

diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 81ba4b33940..836dbd20948 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -129,6 +129,12 @@ sub print_ld_cmpi_p10
   print "  \"\"\n";
   print "  [(set_attr \"type\" \"fused_load_cmpi\")\n";
   print "   (set_attr \"cost\" \"8\")\n";
+
+  if ($extend eq "sign")
+    {
+      print "   (set_attr \"sign_extend\" \"yes\")\n";
+    }
+
   print "   (set_attr \"length\" \"8\")])\n";
   print "\n";
 }
@@ -147,9 +153,9 @@ sub gen_ld_cmpi_p10
 		       "HI" => "lhz",
 		       "QI" => "lbz");
 
-  # Memory predicate to use.
+  # Memory predicate to use.  For LWA, use the special LWA_OPERAND.
   my %signed_memory_predicate = ("DI" => "ds_form_mem_operand",
-				 "SI" => "ds_form_mem_operand",
+				 "SI" => "lwa_operand",
 				 "HI" => "non_update_memory_operand");
 
   my %unsigned_memory_predicate = ("DI" => "ds_form_mem_operand",
@@ -161,6 +167,10 @@ sub gen_ld_cmpi_p10
   my %np = ("ds" => "NON_PREFIXED_DS",
 	    "d"  => "NON_PREFIXED_D");
 
+  # Constraint to use.
+  my %constraint = ("ds" => "YZ",
+		    "d"  => "m");
+
   # Result modes to use. Clobber is used when you are comparing the load to
   # -1/0/1, but you are not using it otherwise.  EXTDI does not exist. We
   # cannot directly use HI/QI results because we only have word and double word
@@ -189,7 +199,8 @@ sub gen_ld_cmpi_p10
 
 	      print_ld_cmpi_p10 ($lmode, $result, "CC", "",
 				 "const_m1_to_1_operand", $extend,
-				 $signed_load{$lmode}, $np{$mem_format}, "m",
+				 $signed_load{$lmode}, $np{$mem_format},
+				 $constraint{$mem_format},
 				 $signed_memory_predicate{$lmode});
 	    }
 
@@ -204,7 +215,8 @@ sub gen_ld_cmpi_p10
 
 	  print_ld_cmpi_p10 ($lmode, $result, "CCUNS", "l",
 			     "const_0_to_1_operand", $extend,
-			     $unsigned_load{$lmode}, $np{$mem_format}, "m",
+			     $unsigned_load{$lmode}, $np{$mem_format},
+			     $constraint{$mem_format},
 			     $unsigned_memory_predicate{$lmode});
 	}
     }
diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..da9953d9ad9 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -106,7 +106,7 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
 ;; load mode is SI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:SI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -148,7 +148,7 @@ (define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none"
 ;; load mode is SI result mode is SI compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -190,7 +190,7 @@ (define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none"
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -205,6 +205,7 @@ (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -247,6 +248,7 @@ (define_insn_and_split "*lha_cmpdi_cr0_HI_clobber_CC_sign"
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -289,6 +291,7 @@ (define_insn_and_split "*lha_cmpdi_cr0_HI_EXTHI_CC_sign"
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..d0e66a0b897
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,26 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  The -fstack-protector option is needed to show the
+   bug.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
index 526a026d874..ca7297375a4 100644
--- a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
+++ b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
@@ -61,7 +61,7 @@ TEST(int8_t)
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"      16 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   4 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       8 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   2 { target lp64 } } } */
 
@@ -73,6 +73,6 @@ TEST(int8_t)
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"       8 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       9 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"      16 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   6 { target ilp32 } } } */
-- 
2.40.0


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Ping: [PATCH V5] PR target/105325: Fix constraint issue with power10 fusion
  2023-05-10 15:37 [PATCH V5, 0/2] PR target/105325: Fix constraint issue with power10 fusion Michael Meissner
  2023-05-10 15:38 ` [PATCH V5, 1/2] PR target/105325: Rewrite genfusion.pl's gen_ld_cmpi_p10 function Michael Meissner
  2023-05-10 15:40 ` [PATCH V5, 2/2] PR target/105325: Fix memory constraints for power10 fusion Michael Meissner
@ 2023-05-15 18:22 ` Michael Meissner
  2 siblings, 0 replies; 6+ messages in thread
From: Michael Meissner @ 2023-05-15 18:22 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner

Ping both patches:

Patch #1, rewrite genfusion.pl's code for load and compare immediate fusion to
be more readable.  This patch produces the same output as the current sources.

| Date: Wed, 10 May 2023 11:38:55 -0400
| Subject: Re: [PATCH V5, 1/2] PR target/105325: Rewrite genfusion.pl's gen_ld_cmpi_p10 function.
| Message-ID: <ZFu6j7SZaCUZpTYb@toto.the-meissners.org>

Patch #2, implement the fix for PR target/105325:

| Date: Wed, 10 May 2023 11:40:00 -0400
| Subject: [PATCH V5, 2/2] PR target/105325: Fix memory constraints for power10 fusion.
| Message-ID: <ZFu60EPEOJTV/GA1@toto.the-meissners.org>

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH V5, 1/2] PR target/105325: Rewrite genfusion.pl's gen_ld_cmpi_p10 function.
  2023-05-10 15:38 ` [PATCH V5, 1/2] PR target/105325: Rewrite genfusion.pl's gen_ld_cmpi_p10 function Michael Meissner
@ 2023-05-26 14:35   ` Segher Boessenkool
  0 siblings, 0 replies; 6+ messages in thread
From: Segher Boessenkool @ 2023-05-26 14:35 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Kewen.Lin, David Edelsohn, Peter Bergner

Hi Mike,

On Wed, May 10, 2023 at 11:38:55AM -0400, Michael Meissner wrote:
> This patch rewrites the gen_ld_cmpi_p10 function in genfusion.pl to be clearer.

That is not at all what I asked for, even if I would agree the code is
nicer to read now (I don't).

What I asked for, what is needed, is for your patches to be readable.
This is a prerequisite for them to be reviewable, which is a
prerequisite for them to be approvable.  One way to do that is to split
out refactorings (which I asked for) and rewrites (which you did) to
earlier patches in the series.  Pure refactoring are easy to review:
they change exactly nothing in what code is executed.  Rewrites are much
harder to review.  But even then we can hope you didn't slip up once in
a hundred lines of code, sure.

The later patches can then be much more readable because there isn't so
much noise mixed in.

> Assuming I can check this in, I will
> also commit to the active GCC branches after a burn-in period.

No, you will never do that.  You always need approval for that.  We have
these procedures for a reason.  We do not want other things than what
was approved committed, doubly so if *nothing* was approved.

> 	* config/rs6000/genfusion.pl (mode_to_ldst_char): Delete.

This is a regression.

> +# Print the insns for load and compare with -1/0/1.
> +# Arguments:
> +# lmode      -- Integer mode ("DI", "SI", "HI", or "QI").
> +# result     -- "clobber", "GPR", or $lmode
> +# ccmode     -- Sign vs. unsigned ("CC" or "CCUNS").
> +# mem_format -- Memory format ("d" or "ds").
> +# cmpl       -- Suffix for compare ("l" or "")
> +# const_pred -- Predicate for constant (i.e. -1/0/1 or 0/1).
> +# extend     -- "sign", "zero", or "none".
> +# echr       -- Suffix for load ("a", "z", or "").
> +# load       -- Load instruction (i.e. "ld", "lwa", "lwz", etc.)
> +# np         -- enum non_prefixed_form for memory type
> +# constraint -- constraint to use
> +# mem_pred   -- predicate for the memory operation

If you need a huge block comment for your sub argument, that is a
not-so-subtle hint that you need to refactor.  Or if this was supposed
to be a refactoring, that something went terribly wrong :-(


Segher

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH V5, 2/2] PR target/105325: Fix memory constraints for power10 fusion.
  2023-05-10 15:40 ` [PATCH V5, 2/2] PR target/105325: Fix memory constraints for power10 fusion Michael Meissner
@ 2023-05-26 15:02   ` Segher Boessenkool
  0 siblings, 0 replies; 6+ messages in thread
From: Segher Boessenkool @ 2023-05-26 15:02 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Kewen.Lin, David Edelsohn, Peter Bergner

On Wed, May 10, 2023 at 11:40:00AM -0400, Michael Meissner wrote:
> This patch applies stricter predicates and constraints for LD and LWA
> instructions with power10 fusion.  These instructions are DS-form instructions,
> which means that the bottom 2 bits of the address must be 0.

The low two bits of the offset, yes.

> --- a/gcc/config/rs6000/genfusion.pl
> +++ b/gcc/config/rs6000/genfusion.pl
> @@ -129,6 +129,12 @@ sub print_ld_cmpi_p10
>    print "  \"\"\n";
>    print "  [(set_attr \"type\" \"fused_load_cmpi\")\n";
>    print "   (set_attr \"cost\" \"8\")\n";
> +
> +  if ($extend eq "sign")
> +    {
> +      print "   (set_attr \"sign_extend\" \"yes\")\n";
> +    }

You never ever need backslashes like this in Perl code, btw.  For
example:
      print qq{   (set_attr "sign_extend" "yes")\n};
or
      print qq/   (set_attr "sign_extend" "yes")\n/;
or
            print <<"HERE"
   (set_attr "sign_extend" "yes")
HERE
or millions of other ways, all of which are much nicer than cramped code
that tries to look like C (but has very different semantics in all ways
that matter).  (Also zillions of ways that are worse still, but that is
the price of freedom maybe :-) )

> -  # Memory predicate to use.
> +  # Memory predicate to use.  For LWA, use the special LWA_OPERAND.

Explain *why*?  It is obvious *what*!

Maybe just split the series into more patches?
> @@ -0,0 +1,26 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target lp64 } */
> +/* { dg-require-effective-target power10_ok } */

power10_ok should no longer exist, btw.  Technical debt has to be
repaid :-/

This patch is readable btw.  Thanks :-)


Segher

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-05-26 15:03 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-10 15:37 [PATCH V5, 0/2] PR target/105325: Fix constraint issue with power10 fusion Michael Meissner
2023-05-10 15:38 ` [PATCH V5, 1/2] PR target/105325: Rewrite genfusion.pl's gen_ld_cmpi_p10 function Michael Meissner
2023-05-26 14:35   ` Segher Boessenkool
2023-05-10 15:40 ` [PATCH V5, 2/2] PR target/105325: Fix memory constraints for power10 fusion Michael Meissner
2023-05-26 15:02   ` Segher Boessenkool
2023-05-15 18:22 ` Ping: [PATCH V5] PR target/105325: Fix constraint issue with " Michael Meissner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).