public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, V4] PR target/105325, Make load/cmp fusion know about prefixed loads.
@ 2023-04-26 16:18 Michael Meissner
  2023-05-02 22:32 ` Segher Boessenkool
  0 siblings, 1 reply; 3+ messages in thread
From: Michael Meissner @ 2023-04-26 16:18 UTC (permalink / raw)
  To: gcc-patches, Michael Meissner, Segher Boessenkool, Kewen.Lin,
	David Edelsohn, Peter Bergner

I posted a version of patch on March 21st, a second version on March 24th, and
the third version on March 28th.

The V4 patch just adds a new condition to the new test case.  Previously, I was
using 'powerpc_prefixed_addr' to determine whether the GCC compiler would
automatically generate prefixed addresses.  The V4 version also adds a check
for 'power10_ok'.  Power10_ok is needed in case the compiler could generate
prefixed addresses, but the assembler does not support prefixed instructions.

The V3 patch makes some code changes suggested in the genfusion.pl code from
the last 2 patch submissions.  The fusion.md that is produced by genfusion.pl
is the same in all 3 versions.

In V3, I changed the genfusion.pl to match the suggestion for code layout.  I
also used the correct comment for each of the instructions (in the 2nd patch,
the when I rewrote the comments about ld and lwa being DS format instructions,
I had put the ld comment in the section handling lwa, and vice versa).

In V3, I also removed lp64 from the new test.  When I first added the prefixed
code, it was only done for 64-bit, but now it is allowed for 32-bit.  However,
the case that shows up (lwa) would not hit in 32-bit, since it only generates
lwz and not lwa.  It also would not generate ld.  But the test does pass when
it is built with -m32.

The issue with the original bug is the power10 load GPR + cmpi -1/0/1 fusion
optimization generates illegal assembler code.

Ultimately the code was dying because the fusion load + compare -1/0/1 patterns
did not handle the possibility that the load might be prefixed.

The main cause is the constraints for the individual loads in the fusion did not
match the machine.  In particular, LWA is a ds format instruction when it is
unprefixed.  The code did not also set the prefixed attribute correctly.

This patch rewrites the genfusion.pl script so that it will have more accurate
constraints for the LWA and LD instructions (which are DS instructions).  The
updated genfusion.pl was then run to update fusion.md.  Finally, the code for
the "prefixed" attribute is modified so that it considers load + compare
immediate patterns to be like the normal load insns in checking whether
operand[1] is a prefixed instruction.

I have tested this code on a power9 little endian system (with long double
being IEEE 128-bit and IBM 128-bit), a power10 little endian system, and a
power8 big endian system, testing both 32-bit and 64-bit code generation.

For the V4 changes I also built the compiler on a big endian system with an
older assembler, and I verified that the pr105325.C test was listed as
unsupported.

Can I put this code into the master branch, and after a waiting period, apply
it to the GCC 12 and GCC 11 branches (the bug does show up in those branches,
and the patch applies without change).

2023-04-26   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	PR target/105325
	* gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
	of the ld and lwa instructions which use the DS encoding instead of D.
	Use the YZ constraint for these loads.	Handle prefixed loads better.
	Set the sign_extend attribute as appropriate.
	* gcc/config/rs6000/fusion.md: Regenerate.
	* gcc/config/rs6000/rs6000.md (prefixed attribute): Add fused_load_cmpi
	instructions to the list of instructions that might have a prefixed load
	instruction.

gcc/testsuite/

	PR target/105325
	* g++.target/powerpc/pr105325.C: New test.
	* gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.
---
 gcc/config/rs6000/fusion.md                   | 17 +++++++-----
 gcc/config/rs6000/genfusion.pl                | 26 ++++++++++++++++---
 gcc/config/rs6000/rs6000.md                   |  2 +-
 gcc/testsuite/g++.target/powerpc/pr105325.C   | 25 ++++++++++++++++++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c    |  4 +--
 5 files changed, 60 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr105325.C

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..da9953d9ad9 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -106,7 +106,7 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
 ;; load mode is SI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:SI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -148,7 +148,7 @@ (define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none"
 ;; load mode is SI result mode is SI compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -190,7 +190,7 @@ (define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none"
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -205,6 +205,7 @@ (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -247,6 +248,7 @@ (define_insn_and_split "*lha_cmpdi_cr0_HI_clobber_CC_sign"
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -289,6 +291,7 @@ (define_insn_and_split "*lha_cmpdi_cr0_HI_EXTHI_CC_sign"
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index e4db352e0ce..4f367cadc52 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -56,7 +56,7 @@ sub mode_to_ldst_char
 sub gen_ld_cmpi_p10
 {
     my ($lmode, $ldst, $clobbermode, $result, $cmpl, $echr, $constpred,
-	$mempred, $ccmode, $np, $extend, $resultmode);
+	$mempred, $ccmode, $np, $extend, $resultmode, $constraint);
   LMODE: foreach $lmode ('DI','SI','HI','QI') {
       $ldst = mode_to_ldst_char($lmode);
       $clobbermode = $lmode;
@@ -71,21 +71,34 @@ sub gen_ld_cmpi_p10
       CCMODE: foreach $ccmode ('CC','CCUNS') {
 	  $np = "NON_PREFIXED_D";
 	  $mempred = "non_update_memory_operand";
+	  $constraint = "m";
 	  if ( $ccmode eq 'CC' ) {
 	      next CCMODE if $lmode eq 'QI';
-	      if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
+	      if ( $lmode eq 'HI' ) {
+		  $np = "NON_PREFIXED_D";
+		  $mempred = "non_update_memory_operand";
+		  $echr = "a";
+	      } elsif ( $lmode eq 'SI' ) {
+		  # ld and lwa are both DS-FORM.
+		  $np = "NON_PREFIXED_DS";
+		  $mempred = "lwa_operand";
+		  $echr = "a";
+		  $constraint = "YZ";
+	      } elsif ( $lmode eq 'DI' ) {
 		  # ld and lwa are both DS-FORM.
 		  $np = "NON_PREFIXED_DS";
 		  $mempred = "ds_form_mem_operand";
+		  $echr = "";
+		  $constraint = "YZ";
 	      }
 	      $cmpl = "";
-	      $echr = "a";
 	      $constpred = "const_m1_to_1_operand";
 	  } else {
 	      if ( $lmode eq 'DI' ) {
 		  # ld is DS-form, but lwz is not.
 		  $np = "NON_PREFIXED_DS";
 		  $mempred = "ds_form_mem_operand";
+		  $constraint = "YZ";
 	      }
 	      $cmpl = "l";
 	      $echr = "z";
@@ -108,7 +121,7 @@ sub gen_ld_cmpi_p10
 
 	  print "(define_insn_and_split \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
 	  print "  [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
-	  print "        (compare:${ccmode} (match_operand:${lmode} 1 \"${mempred}\" \"m\")\n";
+	  print "        (compare:${ccmode} (match_operand:${lmode} 1 \"${mempred}\" \"${constraint}\")\n";
 	  if ($ccmode eq 'CCUNS') { print "   "; }
 	  print "                    (match_operand:${lmode} 3 \"${constpred}\" \"n\")))\n";
 	  if ($result eq 'clobber') {
@@ -137,6 +150,11 @@ sub gen_ld_cmpi_p10
 	  print "  \"\"\n";
 	  print "  [(set_attr \"type\" \"fused_load_cmpi\")\n";
 	  print "   (set_attr \"cost\" \"8\")\n";
+
+	  if ($extend eq "sign") {
+		  print "   (set_attr \"sign_extend\" \"yes\")\n";
+	  }
+
 	  print "   (set_attr \"length\" \"8\")])\n";
 	  print "\n";
       }
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index ec783803820..7d6c94aee5b 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -302,7 +302,7 @@ (define_attr "prefixed" "no,yes"
 	      (eq_attr "maybe_prefixed" "no"))
 	 (const_string "no")
 
-	 (eq_attr "type" "load,fpload,vecload")
+	 (eq_attr "type" "load,fpload,vecload,vecload,fused_load_cmpi")
 	 (if_then_else (match_test "prefixed_load_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..d79cea30534
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,25 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
index 526a026d874..ca7297375a4 100644
--- a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
+++ b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
@@ -61,7 +61,7 @@ TEST(int8_t)
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"      16 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   4 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       8 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   2 { target lp64 } } } */
 
@@ -73,6 +73,6 @@ TEST(int8_t)
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"       8 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       9 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"      16 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   6 { target ilp32 } } } */
-- 
2.39.2


-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH, V4] PR target/105325, Make load/cmp fusion know about prefixed loads.
  2023-04-26 16:18 [PATCH, V4] PR target/105325, Make load/cmp fusion know about prefixed loads Michael Meissner
@ 2023-05-02 22:32 ` Segher Boessenkool
  2023-05-03 16:56   ` Michael Meissner
  0 siblings, 1 reply; 3+ messages in thread
From: Segher Boessenkool @ 2023-05-02 22:32 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Kewen.Lin, David Edelsohn, Peter Bergner

On Wed, Apr 26, 2023 at 12:18:36PM -0400, Michael Meissner wrote:
> 	* gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
> 	of the ld and lwa instructions which use the DS encoding instead of D.
> 	Use the YZ constraint for these loads.	Handle prefixed loads better.

Don't use tabs in the middle of a line.

"Handle prefixed loads better" is not what the patch does, and/or is so
vague as to be useless.

> --- a/gcc/config/rs6000/genfusion.pl
> +++ b/gcc/config/rs6000/genfusion.pl
> @@ -56,7 +56,7 @@ sub mode_to_ldst_char
>  sub gen_ld_cmpi_p10
>  {
>      my ($lmode, $ldst, $clobbermode, $result, $cmpl, $echr, $constpred,
> -	$mempred, $ccmode, $np, $extend, $resultmode);
> +	$mempred, $ccmode, $np, $extend, $resultmode, $constraint);
>    LMODE: foreach $lmode ('DI','SI','HI','QI') {
>        $ldst = mode_to_ldst_char($lmode);
>        $clobbermode = $lmode;
> @@ -71,21 +71,34 @@ sub gen_ld_cmpi_p10
>        CCMODE: foreach $ccmode ('CC','CCUNS') {
>  	  $np = "NON_PREFIXED_D";
>  	  $mempred = "non_update_memory_operand";
> +	  $constraint = "m";
>  	  if ( $ccmode eq 'CC' ) {
>  	      next CCMODE if $lmode eq 'QI';
> -	      if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
> +	      if ( $lmode eq 'HI' ) {
> +		  $np = "NON_PREFIXED_D";
> +		  $mempred = "non_update_memory_operand";
> +		  $echr = "a";
> +	      } elsif ( $lmode eq 'SI' ) {
> +		  # ld and lwa are both DS-FORM.
> +		  $np = "NON_PREFIXED_DS";
> +		  $mempred = "lwa_operand";
> +		  $echr = "a";
> +		  $constraint = "YZ";
> +	      } elsif ( $lmode eq 'DI' ) {
>  		  # ld and lwa are both DS-FORM.
>  		  $np = "NON_PREFIXED_DS";
>  		  $mempred = "ds_form_mem_operand";
> +		  $echr = "";
> +		  $constraint = "YZ";
>  	      }
>  	      $cmpl = "";
> -	      $echr = "a";
>  	      $constpred = "const_m1_to_1_operand";
>  	  } else {
>  	      if ( $lmode eq 'DI' ) {
>  		  # ld is DS-form, but lwz is not.
>  		  $np = "NON_PREFIXED_DS";
>  		  $mempred = "ds_form_mem_operand";
> +		  $constraint = "YZ";
>  	      }
>  	      $cmpl = "l";
>  	      $echr = "z";
> @@ -108,7 +121,7 @@ sub gen_ld_cmpi_p10
>  
>  	  print "(define_insn_and_split \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
>  	  print "  [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
> -	  print "        (compare:${ccmode} (match_operand:${lmode} 1 \"${mempred}\" \"m\")\n";
> +	  print "        (compare:${ccmode} (match_operand:${lmode} 1 \"${mempred}\" \"${constraint}\")\n";
>  	  if ($ccmode eq 'CCUNS') { print "   "; }
>  	  print "                    (match_operand:${lmode} 3 \"${constpred}\" \"n\")))\n";
>  	  if ($result eq 'clobber') {
> @@ -137,6 +150,11 @@ sub gen_ld_cmpi_p10
>  	  print "  \"\"\n";
>  	  print "  [(set_attr \"type\" \"fused_load_cmpi\")\n";
>  	  print "   (set_attr \"cost\" \"8\")\n";
> +
> +	  if ($extend eq "sign") {
> +		  print "   (set_attr \"sign_extend\" \"yes\")\n";
> +	  }
> +
>  	  print "   (set_attr \"length\" \"8\")])\n";
>  	  print "\n";
>        }

This already was a 90-line function that did too many things.  Now it is
bigger and does more things, and the patch is unintelligible.

Please first factor things.  There are many more things terrible Perl
code style here (like all of the quoting), but where to start :-/

I once again spent many hours trying to review this, and once again
failed.  Please write better code, and please make better patches.

> index ec783803820..7d6c94aee5b 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -302,7 +302,7 @@ (define_attr "prefixed" "no,yes"
>  	      (eq_attr "maybe_prefixed" "no"))
>  	 (const_string "no")
>  
> -	 (eq_attr "type" "load,fpload,vecload")
> +	 (eq_attr "type" "load,fpload,vecload,vecload,fused_load_cmpi")

Don't duplicate vecload.

> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
> @@ -0,0 +1,25 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target lp64 } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-require-effective-target powerpc_prefixed_addr } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */

The power10_ok selector still is terribly broken (it allows only some
variants of 64-bit Linux and nothing more, to start with).  Do we still
need it in any case?

Same for powerpc_prefixed_addr.  Is there any supported target that does
not have a working assembler?

What is -fstack-protector here for?  That should be documented, or
better, it should just be removed if possible.

> -/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       4 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       8 { target lp64 } } } */

> -/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       9 { target ilp32 } } } */
> +/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"      16 { target ilp32 } } } */

Why are these new counts correct?


Segher

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH, V4] PR target/105325, Make load/cmp fusion know about prefixed loads.
  2023-05-02 22:32 ` Segher Boessenkool
@ 2023-05-03 16:56   ` Michael Meissner
  0 siblings, 0 replies; 3+ messages in thread
From: Michael Meissner @ 2023-05-03 16:56 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, gcc-patches, Kewen.Lin, David Edelsohn, Peter Bergner

On Tue, May 02, 2023 at 05:32:04PM -0500, Segher Boessenkool wrote:
> On Wed, Apr 26, 2023 at 12:18:36PM -0400, Michael Meissner wrote:
> > 	* gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
> > 	of the ld and lwa instructions which use the DS encoding instead of D.
> > 	Use the YZ constraint for these loads.	Handle prefixed loads better.
> 
> Don't use tabs in the middle of a line.
> 
> "Handle prefixed loads better" is not what the patch does, and/or is so
> vague as to be useless.

Ok.

> > --- a/gcc/config/rs6000/genfusion.pl
> > +++ b/gcc/config/rs6000/genfusion.pl
> > @@ -56,7 +56,7 @@ sub mode_to_ldst_char
> >  sub gen_ld_cmpi_p10
> >  {
> >      my ($lmode, $ldst, $clobbermode, $result, $cmpl, $echr, $constpred,
> > -	$mempred, $ccmode, $np, $extend, $resultmode);
> > +	$mempred, $ccmode, $np, $extend, $resultmode, $constraint);
> >    LMODE: foreach $lmode ('DI','SI','HI','QI') {
> >        $ldst = mode_to_ldst_char($lmode);
> >        $clobbermode = $lmode;
> > @@ -71,21 +71,34 @@ sub gen_ld_cmpi_p10
> >        CCMODE: foreach $ccmode ('CC','CCUNS') {
> >  	  $np = "NON_PREFIXED_D";
> >  	  $mempred = "non_update_memory_operand";
> > +	  $constraint = "m";
> >  	  if ( $ccmode eq 'CC' ) {
> >  	      next CCMODE if $lmode eq 'QI';
> > -	      if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
> > +	      if ( $lmode eq 'HI' ) {
> > +		  $np = "NON_PREFIXED_D";
> > +		  $mempred = "non_update_memory_operand";
> > +		  $echr = "a";
> > +	      } elsif ( $lmode eq 'SI' ) {
> > +		  # ld and lwa are both DS-FORM.
> > +		  $np = "NON_PREFIXED_DS";
> > +		  $mempred = "lwa_operand";
> > +		  $echr = "a";
> > +		  $constraint = "YZ";
> > +	      } elsif ( $lmode eq 'DI' ) {
> >  		  # ld and lwa are both DS-FORM.
> >  		  $np = "NON_PREFIXED_DS";
> >  		  $mempred = "ds_form_mem_operand";
> > +		  $echr = "";
> > +		  $constraint = "YZ";
> >  	      }
> >  	      $cmpl = "";
> > -	      $echr = "a";
> >  	      $constpred = "const_m1_to_1_operand";
> >  	  } else {
> >  	      if ( $lmode eq 'DI' ) {
> >  		  # ld is DS-form, but lwz is not.
> >  		  $np = "NON_PREFIXED_DS";
> >  		  $mempred = "ds_form_mem_operand";
> > +		  $constraint = "YZ";
> >  	      }
> >  	      $cmpl = "l";
> >  	      $echr = "z";
> > @@ -108,7 +121,7 @@ sub gen_ld_cmpi_p10
> >  
> >  	  print "(define_insn_and_split \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
> >  	  print "  [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
> > -	  print "        (compare:${ccmode} (match_operand:${lmode} 1 \"${mempred}\" \"m\")\n";
> > +	  print "        (compare:${ccmode} (match_operand:${lmode} 1 \"${mempred}\" \"${constraint}\")\n";
> >  	  if ($ccmode eq 'CCUNS') { print "   "; }
> >  	  print "                    (match_operand:${lmode} 3 \"${constpred}\" \"n\")))\n";
> >  	  if ($result eq 'clobber') {
> > @@ -137,6 +150,11 @@ sub gen_ld_cmpi_p10
> >  	  print "  \"\"\n";
> >  	  print "  [(set_attr \"type\" \"fused_load_cmpi\")\n";
> >  	  print "   (set_attr \"cost\" \"8\")\n";
> > +
> > +	  if ($extend eq "sign") {
> > +		  print "   (set_attr \"sign_extend\" \"yes\")\n";
> > +	  }
> > +
> >  	  print "   (set_attr \"length\" \"8\")])\n";
> >  	  print "\n";
> >        }
> 
> This already was a 90-line function that did too many things.  Now it is
> bigger and does more things, and the patch is unintelligible.
> 
> Please first factor things.  There are many more things terrible Perl
> code style here (like all of the quoting), but where to start :-/

Note, I didn't write the original patch nor the original code (Aaron did), but
without a lot of rewrites it will take more time to get it done.

> I once again spent many hours trying to review this, and once again
> failed.  Please write better code, and please make better patches.
> 
> > index ec783803820..7d6c94aee5b 100644
> > --- a/gcc/config/rs6000/rs6000.md
> > +++ b/gcc/config/rs6000/rs6000.md
> > @@ -302,7 +302,7 @@ (define_attr "prefixed" "no,yes"
> >  	      (eq_attr "maybe_prefixed" "no"))
> >  	 (const_string "no")
> >  
> > -	 (eq_attr "type" "load,fpload,vecload")
> > +	 (eq_attr "type" "load,fpload,vecload,vecload,fused_load_cmpi")
> 
> Don't duplicate vecload.

Ok.

> > --- /dev/null
> > +++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
> > @@ -0,0 +1,25 @@
> > +/* { dg-do assemble } */
> > +/* { dg-require-effective-target lp64 } */
> > +/* { dg-require-effective-target power10_ok } */
> > +/* { dg-require-effective-target powerpc_prefixed_addr } */
> > +/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
> 
> The power10_ok selector still is terribly broken (it allows only some
> variants of 64-bit Linux and nothing more, to start with).  Do we still
> need it in any case?
>
> Same for powerpc_prefixed_addr.  Is there any supported target that does
> not have a working assembler?

In particular, I was building it on a power8 BE system, and I happened to use
the system assembler instead of a new assembler that I had built.  The
powerpc_prefixed_addr test only tests whether the compiler spits out a prefixed
instruction by default if you use -mcpu=power10 (which succeeded) but because
the test involved running the assembler, and it failed because the assembler
doesn't know about -mpower10.  If I build my own assembler, it works fine.

> What is -fstack-protector here for?  That should be documented, or
> better, it should just be removed if possible.

While I can add documentation to the test, -fstack-protector is required to
show up the bug.  Both of the users experiencing the issue had used
-fstack-protector.  If you do not use -fstack-protector, you get no error.  I
believe it is due to the fact that -fstack-protector runs much later than
expand, and it creates code that doesn't take into account whether the load is
prefixed or not.  Because it is two insns joined together, you don't get the
normal processing that converts a LWA into a PLWA.

> 
> > -/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       4 { target lp64 } } } */
> > +/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       8 { target lp64 } } } */
> 
> > -/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       9 { target ilp32 } } } */
> > +/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"      16 { target ilp32 } } } */
> 
> Why are these new counts correct?

I haven't looked in detail.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meissner@linux.ibm.com

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-05-03 16:56 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-26 16:18 [PATCH, V4] PR target/105325, Make load/cmp fusion know about prefixed loads Michael Meissner
2023-05-02 22:32 ` Segher Boessenkool
2023-05-03 16:56   ` Michael Meissner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).