public inbox for gcc-cvs@sourceware.org
* [gcc(refs/users/meissner/heads/work116-dmf)] Make load/cmp fusion know about prefixed loads.
@ 2023-03-24 22:33 Michael Meissner
From: Michael Meissner @ 2023-03-24 22:33 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:ec28dbf2f6572ebeb1f7da22d8fc42794c060345

commit ec28dbf2f6572ebeb1f7da22d8fc42794c060345
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Fri Mar 24 18:33:28 2023 -0400

    Make load/cmp fusion know about prefixed loads.
    
    The bug is that the power10 load GPR + cmpi -1/0/1 fusion optimization
    generates illegal assembler code.
    
    Ultimately the code was dying because the fusion load + compare -1/0/1 patterns
    did not handle the possibility that the load might be prefixed.
    
    The main cause is that the constraints for the individual loads in the fusion
    did not match the machine.  In particular, LWA is a DS format instruction when
    it is unprefixed.  The code also did not set the prefixed attribute correctly.
    
    This patch rewrites the genfusion.pl script so that it will have more accurate
    constraints for the LWA and LD instructions (which are DS instructions).  The
    updated genfusion.pl was then run to update fusion.md.  Finally, the code for
    the "prefixed" attribute is modified so that it considers load + compare
    immediate patterns to be like the normal load insns in checking whether
    operand[1] is a prefixed instruction.
    
    I posted a version of this patch on March 21st.  This patch makes some
    suggested code changes to genfusion.pl.
    
    2023-03-21   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            PR target/105325
            * gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
            of the ld and lwa instructions which use the DS encoding instead of D.
            Use the YZ constraint for these loads.  Handle prefixed loads better.
            Set the sign_extend attribute as appropriate.
            * gcc/config/rs6000/fusion.md: Regenerate.
            * gcc/config/rs6000/rs6000.md (prefixed attribute): Add fused_load_cmpi
            instructions to the list of instructions that might have a prefixed load
            instruction.
    
    gcc/testsuite/
    
            PR target/105325
            * g++.target/powerpc/pr105325.C: New test.
            * gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.
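
Editorial illustration, not part of the commit: the encoding rule behind the bug can be sketched in a few lines. An unprefixed DS-form load (ld, lwa) takes a 16-bit signed displacement whose low two bits must be zero, an unprefixed D-form load (lwz, lha, lhz, lbz) takes any 16-bit signed displacement, and a power10 prefixed load takes a 34-bit signed displacement. The names `LoadForm`, `fits_signed`, and `classify` below are hypothetical and are not GCC functions; this is only a sketch of the rule, not of how rs6000 implements it.

```cpp
#include <cstdint>

// Addressing forms for a base-register + displacement load.
enum class LoadForm { DForm, DSForm, Prefixed, NotEncodable };

// True if `value` fits in a signed two's-complement field of `bits` bits.
constexpr bool fits_signed(std::int64_t value, int bits) {
    const std::int64_t limit = std::int64_t{1} << (bits - 1);
    return value >= -limit && value < limit;
}

// Hypothetical helper: pick the encoding for a load whose unprefixed form is
// DS-form (unprefixed_is_ds == true, e.g. ld/lwa) or D-form (e.g. lwz/lha).
constexpr LoadForm classify(std::int64_t offset, bool unprefixed_is_ds) {
    if (fits_signed(offset, 16)) {
        if (!unprefixed_is_ds) {
            return LoadForm::DForm;
        }
        // DS-form encodes offset >> 2, so the low two bits must be zero.
        if ((offset & 3) == 0) {
            return LoadForm::DSForm;
        }
    }
    // Prefixed loads carry a 34-bit signed displacement with no alignment rule.
    if (fits_signed(offset, 34)) {
        return LoadForm::Prefixed;
    }
    return LoadForm::NotEncodable;
}
```

In the new test, `_metRCTable` sits 40000 bytes past the start of `extMeasure` (assuming a 4-byte `int`), which is outside the 16-bit signed range. Under this sketch, `classify(40000, true)` yields `LoadForm::Prefixed`, which is the case the old fusion patterns did not account for.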

Diff:
---
 gcc/config/rs6000/fusion.md                        | 17 +++++++-----
 gcc/config/rs6000/genfusion.pl                     | 32 +++++++++++++++++-----
 gcc/config/rs6000/rs6000.md                        |  2 +-
 gcc/testsuite/g++.target/powerpc/pr105325.C        | 24 ++++++++++++++++
 .../gcc.target/powerpc/fusion-p10-ldcmpi.c         |  4 +--
 5 files changed, 62 insertions(+), 17 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index d45fb138a70..da9953d9ad9 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -22,7 +22,7 @@
 ;; load mode is DI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -43,7 +43,7 @@
 ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (clobber (match_scratch:DI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -64,7 +64,7 @@
 ;; load mode is DI result mode is DI compare mode is CC extend is none
 (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                     (match_operand:DI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -85,7 +85,7 @@
 ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
 (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
   [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
-        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
+        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                        (match_operand:DI 3 "const_0_to_1_operand" "n")))
    (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -106,7 +106,7 @@
 ;; load mode is SI result mode is clobber compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (clobber (match_scratch:SI 0 "=r"))]
   "(TARGET_P10_FUSION)"
@@ -148,7 +148,7 @@
 ;; load mode is SI result mode is SI compare mode is CC extend is none
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
   "(TARGET_P10_FUSION)"
@@ -190,7 +190,7 @@
 ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
 (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
   [(set (match_operand:CC 2 "cc_reg_operand" "=x")
-        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
+        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                     (match_operand:SI 3 "const_m1_to_1_operand" "n")))
    (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
   "(TARGET_P10_FUSION)"
@@ -205,6 +205,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -247,6 +248,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
@@ -289,6 +291,7 @@
   ""
   [(set_attr "type" "fused_load_cmpi")
    (set_attr "cost" "8")
+   (set_attr "sign_extend" "yes")
    (set_attr "length" "8")])
 
 ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index e4db352e0ce..fa9ab7e9704 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -56,7 +56,7 @@ sub mode_to_ldst_char
 sub gen_ld_cmpi_p10
 {
     my ($lmode, $ldst, $clobbermode, $result, $cmpl, $echr, $constpred,
-	$mempred, $ccmode, $np, $extend, $resultmode);
+	$ccmode, $extend, $resultmode);
   LMODE: foreach $lmode ('DI','SI','HI','QI') {
       $ldst = mode_to_ldst_char($lmode);
       $clobbermode = $lmode;
@@ -69,23 +69,36 @@ sub gen_ld_cmpi_p10
 	# Don't allow EXTQI because that would allow HI result which we can't do.
 	$result = "GPR" if $result eq "EXTQI";
       CCMODE: foreach $ccmode ('CC','CCUNS') {
-	  $np = "NON_PREFIXED_D";
-	  $mempred = "non_update_memory_operand";
+	  my $np = "NON_PREFIXED_D";
+	  my $mempred = "non_update_memory_operand";
+	  my $constraint = "m";
 	  if ( $ccmode eq 'CC' ) {
 	      next CCMODE if $lmode eq 'QI';
-	      if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
-		  # ld and lwa are both DS-FORM.
+	      if ( $lmode eq 'HI' ) {
+		  $np = "NON_PREFIXED_D";
+		  $mempred = "non_update_memory_operand";
+		  $echr = "a";
+	      } elsif ( $lmode eq 'SI' ) {
+		  # ld is DS-FORM.
+		  $np = "NON_PREFIXED_DS";
+		  $mempred = "lwa_operand";
+		  $echr = "a";
+		  $constraint = "YZ";
+	      } elsif ( $lmode eq 'DI' ) {
+		  # lwa is DS-FORM.
 		  $np = "NON_PREFIXED_DS";
 		  $mempred = "ds_form_mem_operand";
+		  $echr = "";
+		  $constraint = "YZ";
 	      }
 	      $cmpl = "";
-	      $echr = "a";
 	      $constpred = "const_m1_to_1_operand";
 	  } else {
 	      if ( $lmode eq 'DI' ) {
 		  # ld is DS-form, but lwz is not.
 		  $np = "NON_PREFIXED_DS";
 		  $mempred = "ds_form_mem_operand";
+		  $constraint = "YZ";
 	      }
 	      $cmpl = "l";
 	      $echr = "z";
@@ -108,7 +121,7 @@ sub gen_ld_cmpi_p10
 
 	  print "(define_insn_and_split \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
 	  print "  [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
-	  print "        (compare:${ccmode} (match_operand:${lmode} 1 \"${mempred}\" \"m\")\n";
+	  print "        (compare:${ccmode} (match_operand:${lmode} 1 \"${mempred}\" \"${constraint}\")\n";
 	  if ($ccmode eq 'CCUNS') { print "   "; }
 	  print "                    (match_operand:${lmode} 3 \"${constpred}\" \"n\")))\n";
 	  if ($result eq 'clobber') {
@@ -137,6 +150,11 @@ sub gen_ld_cmpi_p10
 	  print "  \"\"\n";
 	  print "  [(set_attr \"type\" \"fused_load_cmpi\")\n";
 	  print "   (set_attr \"cost\" \"8\")\n";
+
+	  if ($extend eq "sign") {
+		  print "   (set_attr \"sign_extend\" \"yes\")\n";
+	  }
+
 	  print "   (set_attr \"length\" \"8\")])\n";
 	  print "\n";
       }
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 44f7dd509cb..d836a8a58b3 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -302,7 +302,7 @@
 	      (eq_attr "maybe_prefixed" "no"))
 	 (const_string "no")
 
-	 (eq_attr "type" "load,fpload,vecload")
+	 (eq_attr "type" "load,fpload,vecload,vecload,fused_load_cmpi")
 	 (if_then_else (match_test "prefixed_load_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
index e69de29bb2d..f4ab384daa7 100644
--- a/gcc/testsuite/g++.target/powerpc/pr105325.C
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,24 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
index 526a026d874..ca7297375a4 100644
--- a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
+++ b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
@@ -61,7 +61,7 @@ TEST(int8_t)
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"      16 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   4 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       8 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target lp64 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   2 { target lp64 } } } */
 
@@ -73,6 +73,6 @@ TEST(int8_t)
 /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"       8 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       9 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"      16 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   6 { target ilp32 } } } */


* [gcc(refs/users/meissner/heads/work116-dmf)] Make load/cmp fusion know about prefixed loads.
@ 2023-03-28 20:12 Michael Meissner
From: Michael Meissner @ 2023-03-28 20:12 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:9d761a7e61742c0af1cc1f1152579221f61f6013

commit 9d761a7e61742c0af1cc1f1152579221f61f6013
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Mar 28 16:11:32 2023 -0400

    Make load/cmp fusion know about prefixed loads.
    
    I posted a version of this patch on March 21st and a second version on
    March 24th.  This patch makes some code changes to genfusion.pl that were
    suggested for the last 2 patch submissions.  The fusion.md that is produced
    by genfusion.pl is the same in all 3 versions.
    
    I changed genfusion.pl to match the suggested code layout.  I also used the
    correct comment for each of the instructions (in the 2nd patch, when I
    rewrote the comments about ld and lwa being DS format instructions, I had
    put the ld comment in the section handling lwa, and vice versa).
    
    I also removed lp64 from the new test.  When I first added the prefixed code,
    it was only done for 64-bit, but now it is allowed for 32-bit.  However, the
    case that shows up (lwa) would not hit in 32-bit, since 32-bit code only
    generates lwz and not lwa.  It also would not generate ld.  But the test does
    pass when it is built with -m32.
    
    The bug is that the power10 load GPR + cmpi -1/0/1 fusion optimization
    generates illegal assembler code.
    
    Ultimately the code was dying because the fusion load + compare -1/0/1 patterns
    did not handle the possibility that the load might be prefixed.
    
    The main cause is that the constraints for the individual loads in the fusion
    did not match the machine.  In particular, LWA is a DS format instruction when
    it is unprefixed.  The code also did not set the prefixed attribute correctly.
    
    This patch rewrites the genfusion.pl script so that it will have more accurate
    constraints for the LWA and LD instructions (which are DS instructions).  The
    updated genfusion.pl was then run to update fusion.md.  Finally, the code for
    the "prefixed" attribute is modified so that it considers load + compare
    immediate patterns to be like the normal load insns in checking whether
    operand[1] is a prefixed instruction.
    
    I have tested this code on a power9 little endian system (with long double
    being IEEE 128-bit and IBM 128-bit), a power10 little endian system, and a
    power8 big endian system, testing both 32-bit and 64-bit code generation.  Can
    I put this code into the master branch and, after a waiting period, apply it to
    the GCC 12 and GCC 11 branches?  The bug does show up in those branches, and
    the patch applies without change.
    
    2023-03-27   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            PR target/105325
            * gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
            of the ld and lwa instructions which use the DS encoding instead of D.
            Use the YZ constraint for these loads.  Handle prefixed loads better.
            Set the sign_extend attribute as appropriate.
            * gcc/config/rs6000/fusion.md: Regenerate.
            * gcc/config/rs6000/rs6000.md (prefixed attribute): Add fused_load_cmpi
            instructions to the list of instructions that might have a prefixed load
            instruction.
            * ChangeLog.meissner: Update.
    
    gcc/testsuite/
    
            PR target/105325
            * g++.target/powerpc/pr105325.C: New test.
            * gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.
    ---
     gcc/config/rs6000/fusion.md                   | 17 +++++----
     gcc/config/rs6000/genfusion.pl                | 36 ++++++++++++++-----
     gcc/config/rs6000/rs6000.md                   |  2 +-
     gcc/testsuite/g++.target/powerpc/pr105325.C   | 23 ++++++++++++
     .../gcc.target/powerpc/fusion-p10-ldcmpi.c    |  4 +--
     5 files changed, 64 insertions(+), 18 deletions(-)
     create mode 100644 gcc/testsuite/g++.target/powerpc/pr105325.C
    
    diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
    index d45fb138a70..da9953d9ad9 100644
    --- a/gcc/config/rs6000/fusion.md
    +++ b/gcc/config/rs6000/fusion.md
    @@ -22,7 +22,7 @@
     ;; load mode is DI result mode is clobber compare mode is CC extend is none
     (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
       [(set (match_operand:CC 2 "cc_reg_operand" "=x")
    -        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
    +        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                         (match_operand:DI 3 "const_m1_to_1_operand" "n")))
        (clobber (match_scratch:DI 0 "=r"))]
       "(TARGET_P10_FUSION)"
    @@ -43,7 +43,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_clobber_CC_none"
     ;; load mode is DI result mode is clobber compare mode is CCUNS extend is none
     (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
       [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
    -        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
    +        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                            (match_operand:DI 3 "const_0_to_1_operand" "n")))
        (clobber (match_scratch:DI 0 "=r"))]
       "(TARGET_P10_FUSION)"
    @@ -64,7 +64,7 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_clobber_CCUNS_none"
     ;; load mode is DI result mode is DI compare mode is CC extend is none
     (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
       [(set (match_operand:CC 2 "cc_reg_operand" "=x")
    -        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "m")
    +        (compare:CC (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                         (match_operand:DI 3 "const_m1_to_1_operand" "n")))
        (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
       "(TARGET_P10_FUSION)"
    @@ -85,7 +85,7 @@ (define_insn_and_split "*ld_cmpdi_cr0_DI_DI_CC_none"
     ;; load mode is DI result mode is DI compare mode is CCUNS extend is none
     (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
       [(set (match_operand:CCUNS 2 "cc_reg_operand" "=x")
    -        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "m")
    +        (compare:CCUNS (match_operand:DI 1 "ds_form_mem_operand" "YZ")
                            (match_operand:DI 3 "const_0_to_1_operand" "n")))
        (set (match_operand:DI 0 "gpc_reg_operand" "=r") (match_dup 1))]
       "(TARGET_P10_FUSION)"
    @@ -106,7 +106,7 @@ (define_insn_and_split "*ld_cmpldi_cr0_DI_DI_CCUNS_none"
     ;; load mode is SI result mode is clobber compare mode is CC extend is none
     (define_insn_and_split "*lwa_cmpdi_cr0_SI_clobber_CC_none"
       [(set (match_operand:CC 2 "cc_reg_operand" "=x")
    -        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
    +        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                         (match_operand:SI 3 "const_m1_to_1_operand" "n")))
        (clobber (match_scratch:SI 0 "=r"))]
       "(TARGET_P10_FUSION)"
    @@ -148,7 +148,7 @@ (define_insn_and_split "*lwz_cmpldi_cr0_SI_clobber_CCUNS_none"
     ;; load mode is SI result mode is SI compare mode is CC extend is none
     (define_insn_and_split "*lwa_cmpdi_cr0_SI_SI_CC_none"
       [(set (match_operand:CC 2 "cc_reg_operand" "=x")
    -        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
    +        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                         (match_operand:SI 3 "const_m1_to_1_operand" "n")))
        (set (match_operand:SI 0 "gpc_reg_operand" "=r") (match_dup 1))]
       "(TARGET_P10_FUSION)"
    @@ -190,7 +190,7 @@ (define_insn_and_split "*lwz_cmpldi_cr0_SI_SI_CCUNS_none"
     ;; load mode is SI result mode is EXTSI compare mode is CC extend is sign
     (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
       [(set (match_operand:CC 2 "cc_reg_operand" "=x")
    -        (compare:CC (match_operand:SI 1 "ds_form_mem_operand" "m")
    +        (compare:CC (match_operand:SI 1 "lwa_operand" "YZ")
                         (match_operand:SI 3 "const_m1_to_1_operand" "n")))
        (set (match_operand:EXTSI 0 "gpc_reg_operand" "=r") (sign_extend:EXTSI (match_dup 1)))]
       "(TARGET_P10_FUSION)"
    @@ -205,6 +205,7 @@ (define_insn_and_split "*lwa_cmpdi_cr0_SI_EXTSI_CC_sign"
       ""
       [(set_attr "type" "fused_load_cmpi")
        (set_attr "cost" "8")
    +   (set_attr "sign_extend" "yes")
        (set_attr "length" "8")])
    
     ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
    @@ -247,6 +248,7 @@ (define_insn_and_split "*lha_cmpdi_cr0_HI_clobber_CC_sign"
       ""
       [(set_attr "type" "fused_load_cmpi")
        (set_attr "cost" "8")
    +   (set_attr "sign_extend" "yes")
        (set_attr "length" "8")])
    
     ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
    @@ -289,6 +291,7 @@ (define_insn_and_split "*lha_cmpdi_cr0_HI_EXTHI_CC_sign"
       ""
       [(set_attr "type" "fused_load_cmpi")
        (set_attr "cost" "8")
    +   (set_attr "sign_extend" "yes")
        (set_attr "length" "8")])
    
     ;; load-cmpi fusion pattern generated by gen_ld_cmpi_p10
    diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
    index e4db352e0ce..05d66c18dd9 100755
    --- a/gcc/config/rs6000/genfusion.pl
    +++ b/gcc/config/rs6000/genfusion.pl
    @@ -56,7 +56,7 @@ sub mode_to_ldst_char
     sub gen_ld_cmpi_p10
     {
         my ($lmode, $ldst, $clobbermode, $result, $cmpl, $echr, $constpred,
    -       $mempred, $ccmode, $np, $extend, $resultmode);
    +       $mempred, $ccmode, $np, $extend, $resultmode, $constraint);
       LMODE: foreach $lmode ('DI','SI','HI','QI') {
           $ldst = mode_to_ldst_char($lmode);
           $clobbermode = $lmode;
    @@ -69,23 +69,38 @@ sub gen_ld_cmpi_p10
            # Don't allow EXTQI because that would allow HI result which we can't do.
            $result = "GPR" if $result eq "EXTQI";
           CCMODE: foreach $ccmode ('CC','CCUNS') {
    -         $np = "NON_PREFIXED_D";
    -         $mempred = "non_update_memory_operand";
              if ( $ccmode eq 'CC' ) {
                  next CCMODE if $lmode eq 'QI';
    -             if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
    -                 # ld and lwa are both DS-FORM.
    +             if ( $lmode eq 'HI' ) {
    +                 $np = "NON_PREFIXED_D";
    +                 $mempred = "non_update_memory_operand";
    +                 $echr = "a";
    +                 $constraint = "m";
    +             } elsif ( $lmode eq 'SI' ) {
    +                 # lwa is DS-FORM.
    +                 $np = "NON_PREFIXED_DS";
    +                 $mempred = "lwa_operand";
    +                 $echr = "a";
    +                 $constraint = "YZ";
    +             } elsif ( $lmode eq 'DI' ) {
    +                 # ld is DS-FORM.
                      $np = "NON_PREFIXED_DS";
                      $mempred = "ds_form_mem_operand";
    +                 $echr = "";
    +                 $constraint = "YZ";
                  }
                  $cmpl = "";
    -             $echr = "a";
                  $constpred = "const_m1_to_1_operand";
              } else {
                  if ( $lmode eq 'DI' ) {
    -                 # ld is DS-form, but lwz is not.
    +                 # ld is DS-form
                      $np = "NON_PREFIXED_DS";
                      $mempred = "ds_form_mem_operand";
    +                 $constraint = "YZ";
    +             } else {
    +                 $np = "NON_PREFIXED_D";
    +                 $mempred = "non_update_memory_operand";
    +                 $constraint = "m";
                  }
                  $cmpl = "l";
                  $echr = "z";
    @@ -108,7 +123,7 @@ sub gen_ld_cmpi_p10
    
              print "(define_insn_and_split \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
              print "  [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n";
    -         print "        (compare:${ccmode} (match_operand:${lmode} 1 \"${mempred}\" \"m\")\n";
    +         print "        (compare:${ccmode} (match_operand:${lmode} 1 \"${mempred}\" \"${constraint}\")\n";
              if ($ccmode eq 'CCUNS') { print "   "; }
              print "                    (match_operand:${lmode} 3 \"${constpred}\" \"n\")))\n";
              if ($result eq 'clobber') {
    @@ -137,6 +152,11 @@ sub gen_ld_cmpi_p10
              print "  \"\"\n";
              print "  [(set_attr \"type\" \"fused_load_cmpi\")\n";
              print "   (set_attr \"cost\" \"8\")\n";
    +
    +         if ($extend eq "sign") {
    +                 print "   (set_attr \"sign_extend\" \"yes\")\n";
    +         }
    +
              print "   (set_attr \"length\" \"8\")])\n";
              print "\n";
           }
    diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
    index 44f7dd509cb..d836a8a58b3 100644
    --- a/gcc/config/rs6000/rs6000.md
    +++ b/gcc/config/rs6000/rs6000.md
    @@ -302,7 +302,7 @@ (define_attr "prefixed" "no,yes"
                  (eq_attr "maybe_prefixed" "no"))
             (const_string "no")
    
    -        (eq_attr "type" "load,fpload,vecload")
    +        (eq_attr "type" "load,fpload,vecload,vecload,fused_load_cmpi")
             (if_then_else (match_test "prefixed_load_p (insn)")
                           (const_string "yes")
                           (const_string "no"))
    diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
    new file mode 100644
    index 00000000000..e42c8f9b30f
    --- /dev/null
    +++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
    @@ -0,0 +1,23 @@
    +/* { dg-do assemble } */
    +/* { dg-require-effective-target powerpc_prefixed_addr } */
    +/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
    +
    +/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
    +   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
    +   load + compare -1/0/1 patterns did not handle the possibility that the load
    +   might be prefixed.  */
    +
    +struct Ath__array1D {
    +  int _current;
    +  int getCnt() { return _current; }
    +};
    +struct extMeasure {
    +  int _mapTable[10000];
    +  Ath__array1D _metRCTable;
    +};
    +void measureRC() {
    +  extMeasure m;
    +  for (; m._metRCTable.getCnt();)
    +    for (;;)
    +      ;
    +}
    diff --git a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
    index 526a026d874..ca7297375a4 100644
    --- a/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
    +++ b/gcc/testsuite/gcc.target/powerpc/fusion-p10-ldcmpi.c
    @@ -61,7 +61,7 @@ TEST(int8_t)
     /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"      16 { target lp64 } } } */
     /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   4 { target lp64 } } } */
     /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target lp64 } } } */
    -/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       4 { target lp64 } } } */
    +/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       8 { target lp64 } } } */
     /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target lp64 } } } */
     /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   2 { target lp64 } } } */
    
    @@ -73,6 +73,6 @@ TEST(int8_t)
     /* { dg-final { scan-assembler-times "lha_cmpdi_cr0_HI_clobber_CC_sign"       8 { target ilp32 } } } */
     /* { dg-final { scan-assembler-times "lhz_cmpldi_cr0_HI_clobber_CCUNS_zero"   2 { target ilp32 } } } */
     /* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_EXTSI_CC_sign"         0 { target ilp32 } } } */
    -/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"       9 { target ilp32 } } } */
    +/* { dg-final { scan-assembler-times "lwa_cmpdi_cr0_SI_clobber_CC_none"      16 { target ilp32 } } } */
     /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_EXTSI_CCUNS_zero"     0 { target ilp32 } } } */
     /* { dg-final { scan-assembler-times "lwz_cmpldi_cr0_SI_clobber_CCUNS_none"   6 { target ilp32 } } } */
    --
    2.39.2
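
Editorial illustration, not part of the commit: the operand-1 selection that gen_ld_cmpi_p10 performs after this patch can be summarized as a small lookup. The strings are taken from the genfusion.pl diff above. The `LoadOperand` struct and `select_load_operand` function are hypothetical names used only for this sketch, and the rest of the script ($echr, $cmpl, $constpred, result modes) is deliberately left out.

```cpp
#include <optional>
#include <string>

// The three values the script picks for the fused load's memory operand.
struct LoadOperand {
    std::string np;          // value of $np in the script
    std::string mempred;     // predicate for operand 1
    std::string constraint;  // constraint for operand 1
};

// Hypothetical summary of the per-(lmode, ccmode) choice in gen_ld_cmpi_p10.
// Returns std::nullopt where the script skips the combination.
std::optional<LoadOperand> select_load_operand(const std::string& lmode,
                                               const std::string& ccmode) {
    const LoadOperand d_form{"NON_PREFIXED_D", "non_update_memory_operand", "m"};

    if (ccmode == "CC") {
        if (lmode == "QI") {
            return std::nullopt;  // "next CCMODE if $lmode eq 'QI'"
        }
        if (lmode == "HI") {
            return d_form;  // D-form load, plain "m" constraint
        }
        if (lmode == "SI") {
            // lwa is DS-form and gets its own predicate.
            return LoadOperand{"NON_PREFIXED_DS", "lwa_operand", "YZ"};
        }
        if (lmode == "DI") {
            // ld is DS-form.
            return LoadOperand{"NON_PREFIXED_DS", "ds_form_mem_operand", "YZ"};
        }
        return std::nullopt;  // the script only iterates over DI, SI, HI, QI
    }

    // CCUNS: only ld (DI) is DS-form; every other mode uses the D-form values.
    if (lmode == "DI") {
        return LoadOperand{"NON_PREFIXED_DS", "ds_form_mem_operand", "YZ"};
    }
    return d_form;
}
```

Read this way, the patch changes two things for the signed SI case: the predicate becomes `lwa_operand` instead of `ds_form_mem_operand`, and all DS-form cases use the `YZ` constraint instead of `m`.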

Diff:
---
 gcc/ChangeLog.meissner                      | 11 +++++++----
 gcc/config/rs6000/genfusion.pl              | 16 +++++++++-------
 gcc/config/rs6000/rs6000.md                 |  2 +-
 gcc/testsuite/g++.target/powerpc/pr105325.C | 23 +++++++++++++++++++++++
 4 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/gcc/ChangeLog.meissner b/gcc/ChangeLog.meissner
index 781e30b3a21..d60e78b028b 100644
--- a/gcc/ChangeLog.meissner
+++ b/gcc/ChangeLog.meissner
@@ -1,4 +1,4 @@
-==================== Branch work116-dmf, patch #1 ====================
+==================== Branch work116-dmf, patch #2 ====================
 
 Make load/cmp fusion know about prefixed loads.
 
@@ -19,10 +19,11 @@ the "prefixed" attribute is modified so that it considers load + compare
 immediate patterns to be like the normal load insns in checking whether
 operand[1] is a prefixed instruction.
 
-I posted a version of patch on March 21st.  This patch makes some code changes
-suggested in the genfusion.pl code.
+I posted a version of patch on March 21st and a second version on March 24th.
+This patch makes some code changes suggested in the genfusion.pl code.  The
+fusion.md that is produced by genfusion.pl is the same in all 3 versions.
 
-2023-03-21   Michael Meissner  <meissner@linux.ibm.com>
+2023-03-27   Michael Meissner  <meissner@linux.ibm.com>
 
 gcc/
 
@@ -42,6 +43,8 @@ gcc/testsuite/
 	* g++.target/powerpc/pr105325.C: New test.
 	* gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.
 
+==================== Branch work116-dmf, patch #1 was reverted ====================
+
 ==================== Branch work116-dmf, baseline ====================
 
 2023-03-24   Michael Meissner  <meissner@linux.ibm.com>
diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index fa9ab7e9704..05d66c18dd9 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -56,7 +56,7 @@ sub mode_to_ldst_char
 sub gen_ld_cmpi_p10
 {
     my ($lmode, $ldst, $clobbermode, $result, $cmpl, $echr, $constpred,
-	$ccmode, $extend, $resultmode);
+	$mempred, $ccmode, $np, $extend, $resultmode, $constraint);
   LMODE: foreach $lmode ('DI','SI','HI','QI') {
       $ldst = mode_to_ldst_char($lmode);
       $clobbermode = $lmode;
@@ -69,23 +69,21 @@ sub gen_ld_cmpi_p10
 	# Don't allow EXTQI because that would allow HI result which we can't do.
 	$result = "GPR" if $result eq "EXTQI";
       CCMODE: foreach $ccmode ('CC','CCUNS') {
-	  my $np = "NON_PREFIXED_D";
-	  my $mempred = "non_update_memory_operand";
-	  my $constraint = "m";
 	  if ( $ccmode eq 'CC' ) {
 	      next CCMODE if $lmode eq 'QI';
 	      if ( $lmode eq 'HI' ) {
 		  $np = "NON_PREFIXED_D";
 		  $mempred = "non_update_memory_operand";
 		  $echr = "a";
+		  $constraint = "m";
 	      } elsif ( $lmode eq 'SI' ) {
-		  # ld is DS-FORM.
+		  # lwa is DS-FORM.
 		  $np = "NON_PREFIXED_DS";
 		  $mempred = "lwa_operand";
 		  $echr = "a";
 		  $constraint = "YZ";
 	      } elsif ( $lmode eq 'DI' ) {
-		  # lwa is DS-FORM.
+		  # ld is DS-FORM.
 		  $np = "NON_PREFIXED_DS";
 		  $mempred = "ds_form_mem_operand";
 		  $echr = "";
@@ -95,10 +93,14 @@ sub gen_ld_cmpi_p10
 	      $constpred = "const_m1_to_1_operand";
 	  } else {
 	      if ( $lmode eq 'DI' ) {
-		  # ld is DS-form, but lwz is not.
+		  # ld is DS-form
 		  $np = "NON_PREFIXED_DS";
 		  $mempred = "ds_form_mem_operand";
 		  $constraint = "YZ";
+	      } else {
+		  $np = "NON_PREFIXED_D";
+		  $mempred = "non_update_memory_operand";
+		  $constraint = "m";
 	      }
 	      $cmpl = "l";
 	      $echr = "z";
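
The selection logic in gen_ld_cmpi_p10 after this change can be summarized
with a small sketch. This is not part of the patch: `LoadInfo` and
`select_load_info` are hypothetical names, the code assumes C++17, and the
"YZ" constraint for the signed DI case is an assumption taken from the commit
log, since that assignment falls outside the context shown in the hunk.

```cpp
#include <string_view>

// Hypothetical summary of the three values genfusion.pl picks per load.
struct LoadInfo {
    std::string_view np;          // non-prefixed address form
    std::string_view mempred;     // memory operand predicate
    std::string_view constraint;  // memory operand constraint
};

// lmode is "DI", "SI", "HI" or "QI"; is_signed is true for the CC patterns
// and false for the CCUNS patterns.  The script skips the CC/QI combination
// entirely; this sketch does not model that skip.
constexpr LoadInfo select_load_info(std::string_view lmode, bool is_signed) {
    if (is_signed && lmode == "SI") {
        // lwa is DS-form.
        return {"NON_PREFIXED_DS", "lwa_operand", "YZ"};
    }
    if (lmode == "DI") {
        // ld is DS-form for both the signed and unsigned compares.
        return {"NON_PREFIXED_DS", "ds_form_mem_operand", "YZ"};
    }
    // lha, lwz, lhz and lbz are D-form.
    return {"NON_PREFIXED_D", "non_update_memory_operand", "m"};
}

static_assert(select_load_info("SI", true).mempred == "lwa_operand",
              "signed SI load uses lwa");
static_assert(select_load_info("SI", false).constraint == "m",
              "unsigned SI load uses lwz, which is D-form");
```

The point of the restructuring in the hunk above is that every branch now
assigns all three values explicitly, instead of relying on defaults set at the
top of the CCMODE loop.
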
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 44f7dd509cb..d836a8a58b3 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -302,7 +302,7 @@
 	      (eq_attr "maybe_prefixed" "no"))
 	 (const_string "no")
 
-	 (eq_attr "type" "load,fpload,vecload")
+	 (eq_attr "type" "load,fpload,vecload,vecload,fused_load_cmpi")
 	 (if_then_else (match_test "prefixed_load_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))
diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..e42c8f9b30f
--- /dev/null
+++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
@@ -0,0 +1,23 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target powerpc_prefixed_addr } */
+/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
+
+/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
+   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
+   load + compare -1/0/1 patterns did not handle the possibility that the load
+   might be prefixed.  */
+
+struct Ath__array1D {
+  int _current;
+  int getCnt() { return _current; }
+};
+struct extMeasure {
+  int _mapTable[10000];
+  Ath__array1D _metRCTable;
+};
+void measureRC() {
+  extMeasure m;
+  for (; m._metRCTable.getCnt();)
+    for (;;)
+      ;
+}
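
Why this test case needs a prefixed load can be seen from the displacement
ranges. The following is a sketch, not part of the patch: the helper names are
hypothetical, a 4-byte int is assumed, and the in-struct offset stands in for
the real stack offset, which depends on the frame layout.

```cpp
#include <cstdint>

// Unprefixed D-form loads (lwz, lha, lbz, ...) take a signed 16-bit
// displacement.
constexpr bool fits_d_form(std::int64_t offset) {
    return offset >= -32768 && offset <= 32767;
}

// Unprefixed DS-form loads (ld, lwa) encode a 14-bit field with two implied
// zero bits, so the displacement must also be a multiple of 4.
constexpr bool fits_ds_form(std::int64_t offset) {
    return fits_d_form(offset) && (offset % 4) == 0;
}

// Prefixed loads (pld, plwa, plwz, ...) take a signed 34-bit displacement.
constexpr bool fits_prefixed(std::int64_t offset) {
    return offset >= -(std::int64_t{1} << 33)
        && offset < (std::int64_t{1} << 33);
}

// Offset of _metRCTable._current inside extMeasure, assuming 4-byte int:
// it follows int _mapTable[10000].
constexpr std::int64_t current_offset = 10000 * 4;

static_assert(!fits_d_form(current_offset), "too large for lwz");
static_assert(!fits_ds_form(current_offset), "too large for lwa");
static_assert(fits_prefixed(current_offset), "reachable with plwz or plwa");
static_assert(fits_d_form(6) && !fits_ds_form(6), "DS-form needs multiples of 4");
```

An offset of 40000 bytes is beyond the reach of any unprefixed load, so the
fused load + compare pattern has to accept and correctly emit the prefixed
form.
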


* [gcc(refs/users/meissner/heads/work116-dmf)] Make load/cmp fusion know about prefixed loads.
@ 2023-03-24 22:33 Michael Meissner
  0 siblings, 0 replies; 3+ messages in thread
From: Michael Meissner @ 2023-03-24 22:33 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:b362bc06c993a5cbc9d34ae185de88c15630fede

commit b362bc06c993a5cbc9d34ae185de88c15630fede
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Fri Mar 24 18:32:38 2023 -0400

    Make load/cmp fusion know about prefixed loads.
    
    The bug is that the power10 load GPR + cmpi -1/0/1 fusion optimization
    generates illegal assembler code.
    
    Ultimately the code was dying because the fusion load + compare -1/0/1 patterns
    did not handle the possibility that the load might be prefixed.
    
    The main cause is that the constraints for the individual loads in the fusion
    did not match the machine.  In particular, LWA is a DS-format instruction when
    it is unprefixed.  The code also did not set the prefixed attribute correctly.
    
    This patch rewrites the genfusion.pl script so that it will have more accurate
    constraints for the LWA and LD instructions (which are DS instructions).  The
    updated genfusion.pl was then run to update fusion.md.  Finally, the code for
    the "prefixed" attribute is modified so that it considers load + compare
    immediate patterns to be like the normal load insns in checking whether
    operand[1] is a prefixed instruction.
    
    I posted a version of this patch on March 21st.  This patch makes some code
    changes that were suggested for the genfusion.pl code.
    
    2023-03-21   Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            PR target/105325
            * gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
            of the ld and lwa instructions which use the DS encoding instead of D.
            Use the YZ constraint for these loads.  Handle prefixed loads better.
            Set the sign_extend attribute as appropriate.
            * gcc/config/rs6000/fusion.md: Regenerate.
            * gcc/config/rs6000/rs6000.md (prefixed attribute): Add fused_load_cmpi
            instructions to the list of instructions that might have a prefixed load
            instruction.
    
    gcc/testsuite/
    
            PR target/105325
            * g++.target/powerpc/pr105325.C: New test.
            * gcc.target/powerpc/fusion-p10-ldcmpi.c: Adjust insn counts.

Diff:
---
 gcc/testsuite/g++.target/powerpc/pr105325.C | 0
 1 file changed, 0 insertions(+), 0 deletions(-)

diff --git a/gcc/testsuite/g++.target/powerpc/pr105325.C b/gcc/testsuite/g++.target/powerpc/pr105325.C
new file mode 100644
index 00000000000..e69de29bb2d

