public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH V6] rs6000: Optimize cmp on rotated 16bits constant
@ 2022-08-29  3:42 Jiufu Guo
  2022-09-15  7:35 ` Ping: " Jiufu Guo
  2022-12-13 21:02 ` Segher Boessenkool
  0 siblings, 2 replies; 11+ messages in thread
From: Jiufu Guo @ 2022-08-29  3:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc, linkw, guojiufu

Hi,

When checking eq/ne with a constant which has only 16bits, it can be
optimized to check the rotated data.  By this, the constant building
is optimized.

As the example in PR103743:
For "in == 0x8000000000000000LL", this patch generates:
        rotldi %r3,%r3,16
        cmpldi %cr0,%r3,32768
instead:
        li %r9,-1
        rldicr %r9,%r9,0,0
        cmpd %cr0,%r3,%r9

Compare with previous patchs:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600385.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600198.html

This patch releases the condition on can_create_pseudo_p and adds
clobbers to allow the splitter can be run both before and after RA.

This is updated patch based on previous patch and comments:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600315.html

This patch pass bootstrap and regtest on ppc64 and ppc64le.
Is it ok for trunk?  Thanks for comments!

BR,
Jeff(Jiufu)


	PR target/103743

gcc/ChangeLog:

	* config/rs6000/rs6000-protos.h (rotate_from_leading_zeros_const): New.
	(compare_rotate_immediate_p): New.
	* config/rs6000/rs6000.cc (rotate_from_leading_zeros_const): New
	definition.
	(compare_rotate_immediate_p): New definition.
	* config/rs6000/rs6000.md (EQNE): New code_attr.
	(*rotate_on_cmpdi): New define_insn_and_split.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr103743.c: New test.
	* gcc.target/powerpc/pr103743_1.c: New test.

---
 gcc/config/rs6000/rs6000-protos.h             |  2 +
 gcc/config/rs6000/rs6000.cc                   | 41 ++++++++
 gcc/config/rs6000/rs6000.md                   | 62 +++++++++++-
 gcc/testsuite/gcc.target/powerpc/pr103743.c   | 52 ++++++++++
 gcc/testsuite/gcc.target/powerpc/pr103743_1.c | 95 +++++++++++++++++++
 5 files changed, 251 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743_1.c

diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index b3c16e7448d..78847e6b3db 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -35,6 +35,8 @@ extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
 extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
+extern int rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT, int);
+extern bool compare_rotate_immediate_p (unsigned HOST_WIDE_INT);
 extern int num_insns_constant (rtx, machine_mode);
 extern int small_data_operand (rtx, machine_mode);
 extern bool mem_operand_gpr (rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index df491bee2ea..a548db42660 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -14797,6 +14797,47 @@ rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
     return reverse_condition (code);
 }
 
+/* Check if C can be rotated from an immediate which starts (as 64bit integer)
+   with at least CLZ bits zero.
+
+   Return the number by which C can be rotated from the immediate.
+   Return -1 if C can not be rotated as from.  */
+
+int
+rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT c, int clz)
+{
+  /* case a. 0..0xxx: already at least clz zeros.  */
+  int lz = clz_hwi (c);
+  if (lz >= clz)
+    return 0;
+
+  /* case b. 0..0xxx0..0: at least clz zeros.  */
+  int tz = ctz_hwi (c);
+  if (lz + tz >= clz)
+    return tz;
+
+  /* case c. xx10.....0xx: rotate 'clz + 1' bits firstly, then check case b.
+	       ^bit -> Vbit
+	     00...00xxx100, 'clz + 1' >= bits of xxxx.  */
+  const int rot_bits = HOST_BITS_PER_WIDE_INT - clz + 1;
+  unsigned HOST_WIDE_INT rc = (c >> rot_bits) | (c << (clz - 1));
+  tz = ctz_hwi (rc);
+  if (clz_hwi (rc) + tz >= clz)
+    return tz + rot_bits;
+
+  return -1;
+}
+
+/* Check if C can be rotated from an immediate operand of cmpdi or cmpldi.  */
+
+bool
+compare_rotate_immediate_p (unsigned HOST_WIDE_INT c)
+{
+  /* leading 48 zeros (cmpldi), or leading 49 ones (cmpdi).  */
+  return rotate_from_leading_zeros_const (~c, 49) > 0
+	 || rotate_from_leading_zeros_const (c, 48) > 0;
+}
+
 /* Generate a compare for CODE.  Return a brand-new rtx that
    represents the result of the compare.  */
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index e9e5cd1e54d..cad3cfc98cd 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7766,6 +7766,67 @@ (define_insn "*movsi_from_df"
   "xscvdpsp %x0,%x1"
   [(set_attr "type" "fp")])
 
+
+(define_code_iterator eqne [eq ne])
+(define_code_attr EQNE [(eq "EQ") (ne "NE")])
+
+;; "i == C" ==> "rotl(i,N) == rotl(C,N)"
+(define_insn_and_split "*rotate_on_cmpdi"
+  [(set (pc)
+	(if_then_else (eqne (match_operand:DI 1 "gpc_reg_operand" "r")
+			    (match_operand:DI 2 "const_int_operand" "n"))
+		      (label_ref (match_operand 0 ""))
+		      (pc)))
+  (clobber (match_scratch:DI 3 "=r"))
+  (clobber (match_scratch:CCUNS 4 "=y"))]
+  "TARGET_POWERPC64 && num_insns_constant (operands[2], DImode) > 1
+   && compare_rotate_immediate_p (UINTVAL (operands[2]))"
+   "#"
+   "&& 1"
+  [(pc)]
+{
+  rtx note = find_reg_note (curr_insn, REG_BR_PROB, 0);
+  bool sgn = false;
+  unsigned HOST_WIDE_INT C = INTVAL (operands[2]);
+  int rot = rotate_from_leading_zeros_const (C, 48);
+  if (rot < 0)
+    {
+      sgn = true;
+      rot = rotate_from_leading_zeros_const (~C, 49);
+    }
+  rtx n = GEN_INT (HOST_BITS_PER_WIDE_INT - rot);
+
+  /* i' = rotl (i, n) */
+  rtx op0 = can_create_pseudo_p () ? gen_reg_rtx (DImode) : operands[3];
+  emit_insn (gen_rtx_SET (op0, gen_rtx_ROTATE (DImode, operands[1], n)));
+
+  /* C' = rotl (C, n) */
+  rtx op1 = GEN_INT ((C << INTVAL (n)) | (C >> rot));
+
+  /* i' ==  C' */
+  machine_mode comp_mode = sgn ? CCmode : CCUNSmode;
+  rtx cc = can_create_pseudo_p () ? gen_reg_rtx (comp_mode) : operands[4];
+  PUT_MODE (cc, comp_mode);
+  emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (comp_mode, op0, op1)));
+  rtx cmp = gen_rtx_<EQNE> (CCmode, cc, const0_rtx);
+  rtx loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands[0]);
+  emit_jump_insn (gen_rtx_SET (pc_rtx,
+			       gen_rtx_IF_THEN_ELSE (VOIDmode, cmp,
+						     loc_ref, pc_rtx)));
+
+  /* keep the probability info for the prediction of the branch insn.  */
+  if (note)
+    {
+      profile_probability prob
+	= profile_probability::from_reg_br_prob_note (XINT (note, 0));
+
+      add_reg_br_prob_note (get_last_insn (), prob);
+    }
+
+  DONE;
+}
+)
+
 ;; Split a load of a large constant into the appropriate two-insn
 ;; sequence.
 
@@ -13472,7 +13533,6 @@ (define_expand "@ctr<mode>"
 ;; rs6000_legitimate_combined_insn prevents combine creating any of
 ;; the ctr<mode> insns.
 
-(define_code_iterator eqne [eq ne])
 (define_code_attr bd [(eq "bdz") (ne "bdnz")])
 (define_code_attr bd_neg [(eq "bdnz") (ne "bdz")])
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr103743.c b/gcc/testsuite/gcc.target/powerpc/pr103743.c
new file mode 100644
index 00000000000..abb876ed79e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr103743.c
@@ -0,0 +1,52 @@
+/* { dg-options "-O2" } */
+/* { dg-do compile { target has_arch_ppc64 } } */
+
+/* { dg-final { scan-assembler-times {\mcmpldi\M} 10  } } */
+/* { dg-final { scan-assembler-times {\mcmpdi\M} 4  } } */
+/* { dg-final { scan-assembler-times {\mrotldi\M} 14  } } */
+
+int foo (int a);
+
+int __attribute__ ((noinline)) udi_fun (unsigned long long in)
+{
+  if (in == (0x8642000000000000ULL))
+    return foo (1);
+  if (in == (0x7642000000000000ULL))
+    return foo (12);
+  if (in == (0x8000000000000000ULL))
+    return foo (32);
+  if (in == (0x8000000000000001ULL))
+    return foo (33);
+  if (in == (0x8642FFFFFFFFFFFFULL))
+    return foo (46);
+  if (in == (0x7642FFFFFFFFFFFFULL))
+    return foo (51);
+  if (in == (0x7567000000ULL))
+    return foo (9);
+  if (in == (0xFFF8567FFFFFFFFFULL))
+    return foo (19);
+
+  return 0;
+}
+
+int __attribute__ ((noinline)) di_fun (long long in)
+{
+  if (in == (0x8642000000000000LL))
+    return foo (1);
+  if (in == (0x7642000000000000LL))
+    return foo (12);
+  if (in == (0x8000000000000000LL))
+    return foo (32);
+  if (in == (0x8000000000000001LL))
+    return foo (33);
+  if (in == (0x8642FFFFFFFFFFFFLL))
+    return foo (46);
+  if (in == (0x7642FFFFFFFFFFFFLL))
+    return foo (51);
+  if (in == (0x7567000000LL))
+    return foo (9);
+  if (in == (0xFFF8567FFFFFFFFFLL))
+    return foo (19);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr103743_1.c b/gcc/testsuite/gcc.target/powerpc/pr103743_1.c
new file mode 100644
index 00000000000..2c08c56714a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr103743_1.c
@@ -0,0 +1,95 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -std=c99" } */
+
+int
+foo (int a)
+{
+  return a + 6;
+}
+
+int __attribute__ ((noinline)) udi_fun (unsigned long long in)
+{
+  if (in == (0x8642000000000000ULL))
+    return foo (1);
+  if (in == (0x7642000000000000ULL))
+    return foo (12);
+  if (in == (0x8000000000000000ULL))
+    return foo (32);
+  if (in == (0x8000000000000001ULL))
+    return foo (33);
+  if (in == (0x8642FFFFFFFFFFFFULL))
+    return foo (46);
+  if (in == (0x7642FFFFFFFFFFFFULL))
+    return foo (51);
+  if (in == (0x7567000000ULL))
+    return foo (9);
+  if (in == (0xFFF8567FFFFFFFFFULL))
+    return foo (19);
+  
+  return 0;
+}
+
+int __attribute__ ((noinline)) di_fun (long long in)
+{
+  if (in == (0x8642000000000000LL))
+    return foo (1);
+  if (in == (0x7642000000000000LL))
+    return foo (12);
+  if (in == (0x8000000000000000LL))
+    return foo (32);
+  if (in == (0x8000000000000001LL))
+    return foo (33);
+  if (in == (0x8642FFFFFFFFFFFFLL))
+    return foo (46);
+  if (in == (0x7642FFFFFFFFFFFFLL))
+    return foo (51);
+  return 0;
+}
+
+int
+main ()
+{
+  int e = 0;
+  if (udi_fun (6) != 0)
+    e++;
+  if (udi_fun (0x8642000000000000ULL) != foo (1))
+    e++;
+  if (udi_fun (0x7642000000000000ULL) != foo (12))
+    e++;
+  if (udi_fun (0x8000000000000000ULL) != foo (32))
+    e++;
+  if (udi_fun (0x8000000000000001ULL) != foo (33))
+    e++;
+  if (udi_fun (0x8642FFFFFFFFFFFFULL) != foo (46))
+    e++;
+  if (udi_fun (0x7642FFFFFFFFFFFFULL) != foo (51))
+    e++;
+  if (udi_fun (0x7567000000ULL) != foo (9))
+    e++;
+  if (udi_fun (0xFFF8567FFFFFFFFFULL) != foo (19))
+    e++;
+
+  if (di_fun (6) != 0)
+    e++;
+  if (di_fun (0x8642000000000000LL) != foo (1))
+    e++;
+  if (di_fun (0x7642000000000000LL) != foo (12))
+    e++;
+  if (di_fun (0x8000000000000000LL) != foo (32))
+    e++;
+  if (di_fun (0x8000000000000001LL) != foo (33))
+    e++;
+  if (di_fun (0x8642FFFFFFFFFFFFLL) != foo (46))
+    e++;
+  if (di_fun (0x7642FFFFFFFFFFFFLL) != foo (51))
+    e++;
+  if (udi_fun (0x7567000000LL) != foo (9))
+    e++;
+  if (udi_fun (0xFFF8567FFFFFFFFFLL) != foo (19))
+    e++;
+
+  if (e)
+    __builtin_abort ();
+  return 0;
+}
+
-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Ping: [PATCH V6] rs6000: Optimize cmp on rotated 16bits constant
  2022-08-29  3:42 [PATCH V6] rs6000: Optimize cmp on rotated 16bits constant Jiufu Guo
@ 2022-09-15  7:35 ` Jiufu Guo
  2022-10-12  6:42   ` Ping^2: " Jiufu Guo
  2022-12-13 21:02 ` Segher Boessenkool
  1 sibling, 1 reply; 11+ messages in thread
From: Jiufu Guo @ 2022-09-15  7:35 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc, linkw


Ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html

BR,
Jeff(Jiufu)


Jiufu Guo <guojiufu@linux.ibm.com> writes:

> Hi,
>
> When checking eq/ne with a constant which has only 16bits, it can be
> optimized to check the rotated data.  By this, the constant building
> is optimized.
>
> As the example in PR103743:
> For "in == 0x8000000000000000LL", this patch generates:
>         rotldi %r3,%r3,16
>         cmpldi %cr0,%r3,32768
> instead:
>         li %r9,-1
>         rldicr %r9,%r9,0,0
>         cmpd %cr0,%r3,%r9
>
> Compare with previous patchs:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600385.html
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600198.html
>
> This patch releases the condition on can_create_pseudo_p and adds
> clobbers to allow the splitter can be run both before and after RA.
>
> This is updated patch based on previous patch and comments:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600315.html
>
> This patch pass bootstrap and regtest on ppc64 and ppc64le.
> Is it ok for trunk?  Thanks for comments!
>
> BR,
> Jeff(Jiufu)
>
>
> 	PR target/103743
>
> gcc/ChangeLog:
>
> 	* config/rs6000/rs6000-protos.h (rotate_from_leading_zeros_const): New.
> 	(compare_rotate_immediate_p): New.
> 	* config/rs6000/rs6000.cc (rotate_from_leading_zeros_const): New
> 	definition.
> 	(compare_rotate_immediate_p): New definition.
> 	* config/rs6000/rs6000.md (EQNE): New code_attr.
> 	(*rotate_on_cmpdi): New define_insn_and_split.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/powerpc/pr103743.c: New test.
> 	* gcc.target/powerpc/pr103743_1.c: New test.
>
> ---
>  gcc/config/rs6000/rs6000-protos.h             |  2 +
>  gcc/config/rs6000/rs6000.cc                   | 41 ++++++++
>  gcc/config/rs6000/rs6000.md                   | 62 +++++++++++-
>  gcc/testsuite/gcc.target/powerpc/pr103743.c   | 52 ++++++++++
>  gcc/testsuite/gcc.target/powerpc/pr103743_1.c | 95 +++++++++++++++++++
>  5 files changed, 251 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743_1.c
>
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index b3c16e7448d..78847e6b3db 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -35,6 +35,8 @@ extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
>  extern int vspltis_shifted (rtx);
>  extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
>  extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
> +extern int rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT, int);
> +extern bool compare_rotate_immediate_p (unsigned HOST_WIDE_INT);
>  extern int num_insns_constant (rtx, machine_mode);
>  extern int small_data_operand (rtx, machine_mode);
>  extern bool mem_operand_gpr (rtx, machine_mode);
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index df491bee2ea..a548db42660 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -14797,6 +14797,47 @@ rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
>      return reverse_condition (code);
>  }
>  
> +/* Check if C can be rotated from an immediate which starts (as 64bit integer)
> +   with at least CLZ bits zero.
> +
> +   Return the number by which C can be rotated from the immediate.
> +   Return -1 if C can not be rotated as from.  */
> +
> +int
> +rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT c, int clz)
> +{
> +  /* case a. 0..0xxx: already at least clz zeros.  */
> +  int lz = clz_hwi (c);
> +  if (lz >= clz)
> +    return 0;
> +
> +  /* case b. 0..0xxx0..0: at least clz zeros.  */
> +  int tz = ctz_hwi (c);
> +  if (lz + tz >= clz)
> +    return tz;
> +
> +  /* case c. xx10.....0xx: rotate 'clz + 1' bits firstly, then check case b.
> +	       ^bit -> Vbit
> +	     00...00xxx100, 'clz + 1' >= bits of xxxx.  */
> +  const int rot_bits = HOST_BITS_PER_WIDE_INT - clz + 1;
> +  unsigned HOST_WIDE_INT rc = (c >> rot_bits) | (c << (clz - 1));
> +  tz = ctz_hwi (rc);
> +  if (clz_hwi (rc) + tz >= clz)
> +    return tz + rot_bits;
> +
> +  return -1;
> +}
> +
> +/* Check if C can be rotated from an immediate operand of cmpdi or cmpldi.  */
> +
> +bool
> +compare_rotate_immediate_p (unsigned HOST_WIDE_INT c)
> +{
> +  /* leading 48 zeros (cmpldi), or leading 49 ones (cmpdi).  */
> +  return rotate_from_leading_zeros_const (~c, 49) > 0
> +	 || rotate_from_leading_zeros_const (c, 48) > 0;
> +}
> +
>  /* Generate a compare for CODE.  Return a brand-new rtx that
>     represents the result of the compare.  */
>  
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index e9e5cd1e54d..cad3cfc98cd 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -7766,6 +7766,67 @@ (define_insn "*movsi_from_df"
>    "xscvdpsp %x0,%x1"
>    [(set_attr "type" "fp")])
>  
> +
> +(define_code_iterator eqne [eq ne])
> +(define_code_attr EQNE [(eq "EQ") (ne "NE")])
> +
> +;; "i == C" ==> "rotl(i,N) == rotl(C,N)"
> +(define_insn_and_split "*rotate_on_cmpdi"
> +  [(set (pc)
> +	(if_then_else (eqne (match_operand:DI 1 "gpc_reg_operand" "r")
> +			    (match_operand:DI 2 "const_int_operand" "n"))
> +		      (label_ref (match_operand 0 ""))
> +		      (pc)))
> +  (clobber (match_scratch:DI 3 "=r"))
> +  (clobber (match_scratch:CCUNS 4 "=y"))]
> +  "TARGET_POWERPC64 && num_insns_constant (operands[2], DImode) > 1
> +   && compare_rotate_immediate_p (UINTVAL (operands[2]))"
> +   "#"
> +   "&& 1"
> +  [(pc)]
> +{
> +  rtx note = find_reg_note (curr_insn, REG_BR_PROB, 0);
> +  bool sgn = false;
> +  unsigned HOST_WIDE_INT C = INTVAL (operands[2]);
> +  int rot = rotate_from_leading_zeros_const (C, 48);
> +  if (rot < 0)
> +    {
> +      sgn = true;
> +      rot = rotate_from_leading_zeros_const (~C, 49);
> +    }
> +  rtx n = GEN_INT (HOST_BITS_PER_WIDE_INT - rot);
> +
> +  /* i' = rotl (i, n) */
> +  rtx op0 = can_create_pseudo_p () ? gen_reg_rtx (DImode) : operands[3];
> +  emit_insn (gen_rtx_SET (op0, gen_rtx_ROTATE (DImode, operands[1], n)));
> +
> +  /* C' = rotl (C, n) */
> +  rtx op1 = GEN_INT ((C << INTVAL (n)) | (C >> rot));
> +
> +  /* i' ==  C' */
> +  machine_mode comp_mode = sgn ? CCmode : CCUNSmode;
> +  rtx cc = can_create_pseudo_p () ? gen_reg_rtx (comp_mode) : operands[4];
> +  PUT_MODE (cc, comp_mode);
> +  emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (comp_mode, op0, op1)));
> +  rtx cmp = gen_rtx_<EQNE> (CCmode, cc, const0_rtx);
> +  rtx loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands[0]);
> +  emit_jump_insn (gen_rtx_SET (pc_rtx,
> +			       gen_rtx_IF_THEN_ELSE (VOIDmode, cmp,
> +						     loc_ref, pc_rtx)));
> +
> +  /* keep the probability info for the prediction of the branch insn.  */
> +  if (note)
> +    {
> +      profile_probability prob
> +	= profile_probability::from_reg_br_prob_note (XINT (note, 0));
> +
> +      add_reg_br_prob_note (get_last_insn (), prob);
> +    }
> +
> +  DONE;
> +}
> +)
> +
>  ;; Split a load of a large constant into the appropriate two-insn
>  ;; sequence.
>  
> @@ -13472,7 +13533,6 @@ (define_expand "@ctr<mode>"
>  ;; rs6000_legitimate_combined_insn prevents combine creating any of
>  ;; the ctr<mode> insns.
>  
> -(define_code_iterator eqne [eq ne])
>  (define_code_attr bd [(eq "bdz") (ne "bdnz")])
>  (define_code_attr bd_neg [(eq "bdnz") (ne "bdz")])
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103743.c b/gcc/testsuite/gcc.target/powerpc/pr103743.c
> new file mode 100644
> index 00000000000..abb876ed79e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103743.c
> @@ -0,0 +1,52 @@
> +/* { dg-options "-O2" } */
> +/* { dg-do compile { target has_arch_ppc64 } } */
> +
> +/* { dg-final { scan-assembler-times {\mcmpldi\M} 10  } } */
> +/* { dg-final { scan-assembler-times {\mcmpdi\M} 4  } } */
> +/* { dg-final { scan-assembler-times {\mrotldi\M} 14  } } */
> +
> +int foo (int a);
> +
> +int __attribute__ ((noinline)) udi_fun (unsigned long long in)
> +{
> +  if (in == (0x8642000000000000ULL))
> +    return foo (1);
> +  if (in == (0x7642000000000000ULL))
> +    return foo (12);
> +  if (in == (0x8000000000000000ULL))
> +    return foo (32);
> +  if (in == (0x8000000000000001ULL))
> +    return foo (33);
> +  if (in == (0x8642FFFFFFFFFFFFULL))
> +    return foo (46);
> +  if (in == (0x7642FFFFFFFFFFFFULL))
> +    return foo (51);
> +  if (in == (0x7567000000ULL))
> +    return foo (9);
> +  if (in == (0xFFF8567FFFFFFFFFULL))
> +    return foo (19);
> +
> +  return 0;
> +}
> +
> +int __attribute__ ((noinline)) di_fun (long long in)
> +{
> +  if (in == (0x8642000000000000LL))
> +    return foo (1);
> +  if (in == (0x7642000000000000LL))
> +    return foo (12);
> +  if (in == (0x8000000000000000LL))
> +    return foo (32);
> +  if (in == (0x8000000000000001LL))
> +    return foo (33);
> +  if (in == (0x8642FFFFFFFFFFFFLL))
> +    return foo (46);
> +  if (in == (0x7642FFFFFFFFFFFFLL))
> +    return foo (51);
> +  if (in == (0x7567000000LL))
> +    return foo (9);
> +  if (in == (0xFFF8567FFFFFFFFFLL))
> +    return foo (19);
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103743_1.c b/gcc/testsuite/gcc.target/powerpc/pr103743_1.c
> new file mode 100644
> index 00000000000..2c08c56714a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103743_1.c
> @@ -0,0 +1,95 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -std=c99" } */
> +
> +int
> +foo (int a)
> +{
> +  return a + 6;
> +}
> +
> +int __attribute__ ((noinline)) udi_fun (unsigned long long in)
> +{
> +  if (in == (0x8642000000000000ULL))
> +    return foo (1);
> +  if (in == (0x7642000000000000ULL))
> +    return foo (12);
> +  if (in == (0x8000000000000000ULL))
> +    return foo (32);
> +  if (in == (0x8000000000000001ULL))
> +    return foo (33);
> +  if (in == (0x8642FFFFFFFFFFFFULL))
> +    return foo (46);
> +  if (in == (0x7642FFFFFFFFFFFFULL))
> +    return foo (51);
> +  if (in == (0x7567000000ULL))
> +    return foo (9);
> +  if (in == (0xFFF8567FFFFFFFFFULL))
> +    return foo (19);
> +  
> +  return 0;
> +}
> +
> +int __attribute__ ((noinline)) di_fun (long long in)
> +{
> +  if (in == (0x8642000000000000LL))
> +    return foo (1);
> +  if (in == (0x7642000000000000LL))
> +    return foo (12);
> +  if (in == (0x8000000000000000LL))
> +    return foo (32);
> +  if (in == (0x8000000000000001LL))
> +    return foo (33);
> +  if (in == (0x8642FFFFFFFFFFFFLL))
> +    return foo (46);
> +  if (in == (0x7642FFFFFFFFFFFFLL))
> +    return foo (51);
> +  return 0;
> +}
> +
> +int
> +main ()
> +{
> +  int e = 0;
> +  if (udi_fun (6) != 0)
> +    e++;
> +  if (udi_fun (0x8642000000000000ULL) != foo (1))
> +    e++;
> +  if (udi_fun (0x7642000000000000ULL) != foo (12))
> +    e++;
> +  if (udi_fun (0x8000000000000000ULL) != foo (32))
> +    e++;
> +  if (udi_fun (0x8000000000000001ULL) != foo (33))
> +    e++;
> +  if (udi_fun (0x8642FFFFFFFFFFFFULL) != foo (46))
> +    e++;
> +  if (udi_fun (0x7642FFFFFFFFFFFFULL) != foo (51))
> +    e++;
> +  if (udi_fun (0x7567000000ULL) != foo (9))
> +    e++;
> +  if (udi_fun (0xFFF8567FFFFFFFFFULL) != foo (19))
> +    e++;
> +
> +  if (di_fun (6) != 0)
> +    e++;
> +  if (di_fun (0x8642000000000000LL) != foo (1))
> +    e++;
> +  if (di_fun (0x7642000000000000LL) != foo (12))
> +    e++;
> +  if (di_fun (0x8000000000000000LL) != foo (32))
> +    e++;
> +  if (di_fun (0x8000000000000001LL) != foo (33))
> +    e++;
> +  if (di_fun (0x8642FFFFFFFFFFFFLL) != foo (46))
> +    e++;
> +  if (di_fun (0x7642FFFFFFFFFFFFLL) != foo (51))
> +    e++;
> +  if (udi_fun (0x7567000000LL) != foo (9))
> +    e++;
> +  if (udi_fun (0xFFF8567FFFFFFFFFLL) != foo (19))
> +    e++;
> +
> +  if (e)
> +    __builtin_abort ();
> +  return 0;
> +}
> +

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Ping^2: [PATCH V6] rs6000: Optimize cmp on rotated 16bits constant
  2022-09-15  7:35 ` Ping: " Jiufu Guo
@ 2022-10-12  6:42   ` Jiufu Guo
  2022-11-09  3:01     ` Ping^3: " Jiufu Guo
  0 siblings, 1 reply; 11+ messages in thread
From: Jiufu Guo @ 2022-10-12  6:42 UTC (permalink / raw)
  To: Jiufu Guo via Gcc-patches; +Cc: dje.gcc, segher, linkw


Gentle ping:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html

BR,
Jeff (Jiufu)

Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

> Ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html
>
> BR,
> Jeff(Jiufu)
>
>
> Jiufu Guo <guojiufu@linux.ibm.com> writes:
>
>> Hi,
>>
>> When checking eq/ne with a constant which has only 16bits, it can be
>> optimized to check the rotated data.  By this, the constant building
>> is optimized.
>>
>> As the example in PR103743:
>> For "in == 0x8000000000000000LL", this patch generates:
>>         rotldi %r3,%r3,16
>>         cmpldi %cr0,%r3,32768
>> instead:
>>         li %r9,-1
>>         rldicr %r9,%r9,0,0
>>         cmpd %cr0,%r3,%r9
>>
>> Compare with previous patchs:
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600385.html
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600198.html
>>
>> This patch releases the condition on can_create_pseudo_p and adds
>> clobbers to allow the splitter can be run both before and after RA.
>>
>> This is updated patch based on previous patch and comments:
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600315.html
>>
>> This patch pass bootstrap and regtest on ppc64 and ppc64le.
>> Is it ok for trunk?  Thanks for comments!
>>
>> BR,
>> Jeff(Jiufu)
>>
>>
>> 	PR target/103743
>>
>> gcc/ChangeLog:
>>
>> 	* config/rs6000/rs6000-protos.h (rotate_from_leading_zeros_const): New.
>> 	(compare_rotate_immediate_p): New.
>> 	* config/rs6000/rs6000.cc (rotate_from_leading_zeros_const): New
>> 	definition.
>> 	(compare_rotate_immediate_p): New definition.
>> 	* config/rs6000/rs6000.md (EQNE): New code_attr.
>> 	(*rotate_on_cmpdi): New define_insn_and_split.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* gcc.target/powerpc/pr103743.c: New test.
>> 	* gcc.target/powerpc/pr103743_1.c: New test.
>>
>> ---
>>  gcc/config/rs6000/rs6000-protos.h             |  2 +
>>  gcc/config/rs6000/rs6000.cc                   | 41 ++++++++
>>  gcc/config/rs6000/rs6000.md                   | 62 +++++++++++-
>>  gcc/testsuite/gcc.target/powerpc/pr103743.c   | 52 ++++++++++
>>  gcc/testsuite/gcc.target/powerpc/pr103743_1.c | 95 +++++++++++++++++++
>>  5 files changed, 251 insertions(+), 1 deletion(-)
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743.c
>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743_1.c
>>
>> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
>> index b3c16e7448d..78847e6b3db 100644
>> --- a/gcc/config/rs6000/rs6000-protos.h
>> +++ b/gcc/config/rs6000/rs6000-protos.h
>> @@ -35,6 +35,8 @@ extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
>>  extern int vspltis_shifted (rtx);
>>  extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
>>  extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
>> +extern int rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT, int);
>> +extern bool compare_rotate_immediate_p (unsigned HOST_WIDE_INT);
>>  extern int num_insns_constant (rtx, machine_mode);
>>  extern int small_data_operand (rtx, machine_mode);
>>  extern bool mem_operand_gpr (rtx, machine_mode);
>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>> index df491bee2ea..a548db42660 100644
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -14797,6 +14797,47 @@ rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
>>      return reverse_condition (code);
>>  }
>>  
>> +/* Check if C can be rotated from an immediate which starts (as 64bit integer)
>> +   with at least CLZ bits zero.
>> +
>> +   Return the number by which C can be rotated from the immediate.
>> +   Return -1 if C can not be rotated as from.  */
>> +
>> +int
>> +rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT c, int clz)
>> +{
>> +  /* case a. 0..0xxx: already at least clz zeros.  */
>> +  int lz = clz_hwi (c);
>> +  if (lz >= clz)
>> +    return 0;
>> +
>> +  /* case b. 0..0xxx0..0: at least clz zeros.  */
>> +  int tz = ctz_hwi (c);
>> +  if (lz + tz >= clz)
>> +    return tz;
>> +
>> +  /* case c. xx10.....0xx: rotate 'clz + 1' bits firstly, then check case b.
>> +	       ^bit -> Vbit
>> +	     00...00xxx100, 'clz + 1' >= bits of xxxx.  */
>> +  const int rot_bits = HOST_BITS_PER_WIDE_INT - clz + 1;
>> +  unsigned HOST_WIDE_INT rc = (c >> rot_bits) | (c << (clz - 1));
>> +  tz = ctz_hwi (rc);
>> +  if (clz_hwi (rc) + tz >= clz)
>> +    return tz + rot_bits;
>> +
>> +  return -1;
>> +}
>> +
>> +/* Check if C can be rotated from an immediate operand of cmpdi or cmpldi.  */
>> +
>> +bool
>> +compare_rotate_immediate_p (unsigned HOST_WIDE_INT c)
>> +{
>> +  /* leading 48 zeros (cmpldi), or leading 49 ones (cmpdi).  */
>> +  return rotate_from_leading_zeros_const (~c, 49) > 0
>> +	 || rotate_from_leading_zeros_const (c, 48) > 0;
>> +}
>> +
>>  /* Generate a compare for CODE.  Return a brand-new rtx that
>>     represents the result of the compare.  */
>>  
>> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
>> index e9e5cd1e54d..cad3cfc98cd 100644
>> --- a/gcc/config/rs6000/rs6000.md
>> +++ b/gcc/config/rs6000/rs6000.md
>> @@ -7766,6 +7766,67 @@ (define_insn "*movsi_from_df"
>>    "xscvdpsp %x0,%x1"
>>    [(set_attr "type" "fp")])
>>  
>> +
>> +(define_code_iterator eqne [eq ne])
>> +(define_code_attr EQNE [(eq "EQ") (ne "NE")])
>> +
>> +;; "i == C" ==> "rotl(i,N) == rotl(C,N)"
>> +(define_insn_and_split "*rotate_on_cmpdi"
>> +  [(set (pc)
>> +	(if_then_else (eqne (match_operand:DI 1 "gpc_reg_operand" "r")
>> +			    (match_operand:DI 2 "const_int_operand" "n"))
>> +		      (label_ref (match_operand 0 ""))
>> +		      (pc)))
>> +  (clobber (match_scratch:DI 3 "=r"))
>> +  (clobber (match_scratch:CCUNS 4 "=y"))]
>> +  "TARGET_POWERPC64 && num_insns_constant (operands[2], DImode) > 1
>> +   && compare_rotate_immediate_p (UINTVAL (operands[2]))"
>> +   "#"
>> +   "&& 1"
>> +  [(pc)]
>> +{
>> +  rtx note = find_reg_note (curr_insn, REG_BR_PROB, 0);
>> +  bool sgn = false;
>> +  unsigned HOST_WIDE_INT C = INTVAL (operands[2]);
>> +  int rot = rotate_from_leading_zeros_const (C, 48);
>> +  if (rot < 0)
>> +    {
>> +      sgn = true;
>> +      rot = rotate_from_leading_zeros_const (~C, 49);
>> +    }
>> +  rtx n = GEN_INT (HOST_BITS_PER_WIDE_INT - rot);
>> +
>> +  /* i' = rotl (i, n) */
>> +  rtx op0 = can_create_pseudo_p () ? gen_reg_rtx (DImode) : operands[3];
>> +  emit_insn (gen_rtx_SET (op0, gen_rtx_ROTATE (DImode, operands[1], n)));
>> +
>> +  /* C' = rotl (C, n) */
>> +  rtx op1 = GEN_INT ((C << INTVAL (n)) | (C >> rot));
>> +
>> +  /* i' ==  C' */
>> +  machine_mode comp_mode = sgn ? CCmode : CCUNSmode;
>> +  rtx cc = can_create_pseudo_p () ? gen_reg_rtx (comp_mode) : operands[4];
>> +  PUT_MODE (cc, comp_mode);
>> +  emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (comp_mode, op0, op1)));
>> +  rtx cmp = gen_rtx_<EQNE> (CCmode, cc, const0_rtx);
>> +  rtx loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands[0]);
>> +  emit_jump_insn (gen_rtx_SET (pc_rtx,
>> +			       gen_rtx_IF_THEN_ELSE (VOIDmode, cmp,
>> +						     loc_ref, pc_rtx)));
>> +
>> +  /* keep the probability info for the prediction of the branch insn.  */
>> +  if (note)
>> +    {
>> +      profile_probability prob
>> +	= profile_probability::from_reg_br_prob_note (XINT (note, 0));
>> +
>> +      add_reg_br_prob_note (get_last_insn (), prob);
>> +    }
>> +
>> +  DONE;
>> +}
>> +)
>> +
>>  ;; Split a load of a large constant into the appropriate two-insn
>>  ;; sequence.
>>  
>> @@ -13472,7 +13533,6 @@ (define_expand "@ctr<mode>"
>>  ;; rs6000_legitimate_combined_insn prevents combine creating any of
>>  ;; the ctr<mode> insns.
>>  
>> -(define_code_iterator eqne [eq ne])
>>  (define_code_attr bd [(eq "bdz") (ne "bdnz")])
>>  (define_code_attr bd_neg [(eq "bdnz") (ne "bdz")])
>>  
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103743.c b/gcc/testsuite/gcc.target/powerpc/pr103743.c
>> new file mode 100644
>> index 00000000000..abb876ed79e
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr103743.c
>> @@ -0,0 +1,52 @@
>> +/* { dg-options "-O2" } */
>> +/* { dg-do compile { target has_arch_ppc64 } } */
>> +
>> +/* { dg-final { scan-assembler-times {\mcmpldi\M} 10  } } */
>> +/* { dg-final { scan-assembler-times {\mcmpdi\M} 4  } } */
>> +/* { dg-final { scan-assembler-times {\mrotldi\M} 14  } } */
>> +
>> +int foo (int a);
>> +
>> +int __attribute__ ((noinline)) udi_fun (unsigned long long in)
>> +{
>> +  if (in == (0x8642000000000000ULL))
>> +    return foo (1);
>> +  if (in == (0x7642000000000000ULL))
>> +    return foo (12);
>> +  if (in == (0x8000000000000000ULL))
>> +    return foo (32);
>> +  if (in == (0x8000000000000001ULL))
>> +    return foo (33);
>> +  if (in == (0x8642FFFFFFFFFFFFULL))
>> +    return foo (46);
>> +  if (in == (0x7642FFFFFFFFFFFFULL))
>> +    return foo (51);
>> +  if (in == (0x7567000000ULL))
>> +    return foo (9);
>> +  if (in == (0xFFF8567FFFFFFFFFULL))
>> +    return foo (19);
>> +
>> +  return 0;
>> +}
>> +
>> +int __attribute__ ((noinline)) di_fun (long long in)
>> +{
>> +  if (in == (0x8642000000000000LL))
>> +    return foo (1);
>> +  if (in == (0x7642000000000000LL))
>> +    return foo (12);
>> +  if (in == (0x8000000000000000LL))
>> +    return foo (32);
>> +  if (in == (0x8000000000000001LL))
>> +    return foo (33);
>> +  if (in == (0x8642FFFFFFFFFFFFLL))
>> +    return foo (46);
>> +  if (in == (0x7642FFFFFFFFFFFFLL))
>> +    return foo (51);
>> +  if (in == (0x7567000000LL))
>> +    return foo (9);
>> +  if (in == (0xFFF8567FFFFFFFFFLL))
>> +    return foo (19);
>> +
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103743_1.c b/gcc/testsuite/gcc.target/powerpc/pr103743_1.c
>> new file mode 100644
>> index 00000000000..2c08c56714a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr103743_1.c
>> @@ -0,0 +1,95 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -std=c99" } */
>> +
>> +int
>> +foo (int a)
>> +{
>> +  return a + 6;
>> +}
>> +
>> +int __attribute__ ((noinline)) udi_fun (unsigned long long in)
>> +{
>> +  if (in == (0x8642000000000000ULL))
>> +    return foo (1);
>> +  if (in == (0x7642000000000000ULL))
>> +    return foo (12);
>> +  if (in == (0x8000000000000000ULL))
>> +    return foo (32);
>> +  if (in == (0x8000000000000001ULL))
>> +    return foo (33);
>> +  if (in == (0x8642FFFFFFFFFFFFULL))
>> +    return foo (46);
>> +  if (in == (0x7642FFFFFFFFFFFFULL))
>> +    return foo (51);
>> +  if (in == (0x7567000000ULL))
>> +    return foo (9);
>> +  if (in == (0xFFF8567FFFFFFFFFULL))
>> +    return foo (19);
>> +  
>> +  return 0;
>> +}
>> +
>> +int __attribute__ ((noinline)) di_fun (long long in)
>> +{
>> +  if (in == (0x8642000000000000LL))
>> +    return foo (1);
>> +  if (in == (0x7642000000000000LL))
>> +    return foo (12);
>> +  if (in == (0x8000000000000000LL))
>> +    return foo (32);
>> +  if (in == (0x8000000000000001LL))
>> +    return foo (33);
>> +  if (in == (0x8642FFFFFFFFFFFFLL))
>> +    return foo (46);
>> +  if (in == (0x7642FFFFFFFFFFFFLL))
>> +    return foo (51);
>> +  return 0;
>> +}
>> +
>> +int
>> +main ()
>> +{
>> +  int e = 0;
>> +  if (udi_fun (6) != 0)
>> +    e++;
>> +  if (udi_fun (0x8642000000000000ULL) != foo (1))
>> +    e++;
>> +  if (udi_fun (0x7642000000000000ULL) != foo (12))
>> +    e++;
>> +  if (udi_fun (0x8000000000000000ULL) != foo (32))
>> +    e++;
>> +  if (udi_fun (0x8000000000000001ULL) != foo (33))
>> +    e++;
>> +  if (udi_fun (0x8642FFFFFFFFFFFFULL) != foo (46))
>> +    e++;
>> +  if (udi_fun (0x7642FFFFFFFFFFFFULL) != foo (51))
>> +    e++;
>> +  if (udi_fun (0x7567000000ULL) != foo (9))
>> +    e++;
>> +  if (udi_fun (0xFFF8567FFFFFFFFFULL) != foo (19))
>> +    e++;
>> +
>> +  if (di_fun (6) != 0)
>> +    e++;
>> +  if (di_fun (0x8642000000000000LL) != foo (1))
>> +    e++;
>> +  if (di_fun (0x7642000000000000LL) != foo (12))
>> +    e++;
>> +  if (di_fun (0x8000000000000000LL) != foo (32))
>> +    e++;
>> +  if (di_fun (0x8000000000000001LL) != foo (33))
>> +    e++;
>> +  if (di_fun (0x8642FFFFFFFFFFFFLL) != foo (46))
>> +    e++;
>> +  if (di_fun (0x7642FFFFFFFFFFFFLL) != foo (51))
>> +    e++;
>> +  if (udi_fun (0x7567000000LL) != foo (9))
>> +    e++;
>> +  if (udi_fun (0xFFF8567FFFFFFFFFLL) != foo (19))
>> +    e++;
>> +
>> +  if (e)
>> +    __builtin_abort ();
>> +  return 0;
>> +}
>> +

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Ping^3: [PATCH V6] rs6000: Optimize cmp on rotated 16bits constant
  2022-10-12  6:42   ` Ping^2: " Jiufu Guo
@ 2022-11-09  3:01     ` Jiufu Guo
  2022-12-02  2:45       ` Ping^4: " Jiufu Guo
  0 siblings, 1 reply; 11+ messages in thread
From: Jiufu Guo @ 2022-11-09  3:01 UTC (permalink / raw)
  To: Jiufu Guo via Gcc-patches; +Cc: segher, dje.gcc, linkw


Hi,

Gentle ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html

BR,
Jeff (Jiufu)


Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

> Gentle ping:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html
>
> BR,
> Jeff (Jiufu)
>
> Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>
>> Ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html
>>
>> BR,
>> Jeff(Jiufu)
>>
>>
>> Jiufu Guo <guojiufu@linux.ibm.com> writes:
>>
>>> Hi,
>>>
>>> When checking eq/ne with a constant which has only 16bits, it can be
>>> optimized to check the rotated data.  By this, the constant building
>>> is optimized.
>>>
>>> As the example in PR103743:
>>> For "in == 0x8000000000000000LL", this patch generates:
>>>         rotldi %r3,%r3,16
>>>         cmpldi %cr0,%r3,32768
>>> instead:
>>>         li %r9,-1
>>>         rldicr %r9,%r9,0,0
>>>         cmpd %cr0,%r3,%r9
>>>
>>> Compare with previous patchs:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600385.html
>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600198.html
>>>
>>> This patch releases the condition on can_create_pseudo_p and adds
>>> clobbers to allow the splitter can be run both before and after RA.
>>>
>>> This is updated patch based on previous patch and comments:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600315.html
>>>
>>> This patch pass bootstrap and regtest on ppc64 and ppc64le.
>>> Is it ok for trunk?  Thanks for comments!
>>>
>>> BR,
>>> Jeff(Jiufu)
>>>
>>>
>>> 	PR target/103743
>>>
>>> gcc/ChangeLog:
>>>
>>> 	* config/rs6000/rs6000-protos.h (rotate_from_leading_zeros_const): New.
>>> 	(compare_rotate_immediate_p): New.
>>> 	* config/rs6000/rs6000.cc (rotate_from_leading_zeros_const): New
>>> 	definition.
>>> 	(compare_rotate_immediate_p): New definition.
>>> 	* config/rs6000/rs6000.md (EQNE): New code_attr.
>>> 	(*rotate_on_cmpdi): New define_insn_and_split.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 	* gcc.target/powerpc/pr103743.c: New test.
>>> 	* gcc.target/powerpc/pr103743_1.c: New test.
>>>
>>> ---
>>>  gcc/config/rs6000/rs6000-protos.h             |  2 +
>>>  gcc/config/rs6000/rs6000.cc                   | 41 ++++++++
>>>  gcc/config/rs6000/rs6000.md                   | 62 +++++++++++-
>>>  gcc/testsuite/gcc.target/powerpc/pr103743.c   | 52 ++++++++++
>>>  gcc/testsuite/gcc.target/powerpc/pr103743_1.c | 95 +++++++++++++++++++
>>>  5 files changed, 251 insertions(+), 1 deletion(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743.c
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743_1.c
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
>>> index b3c16e7448d..78847e6b3db 100644
>>> --- a/gcc/config/rs6000/rs6000-protos.h
>>> +++ b/gcc/config/rs6000/rs6000-protos.h
>>> @@ -35,6 +35,8 @@ extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
>>>  extern int vspltis_shifted (rtx);
>>>  extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
>>>  extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
>>> +extern int rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT, int);
>>> +extern bool compare_rotate_immediate_p (unsigned HOST_WIDE_INT);
>>>  extern int num_insns_constant (rtx, machine_mode);
>>>  extern int small_data_operand (rtx, machine_mode);
>>>  extern bool mem_operand_gpr (rtx, machine_mode);
>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>> index df491bee2ea..a548db42660 100644
>>> --- a/gcc/config/rs6000/rs6000.cc
>>> +++ b/gcc/config/rs6000/rs6000.cc
>>> @@ -14797,6 +14797,47 @@ rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
>>>      return reverse_condition (code);
>>>  }
>>>  
>>> +/* Check if C can be rotated from an immediate which starts (as 64bit integer)
>>> +   with at least CLZ bits zero.
>>> +
>>> +   Return the number by which C can be rotated from the immediate.
>>> +   Return -1 if C can not be rotated as from.  */
>>> +
>>> +int
>>> +rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT c, int clz)
>>> +{
>>> +  /* case a. 0..0xxx: already at least clz zeros.  */
>>> +  int lz = clz_hwi (c);
>>> +  if (lz >= clz)
>>> +    return 0;
>>> +
>>> +  /* case b. 0..0xxx0..0: at least clz zeros.  */
>>> +  int tz = ctz_hwi (c);
>>> +  if (lz + tz >= clz)
>>> +    return tz;
>>> +
>>> +  /* case c. xx10.....0xx: rotate 'clz + 1' bits firstly, then check case b.
>>> +	       ^bit -> Vbit
>>> +	     00...00xxx100, 'clz + 1' >= bits of xxxx.  */
>>> +  const int rot_bits = HOST_BITS_PER_WIDE_INT - clz + 1;
>>> +  unsigned HOST_WIDE_INT rc = (c >> rot_bits) | (c << (clz - 1));
>>> +  tz = ctz_hwi (rc);
>>> +  if (clz_hwi (rc) + tz >= clz)
>>> +    return tz + rot_bits;
>>> +
>>> +  return -1;
>>> +}
>>> +
>>> +/* Check if C can be rotated from an immediate operand of cmpdi or cmpldi.  */
>>> +
>>> +bool
>>> +compare_rotate_immediate_p (unsigned HOST_WIDE_INT c)
>>> +{
>>> +  /* leading 48 zeros (cmpldi), or leading 49 ones (cmpdi).  */
>>> +  return rotate_from_leading_zeros_const (~c, 49) > 0
>>> +	 || rotate_from_leading_zeros_const (c, 48) > 0;
>>> +}
>>> +
>>>  /* Generate a compare for CODE.  Return a brand-new rtx that
>>>     represents the result of the compare.  */
>>>  
>>> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
>>> index e9e5cd1e54d..cad3cfc98cd 100644
>>> --- a/gcc/config/rs6000/rs6000.md
>>> +++ b/gcc/config/rs6000/rs6000.md
>>> @@ -7766,6 +7766,67 @@ (define_insn "*movsi_from_df"
>>>    "xscvdpsp %x0,%x1"
>>>    [(set_attr "type" "fp")])
>>>  
>>> +
>>> +(define_code_iterator eqne [eq ne])
>>> +(define_code_attr EQNE [(eq "EQ") (ne "NE")])
>>> +
>>> +;; "i == C" ==> "rotl(i,N) == rotl(C,N)"
>>> +(define_insn_and_split "*rotate_on_cmpdi"
>>> +  [(set (pc)
>>> +	(if_then_else (eqne (match_operand:DI 1 "gpc_reg_operand" "r")
>>> +			    (match_operand:DI 2 "const_int_operand" "n"))
>>> +		      (label_ref (match_operand 0 ""))
>>> +		      (pc)))
>>> +  (clobber (match_scratch:DI 3 "=r"))
>>> +  (clobber (match_scratch:CCUNS 4 "=y"))]
>>> +  "TARGET_POWERPC64 && num_insns_constant (operands[2], DImode) > 1
>>> +   && compare_rotate_immediate_p (UINTVAL (operands[2]))"
>>> +   "#"
>>> +   "&& 1"
>>> +  [(pc)]
>>> +{
>>> +  rtx note = find_reg_note (curr_insn, REG_BR_PROB, 0);
>>> +  bool sgn = false;
>>> +  unsigned HOST_WIDE_INT C = INTVAL (operands[2]);
>>> +  int rot = rotate_from_leading_zeros_const (C, 48);
>>> +  if (rot < 0)
>>> +    {
>>> +      sgn = true;
>>> +      rot = rotate_from_leading_zeros_const (~C, 49);
>>> +    }
>>> +  rtx n = GEN_INT (HOST_BITS_PER_WIDE_INT - rot);
>>> +
>>> +  /* i' = rotl (i, n) */
>>> +  rtx op0 = can_create_pseudo_p () ? gen_reg_rtx (DImode) : operands[3];
>>> +  emit_insn (gen_rtx_SET (op0, gen_rtx_ROTATE (DImode, operands[1], n)));
>>> +
>>> +  /* C' = rotl (C, n) */
>>> +  rtx op1 = GEN_INT ((C << INTVAL (n)) | (C >> rot));
>>> +
>>> +  /* i' ==  C' */
>>> +  machine_mode comp_mode = sgn ? CCmode : CCUNSmode;
>>> +  rtx cc = can_create_pseudo_p () ? gen_reg_rtx (comp_mode) : operands[4];
>>> +  PUT_MODE (cc, comp_mode);
>>> +  emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (comp_mode, op0, op1)));
>>> +  rtx cmp = gen_rtx_<EQNE> (CCmode, cc, const0_rtx);
>>> +  rtx loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands[0]);
>>> +  emit_jump_insn (gen_rtx_SET (pc_rtx,
>>> +			       gen_rtx_IF_THEN_ELSE (VOIDmode, cmp,
>>> +						     loc_ref, pc_rtx)));
>>> +
>>> +  /* keep the probability info for the prediction of the branch insn.  */
>>> +  if (note)
>>> +    {
>>> +      profile_probability prob
>>> +	= profile_probability::from_reg_br_prob_note (XINT (note, 0));
>>> +
>>> +      add_reg_br_prob_note (get_last_insn (), prob);
>>> +    }
>>> +
>>> +  DONE;
>>> +}
>>> +)
>>> +
>>>  ;; Split a load of a large constant into the appropriate two-insn
>>>  ;; sequence.
>>>  
>>> @@ -13472,7 +13533,6 @@ (define_expand "@ctr<mode>"
>>>  ;; rs6000_legitimate_combined_insn prevents combine creating any of
>>>  ;; the ctr<mode> insns.
>>>  
>>> -(define_code_iterator eqne [eq ne])
>>>  (define_code_attr bd [(eq "bdz") (ne "bdnz")])
>>>  (define_code_attr bd_neg [(eq "bdnz") (ne "bdz")])
>>>  
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103743.c b/gcc/testsuite/gcc.target/powerpc/pr103743.c
>>> new file mode 100644
>>> index 00000000000..abb876ed79e
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr103743.c
>>> @@ -0,0 +1,52 @@
>>> +/* { dg-options "-O2" } */
>>> +/* { dg-do compile { target has_arch_ppc64 } } */
>>> +
>>> +/* { dg-final { scan-assembler-times {\mcmpldi\M} 10  } } */
>>> +/* { dg-final { scan-assembler-times {\mcmpdi\M} 4  } } */
>>> +/* { dg-final { scan-assembler-times {\mrotldi\M} 14  } } */
>>> +
>>> +int foo (int a);
>>> +
>>> +int __attribute__ ((noinline)) udi_fun (unsigned long long in)
>>> +{
>>> +  if (in == (0x8642000000000000ULL))
>>> +    return foo (1);
>>> +  if (in == (0x7642000000000000ULL))
>>> +    return foo (12);
>>> +  if (in == (0x8000000000000000ULL))
>>> +    return foo (32);
>>> +  if (in == (0x8000000000000001ULL))
>>> +    return foo (33);
>>> +  if (in == (0x8642FFFFFFFFFFFFULL))
>>> +    return foo (46);
>>> +  if (in == (0x7642FFFFFFFFFFFFULL))
>>> +    return foo (51);
>>> +  if (in == (0x7567000000ULL))
>>> +    return foo (9);
>>> +  if (in == (0xFFF8567FFFFFFFFFULL))
>>> +    return foo (19);
>>> +
>>> +  return 0;
>>> +}
>>> +
>>> +int __attribute__ ((noinline)) di_fun (long long in)
>>> +{
>>> +  if (in == (0x8642000000000000LL))
>>> +    return foo (1);
>>> +  if (in == (0x7642000000000000LL))
>>> +    return foo (12);
>>> +  if (in == (0x8000000000000000LL))
>>> +    return foo (32);
>>> +  if (in == (0x8000000000000001LL))
>>> +    return foo (33);
>>> +  if (in == (0x8642FFFFFFFFFFFFLL))
>>> +    return foo (46);
>>> +  if (in == (0x7642FFFFFFFFFFFFLL))
>>> +    return foo (51);
>>> +  if (in == (0x7567000000LL))
>>> +    return foo (9);
>>> +  if (in == (0xFFF8567FFFFFFFFFLL))
>>> +    return foo (19);
>>> +
>>> +  return 0;
>>> +}
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103743_1.c b/gcc/testsuite/gcc.target/powerpc/pr103743_1.c
>>> new file mode 100644
>>> index 00000000000..2c08c56714a
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr103743_1.c
>>> @@ -0,0 +1,95 @@
>>> +/* { dg-do run } */
>>> +/* { dg-options "-O2 -std=c99" } */
>>> +
>>> +int
>>> +foo (int a)
>>> +{
>>> +  return a + 6;
>>> +}
>>> +
>>> +int __attribute__ ((noinline)) udi_fun (unsigned long long in)
>>> +{
>>> +  if (in == (0x8642000000000000ULL))
>>> +    return foo (1);
>>> +  if (in == (0x7642000000000000ULL))
>>> +    return foo (12);
>>> +  if (in == (0x8000000000000000ULL))
>>> +    return foo (32);
>>> +  if (in == (0x8000000000000001ULL))
>>> +    return foo (33);
>>> +  if (in == (0x8642FFFFFFFFFFFFULL))
>>> +    return foo (46);
>>> +  if (in == (0x7642FFFFFFFFFFFFULL))
>>> +    return foo (51);
>>> +  if (in == (0x7567000000ULL))
>>> +    return foo (9);
>>> +  if (in == (0xFFF8567FFFFFFFFFULL))
>>> +    return foo (19);
>>> +  
>>> +  return 0;
>>> +}
>>> +
>>> +int __attribute__ ((noinline)) di_fun (long long in)
>>> +{
>>> +  if (in == (0x8642000000000000LL))
>>> +    return foo (1);
>>> +  if (in == (0x7642000000000000LL))
>>> +    return foo (12);
>>> +  if (in == (0x8000000000000000LL))
>>> +    return foo (32);
>>> +  if (in == (0x8000000000000001LL))
>>> +    return foo (33);
>>> +  if (in == (0x8642FFFFFFFFFFFFLL))
>>> +    return foo (46);
>>> +  if (in == (0x7642FFFFFFFFFFFFLL))
>>> +    return foo (51);
>>> +  return 0;
>>> +}
>>> +
>>> +int
>>> +main ()
>>> +{
>>> +  int e = 0;
>>> +  if (udi_fun (6) != 0)
>>> +    e++;
>>> +  if (udi_fun (0x8642000000000000ULL) != foo (1))
>>> +    e++;
>>> +  if (udi_fun (0x7642000000000000ULL) != foo (12))
>>> +    e++;
>>> +  if (udi_fun (0x8000000000000000ULL) != foo (32))
>>> +    e++;
>>> +  if (udi_fun (0x8000000000000001ULL) != foo (33))
>>> +    e++;
>>> +  if (udi_fun (0x8642FFFFFFFFFFFFULL) != foo (46))
>>> +    e++;
>>> +  if (udi_fun (0x7642FFFFFFFFFFFFULL) != foo (51))
>>> +    e++;
>>> +  if (udi_fun (0x7567000000ULL) != foo (9))
>>> +    e++;
>>> +  if (udi_fun (0xFFF8567FFFFFFFFFULL) != foo (19))
>>> +    e++;
>>> +
>>> +  if (di_fun (6) != 0)
>>> +    e++;
>>> +  if (di_fun (0x8642000000000000LL) != foo (1))
>>> +    e++;
>>> +  if (di_fun (0x7642000000000000LL) != foo (12))
>>> +    e++;
>>> +  if (di_fun (0x8000000000000000LL) != foo (32))
>>> +    e++;
>>> +  if (di_fun (0x8000000000000001LL) != foo (33))
>>> +    e++;
>>> +  if (di_fun (0x8642FFFFFFFFFFFFLL) != foo (46))
>>> +    e++;
>>> +  if (di_fun (0x7642FFFFFFFFFFFFLL) != foo (51))
>>> +    e++;
>>> +  if (udi_fun (0x7567000000LL) != foo (9))
>>> +    e++;
>>> +  if (udi_fun (0xFFF8567FFFFFFFFFLL) != foo (19))
>>> +    e++;
>>> +
>>> +  if (e)
>>> +    __builtin_abort ();
>>> +  return 0;
>>> +}
>>> +

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Ping^4: [PATCH V6] rs6000: Optimize cmp on rotated 16bits constant
  2022-11-09  3:01     ` Ping^3: " Jiufu Guo
@ 2022-12-02  2:45       ` Jiufu Guo
  2022-12-13  3:49         ` Ping^5: " Jiufu Guo
  0 siblings, 1 reply; 11+ messages in thread
From: Jiufu Guo @ 2022-12-02  2:45 UTC (permalink / raw)
  To: Jiufu Guo via Gcc-patches; +Cc: segher, dje.gcc, linkw

Hi,

Gentle ping:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html

BR,
Jeff(Jiufu)

Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

> Hi,
>
> Gentle ping this:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html
>
> BR,
> Jeff (Jiufu)
>
>
> Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>
>> Gentle ping:
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html
>>
>> BR,
>> Jeff (Jiufu)
>>
>> Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>
>>> Ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html
>>>
>>> BR,
>>> Jeff(Jiufu)
>>>
>>>
>>> Jiufu Guo <guojiufu@linux.ibm.com> writes:
>>>
>>>> Hi,
>>>>
>>>> When checking eq/ne with a constant which has only 16bits, it can be
>>>> optimized to check the rotated data.  By this, the constant building
>>>> is optimized.
>>>>
>>>> As the example in PR103743:
>>>> For "in == 0x8000000000000000LL", this patch generates:
>>>>         rotldi %r3,%r3,16
>>>>         cmpldi %cr0,%r3,32768
>>>> instead:
>>>>         li %r9,-1
>>>>         rldicr %r9,%r9,0,0
>>>>         cmpd %cr0,%r3,%r9
>>>>
>>>> Compare with previous patchs:
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600385.html
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600198.html
>>>>
>>>> This patch releases the condition on can_create_pseudo_p and adds
>>>> clobbers to allow the splitter can be run both before and after RA.
>>>>
>>>> This is updated patch based on previous patch and comments:
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600315.html
>>>>
>>>> This patch pass bootstrap and regtest on ppc64 and ppc64le.
>>>> Is it ok for trunk?  Thanks for comments!
>>>>
>>>> BR,
>>>> Jeff(Jiufu)
>>>>
>>>>
>>>> 	PR target/103743
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 	* config/rs6000/rs6000-protos.h (rotate_from_leading_zeros_const): New.
>>>> 	(compare_rotate_immediate_p): New.
>>>> 	* config/rs6000/rs6000.cc (rotate_from_leading_zeros_const): New
>>>> 	definition.
>>>> 	(compare_rotate_immediate_p): New definition.
>>>> 	* config/rs6000/rs6000.md (EQNE): New code_attr.
>>>> 	(*rotate_on_cmpdi): New define_insn_and_split.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 	* gcc.target/powerpc/pr103743.c: New test.
>>>> 	* gcc.target/powerpc/pr103743_1.c: New test.
>>>>
>>>> ---
>>>>  gcc/config/rs6000/rs6000-protos.h             |  2 +
>>>>  gcc/config/rs6000/rs6000.cc                   | 41 ++++++++
>>>>  gcc/config/rs6000/rs6000.md                   | 62 +++++++++++-
>>>>  gcc/testsuite/gcc.target/powerpc/pr103743.c   | 52 ++++++++++
>>>>  gcc/testsuite/gcc.target/powerpc/pr103743_1.c | 95 +++++++++++++++++++
>>>>  5 files changed, 251 insertions(+), 1 deletion(-)
>>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743.c
>>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743_1.c
>>>>
>>>> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
>>>> index b3c16e7448d..78847e6b3db 100644
>>>> --- a/gcc/config/rs6000/rs6000-protos.h
>>>> +++ b/gcc/config/rs6000/rs6000-protos.h
>>>> @@ -35,6 +35,8 @@ extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
>>>>  extern int vspltis_shifted (rtx);
>>>>  extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
>>>>  extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
>>>> +extern int rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT, int);
>>>> +extern bool compare_rotate_immediate_p (unsigned HOST_WIDE_INT);
>>>>  extern int num_insns_constant (rtx, machine_mode);
>>>>  extern int small_data_operand (rtx, machine_mode);
>>>>  extern bool mem_operand_gpr (rtx, machine_mode);
>>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>>> index df491bee2ea..a548db42660 100644
>>>> --- a/gcc/config/rs6000/rs6000.cc
>>>> +++ b/gcc/config/rs6000/rs6000.cc
>>>> @@ -14797,6 +14797,47 @@ rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
>>>>      return reverse_condition (code);
>>>>  }
>>>>  
>>>> +/* Check if C can be rotated from an immediate which starts (as 64bit integer)
>>>> +   with at least CLZ bits zero.
>>>> +
>>>> +   Return the number by which C can be rotated from the immediate.
>>>> +   Return -1 if C can not be rotated as from.  */
>>>> +
>>>> +int
>>>> +rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT c, int clz)
>>>> +{
>>>> +  /* case a. 0..0xxx: already at least clz zeros.  */
>>>> +  int lz = clz_hwi (c);
>>>> +  if (lz >= clz)
>>>> +    return 0;
>>>> +
>>>> +  /* case b. 0..0xxx0..0: at least clz zeros.  */
>>>> +  int tz = ctz_hwi (c);
>>>> +  if (lz + tz >= clz)
>>>> +    return tz;
>>>> +
>>>> +  /* case c. xx10.....0xx: rotate 'clz + 1' bits firstly, then check case b.
>>>> +	       ^bit -> Vbit
>>>> +	     00...00xxx100, 'clz + 1' >= bits of xxxx.  */
>>>> +  const int rot_bits = HOST_BITS_PER_WIDE_INT - clz + 1;
>>>> +  unsigned HOST_WIDE_INT rc = (c >> rot_bits) | (c << (clz - 1));
>>>> +  tz = ctz_hwi (rc);
>>>> +  if (clz_hwi (rc) + tz >= clz)
>>>> +    return tz + rot_bits;
>>>> +
>>>> +  return -1;
>>>> +}
>>>> +
>>>> +/* Check if C can be rotated from an immediate operand of cmpdi or cmpldi.  */
>>>> +
>>>> +bool
>>>> +compare_rotate_immediate_p (unsigned HOST_WIDE_INT c)
>>>> +{
>>>> +  /* leading 48 zeros (cmpldi), or leading 49 ones (cmpdi).  */
>>>> +  return rotate_from_leading_zeros_const (~c, 49) > 0
>>>> +	 || rotate_from_leading_zeros_const (c, 48) > 0;
>>>> +}
>>>> +
>>>>  /* Generate a compare for CODE.  Return a brand-new rtx that
>>>>     represents the result of the compare.  */
>>>>  
>>>> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
>>>> index e9e5cd1e54d..cad3cfc98cd 100644
>>>> --- a/gcc/config/rs6000/rs6000.md
>>>> +++ b/gcc/config/rs6000/rs6000.md
>>>> @@ -7766,6 +7766,67 @@ (define_insn "*movsi_from_df"
>>>>    "xscvdpsp %x0,%x1"
>>>>    [(set_attr "type" "fp")])
>>>>  
>>>> +
>>>> +(define_code_iterator eqne [eq ne])
>>>> +(define_code_attr EQNE [(eq "EQ") (ne "NE")])
>>>> +
>>>> +;; "i == C" ==> "rotl(i,N) == rotl(C,N)"
>>>> +(define_insn_and_split "*rotate_on_cmpdi"
>>>> +  [(set (pc)
>>>> +	(if_then_else (eqne (match_operand:DI 1 "gpc_reg_operand" "r")
>>>> +			    (match_operand:DI 2 "const_int_operand" "n"))
>>>> +		      (label_ref (match_operand 0 ""))
>>>> +		      (pc)))
>>>> +  (clobber (match_scratch:DI 3 "=r"))
>>>> +  (clobber (match_scratch:CCUNS 4 "=y"))]
>>>> +  "TARGET_POWERPC64 && num_insns_constant (operands[2], DImode) > 1
>>>> +   && compare_rotate_immediate_p (UINTVAL (operands[2]))"
>>>> +   "#"
>>>> +   "&& 1"
>>>> +  [(pc)]
>>>> +{
>>>> +  rtx note = find_reg_note (curr_insn, REG_BR_PROB, 0);
>>>> +  bool sgn = false;
>>>> +  unsigned HOST_WIDE_INT C = INTVAL (operands[2]);
>>>> +  int rot = rotate_from_leading_zeros_const (C, 48);
>>>> +  if (rot < 0)
>>>> +    {
>>>> +      sgn = true;
>>>> +      rot = rotate_from_leading_zeros_const (~C, 49);
>>>> +    }
>>>> +  rtx n = GEN_INT (HOST_BITS_PER_WIDE_INT - rot);
>>>> +
>>>> +  /* i' = rotl (i, n) */
>>>> +  rtx op0 = can_create_pseudo_p () ? gen_reg_rtx (DImode) : operands[3];
>>>> +  emit_insn (gen_rtx_SET (op0, gen_rtx_ROTATE (DImode, operands[1], n)));
>>>> +
>>>> +  /* C' = rotl (C, n) */
>>>> +  rtx op1 = GEN_INT ((C << INTVAL (n)) | (C >> rot));
>>>> +
>>>> +  /* i' ==  C' */
>>>> +  machine_mode comp_mode = sgn ? CCmode : CCUNSmode;
>>>> +  rtx cc = can_create_pseudo_p () ? gen_reg_rtx (comp_mode) : operands[4];
>>>> +  PUT_MODE (cc, comp_mode);
>>>> +  emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (comp_mode, op0, op1)));
>>>> +  rtx cmp = gen_rtx_<EQNE> (CCmode, cc, const0_rtx);
>>>> +  rtx loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands[0]);
>>>> +  emit_jump_insn (gen_rtx_SET (pc_rtx,
>>>> +			       gen_rtx_IF_THEN_ELSE (VOIDmode, cmp,
>>>> +						     loc_ref, pc_rtx)));
>>>> +
>>>> +  /* keep the probability info for the prediction of the branch insn.  */
>>>> +  if (note)
>>>> +    {
>>>> +      profile_probability prob
>>>> +	= profile_probability::from_reg_br_prob_note (XINT (note, 0));
>>>> +
>>>> +      add_reg_br_prob_note (get_last_insn (), prob);
>>>> +    }
>>>> +
>>>> +  DONE;
>>>> +}
>>>> +)
>>>> +
>>>>  ;; Split a load of a large constant into the appropriate two-insn
>>>>  ;; sequence.
>>>>  
>>>> @@ -13472,7 +13533,6 @@ (define_expand "@ctr<mode>"
>>>>  ;; rs6000_legitimate_combined_insn prevents combine creating any of
>>>>  ;; the ctr<mode> insns.
>>>>  
>>>> -(define_code_iterator eqne [eq ne])
>>>>  (define_code_attr bd [(eq "bdz") (ne "bdnz")])
>>>>  (define_code_attr bd_neg [(eq "bdnz") (ne "bdz")])
>>>>  
>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103743.c b/gcc/testsuite/gcc.target/powerpc/pr103743.c
>>>> new file mode 100644
>>>> index 00000000000..abb876ed79e
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr103743.c
>>>> @@ -0,0 +1,52 @@
>>>> +/* { dg-options "-O2" } */
>>>> +/* { dg-do compile { target has_arch_ppc64 } } */
>>>> +
>>>> +/* { dg-final { scan-assembler-times {\mcmpldi\M} 10  } } */
>>>> +/* { dg-final { scan-assembler-times {\mcmpdi\M} 4  } } */
>>>> +/* { dg-final { scan-assembler-times {\mrotldi\M} 14  } } */
>>>> +
>>>> +int foo (int a);
>>>> +
>>>> +int __attribute__ ((noinline)) udi_fun (unsigned long long in)
>>>> +{
>>>> +  if (in == (0x8642000000000000ULL))
>>>> +    return foo (1);
>>>> +  if (in == (0x7642000000000000ULL))
>>>> +    return foo (12);
>>>> +  if (in == (0x8000000000000000ULL))
>>>> +    return foo (32);
>>>> +  if (in == (0x8000000000000001ULL))
>>>> +    return foo (33);
>>>> +  if (in == (0x8642FFFFFFFFFFFFULL))
>>>> +    return foo (46);
>>>> +  if (in == (0x7642FFFFFFFFFFFFULL))
>>>> +    return foo (51);
>>>> +  if (in == (0x7567000000ULL))
>>>> +    return foo (9);
>>>> +  if (in == (0xFFF8567FFFFFFFFFULL))
>>>> +    return foo (19);
>>>> +
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +int __attribute__ ((noinline)) di_fun (long long in)
>>>> +{
>>>> +  if (in == (0x8642000000000000LL))
>>>> +    return foo (1);
>>>> +  if (in == (0x7642000000000000LL))
>>>> +    return foo (12);
>>>> +  if (in == (0x8000000000000000LL))
>>>> +    return foo (32);
>>>> +  if (in == (0x8000000000000001LL))
>>>> +    return foo (33);
>>>> +  if (in == (0x8642FFFFFFFFFFFFLL))
>>>> +    return foo (46);
>>>> +  if (in == (0x7642FFFFFFFFFFFFLL))
>>>> +    return foo (51);
>>>> +  if (in == (0x7567000000LL))
>>>> +    return foo (9);
>>>> +  if (in == (0xFFF8567FFFFFFFFFLL))
>>>> +    return foo (19);
>>>> +
>>>> +  return 0;
>>>> +}
>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103743_1.c b/gcc/testsuite/gcc.target/powerpc/pr103743_1.c
>>>> new file mode 100644
>>>> index 00000000000..2c08c56714a
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr103743_1.c
>>>> @@ -0,0 +1,95 @@
>>>> +/* { dg-do run } */
>>>> +/* { dg-options "-O2 -std=c99" } */
>>>> +
>>>> +int
>>>> +foo (int a)
>>>> +{
>>>> +  return a + 6;
>>>> +}
>>>> +
>>>> +int __attribute__ ((noinline)) udi_fun (unsigned long long in)
>>>> +{
>>>> +  if (in == (0x8642000000000000ULL))
>>>> +    return foo (1);
>>>> +  if (in == (0x7642000000000000ULL))
>>>> +    return foo (12);
>>>> +  if (in == (0x8000000000000000ULL))
>>>> +    return foo (32);
>>>> +  if (in == (0x8000000000000001ULL))
>>>> +    return foo (33);
>>>> +  if (in == (0x8642FFFFFFFFFFFFULL))
>>>> +    return foo (46);
>>>> +  if (in == (0x7642FFFFFFFFFFFFULL))
>>>> +    return foo (51);
>>>> +  if (in == (0x7567000000ULL))
>>>> +    return foo (9);
>>>> +  if (in == (0xFFF8567FFFFFFFFFULL))
>>>> +    return foo (19);
>>>> +  
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +int __attribute__ ((noinline)) di_fun (long long in)
>>>> +{
>>>> +  if (in == (0x8642000000000000LL))
>>>> +    return foo (1);
>>>> +  if (in == (0x7642000000000000LL))
>>>> +    return foo (12);
>>>> +  if (in == (0x8000000000000000LL))
>>>> +    return foo (32);
>>>> +  if (in == (0x8000000000000001LL))
>>>> +    return foo (33);
>>>> +  if (in == (0x8642FFFFFFFFFFFFLL))
>>>> +    return foo (46);
>>>> +  if (in == (0x7642FFFFFFFFFFFFLL))
>>>> +    return foo (51);
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +int
>>>> +main ()
>>>> +{
>>>> +  int e = 0;
>>>> +  if (udi_fun (6) != 0)
>>>> +    e++;
>>>> +  if (udi_fun (0x8642000000000000ULL) != foo (1))
>>>> +    e++;
>>>> +  if (udi_fun (0x7642000000000000ULL) != foo (12))
>>>> +    e++;
>>>> +  if (udi_fun (0x8000000000000000ULL) != foo (32))
>>>> +    e++;
>>>> +  if (udi_fun (0x8000000000000001ULL) != foo (33))
>>>> +    e++;
>>>> +  if (udi_fun (0x8642FFFFFFFFFFFFULL) != foo (46))
>>>> +    e++;
>>>> +  if (udi_fun (0x7642FFFFFFFFFFFFULL) != foo (51))
>>>> +    e++;
>>>> +  if (udi_fun (0x7567000000ULL) != foo (9))
>>>> +    e++;
>>>> +  if (udi_fun (0xFFF8567FFFFFFFFFULL) != foo (19))
>>>> +    e++;
>>>> +
>>>> +  if (di_fun (6) != 0)
>>>> +    e++;
>>>> +  if (di_fun (0x8642000000000000LL) != foo (1))
>>>> +    e++;
>>>> +  if (di_fun (0x7642000000000000LL) != foo (12))
>>>> +    e++;
>>>> +  if (di_fun (0x8000000000000000LL) != foo (32))
>>>> +    e++;
>>>> +  if (di_fun (0x8000000000000001LL) != foo (33))
>>>> +    e++;
>>>> +  if (di_fun (0x8642FFFFFFFFFFFFLL) != foo (46))
>>>> +    e++;
>>>> +  if (di_fun (0x7642FFFFFFFFFFFFLL) != foo (51))
>>>> +    e++;
>>>> +  if (udi_fun (0x7567000000LL) != foo (9))
>>>> +    e++;
>>>> +  if (udi_fun (0xFFF8567FFFFFFFFFLL) != foo (19))
>>>> +    e++;
>>>> +
>>>> +  if (e)
>>>> +    __builtin_abort ();
>>>> +  return 0;
>>>> +}
>>>> +

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Ping^5: [PATCH V6] rs6000: Optimize cmp on rotated 16bits constant
  2022-12-02  2:45       ` Ping^4: " Jiufu Guo
@ 2022-12-13  3:49         ` Jiufu Guo
  0 siblings, 0 replies; 11+ messages in thread
From: Jiufu Guo @ 2022-12-13  3:49 UTC (permalink / raw)
  To: Jiufu Guo via Gcc-patches; +Cc: segher, dje.gcc, linkw

Hi,

I would like to ping this:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html

BR,
Jeff (Jiufu)

Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

> Hi,
>
> Gentle ping:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html
>
> BR,
> Jeff(Jiufu)
>
> Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>
>> Hi,
>>
>> Gentle ping this:
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html
>>
>> BR,
>> Jeff (Jiufu)
>>
>>
>> Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>
>>> Gentle ping:
>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html
>>>
>>> BR,
>>> Jeff (Jiufu)
>>>
>>> Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>>
>>>> Ping: https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html
>>>>
>>>> BR,
>>>> Jeff(Jiufu)
>>>>
>>>>
>>>> Jiufu Guo <guojiufu@linux.ibm.com> writes:
>>>>
>>>>> Hi,
>>>>>
>>>>> When checking eq/ne with a constant which has only 16bits, it can be
>>>>> optimized to check the rotated data.  By this, the constant building
>>>>> is optimized.
>>>>>
>>>>> As the example in PR103743:
>>>>> For "in == 0x8000000000000000LL", this patch generates:
>>>>>         rotldi %r3,%r3,16
>>>>>         cmpldi %cr0,%r3,32768
>>>>> instead:
>>>>>         li %r9,-1
>>>>>         rldicr %r9,%r9,0,0
>>>>>         cmpd %cr0,%r3,%r9
>>>>>
>>>>> Compare with previous patchs:
>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600385.html
>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600198.html
>>>>>
>>>>> This patch releases the condition on can_create_pseudo_p and adds
>>>>> clobbers to allow the splitter can be run both before and after RA.
>>>>>
>>>>> This is updated patch based on previous patch and comments:
>>>>> https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600315.html
>>>>>
>>>>> This patch pass bootstrap and regtest on ppc64 and ppc64le.
>>>>> Is it ok for trunk?  Thanks for comments!
>>>>>
>>>>> BR,
>>>>> Jeff(Jiufu)
>>>>>
>>>>>
>>>>> 	PR target/103743
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> 	* config/rs6000/rs6000-protos.h (rotate_from_leading_zeros_const): New.
>>>>> 	(compare_rotate_immediate_p): New.
>>>>> 	* config/rs6000/rs6000.cc (rotate_from_leading_zeros_const): New
>>>>> 	definition.
>>>>> 	(compare_rotate_immediate_p): New definition.
>>>>> 	* config/rs6000/rs6000.md (EQNE): New code_attr.
>>>>> 	(*rotate_on_cmpdi): New define_insn_and_split.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>> 	* gcc.target/powerpc/pr103743.c: New test.
>>>>> 	* gcc.target/powerpc/pr103743_1.c: New test.
>>>>>
>>>>> ---
>>>>>  gcc/config/rs6000/rs6000-protos.h             |  2 +
>>>>>  gcc/config/rs6000/rs6000.cc                   | 41 ++++++++
>>>>>  gcc/config/rs6000/rs6000.md                   | 62 +++++++++++-
>>>>>  gcc/testsuite/gcc.target/powerpc/pr103743.c   | 52 ++++++++++
>>>>>  gcc/testsuite/gcc.target/powerpc/pr103743_1.c | 95 +++++++++++++++++++
>>>>>  5 files changed, 251 insertions(+), 1 deletion(-)
>>>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743.c
>>>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743_1.c
>>>>>
>>>>> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
>>>>> index b3c16e7448d..78847e6b3db 100644
>>>>> --- a/gcc/config/rs6000/rs6000-protos.h
>>>>> +++ b/gcc/config/rs6000/rs6000-protos.h
>>>>> @@ -35,6 +35,8 @@ extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
>>>>>  extern int vspltis_shifted (rtx);
>>>>>  extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
>>>>>  extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
>>>>> +extern int rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT, int);
>>>>> +extern bool compare_rotate_immediate_p (unsigned HOST_WIDE_INT);
>>>>>  extern int num_insns_constant (rtx, machine_mode);
>>>>>  extern int small_data_operand (rtx, machine_mode);
>>>>>  extern bool mem_operand_gpr (rtx, machine_mode);
>>>>> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
>>>>> index df491bee2ea..a548db42660 100644
>>>>> --- a/gcc/config/rs6000/rs6000.cc
>>>>> +++ b/gcc/config/rs6000/rs6000.cc
>>>>> @@ -14797,6 +14797,47 @@ rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
>>>>>      return reverse_condition (code);
>>>>>  }
>>>>>  
>>>>> +/* Check if C can be rotated from an immediate which starts (as 64bit integer)
>>>>> +   with at least CLZ bits zero.
>>>>> +
>>>>> +   Return the number by which C can be rotated from the immediate.
>>>>> +   Return -1 if C can not be rotated as from.  */
>>>>> +
>>>>> +int
>>>>> +rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT c, int clz)
>>>>> +{
>>>>> +  /* case a. 0..0xxx: already at least clz zeros.  */
>>>>> +  int lz = clz_hwi (c);
>>>>> +  if (lz >= clz)
>>>>> +    return 0;
>>>>> +
>>>>> +  /* case b. 0..0xxx0..0: at least clz zeros.  */
>>>>> +  int tz = ctz_hwi (c);
>>>>> +  if (lz + tz >= clz)
>>>>> +    return tz;
>>>>> +
>>>>> +  /* case c. xx10.....0xx: rotate 'clz + 1' bits firstly, then check case b.
>>>>> +	       ^bit -> Vbit
>>>>> +	     00...00xxx100, 'clz + 1' >= bits of xxxx.  */
>>>>> +  const int rot_bits = HOST_BITS_PER_WIDE_INT - clz + 1;
>>>>> +  unsigned HOST_WIDE_INT rc = (c >> rot_bits) | (c << (clz - 1));
>>>>> +  tz = ctz_hwi (rc);
>>>>> +  if (clz_hwi (rc) + tz >= clz)
>>>>> +    return tz + rot_bits;
>>>>> +
>>>>> +  return -1;
>>>>> +}
>>>>> +
>>>>> +/* Check if C can be rotated from an immediate operand of cmpdi or cmpldi.  */
>>>>> +
>>>>> +bool
>>>>> +compare_rotate_immediate_p (unsigned HOST_WIDE_INT c)
>>>>> +{
>>>>> +  /* leading 48 zeros (cmpldi), or leading 49 ones (cmpdi).  */
>>>>> +  return rotate_from_leading_zeros_const (~c, 49) > 0
>>>>> +	 || rotate_from_leading_zeros_const (c, 48) > 0;
>>>>> +}
>>>>> +
>>>>>  /* Generate a compare for CODE.  Return a brand-new rtx that
>>>>>     represents the result of the compare.  */
>>>>>  
>>>>> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
>>>>> index e9e5cd1e54d..cad3cfc98cd 100644
>>>>> --- a/gcc/config/rs6000/rs6000.md
>>>>> +++ b/gcc/config/rs6000/rs6000.md
>>>>> @@ -7766,6 +7766,67 @@ (define_insn "*movsi_from_df"
>>>>>    "xscvdpsp %x0,%x1"
>>>>>    [(set_attr "type" "fp")])
>>>>>  
>>>>> +
>>>>> +(define_code_iterator eqne [eq ne])
>>>>> +(define_code_attr EQNE [(eq "EQ") (ne "NE")])
>>>>> +
>>>>> +;; "i == C" ==> "rotl(i,N) == rotl(C,N)"
>>>>> +(define_insn_and_split "*rotate_on_cmpdi"
>>>>> +  [(set (pc)
>>>>> +	(if_then_else (eqne (match_operand:DI 1 "gpc_reg_operand" "r")
>>>>> +			    (match_operand:DI 2 "const_int_operand" "n"))
>>>>> +		      (label_ref (match_operand 0 ""))
>>>>> +		      (pc)))
>>>>> +  (clobber (match_scratch:DI 3 "=r"))
>>>>> +  (clobber (match_scratch:CCUNS 4 "=y"))]
>>>>> +  "TARGET_POWERPC64 && num_insns_constant (operands[2], DImode) > 1
>>>>> +   && compare_rotate_immediate_p (UINTVAL (operands[2]))"
>>>>> +   "#"
>>>>> +   "&& 1"
>>>>> +  [(pc)]
>>>>> +{
>>>>> +  rtx note = find_reg_note (curr_insn, REG_BR_PROB, 0);
>>>>> +  bool sgn = false;
>>>>> +  unsigned HOST_WIDE_INT C = INTVAL (operands[2]);
>>>>> +  int rot = rotate_from_leading_zeros_const (C, 48);
>>>>> +  if (rot < 0)
>>>>> +    {
>>>>> +      sgn = true;
>>>>> +      rot = rotate_from_leading_zeros_const (~C, 49);
>>>>> +    }
>>>>> +  rtx n = GEN_INT (HOST_BITS_PER_WIDE_INT - rot);
>>>>> +
>>>>> +  /* i' = rotl (i, n) */
>>>>> +  rtx op0 = can_create_pseudo_p () ? gen_reg_rtx (DImode) : operands[3];
>>>>> +  emit_insn (gen_rtx_SET (op0, gen_rtx_ROTATE (DImode, operands[1], n)));
>>>>> +
>>>>> +  /* C' = rotl (C, n) */
>>>>> +  rtx op1 = GEN_INT ((C << INTVAL (n)) | (C >> rot));
>>>>> +
>>>>> +  /* i' ==  C' */
>>>>> +  machine_mode comp_mode = sgn ? CCmode : CCUNSmode;
>>>>> +  rtx cc = can_create_pseudo_p () ? gen_reg_rtx (comp_mode) : operands[4];
>>>>> +  PUT_MODE (cc, comp_mode);
>>>>> +  emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (comp_mode, op0, op1)));
>>>>> +  rtx cmp = gen_rtx_<EQNE> (CCmode, cc, const0_rtx);
>>>>> +  rtx loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands[0]);
>>>>> +  emit_jump_insn (gen_rtx_SET (pc_rtx,
>>>>> +			       gen_rtx_IF_THEN_ELSE (VOIDmode, cmp,
>>>>> +						     loc_ref, pc_rtx)));
>>>>> +
>>>>> +  /* keep the probability info for the prediction of the branch insn.  */
>>>>> +  if (note)
>>>>> +    {
>>>>> +      profile_probability prob
>>>>> +	= profile_probability::from_reg_br_prob_note (XINT (note, 0));
>>>>> +
>>>>> +      add_reg_br_prob_note (get_last_insn (), prob);
>>>>> +    }
>>>>> +
>>>>> +  DONE;
>>>>> +}
>>>>> +)
>>>>> +
>>>>>  ;; Split a load of a large constant into the appropriate two-insn
>>>>>  ;; sequence.
>>>>>  
>>>>> @@ -13472,7 +13533,6 @@ (define_expand "@ctr<mode>"
>>>>>  ;; rs6000_legitimate_combined_insn prevents combine creating any of
>>>>>  ;; the ctr<mode> insns.
>>>>>  
>>>>> -(define_code_iterator eqne [eq ne])
>>>>>  (define_code_attr bd [(eq "bdz") (ne "bdnz")])
>>>>>  (define_code_attr bd_neg [(eq "bdnz") (ne "bdz")])
>>>>>  
>>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103743.c b/gcc/testsuite/gcc.target/powerpc/pr103743.c
>>>>> new file mode 100644
>>>>> index 00000000000..abb876ed79e
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr103743.c
>>>>> @@ -0,0 +1,52 @@
>>>>> +/* { dg-options "-O2" } */
>>>>> +/* { dg-do compile { target has_arch_ppc64 } } */
>>>>> +
>>>>> +/* { dg-final { scan-assembler-times {\mcmpldi\M} 10  } } */
>>>>> +/* { dg-final { scan-assembler-times {\mcmpdi\M} 4  } } */
>>>>> +/* { dg-final { scan-assembler-times {\mrotldi\M} 14  } } */
>>>>> +
>>>>> +int foo (int a);
>>>>> +
>>>>> +int __attribute__ ((noinline)) udi_fun (unsigned long long in)
>>>>> +{
>>>>> +  if (in == (0x8642000000000000ULL))
>>>>> +    return foo (1);
>>>>> +  if (in == (0x7642000000000000ULL))
>>>>> +    return foo (12);
>>>>> +  if (in == (0x8000000000000000ULL))
>>>>> +    return foo (32);
>>>>> +  if (in == (0x8000000000000001ULL))
>>>>> +    return foo (33);
>>>>> +  if (in == (0x8642FFFFFFFFFFFFULL))
>>>>> +    return foo (46);
>>>>> +  if (in == (0x7642FFFFFFFFFFFFULL))
>>>>> +    return foo (51);
>>>>> +  if (in == (0x7567000000ULL))
>>>>> +    return foo (9);
>>>>> +  if (in == (0xFFF8567FFFFFFFFFULL))
>>>>> +    return foo (19);
>>>>> +
>>>>> +  return 0;
>>>>> +}
>>>>> +
>>>>> +int __attribute__ ((noinline)) di_fun (long long in)
>>>>> +{
>>>>> +  if (in == (0x8642000000000000LL))
>>>>> +    return foo (1);
>>>>> +  if (in == (0x7642000000000000LL))
>>>>> +    return foo (12);
>>>>> +  if (in == (0x8000000000000000LL))
>>>>> +    return foo (32);
>>>>> +  if (in == (0x8000000000000001LL))
>>>>> +    return foo (33);
>>>>> +  if (in == (0x8642FFFFFFFFFFFFLL))
>>>>> +    return foo (46);
>>>>> +  if (in == (0x7642FFFFFFFFFFFFLL))
>>>>> +    return foo (51);
>>>>> +  if (in == (0x7567000000LL))
>>>>> +    return foo (9);
>>>>> +  if (in == (0xFFF8567FFFFFFFFFLL))
>>>>> +    return foo (19);
>>>>> +
>>>>> +  return 0;
>>>>> +}
>>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103743_1.c b/gcc/testsuite/gcc.target/powerpc/pr103743_1.c
>>>>> new file mode 100644
>>>>> index 00000000000..2c08c56714a
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr103743_1.c
>>>>> @@ -0,0 +1,95 @@
>>>>> +/* { dg-do run } */
>>>>> +/* { dg-options "-O2 -std=c99" } */
>>>>> +
>>>>> +int
>>>>> +foo (int a)
>>>>> +{
>>>>> +  return a + 6;
>>>>> +}
>>>>> +
>>>>> +int __attribute__ ((noinline)) udi_fun (unsigned long long in)
>>>>> +{
>>>>> +  if (in == (0x8642000000000000ULL))
>>>>> +    return foo (1);
>>>>> +  if (in == (0x7642000000000000ULL))
>>>>> +    return foo (12);
>>>>> +  if (in == (0x8000000000000000ULL))
>>>>> +    return foo (32);
>>>>> +  if (in == (0x8000000000000001ULL))
>>>>> +    return foo (33);
>>>>> +  if (in == (0x8642FFFFFFFFFFFFULL))
>>>>> +    return foo (46);
>>>>> +  if (in == (0x7642FFFFFFFFFFFFULL))
>>>>> +    return foo (51);
>>>>> +  if (in == (0x7567000000ULL))
>>>>> +    return foo (9);
>>>>> +  if (in == (0xFFF8567FFFFFFFFFULL))
>>>>> +    return foo (19);
>>>>> +  
>>>>> +  return 0;
>>>>> +}
>>>>> +
>>>>> +int __attribute__ ((noinline)) di_fun (long long in)
>>>>> +{
>>>>> +  if (in == (0x8642000000000000LL))
>>>>> +    return foo (1);
>>>>> +  if (in == (0x7642000000000000LL))
>>>>> +    return foo (12);
>>>>> +  if (in == (0x8000000000000000LL))
>>>>> +    return foo (32);
>>>>> +  if (in == (0x8000000000000001LL))
>>>>> +    return foo (33);
>>>>> +  if (in == (0x8642FFFFFFFFFFFFLL))
>>>>> +    return foo (46);
>>>>> +  if (in == (0x7642FFFFFFFFFFFFLL))
>>>>> +    return foo (51);
>>>>> +  return 0;
>>>>> +}
>>>>> +
>>>>> +int
>>>>> +main ()
>>>>> +{
>>>>> +  int e = 0;
>>>>> +  if (udi_fun (6) != 0)
>>>>> +    e++;
>>>>> +  if (udi_fun (0x8642000000000000ULL) != foo (1))
>>>>> +    e++;
>>>>> +  if (udi_fun (0x7642000000000000ULL) != foo (12))
>>>>> +    e++;
>>>>> +  if (udi_fun (0x8000000000000000ULL) != foo (32))
>>>>> +    e++;
>>>>> +  if (udi_fun (0x8000000000000001ULL) != foo (33))
>>>>> +    e++;
>>>>> +  if (udi_fun (0x8642FFFFFFFFFFFFULL) != foo (46))
>>>>> +    e++;
>>>>> +  if (udi_fun (0x7642FFFFFFFFFFFFULL) != foo (51))
>>>>> +    e++;
>>>>> +  if (udi_fun (0x7567000000ULL) != foo (9))
>>>>> +    e++;
>>>>> +  if (udi_fun (0xFFF8567FFFFFFFFFULL) != foo (19))
>>>>> +    e++;
>>>>> +
>>>>> +  if (di_fun (6) != 0)
>>>>> +    e++;
>>>>> +  if (di_fun (0x8642000000000000LL) != foo (1))
>>>>> +    e++;
>>>>> +  if (di_fun (0x7642000000000000LL) != foo (12))
>>>>> +    e++;
>>>>> +  if (di_fun (0x8000000000000000LL) != foo (32))
>>>>> +    e++;
>>>>> +  if (di_fun (0x8000000000000001LL) != foo (33))
>>>>> +    e++;
>>>>> +  if (di_fun (0x8642FFFFFFFFFFFFLL) != foo (46))
>>>>> +    e++;
>>>>> +  if (di_fun (0x7642FFFFFFFFFFFFLL) != foo (51))
>>>>> +    e++;
>>>>> +  if (udi_fun (0x7567000000LL) != foo (9))
>>>>> +    e++;
>>>>> +  if (udi_fun (0xFFF8567FFFFFFFFFLL) != foo (19))
>>>>> +    e++;
>>>>> +
>>>>> +  if (e)
>>>>> +    __builtin_abort ();
>>>>> +  return 0;
>>>>> +}
>>>>> +

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V6] rs6000: Optimize cmp on rotated 16bits constant
  2022-08-29  3:42 [PATCH V6] rs6000: Optimize cmp on rotated 16bits constant Jiufu Guo
  2022-09-15  7:35 ` Ping: " Jiufu Guo
@ 2022-12-13 21:02 ` Segher Boessenkool
  2022-12-14  8:26   ` Jiufu Guo
  1 sibling, 1 reply; 11+ messages in thread
From: Segher Boessenkool @ 2022-12-13 21:02 UTC (permalink / raw)
  To: Jiufu Guo; +Cc: gcc-patches, dje.gcc, linkw

Hi!

Sorry for the tardiness.

On Mon, Aug 29, 2022 at 11:42:16AM +0800, Jiufu Guo wrote:
> When checking eq/ne with a constant which has only 16bits, it can be
> optimized to check the rotated data.  By this, the constant building
> is optimized.
> 
> As the example in PR103743:
> For "in == 0x8000000000000000LL", this patch generates:
>         rotldi %r3,%r3,16
>         cmpldi %cr0,%r3,32768
> instead:
>         li %r9,-1
>         rldicr %r9,%r9,0,0
>         cmpd %cr0,%r3,%r9

FWIW, I find the winnt assembler syntax very hard to read, and I doubt
I am the only one.

So you're doing
  rotldi 3,3,16 ; cmpldi 3,0x8000
instead of
  li 9,-1 ; rldicr 9,9,0,0 ; cmpd 3,9

> +/* Check if C can be rotated from an immediate which starts (as 64bit integer)
> +   with at least CLZ bits zero.
> +
> +   Return the number by which C can be rotated from the immediate.
> +   Return -1 if C can not be rotated as from.  */
> +
> +int
> +rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT c, int clz)

The name does not say what the function does.  Can you think of a better
name?

Maybe it is better to not return magic values anyway?  So perhaps

bool
can_be_done_as_compare_of_rotate (unsigned HOST_WIDE_INT c, int clz, int *rot)

(with *rot written if the return value is true).

> +  /* case c. xx10.....0xx: rotate 'clz + 1' bits firstly, then check case b.

s/firstly/first/

> +/* Check if C can be rotated from an immediate operand of cmpdi or cmpldi.  */
> +
> +bool
> +compare_rotate_immediate_p (unsigned HOST_WIDE_INT c)

No _p please, this function is not a predicate (at least, the name does
not say what it tests).  So a better name please.  This matters even
more for extern functions (like this one) because the function
implementation is always farther away so you do not easily have all
interface details in mind.  Good names help :-)

> +(define_code_iterator eqne [eq ne])
> +(define_code_attr EQNE [(eq "EQ") (ne "NE")])

Just <CODE> or <CODE:eqne> should work?

Please fix these things.  Almost there :-)


Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V6] rs6000: Optimize cmp on rotated 16bits constant
  2022-12-13 21:02 ` Segher Boessenkool
@ 2022-12-14  8:26   ` Jiufu Guo
  2022-12-15  2:28     ` Jiufu Guo
  2022-12-16 20:36     ` Segher Boessenkool
  0 siblings, 2 replies; 11+ messages in thread
From: Jiufu Guo @ 2022-12-14  8:26 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, dje.gcc, linkw

Hi,

Segher Boessenkool <segher@kernel.crashing.org> writes:

> Hi!
>
> Sorry for the tardiness.
>
> On Mon, Aug 29, 2022 at 11:42:16AM +0800, Jiufu Guo wrote:
>> When checking eq/ne with a constant which has only 16bits, it can be
>> optimized to check the rotated data.  By this, the constant building
>> is optimized.
>> 
>> As the example in PR103743:
>> For "in == 0x8000000000000000LL", this patch generates:
>>         rotldi %r3,%r3,16
>>         cmpldi %cr0,%r3,32768
>> instead:
>>         li %r9,-1
>>         rldicr %r9,%r9,0,0
>>         cmpd %cr0,%r3,%r9
>
> FWIW, I find the winnt assembler syntax very hard to read, and I doubt
> I am the only one.
Oh, sorry about that.  I will avoid to add '-mregnames' to dump asm. :)
BTW, what options are you used to dump asm code? 
>
> So you're doing
>   rotldi 3,3,16 ; cmpldi 3,0x8000
> instead of
>   li 9,-1 ; rldicr 9,9,0,0 ; cmpd 3,9
>
>> +/* Check if C can be rotated from an immediate which starts (as 64bit integer)
>> +   with at least CLZ bits zero.
>> +
>> +   Return the number by which C can be rotated from the immediate.
>> +   Return -1 if C can not be rotated as from.  */
>> +
>> +int
>> +rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT c, int clz)
>
> The name does not say what the function does.  Can you think of a better
> name?
>
> Maybe it is better to not return magic values anyway?  So perhaps
>
> bool
> can_be_done_as_compare_of_rotate (unsigned HOST_WIDE_INT c, int clz, int *rot)
>
> (with *rot written if the return value is true).
Thanks for your suggestion!
It is checking if a constant can be rotated from/to a value which has
only few tailing nonzero bits (all leading bits are zeros). 

So, I'm thinking to name the function as something like:
can_be_rotated_to_lowbits.

>
>> +  /* case c. xx10.....0xx: rotate 'clz + 1' bits firstly, then check case b.
>
> s/firstly/first/
Thanks! 
>
>> +/* Check if C can be rotated from an immediate operand of cmpdi or cmpldi.  */
>> +
>> +bool
>> +compare_rotate_immediate_p (unsigned HOST_WIDE_INT c)
>
> No _p please, this function is not a predicate (at least, the name does
> not say what it tests).  So a better name please.  This matters even
> more for extern functions (like this one) because the function
> implementation is always farther away so you do not easily have all
> interface details in mind.  Good names help :-)
Thanks! Name is always a matter. :)

Maybe we can name this funciton as "can_be_rotated_as_compare_operand",
or "is_constant_rotateable_for_compare", because this function checks
"if a constant can be rotated to/from an immediate operand of
cmpdi/cmpldi". 

>
>> +(define_code_iterator eqne [eq ne])
>> +(define_code_attr EQNE [(eq "EQ") (ne "NE")])
>
> Just <CODE> or <CODE:eqne> should work?
Great! Thanks for point out this! <eqne:CODE> works.
>
> Please fix these things.  Almost there :-)

I updated the patch as below. Bootstraping and regtesting is ongoing.
Thanks again for your careful and insight review!


BR,
Jeff (Jiufu)

----------
When checking eq/ne with a constant which has only 16bits, it can be
optimized to check the rotated data.  By this, the constant building
is optimized.

As the example in PR103743:
For "in == 0x8000000000000000LL", this patch generates:
        rotldi 3,3,1 ; cmpldi 0,3,1
instead of:
        li 9,-1 ; rldicr 9,9,0,0 ; cmpd 0,3,9

Compare with previous version:
This patch refactor the code according to review comments.
e.g. updating function names/comments/code.


	PR target/103743

gcc/ChangeLog:

	* config/rs6000/rs6000-protos.h (can_be_rotated_to_lowbits): New.
	(can_be_rotated_as_compare_operand): New.
	* config/rs6000/rs6000.cc (can_be_rotated_to_lowbits): New definition.
	(can_be_rotated_as_compare_operand): New definition.
	* config/rs6000/rs6000.md (*rotate_on_cmpdi): New define_insn_and_split.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr103743.c: New test.
	* gcc.target/powerpc/pr103743_1.c: New test.

---
 gcc/config/rs6000/rs6000-protos.h             |  2 +
 gcc/config/rs6000/rs6000.cc                   | 56 +++++++++++
 gcc/config/rs6000/rs6000.md                   | 63 +++++++++++-
 gcc/testsuite/gcc.target/powerpc/pr103743.c   | 52 ++++++++++
 gcc/testsuite/gcc.target/powerpc/pr103743_1.c | 95 +++++++++++++++++++
 5 files changed, 267 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743_1.c

diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index d0d89320ef6..9626917e359 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -35,6 +35,8 @@ extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
 extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
+extern bool can_be_rotated_to_lowbits (unsigned HOST_WIDE_INT, int, int *);
+extern bool can_be_rotated_as_compare_operand (unsigned HOST_WIDE_INT);
 extern int num_insns_constant (rtx, machine_mode);
 extern int small_data_operand (rtx, machine_mode);
 extern bool mem_operand_gpr (rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index b3a609f3aa3..2b0df8479f0 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -14925,6 +14925,62 @@ rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
     return reverse_condition (code);
 }
 
+/* Check if C (as 64bit integer) can be rotated to a constant which constains
+   nonzero bits at LOWBITS only.
+
+   Return true if C can be rotated to such constant.  And *ROT is written to
+   the number by which C is rotated.
+   Return false otherwise.  */
+
+bool
+can_be_rotated_to_lowbits (unsigned HOST_WIDE_INT c, int lowbits, int *rot)
+{
+  int clz = HOST_BITS_PER_WIDE_INT - lowbits;
+
+  /* case a. 0..0xxx: already at least clz zeros.  */
+  int lz = clz_hwi (c);
+  if (lz >= clz)
+    {
+      *rot = 0;
+      return true;
+    }
+
+  /* case b. 0..0xxx0..0: at least clz zeros.  */
+  int tz = ctz_hwi (c);
+  if (lz + tz >= clz)
+    {
+      *rot = HOST_BITS_PER_WIDE_INT - tz;
+      return true;
+    }
+
+  /* case c. xx10.....0xx: rotate 'clz - 1' bits first, then check case b.
+	       ^bit -> Vbit, , then zeros are at head or tail.
+	     00...00xxx100, 'clz - 1' >= 'bits of xxxx'.  */
+  const int rot_bits = lowbits + 1;
+  unsigned HOST_WIDE_INT rc = (c >> rot_bits) | (c << (clz - 1));
+  tz = ctz_hwi (rc);
+  if (clz_hwi (rc) + tz >= clz)
+    {
+      *rot = HOST_BITS_PER_WIDE_INT - (tz + rot_bits);
+      return true;
+    }
+
+  return false;
+}
+
+/* Check if C can be rotated to an immediate operand of cmpdi or cmpldi.  */
+
+bool
+can_be_rotated_as_compare_operand (unsigned HOST_WIDE_INT c)
+{
+  int rot = 0;
+  /* leading 48 zeros + 16 lowbits (cmpldi),
+     or leading 49 ones + 15 lowbits (cmpdi).  */
+  bool res = can_be_rotated_to_lowbits (~c, 15, &rot)
+	     || can_be_rotated_to_lowbits (c, 16, &rot);
+  return res && rot > 0;
+}
+
 /* Generate a compare for CODE.  Return a brand-new rtx that
    represents the result of the compare.  */
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 4bd1dfd3da9..dabfe5dfdfd 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7765,6 +7765,68 @@ (define_insn "*movsi_from_df"
   "xscvdpsp %x0,%x1"
   [(set_attr "type" "fp")])
 
+
+(define_code_iterator eqne [eq ne])
+
+;; "i == C" ==> "rotl(i,N) == rotl(C,N)"
+(define_insn_and_split "*rotate_on_cmpdi"
+  [(set (pc)
+	(if_then_else (eqne (match_operand:DI 1 "gpc_reg_operand" "r")
+			    (match_operand:DI 2 "const_int_operand" "n"))
+		      (label_ref (match_operand 0 ""))
+		      (pc)))
+  (clobber (match_scratch:DI 3 "=r"))
+  (clobber (match_scratch:CCUNS 4 "=y"))]
+  "TARGET_POWERPC64 && num_insns_constant (operands[2], DImode) > 1
+   && can_be_rotated_as_compare_operand (UINTVAL (operands[2]))"
+   "#"
+   "&& 1"
+  [(pc)]
+{
+  rtx note = find_reg_note (curr_insn, REG_BR_PROB, 0);
+  bool sgn = false;
+  unsigned HOST_WIDE_INT C = INTVAL (operands[2]);
+  int rot;
+  if (!can_be_rotated_to_lowbits (C, 16, &rot))
+    {
+      /* cmpdi */
+      sgn = true;
+      bool res = can_be_rotated_to_lowbits (~C, 15, &rot);
+      gcc_assert (res);
+    }
+  rtx n = GEN_INT (rot);
+
+  /* i' = rotl (i, n) */
+  rtx op0 = can_create_pseudo_p () ? gen_reg_rtx (DImode) : operands[3];
+  emit_insn (gen_rtx_SET (op0, gen_rtx_ROTATE (DImode, operands[1], n)));
+
+  /* C' = rotl (C, n) */
+  rtx op1 = GEN_INT ((C << rot) | (C >> (HOST_BITS_PER_WIDE_INT - rot)));
+
+  /* i' ==  C' */
+  machine_mode comp_mode = sgn ? CCmode : CCUNSmode;
+  rtx cc = can_create_pseudo_p () ? gen_reg_rtx (comp_mode) : operands[4];
+  PUT_MODE (cc, comp_mode);
+  emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (comp_mode, op0, op1)));
+  rtx cmp = gen_rtx_<eqne:CODE> (CCmode, cc, const0_rtx);
+  rtx loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands[0]);
+  emit_jump_insn (gen_rtx_SET (pc_rtx,
+			       gen_rtx_IF_THEN_ELSE (VOIDmode, cmp,
+						     loc_ref, pc_rtx)));
+
+  /* keep the probability info for the prediction of the branch insn.  */
+  if (note)
+    {
+      profile_probability prob
+	= profile_probability::from_reg_br_prob_note (XINT (note, 0));
+
+      add_reg_br_prob_note (get_last_insn (), prob);
+    }
+
+  DONE;
+}
+)
+
 ;; Split a load of a large constant into the appropriate two-insn
 ;; sequence.
 
@@ -13453,7 +13515,6 @@ (define_expand "@ctr<mode>"
 ;; rs6000_legitimate_combined_insn prevents combine creating any of
 ;; the ctr<mode> insns.
 
-(define_code_iterator eqne [eq ne])
 (define_code_attr bd [(eq "bdz") (ne "bdnz")])
 (define_code_attr bd_neg [(eq "bdnz") (ne "bdz")])
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr103743.c b/gcc/testsuite/gcc.target/powerpc/pr103743.c
new file mode 100644
index 00000000000..41c686bb4cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr103743.c
@@ -0,0 +1,52 @@
+/* { dg-options "-O2" } */
+/* { dg-do compile { target has_arch_ppc64 } } */
+
+/* { dg-final { scan-assembler-times {\mcmpldi\M} 10  } } */
+/* { dg-final { scan-assembler-times {\mcmpdi\M} 4  } } */
+/* { dg-final { scan-assembler-times {\mrotldi\M} 14  } } */
+
+int foo (int a);
+
+int __attribute__ ((noinline)) udi_fun (unsigned long long in)
+{
+  if (in == (0x8642000000000000ULL))
+    return foo (1);
+  if (in == (0x7642000000000000ULL))
+    return foo (12);
+  if (in == (0x8000000000000000ULL))
+    return foo (32);
+  if (in == (0x8700000000000091ULL))
+    return foo (33);
+  if (in == (0x8642FFFFFFFFFFFFULL))
+    return foo (46);
+  if (in == (0x7642FFFFFFFFFFFFULL))
+    return foo (51);
+  if (in == (0x7567000000ULL))
+    return foo (9);
+  if (in == (0xFFF8567FFFFFFFFFULL))
+    return foo (19);
+
+  return 0;
+}
+
+int __attribute__ ((noinline)) di_fun (long long in)
+{
+  if (in == (0x8642000000000000LL))
+    return foo (1);
+  if (in == (0x7642000000000000LL))
+    return foo (12);
+  if (in == (0x8000000000000000LL))
+    return foo (32);
+  if (in == (0x8700000000000091LL))
+    return foo (33);
+  if (in == (0x8642FFFFFFFFFFFFLL))
+    return foo (46);
+  if (in == (0x7642FFFFFFFFFFFFLL))
+    return foo (51);
+  if (in == (0x7567000000LL))
+    return foo (9);
+  if (in == (0xFFF8567FFFFFFFFFLL))
+    return foo (19);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr103743_1.c b/gcc/testsuite/gcc.target/powerpc/pr103743_1.c
new file mode 100644
index 00000000000..e128aae7574
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr103743_1.c
@@ -0,0 +1,95 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -std=c99" } */
+
+int
+foo (int a)
+{
+  return a + 6;
+}
+
+int __attribute__ ((noipa)) udi_fun (unsigned long long in)
+{
+  if (in == (0x8642000000000000ULL))
+    return foo (1);
+  if (in == (0x7642000000000000ULL))
+    return foo (12);
+  if (in == (0x8000000000000000ULL))
+    return foo (32);
+  if (in == (0x8700000000000091ULL))
+    return foo (33);
+  if (in == (0x8642FFFFFFFFFFFFULL))
+    return foo (46);
+  if (in == (0x7642FFFFFFFFFFFFULL))
+    return foo (51);
+  if (in == (0x7567000000ULL))
+    return foo (9);
+  if (in == (0xFFF8567FFFFFFFFFULL))
+    return foo (19);
+  
+  return 0;
+}
+
+int __attribute__ ((noipa)) di_fun (long long in)
+{
+  if (in == (0x8642000000000000LL))
+    return foo (1);
+  if (in == (0x7642000000000000LL))
+    return foo (12);
+  if (in == (0x8000000000000000LL))
+    return foo (32);
+  if (in == (0x8700000000000091LL))
+    return foo (33);
+  if (in == (0x8642FFFFFFFFFFFFLL))
+    return foo (46);
+  if (in == (0x7642FFFFFFFFFFFFLL))
+    return foo (51);
+  return 0;
+}
+
+int
+main ()
+{
+  int e = 0;
+  if (udi_fun (6) != 0)
+    e++;
+  if (udi_fun (0x8642000000000000ULL) != foo (1))
+    e++;
+  if (udi_fun (0x7642000000000000ULL) != foo (12))
+    e++;
+  if (udi_fun (0x8000000000000000ULL) != foo (32))
+    e++;
+  if (udi_fun (0x8700000000000091ULL) != foo (33))
+    e++;
+  if (udi_fun (0x8642FFFFFFFFFFFFULL) != foo (46))
+    e++;
+  if (udi_fun (0x7642FFFFFFFFFFFFULL) != foo (51))
+    e++;
+  if (udi_fun (0x7567000000ULL) != foo (9))
+    e++;
+  if (udi_fun (0xFFF8567FFFFFFFFFULL) != foo (19))
+    e++;
+
+  if (di_fun (6) != 0)
+    e++;
+  if (di_fun (0x8642000000000000LL) != foo (1))
+    e++;
+  if (di_fun (0x7642000000000000LL) != foo (12))
+    e++;
+  if (di_fun (0x8000000000000000LL) != foo (32))
+    e++;
+  if (di_fun (0x8700000000000091LL) != foo (33))
+    e++;
+  if (di_fun (0x8642FFFFFFFFFFFFLL) != foo (46))
+    e++;
+  if (di_fun (0x7642FFFFFFFFFFFFLL) != foo (51))
+    e++;
+  if (udi_fun (0x7567000000LL) != foo (9))
+    e++;
+  if (udi_fun (0xFFF8567FFFFFFFFFLL) != foo (19))
+    e++;
+
+  if (e)
+    __builtin_abort ();
+  return 0;
+}
+
-- 
2.17.1


>
>
> Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V6] rs6000: Optimize cmp on rotated 16bits constant
  2022-12-14  8:26   ` Jiufu Guo
@ 2022-12-15  2:28     ` Jiufu Guo
  2022-12-16 20:36     ` Segher Boessenkool
  1 sibling, 0 replies; 11+ messages in thread
From: Jiufu Guo @ 2022-12-15  2:28 UTC (permalink / raw)
  To: Jiufu Guo via Gcc-patches; +Cc: Segher Boessenkool, dje.gcc, linkw

Hi,

Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

> Hi,
>
> Segher Boessenkool <segher@kernel.crashing.org> writes:
>
>> Hi!
>>
>> Sorry for the tardiness.
>>
>> On Mon, Aug 29, 2022 at 11:42:16AM +0800, Jiufu Guo wrote:
>>> When checking eq/ne with a constant which has only 16bits, it can be
>>> optimized to check the rotated data.  By this, the constant building
>>> is optimized.
>>> 
>>> As the example in PR103743:
>>> For "in == 0x8000000000000000LL", this patch generates:
>>>         rotldi %r3,%r3,16
>>>         cmpldi %cr0,%r3,32768
>>> instead:
>>>         li %r9,-1
>>>         rldicr %r9,%r9,0,0
>>>         cmpd %cr0,%r3,%r9
>>
>> FWIW, I find the winnt assembler syntax very hard to read, and I doubt
>> I am the only one.
> Oh, sorry about that.  I will avoid to add '-mregnames' to dump asm. :)
> BTW, what options are you used to dump asm code? 
>>
>> So you're doing
>>   rotldi 3,3,16 ; cmpldi 3,0x8000
>> instead of
>>   li 9,-1 ; rldicr 9,9,0,0 ; cmpd 3,9
>>
>>> +/* Check if C can be rotated from an immediate which starts (as 64bit integer)
>>> +   with at least CLZ bits zero.
>>> +
>>> +   Return the number by which C can be rotated from the immediate.
>>> +   Return -1 if C can not be rotated as from.  */
>>> +
>>> +int
>>> +rotate_from_leading_zeros_const (unsigned HOST_WIDE_INT c, int clz)
>>
>> The name does not say what the function does.  Can you think of a better
>> name?
>>
>> Maybe it is better to not return magic values anyway?  So perhaps
>>
>> bool
>> can_be_done_as_compare_of_rotate (unsigned HOST_WIDE_INT c, int clz, int *rot)
>>
>> (with *rot written if the return value is true).
> Thanks for your suggestion!
> It is checking if a constant can be rotated from/to a value which has
> only few tailing nonzero bits (all leading bits are zeros). 
>
> So, I'm thinking to name the function as something like:
> can_be_rotated_to_lowbits.
>
>>
>>> +  /* case c. xx10.....0xx: rotate 'clz + 1' bits firstly, then check case b.
>>
>> s/firstly/first/
> Thanks! 
>>
>>> +/* Check if C can be rotated from an immediate operand of cmpdi or cmpldi.  */
>>> +
>>> +bool
>>> +compare_rotate_immediate_p (unsigned HOST_WIDE_INT c)
>>
>> No _p please, this function is not a predicate (at least, the name does
>> not say what it tests).  So a better name please.  This matters even
>> more for extern functions (like this one) because the function
>> implementation is always farther away so you do not easily have all
>> interface details in mind.  Good names help :-)
> Thanks! Name is always a matter. :)
>
> Maybe we can name this funciton as "can_be_rotated_as_compare_operand",
> or "is_constant_rotateable_for_compare", because this function checks
> "if a constant can be rotated to/from an immediate operand of
> cmpdi/cmpldi". 
>
>>
>>> +(define_code_iterator eqne [eq ne])
>>> +(define_code_attr EQNE [(eq "EQ") (ne "NE")])
>>
>> Just <CODE> or <CODE:eqne> should work?
> Great! Thanks for point out this! <eqne:CODE> works.
>>
>> Please fix these things.  Almost there :-)
>
> I updated the patch as below. Bootstraping and regtesting is ongoing.
> Thanks again for your careful and insight review!

Bootstrap and regtests pass on ppc64{,le}.

BR,
Jeff (Jiufu)

>
>
> BR,
> Jeff (Jiufu)
>
> ----------
> When checking eq/ne with a constant which has only 16bits, it can be
> optimized to check the rotated data.  By this, the constant building
> is optimized.
>
> As the example in PR103743:
> For "in == 0x8000000000000000LL", this patch generates:
>         rotldi 3,3,1 ; cmpldi 0,3,1
> instead of:
>         li 9,-1 ; rldicr 9,9,0,0 ; cmpd 0,3,9
>
> Compare with previous version:
> This patch refactor the code according to review comments.
> e.g. updating function names/comments/code.
>
>
> 	PR target/103743
>
> gcc/ChangeLog:
>
> 	* config/rs6000/rs6000-protos.h (can_be_rotated_to_lowbits): New.
> 	(can_be_rotated_as_compare_operand): New.
> 	* config/rs6000/rs6000.cc (can_be_rotated_to_lowbits): New definition.
> 	(can_be_rotated_as_compare_operand): New definition.
> 	* config/rs6000/rs6000.md (*rotate_on_cmpdi): New define_insn_and_split.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/powerpc/pr103743.c: New test.
> 	* gcc.target/powerpc/pr103743_1.c: New test.
>
> ---
>  gcc/config/rs6000/rs6000-protos.h             |  2 +
>  gcc/config/rs6000/rs6000.cc                   | 56 +++++++++++
>  gcc/config/rs6000/rs6000.md                   | 63 +++++++++++-
>  gcc/testsuite/gcc.target/powerpc/pr103743.c   | 52 ++++++++++
>  gcc/testsuite/gcc.target/powerpc/pr103743_1.c | 95 +++++++++++++++++++
>  5 files changed, 267 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743_1.c
>
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index d0d89320ef6..9626917e359 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -35,6 +35,8 @@ extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
>  extern int vspltis_shifted (rtx);
>  extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
>  extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
> +extern bool can_be_rotated_to_lowbits (unsigned HOST_WIDE_INT, int, int *);
> +extern bool can_be_rotated_as_compare_operand (unsigned HOST_WIDE_INT);
>  extern int num_insns_constant (rtx, machine_mode);
>  extern int small_data_operand (rtx, machine_mode);
>  extern bool mem_operand_gpr (rtx, machine_mode);
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index b3a609f3aa3..2b0df8479f0 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -14925,6 +14925,62 @@ rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
>      return reverse_condition (code);
>  }
>  
> +/* Check if C (as 64bit integer) can be rotated to a constant which constains
> +   nonzero bits at LOWBITS only.
> +
> +   Return true if C can be rotated to such constant.  And *ROT is written to
> +   the number by which C is rotated.
> +   Return false otherwise.  */
> +
> +bool
> +can_be_rotated_to_lowbits (unsigned HOST_WIDE_INT c, int lowbits, int *rot)
> +{
> +  int clz = HOST_BITS_PER_WIDE_INT - lowbits;
> +
> +  /* case a. 0..0xxx: already at least clz zeros.  */
> +  int lz = clz_hwi (c);
> +  if (lz >= clz)
> +    {
> +      *rot = 0;
> +      return true;
> +    }
> +
> +  /* case b. 0..0xxx0..0: at least clz zeros.  */
> +  int tz = ctz_hwi (c);
> +  if (lz + tz >= clz)
> +    {
> +      *rot = HOST_BITS_PER_WIDE_INT - tz;
> +      return true;
> +    }
> +
> +  /* case c. xx10.....0xx: rotate 'clz - 1' bits first, then check case b.
> +	       ^bit -> Vbit, , then zeros are at head or tail.
> +	     00...00xxx100, 'clz - 1' >= 'bits of xxxx'.  */
> +  const int rot_bits = lowbits + 1;
> +  unsigned HOST_WIDE_INT rc = (c >> rot_bits) | (c << (clz - 1));
> +  tz = ctz_hwi (rc);
> +  if (clz_hwi (rc) + tz >= clz)
> +    {
> +      *rot = HOST_BITS_PER_WIDE_INT - (tz + rot_bits);
> +      return true;
> +    }
> +
> +  return false;
> +}
> +
> +/* Check if C can be rotated to an immediate operand of cmpdi or cmpldi.  */
> +
> +bool
> +can_be_rotated_as_compare_operand (unsigned HOST_WIDE_INT c)
> +{
> +  int rot = 0;
> +  /* leading 48 zeros + 16 lowbits (cmpldi),
> +     or leading 49 ones + 15 lowbits (cmpdi).  */
> +  bool res = can_be_rotated_to_lowbits (~c, 15, &rot)
> +	     || can_be_rotated_to_lowbits (c, 16, &rot);
> +  return res && rot > 0;
> +}
> +
>  /* Generate a compare for CODE.  Return a brand-new rtx that
>     represents the result of the compare.  */
>  
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 4bd1dfd3da9..dabfe5dfdfd 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -7765,6 +7765,68 @@ (define_insn "*movsi_from_df"
>    "xscvdpsp %x0,%x1"
>    [(set_attr "type" "fp")])
>  
> +
> +(define_code_iterator eqne [eq ne])
> +
> +;; "i == C" ==> "rotl(i,N) == rotl(C,N)"
> +(define_insn_and_split "*rotate_on_cmpdi"
> +  [(set (pc)
> +	(if_then_else (eqne (match_operand:DI 1 "gpc_reg_operand" "r")
> +			    (match_operand:DI 2 "const_int_operand" "n"))
> +		      (label_ref (match_operand 0 ""))
> +		      (pc)))
> +  (clobber (match_scratch:DI 3 "=r"))
> +  (clobber (match_scratch:CCUNS 4 "=y"))]
> +  "TARGET_POWERPC64 && num_insns_constant (operands[2], DImode) > 1
> +   && can_be_rotated_as_compare_operand (UINTVAL (operands[2]))"
> +   "#"
> +   "&& 1"
> +  [(pc)]
> +{
> +  rtx note = find_reg_note (curr_insn, REG_BR_PROB, 0);
> +  bool sgn = false;
> +  unsigned HOST_WIDE_INT C = INTVAL (operands[2]);
> +  int rot;
> +  if (!can_be_rotated_to_lowbits (C, 16, &rot))
> +    {
> +      /* cmpdi */
> +      sgn = true;
> +      bool res = can_be_rotated_to_lowbits (~C, 15, &rot);
> +      gcc_assert (res);
> +    }
> +  rtx n = GEN_INT (rot);
> +
> +  /* i' = rotl (i, n) */
> +  rtx op0 = can_create_pseudo_p () ? gen_reg_rtx (DImode) : operands[3];
> +  emit_insn (gen_rtx_SET (op0, gen_rtx_ROTATE (DImode, operands[1], n)));
> +
> +  /* C' = rotl (C, n) */
> +  rtx op1 = GEN_INT ((C << rot) | (C >> (HOST_BITS_PER_WIDE_INT - rot)));
> +
> +  /* i' ==  C' */
> +  machine_mode comp_mode = sgn ? CCmode : CCUNSmode;
> +  rtx cc = can_create_pseudo_p () ? gen_reg_rtx (comp_mode) : operands[4];
> +  PUT_MODE (cc, comp_mode);
> +  emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (comp_mode, op0, op1)));
> +  rtx cmp = gen_rtx_<eqne:CODE> (CCmode, cc, const0_rtx);
> +  rtx loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands[0]);
> +  emit_jump_insn (gen_rtx_SET (pc_rtx,
> +			       gen_rtx_IF_THEN_ELSE (VOIDmode, cmp,
> +						     loc_ref, pc_rtx)));
> +
> +  /* keep the probability info for the prediction of the branch insn.  */
> +  if (note)
> +    {
> +      profile_probability prob
> +	= profile_probability::from_reg_br_prob_note (XINT (note, 0));
> +
> +      add_reg_br_prob_note (get_last_insn (), prob);
> +    }
> +
> +  DONE;
> +}
> +)
> +
>  ;; Split a load of a large constant into the appropriate two-insn
>  ;; sequence.
>  
> @@ -13453,7 +13515,6 @@ (define_expand "@ctr<mode>"
>  ;; rs6000_legitimate_combined_insn prevents combine creating any of
>  ;; the ctr<mode> insns.
>  
> -(define_code_iterator eqne [eq ne])
>  (define_code_attr bd [(eq "bdz") (ne "bdnz")])
>  (define_code_attr bd_neg [(eq "bdnz") (ne "bdz")])
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103743.c b/gcc/testsuite/gcc.target/powerpc/pr103743.c
> new file mode 100644
> index 00000000000..41c686bb4cb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103743.c
> @@ -0,0 +1,52 @@
> +/* { dg-options "-O2" } */
> +/* { dg-do compile { target has_arch_ppc64 } } */
> +
> +/* { dg-final { scan-assembler-times {\mcmpldi\M} 10  } } */
> +/* { dg-final { scan-assembler-times {\mcmpdi\M} 4  } } */
> +/* { dg-final { scan-assembler-times {\mrotldi\M} 14  } } */
> +
> +int foo (int a);
> +
> +int __attribute__ ((noinline)) udi_fun (unsigned long long in)
> +{
> +  if (in == (0x8642000000000000ULL))
> +    return foo (1);
> +  if (in == (0x7642000000000000ULL))
> +    return foo (12);
> +  if (in == (0x8000000000000000ULL))
> +    return foo (32);
> +  if (in == (0x8700000000000091ULL))
> +    return foo (33);
> +  if (in == (0x8642FFFFFFFFFFFFULL))
> +    return foo (46);
> +  if (in == (0x7642FFFFFFFFFFFFULL))
> +    return foo (51);
> +  if (in == (0x7567000000ULL))
> +    return foo (9);
> +  if (in == (0xFFF8567FFFFFFFFFULL))
> +    return foo (19);
> +
> +  return 0;
> +}
> +
> +int __attribute__ ((noinline)) di_fun (long long in)
> +{
> +  if (in == (0x8642000000000000LL))
> +    return foo (1);
> +  if (in == (0x7642000000000000LL))
> +    return foo (12);
> +  if (in == (0x8000000000000000LL))
> +    return foo (32);
> +  if (in == (0x8700000000000091LL))
> +    return foo (33);
> +  if (in == (0x8642FFFFFFFFFFFFLL))
> +    return foo (46);
> +  if (in == (0x7642FFFFFFFFFFFFLL))
> +    return foo (51);
> +  if (in == (0x7567000000LL))
> +    return foo (9);
> +  if (in == (0xFFF8567FFFFFFFFFLL))
> +    return foo (19);
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr103743_1.c b/gcc/testsuite/gcc.target/powerpc/pr103743_1.c
> new file mode 100644
> index 00000000000..e128aae7574
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103743_1.c
> @@ -0,0 +1,95 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -std=c99" } */
> +
> +int
> +foo (int a)
> +{
> +  return a + 6;
> +}
> +
> +int __attribute__ ((noipa)) udi_fun (unsigned long long in)
> +{
> +  if (in == (0x8642000000000000ULL))
> +    return foo (1);
> +  if (in == (0x7642000000000000ULL))
> +    return foo (12);
> +  if (in == (0x8000000000000000ULL))
> +    return foo (32);
> +  if (in == (0x8700000000000091ULL))
> +    return foo (33);
> +  if (in == (0x8642FFFFFFFFFFFFULL))
> +    return foo (46);
> +  if (in == (0x7642FFFFFFFFFFFFULL))
> +    return foo (51);
> +  if (in == (0x7567000000ULL))
> +    return foo (9);
> +  if (in == (0xFFF8567FFFFFFFFFULL))
> +    return foo (19);
> +  
> +  return 0;
> +}
> +
> +int __attribute__ ((noipa)) di_fun (long long in)
> +{
> +  if (in == (0x8642000000000000LL))
> +    return foo (1);
> +  if (in == (0x7642000000000000LL))
> +    return foo (12);
> +  if (in == (0x8000000000000000LL))
> +    return foo (32);
> +  if (in == (0x8700000000000091LL))
> +    return foo (33);
> +  if (in == (0x8642FFFFFFFFFFFFLL))
> +    return foo (46);
> +  if (in == (0x7642FFFFFFFFFFFFLL))
> +    return foo (51);
> +  return 0;
> +}
> +
> +int
> +main ()
> +{
> +  int e = 0;
> +  if (udi_fun (6) != 0)
> +    e++;
> +  if (udi_fun (0x8642000000000000ULL) != foo (1))
> +    e++;
> +  if (udi_fun (0x7642000000000000ULL) != foo (12))
> +    e++;
> +  if (udi_fun (0x8000000000000000ULL) != foo (32))
> +    e++;
> +  if (udi_fun (0x8700000000000091ULL) != foo (33))
> +    e++;
> +  if (udi_fun (0x8642FFFFFFFFFFFFULL) != foo (46))
> +    e++;
> +  if (udi_fun (0x7642FFFFFFFFFFFFULL) != foo (51))
> +    e++;
> +  if (udi_fun (0x7567000000ULL) != foo (9))
> +    e++;
> +  if (udi_fun (0xFFF8567FFFFFFFFFULL) != foo (19))
> +    e++;
> +
> +  if (di_fun (6) != 0)
> +    e++;
> +  if (di_fun (0x8642000000000000LL) != foo (1))
> +    e++;
> +  if (di_fun (0x7642000000000000LL) != foo (12))
> +    e++;
> +  if (di_fun (0x8000000000000000LL) != foo (32))
> +    e++;
> +  if (di_fun (0x8700000000000091LL) != foo (33))
> +    e++;
> +  if (di_fun (0x8642FFFFFFFFFFFFLL) != foo (46))
> +    e++;
> +  if (di_fun (0x7642FFFFFFFFFFFFLL) != foo (51))
> +    e++;
> +  if (udi_fun (0x7567000000LL) != foo (9))
> +    e++;
> +  if (udi_fun (0xFFF8567FFFFFFFFFLL) != foo (19))
> +    e++;
> +
> +  if (e)
> +    __builtin_abort ();
> +  return 0;
> +}
> +

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V6] rs6000: Optimize cmp on rotated 16bits constant
  2022-12-14  8:26   ` Jiufu Guo
  2022-12-15  2:28     ` Jiufu Guo
@ 2022-12-16 20:36     ` Segher Boessenkool
  2022-12-19 14:10       ` Jiufu Guo
  1 sibling, 1 reply; 11+ messages in thread
From: Segher Boessenkool @ 2022-12-16 20:36 UTC (permalink / raw)
  To: Jiufu Guo; +Cc: gcc-patches, dje.gcc, linkw

On Wed, Dec 14, 2022 at 04:26:54PM +0800, Jiufu Guo wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > On Mon, Aug 29, 2022 at 11:42:16AM +0800, Jiufu Guo wrote:
> >>         li %r9,-1
> >>         rldicr %r9,%r9,0,0
> >>         cmpd %cr0,%r3,%r9
> >
> > FWIW, I find the winnt assembler syntax very hard to read, and I doubt
> > I am the only one.
> Oh, sorry about that.  I will avoid to add '-mregnames' to dump asm. :)
> BTW, what options are you used to dump asm code? 

The same as GCC outputs, and as I write assembler code as: bare numbers.
It is much easier to type, and very much easier to read.

-mregnames is fine for output (and it is the default as well), but
problematic for input.  Take for example
	li r10,r10
which translates to
	li 10,10
while what was probably wanted is to load the address of the global
symbol r10, which can be written as
	li r10,(r10)

I do understand that liking the bare numbers syntax is an acquired taste
of course.  But less clutter is very useful.  This goes hand in hand
with writing multiple asm statements per line, which allows you to group
things together nicely:
	li 9,-1 ; rldicr 9,9,0,0 ; cmpd 3,9

> > Maybe it is better to not return magic values anyway?  So perhaps
> >
> > bool
> > can_be_done_as_compare_of_rotate (unsigned HOST_WIDE_INT c, int clz, int *rot)
> >
> > (with *rot written if the return value is true).
> Thanks for your suggestion!
> It is checking if a constant can be rotated from/to a value which has
> only few tailing nonzero bits (all leading bits are zeros). 
> 
> So, I'm thinking to name the function as something like:
> can_be_rotated_to_lowbits.

That is a good name yeah.

> >> +bool
> >> +compare_rotate_immediate_p (unsigned HOST_WIDE_INT c)
> >
> > No _p please, this function is not a predicate (at least, the name does
> > not say what it tests).  So a better name please.  This matters even
> > more for extern functions (like this one) because the function
> > implementation is always farther away so you do not easily have all
> > interface details in mind.  Good names help :-)
> Thanks! Name is always a matter. :)
> 
> Maybe we can name this funciton as "can_be_rotated_as_compare_operand",
> or "is_constant_rotateable_for_compare", because this function checks
> "if a constant can be rotated to/from an immediate operand of
> cmpdi/cmpldi". 

Maybe just "constant_can_be_rotated_to_lowbits"?  (If that is what the
function does).  It doesn't clearly say that it allows negative numbers
as well, but that is a problem of the function itself already; maybe it
would be better to do signed and unsigned separately.

> >> +(define_code_iterator eqne [eq ne])
> >> +(define_code_attr EQNE [(eq "EQ") (ne "NE")])
> >
> > Just <CODE> or <CODE:eqne> should work?
> Great! Thanks for point out this! <eqne:CODE> works.
> >
> > Please fix these things.  Almost there :-)
> 
> I updated the patch as below. Bootstraping and regtesting is ongoing.
> Thanks again for your careful and insight review!

Please send as new message (not as reply even), that is much easier to
handle.  Thanks!


Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH V6] rs6000: Optimize cmp on rotated 16bits constant
  2022-12-16 20:36     ` Segher Boessenkool
@ 2022-12-19 14:10       ` Jiufu Guo
  0 siblings, 0 replies; 11+ messages in thread
From: Jiufu Guo @ 2022-12-19 14:10 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, dje.gcc, linkw

Hi,

Segher Boessenkool <segher@kernel.crashing.org> writes:

> On Wed, Dec 14, 2022 at 04:26:54PM +0800, Jiufu Guo wrote:
>> Segher Boessenkool <segher@kernel.crashing.org> writes:
>> > On Mon, Aug 29, 2022 at 11:42:16AM +0800, Jiufu Guo wrote:
>> >>         li %r9,-1
>> >>         rldicr %r9,%r9,0,0
>> >>         cmpd %cr0,%r3,%r9
>> >
>> > FWIW, I find the winnt assembler syntax very hard to read, and I doubt
>> > I am the only one.
>> Oh, sorry about that.  I will avoid to add '-mregnames' to dump asm. :)
>> BTW, what options are you used to dump asm code? 
>
> The same as GCC outputs, and as I write assembler code as: bare numbers.
> It is much easier to type, and very much easier to read.
>
> -mregnames is fine for output (and it is the default as well), but
> problematic for input.  Take for example
> 	li r10,r10
> which translates to
> 	li 10,10
> while what was probably wanted is to load the address of the global
> symbol r10, which can be written as
> 	li r10,(r10)
>
> I do understand that liking the bare numbers syntax is an acquired taste
> of course.  But less clutter is very useful.  This goes hand in hand
> with writing multiple asm statements per line, which allows you to group
> things together nicely:
> 	li 9,-1 ; rldicr 9,9,0,0 ; cmpd 3,9
>
Great!  Thanks for your helpful comments!
>> > Maybe it is better to not return magic values anyway?  So perhaps
>> >
>> > bool
>> > can_be_done_as_compare_of_rotate (unsigned HOST_WIDE_INT c, int clz, int *rot)
>> >
>> > (with *rot written if the return value is true).
>> Thanks for your suggestion!
>> It is checking if a constant can be rotated from/to a value which has
>> only few tailing nonzero bits (all leading bits are zeros). 
>> 
>> So, I'm thinking to name the function as something like:
>> can_be_rotated_to_lowbits.
>
> That is a good name yeah.
>
>> >> +bool
>> >> +compare_rotate_immediate_p (unsigned HOST_WIDE_INT c)
>> >
>> > No _p please, this function is not a predicate (at least, the name does
>> > not say what it tests).  So a better name please.  This matters even
>> > more for extern functions (like this one) because the function
>> > implementation is always farther away so you do not easily have all
>> > interface details in mind.  Good names help :-)
>> Thanks! Name is always a matter. :)
>> 
>> Maybe we can name this funciton as "can_be_rotated_as_compare_operand",
>> or "is_constant_rotateable_for_compare", because this function checks
>> "if a constant can be rotated to/from an immediate operand of
>> cmpdi/cmpldi". 
>
> Maybe just "constant_can_be_rotated_to_lowbits"?  (If that is what the
> function does).  It doesn't clearly say that it allows negative numbers
> as well, but that is a problem of the function itself already; maybe it
> would be better to do signed and unsigned separately.
It makes sense. I updated a new version patch.
>
>> >> +(define_code_iterator eqne [eq ne])
>> >> +(define_code_attr EQNE [(eq "EQ") (ne "NE")])
>> >
>> > Just <CODE> or <CODE:eqne> should work?
>> Great! Thanks for point out this! <eqne:CODE> works.
>> >
>> > Please fix these things.  Almost there :-)
>> 
>> I updated the patch as below. Bootstraping and regtesting is ongoing.
>> Thanks again for your careful and insight review!
>
> Please send as new message (not as reply even), that is much easier to
> handle.  Thanks!

Sure, I just submit a new patch version.
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608765.html

Thanks a lot for your review.


BR,
Jeff (Jiufu)

>
>
> Segher

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-12-19 14:11 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-29  3:42 [PATCH V6] rs6000: Optimize cmp on rotated 16bits constant Jiufu Guo
2022-09-15  7:35 ` Ping: " Jiufu Guo
2022-10-12  6:42   ` Ping^2: " Jiufu Guo
2022-11-09  3:01     ` Ping^3: " Jiufu Guo
2022-12-02  2:45       ` Ping^4: " Jiufu Guo
2022-12-13  3:49         ` Ping^5: " Jiufu Guo
2022-12-13 21:02 ` Segher Boessenkool
2022-12-14  8:26   ` Jiufu Guo
2022-12-15  2:28     ` Jiufu Guo
2022-12-16 20:36     ` Segher Boessenkool
2022-12-19 14:10       ` Jiufu Guo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).