public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH v1 0/8] Improvements to bitmanip-1.0 (Zb[abcs]) support
@ 2021-11-11 14:10 Philipp Tomsich
  2021-11-11 14:10 ` [PATCH v1 1/8] bswap: synthesize HImode bswap from SImode or DImode Philipp Tomsich
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Philipp Tomsich @ 2021-11-11 14:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: kito.cheng, wilson, Philipp Tomsich


This series provides assorted improvements for the RISC-V Zb[abcs]
support collected over the last year and a half and forward-ported to
the recently merged upstream support for the Zb[abcs] extensions.

Improvements include:
 - synthesis of HImode bswap from SImode/DImode rev8
 - cost-model change to support shift-and-add (sh[123]add) in the
   strength-reduction of multiplication operations
 - support for constant-loading of (1ULL << 31) on RV64 using bseti
 - generating a polarity-reversed mask from a bit-test
 - adds orc.b as UNSPEC
 - improves min/minu/max/maxu patterns to suppress redundant extensions


Philipp Tomsich (8):
  bswap: synthesize HImode bswap from SImode or DImode
  RISC-V: costs: handle BSWAP
  RISC-V: costs: support shift-and-add in strength-reduction
  RISC-V: bitmanip: fix constant-loading for (1ULL << 31) in DImode
  RISC-V: bitmanip: improvements to rotate instructions
  RISC-V: bitmanip: add splitter to use bexti for "(a & (1 << BIT_NO)) ?
    0 : -1"
  RISC-V: bitmanip: add orc.b as an unspec
  RISC-V: bitmanip: relax minmax to operate on GPR

 gcc/config/riscv/bitmanip.md                 | 74 +++++++++++++++++---
 gcc/config/riscv/riscv.c                     | 31 ++++++++
 gcc/config/riscv/riscv.h                     | 11 ++-
 gcc/config/riscv/riscv.md                    |  3 +
 gcc/optabs.c                                 |  6 ++
 gcc/testsuite/gcc.target/riscv/zbb-bswap.c   | 22 ++++++
 gcc/testsuite/gcc.target/riscv/zbb-min-max.c | 20 +++++-
 gcc/testsuite/gcc.target/riscv/zbs-bexti.c   | 14 ++++
 8 files changed, 162 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-bswap.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bexti.c

-- 
2.32.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v1 1/8] bswap: synthesize HImode bswap from SImode or DImode
  2021-11-11 14:10 [PATCH v1 0/8] Improvements to bitmanip-1.0 (Zb[abcs]) support Philipp Tomsich
@ 2021-11-11 14:10 ` Philipp Tomsich
  2021-11-17 14:51   ` Kito Cheng
  2021-11-19 10:20   ` Richard Biener
  2021-11-11 14:10 ` [PATCH v1 2/8] RISC-V: costs: handle BSWAP Philipp Tomsich
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 19+ messages in thread
From: Philipp Tomsich @ 2021-11-11 14:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: kito.cheng, wilson, Philipp Tomsich

The RISC-V Zbb extension adds an XLEN (i.e. SImode for rv32, DImode
for rv64) bswap instruction (rev8).  While, with the current master,
SImode is synthesized correctly from DImode, HImode is not.

This change adds an appropriate expansion for a HImode bswap, if a
wider bswap is available.

Without this change, the following rv64gc_zbb code is generated for
__builtin_bswap16():
  	slliw	a5,a0,8
	zext.h	a0,a0
	srliw	a0,a0,8
	or	a0,a5,a0
	sext.h	a0,a0      // this is a 16bit sign-extension following
			   // the byteswap (e.g. on a 'short' function
			   // return).

After this change, a bswap (rev8) is used and any extensions are
combined into the shift-right:
	rev8	a0,a0
	srai	a0,a0,48   // the sign-extension is combined into the
			   // shift; a srli is emitted otherwise...

gcc/ChangeLog:

	* optabs.c (expand_unop): support expanding a HImode bswap
	  using SImode or DImode, followed by a shift.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/zbb-bswap.c: New test.

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
---

 gcc/optabs.c                               |  6 ++++++
 gcc/testsuite/gcc.target/riscv/zbb-bswap.c | 22 ++++++++++++++++++++++
 2 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-bswap.c

diff --git a/gcc/optabs.c b/gcc/optabs.c
index 019bbb62882..7a3ffbe4525 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -3307,6 +3307,12 @@ expand_unop (machine_mode mode, optab unoptab, rtx op0, rtx target,
 		return temp;
 	    }
 
+	  /* If we are missing a HImode BSWAP, but have one for SImode or
+	     DImode, use a BSWAP followed by a SHIFT.  */
+	  temp = widen_bswap (as_a <scalar_int_mode> (mode), op0, target);
+	  if (temp)
+	    return temp;
+
 	  last = get_last_insn ();
 
 	  temp1 = expand_binop (mode, ashl_optab, op0,
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-bswap.c b/gcc/testsuite/gcc.target/riscv/zbb-bswap.c
new file mode 100644
index 00000000000..6ee27d9f47a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-bswap.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbb -mabi=lp64 -O2" } */
+
+unsigned long
+func64 (unsigned long i)
+{
+  return __builtin_bswap64(i);
+}
+
+unsigned int
+func32 (unsigned int i)
+{
+  return __builtin_bswap32(i);
+}
+
+unsigned short
+func16 (unsigned short i)
+{
+  return __builtin_bswap16(i);
+}
+
+/* { dg-final { scan-assembler-times "rev8" 3 } } */
-- 
2.32.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v1 2/8] RISC-V: costs: handle BSWAP
  2021-11-11 14:10 [PATCH v1 0/8] Improvements to bitmanip-1.0 (Zb[abcs]) support Philipp Tomsich
  2021-11-11 14:10 ` [PATCH v1 1/8] bswap: synthesize HImode bswap from SImode or DImode Philipp Tomsich
@ 2021-11-11 14:10 ` Philipp Tomsich
  2021-11-17 14:52   ` Kito Cheng
  2021-11-11 14:10 ` [PATCH v1 3/8] RISC-V: costs: support shift-and-add in strength-reduction Philipp Tomsich
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Philipp Tomsich @ 2021-11-11 14:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: kito.cheng, wilson, Philipp Tomsich

The BSWAP operation is not handled in rtx_costs. Add it.

gcc/ChangeLog:

        * config/riscv/riscv.c (rtx_costs): Add BSWAP.

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
---

 gcc/config/riscv/riscv.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index c77b0322869..8480cf09294 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -2131,6 +2131,14 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
       *total = riscv_extend_cost (XEXP (x, 0), GET_CODE (x) == ZERO_EXTEND);
       return false;
 
+    case BSWAP:
+      if (TARGET_ZBB)
+	{
+	  *total = COSTS_N_INSNS (1);
+	  return true;
+	}
+      return false;
+
     case FLOAT:
     case UNSIGNED_FLOAT:
     case FIX:
-- 
2.32.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v1 3/8] RISC-V: costs: support shift-and-add in strength-reduction
  2021-11-11 14:10 [PATCH v1 0/8] Improvements to bitmanip-1.0 (Zb[abcs]) support Philipp Tomsich
  2021-11-11 14:10 ` [PATCH v1 1/8] bswap: synthesize HImode bswap from SImode or DImode Philipp Tomsich
  2021-11-11 14:10 ` [PATCH v1 2/8] RISC-V: costs: handle BSWAP Philipp Tomsich
@ 2021-11-11 14:10 ` Philipp Tomsich
  2021-11-11 14:10 ` [PATCH v1 4/8] RISC-V: bitmanip: fix constant-loading for (1ULL << 31) in DImode Philipp Tomsich
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Philipp Tomsich @ 2021-11-11 14:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: kito.cheng, wilson, Philipp Tomsich

The strength-reduction implementation in expmed.c will assess the
profitability of using shift-and-add using a RTL expression that wraps
a MULT (with a power-of-2) in a PLUS.  Unless the RISC-V rtx_costs
function recognizes this as expressing a sh[123]add instruction, we
will return an inflated cost, thus defeating the optimization.

This change adds the necessary idiom recognition to provide an
accurate cost for this for of expressing sh[123]add.

Instead on expanding to
	li	a5,200
	mulw	a0,a5,a0
with this change, the expression 'a * 200' is sythesized as:
	sh2add	a0,a0,a0   // *5 = a + 4 * a
	sh2add	a0,a0,a0   // *5 = a + 4 * a
	slli	a0,a0,3    // *8

gcc/ChangeLog:

	* config/riscv/riscv.c (riscv_rtx_costs): Recognize shNadd,
	if expressed as a plus and multiplication with a power-of-2.

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
---

 gcc/config/riscv/riscv.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 8480cf09294..dff4e370471 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -2020,6 +2020,20 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
 	  *total = COSTS_N_INSNS (1);
 	  return true;
 	}
+      /* Before strength-reduction, the shNadd can be expressed as the addition
+	 of a multiplication with a power-of-two.  If this case is not handled,
+	 the strength-reduction in expmed.c will calculate an inflated cost. */
+      if (TARGET_ZBA
+	  && ((!TARGET_64BIT && (mode == SImode)) ||
+	      (TARGET_64BIT && (mode == DImode)))
+	  && (GET_CODE (XEXP (x, 0)) == MULT)
+	  && REG_P (XEXP (XEXP (x, 0), 0))
+	  && CONST_INT_P (XEXP (XEXP (x, 0), 1))
+	  && IN_RANGE (pow2p_hwi (INTVAL (XEXP (XEXP (x, 0), 1))), 1, 3))
+	{
+	  *total = COSTS_N_INSNS (1);
+	  return true;
+	}
       /* shNadd.uw pattern for zba.
 	 [(set (match_operand:DI 0 "register_operand" "=r")
 	       (plus:DI
-- 
2.32.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v1 4/8] RISC-V: bitmanip: fix constant-loading for (1ULL << 31) in DImode
  2021-11-11 14:10 [PATCH v1 0/8] Improvements to bitmanip-1.0 (Zb[abcs]) support Philipp Tomsich
                   ` (2 preceding siblings ...)
  2021-11-11 14:10 ` [PATCH v1 3/8] RISC-V: costs: support shift-and-add in strength-reduction Philipp Tomsich
@ 2021-11-11 14:10 ` Philipp Tomsich
  2021-11-11 14:10 ` [PATCH v1 5/8] RISC-V: bitmanip: improvements to rotate instructions Philipp Tomsich
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Philipp Tomsich @ 2021-11-11 14:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: kito.cheng, wilson, Philipp Tomsich

The SINGLE_BIT_MASK_OPERAND() is overly restrictive, triggering for
bits above 31 only (to side-step any issues with the negative SImode
value 0x80000000).  This moves the special handling of this SImode
value (i.e. the check for -2147483648) to riscv.c and relaxes the
SINGLE_BIT_MASK_OPERAND() test.

This changes the code-generation for loading (1ULL << 31) from:
	li	a0,1
	slli	a0,a0,31
to:
	bseti	a0,zero,31

gcc/ChangeLog:

	* config/riscv/riscv.c (riscv_build_integer_1): Rewrite value as
	-2147483648 for the single-bit case, when operating on 0x80000000
	in SImode.
	* gcc/config/riscv/riscv.h (SINGLE_BIT_MASK_OPERAND): Allow for
	any single-bit value, moving the special case for 0x80000000 to
	riscv_build_integer_1 (in riscv.c).

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
---

 gcc/config/riscv/riscv.c |  9 +++++++++
 gcc/config/riscv/riscv.h | 11 ++++-------
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index dff4e370471..4c30d4e521d 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -415,6 +415,15 @@ riscv_build_integer_1 (struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS],
       /* Simply BSETI.  */
       codes[0].code = UNKNOWN;
       codes[0].value = value;
+
+      /* RISC-V sign-extends all 32bit values that life in a 32bit
+	 register.  To avoid paradoxes, we thus need to use the
+	 sign-extended (negative) representation for the value, if we
+	 want to build 0x80000000 in SImode.  This will then expand
+	 to an ADDI/LI instruction.  */
+      if (mode == SImode && value == 0x80000000)
+	codes[0].value = -2147483648;
+
       return 1;
     }
 
diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index 64287124735..abb121ddbea 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -526,13 +526,10 @@ enum reg_class
   (((VALUE) | ((1UL<<31) - IMM_REACH)) == ((1UL<<31) - IMM_REACH)	\
    || ((VALUE) | ((1UL<<31) - IMM_REACH)) + IMM_REACH == 0)
 
-/* If this is a single bit mask, then we can load it with bseti.  But this
-   is not useful for any of the low 31 bits because we can use addi or lui
-   to load them.  It is wrong for loading SImode 0x80000000 on rv64 because it
-   needs to be sign-extended.  So we restrict this to the upper 32-bits
-   only.  */
-#define SINGLE_BIT_MASK_OPERAND(VALUE) \
-  (pow2p_hwi (VALUE) && (ctz_hwi (VALUE) >= 32))
+/* If this is a single bit mask, then we can load it with bseti.  Special
+   handling of SImode 0x80000000 on RV64 is done in riscv_build_integer_1. */
+#define SINGLE_BIT_MASK_OPERAND(VALUE)					\
+  (pow2p_hwi (VALUE))
 
 /* Stack layout; function entry, exit and calling.  */
 
-- 
2.32.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v1 5/8] RISC-V: bitmanip: improvements to rotate instructions
  2021-11-11 14:10 [PATCH v1 0/8] Improvements to bitmanip-1.0 (Zb[abcs]) support Philipp Tomsich
                   ` (3 preceding siblings ...)
  2021-11-11 14:10 ` [PATCH v1 4/8] RISC-V: bitmanip: fix constant-loading for (1ULL << 31) in DImode Philipp Tomsich
@ 2021-11-11 14:10 ` Philipp Tomsich
  2021-11-11 14:10 ` [PATCH v1 6/8] RISC-V: bitmanip: add splitter to use bexti for "(a & (1 << BIT_NO)) ? 0 : -1" Philipp Tomsich
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Philipp Tomsich @ 2021-11-11 14:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: kito.cheng, wilson, Philipp Tomsich

This change improves rotate instructions (motivated by a review of the
code generated for OpenSSL): rotate-left by a constant are synthesized
using a rotate-right-immediate to avoid putting the shift-amount into
a temporary; to do so, we allow either a register or an immediate for
the expansion of rotl<mode>3 and then check if the shift-amount is a
constant.

Without these changes, the function
    unsigned int f(unsigned int a)
    {
      return (a << 2) | (a >> 30);
    }
turns into
    li      a5,2
    rolw    a0,a0,a5
while these changes give us:
    roriw   a0,a0,30

gcc/ChangeLog:

	* config/riscv/bitmanip.md (rotlsi3, rotldi3, rotlsi3_sext):
	Synthesize rotate-left-by-immediate from a rotate-right insn.

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
---

 gcc/config/riscv/bitmanip.md | 39 ++++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 59779b48f27..178d1ca0e4b 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -204,25 +204,52 @@ (define_insn "rotrsi3_sext"
 (define_insn "rotlsi3"
   [(set (match_operand:SI 0 "register_operand" "=r")
 	(rotate:SI (match_operand:SI 1 "register_operand" "r")
-		   (match_operand:QI 2 "register_operand" "r")))]
+		   (match_operand:QI 2 "arith_operand" "rI")))]
   "TARGET_ZBB"
-  { return TARGET_64BIT ? "rolw\t%0,%1,%2" : "rol\t%0,%1,%2"; }
+  {
+    /* If the rotate-amount is constant, let's synthesize using a
+       rotate-right-immediate instead of using a temporary. */
+
+    if (CONST_INT_P(operands[2])) {
+      operands[2] = GEN_INT(32 - INTVAL(operands[2]));
+      return TARGET_64BIT ? "roriw\t%0,%1,%2" : "rori\t%0,%1,%2";
+    }
+
+    return TARGET_64BIT ? "rolw\t%0,%1,%2" : "rol\t%0,%1,%2";
+  }
   [(set_attr "type" "bitmanip")])
 
 (define_insn "rotldi3"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(rotate:DI (match_operand:DI 1 "register_operand" "r")
-		   (match_operand:QI 2 "register_operand" "r")))]
+		   (match_operand:QI 2 "arith_operand" "rI")))]
   "TARGET_64BIT && TARGET_ZBB"
-  "rol\t%0,%1,%2"
+  {
+    if (CONST_INT_P(operands[2])) {
+      operands[2] = GEN_INT(64 - INTVAL(operands[2]));
+      return "rori\t%0,%1,%2";
+    }
+
+    return "rol\t%0,%1,%2";
+  }
   [(set_attr "type" "bitmanip")])
 
+;; Until we have improved REE to understand that sign-extending the result of
+;; an implicitly sign-extending operation is redundant, we need an additional
+;; pattern to gobble up the redundant sign-extension.
 (define_insn "rotlsi3_sext"
   [(set (match_operand:DI 0 "register_operand" "=r")
 	(sign_extend:DI (rotate:SI (match_operand:SI 1 "register_operand" "r")
-				   (match_operand:QI 2 "register_operand" "r"))))]
+				   (match_operand:QI 2 "arith_operand" "rI"))))]
   "TARGET_64BIT && TARGET_ZBB"
-  "rolw\t%0,%1,%2"
+  {
+    if (CONST_INT_P(operands[2])) {
+      operands[2] = GEN_INT(32 - INTVAL(operands[2]));
+      return "roriw\t%0,%1,%2";
+    }
+
+    return "rolw\t%0,%1,%2";
+  }
   [(set_attr "type" "bitmanip")])
 
 (define_insn "bswap<mode>2"
-- 
2.32.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v1 6/8] RISC-V: bitmanip: add splitter to use bexti for "(a & (1 << BIT_NO)) ? 0 : -1"
  2021-11-11 14:10 [PATCH v1 0/8] Improvements to bitmanip-1.0 (Zb[abcs]) support Philipp Tomsich
                   ` (4 preceding siblings ...)
  2021-11-11 14:10 ` [PATCH v1 5/8] RISC-V: bitmanip: improvements to rotate instructions Philipp Tomsich
@ 2021-11-11 14:10 ` Philipp Tomsich
  2021-11-18  9:45   ` Kito Cheng
  2021-11-11 14:10 ` [PATCH v1 7/8] RISC-V: bitmanip: add orc.b as an unspec Philipp Tomsich
  2021-11-11 14:10 ` [PATCH v1 8/8] RISC-V: bitmanip: relax minmax to operate on GPR Philipp Tomsich
  7 siblings, 1 reply; 19+ messages in thread
From: Philipp Tomsich @ 2021-11-11 14:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: kito.cheng, wilson, Philipp Tomsich

Consider creating a polarity-reversed mask from a set-bit (i.e., if
the bit is set, produce all-ones; otherwise: all-zeros).  Using Zbb,
this can be expressed as bexti, followed by an addi of minus-one.  To
enable the combiner to discover this opportunity, we need to split the
canonical expression for "(a & (1 << BIT_NO)) ? 0 : -1" into a form
combinable into bexti.

Consider the function:
    long f(long a)
    {
      return (a & (1 << BIT_NO)) ? 0 : -1;
    }
This produces the following sequence prior to this change:
	andi	a0,a0,16
	seqz	a0,a0
	neg	a0,a0
	ret
Following this change, it results in:
	bexti	a0,a0,4
	addi	a0,a0,-1
	ret

gcc/ChangeLog:

	* config/riscv/bitmanip.md: Add a splitter to generate
          polarity-reversed masks from a set bit using bexti + addi.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/zbs-bexti.c: New test.

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
---

 gcc/config/riscv/bitmanip.md               | 13 +++++++++++++
 gcc/testsuite/gcc.target/riscv/zbs-bexti.c | 14 ++++++++++++++
 2 files changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-bexti.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 178d1ca0e4b..9e10280e306 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -367,3 +367,16 @@ (define_insn "*bexti"
   "TARGET_ZBS"
   "bexti\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
+
+;; We can create a polarity-reversed mask (i.e. bit N -> { set = 0, clear = -1 })
+;; using a bext(i) followed by an addi instruction.
+;; This splits the canonical representation of "(a & (1 << BIT_NO)) ? 0 : -1".
+(define_split
+  [(set (match_operand:GPR 0 "register_operand")
+       (neg:GPR (eq:GPR (zero_extract:GPR (match_operand:GPR 1 "register_operand")
+                                          (const_int 1)
+                                          (match_operand 2))
+                        (const_int 0))))]
+  "TARGET_ZBB"
+  [(set (match_dup 0) (zero_extract:GPR (match_dup 1) (const_int 1) (match_dup 2)))
+   (set (match_dup 0) (plus:GPR (match_dup 0) (const_int -1)))])
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bexti.c b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
new file mode 100644
index 00000000000..d02c3f7a98d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64 -O2" } */
+
+/* bexti */
+#define BIT_NO  27
+
+long
+foo0 (long a)
+{
+  return (a & (1 << BIT_NO)) ? 0 : -1;
+}
+
+/* { dg-final { scan-assembler "bexti" } } */
+/* { dg-final { scan-assembler "addi" } } */
-- 
2.32.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v1 7/8] RISC-V: bitmanip: add orc.b as an unspec
  2021-11-11 14:10 [PATCH v1 0/8] Improvements to bitmanip-1.0 (Zb[abcs]) support Philipp Tomsich
                   ` (5 preceding siblings ...)
  2021-11-11 14:10 ` [PATCH v1 6/8] RISC-V: bitmanip: add splitter to use bexti for "(a & (1 << BIT_NO)) ? 0 : -1" Philipp Tomsich
@ 2021-11-11 14:10 ` Philipp Tomsich
  2021-11-11 14:10 ` [PATCH v1 8/8] RISC-V: bitmanip: relax minmax to operate on GPR Philipp Tomsich
  7 siblings, 0 replies; 19+ messages in thread
From: Philipp Tomsich @ 2021-11-11 14:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: kito.cheng, wilson, Philipp Tomsich

As a basis for optimized string functions (e.g., the by-pieces
implementations), we need orc.b available.  This adds orc.b as an
unspec, so we can expand to it.

gcc/ChangeLog:

	* config/riscv/bitmanip.md (orcb<mode>2): Add orc.b as an unspec.
	* config/riscv/riscv.md: Add UNSPEC_ORC_B.

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
---

 gcc/config/riscv/bitmanip.md | 8 ++++++++
 gcc/config/riscv/riscv.md    | 3 +++
 2 files changed, 11 insertions(+)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 9e10280e306..000deb48b16 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -267,6 +267,14 @@ (define_insn "<bitmanip_optab><mode>3"
   "<bitmanip_insn>\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+;; orc.b (or-combine) is added as an unspec for the benefit of the support
+;; for optimized string functions (such as strcmp).
+(define_insn "orcb<mode>2"
+  [(set (match_operand:X 0 "register_operand" "=r")
+       (unspec:X [(match_operand:X 1 "register_operand")] UNSPEC_ORC_B))]
+  "TARGET_ZBB"
+  "orc.b\t%0,%1")
+
 ;; ZBS extension.
 
 (define_insn "*bset<mode>"
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 225e5b259c1..7a2501ec7a9 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -45,6 +45,9 @@ (define_c_enum "unspec" [
 
   ;; Stack tie
   UNSPEC_TIE
+
+  ;; Zbb OR-combine instruction
+  UNSPEC_ORC_B
 ])
 
 (define_c_enum "unspecv" [
-- 
2.32.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v1 8/8] RISC-V: bitmanip: relax minmax to operate on GPR
  2021-11-11 14:10 [PATCH v1 0/8] Improvements to bitmanip-1.0 (Zb[abcs]) support Philipp Tomsich
                   ` (6 preceding siblings ...)
  2021-11-11 14:10 ` [PATCH v1 7/8] RISC-V: bitmanip: add orc.b as an unspec Philipp Tomsich
@ 2021-11-11 14:10 ` Philipp Tomsich
  2021-11-11 16:00   ` Kito Cheng
  7 siblings, 1 reply; 19+ messages in thread
From: Philipp Tomsich @ 2021-11-11 14:10 UTC (permalink / raw)
  To: gcc-patches; +Cc: kito.cheng, wilson, Philipp Tomsich

While min/minu/max/maxu instructions are provided for XLEN only, these
can safely operate on GPRs (i.e. SImode or DImode for RV64): SImode is
always sign-extended, which ensures that the XLEN-wide instructions
can be used for signed and unsigned comparisons on SImode yielding a
correct ordering of value.

This commit
 - relaxes the minmax pattern to express for GPR (instead of X only),
   providing both a si3 and di3 expansion on RV64
 - adds a sign-extending form for thee si3 pattern for RV64 to all REE
   to eliminate redundant extensions
 - adds test-cases for both

gcc/ChangeLog:

	* config/riscv/bitmanip.md: Relax minmax to GPR (i.e SImode or
	  DImode) on RV64.
	* config/riscv/bitmanip.md (<bitmanip_optab>si3_sext): Add
          pattern for REE.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/zbb-min-max.c: Add testcases for SImode
          operands checking that no redundant sign- or zero-extensions
          are emitted.

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
---

 gcc/config/riscv/bitmanip.md                 | 14 +++++++++++---
 gcc/testsuite/gcc.target/riscv/zbb-min-max.c | 20 +++++++++++++++++---
 2 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 000deb48b16..2a28f78f5f6 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -260,13 +260,21 @@ (define_insn "bswap<mode>2"
   [(set_attr "type" "bitmanip")])
 
 (define_insn "<bitmanip_optab><mode>3"
-  [(set (match_operand:X 0 "register_operand" "=r")
-        (bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
-			   (match_operand:X 2 "register_operand" "r")))]
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+        (bitmanip_minmax:GPR (match_operand:GPR 1 "register_operand" "r")
+			     (match_operand:GPR 2 "register_operand" "r")))]
   "TARGET_ZBB"
   "<bitmanip_insn>\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+(define_insn "<bitmanip_optab>si3_sext"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+        (sign_extend:DI (bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
+			     (match_operand:SI 2 "register_operand" "r"))))]
+  "TARGET_64BIT && TARGET_ZBB"
+  "<bitmanip_insn>\t%0,%1,%2"
+  [(set_attr "type" "bitmanip")])
+
 ;; orc.b (or-combine) is added as an unspec for the benefit of the support
 ;; for optimized string functions (such as strcmp).
 (define_insn "orcb<mode>2"
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max.c b/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
index f44c398ea08..7169e873551 100644
--- a/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
+++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zbb -mabi=lp64 -O2" } */
+/* { dg-options "-march=rv64gc_zba_zbb -mabi=lp64 -O2" } */
 
 long
 foo1 (long i, long j)
@@ -25,7 +25,21 @@ foo4 (unsigned long i, unsigned long j)
   return i > j ? i : j;
 }
 
+unsigned int
+foo5(unsigned int a, unsigned int b)
+{
+  return a > b ? a : b;
+}
+
+int
+foo6(int a, int b)
+{
+  return a > b ? a : b;
+}
+
 /* { dg-final { scan-assembler-times "min" 3 } } */
-/* { dg-final { scan-assembler-times "max" 3 } } */
+/* { dg-final { scan-assembler-times "max" 4 } } */
 /* { dg-final { scan-assembler-times "minu" 1 } } */
-/* { dg-final { scan-assembler-times "maxu" 1 } } */
+/* { dg-final { scan-assembler-times "maxu" 3 } } */
+/* { dg-final { scan-assembler-not "zext.w" } } */
+/* { dg-final { scan-assembler-not "sext.w" } } */
-- 
2.32.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 8/8] RISC-V: bitmanip: relax minmax to operate on GPR
  2021-11-11 14:10 ` [PATCH v1 8/8] RISC-V: bitmanip: relax minmax to operate on GPR Philipp Tomsich
@ 2021-11-11 16:00   ` Kito Cheng
  2021-11-11 16:18     ` Philipp Tomsich
  0 siblings, 1 reply; 19+ messages in thread
From: Kito Cheng @ 2021-11-11 16:00 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: GCC Patches, wilson

Hi Philipp:

We can't pretend we have SImode min/max instruction without that semantic.
Give this testcase, x86 and rv64gc print out 8589934592 8589934591 = 0,
but with this patch and compile with rv64gc_zba_zbb -O3, the output
become 8589934592 8589934591 = 8589934592

-------------Testcase---------------
#include <stdio.h>
long long __attribute__((noinline, noipa))
foo6(long long a, long long b)
{
  int xa = a;
  int xb = b;
  return (xa > xb ? xa : xb);
}
int main() {
  long long a = 0x200000000ll;
  long long b = 0x1ffffffffl;
  long long c = foo6(a, b);
  printf ("%lld %lld = %lld\n", a, b, c);
  return 0;
}
--------------------------------------
v64gc_zba_zbb -O3 w/o this patch:
foo6:
        sext.w  a1,a1
        sext.w  a0,a0
        max     a0,a0,a1
        ret

--------------------------------------
v64gc_zba_zbb -O3 w/ this patch:
foo6:
        max     a0,a0,a1
        ret

On Thu, Nov 11, 2021 at 10:10 PM Philipp Tomsich
<philipp.tomsich@vrull.eu> wrote:
>
> While min/minu/max/maxu instructions are provided for XLEN only, these
> can safely operate on GPRs (i.e. SImode or DImode for RV64): SImode is
> always sign-extended, which ensures that the XLEN-wide instructions
> can be used for signed and unsigned comparisons on SImode yielding a
> correct ordering of value.
>
> This commit
>  - relaxes the minmax pattern to express for GPR (instead of X only),
>    providing both a si3 and di3 expansion on RV64
>  - adds a sign-extending form for thee si3 pattern for RV64 to all REE
>    to eliminate redundant extensions
>  - adds test-cases for both
>
> gcc/ChangeLog:
>
>         * config/riscv/bitmanip.md: Relax minmax to GPR (i.e SImode or
>           DImode) on RV64.
>         * config/riscv/bitmanip.md (<bitmanip_optab>si3_sext): Add
>           pattern for REE.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/zbb-min-max.c: Add testcases for SImode
>           operands checking that no redundant sign- or zero-extensions
>           are emitted.
>
> Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> ---
>
>  gcc/config/riscv/bitmanip.md                 | 14 +++++++++++---
>  gcc/testsuite/gcc.target/riscv/zbb-min-max.c | 20 +++++++++++++++++---
>  2 files changed, 28 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> index 000deb48b16..2a28f78f5f6 100644
> --- a/gcc/config/riscv/bitmanip.md
> +++ b/gcc/config/riscv/bitmanip.md
> @@ -260,13 +260,21 @@ (define_insn "bswap<mode>2"
>    [(set_attr "type" "bitmanip")])
>
>  (define_insn "<bitmanip_optab><mode>3"
> -  [(set (match_operand:X 0 "register_operand" "=r")
> -        (bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
> -                          (match_operand:X 2 "register_operand" "r")))]
> +  [(set (match_operand:GPR 0 "register_operand" "=r")
> +        (bitmanip_minmax:GPR (match_operand:GPR 1 "register_operand" "r")
> +                            (match_operand:GPR 2 "register_operand" "r")))]
>    "TARGET_ZBB"
>    "<bitmanip_insn>\t%0,%1,%2"
>    [(set_attr "type" "bitmanip")])
>
> +(define_insn "<bitmanip_optab>si3_sext"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +        (sign_extend:DI (bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
> +                            (match_operand:SI 2 "register_operand" "r"))))]
> +  "TARGET_64BIT && TARGET_ZBB"
> +  "<bitmanip_insn>\t%0,%1,%2"
> +  [(set_attr "type" "bitmanip")])
> +
>  ;; orc.b (or-combine) is added as an unspec for the benefit of the support
>  ;; for optimized string functions (such as strcmp).
>  (define_insn "orcb<mode>2"
> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max.c b/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
> index f44c398ea08..7169e873551 100644
> --- a/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
> +++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-march=rv64gc_zbb -mabi=lp64 -O2" } */
> +/* { dg-options "-march=rv64gc_zba_zbb -mabi=lp64 -O2" } */
>
>  long
>  foo1 (long i, long j)
> @@ -25,7 +25,21 @@ foo4 (unsigned long i, unsigned long j)
>    return i > j ? i : j;
>  }
>
> +unsigned int
> +foo5(unsigned int a, unsigned int b)
> +{
> +  return a > b ? a : b;
> +}
> +
> +int
> +foo6(int a, int b)
> +{
> +  return a > b ? a : b;
> +}
> +
>  /* { dg-final { scan-assembler-times "min" 3 } } */
> -/* { dg-final { scan-assembler-times "max" 3 } } */
> +/* { dg-final { scan-assembler-times "max" 4 } } */
>  /* { dg-final { scan-assembler-times "minu" 1 } } */
> -/* { dg-final { scan-assembler-times "maxu" 1 } } */
> +/* { dg-final { scan-assembler-times "maxu" 3 } } */
> +/* { dg-final { scan-assembler-not "zext.w" } } */
> +/* { dg-final { scan-assembler-not "sext.w" } } */
> --
> 2.32.0
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 8/8] RISC-V: bitmanip: relax minmax to operate on GPR
  2021-11-11 16:00   ` Kito Cheng
@ 2021-11-11 16:18     ` Philipp Tomsich
  2021-11-11 16:27       ` Kito Cheng
  0 siblings, 1 reply; 19+ messages in thread
From: Philipp Tomsich @ 2021-11-11 16:18 UTC (permalink / raw)
  To: Kito Cheng; +Cc: GCC Patches, wilson

Kito,

Unless I am missing something, the problem is not the relaxation to
GPR, but rather the sign-extending pattern I had squashed into the
same patch.
If you disable "<bitmanip_optab>si3_sext", a sext.w will be have to be
emitted after the 'max' and before the return (or before the SImode
output is consumed as a DImode), pushing the REE opportunity to a
subsequent consumer (e.g. an addw).

This will generate
   foo6:
      max a0,a0,a1
      sext.w a0,a0
      ret
which (assuming that the inputs to max are properly sign-extended
SImode values living in DImode registers) will be the same as
performing the two sext.w before the max.

Having a second set of eyes on this is appreciated — let me know if
you agree and I'll revise, once I have collected feedback on the
remaining patches of the series.

Philipp.


On Thu, 11 Nov 2021 at 17:00, Kito Cheng <kito.cheng@gmail.com> wrote:
>
> Hi Philipp:
>
> We can't pretend we have SImode min/max instruction without that semantic.
> Give this testcase, x86 and rv64gc print out 8589934592 8589934591 = 0,
> but with this patch and compile with rv64gc_zba_zbb -O3, the output
> become 8589934592 8589934591 = 8589934592
>
> -------------Testcase---------------
> #include <stdio.h>
> long long __attribute__((noinline, noipa))
> foo6(long long a, long long b)
> {
>   int xa = a;
>   int xb = b;
>   return (xa > xb ? xa : xb);
> }
> int main() {
>   long long a = 0x200000000ll;
>   long long b = 0x1ffffffffl;
>   long long c = foo6(a, b);
>   printf ("%lld %lld = %lld\n", a, b, c);
>   return 0;
> }
> --------------------------------------
> v64gc_zba_zbb -O3 w/o this patch:
> foo6:
>         sext.w  a1,a1
>         sext.w  a0,a0
>         max     a0,a0,a1
>         ret
>
> --------------------------------------
> v64gc_zba_zbb -O3 w/ this patch:
> foo6:
>         max     a0,a0,a1
>         ret
>
> On Thu, Nov 11, 2021 at 10:10 PM Philipp Tomsich
> <philipp.tomsich@vrull.eu> wrote:
> >
> > While min/minu/max/maxu instructions are provided for XLEN only, these
> > can safely operate on GPRs (i.e. SImode or DImode for RV64): SImode is
> > always sign-extended, which ensures that the XLEN-wide instructions
> > can be used for signed and unsigned comparisons on SImode yielding a
> > correct ordering of value.
> >
> > This commit
> >  - relaxes the minmax pattern to express for GPR (instead of X only),
> >    providing both a si3 and di3 expansion on RV64
> >  - adds a sign-extending form for thee si3 pattern for RV64 to all REE
> >    to eliminate redundant extensions
> >  - adds test-cases for both
> >
> > gcc/ChangeLog:
> >
> >         * config/riscv/bitmanip.md: Relax minmax to GPR (i.e SImode or
> >           DImode) on RV64.
> >         * config/riscv/bitmanip.md (<bitmanip_optab>si3_sext): Add
> >           pattern for REE.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/riscv/zbb-min-max.c: Add testcases for SImode
> >           operands checking that no redundant sign- or zero-extensions
> >           are emitted.
> >
> > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> > ---
> >
> >  gcc/config/riscv/bitmanip.md                 | 14 +++++++++++---
> >  gcc/testsuite/gcc.target/riscv/zbb-min-max.c | 20 +++++++++++++++++---
> >  2 files changed, 28 insertions(+), 6 deletions(-)
> >
> > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> > index 000deb48b16..2a28f78f5f6 100644
> > --- a/gcc/config/riscv/bitmanip.md
> > +++ b/gcc/config/riscv/bitmanip.md
> > @@ -260,13 +260,21 @@ (define_insn "bswap<mode>2"
> >    [(set_attr "type" "bitmanip")])
> >
> >  (define_insn "<bitmanip_optab><mode>3"
> > -  [(set (match_operand:X 0 "register_operand" "=r")
> > -        (bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
> > -                          (match_operand:X 2 "register_operand" "r")))]
> > +  [(set (match_operand:GPR 0 "register_operand" "=r")
> > +        (bitmanip_minmax:GPR (match_operand:GPR 1 "register_operand" "r")
> > +                            (match_operand:GPR 2 "register_operand" "r")))]
> >    "TARGET_ZBB"
> >    "<bitmanip_insn>\t%0,%1,%2"
> >    [(set_attr "type" "bitmanip")])
> >
> > +(define_insn "<bitmanip_optab>si3_sext"
> > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > +        (sign_extend:DI (bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
> > +                            (match_operand:SI 2 "register_operand" "r"))))]
> > +  "TARGET_64BIT && TARGET_ZBB"
> > +  "<bitmanip_insn>\t%0,%1,%2"
> > +  [(set_attr "type" "bitmanip")])
> > +
> >  ;; orc.b (or-combine) is added as an unspec for the benefit of the support
> >  ;; for optimized string functions (such as strcmp).
> >  (define_insn "orcb<mode>2"
> > diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max.c b/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
> > index f44c398ea08..7169e873551 100644
> > --- a/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
> > +++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
> > @@ -1,5 +1,5 @@
> >  /* { dg-do compile } */
> > -/* { dg-options "-march=rv64gc_zbb -mabi=lp64 -O2" } */
> > +/* { dg-options "-march=rv64gc_zba_zbb -mabi=lp64 -O2" } */
> >
> >  long
> >  foo1 (long i, long j)
> > @@ -25,7 +25,21 @@ foo4 (unsigned long i, unsigned long j)
> >    return i > j ? i : j;
> >  }
> >
> > +unsigned int
> > +foo5(unsigned int a, unsigned int b)
> > +{
> > +  return a > b ? a : b;
> > +}
> > +
> > +int
> > +foo6(int a, int b)
> > +{
> > +  return a > b ? a : b;
> > +}
> > +
> >  /* { dg-final { scan-assembler-times "min" 3 } } */
> > -/* { dg-final { scan-assembler-times "max" 3 } } */
> > +/* { dg-final { scan-assembler-times "max" 4 } } */
> >  /* { dg-final { scan-assembler-times "minu" 1 } } */
> > -/* { dg-final { scan-assembler-times "maxu" 1 } } */
> > +/* { dg-final { scan-assembler-times "maxu" 3 } } */
> > +/* { dg-final { scan-assembler-not "zext.w" } } */
> > +/* { dg-final { scan-assembler-not "sext.w" } } */
> > --
> > 2.32.0
> >

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 8/8] RISC-V: bitmanip: relax minmax to operate on GPR
  2021-11-11 16:18     ` Philipp Tomsich
@ 2021-11-11 16:27       ` Kito Cheng
  2021-11-11 16:42         ` Kito Cheng
  0 siblings, 1 reply; 19+ messages in thread
From: Kito Cheng @ 2021-11-11 16:27 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: wilson, GCC Patches

IIRC it's not work even without sign extend pattern since I did similar
experimental before (not for RISC-V, but same concept), I guess I need more
time to test that.

Philipp Tomsich <philipp.tomsich@vrull.eu> 於 2021年11月12日 週五 00:18 寫道:

> Kito,
>
> Unless I am missing something, the problem is not the relaxation to
> GPR, but rather the sign-extending pattern I had squashed into the
> same patch.
> If you disable "<bitmanip_optab>si3_sext", a sext.w will be have to be
> emitted after the 'max' and before the return (or before the SImode
> output is consumed as a DImode), pushing the REE opportunity to a
> subsequent consumer (e.g. an addw).
>
> This will generate
>    foo6:
>       max a0,a0,a1
>       sext.w a0,a0
>       ret
> which (assuming that the inputs to max are properly sign-extended
> SImode values living in DImode registers) will be the same as
> performing the two sext.w before the max.
>
> Having a second set of eyes on this is appreciated — let me know if
> you agree and I'll revise, once I have collected feedback on the
> remaining patches of the series.
>
> Philipp.
>
>
> On Thu, 11 Nov 2021 at 17:00, Kito Cheng <kito.cheng@gmail.com> wrote:
> >
> > Hi Philipp:
> >
> > We can't pretend we have SImode min/max instruction without that
> semantic.
> > Give this testcase, x86 and rv64gc print out 8589934592 8589934591 = 0,
> > but with this patch and compile with rv64gc_zba_zbb -O3, the output
> > become 8589934592 8589934591 = 8589934592
> >
> > -------------Testcase---------------
> > #include <stdio.h>
> > long long __attribute__((noinline, noipa))
> > foo6(long long a, long long b)
> > {
> >   int xa = a;
> >   int xb = b;
> >   return (xa > xb ? xa : xb);
> > }
> > int main() {
> >   long long a = 0x200000000ll;
> >   long long b = 0x1ffffffffl;
> >   long long c = foo6(a, b);
> >   printf ("%lld %lld = %lld\n", a, b, c);
> >   return 0;
> > }
> > --------------------------------------
> > v64gc_zba_zbb -O3 w/o this patch:
> > foo6:
> >         sext.w  a1,a1
> >         sext.w  a0,a0
> >         max     a0,a0,a1
> >         ret
> >
> > --------------------------------------
> > v64gc_zba_zbb -O3 w/ this patch:
> > foo6:
> >         max     a0,a0,a1
> >         ret
> >
> > On Thu, Nov 11, 2021 at 10:10 PM Philipp Tomsich
> > <philipp.tomsich@vrull.eu> wrote:
> > >
> > > While min/minu/max/maxu instructions are provided for XLEN only, these
> > > can safely operate on GPRs (i.e. SImode or DImode for RV64): SImode is
> > > always sign-extended, which ensures that the XLEN-wide instructions
> > > can be used for signed and unsigned comparisons on SImode yielding a
> > > correct ordering of value.
> > >
> > > This commit
> > >  - relaxes the minmax pattern to express for GPR (instead of X only),
> > >    providing both a si3 and di3 expansion on RV64
> > >  - adds a sign-extending form for thee si3 pattern for RV64 to all REE
> > >    to eliminate redundant extensions
> > >  - adds test-cases for both
> > >
> > > gcc/ChangeLog:
> > >
> > >         * config/riscv/bitmanip.md: Relax minmax to GPR (i.e SImode or
> > >           DImode) on RV64.
> > >         * config/riscv/bitmanip.md (<bitmanip_optab>si3_sext): Add
> > >           pattern for REE.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.target/riscv/zbb-min-max.c: Add testcases for SImode
> > >           operands checking that no redundant sign- or zero-extensions
> > >           are emitted.
> > >
> > > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> > > ---
> > >
> > >  gcc/config/riscv/bitmanip.md                 | 14 +++++++++++---
> > >  gcc/testsuite/gcc.target/riscv/zbb-min-max.c | 20 +++++++++++++++++---
> > >  2 files changed, 28 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/gcc/config/riscv/bitmanip.md
> b/gcc/config/riscv/bitmanip.md
> > > index 000deb48b16..2a28f78f5f6 100644
> > > --- a/gcc/config/riscv/bitmanip.md
> > > +++ b/gcc/config/riscv/bitmanip.md
> > > @@ -260,13 +260,21 @@ (define_insn "bswap<mode>2"
> > >    [(set_attr "type" "bitmanip")])
> > >
> > >  (define_insn "<bitmanip_optab><mode>3"
> > > -  [(set (match_operand:X 0 "register_operand" "=r")
> > > -        (bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
> > > -                          (match_operand:X 2 "register_operand"
> "r")))]
> > > +  [(set (match_operand:GPR 0 "register_operand" "=r")
> > > +        (bitmanip_minmax:GPR (match_operand:GPR 1 "register_operand"
> "r")
> > > +                            (match_operand:GPR 2 "register_operand"
> "r")))]
> > >    "TARGET_ZBB"
> > >    "<bitmanip_insn>\t%0,%1,%2"
> > >    [(set_attr "type" "bitmanip")])
> > >
> > > +(define_insn "<bitmanip_optab>si3_sext"
> > > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > > +        (sign_extend:DI (bitmanip_minmax:SI (match_operand:SI 1
> "register_operand" "r")
> > > +                            (match_operand:SI 2 "register_operand"
> "r"))))]
> > > +  "TARGET_64BIT && TARGET_ZBB"
> > > +  "<bitmanip_insn>\t%0,%1,%2"
> > > +  [(set_attr "type" "bitmanip")])
> > > +
> > >  ;; orc.b (or-combine) is added as an unspec for the benefit of the
> support
> > >  ;; for optimized string functions (such as strcmp).
> > >  (define_insn "orcb<mode>2"
> > > diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
> b/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
> > > index f44c398ea08..7169e873551 100644
> > > --- a/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
> > > +++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
> > > @@ -1,5 +1,5 @@
> > >  /* { dg-do compile } */
> > > -/* { dg-options "-march=rv64gc_zbb -mabi=lp64 -O2" } */
> > > +/* { dg-options "-march=rv64gc_zba_zbb -mabi=lp64 -O2" } */
> > >
> > >  long
> > >  foo1 (long i, long j)
> > > @@ -25,7 +25,21 @@ foo4 (unsigned long i, unsigned long j)
> > >    return i > j ? i : j;
> > >  }
> > >
> > > +unsigned int
> > > +foo5(unsigned int a, unsigned int b)
> > > +{
> > > +  return a > b ? a : b;
> > > +}
> > > +
> > > +int
> > > +foo6(int a, int b)
> > > +{
> > > +  return a > b ? a : b;
> > > +}
> > > +
> > >  /* { dg-final { scan-assembler-times "min" 3 } } */
> > > -/* { dg-final { scan-assembler-times "max" 3 } } */
> > > +/* { dg-final { scan-assembler-times "max" 4 } } */
> > >  /* { dg-final { scan-assembler-times "minu" 1 } } */
> > > -/* { dg-final { scan-assembler-times "maxu" 1 } } */
> > > +/* { dg-final { scan-assembler-times "maxu" 3 } } */
> > > +/* { dg-final { scan-assembler-not "zext.w" } } */
> > > +/* { dg-final { scan-assembler-not "sext.w" } } */
> > > --
> > > 2.32.0
> > >
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 8/8] RISC-V: bitmanip: relax minmax to operate on GPR
  2021-11-11 16:27       ` Kito Cheng
@ 2021-11-11 16:42         ` Kito Cheng
  2021-11-11 18:33           ` Philipp Tomsich
  0 siblings, 1 reply; 19+ messages in thread
From: Kito Cheng @ 2021-11-11 16:42 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: wilson, GCC Patches

Hi Philipp:

This testcase got wrong result with this patch even w/o
<bitmanip_optab>si3_sext pattern:

#include <stdio.h>

#define MAX(A, B) ((A) > (B) ? (A) : (B))

long long __attribute__((noinline, noipa))
foo6(long long a, long long b, int c)
{
  int xa = a;
  int xb = b;
  return MAX(MAX(xa, xb), c);
}
int main() {
  long long a = 0x200000000ll;
  long long b = 0x1ffffffffl;
  int c = 10;
  long long d = foo6(a, b, c);
  printf ("%lld %lld %d = %lld\n", a, b, c, d);
  return 0;
}

On Fri, Nov 12, 2021 at 12:27 AM Kito Cheng <kito.cheng@gmail.com> wrote:
>
> IIRC it's not work even without sign extend pattern since I did similar experimental before (not for RISC-V, but same concept), I guess I need more time to test that.
>
> Philipp Tomsich <philipp.tomsich@vrull.eu> 於 2021年11月12日 週五 00:18 寫道:
>>
>> Kito,
>>
>> Unless I am missing something, the problem is not the relaxation to
>> GPR, but rather the sign-extending pattern I had squashed into the
>> same patch.
>> If you disable "<bitmanip_optab>si3_sext", a sext.w will be have to be
>> emitted after the 'max' and before the return (or before the SImode
>> output is consumed as a DImode), pushing the REE opportunity to a
>> subsequent consumer (e.g. an addw).
>>
>> This will generate
>>    foo6:
>>       max a0,a0,a1
>>       sext.w a0,a0
>>       ret
>> which (assuming that the inputs to max are properly sign-extended
>> SImode values living in DImode registers) will be the same as
>> performing the two sext.w before the max.
>>
>> Having a second set of eyes on this is appreciated — let me know if
>> you agree and I'll revise, once I have collected feedback on the
>> remaining patches of the series.
>>
>> Philipp.
>>
>>
>> On Thu, 11 Nov 2021 at 17:00, Kito Cheng <kito.cheng@gmail.com> wrote:
>> >
>> > Hi Philipp:
>> >
>> > We can't pretend we have SImode min/max instruction without that semantic.
>> > Give this testcase, x86 and rv64gc print out 8589934592 8589934591 = 0,
>> > but with this patch and compile with rv64gc_zba_zbb -O3, the output
>> > become 8589934592 8589934591 = 8589934592
>> >
>> > -------------Testcase---------------
>> > #include <stdio.h>
>> > long long __attribute__((noinline, noipa))
>> > foo6(long long a, long long b)
>> > {
>> >   int xa = a;
>> >   int xb = b;
>> >   return (xa > xb ? xa : xb);
>> > }
>> > int main() {
>> >   long long a = 0x200000000ll;
>> >   long long b = 0x1ffffffffl;
>> >   long long c = foo6(a, b);
>> >   printf ("%lld %lld = %lld\n", a, b, c);
>> >   return 0;
>> > }
>> > --------------------------------------
>> > v64gc_zba_zbb -O3 w/o this patch:
>> > foo6:
>> >         sext.w  a1,a1
>> >         sext.w  a0,a0
>> >         max     a0,a0,a1
>> >         ret
>> >
>> > --------------------------------------
>> > v64gc_zba_zbb -O3 w/ this patch:
>> > foo6:
>> >         max     a0,a0,a1
>> >         ret
>> >
>> > On Thu, Nov 11, 2021 at 10:10 PM Philipp Tomsich
>> > <philipp.tomsich@vrull.eu> wrote:
>> > >
>> > > While min/minu/max/maxu instructions are provided for XLEN only, these
>> > > can safely operate on GPRs (i.e. SImode or DImode for RV64): SImode is
>> > > always sign-extended, which ensures that the XLEN-wide instructions
>> > > can be used for signed and unsigned comparisons on SImode yielding a
>> > > correct ordering of value.
>> > >
>> > > This commit
>> > >  - relaxes the minmax pattern to express for GPR (instead of X only),
>> > >    providing both a si3 and di3 expansion on RV64
>> > >  - adds a sign-extending form for thee si3 pattern for RV64 to all REE
>> > >    to eliminate redundant extensions
>> > >  - adds test-cases for both
>> > >
>> > > gcc/ChangeLog:
>> > >
>> > >         * config/riscv/bitmanip.md: Relax minmax to GPR (i.e SImode or
>> > >           DImode) on RV64.
>> > >         * config/riscv/bitmanip.md (<bitmanip_optab>si3_sext): Add
>> > >           pattern for REE.
>> > >
>> > > gcc/testsuite/ChangeLog:
>> > >
>> > >         * gcc.target/riscv/zbb-min-max.c: Add testcases for SImode
>> > >           operands checking that no redundant sign- or zero-extensions
>> > >           are emitted.
>> > >
>> > > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
>> > > ---
>> > >
>> > >  gcc/config/riscv/bitmanip.md                 | 14 +++++++++++---
>> > >  gcc/testsuite/gcc.target/riscv/zbb-min-max.c | 20 +++++++++++++++++---
>> > >  2 files changed, 28 insertions(+), 6 deletions(-)
>> > >
>> > > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
>> > > index 000deb48b16..2a28f78f5f6 100644
>> > > --- a/gcc/config/riscv/bitmanip.md
>> > > +++ b/gcc/config/riscv/bitmanip.md
>> > > @@ -260,13 +260,21 @@ (define_insn "bswap<mode>2"
>> > >    [(set_attr "type" "bitmanip")])
>> > >
>> > >  (define_insn "<bitmanip_optab><mode>3"
>> > > -  [(set (match_operand:X 0 "register_operand" "=r")
>> > > -        (bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
>> > > -                          (match_operand:X 2 "register_operand" "r")))]
>> > > +  [(set (match_operand:GPR 0 "register_operand" "=r")
>> > > +        (bitmanip_minmax:GPR (match_operand:GPR 1 "register_operand" "r")
>> > > +                            (match_operand:GPR 2 "register_operand" "r")))]
>> > >    "TARGET_ZBB"
>> > >    "<bitmanip_insn>\t%0,%1,%2"
>> > >    [(set_attr "type" "bitmanip")])
>> > >
>> > > +(define_insn "<bitmanip_optab>si3_sext"
>> > > +  [(set (match_operand:DI 0 "register_operand" "=r")
>> > > +        (sign_extend:DI (bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
>> > > +                            (match_operand:SI 2 "register_operand" "r"))))]
>> > > +  "TARGET_64BIT && TARGET_ZBB"
>> > > +  "<bitmanip_insn>\t%0,%1,%2"
>> > > +  [(set_attr "type" "bitmanip")])
>> > > +
>> > >  ;; orc.b (or-combine) is added as an unspec for the benefit of the support
>> > >  ;; for optimized string functions (such as strcmp).
>> > >  (define_insn "orcb<mode>2"
>> > > diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max.c b/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
>> > > index f44c398ea08..7169e873551 100644
>> > > --- a/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
>> > > +++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
>> > > @@ -1,5 +1,5 @@
>> > >  /* { dg-do compile } */
>> > > -/* { dg-options "-march=rv64gc_zbb -mabi=lp64 -O2" } */
>> > > +/* { dg-options "-march=rv64gc_zba_zbb -mabi=lp64 -O2" } */
>> > >
>> > >  long
>> > >  foo1 (long i, long j)
>> > > @@ -25,7 +25,21 @@ foo4 (unsigned long i, unsigned long j)
>> > >    return i > j ? i : j;
>> > >  }
>> > >
>> > > +unsigned int
>> > > +foo5(unsigned int a, unsigned int b)
>> > > +{
>> > > +  return a > b ? a : b;
>> > > +}
>> > > +
>> > > +int
>> > > +foo6(int a, int b)
>> > > +{
>> > > +  return a > b ? a : b;
>> > > +}
>> > > +
>> > >  /* { dg-final { scan-assembler-times "min" 3 } } */
>> > > -/* { dg-final { scan-assembler-times "max" 3 } } */
>> > > +/* { dg-final { scan-assembler-times "max" 4 } } */
>> > >  /* { dg-final { scan-assembler-times "minu" 1 } } */
>> > > -/* { dg-final { scan-assembler-times "maxu" 1 } } */
>> > > +/* { dg-final { scan-assembler-times "maxu" 3 } } */
>> > > +/* { dg-final { scan-assembler-not "zext.w" } } */
>> > > +/* { dg-final { scan-assembler-not "sext.w" } } */
>> > > --
>> > > 2.32.0
>> > >

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 8/8] RISC-V: bitmanip: relax minmax to operate on GPR
  2021-11-11 16:42         ` Kito Cheng
@ 2021-11-11 18:33           ` Philipp Tomsich
  0 siblings, 0 replies; 19+ messages in thread
From: Philipp Tomsich @ 2021-11-11 18:33 UTC (permalink / raw)
  To: Kito Cheng; +Cc: wilson, GCC Patches

Kito,

Thanks for the reality-check: the subreg-expressions are getting in the way.
I'll drop this from v2, as a permanent resolution for this will be a
bit more involved.

Philipp.

On Thu, 11 Nov 2021 at 17:42, Kito Cheng <kito.cheng@gmail.com> wrote:
>
> Hi Philipp:
>
> This testcase got wrong result with this patch even w/o
> <bitmanip_optab>si3_sext pattern:
>
> #include <stdio.h>
>
> #define MAX(A, B) ((A) > (B) ? (A) : (B))
>
> long long __attribute__((noinline, noipa))
> foo6(long long a, long long b, int c)
> {
>   int xa = a;
>   int xb = b;
>   return MAX(MAX(xa, xb), c);
> }
> int main() {
>   long long a = 0x200000000ll;
>   long long b = 0x1ffffffffl;
>   int c = 10;
>   long long d = foo6(a, b, c);
>   printf ("%lld %lld %d = %lld\n", a, b, c, d);
>   return 0;
> }
>
> On Fri, Nov 12, 2021 at 12:27 AM Kito Cheng <kito.cheng@gmail.com> wrote:
> >
> > IIRC it's not work even without sign extend pattern since I did similar experimental before (not for RISC-V, but same concept), I guess I need more time to test that.
> >
> > Philipp Tomsich <philipp.tomsich@vrull.eu> 於 2021年11月12日 週五 00:18 寫道:
> >>
> >> Kito,
> >>
> >> Unless I am missing something, the problem is not the relaxation to
> >> GPR, but rather the sign-extending pattern I had squashed into the
> >> same patch.
> >> If you disable "<bitmanip_optab>si3_sext", a sext.w will be have to be
> >> emitted after the 'max' and before the return (or before the SImode
> >> output is consumed as a DImode), pushing the REE opportunity to a
> >> subsequent consumer (e.g. an addw).
> >>
> >> This will generate
> >>    foo6:
> >>       max a0,a0,a1
> >>       sext.w a0,a0
> >>       ret
> >> which (assuming that the inputs to max are properly sign-extended
> >> SImode values living in DImode registers) will be the same as
> >> performing the two sext.w before the max.
> >>
> >> Having a second set of eyes on this is appreciated — let me know if
> >> you agree and I'll revise, once I have collected feedback on the
> >> remaining patches of the series.
> >>
> >> Philipp.
> >>
> >>
> >> On Thu, 11 Nov 2021 at 17:00, Kito Cheng <kito.cheng@gmail.com> wrote:
> >> >
> >> > Hi Philipp:
> >> >
> >> > We can't pretend we have SImode min/max instruction without that semantic.
> >> > Give this testcase, x86 and rv64gc print out 8589934592 8589934591 = 0,
> >> > but with this patch and compile with rv64gc_zba_zbb -O3, the output
> >> > become 8589934592 8589934591 = 8589934592
> >> >
> >> > -------------Testcase---------------
> >> > #include <stdio.h>
> >> > long long __attribute__((noinline, noipa))
> >> > foo6(long long a, long long b)
> >> > {
> >> >   int xa = a;
> >> >   int xb = b;
> >> >   return (xa > xb ? xa : xb);
> >> > }
> >> > int main() {
> >> >   long long a = 0x200000000ll;
> >> >   long long b = 0x1ffffffffl;
> >> >   long long c = foo6(a, b);
> >> >   printf ("%lld %lld = %lld\n", a, b, c);
> >> >   return 0;
> >> > }
> >> > --------------------------------------
> >> > v64gc_zba_zbb -O3 w/o this patch:
> >> > foo6:
> >> >         sext.w  a1,a1
> >> >         sext.w  a0,a0
> >> >         max     a0,a0,a1
> >> >         ret
> >> >
> >> > --------------------------------------
> >> > v64gc_zba_zbb -O3 w/ this patch:
> >> > foo6:
> >> >         max     a0,a0,a1
> >> >         ret
> >> >
> >> > On Thu, Nov 11, 2021 at 10:10 PM Philipp Tomsich
> >> > <philipp.tomsich@vrull.eu> wrote:
> >> > >
> >> > > While min/minu/max/maxu instructions are provided for XLEN only, these
> >> > > can safely operate on GPRs (i.e. SImode or DImode for RV64): SImode is
> >> > > always sign-extended, which ensures that the XLEN-wide instructions
> >> > > can be used for signed and unsigned comparisons on SImode yielding a
> >> > > correct ordering of value.
> >> > >
> >> > > This commit
> >> > >  - relaxes the minmax pattern to express for GPR (instead of X only),
> >> > >    providing both a si3 and di3 expansion on RV64
> >> > >  - adds a sign-extending form for thee si3 pattern for RV64 to all REE
> >> > >    to eliminate redundant extensions
> >> > >  - adds test-cases for both
> >> > >
> >> > > gcc/ChangeLog:
> >> > >
> >> > >         * config/riscv/bitmanip.md: Relax minmax to GPR (i.e SImode or
> >> > >           DImode) on RV64.
> >> > >         * config/riscv/bitmanip.md (<bitmanip_optab>si3_sext): Add
> >> > >           pattern for REE.
> >> > >
> >> > > gcc/testsuite/ChangeLog:
> >> > >
> >> > >         * gcc.target/riscv/zbb-min-max.c: Add testcases for SImode
> >> > >           operands checking that no redundant sign- or zero-extensions
> >> > >           are emitted.
> >> > >
> >> > > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> >> > > ---
> >> > >
> >> > >  gcc/config/riscv/bitmanip.md                 | 14 +++++++++++---
> >> > >  gcc/testsuite/gcc.target/riscv/zbb-min-max.c | 20 +++++++++++++++++---
> >> > >  2 files changed, 28 insertions(+), 6 deletions(-)
> >> > >
> >> > > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> >> > > index 000deb48b16..2a28f78f5f6 100644
> >> > > --- a/gcc/config/riscv/bitmanip.md
> >> > > +++ b/gcc/config/riscv/bitmanip.md
> >> > > @@ -260,13 +260,21 @@ (define_insn "bswap<mode>2"
> >> > >    [(set_attr "type" "bitmanip")])
> >> > >
> >> > >  (define_insn "<bitmanip_optab><mode>3"
> >> > > -  [(set (match_operand:X 0 "register_operand" "=r")
> >> > > -        (bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
> >> > > -                          (match_operand:X 2 "register_operand" "r")))]
> >> > > +  [(set (match_operand:GPR 0 "register_operand" "=r")
> >> > > +        (bitmanip_minmax:GPR (match_operand:GPR 1 "register_operand" "r")
> >> > > +                            (match_operand:GPR 2 "register_operand" "r")))]
> >> > >    "TARGET_ZBB"
> >> > >    "<bitmanip_insn>\t%0,%1,%2"
> >> > >    [(set_attr "type" "bitmanip")])
> >> > >
> >> > > +(define_insn "<bitmanip_optab>si3_sext"
> >> > > +  [(set (match_operand:DI 0 "register_operand" "=r")
> >> > > +        (sign_extend:DI (bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
> >> > > +                            (match_operand:SI 2 "register_operand" "r"))))]
> >> > > +  "TARGET_64BIT && TARGET_ZBB"
> >> > > +  "<bitmanip_insn>\t%0,%1,%2"
> >> > > +  [(set_attr "type" "bitmanip")])
> >> > > +
> >> > >  ;; orc.b (or-combine) is added as an unspec for the benefit of the support
> >> > >  ;; for optimized string functions (such as strcmp).
> >> > >  (define_insn "orcb<mode>2"
> >> > > diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max.c b/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
> >> > > index f44c398ea08..7169e873551 100644
> >> > > --- a/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
> >> > > +++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max.c
> >> > > @@ -1,5 +1,5 @@
> >> > >  /* { dg-do compile } */
> >> > > -/* { dg-options "-march=rv64gc_zbb -mabi=lp64 -O2" } */
> >> > > +/* { dg-options "-march=rv64gc_zba_zbb -mabi=lp64 -O2" } */
> >> > >
> >> > >  long
> >> > >  foo1 (long i, long j)
> >> > > @@ -25,7 +25,21 @@ foo4 (unsigned long i, unsigned long j)
> >> > >    return i > j ? i : j;
> >> > >  }
> >> > >
> >> > > +unsigned int
> >> > > +foo5(unsigned int a, unsigned int b)
> >> > > +{
> >> > > +  return a > b ? a : b;
> >> > > +}
> >> > > +
> >> > > +int
> >> > > +foo6(int a, int b)
> >> > > +{
> >> > > +  return a > b ? a : b;
> >> > > +}
> >> > > +
> >> > >  /* { dg-final { scan-assembler-times "min" 3 } } */
> >> > > -/* { dg-final { scan-assembler-times "max" 3 } } */
> >> > > +/* { dg-final { scan-assembler-times "max" 4 } } */
> >> > >  /* { dg-final { scan-assembler-times "minu" 1 } } */
> >> > > -/* { dg-final { scan-assembler-times "maxu" 1 } } */
> >> > > +/* { dg-final { scan-assembler-times "maxu" 3 } } */
> >> > > +/* { dg-final { scan-assembler-not "zext.w" } } */
> >> > > +/* { dg-final { scan-assembler-not "sext.w" } } */
> >> > > --
> >> > > 2.32.0
> >> > >

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/8] bswap: synthesize HImode bswap from SImode or DImode
  2021-11-11 14:10 ` [PATCH v1 1/8] bswap: synthesize HImode bswap from SImode or DImode Philipp Tomsich
@ 2021-11-17 14:51   ` Kito Cheng
  2021-11-19 10:20   ` Richard Biener
  1 sibling, 0 replies; 19+ messages in thread
From: Kito Cheng @ 2021-11-17 14:51 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: GCC Patches, wilson

Hi Philipp:

I would suggest add define_expand pattern for bswaphi2 rather than
changing expand_unop with following reasons:

- There is a comment above this change, and it also tried widen_bswap
after this if-block,
  so I think this patch is kind of violating this comment.
     /* HImode is special because in this mode BSWAP is equivalent to ROTATE
        or ROTATERT.  First try these directly; if this fails, then try the
        obvious pair of shifts with allowed widening, as this will probably
        be always more efficient than the other fallback methods.  */

- This change doesn't improve the code gen without bswapsi2 or bswapdi2,
  (e.g. rv64gc result same code) and this also might also affect other targets,
  but we didn't have evidence it will always get better results, so I guess at
  least we should add a target hook for this.

- ...I didn't have permission to approve this change since it's not
part of RISC-V back-end :p

On Thu, Nov 11, 2021 at 10:10 PM Philipp Tomsich
<philipp.tomsich@vrull.eu> wrote:
>
> The RISC-V Zbb extension adds an XLEN (i.e. SImode for rv32, DImode
> for rv64) bswap instruction (rev8).  While, with the current master,
> SImode is synthesized correctly from DImode, HImode is not.
>
> This change adds an appropriate expansion for a HImode bswap, if a
> wider bswap is available.
>
> Without this change, the following rv64gc_zbb code is generated for
> __builtin_bswap16():
>         slliw   a5,a0,8
>         zext.h  a0,a0
>         srliw   a0,a0,8
>         or      a0,a5,a0
>         sext.h  a0,a0      // this is a 16bit sign-extension following
>                            // the byteswap (e.g. on a 'short' function
>                            // return).
>
> After this change, a bswap (rev8) is used and any extensions are
> combined into the shift-right:
>         rev8    a0,a0
>         srai    a0,a0,48   // the sign-extension is combined into the
>                            // shift; a srli is emitted otherwise...
>
> gcc/ChangeLog:
>
>         * optabs.c (expand_unop): support expanding a HImode bswap
>           using SImode or DImode, followed by a shift.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/zbb-bswap.c: New test.
>
> Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> ---
>
>  gcc/optabs.c                               |  6 ++++++
>  gcc/testsuite/gcc.target/riscv/zbb-bswap.c | 22 ++++++++++++++++++++++
>  2 files changed, 28 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-bswap.c
>
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index 019bbb62882..7a3ffbe4525 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -3307,6 +3307,12 @@ expand_unop (machine_mode mode, optab unoptab, rtx op0, rtx target,
>                 return temp;
>             }
>
> +         /* If we are missing a HImode BSWAP, but have one for SImode or
> +            DImode, use a BSWAP followed by a SHIFT.  */
> +         temp = widen_bswap (as_a <scalar_int_mode> (mode), op0, target);
> +         if (temp)
> +           return temp;
> +
>           last = get_last_insn ();
>
>           temp1 = expand_binop (mode, ashl_optab, op0,
> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-bswap.c b/gcc/testsuite/gcc.target/riscv/zbb-bswap.c
> new file mode 100644
> index 00000000000..6ee27d9f47a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbb-bswap.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc_zbb -mabi=lp64 -O2" } */
> +
> +unsigned long
> +func64 (unsigned long i)
> +{
> +  return __builtin_bswap64(i);
> +}
> +
> +unsigned int
> +func32 (unsigned int i)
> +{
> +  return __builtin_bswap32(i);
> +}
> +
> +unsigned short
> +func16 (unsigned short i)
> +{
> +  return __builtin_bswap16(i);
> +}
> +
> +/* { dg-final { scan-assembler-times "rev8" 3 } } */
> --
> 2.32.0
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 2/8] RISC-V: costs: handle BSWAP
  2021-11-11 14:10 ` [PATCH v1 2/8] RISC-V: costs: handle BSWAP Philipp Tomsich
@ 2021-11-17 14:52   ` Kito Cheng
  0 siblings, 0 replies; 19+ messages in thread
From: Kito Cheng @ 2021-11-17 14:52 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: GCC Patches, wilson

> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> index c77b0322869..8480cf09294 100644
> --- a/gcc/config/riscv/riscv.c
> +++ b/gcc/config/riscv/riscv.c
> @@ -2131,6 +2131,14 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
>        *total = riscv_extend_cost (XEXP (x, 0), GET_CODE (x) == ZERO_EXTEND);
>        return false;
>
> +    case BSWAP:
> +      if (TARGET_ZBB)
> +       {
> +         *total = COSTS_N_INSNS (1);

Add a cost model for HImode? maybe `*total = COSTS_N_INSNS (mode ==
HImode ? 2 : 1);` ?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 6/8] RISC-V: bitmanip: add splitter to use bexti for "(a & (1 << BIT_NO)) ? 0 : -1"
  2021-11-11 14:10 ` [PATCH v1 6/8] RISC-V: bitmanip: add splitter to use bexti for "(a & (1 << BIT_NO)) ? 0 : -1" Philipp Tomsich
@ 2021-11-18  9:45   ` Kito Cheng
  0 siblings, 0 replies; 19+ messages in thread
From: Kito Cheng @ 2021-11-18  9:45 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: GCC Patches, wilson

> --- a/gcc/config/riscv/bitmanip.md
> +++ b/gcc/config/riscv/bitmanip.md
> @@ -367,3 +367,16 @@ (define_insn "*bexti"
>    "TARGET_ZBS"
>    "bexti\t%0,%1,%2"
>    [(set_attr "type" "bitmanip")])
> +
> +;; We can create a polarity-reversed mask (i.e. bit N -> { set = 0, clear = -1 })
> +;; using a bext(i) followed by an addi instruction.
> +;; This splits the canonical representation of "(a & (1 << BIT_NO)) ? 0 : -1".
> +(define_split
> +  [(set (match_operand:GPR 0 "register_operand")
> +       (neg:GPR (eq:GPR (zero_extract:GPR (match_operand:GPR 1 "register_operand")
> +                                          (const_int 1)
> +                                          (match_operand 2))
> +                        (const_int 0))))]
> +  "TARGET_ZBB"

Should be TARGET_ZBS?

> +  [(set (match_dup 0) (zero_extract:GPR (match_dup 1) (const_int 1) (match_dup 2)))
> +   (set (match_dup 0) (plus:GPR (match_dup 0) (const_int -1)))])
> diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bexti.c b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
> new file mode 100644
> index 00000000000..d02c3f7a98d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbs-bexti.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc_zbs -mabi=lp64 -O2" } */
> +
> +/* bexti */
> +#define BIT_NO  27
> +
> +long
> +foo0 (long a)
> +{
> +  return (a & (1 << BIT_NO)) ? 0 : -1;

I got the same code gen for rv64gc_zbs both w/ and w/o this patch,
but got better code gen when I changed BIT_NO to 4,
so I guess we should use 4 rather than 27 for demonstrating this patch?

long
foo2 (long a)
{
 return (a & (1 << 4)) ? 0 : -1;
}

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/8] bswap: synthesize HImode bswap from SImode or DImode
  2021-11-11 14:10 ` [PATCH v1 1/8] bswap: synthesize HImode bswap from SImode or DImode Philipp Tomsich
  2021-11-17 14:51   ` Kito Cheng
@ 2021-11-19 10:20   ` Richard Biener
  2021-11-19 10:21     ` Richard Biener
  1 sibling, 1 reply; 19+ messages in thread
From: Richard Biener @ 2021-11-19 10:20 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: GCC Patches, Jim Wilson, Kito Cheng

On Thu, Nov 11, 2021 at 3:13 PM Philipp Tomsich
<philipp.tomsich@vrull.eu> wrote:
>
> The RISC-V Zbb extension adds an XLEN (i.e. SImode for rv32, DImode
> for rv64) bswap instruction (rev8).  While, with the current master,
> SImode is synthesized correctly from DImode, HImode is not.
>
> This change adds an appropriate expansion for a HImode bswap, if a
> wider bswap is available.
>
> Without this change, the following rv64gc_zbb code is generated for
> __builtin_bswap16():
>         slliw   a5,a0,8
>         zext.h  a0,a0
>         srliw   a0,a0,8
>         or      a0,a5,a0
>         sext.h  a0,a0      // this is a 16bit sign-extension following
>                            // the byteswap (e.g. on a 'short' function
>                            // return).
>
> After this change, a bswap (rev8) is used and any extensions are
> combined into the shift-right:
>         rev8    a0,a0
>         srai    a0,a0,48   // the sign-extension is combined into the
>                            // shift; a srli is emitted otherwise...
>
> gcc/ChangeLog:
>
>         * optabs.c (expand_unop): support expanding a HImode bswap
>           using SImode or DImode, followed by a shift.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/zbb-bswap.c: New test.
>
> Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> ---
>
>  gcc/optabs.c                               |  6 ++++++
>  gcc/testsuite/gcc.target/riscv/zbb-bswap.c | 22 ++++++++++++++++++++++
>  2 files changed, 28 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-bswap.c
>
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index 019bbb62882..7a3ffbe4525 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -3307,6 +3307,12 @@ expand_unop (machine_mode mode, optab unoptab, rtx op0, rtx target,
>                 return temp;
>             }
>
> +         /* If we are missing a HImode BSWAP, but have one for SImode or
> +            DImode, use a BSWAP followed by a SHIFT.  */
> +         temp = widen_bswap (as_a <scalar_int_mode> (mode), op0, target);
> +         if (temp)
> +           return temp;
> +

I think it would be more natural to temporarily terminate the HImode case here
and re-open it inside the following

      if (is_a <scalar_int_mode> (mode, &int_mode))
        {
          temp = widen_bswap (int_mode, op0, target);
          if (temp)
            return temp;

here to handle the ashl/lshr/ior fallback.

Richard.

>           last = get_last_insn ();
>
>           temp1 = expand_binop (mode, ashl_optab, op0,
> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-bswap.c b/gcc/testsuite/gcc.target/riscv/zbb-bswap.c
> new file mode 100644
> index 00000000000..6ee27d9f47a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbb-bswap.c
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc_zbb -mabi=lp64 -O2" } */
> +
> +unsigned long
> +func64 (unsigned long i)
> +{
> +  return __builtin_bswap64(i);
> +}
> +
> +unsigned int
> +func32 (unsigned int i)
> +{
> +  return __builtin_bswap32(i);
> +}
> +
> +unsigned short
> +func16 (unsigned short i)
> +{
> +  return __builtin_bswap16(i);
> +}
> +
> +/* { dg-final { scan-assembler-times "rev8" 3 } } */
> --
> 2.32.0
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v1 1/8] bswap: synthesize HImode bswap from SImode or DImode
  2021-11-19 10:20   ` Richard Biener
@ 2021-11-19 10:21     ` Richard Biener
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Biener @ 2021-11-19 10:21 UTC (permalink / raw)
  To: Philipp Tomsich; +Cc: GCC Patches, Jim Wilson, Kito Cheng

On Fri, Nov 19, 2021 at 11:20 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Nov 11, 2021 at 3:13 PM Philipp Tomsich
> <philipp.tomsich@vrull.eu> wrote:
> >
> > The RISC-V Zbb extension adds an XLEN (i.e. SImode for rv32, DImode
> > for rv64) bswap instruction (rev8).  While, with the current master,
> > SImode is synthesized correctly from DImode, HImode is not.
> >
> > This change adds an appropriate expansion for a HImode bswap, if a
> > wider bswap is available.
> >
> > Without this change, the following rv64gc_zbb code is generated for
> > __builtin_bswap16():
> >         slliw   a5,a0,8
> >         zext.h  a0,a0
> >         srliw   a0,a0,8
> >         or      a0,a5,a0
> >         sext.h  a0,a0      // this is a 16bit sign-extension following
> >                            // the byteswap (e.g. on a 'short' function
> >                            // return).
> >
> > After this change, a bswap (rev8) is used and any extensions are
> > combined into the shift-right:
> >         rev8    a0,a0
> >         srai    a0,a0,48   // the sign-extension is combined into the
> >                            // shift; a srli is emitted otherwise...
> >
> > gcc/ChangeLog:
> >
> >         * optabs.c (expand_unop): support expanding a HImode bswap
> >           using SImode or DImode, followed by a shift.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/riscv/zbb-bswap.c: New test.
> >
> > Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
> > ---
> >
> >  gcc/optabs.c                               |  6 ++++++
> >  gcc/testsuite/gcc.target/riscv/zbb-bswap.c | 22 ++++++++++++++++++++++
> >  2 files changed, 28 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-bswap.c
> >
> > diff --git a/gcc/optabs.c b/gcc/optabs.c
> > index 019bbb62882..7a3ffbe4525 100644
> > --- a/gcc/optabs.c
> > +++ b/gcc/optabs.c
> > @@ -3307,6 +3307,12 @@ expand_unop (machine_mode mode, optab unoptab, rtx op0, rtx target,
> >                 return temp;
> >             }
> >
> > +         /* If we are missing a HImode BSWAP, but have one for SImode or
> > +            DImode, use a BSWAP followed by a SHIFT.  */
> > +         temp = widen_bswap (as_a <scalar_int_mode> (mode), op0, target);
> > +         if (temp)
> > +           return temp;
> > +
>
> I think it would be more natural to temporarily terminate the HImode case here
> and re-open it inside the following
>
>       if (is_a <scalar_int_mode> (mode, &int_mode))
>         {
>           temp = widen_bswap (int_mode, op0, target);
>           if (temp)
>             return temp;
>
> here to handle the ashl/lshr/ior fallback.

But as Kito says, code generation for more targets would need to be looked at.

Richard.

> Richard.
>
> >           last = get_last_insn ();
> >
> >           temp1 = expand_binop (mode, ashl_optab, op0,
> > diff --git a/gcc/testsuite/gcc.target/riscv/zbb-bswap.c b/gcc/testsuite/gcc.target/riscv/zbb-bswap.c
> > new file mode 100644
> > index 00000000000..6ee27d9f47a
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/zbb-bswap.c
> > @@ -0,0 +1,22 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc_zbb -mabi=lp64 -O2" } */
> > +
> > +unsigned long
> > +func64 (unsigned long i)
> > +{
> > +  return __builtin_bswap64(i);
> > +}
> > +
> > +unsigned int
> > +func32 (unsigned int i)
> > +{
> > +  return __builtin_bswap32(i);
> > +}
> > +
> > +unsigned short
> > +func16 (unsigned short i)
> > +{
> > +  return __builtin_bswap16(i);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "rev8" 3 } } */
> > --
> > 2.32.0
> >

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-11-19 10:22 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-11 14:10 [PATCH v1 0/8] Improvements to bitmanip-1.0 (Zb[abcs]) support Philipp Tomsich
2021-11-11 14:10 ` [PATCH v1 1/8] bswap: synthesize HImode bswap from SImode or DImode Philipp Tomsich
2021-11-17 14:51   ` Kito Cheng
2021-11-19 10:20   ` Richard Biener
2021-11-19 10:21     ` Richard Biener
2021-11-11 14:10 ` [PATCH v1 2/8] RISC-V: costs: handle BSWAP Philipp Tomsich
2021-11-17 14:52   ` Kito Cheng
2021-11-11 14:10 ` [PATCH v1 3/8] RISC-V: costs: support shift-and-add in strength-reduction Philipp Tomsich
2021-11-11 14:10 ` [PATCH v1 4/8] RISC-V: bitmanip: fix constant-loading for (1ULL << 31) in DImode Philipp Tomsich
2021-11-11 14:10 ` [PATCH v1 5/8] RISC-V: bitmanip: improvements to rotate instructions Philipp Tomsich
2021-11-11 14:10 ` [PATCH v1 6/8] RISC-V: bitmanip: add splitter to use bexti for "(a & (1 << BIT_NO)) ? 0 : -1" Philipp Tomsich
2021-11-18  9:45   ` Kito Cheng
2021-11-11 14:10 ` [PATCH v1 7/8] RISC-V: bitmanip: add orc.b as an unspec Philipp Tomsich
2021-11-11 14:10 ` [PATCH v1 8/8] RISC-V: bitmanip: relax minmax to operate on GPR Philipp Tomsich
2021-11-11 16:00   ` Kito Cheng
2021-11-11 16:18     ` Philipp Tomsich
2021-11-11 16:27       ` Kito Cheng
2021-11-11 16:42         ` Kito Cheng
2021-11-11 18:33           ` Philipp Tomsich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).