public inbox for gcc-patches@gcc.gnu.org
* [PATCH] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU,  SLE, SLEU
@ 2022-08-03  9:54 Maciej W. Rozycki
  2022-08-11  3:26 ` Kito Cheng
  0 siblings, 1 reply; 10+ messages in thread
From: Maciej W. Rozycki @ 2022-08-03  9:54 UTC
  To: gcc-patches; +Cc: Andrew Waterman, Jim Wilson, Kito Cheng, Palmer Dabbelt

We produce inefficient code for some synthesized SImode conditional set 
operations (i.e. ones that are not directly implemented in hardware) on 
RV64.  For example a piece of C code like this:

int
sleu (unsigned int x, unsigned int y)
{
  return x <= y;
}

gets compiled (at `-O2') to this:

sleu:
	sgtu	a0,a0,a1	# 9	[c=4 l=4]  *sgtu_disi
	xori	a0,a0,1		# 10	[c=4 l=4]  *xorsi3_internal/1
	sext.w	a0,a0		# 16	[c=4 l=4]  extendsidi2/0
	ret			# 25	[c=0 l=4]  simple_return

This is because the middle end expands a SLEU operation missing from 
RISC-V hardware into a sequence of a SImode SGTU operation followed by 
an explicit SImode XORI operation with immediate 1.  And while the SGTU 
machine instruction (an alias for SLTU with the input operands swapped) 
gives a properly sign-extended 32-bit result which is valid both as a 
SImode and as a DImode operand, the middle end does not see that through 
a SImode XORI operation, because we tell the middle end that the RISC-V 
target (unlike MIPS) may hold values in DImode integer registers that 
are valid for SImode operations even if not properly sign-extended.

However the RISC-V psABI requires that 32-bit function arguments and 
results passed in 64-bit integer registers be properly sign-extended, so 
this is explicitly done at the conclusion of the function.

Fix this by making the backend use a sequence of a DImode SGTU operation 
followed by a SImode SEQZ operation instead.  The latter operation is 
known by the middle end to produce a properly sign-extended 32-bit 
result and therefore combine gets rid of the sign-extension operation 
that follows and actually folds it into the very same XORI machine 
operation resulting in:

sleu:
	sgtu	a0,a0,a1	# 9	[c=4 l=4]  *sgtu_didi
	xori	a0,a0,1		# 16	[c=4 l=4]  xordi3/1
	ret			# 25	[c=0 l=4]  simple_return

instead (although the SEQZ alias SLTIU against immediate 1 machine 
instruction would equally do and is actually retained at `-O0').  This 
is handled analogously for the remaining synthesized operations of this 
kind, i.e. `SLE', `SGEU', and `SGE'.
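
For illustration only (this is not part of the patch), the equivalence 
relied on above can be sketched in plain C.  The helper names are 
hypothetical and model register values rather than RTL: since SGTU 
yields 0 or 1, XOR with 1 and an equality test against 0 give the same 
result, and that 0/1 result is unchanged by sign-extension from 32 to 64 
bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of SGTU: 1 if X > Y as unsigned values, else 0.  */
static uint64_t
model_sgtu (uint32_t x, uint32_t y)
{
  return x > y ? 1 : 0;
}

/* Old expansion: invert the 0/1 flag with XOR 1.  */
static uint64_t
model_sleu_xor (uint32_t x, uint32_t y)
{
  return model_sgtu (x, y) ^ 1;
}

/* New expansion: invert the 0/1 flag with EQ 0.  */
static uint64_t
model_sleu_eq0 (uint32_t x, uint32_t y)
{
  return model_sgtu (x, y) == 0 ? 1 : 0;
}

/* Model of SEXT.W: sign-extend the low 32 bits of V to 64 bits.  */
static uint64_t
model_sext_w (uint64_t v)
{
  uint64_t low = v & UINT64_C (0xffffffff);
  return (low ^ UINT64_C (0x80000000)) - UINT64_C (0x80000000);
}

/* Check both expansions against the C operator on boundary values.  */
static void
self_check (void)
{
  static const uint32_t s[] =
    { 0, 1, 2, 0x7fffffffu, 0x80000000u, 0xffffffffu };
  size_t n = sizeof s / sizeof s[0];

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      {
        uint64_t expect = s[i] <= s[j];
        assert (model_sleu_xor (s[i], s[j]) == expect);
        assert (model_sleu_eq0 (s[i], s[j]) == expect);
        /* A 0/1 result is already sign-extended: SEXT.W is a no-op.  */
        assert (model_sext_w (expect) == expect);
      }
}
```

Calling `self_check ()' from a test driver exercises all pairs of the 
sample values; the model says nothing about which form the middle end 
can see through, only that the two forms compute the same value.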

	gcc/
	* config/riscv/riscv.cc (riscv_emit_int_order_test): Use EQ 0
	rather than XOR 1 for LE and LEU operations.

	gcc/testsuite/
	* gcc.target/riscv/sge.c: New test.
	* gcc.target/riscv/sgeu.c: New test.
	* gcc.target/riscv/sle.c: New test.
	* gcc.target/riscv/sleu.c: New test.
---
Hi,

 Regression-tested with the `riscv64-linux-gnu' target.  OK to apply?

  Maciej
---
 gcc/config/riscv/riscv.cc             |    4 ++--
 gcc/testsuite/gcc.target/riscv/sge.c  |   11 +++++++++++
 gcc/testsuite/gcc.target/riscv/sgeu.c |   11 +++++++++++
 gcc/testsuite/gcc.target/riscv/sle.c  |   11 +++++++++++
 gcc/testsuite/gcc.target/riscv/sleu.c |   11 +++++++++++
 5 files changed, 46 insertions(+), 2 deletions(-)

gcc-riscv-int-order-inv-seqz.diff
Index: gcc/gcc/config/riscv/riscv.cc
===================================================================
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -2500,9 +2500,9 @@ riscv_emit_int_order_test (enum rtx_code
 	}
       else if (invert_ptr == 0)
 	{
-	  rtx inv_target = riscv_force_binary (GET_MODE (target),
+	  rtx inv_target = riscv_force_binary (word_mode,
 					       inv_code, cmp0, cmp1);
-	  riscv_emit_binary (XOR, target, inv_target, const1_rtx);
+	  riscv_emit_binary (EQ, target, inv_target, const0_rtx);
 	}
       else
 	{
Index: gcc/gcc/testsuite/gcc.target/riscv/sge.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/sge.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+int
+sge (int x, int y)
+{
+  return x >= y;
+}
+
+/* { dg-final { scan-assembler-not "sext\\.w" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/sgeu.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/sgeu.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+int
+sgeu (unsigned int x, unsigned int y)
+{
+  return x >= y;
+}
+
+/* { dg-final { scan-assembler-not "sext\\.w" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/sle.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/sle.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+int
+sle (int x, int y)
+{
+  return x <= y;
+}
+
+/* { dg-final { scan-assembler-not "sext\\.w" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/sleu.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/sleu.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+int
+sleu (unsigned int x, unsigned int y)
+{
+  return x <= y;
+}
+
+/* { dg-final { scan-assembler-not "sext\\.w" } } */


* Re: [PATCH] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU
  2022-08-03  9:54 [PATCH] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU Maciej W. Rozycki
@ 2022-08-11  3:26 ` Kito Cheng
  2022-08-12 22:01   ` Maciej W. Rozycki
  0 siblings, 1 reply; 10+ messages in thread
From: Kito Cheng @ 2022-08-11  3:26 UTC
  To: Maciej W. Rozycki; +Cc: GCC Patches, Andrew Waterman

LGTM, but with a nit: I don't get sext.w but get an andi like below, so
maybe we should also scan-assembler-not andi?  Feel free to commit that
directly with that fix.

```asm
sleu:
       sgtu    a0,a0,a1        # 9     [c=4 l=4]  *sgtu_disi
       xori    a0,a0,1 # 10    [c=4 l=4]  *xorsi3_internal/1
       andi    a0,a0,1 # 16    [c=4 l=4]  anddi3/1
       ret             # 25    [c=0 l=4]  simple_return
```



* Re: [PATCH] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU
  2022-08-11  3:26 ` Kito Cheng
@ 2022-08-12 22:01   ` Maciej W. Rozycki
  2022-11-25 14:07     ` [PING][PATCH] " Maciej W. Rozycki
  0 siblings, 1 reply; 10+ messages in thread
From: Maciej W. Rozycki @ 2022-08-12 22:01 UTC
  To: Kito Cheng; +Cc: GCC Patches, Andrew Waterman

On Thu, 11 Aug 2022, Kito Cheng wrote:

> LGTM, but with a nit, I don't get sext.w but get an andi like below, so
> maybe we should also scan-assembler-not andi? feel free to commit that
> directly with that fix
> 
> ```asm
> sleu:
>        sgtu    a0,a0,a1        # 9     [c=4 l=4]  *sgtu_disi
>        xori    a0,a0,1 # 10    [c=4 l=4]  *xorsi3_internal/1
>        andi    a0,a0,1 # 16    [c=4 l=4]  anddi3/1
>        ret             # 25    [c=0 l=4]  simple_return
> ```

 Interesting.  I can do that, but can you please share the compilation 
options, given or defaulted (from `--with...' configuration options), this 
happens with?

  Maciej


* [PING][PATCH] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU
  2022-08-12 22:01   ` Maciej W. Rozycki
@ 2022-11-25 14:07     ` Maciej W. Rozycki
  2022-11-28 14:50       ` Jeff Law
  0 siblings, 1 reply; 10+ messages in thread
From: Maciej W. Rozycki @ 2022-11-25 14:07 UTC
  To: Kito Cheng; +Cc: GCC Patches, Andrew Waterman

Hi Kito,

On Fri, 12 Aug 2022, Maciej W. Rozycki wrote:

> > LGTM, but with a nit, I don't get sext.w but get an andi like below, so
> > maybe we should also scan-assembler-not andi? feel free to commit that
> > directly with that fix
> > 
> > ```asm
> > sleu:
> >        sgtu    a0,a0,a1        # 9     [c=4 l=4]  *sgtu_disi
> >        xori    a0,a0,1 # 10    [c=4 l=4]  *xorsi3_internal/1
> >        andi    a0,a0,1 # 16    [c=4 l=4]  anddi3/1
> >        ret             # 25    [c=0 l=4]  simple_return
> > ```
> 
>  Interesting.  I can do that, but can you please share the compilation 
> options, given or defaulted (from `--with...' configuration options), this 
> happens with?

 I have noticed it went nowhere.  Can you please check what compilation 
options lead to this discrepancy so that we can have the fix included in 
GCC 13?  I'd like to understand what's going on here.

  Maciej


* Re: [PING][PATCH] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU
  2022-11-25 14:07     ` [PING][PATCH] " Maciej W. Rozycki
@ 2022-11-28 14:50       ` Jeff Law
  2022-11-28 15:38         ` Maciej W. Rozycki
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Law @ 2022-11-28 14:50 UTC
  To: Maciej W. Rozycki, Kito Cheng; +Cc: GCC Patches, Andrew Waterman


On 11/25/22 07:07, Maciej W. Rozycki wrote:
> Hi Kito,
>
> On Fri, 12 Aug 2022, Maciej W. Rozycki wrote:
>
>>> LGTM, but with a nit, I don't get sext.w but get an andi like below, so
>>> maybe we should also scan-assembler-not andi? feel free to commit that
>>> directly with that fix
>>>
>>> ```asm
>>> sleu:
>>>         sgtu    a0,a0,a1        # 9     [c=4 l=4]  *sgtu_disi
>>>         xori    a0,a0,1 # 10    [c=4 l=4]  *xorsi3_internal/1
>>>         andi    a0,a0,1 # 16    [c=4 l=4]  anddi3/1
>>>         ret             # 25    [c=0 l=4]  simple_return
>>> ```
>>   Interesting.  I can do that, but can you please share the compilation
>> options, given or defaulted (from `--with...' configuration options), this
>> happens with?
>   I have noticed it went nowhere.  Can you please check what compilation
> options lead to this discrepancy so that we can have the fix included in
> GCC 13?  I'd like to understand what's going on here.

FWIW, I don't see the redundant sign extension with this testcase at -O2 
on the trunk.  Is it possible the patch has been made redundant over the 
last few months?


Jeff



* Re: [PING][PATCH] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU
  2022-11-28 14:50       ` Jeff Law
@ 2022-11-28 15:38         ` Maciej W. Rozycki
  2022-11-28 16:15           ` Jeff Law
  0 siblings, 1 reply; 10+ messages in thread
From: Maciej W. Rozycki @ 2022-11-28 15:38 UTC
  To: Jeff Law; +Cc: Kito Cheng, GCC Patches, Andrew Waterman

On Mon, 28 Nov 2022, Jeff Law wrote:

> > > > LGTM, but with a nit, I don't get sext.w but get an andi like below, so
> > > > maybe we should also scan-assembler-not andi? feel free to commit that
> > > > directly with that fix
> > > > 
> > > > ```asm
> > > > sleu:
> > > >         sgtu    a0,a0,a1        # 9     [c=4 l=4]  *sgtu_disi
> > > >         xori    a0,a0,1 # 10    [c=4 l=4]  *xorsi3_internal/1
> > > >         andi    a0,a0,1 # 16    [c=4 l=4]  anddi3/1
> > > >         ret             # 25    [c=0 l=4]  simple_return
> > > > ```
> > >   Interesting.  I can do that, but can you please share the compilation
> > > options, given or defaulted (from `--with...' configuration options), this
> > > happens with?
> >   I have noticed it went nowhere.  Can you please check what compilation
> > options lead to this discrepancy so that we can have the fix included in
> > GCC 13?  I'd like to understand what's going on here.
> 
> FWIW, I don't see the redundant sign extension with this testcase at -O2 on
> the trunk.  Is it possible the patch has been made redundant over the last few
> months?

 Maybe at -O2, but the test cases continue to fail in my configuration for 
other optimisation levels:

FAIL: gcc.target/riscv/sge.c   -O1   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sge.c  -Og -g   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sgeu.c   -O1   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sgeu.c  -Og -g   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sle.c   -O1   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sle.c  -Og -g   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sleu.c   -O1   scan-assembler-not sext\\.w
FAIL: gcc.target/riscv/sleu.c  -Og -g   scan-assembler-not sext\\.w

when applied on top of:

$ riscv64-linux-gnu-gcc --version
riscv64-linux-gnu-gcc (GCC) 13.0.0 20221128 (experimental)

Not anymore with the whole patch applied.

 Does it make sense to bisect the change that removed the pessimisation at 
-O2 to understand what is going on here?

 I think my change is worthwhile anyway: why rely on the optimiser to 
get things sorted when we can produce the best code in the backend right 
away?

  Maciej


* Re: [PING][PATCH] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU
  2022-11-28 15:38         ` Maciej W. Rozycki
@ 2022-11-28 16:15           ` Jeff Law
  2022-11-28 17:44             ` [PATCH v2] " Maciej W. Rozycki
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Law @ 2022-11-28 16:15 UTC
  To: Maciej W. Rozycki; +Cc: Kito Cheng, GCC Patches, Andrew Waterman


On 11/28/22 08:38, Maciej W. Rozycki wrote:
> On Mon, 28 Nov 2022, Jeff Law wrote:
>
>>>>> LGTM, but with a nit, I don't get sext.w but get an andi like below, so
>>>>> maybe we should also scan-assembler-not andi? feel free to commit that
>>>>> directly with that fix
>>>>>
>>>>> ```asm
>>>>> sleu:
>>>>>          sgtu    a0,a0,a1        # 9     [c=4 l=4]  *sgtu_disi
>>>>>          xori    a0,a0,1 # 10    [c=4 l=4]  *xorsi3_internal/1
>>>>>          andi    a0,a0,1 # 16    [c=4 l=4]  anddi3/1
>>>>>          ret             # 25    [c=0 l=4]  simple_return
>>>>> ```
>>>>    Interesting.  I can do that, but can you please share the compilation
>>>> options, given or defaulted (from `--with...' configuration options), this
>>>> happens with?
>>>    I have noticed it went nowhere.  Can you please check what compilation
>>> options lead to this discrepancy so that we can have the fix included in
>>> GCC 13?  I'd like to understand what's going on here.
>> FWIW, I don't see the redundant sign extension with this testcase at -O2 on
>> the trunk.  Is it possible the patch has been made redundant over the last few
>> months?
>   Maybe at -O2, but the test cases continue to fail in my configuration for
> other optimisation levels:
>
> FAIL: gcc.target/riscv/sge.c   -O1   scan-assembler-not sext\\.w
> FAIL: gcc.target/riscv/sge.c  -Og -g   scan-assembler-not sext\\.w
> FAIL: gcc.target/riscv/sgeu.c   -O1   scan-assembler-not sext\\.w
> FAIL: gcc.target/riscv/sgeu.c  -Og -g   scan-assembler-not sext\\.w
> FAIL: gcc.target/riscv/sle.c   -O1   scan-assembler-not sext\\.w
> FAIL: gcc.target/riscv/sle.c  -Og -g   scan-assembler-not sext\\.w
> FAIL: gcc.target/riscv/sleu.c   -O1   scan-assembler-not sext\\.w
> FAIL: gcc.target/riscv/sleu.c  -Og -g   scan-assembler-not sext\\.w

I may have been running an rv32 toolchain...  So I'll start over and 
ensure that I'm running rv64 :-)


With the trunk, I get code like Kito (AND with 0x1 mask)


The key difference is Roger's patch:

commit c23a9c87cc62bd177fd0d4db6ad34b34e1b9a31f
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Wed Aug 3 08:55:35 2022 +0100

     Some additional zero-extension related optimizations in simplify-rtx.

     This patch implements some additional zero-extension and sign-extension
     related optimizations in simplify-rtx.cc.  The original motivation comes
     from PR rtl-optimization/71775, where in comment #2 Andrew Pinski sees:

     Failed to match this instruction:
     (set (reg:DI 88 [ _1 ])
         (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))

[ ... ]

With that patch the sign extension is removed and instead we generate 
the AND with 0x1.

Old, from combine dump:

   Successfully matched this instruction:
   (set (reg/i:DI 10 a0)
!     (sign_extend:DI (reg:SI 78)))


New, from combine dump:

   (set (reg/i:DI 10 a0)
!     (and:DI (subreg:DI (reg:SI 78) 0)
!         (const_int 1 [0x1])))

Note the date on Roger's patch, roughly the same time as yours. I 
suspect Kito had tested the trunk with Roger's patch.


Your patch is probably still useful.  I think Kito's only concern was to 
make sure we don't have the ANDI instruction in addition to not having 
the SEXT instruction.  So still approved for trunk, just update the 
testcases to make sure we don't have the ANDI too.
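
As an illustration only (hypothetical helper names, modelling register 
values rather than what combine actually does internally): when the low 
32 bits of a register are known to hold 0 or 1, sign-extending the 
SImode value and masking with 1 produce the same DImode value, which is 
presumably why either form can show up after the XORI.  The sketch below 
also shows that the two differ once the value is not a 0/1 flag.

```c
#include <assert.h>
#include <stdint.h>

/* Model of (sign_extend:DI (reg:SI)): sign-extend the low 32 bits.  */
static uint64_t
model_sign_extend_si (uint64_t reg)
{
  uint64_t low = reg & UINT64_C (0xffffffff);
  return (low ^ UINT64_C (0x80000000)) - UINT64_C (0x80000000);
}

/* Model of (and:DI (subreg:DI (reg:SI) 0) (const_int 1)).  */
static uint64_t
model_and_1 (uint64_t reg)
{
  return reg & 1;
}

static void
self_check (void)
{
  /* For a clean 0/1 flag both forms return the flag itself.  */
  for (uint64_t flag = 0; flag <= 1; flag++)
    {
      assert (model_sign_extend_si (flag) == flag);
      assert (model_and_1 (flag) == flag);
    }

  /* Arbitrary upper bits (assumed here for the sake of the model) do
     not matter as long as the low 32 bits hold 0 or 1.  */
  uint64_t dirty = UINT64_C (0xdeadbeef00000000) | 1;
  assert (model_sign_extend_si (dirty) == 1);
  assert (model_and_1 (dirty) == 1);

  /* The equivalence depends on the 0/1 knowledge: it fails for 2.  */
  assert (model_sign_extend_si (2) != model_and_1 (2));
}
```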


jeff




* [PATCH v2] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU
  2022-11-28 16:15           ` Jeff Law
@ 2022-11-28 17:44             ` Maciej W. Rozycki
  2022-11-28 18:07               ` Jeff Law
  0 siblings, 1 reply; 10+ messages in thread
From: Maciej W. Rozycki @ 2022-11-28 17:44 UTC
  To: Jeff Law; +Cc: Kito Cheng, GCC Patches, Andrew Waterman

We produce inefficient code for some synthesized SImode conditional set 
operations (i.e. ones that are not directly implemented in hardware) on 
RV64.  For example a piece of C code like this:

int
sleu (unsigned int x, unsigned int y)
{
  return x <= y;
}

gets compiled (at `-O2') to this:

sleu:
	sgtu	a0,a0,a1	# 9	[c=4 l=4]  *sgtu_disi
	xori	a0,a0,1		# 10	[c=4 l=4]  *xorsi3_internal/1
	andi	a0,a0,1		# 16	[c=4 l=4]  anddi3/1
	ret			# 25	[c=0 l=4]  simple_return

or (at `-O1') to this:

sleu:
	sgtu	a0,a0,a1	# 9	[c=4 l=4]  *sgtu_disi
	xori	a0,a0,1		# 10	[c=4 l=4]  *xorsi3_internal/1
	sext.w	a0,a0		# 16	[c=4 l=4]  extendsidi2/0
	ret			# 24	[c=0 l=4]  simple_return

This is because the middle end expands a SLEU operation missing from 
RISC-V hardware into a sequence of a SImode SGTU operation followed by 
an explicit SImode XORI operation with immediate 1.  And while the SGTU 
machine instruction (an alias for SLTU with the input operands swapped) 
gives a properly sign-extended 32-bit result which is valid both as a 
SImode and as a DImode operand, the middle end does not see that through 
a SImode XORI operation, because we tell the middle end that the RISC-V 
target (unlike MIPS) may hold values in DImode integer registers that 
are valid for SImode operations even if not properly sign-extended.

However the RISC-V psABI requires that 32-bit function arguments and 
results passed in 64-bit integer registers be properly sign-extended, so 
this is explicitly done at the conclusion of the function.

Fix this by making the backend use a sequence of a DImode SGTU operation 
followed by a SImode SEQZ operation instead.  The latter operation is 
known by the middle end to produce a properly sign-extended 32-bit 
result and therefore combine gets rid of the sign-extension operation 
that follows and actually folds it into the very same XORI machine 
operation resulting in:

sleu:
	sgtu	a0,a0,a1	# 9	[c=4 l=4]  *sgtu_didi
	xori	a0,a0,1		# 16	[c=4 l=4]  xordi3/1
	ret			# 25	[c=0 l=4]  simple_return

instead (although the SEQZ alias SLTIU against immediate 1 machine 
instruction would equally do and is actually retained at `-O0').  This 
is handled analogously for the remaining synthesized operations of this 
kind, i.e. `SLE', `SGEU', and `SGE'.
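
A minimal sketch, in plain C with hypothetical helper names, of how all 
four synthesized operations follow the same pattern: each one computes 
the inverse comparison that the hardware does provide and then tests the 
result for zero.  It models values only and is not taken from the 
backend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of SEQZ: 1 if V is zero, else 0.  */
static int
model_seqz (int v)
{
  return v == 0;
}

/* Each synthesized operation is the inverse comparison plus SEQZ.  */
static int model_sle (int32_t x, int32_t y) { return model_seqz (x > y); }
static int model_sge (int32_t x, int32_t y) { return model_seqz (x < y); }
static int model_sleu (uint32_t x, uint32_t y) { return model_seqz (x > y); }
static int model_sgeu (uint32_t x, uint32_t y) { return model_seqz (x < y); }

/* Compare the models against the C operators on boundary values.  */
static void
self_check (void)
{
  static const int32_t sv[] = { INT32_MIN, -1, 0, 1, INT32_MAX };
  static const uint32_t uv[] = { 0, 1, 0x7fffffffu, 0x80000000u, UINT32_MAX };
  size_t n = sizeof sv / sizeof sv[0];

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      {
        assert (model_sle (sv[i], sv[j]) == (sv[i] <= sv[j]));
        assert (model_sge (sv[i], sv[j]) == (sv[i] >= sv[j]));
        assert (model_sleu (uv[i], uv[j]) == (uv[i] <= uv[j]));
        assert (model_sgeu (uv[i], uv[j]) == (uv[i] >= uv[j]));
      }
}
```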

	gcc/
	* config/riscv/riscv.cc (riscv_emit_int_order_test): Use EQ 0
	rather than XOR 1 for LE and LEU operations.

	gcc/testsuite/
	* gcc.target/riscv/sge.c: New test.
	* gcc.target/riscv/sgeu.c: New test.
	* gcc.target/riscv/sle.c: New test.
	* gcc.target/riscv/sleu.c: New test.
---
On Mon, 28 Nov 2022, Jeff Law wrote:

> > > >    I have noticed it went nowhere.  Can you please check what
> > > > compilation
> > > > options lead to this discrepancy so that we can have the fix included in
> > > > GCC 13?  I'd like to understand what's going on here.
> > > FWIW, I don't see the redundant sign extension with this testcase at -O2
> > > on
> > > the trunk.  Is it possible the patch has been made redundant over the last
> > > few
> > > months?
> >   Maybe at -O2, but the test cases continue to fail in my configuration for
> > other optimisation levels:
> > 
> > FAIL: gcc.target/riscv/sge.c   -O1   scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sge.c  -Og -g   scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sgeu.c   -O1   scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sgeu.c  -Og -g   scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sle.c   -O1   scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sle.c  -Og -g   scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sleu.c   -O1   scan-assembler-not sext\\.w
> > FAIL: gcc.target/riscv/sleu.c  -Og -g   scan-assembler-not sext\\.w
> 
> I may have been running an rv32 toolchain...  So I'll start over and ensure
> that I'm running rv64 :-)
> 
> 
> With the trunk, I get code like Kito (AND with 0x1 mask)

 Right, I have examined assembly produced at -O2 and this is what happens 
here as well:

--- sleu-O1.s	2022-11-28 16:31:18.520538342 +0000
+++ sleu-O2.s	2022-11-28 16:30:27.054241372 +0000
@@ -10,7 +10,7 @@
 sleu:
 	sgtu	a0,a0,a1
 	xori	a0,a0,1
-	sext.w	a0,a0
+	andi	a0,a0,1
 	ret
 	.size	sleu, .-sleu
 	.section	.note.GNU-stack,"",@progbits

following Kito's observations.  Which is why the tests incorrectly pass at 
some optimisation levels while code produced is still suboptimal and just 
trivially different.

> The key difference is Roger's patch:
> 
> commit c23a9c87cc62bd177fd0d4db6ad34b34e1b9a31f
> Author: Roger Sayle <roger@nextmovesoftware.com>
> Date:   Wed Aug 3 08:55:35 2022 +0100
> 
>     Some additional zero-extension related optimizations in simplify-rtx.
> 
>     This patch implements some additional zero-extension and sign-extension
>     related optimizations in simplify-rtx.cc.  The original motivation comes
>     from PR rtl-optimization/71775, where in comment #2 Andrew Pinski sees:
> 
>     Failed to match this instruction:
>     (set (reg:DI 88 [ _1 ])
>         (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))
> 
> [ ... ]
> 
> With that patch the sign extension is removed and instead we generate the AND
> with 0x1.
> 
> Old, from combine dump:
> 
>   Successfully matched this instruction:
>   (set (reg/i:DI 10 a0)
> !     (sign_extend:DI (reg:SI 78)))
> 
> 
> New, from combine dump:
> 
>   (set (reg/i:DI 10 a0)
> !     (and:DI (subreg:DI (reg:SI 78) 0)
> !         (const_int 1 [0x1])))
> 
> Note the date on Roger's patch, roughly the same time as yours. I suspect Kito
> had tested the trunk with Roger's patch.

 That indeed seems like the correct explanation.  Thanks for tracking it 
down!

> Your patch is probably still useful.  I think Kito's only concern was to make
> sure we don't have the ANDI instruction in addition to not having the SEXT
> instruction.  So still approved for trunk, just update the testcases to make
> sure we don't have the ANDI too.

 Given the false negatives how about getting a bit stricter and also 
checking there's nothing following the XORI instruction, like here?

 It might be overkill to check both for the sequence and for the absence 
of ANDI or SEXT.W, but I'd rather have both out of an abundance of 
caution.
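
 To make the difference between the two sets of checks concrete, here is 
a rough model in plain C.  It is a simplification under stated 
assumptions: DejaGnu matches Tcl regular expressions against the 
assembler output, whereas this sketch uses fixed-string `strstr' 
searches, and the function names and sample strings are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* v1-style check: only the absence of SEXT.W is verified.  */
static bool
v1_check (const char *asm_text)
{
  return strstr (asm_text, "sext.w") == NULL;
}

/* v2-style check: XORI must be directly followed by RET, and neither
   ANDI nor SEXT.W may appear anywhere.  */
static bool
v2_check (const char *asm_text)
{
  return strstr (asm_text, "\txori\ta0,a0,1\n\tret\n") != NULL
         && strstr (asm_text, "andi") == NULL
         && strstr (asm_text, "sext.w") == NULL;
}

/* Sample outputs modelled on the sequences quoted in this thread.  */
static const char asm_sext[] =
  "sleu:\n\tsgtu\ta0,a0,a1\n\txori\ta0,a0,1\n\tsext.w\ta0,a0\n\tret\n";
static const char asm_andi[] =
  "sleu:\n\tsgtu\ta0,a0,a1\n\txori\ta0,a0,1\n\tandi\ta0,a0,1\n\tret\n";
static const char asm_good[] =
  "sleu:\n\tsgtu\ta0,a0,a1\n\txori\ta0,a0,1\n\tret\n";

static void
self_check (void)
{
  assert (!v1_check (asm_sext));
  assert (v1_check (asm_andi));   /* Passes although code is suboptimal.  */
  assert (v1_check (asm_good));

  assert (!v2_check (asm_sext));
  assert (!v2_check (asm_andi));  /* The stricter check catches ANDI.  */
  assert (v2_check (asm_good));
}
```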

  Maciej

Changes from v1:

- Update test cases so as to verify there's no extra operation between 
  XORI and the final RET, and that an ANDI instruction is not present 
  either.

- Update the change description to reflect changes in code generation.
---
 gcc/config/riscv/riscv.cc             |    4 ++--
 gcc/testsuite/gcc.target/riscv/sge.c  |   12 ++++++++++++
 gcc/testsuite/gcc.target/riscv/sgeu.c |   12 ++++++++++++
 gcc/testsuite/gcc.target/riscv/sle.c  |   12 ++++++++++++
 gcc/testsuite/gcc.target/riscv/sleu.c |   12 ++++++++++++
 5 files changed, 50 insertions(+), 2 deletions(-)

gcc-riscv-int-order-inv-seqz.diff
Index: gcc/gcc/config/riscv/riscv.cc
===================================================================
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -3004,9 +3004,9 @@ riscv_emit_int_order_test (enum rtx_code
 	}
       else if (invert_ptr == 0)
 	{
-	  rtx inv_target = riscv_force_binary (GET_MODE (target),
+	  rtx inv_target = riscv_force_binary (word_mode,
 					       inv_code, cmp0, cmp1);
-	  riscv_emit_binary (XOR, target, inv_target, const1_rtx);
+	  riscv_emit_binary (EQ, target, inv_target, const0_rtx);
 	}
       else
 	{
Index: gcc/gcc/testsuite/gcc.target/riscv/sge.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/sge.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+int
+sge (int x, int y)
+{
+  return x >= y;
+}
+
+/* { dg-final { scan-assembler "\\sxori\\sa0,a0,1\n\\sret\n" } } */
+/* { dg-final { scan-assembler-not "andi|sext\\.w" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/sgeu.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/sgeu.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+int
+sgeu (unsigned int x, unsigned int y)
+{
+  return x >= y;
+}
+
+/* { dg-final { scan-assembler "\\sxori\\sa0,a0,1\n\\sret\n" } } */
+/* { dg-final { scan-assembler-not "andi|sext\\.w" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/sle.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/sle.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+int
+sle (int x, int y)
+{
+  return x <= y;
+}
+
+/* { dg-final { scan-assembler "\\sxori\\sa0,a0,1\n\\sret\n" } } */
+/* { dg-final { scan-assembler-not "andi|sext\\.w" } } */
Index: gcc/gcc/testsuite/gcc.target/riscv/sleu.c
===================================================================
--- /dev/null
+++ gcc/gcc/testsuite/gcc.target/riscv/sleu.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target rv64 } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+int
+sleu (unsigned int x, unsigned int y)
+{
+  return x <= y;
+}
+
+/* { dg-final { scan-assembler "\\sxori\\sa0,a0,1\n\\sret\n" } } */
+/* { dg-final { scan-assembler-not "andi|sext\\.w" } } */

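 For readers following the riscv.cc hunk above, here is a minimal C sketch of
why swapping the XOR-with-1 for an EQ-with-0 preserves the result: both invert
a flag that is known to be 0 or 1.  The function names are hypothetical and
only model the arithmetic; this is not the backend code, and it says nothing
about which machine instructions end up being emitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the old expansion: invert a 0/1 flag held in a
   64-bit register by XORing it with 1.  */
int64_t
invert_with_xor (int64_t flag)
{
  return flag ^ 1;
}

/* Hypothetical model of the new expansion: invert the flag by comparing
   it against zero, which always yields 0 or 1.  */
int64_t
invert_with_eq_zero (int64_t flag)
{
  return flag == 0;
}

/* Model of `x <= y' synthesized as the inverse of `x > y'.  */
int
sleu_model (uint32_t x, uint32_t y)
{
  int64_t sgtu = x > y;		/* 0 or 1, like the inverse comparison.  */
  return (int) invert_with_eq_zero (sgtu);
}

/* Both inversions agree on every value the flag can take.  */
void
check_inversions_agree (void)
{
  for (int64_t flag = 0; flag <= 1; flag++)
    assert (invert_with_xor (flag) == invert_with_eq_zero (flag));
}
```

 The two inversions only agree because the input is restricted to 0 and 1;
for any other input the XOR form and the EQ form would differ.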

* Re: [PATCH v2] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU
  2022-11-28 17:44             ` [PATCH v2] " Maciej W. Rozycki
@ 2022-11-28 18:07               ` Jeff Law
  2022-11-28 19:41                 ` Maciej W. Rozycki
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Law @ 2022-11-28 18:07 UTC (permalink / raw)
  To: Maciej W. Rozycki; +Cc: Kito Cheng, GCC Patches, Andrew Waterman



On 11/28/22 10:44, Maciej W. Rozycki wrote:

> 
>> Your patch is probably still useful.  I think Kito's only concern was to make
>> sure we don't have the ANDI instruction in addition to not having the SEXT
>> instruction.  So still approved for trunk, just update the testcases to make
>> sure we don't have the ANDI too.
> 
>   Given the false negatives how about getting a bit stricter and also
> checking there's nothing following the XORI instruction, like here?
> 
>   It might be overkill to have a check both for the sequence and for the
> absence of ANDI or SEXT.W as well, but I'd rather have them both out of an
> abundance of caution.

Sure.  That works for me as well.  OK for the trunk.

Interestingly enough Raphael and I are looking at a case where Roger's 
patch is causing poorer code generation.  Given what we're finding as we 
work through the other case, I won't be surprised if we find multiple 
cases where RISC-V is generating poorer code after that patch, even 
though it's a perfectly sensible patch.

jeff

* Re: [PATCH v2] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU
  2022-11-28 18:07               ` Jeff Law
@ 2022-11-28 19:41                 ` Maciej W. Rozycki
  0 siblings, 0 replies; 10+ messages in thread
From: Maciej W. Rozycki @ 2022-11-28 19:41 UTC (permalink / raw)
  To: Jeff Law; +Cc: Kito Cheng, GCC Patches, Andrew Waterman

On Mon, 28 Nov 2022, Jeff Law wrote:

> >   Given the false negatives how about getting a bit stricter and also
> > checking there's nothing following the XORI instruction, like here?
> > 
> >   It might be overkill to have a check both for the sequence and for the
> > absence of ANDI or SEXT.W as well, but I'd rather have them both out of an
> > abundance of caution.
> Sure.  That works for me as well.  OK for the trunk.

 I have committed it then.  Thank you for your review.

> Interestingly enough Raphael and I are looking at a case where Roger's patch
> is causing poorer code generation.  Given what we're finding as we work
> through the other case, I won't be surprised if we find multiple cases where
> RISC-V is generating poorer code after that patch, even though it's a
> perfectly sensible patch.

 I think it would make sense to run a RISC-V performance evaluation with and 
without Roger's patch applied.  Sadly I am somewhat resource-constrained right 
now and won't be able to do that anytime soon, but hopefully there's enough 
RISC-V hardware available now for someone to pick it up.

  Maciej

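 To illustrate the two kinds of check discussed in this thread (a positive
match for XORI immediately followed by RET, plus a negative match for ANDI and
SEXT.W), here is a rough C sketch.  The helper names and the sample assembly
strings are hypothetical and hand-written, not compiler output.  The real
tests use DejaGnu `scan-assembler' regular expressions, where `\s' matches any
whitespace; this sketch assumes literal tabs for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical positive check: XORI immediately followed by RET.  */
bool
has_xori_then_ret (const char *asm_text)
{
  return strstr (asm_text, "\txori\ta0,a0,1\n\tret\n") != NULL;
}

/* Hypothetical negative check: ANDI or SEXT.W appears anywhere.  */
bool
has_andi_or_sext_w (const char *asm_text)
{
  return (strstr (asm_text, "andi") != NULL
	  || strstr (asm_text, "sext.w") != NULL);
}

/* The output passes only if both checks are satisfied.  */
bool
output_passes (const char *asm_text)
{
  return has_xori_then_ret (asm_text) && !has_andi_or_sext_w (asm_text);
}

/* Hand-written sample inputs, one good and one with a stray SEXT.W.  */
void
self_test (void)
{
  assert (output_passes ("sge:\n\tslt\ta0,a0,a1\n"
			 "\txori\ta0,a0,1\n\tret\n"));
  assert (!output_passes ("sge:\n\tslt\ta0,a0,a1\n"
			  "\txori\ta0,a0,1\n\tsext.w\ta0,a0\n\tret\n"));
}
```

 In the second sample either check alone would catch the stray instruction,
which is the redundancy kept in the tests out of caution.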

Thread overview: 10+ messages
2022-08-03  9:54 [PATCH] RISC-V: Avoid redundant sign-extension for SImode SGE, SGEU, SLE, SLEU Maciej W. Rozycki
2022-08-11  3:26 ` Kito Cheng
2022-08-12 22:01   ` Maciej W. Rozycki
2022-11-25 14:07     ` [PING][PATCH] " Maciej W. Rozycki
2022-11-28 14:50       ` Jeff Law
2022-11-28 15:38         ` Maciej W. Rozycki
2022-11-28 16:15           ` Jeff Law
2022-11-28 17:44             ` [PATCH v2] " Maciej W. Rozycki
2022-11-28 18:07               ` Jeff Law
2022-11-28 19:41                 ` Maciej W. Rozycki