[PATCH] xtensa: Optimize bitwise splicing operation

* [PATCH] xtensa: Optimize bitwise splicing operation
From: Takayuki 'January June' Suwa @ 2023-01-07  2:55 UTC
  To: GCC Patches; +Cc: Max Filippov

This patch optimizes the operation of cutting and splicing two register
values at a specified bit position, in other words, combining (bitwise
ORing) bits 0 through (C-1) of one register with bits C through 31
of the other, where C is an immediate integer from 1 through 31.

This typically applies to copying the sign of a floating-point number
or to __builtin_return_address() under the windowed register ABI, and
saves one instruction compared with the four shifts and a bitwise OR
emitted by the RTL generation pass.
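
As a rough illustration (not part of the patch), the operation and the
three-step shift sequence emitted by the split can be written in C. The
names splice_mask, splice_shift and check_splice_forms are made up for
this sketch, and c plays the role of the bit position C above.

```c
#include <assert.h>
#include <stdint.h>

/* Mask form: bits 0..(c-1) come from lo, bits c..31 from hi.
   Hypothetical helper; c must be in 1..31. */
static uint32_t splice_mask(uint32_t lo, uint32_t hi, unsigned c)
{
    uint32_t low_mask = (UINT32_C(1) << c) - 1;
    return (lo & low_mask) | (hi & ~low_mask);
}

/* Shift form, mirroring the three sets in the split pattern:
   t5 = lo << (32 - c); t6 = hi >> c;
   result = (t5 >> (32 - c)) | (t6 << c). */
static uint32_t splice_shift(uint32_t lo, uint32_t hi, unsigned c)
{
    uint32_t t5 = lo << (32 - c);
    uint32_t t6 = hi >> c;
    return (t5 >> (32 - c)) | (t6 << c);
}

/* Check that both forms agree for every valid bit position. */
static void check_splice_forms(void)
{
    const uint32_t lo = UINT32_C(0x12345678), hi = UINT32_C(0xABCDEF01);
    for (unsigned c = 1; c <= 31; c++)
        assert(splice_mask(lo, hi, c) == splice_shift(lo, hi, c));
}
```

For example, splice_mask(0x12345678, 0xABCDEF01, 8) takes the low byte
from the first argument and the remaining bits from the second, giving
0xABCDEF78.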

gcc/ChangeLog:

	* config/xtensa/xtensa.md (*splice_bits):
	New insn_and_split pattern.
---
 gcc/config/xtensa/xtensa.md | 47 +++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 0a26d3dccf4..36ec1b1918e 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -746,6 +746,53 @@
    (set_attr "mode"	"SI")
    (set_attr "length"	"3")])
 
+(define_insn_and_split "*splice_bits"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+	(ior:SI (and:SI (match_operand:SI 1 "register_operand" "r")
+			(match_operand:SI 3 "const_int_operand" "i"))
+		(and:SI (match_operand:SI 2 "register_operand" "r")
+			(match_operand:SI 4 "const_int_operand" "i"))))]
+
+  "!optimize_debug && optimize
+   && INTVAL (operands[3]) + INTVAL (operands[4]) == -1
+   && (exact_log2 (INTVAL (operands[3]) + 1) > 0
+       || exact_log2 (INTVAL (operands[4]) + 1) > 0)"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(set (match_dup 5)
+	(ashift:SI (match_dup 1)
+		   (match_dup 4)))
+   (set (match_dup 6)
+	(lshiftrt:SI (match_dup 2)
+		     (match_dup 3)))
+   (set (match_dup 0)
+	(ior:SI (lshiftrt:SI (match_dup 5)
+			     (match_dup 4))
+		(ashift:SI (match_dup 6)
+			   (match_dup 3))))]
+{
+  int shift;
+  if (INTVAL (operands[3]) < 0)
+    {
+      rtx x;
+      x = operands[1], operands[1] = operands[2], operands[2] = x;
+      x = operands[3], operands[3] = operands[4], operands[4] = x;
+    }
+  shift = floor_log2 (INTVAL (operands[3]) + 1);
+  operands[3] = GEN_INT (shift);
+  operands[4] = GEN_INT (32 - shift);
+  operands[5] = gen_reg_rtx (SImode);
+  operands[6] = gen_reg_rtx (SImode);
+}
+  [(set_attr "type"	"arith")
+   (set_attr "mode"	"SI")
+   (set (attr "length")
+	(if_then_else (match_test "TARGET_DENSITY
+				   && (INTVAL (operands[3]) == 0x7FFFFFFF
+				       || INTVAL (operands[4]) == 0x7FFFFFFF)")
+		      (const_int 11)
+		      (const_int 12)))])
+
 \f
 ;; Zero-extend instructions.
 
-- 
2.30.2

* Re: [PATCH] xtensa: Optimize bitwise splicing operation
From: Max Filippov @ 2023-01-07 21:53 UTC
  To: Takayuki 'January June' Suwa; +Cc: GCC Patches

On Fri, Jan 6, 2023 at 6:55 PM Takayuki 'January June' Suwa
<jjsuwa_sys3175@yahoo.co.jp> wrote:
>
> This patch optimizes the operation of cutting and splicing two register
> values at a specified bit position, in other words, combining (bitwise
> ORing) bits 0 through (C-1) of one register with bits C through 31
> of the other, where C is an immediate integer from 1 through 31.
>
> This typically applies to copying the sign of a floating-point number
> or to __builtin_return_address() under the windowed register ABI, and
> saves one instruction compared with the four shifts and a bitwise OR
> emitted by the RTL generation pass.

While I indeed see this kind of change, e.g.:
-       extui   a3, a3, 27, 5
-       slli    a2, a2, 5
-       srli    a2, a2, 5
-       slli    a3, a3, 27
-       or      a2, a2, a3
+       slli    a2, a2, 5
+       extui   a3, a3, 27, 5
+       ssai    5
+       src     a2, a3, a2

I also see the following:
-       movi.n  a6, -4
-       and     a5, a5, a6
-       extui   a3, a3, 0, 2
-       or      a3, a3, a5
+       srli    a5, a5, 2
+       slli    a3, a3, 30
+       ssai    30
+       src     a3, a5, a3

i.e. after the split there's the same number of instructions,
but the new sequence is one byte longer than the original one
because of the movi.n.
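
The byte counts behind this can be tallied in a small C sketch. It
assumes the usual Xtensa encoding sizes (3 bytes for a standard
instruction, 2 bytes for a density ".n" instruction); the names are
invented for illustration.

```c
/* Xtensa instruction sizes in bytes: standard (24-bit) and narrow ".n"
   (16-bit). Assumed here, not taken from the patch. */
enum { WIDE_BYTES = 3, NARROW_BYTES = 2 };

/* Total size of a sequence with the given numbers of wide and narrow
   instructions. */
static int sequence_bytes(int wide, int narrow)
{
    return wide * WIDE_BYTES + narrow * NARROW_BYTES;
}

/* Original: movi.n + and + extui + or, i.e. one narrow and three wide. */
static int original_bytes(void)
{
    return sequence_bytes(3, 1);
}

/* After the split: srli + slli + ssai + src, i.e. four wide. */
static int split_bytes(void)
{
    return sequence_bytes(4, 0);
}
```

With these sizes the original sequence comes to 11 bytes and the split
sequence to 12.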

Looking at a bunch of linux builds I observe a slight code size
growth in call0 kernels and a slight code size reduction in
windowed kernels.

> gcc/ChangeLog:
>
>         * config/xtensa/xtensa.md (*splice_bits):
>         New insn_and_split pattern.
> ---
>  gcc/config/xtensa/xtensa.md | 47 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
>
> diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
> index 0a26d3dccf4..36ec1b1918e 100644
> --- a/gcc/config/xtensa/xtensa.md
> +++ b/gcc/config/xtensa/xtensa.md
> @@ -746,6 +746,53 @@
>     (set_attr "mode"    "SI")
>     (set_attr "length"  "3")])
>
> +(define_insn_and_split "*splice_bits"
> +  [(set (match_operand:SI 0 "register_operand" "=a")
> +       (ior:SI (and:SI (match_operand:SI 1 "register_operand" "r")
> +                       (match_operand:SI 3 "const_int_operand" "i"))
> +               (and:SI (match_operand:SI 2 "register_operand" "r")
> +                       (match_operand:SI 4 "const_int_operand" "i"))))]
> +
> +  "!optimize_debug && optimize
> +   && INTVAL (operands[3]) + INTVAL (operands[4]) == -1
> +   && (exact_log2 (INTVAL (operands[3]) + 1) > 0
> +       || exact_log2 (INTVAL (operands[4]) + 1) > 0)"
> +  "#"
> +  "&& can_create_pseudo_p ()"
> +  [(set (match_dup 5)
> +       (ashift:SI (match_dup 1)
> +                  (match_dup 4)))
> +   (set (match_dup 6)
> +       (lshiftrt:SI (match_dup 2)
> +                    (match_dup 3)))
> +   (set (match_dup 0)
> +       (ior:SI (lshiftrt:SI (match_dup 5)
> +                            (match_dup 4))
> +               (ashift:SI (match_dup 6)
> +                          (match_dup 3))))]
> +{
> +  int shift;
> +  if (INTVAL (operands[3]) < 0)
> +    {
> +      rtx x;
> +      x = operands[1], operands[1] = operands[2], operands[2] = x;
> +      x = operands[3], operands[3] = operands[4], operands[4] = x;
> +    }
> +  shift = floor_log2 (INTVAL (operands[3]) + 1);
> +  operands[3] = GEN_INT (shift);
> +  operands[4] = GEN_INT (32 - shift);
> +  operands[5] = gen_reg_rtx (SImode);
> +  operands[6] = gen_reg_rtx (SImode);
> +}
> +  [(set_attr "type"    "arith")
> +   (set_attr "mode"    "SI")
> +   (set (attr "length")
> +       (if_then_else (match_test "TARGET_DENSITY
> +                                  && (INTVAL (operands[3]) == 0x7FFFFFFF
> +                                      || INTVAL (operands[4]) == 0x7FFFFFFF)")
> +                     (const_int 11)
> +                     (const_int 12)))])

I wonder how the length could be 11 here? I always see 4 3-byte
instructions generated by this pattern.
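
For reference, the four instructions in question are two shifts (or a
shift and an extui), ssai, and src. Below is a C model of the two split
sequences shown earlier, assuming the documented behaviour of src (shift
the 64-bit concatenation of two registers right by SAR and keep the low
32 bits). All function names are hypothetical.

```c
#include <stdint.h>

/* Model of "ssai sar; src ar, as, at": shift the 64-bit value as:at
   right by sar (0..31) and keep the low 32 bits. */
static uint32_t src_model(uint32_t as, uint32_t at, unsigned sar)
{
    uint64_t pair = ((uint64_t)as << 32) | at;
    return (uint32_t)(pair >> sar);
}

/* Model of "extui ar, at, shift, width": extract `width` bits starting
   at bit `shift` (width is 1..16 on Xtensa). */
static uint32_t extui_model(uint32_t at, unsigned shift, unsigned width)
{
    return (at >> shift) & ((UINT32_C(1) << width) - 1);
}

/* slli a2, a2, 5; extui a3, a3, 27, 5; ssai 5; src a2, a3, a2
   Result: low 27 bits of a2 combined with the high 5 bits of a3. */
static uint32_t seq_low27_high5(uint32_t a2, uint32_t a3)
{
    a2 = a2 << 5;
    a3 = extui_model(a3, 27, 5);
    return src_model(a3, a2, 5);
}

/* srli a5, a5, 2; slli a3, a3, 30; ssai 30; src a3, a5, a3
   Result: low 2 bits of a3 combined with the high 30 bits of a5. */
static uint32_t seq_low2_high30(uint32_t a3, uint32_t a5)
{
    a5 = a5 >> 2;
    a3 = a3 << 30;
    return src_model(a5, a3, 30);
}
```

None of the four steps has a narrow encoding in this model, which is
consistent with the 12-byte total observed above.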

-- 
Thanks.
-- Max

* Re: [PATCH] xtensa: Optimize bitwise splicing operation
From: Takayuki 'January June' Suwa @ 2023-01-08  4:11 UTC
  To: Max Filippov; +Cc: GCC Patches

On 2023/01/08 6:53, Max Filippov wrote:
> On Fri, Jan 6, 2023 at 6:55 PM Takayuki 'January June' Suwa
> <jjsuwa_sys3175@yahoo.co.jp> wrote:
>>
>> This patch optimizes the operation of cutting and splicing two register
>> values at a specified bit position, in other words, combining (bitwise
>> ORing) bits 0 through (C-1) of one register with bits C through 31
>> of the other, where C is an immediate integer from 1 through 31.
>>
>> This typically applies to copying the sign of a floating-point number
>> or to __builtin_return_address() under the windowed register ABI, and
>> saves one instruction compared with the four shifts and a bitwise OR
>> emitted by the RTL generation pass.
> 
> While I indeed see this kind of change, e.g.:
> -       extui   a3, a3, 27, 5
> -       slli    a2, a2, 5
> -       srli    a2, a2, 5
> -       slli    a3, a3, 27
> -       or      a2, a2, a3
> +       slli    a2, a2, 5
> +       extui   a3, a3, 27, 5
> +       ssai    5
> +       src     a2, a3, a2
> 
> I also see the following:
> -       movi.n  a6, -4
> -       and     a5, a5, a6
> -       extui   a3, a3, 0, 2
> -       or      a3, a3, a5
> +       srli    a5, a5, 2
> +       slli    a3, a3, 30
> +       ssai    30
> +       src     a3, a5, a3
> 
> i.e. after the split there's the same number of instructions,
> but the new sequence is one byte longer than the original one
> because of the movi.n.
> 
> Looking at a bunch of linux builds I observe a slight code size
> growth in call0 kernels and a slight code size reduction in
> windowed kernels.
> 
>> gcc/ChangeLog:
>>
>>         * config/xtensa/xtensa.md (*splice_bits):
>>         New insn_and_split pattern.
>> ---
>>  gcc/config/xtensa/xtensa.md | 47 +++++++++++++++++++++++++++++++++++++
>>  1 file changed, 47 insertions(+)
>>
>> diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
>> index 0a26d3dccf4..36ec1b1918e 100644
>> --- a/gcc/config/xtensa/xtensa.md
>> +++ b/gcc/config/xtensa/xtensa.md
>> @@ -746,6 +746,53 @@
>>     (set_attr "mode"    "SI")
>>     (set_attr "length"  "3")])
>>
>> +(define_insn_and_split "*splice_bits"
>> +  [(set (match_operand:SI 0 "register_operand" "=a")
>> +       (ior:SI (and:SI (match_operand:SI 1 "register_operand" "r")
>> +                       (match_operand:SI 3 "const_int_operand" "i"))
>> +               (and:SI (match_operand:SI 2 "register_operand" "r")
>> +                       (match_operand:SI 4 "const_int_operand" "i"))))]
>> +
>> +  "!optimize_debug && optimize
>> +   && INTVAL (operands[3]) + INTVAL (operands[4]) == -1
>> +   && (exact_log2 (INTVAL (operands[3]) + 1) > 0
>> +       || exact_log2 (INTVAL (operands[4]) + 1) > 0)"
>> +  "#"
>> +  "&& can_create_pseudo_p ()"
>> +  [(set (match_dup 5)
>> +       (ashift:SI (match_dup 1)
>> +                  (match_dup 4)))
>> +   (set (match_dup 6)
>> +       (lshiftrt:SI (match_dup 2)
>> +                    (match_dup 3)))
>> +   (set (match_dup 0)
>> +       (ior:SI (lshiftrt:SI (match_dup 5)
>> +                            (match_dup 4))
>> +               (ashift:SI (match_dup 6)
>> +                          (match_dup 3))))]
>> +{
>> +  int shift;
>> +  if (INTVAL (operands[3]) < 0)
>> +    {
>> +      rtx x;
>> +      x = operands[1], operands[1] = operands[2], operands[2] = x;
>> +      x = operands[3], operands[3] = operands[4], operands[4] = x;
>> +    }
>> +  shift = floor_log2 (INTVAL (operands[3]) + 1);
>> +  operands[3] = GEN_INT (shift);
>> +  operands[4] = GEN_INT (32 - shift);
>> +  operands[5] = gen_reg_rtx (SImode);
>> +  operands[6] = gen_reg_rtx (SImode);
>> +}
>> +  [(set_attr "type"    "arith")
>> +   (set_attr "mode"    "SI")
>> +   (set (attr "length")
>> +       (if_then_else (match_test "TARGET_DENSITY
>> +                                  && (INTVAL (operands[3]) == 0x7FFFFFFF
>> +                                      || INTVAL (operands[4]) == 0x7FFFFFFF)")
>> +                     (const_int 11)
>> +                     (const_int 12)))])
> 
> I wonder how the length could be 11 here? I always see 4 3-byte
> instructions generated by this pattern.
> 

Sorry, I should have carried out a systematic test beforehand:

    #define TEST(c) \
      unsigned int test_ ## c (unsigned int a, unsigned int b) { \
        return (a & (-1U >> c)) | (b & ~(-1U >> c)); \
      }
    TEST(1)
    TEST(2)
      ...
    TEST(30)
    TEST(31)

Without this patch, the code generated for the above depends on c:

 a. 1 to 15: slli (or add.n) + extui + slli + srli + or
 b. 16: extui + slli + extui + or
 c. 17 to 20: srli + slli + extui + or
 d. 21 to 31: movi(.n) + and + extui + or

Clearly, the patch should be restricted to apply only to case a.
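
The same test can be sketched with c as a run-time parameter, together
with the baseline instruction counts from the list above. This is only
an illustration: test_expr, baseline_insns and split_saves_insn are
hypothetical names, and the count of 4 for the split sequence is taken
from the sequences Max quoted.

```c
#include <stdint.h>

/* Same expression as TEST(c), with c passed at run time (1 <= c <= 31). */
static uint32_t test_expr(uint32_t a, uint32_t b, unsigned c)
{
    uint32_t low = UINT32_MAX >> c;
    return (a & low) | (b & ~low);
}

/* Instructions emitted without the patch, per cases a to d above. */
static int baseline_insns(unsigned c)
{
    if (c <= 15)
        return 5;   /* a: slli (or add.n) + extui + slli + srli + or */
    return 4;       /* b, c, d: four instructions each */
}

/* The split sequence (two shifts + ssai + src) is four instructions. */
enum { SPLIT_INSNS = 4 };

/* Nonzero when the split is shorter than the baseline, which happens
   only in case a. */
static int split_saves_insn(unsigned c)
{
    return baseline_insns(c) > SPLIT_INSNS;
}
```

Under these counts the split saves an instruction only for c from 1 to
15; for the other values it merely matches the baseline count.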
