* [PATCH 4/4] xtensa: Optimize bitwise AND operation with some specific forms of constants
@ 2022-06-12 6:41 Takayuki 'January June' Suwa
From: Takayuki 'January June' Suwa @ 2022-06-12 6:41 UTC (permalink / raw)
To: GCC Patches
This patch offers several insn-and-split patterns for bitwise AND of a
register with a constant that cannot fit into a "MOVI Ax, simm12"
instruction, but can be represented as:
i. the N least significant bits set to 1 and the others 0 (17 <= N <= 31)
ii. the N least significant bits set to 0 and the others 1 (12 <= N <= 31)
iii. a run of M 1 bits followed by N trailing 0 bits
(1 <= M <= 16, 1 <= N <= 30)
It also offers shortcuts for conditional branches that test whether the
result of each of the above-mentioned operations is (not) equal to zero.
gcc/ChangeLog:
* config/xtensa/predicates.md (shifted_mask_operand):
New predicate.
* config/xtensa/xtensa.md (*andsi3_const_pow2_minus_one):
New insn-and-split pattern.
(*andsi3_const_negative_pow2, *andsi3_const_shifted_mask,
*masktrue_const_pow2_minus_one, *masktrue_const_negative_pow2,
*masktrue_const_shifted_mask): Ditto.
---
gcc/config/xtensa/predicates.md | 11 +++
gcc/config/xtensa/xtensa.md | 165 ++++++++++++++++++++++++++++++++
2 files changed, 176 insertions(+)
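The three constant shapes in the description above can be modeled in plain C to check the identities that the splitters rely on. This is a minimal sketch, not code from the patch. The helper names are hypothetical. Each helper computes the same result as the two-instruction sequence that the corresponding pattern splits into:

- form i: shift left, then logical shift right;
- form ii: logical shift right, then shift left;
- form iii: unsigned field extract, then shift left.

Form ii is modeled as the constant -(1 << N), which is what the condition `exact_log2 (-INTVAL (operands[2]))` in [12, 31] accepts.

```c
#include <assert.h>
#include <stdint.h>

/* Form i: N least significant bits set, 17 <= N <= 31.
   x & ((1 << N) - 1) == (x << (32 - N)) >> (32 - N) */
static uint32_t and_low_ones(uint32_t x, int n)
{
    assert(n >= 17 && n <= 31);
    int s = 32 - n;              /* 1..15 */
    return (x << s) >> s;        /* shift left, then logical shift right */
}

/* Form ii: N least significant bits clear and the rest set, 12 <= N <= 31,
   i.e. the constant -(1 << N).
   x & -(1 << N) == (x >> N) << N */
static uint32_t and_high_ones(uint32_t x, int n)
{
    assert(n >= 12 && n <= 31);
    return (x >> n) << n;        /* logical shift right, then shift left */
}

/* Form iii: M consecutive ones above N trailing zeros, 1 <= M <= 16.
   x & (((1 << M) - 1) << N) == ((x >> N) & ((1 << M) - 1)) << N */
static uint32_t and_shifted_run(uint32_t x, int m, int n)
{
    assert(m >= 1 && m <= 16 && n >= 1 && m + n <= 32);
    uint32_t field = (x >> n) & ((1u << m) - 1u);  /* unsigned M-bit field at bit N */
    return field << n;                              /* move it back into place */
}
```

Within the stated ranges, each helper is equivalent to the plain `x & constant`. Neither instruction in the split sequence needs the constant itself, so the constant does not have to be loaded from a literal with `l32r`.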
diff --git a/gcc/config/xtensa/predicates.md b/gcc/config/xtensa/predicates.md
index bcc83ada0ae..24c77f343a0 100644
--- a/gcc/config/xtensa/predicates.md
+++ b/gcc/config/xtensa/predicates.md
@@ -52,6 +52,17 @@
(match_test "xtensa_mask_immediate (INTVAL (op))"))
(match_operand 0 "register_operand")))
+(define_predicate "shifted_mask_operand"
+ (and (match_code "const_int")
+ (match_test "!xtensa_simm12b (INTVAL (op))"))
+{
+ HOST_WIDE_INT mask = INTVAL (op);
+ int shift = ctz_hwi (mask);
+
+ return IN_RANGE (shift, 1, 31)
+ && xtensa_mask_immediate ((uint32_t)mask >> shift);
+})
+
(define_predicate "extui_fldsz_operand"
(and (match_code "const_int")
(match_test "IN_RANGE (INTVAL (op), 1, 16)")))
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 090a2939684..286a1d8c38e 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -645,6 +645,78 @@
(set_attr "mode" "SI")
(set_attr "length" "6")])
+(define_insn_and_split "*andsi3_const_pow2_minus_one"
+ [(set (match_operand:SI 0 "register_operand" "=a")
+ (and:SI (match_operand:SI 1 "register_operand" "r")
+ (match_operand:SI 2 "const_int_operand" "i")))]
+ "IN_RANGE (exact_log2 (INTVAL (operands[2]) + 1), 17, 31)"
+ "#"
+ "&& 1"
+ [(set (match_dup 0)
+ (ashift:SI (match_dup 1)
+ (match_dup 2)))
+ (set (match_dup 0)
+ (lshiftrt:SI (match_dup 0)
+ (match_dup 2)))]
+{
+ operands[2] = GEN_INT (32 - floor_log2 (INTVAL (operands[2]) + 1));
+}
+ [(set_attr "type" "arith")
+ (set_attr "mode" "SI")
+ (set (attr "length")
+ (if_then_else (match_test "TARGET_DENSITY
+ && INTVAL (operands[2]) == 0x7FFFFFFF")
+ (const_int 5)
+ (const_int 6)))])
+
+(define_insn_and_split "*andsi3_const_negative_pow2"
+ [(set (match_operand:SI 0 "register_operand" "=a")
+ (and:SI (match_operand:SI 1 "register_operand" "r")
+ (match_operand:SI 2 "const_int_operand" "i")))]
+ "IN_RANGE (exact_log2 (-INTVAL (operands[2])), 12, 31)"
+ "#"
+ "&& 1"
+ [(set (match_dup 0)
+ (lshiftrt:SI (match_dup 1)
+ (match_dup 2)))
+ (set (match_dup 0)
+ (ashift:SI (match_dup 0)
+ (match_dup 2)))]
+{
+ operands[2] = GEN_INT (floor_log2 (-INTVAL (operands[2])));
+}
+ [(set_attr "type" "arith")
+ (set_attr "mode" "SI")
+ (set_attr "length" "6")])
+
+(define_insn_and_split "*andsi3_const_shifted_mask"
+ [(set (match_operand:SI 0 "register_operand" "=a")
+ (and:SI (match_operand:SI 1 "register_operand" "r")
+ (match_operand:SI 2 "shifted_mask_operand" "i")))]
+ ""
+ "#"
+ ""
+ [(set (match_dup 0)
+ (zero_extract:SI (match_dup 1)
+ (match_dup 3)
+ (match_dup 2)))
+ (set (match_dup 0)
+ (ashift:SI (match_dup 0)
+ (match_dup 2)))]
+{
+ HOST_WIDE_INT mask = INTVAL (operands[2]);
+ int shift = ctz_hwi (mask);
+ operands[2] = GEN_INT (shift);
+ operands[3] = GEN_INT (floor_log2 (((uint32_t)mask >> shift) + 1));
+}
+ [(set_attr "type" "arith")
+ (set_attr "mode" "SI")
+ (set (attr "length")
+ (if_then_else (match_test "TARGET_DENSITY
+ && ctz_hwi (INTVAL (operands[2])) == 1")
+ (const_int 5)
+ (const_int 6)))])
+
(define_insn "iorsi3"
[(set (match_operand:SI 0 "register_operand" "=a")
(ior:SI (match_operand:SI 1 "register_operand" "%r")
@@ -1648,6 +1720,99 @@
(set_attr "mode" "none")
(set_attr "length" "3")])
+(define_insn_and_split "*masktrue_const_pow2_minus_one"
+ [(set (pc)
+ (if_then_else (match_operator 4 "boolean_operator"
+ [(zero_extract:SI (match_operand:SI 1 "register_operand" "r")
+ (match_operand:SI 2 "const_int_operand" "i")
+ (const_int 0))
+ (const_int 0)])
+ (label_ref (match_operand 3 "" ""))
+ (pc)))
+ (clobber (match_scratch:SI 0 "=&a"))]
+ "IN_RANGE (INTVAL (operands[2]), 17, 31)"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0)
+ (ashift:SI (match_dup 1)
+ (match_dup 2)))
+ (set (pc)
+ (if_then_else (match_op_dup 4
+ [(match_dup 0)
+ (const_int 0)])
+ (label_ref (match_dup 3))
+ (pc)))]
+{
+ operands[2] = GEN_INT (32 - INTVAL (operands[2]));
+}
+ [(set_attr "type" "jump")
+ (set_attr "mode" "none")
+ (set (attr "length")
+ (if_then_else (match_test "TARGET_DENSITY
+ && INTVAL (operands[2]) == 31")
+ (const_int 5)
+ (const_int 6)))])
+
+(define_insn_and_split "*masktrue_const_negative_pow2"
+ [(set (pc)
+ (if_then_else (match_operator 4 "boolean_operator"
+ [(and:SI (match_operand:SI 1 "register_operand" "r")
+ (match_operand:SI 2 "const_int_operand" "i"))
+ (const_int 0)])
+ (label_ref (match_operand 3 "" ""))
+ (pc)))
+ (clobber (match_scratch:SI 0 "=&a"))]
+ "IN_RANGE (exact_log2 (-INTVAL (operands[2])), 12, 30)"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0)
+ (lshiftrt:SI (match_dup 1)
+ (match_dup 2)))
+ (set (pc)
+ (if_then_else (match_op_dup 4
+ [(match_dup 0)
+ (const_int 0)])
+ (label_ref (match_dup 3))
+ (pc)))]
+{
+ operands[2] = GEN_INT (floor_log2 (-INTVAL (operands[2])));
+}
+ [(set_attr "type" "jump")
+ (set_attr "mode" "none")
+ (set_attr "length" "6")])
+
+(define_insn_and_split "*masktrue_const_shifted_mask"
+ [(set (pc)
+ (if_then_else (match_operator 4 "boolean_operator"
+ [(and:SI (match_operand:SI 1 "register_operand" "r")
+ (match_operand:SI 2 "shifted_mask_operand" "i"))
+ (const_int 0)])
+ (label_ref (match_operand 3 "" ""))
+ (pc)))
+ (clobber (match_scratch:SI 0 "=&a"))]
+ ""
+ "#"
+ "reload_completed"
+ [(set (match_dup 0)
+ (zero_extract:SI (match_dup 1)
+ (match_dup 5)
+ (match_dup 2)))
+ (set (pc)
+ (if_then_else (match_op_dup 4
+ [(match_dup 0)
+ (const_int 0)])
+ (label_ref (match_dup 3))
+ (pc)))]
+{
+ HOST_WIDE_INT mask = INTVAL (operands[2]);
+ int shift = ctz_hwi (mask);
+ operands[2] = GEN_INT (shift);
+ operands[5] = GEN_INT (floor_log2 (((uint32_t)mask >> shift) + 1));
+}
+ [(set_attr "type" "jump")
+ (set_attr "mode" "none")
+ (set_attr "length" "6")])
+
;; Zero-overhead looping support.
--
2.20.1
* Re: [PATCH 4/4] xtensa: Optimize bitwise AND operation with some specific forms of constants
From: Max Filippov @ 2022-06-13 3:49 UTC (permalink / raw)
To: Takayuki 'January June' Suwa; +Cc: GCC Patches
Hi Suwa-san,
On Sat, Jun 11, 2022 at 11:43 PM Takayuki 'January June' Suwa
<jjsuwa_sys3175@yahoo.co.jp> wrote:
>
> This patch offers several insn-and-split patterns for bitwise AND with
> register and constant that cannot fit into a "MOVI Ax, simm12" instruction,
> but can be represented as:
>
> i. 1's least significant N bits and the others 0's (17 <= N <= 31)
> ii. 1's most significant N bits and the others 0's (12 <= N <= 31)
> iii. M 1's sequence of bits and trailing N 0's bits
> (1 <= M <= 16, 1 <= N <= 30)
>
> And also offers shortcuts for conditional branch if each of the
> abovementioned
> operations is (not) equal to zero.
>
> [...]
This change produces a bunch of regression test failures in big-endian
configuration:
FAIL: gcc.c-torture/execute/20020108-1.c -O1 execution test
FAIL: gcc.c-torture/execute/20020108-1.c -O2 execution test
FAIL: gcc.c-torture/execute/20020108-1.c -O3 -g execution test
FAIL: gcc.c-torture/execute/20020108-1.c -Os execution test
FAIL: gcc.c-torture/execute/20020108-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/20040629-1.c -O1 execution test
FAIL: gcc.c-torture/execute/20040629-1.c -O2 execution test
FAIL: gcc.c-torture/execute/20040629-1.c -O3 -g execution test
FAIL: gcc.c-torture/execute/20040629-1.c -Os execution test
FAIL: gcc.c-torture/execute/20040629-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/20040705-1.c -O1 execution test
FAIL: gcc.c-torture/execute/20040705-1.c -O2 execution test
FAIL: gcc.c-torture/execute/20040705-1.c -O3 -g execution test
FAIL: gcc.c-torture/execute/20040705-1.c -Os execution test
FAIL: gcc.c-torture/execute/20040705-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/20040705-2.c -O1 execution test
FAIL: gcc.c-torture/execute/20040705-2.c -O2 execution test
FAIL: gcc.c-torture/execute/20040705-2.c -O3 -g execution test
FAIL: gcc.c-torture/execute/20040705-2.c -Os execution test
FAIL: gcc.c-torture/execute/20040705-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/20040709-1.c -O1 execution test
FAIL: gcc.c-torture/execute/20040709-1.c -O2 execution test
FAIL: gcc.c-torture/execute/20040709-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: gcc.c-torture/execute/20040709-1.c -O3 -g execution test
FAIL: gcc.c-torture/execute/20040709-1.c -Os execution test
FAIL: gcc.c-torture/execute/20040709-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/20040709-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.c-torture/execute/20180921-1.c -O1 execution test
FAIL: gcc.c-torture/execute/20180921-1.c -O2 execution test
FAIL: gcc.c-torture/execute/20180921-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: gcc.c-torture/execute/20180921-1.c -O3 -g execution test
FAIL: gcc.c-torture/execute/20180921-1.c -Os execution test
FAIL: gcc.c-torture/execute/20180921-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/20180921-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.c-torture/execute/pr60454.c -O1 execution test
FAIL: gcc.c-torture/execute/pr60454.c -O2 execution test
FAIL: gcc.c-torture/execute/pr60454.c -O3 -g execution test
FAIL: gcc.c-torture/execute/pr60454.c -Os execution test
FAIL: gcc.c-torture/execute/pr60454.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/pr61306-2.c -O1 execution test
FAIL: gcc.c-torture/execute/pr64718.c -O1 execution test
FAIL: gcc.c-torture/execute/pr65215-1.c -O1 execution test
FAIL: gcc.c-torture/execute/pr65215-1.c -O2 execution test
FAIL: gcc.c-torture/execute/pr65215-1.c -O3 -g execution test
FAIL: gcc.c-torture/execute/pr65215-1.c -Os execution test
FAIL: gcc.c-torture/execute/pr65215-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/pr65215-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.c-torture/execute/pr65215-3.c -O1 execution test
FAIL: gcc.c-torture/execute/pr65215-4.c -O1 execution test
FAIL: gcc.c-torture/execute/pr65215-4.c -O2 execution test
FAIL: gcc.c-torture/execute/pr65215-4.c -O3 -g execution test
FAIL: gcc.c-torture/execute/pr65215-4.c -Os execution test
FAIL: gcc.c-torture/execute/pr65215-4.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/pr65215-4.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.c-torture/execute/pr79388.c -O1 execution test
FAIL: gcc.c-torture/execute/pr79388.c -O2 execution test
FAIL: gcc.c-torture/execute/pr79388.c -O3 -g execution test
FAIL: gcc.c-torture/execute/pr79388.c -Os execution test
FAIL: gcc.c-torture/execute/pr79388.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/pr93908.c -O1 execution test
FAIL: gcc.c-torture/execute/pr93908.c -O2 execution test
FAIL: gcc.c-torture/execute/pr93908.c -O3 -g execution test
FAIL: gcc.c-torture/execute/pr93908.c -Os execution test
FAIL: gcc.c-torture/execute/pr93908.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/pr93908.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.c-torture/execute/struct-ini-2.c -O1 execution test
FAIL: gcc.c-torture/execute/struct-ini-2.c -O2 execution test
FAIL: gcc.c-torture/execute/struct-ini-2.c -O3 -g execution test
FAIL: gcc.c-torture/execute/struct-ini-2.c -Os execution test
FAIL: gcc.c-torture/execute/struct-ini-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c -O1 execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c -O2 execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c -O3 -g execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c -Os execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.dg/20050826-1.c execution test
FAIL: gcc.dg/sso/s3.c -Wno-scalar-storage-order -O1 -fno-inline output pattern test
FAIL: gcc.dg/sso/t2.c -Wno-scalar-storage-order -O1 -fno-inline output pattern test
FAIL: gcc.dg/sso/t2.c -Wno-scalar-storage-order -O2 output pattern test
FAIL: gcc.dg/sso/t2.c -Wno-scalar-storage-order -O3 -finline-functions output pattern test
FAIL: gcc.dg/sso/t2.c -Wno-scalar-storage-order -Os output pattern test
FAIL: gcc.dg/sso/t2.c -Wno-scalar-storage-order -Og -g output pattern test
FAIL: gcc.dg/torture/pr30665-2.c -O2 execution test
FAIL: gcc.dg/torture/pr30665-2.c -O3 -g execution test
FAIL: gcc.dg/torture/pr30665-2.c -Os execution test
FAIL: gcc.dg/torture/pr30665-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/torture/pr30665-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.dg/torture/pr69714.c -O1 execution test
FAIL: gcc.dg/torture/vshuf-v8qi.c -O2 execution test
FAIL: gcc.dg/tree-ssa/pr80803.c execution test
FAIL: gcc.dg/tree-ssa/pr80898-2.c execution test
E.g. for the test gcc.c-torture/execute/struct-ini-2.c
the following assembly code is generated now:
.file "struct-ini-2.c"
.text
.literal_position
.literal .LC0, x
.literal .LC2, 8192
.literal .LC3, abort@PLT
.literal .LC4, exit@PLT
.align 4
.global main
.type main, @function
main:
entry sp, 32
l32r a8, .LC0
l16ui a8, a8, 0
l32r a9, .LC2
extui a10, a8, 16, 4
slli a10, a10, 12
extui a9, a9, 0, 16
beq a10, a9, .L2
l32r a8, .LC3
callx8 a8
.L2:
movi a9, 0xf0
and a9, a8, a9
movi.n a10, 0x30
beq a9, a10, .L3
l32r a8, .LC3
callx8 a8
.L3:
extui a8, a8, 0, 4
beqi a8, 4, .L4
l32r a8, .LC3
callx8 a8
.L4:
movi.n a10, 0
l32r a8, .LC4
callx8 a8
.size main, .-main
.global x
.data
.align 4
.type x, @object
.size x, 4
x:
.byte 32
.byte 52
.zero 2
.ident "GCC: (GNU) 13.0.0 20220612 (experimental)"
and the following code was generated before this change:
.file "struct-ini-2.c"
.text
.literal_position
.literal .LC0, x
.literal .LC1, -4096
.literal .LC2, 8192
.literal .LC3, abort@PLT
.literal .LC4, exit@PLT
.align 4
.global main
.type main, @function
main:
entry sp, 32
l32r a8, .LC0
l16ui a8, a8, 0
l32r a9, .LC2
l32r a10, .LC1
and a10, a8, a10
extui a9, a9, 0, 16
beq a10, a9, .L2
l32r a8, .LC3
callx8 a8
.L2:
movi a9, 0xf0
and a9, a8, a9
movi.n a10, 0x30
beq a9, a10, .L3
l32r a8, .LC3
callx8 a8
.L3:
extui a8, a8, 0, 4
beqi a8, 4, .L4
l32r a8, .LC3
callx8 a8
.L4:
movi.n a10, 0
l32r a8, .LC4
callx8 a8
.size main, .-main
.global x
.data
.align 4
.type x, @object
.size x, 4
x:
.byte 32
.byte 52
.zero 2
.ident "GCC: (GNU) 13.0.0 20220612 (experimental)"
--
Thanks.
-- Max
* [PATCH v2 4/4] xtensa: Optimize bitwise AND operation with some specific forms of constants
From: Takayuki 'January June' Suwa @ 2022-06-13 16:28 UTC (permalink / raw)
To: Max Filippov; +Cc: GCC Patches
On 2022/06/13 12:49, Max Filippov wrote:
> Hi Suwa-san,
Hi!
> This change produces a bunch of regression test failures in big-endian
> configuration:
Bad news. X(
That was exactly the point I was a little worried about...
> E.g. for the test gcc.c-torture/execute/struct-ini-2.c
> the following assembly code is generated now:
> and the following code was generated before this change:
- .literal .LC1, -4096
- l32r a10, .LC1
- and a10, a8, a10
+ extui a10, a8, 16, 4 // wrong! must be 12, 4
+ slli a10, a10, 12
And of course, '(zero_extract)' is endianness-sensitive: when BITS_BIG_ENDIAN
is set, its position operand counts from the most significant bit
(ref. 14.11 "Bit-Fields" in GCC Internals).
None of the patches I previously posted match or emit '(zero_extract)', except for this one.
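The position conversion can be illustrated with a small C model. It uses the 4-bit field at bit 12 (mask 0xF000) from the struct-ini-2.c example. This is a sketch, not the patch itself.

`extract_operands` mirrors the v2 split code, which sets `mask_pos = (32 - (mask_size + shift)) & 0x1f` when `BITS_BIG_ENDIAN` is set. `extui_shift` is a hypothetical model of how the `extui` output template maps the position back to the LSB-relative shift amount. That mapping is an assumption. It is, however, consistent with the wrong `extui a10, a8, 16, 4` above, which is what an unconverted position of 12 turns into.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Operands of (zero_extract x size pos). */
struct field {
    int size;
    int pos;
};

/* Mirror of the v2 split code: derive zero_extract operands from a shifted mask. */
static struct field extract_operands(uint32_t mask, bool bits_big_endian)
{
    assert(mask != 0);
    int shift = 0;
    while (((mask >> shift) & 1u) == 0)
        shift++;
    uint32_t run = mask >> shift;
    int size = 0;
    while (run & 1u) {
        run >>= 1;
        size++;
    }
    assert(run == 0);                          /* contiguous ones only */
    struct field f = { size, shift };          /* LSB-relative position */
    if (bits_big_endian)
        f.pos = (32 - (size + shift)) & 0x1f;  /* MSB-relative position */
    return f;
}

/* Hypothetical model of the extui output template: map the zero_extract
   position back to the LSB-relative shift amount printed in the insn. */
static int extui_shift(struct field f, bool bits_big_endian)
{
    return bits_big_endian ? (32 - (f.size + f.pos)) & 0x1f : f.pos & 0x1f;
}
```

With `BITS_BIG_ENDIAN` set, the converted position 16 is printed as the expected shift of 12. Feeding the unconverted position 12 (as v1 did) yields 16, which matches the failing listing.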
===
This patch offers several insn-and-split patterns for bitwise AND of a
register with a constant that can be represented as:
i. the N least significant bits set to 1 and the others 0 (17 <= N <= 31)
ii. the N least significant bits set to 0 and the others 1 (12 <= N <= 31)
iii. a run of M 1 bits followed by N trailing 0 bits, that cannot fit into a
"MOVI Ax, simm12" instruction (1 <= M <= 16, 1 <= N <= 30)
It also offers shortcuts for conditional branches that test whether the
result of each of the above-mentioned operations is (not) equal to zero
(for iii, also to a suitable constant).
gcc/ChangeLog:
* config/xtensa/predicates.md (shifted_mask_operand):
New predicate.
* config/xtensa/xtensa.md (*andsi3_const_pow2_minus_one):
New insn-and-split pattern.
(*andsi3_const_negative_pow2, *andsi3_const_shifted_mask,
*masktrue_const_pow2_minus_one, *masktrue_const_negative_pow2,
*masktrue_const_shifted_mask): Ditto.
---
gcc/config/xtensa/predicates.md | 10 ++
gcc/config/xtensa/xtensa.md | 179 ++++++++++++++++++++++++++++++++
2 files changed, 189 insertions(+)
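The v2 `shifted_mask_operand` predicate and the extra `! xtensa_simm12b` condition on `*andsi3_const_shifted_mask` in the diff below can be approximated in plain C. This is a sketch under stated assumptions:

- `mask_immediate` is a hypothetical stand-in for `xtensa_mask_immediate`, assumed to accept exactly 2^k - 1 for 1 <= k <= 16.
- `fits_simm12` is a hypothetical stand-in for `xtensa_simm12b`, assumed to test the signed range [-2048, 2047].
- The constant is passed as a 32-bit `uint32_t` rather than GCC's `HOST_WIDE_INT`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed model of xtensa_mask_immediate(): v == (1 << k) - 1, 1 <= k <= 16. */
static bool mask_immediate(uint32_t v)
{
    return v != 0 && v <= 0xFFFFu && (v & (v + 1u)) == 0;
}

/* Assumed model of xtensa_simm12b(): the value, read as signed 32-bit,
   lies in [-2048, 2047]. Written with unsigned arithmetic only. */
static bool fits_simm12(uint32_t v)
{
    return v <= 0x7FFu || v >= 0xFFFFF800u;
}

/* Model of the v2 shifted_mask_operand predicate: 1..31 trailing zeros,
   and the bits above them form a run of 1..16 ones. */
static bool shifted_mask_operand(uint32_t c)
{
    if (c == 0)
        return false;          /* a zero constant is never a shifted mask */
    int shift = 0;
    while (((c >> shift) & 1u) == 0)
        shift++;
    return shift >= 1 && shift <= 31 && mask_immediate(c >> shift);
}

/* Condition of *andsi3_const_shifted_mask in v2: the predicate plus
   "! xtensa_simm12b", so constants that MOVI can load are not split. */
static bool andsi3_shifted_mask_matches(uint32_t c)
{
    return shifted_mask_operand(c) && !fits_simm12(c);
}
```

In v1, the simm12 test lived inside the predicate. In v2 it moved to the `*andsi3_const_shifted_mask` condition, so a constant such as `0x7F0` still satisfies `shifted_mask_operand`. `*andsi3_const_shifted_mask` leaves that constant to the existing `andsi3` handling. `*masktrue_const_shifted_mask` has no such condition.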
diff --git a/gcc/config/xtensa/predicates.md b/gcc/config/xtensa/predicates.md
index bcc83ada0ae..d63a6cf034c 100644
--- a/gcc/config/xtensa/predicates.md
+++ b/gcc/config/xtensa/predicates.md
@@ -52,6 +52,16 @@
(match_test "xtensa_mask_immediate (INTVAL (op))"))
(match_operand 0 "register_operand")))
+(define_predicate "shifted_mask_operand"
+ (match_code "const_int")
+{
+ HOST_WIDE_INT mask = INTVAL (op);
+ int shift = ctz_hwi (mask);
+
+ return IN_RANGE (shift, 1, 31)
+ && xtensa_mask_immediate ((uint32_t)mask >> shift);
+})
+
(define_predicate "extui_fldsz_operand"
(and (match_code "const_int")
(match_test "IN_RANGE (INTVAL (op), 1, 16)")))
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index a4477e2207e..5d0f346b01a 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -645,6 +645,83 @@
(set_attr "mode" "SI")
(set_attr "length" "6")])
+(define_insn_and_split "*andsi3_const_pow2_minus_one"
+ [(set (match_operand:SI 0 "register_operand" "=a")
+ (and:SI (match_operand:SI 1 "register_operand" "r")
+ (match_operand:SI 2 "const_int_operand" "i")))]
+ "IN_RANGE (exact_log2 (INTVAL (operands[2]) + 1), 17, 31)"
+ "#"
+ "&& 1"
+ [(set (match_dup 0)
+ (ashift:SI (match_dup 1)
+ (match_dup 2)))
+ (set (match_dup 0)
+ (lshiftrt:SI (match_dup 0)
+ (match_dup 2)))]
+{
+ operands[2] = GEN_INT (32 - floor_log2 (INTVAL (operands[2]) + 1));
+}
+ [(set_attr "type" "arith")
+ (set_attr "mode" "SI")
+ (set (attr "length")
+ (if_then_else (match_test "TARGET_DENSITY
+ && INTVAL (operands[2]) == 0x7FFFFFFF")
+ (const_int 5)
+ (const_int 6)))])
+
+(define_insn_and_split "*andsi3_const_negative_pow2"
+ [(set (match_operand:SI 0 "register_operand" "=a")
+ (and:SI (match_operand:SI 1 "register_operand" "r")
+ (match_operand:SI 2 "const_int_operand" "i")))]
+ "IN_RANGE (exact_log2 (-INTVAL (operands[2])), 12, 31)"
+ "#"
+ "&& 1"
+ [(set (match_dup 0)
+ (lshiftrt:SI (match_dup 1)
+ (match_dup 2)))
+ (set (match_dup 0)
+ (ashift:SI (match_dup 0)
+ (match_dup 2)))]
+{
+ operands[2] = GEN_INT (floor_log2 (-INTVAL (operands[2])));
+}
+ [(set_attr "type" "arith")
+ (set_attr "mode" "SI")
+ (set_attr "length" "6")])
+
+(define_insn_and_split "*andsi3_const_shifted_mask"
+ [(set (match_operand:SI 0 "register_operand" "=a")
+ (and:SI (match_operand:SI 1 "register_operand" "r")
+ (match_operand:SI 2 "shifted_mask_operand" "i")))]
+ "! xtensa_simm12b (INTVAL (operands[2]))"
+ "#"
+ "&& 1"
+ [(set (match_dup 0)
+ (zero_extract:SI (match_dup 1)
+ (match_dup 3)
+ (match_dup 4)))
+ (set (match_dup 0)
+ (ashift:SI (match_dup 0)
+ (match_dup 2)))]
+{
+ HOST_WIDE_INT mask = INTVAL (operands[2]);
+ int shift = ctz_hwi (mask);
+ int mask_size = floor_log2 (((uint32_t)mask >> shift) + 1);
+ int mask_pos = shift;
+ if (BITS_BIG_ENDIAN)
+ mask_pos = (32 - (mask_size + shift)) & 0x1f;
+ operands[2] = GEN_INT (shift);
+ operands[3] = GEN_INT (mask_size);
+ operands[4] = GEN_INT (mask_pos);
+}
+ [(set_attr "type" "arith")
+ (set_attr "mode" "SI")
+ (set (attr "length")
+ (if_then_else (match_test "TARGET_DENSITY
+ && ctz_hwi (INTVAL (operands[2])) == 1")
+ (const_int 5)
+ (const_int 6)))])
+
(define_insn "iorsi3"
[(set (match_operand:SI 0 "register_operand" "=a")
(ior:SI (match_operand:SI 1 "register_operand" "%r")
@@ -1649,6 +1726,108 @@
(set_attr "mode" "none")
(set_attr "length" "3")])
+(define_insn_and_split "*masktrue_const_pow2_minus_one"
+ [(set (pc)
+ (if_then_else (match_operator 3 "boolean_operator"
+ [(and:SI (match_operand:SI 0 "register_operand" "r")
+ (match_operand:SI 1 "const_int_operand" "i"))
+ (const_int 0)])
+ (label_ref (match_operand 2 "" ""))
+ (pc)))]
+ "IN_RANGE (exact_log2 (INTVAL (operands[1]) + 1), 17, 31)"
+ "#"
+ "&& can_create_pseudo_p ()"
+ [(set (match_dup 4)
+ (ashift:SI (match_dup 0)
+ (match_dup 1)))
+ (set (pc)
+ (if_then_else (match_op_dup 3
+ [(match_dup 4)
+ (const_int 0)])
+ (label_ref (match_dup 2))
+ (pc)))]
+{
+ operands[1] = GEN_INT (32 - floor_log2 (INTVAL (operands[1]) + 1));
+ operands[4] = gen_reg_rtx (SImode);
+}
+ [(set_attr "type" "jump")
+ (set_attr "mode" "none")
+ (set (attr "length")
+ (if_then_else (match_test "TARGET_DENSITY
+ && INTVAL (operands[1]) == 0x7FFFFFFF")
+ (const_int 5)
+ (const_int 6)))])
+
+(define_insn_and_split "*masktrue_const_negative_pow2"
+ [(set (pc)
+ (if_then_else (match_operator 3 "boolean_operator"
+ [(and:SI (match_operand:SI 0 "register_operand" "r")
+ (match_operand:SI 1 "const_int_operand" "i"))
+ (const_int 0)])
+ (label_ref (match_operand 2 "" ""))
+ (pc)))]
+ "IN_RANGE (exact_log2 (-INTVAL (operands[1])), 12, 30)"
+ "#"
+ "&& can_create_pseudo_p ()"
+ [(set (match_dup 4)
+ (lshiftrt:SI (match_dup 0)
+ (match_dup 1)))
+ (set (pc)
+ (if_then_else (match_op_dup 3
+ [(match_dup 4)
+ (const_int 0)])
+ (label_ref (match_dup 2))
+ (pc)))]
+{
+ operands[1] = GEN_INT (floor_log2 (-INTVAL (operands[1])));
+ operands[4] = gen_reg_rtx (SImode);
+}
+ [(set_attr "type" "jump")
+ (set_attr "mode" "none")
+ (set_attr "length" "6")])
+
+(define_insn_and_split "*masktrue_const_shifted_mask"
+ [(set (pc)
+ (if_then_else (match_operator 4 "boolean_operator"
+ [(and:SI (match_operand:SI 0 "register_operand" "r")
+ (match_operand:SI 1 "shifted_mask_operand" "i"))
+ (match_operand:SI 2 "const_int_operand" "i")])
+ (label_ref (match_operand 3 "" ""))
+ (pc)))]
+ "(INTVAL (operands[2]) & ((1 << ctz_hwi (INTVAL (operands[1]))) - 1)) == 0
+ && xtensa_b4const_or_zero ((uint32_t)INTVAL (operands[2]) >> ctz_hwi (INTVAL (operands[1])))"
+ "#"
+ "&& can_create_pseudo_p ()"
+ [(set (match_dup 6)
+ (zero_extract:SI (match_dup 0)
+ (match_dup 5)
+ (match_dup 1)))
+ (set (pc)
+ (if_then_else (match_op_dup 4
+ [(match_dup 6)
+ (match_dup 2)])
+ (label_ref (match_dup 3))
+ (pc)))]
+{
+ HOST_WIDE_INT mask = INTVAL (operands[1]);
+ int shift = ctz_hwi (mask);
+ int mask_size = floor_log2 (((uint32_t)mask >> shift) + 1);
+ int mask_pos = shift;
+ if (BITS_BIG_ENDIAN)
+ mask_pos = (32 - (mask_size + shift)) & 0x1f;
+ operands[1] = GEN_INT (mask_pos);
+ operands[2] = GEN_INT ((uint32_t)INTVAL (operands[2]) >> shift);
+ operands[5] = GEN_INT (mask_size);
+ operands[6] = gen_reg_rtx (SImode);
+}
+ [(set_attr "type" "jump")
+ (set_attr "mode" "none")
+ (set (attr "length")
+ (if_then_else (match_test "TARGET_DENSITY
+ && (uint32_t)INTVAL (operands[2]) >> ctz_hwi (INTVAL (operands[1])) == 0")
+ (const_int 5)
+ (const_int 6)))])
+
;; Zero-overhead looping support.
--
2.20.1
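The three branch shortcuts in v2 rely on the equivalences below, sketched in plain C with hypothetical helper names. For the ranges in the pattern conditions, each helper returns the same truth value as the original test on `x & constant`:

- `*masktrue_const_pow2_minus_one` needs only a single left shift.
- `*masktrue_const_negative_pow2` needs only a single logical right shift.
- `*masktrue_const_shifted_mask` compares an extracted field against `c >> shift`. This requires the low `shift` bits of `c` to be zero, which the pattern condition checks.

The additional `xtensa_b4const_or_zero` check is not modeled here. Presumably it keeps the shifted constant encodable as a branch immediate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count trailing zero bits; mask must be nonzero. */
static int trailing_zeros(uint32_t mask)
{
    assert(mask != 0);
    int n = 0;
    while (((mask >> n) & 1u) == 0)
        n++;
    return n;
}

/* *masktrue_const_pow2_minus_one: (x & ((1 << N) - 1)) != 0, 17 <= N <= 31.
   Shifting left by 32 - N discards exactly the bits outside the mask. */
static bool low_bits_nonzero(uint32_t x, int n)
{
    assert(n >= 17 && n <= 31);
    return (x << (32 - n)) != 0;
}

/* *masktrue_const_negative_pow2: (x & -(1 << N)) != 0, 12 <= N <= 30.
   A logical right shift by N discards exactly the bits outside the mask. */
static bool high_bits_nonzero(uint32_t x, int n)
{
    assert(n >= 12 && n <= 30);
    return (x >> n) != 0;
}

/* *masktrue_const_shifted_mask: (x & mask) == c, evaluated as
   extracted_field == (c >> shift). */
static bool shifted_mask_equals(uint32_t x, uint32_t mask, uint32_t c)
{
    int shift = trailing_zeros(mask);
    /* Pattern condition: the low `shift` bits of c must be zero. */
    assert((c & ((1u << shift) - 1u)) == 0);
    uint32_t field = (x & mask) >> shift;   /* the value EXTUI would extract */
    return field == (c >> shift);
}
```

Both `x & mask` and `c` have their low `shift` bits clear. Shifting both right by `shift` therefore loses no information, and the comparison result is unchanged.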
* Re: [PATCH v2 4/4] xtensa: Optimize bitwise AND operation with some specific forms of constants
From: Max Filippov @ 2022-06-14 0:33 UTC (permalink / raw)
To: Takayuki 'January June' Suwa; +Cc: GCC Patches
On Mon, Jun 13, 2022 at 9:39 AM Takayuki 'January June' Suwa
<jjsuwa_sys3175@yahoo.co.jp> wrote:
>
> On 2022/06/13 12:49, Max Filippov wrote:
> > Hi Suwa-san,
> hi!
> [...]
Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.
--
Thanks.
-- Max