* [PATCH 4/4] xtensa: Optimize bitwise AND operation with some specific forms of constants
@ 2022-06-12  6:41 Takayuki 'January June' Suwa
  2022-06-13  3:49 ` Max Filippov
From: Takayuki 'January June' Suwa @ 2022-06-12  6:41 UTC
To: GCC Patches

This patch offers several insn-and-split patterns for bitwise AND with
register and constant that cannot fit into a "MOVI Ax, simm12" instruction,
but can be represented as:

i.   1's least significant N bits and the others 0's (17 <= N <= 31)
ii.  1's most significant N bits and the others 0's (12 <= N <= 31)
iii. M 1's sequence of bits and trailing N 0's bits
     (1 <= M <= 16, 1 <= N <= 30)

And also offers shortcuts for conditional branch if each of the
abovementioned operations is (not) equal to zero.

gcc/ChangeLog:

	* config/xtensa/predicates.md (shifted_mask_operand):
	New predicate.
	* config/xtensa/xtensa.md (*andsi3_const_pow2_minus_one):
	New insn-and-split pattern.
	(*andsi3_const_negative_pow2, *andsi3_const_shifted_mask,
	*masktrue_const_pow2_minus_one, *masktrue_const_negative_pow2,
	*masktrue_const_shifted_mask): Ditto.
---
 gcc/config/xtensa/predicates.md |  11 +++
 gcc/config/xtensa/xtensa.md     | 165 ++++++++++++++++++++++++++++++++
 2 files changed, 176 insertions(+)

diff --git a/gcc/config/xtensa/predicates.md b/gcc/config/xtensa/predicates.md
index bcc83ada0ae..24c77f343a0 100644
--- a/gcc/config/xtensa/predicates.md
+++ b/gcc/config/xtensa/predicates.md
@@ -52,6 +52,17 @@
             (match_test "xtensa_mask_immediate (INTVAL (op))"))
        (match_operand 0 "register_operand")))
 
+(define_predicate "shifted_mask_operand"
+  (and (match_code "const_int")
+       (match_test "!xtensa_simm12b (INTVAL (op))"))
+{
+  HOST_WIDE_INT mask = INTVAL (op);
+  int shift = ctz_hwi (mask);
+
+  return IN_RANGE (shift, 1, 31)
+         && xtensa_mask_immediate ((uint32_t)mask >> shift);
+})
+
 (define_predicate "extui_fldsz_operand"
   (and (match_code "const_int")
        (match_test "IN_RANGE (INTVAL (op), 1, 16)")))
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 090a2939684..286a1d8c38e 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -645,6 +645,78 @@
    (set_attr "mode" "SI")
    (set_attr "length" "6")])
 
+(define_insn_and_split "*andsi3_const_pow2_minus_one"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+        (and:SI (match_operand:SI 1 "register_operand" "r")
+                (match_operand:SI 2 "const_int_operand" "i")))]
+  "IN_RANGE (exact_log2 (INTVAL (operands[2]) + 1), 17, 31)"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+        (ashift:SI (match_dup 1)
+                   (match_dup 2)))
+   (set (match_dup 0)
+        (lshiftrt:SI (match_dup 0)
+                     (match_dup 2)))]
+{
+  operands[2] = GEN_INT (32 - floor_log2 (INTVAL (operands[2]) + 1));
+}
+  [(set_attr "type" "arith")
+   (set_attr "mode" "SI")
+   (set (attr "length")
+        (if_then_else (match_test "TARGET_DENSITY
+                                   && INTVAL (operands[2]) == 0x7FFFFFFF")
+                      (const_int 5)
+                      (const_int 6)))])
+
+(define_insn_and_split "*andsi3_const_negative_pow2"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+        (and:SI (match_operand:SI 1 "register_operand" "r")
+                (match_operand:SI 2 "const_int_operand" "i")))]
+  "IN_RANGE (exact_log2 (-INTVAL (operands[2])), 12, 31)"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+        (lshiftrt:SI (match_dup 1)
+                     (match_dup 2)))
+   (set (match_dup 0)
+        (ashift:SI (match_dup 0)
+                   (match_dup 2)))]
+{
+  operands[2] = GEN_INT (floor_log2 (-INTVAL (operands[2])));
+}
+  [(set_attr "type" "arith")
+   (set_attr "mode" "SI")
+   (set_attr "length" "6")])
+
+(define_insn_and_split "*andsi3_const_shifted_mask"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+        (and:SI (match_operand:SI 1 "register_operand" "r")
+                (match_operand:SI 2 "shifted_mask_operand" "i")))]
+  ""
+  "#"
+  ""
+  [(set (match_dup 0)
+        (zero_extract:SI (match_dup 1)
+                         (match_dup 3)
+                         (match_dup 2)))
+   (set (match_dup 0)
+        (ashift:SI (match_dup 0)
+                   (match_dup 2)))]
+{
+  HOST_WIDE_INT mask = INTVAL (operands[2]);
+  int shift = ctz_hwi (mask);
+  operands[2] = GEN_INT (shift);
+  operands[3] = GEN_INT (floor_log2 (((uint32_t)mask >> shift) + 1));
+}
+  [(set_attr "type" "arith")
+   (set_attr "mode" "SI")
+   (set (attr "length")
+        (if_then_else (match_test "TARGET_DENSITY
+                                   && ctz_hwi (INTVAL (operands[2])) == 1")
+                      (const_int 5)
+                      (const_int 6)))])
+
 (define_insn "iorsi3"
   [(set (match_operand:SI 0 "register_operand" "=a")
         (ior:SI (match_operand:SI 1 "register_operand" "%r")
@@ -1648,6 +1720,99 @@
    (set_attr "mode" "none")
    (set_attr "length" "3")])
 
+(define_insn_and_split "*masktrue_const_pow2_minus_one"
+  [(set (pc)
+        (if_then_else (match_operator 4 "boolean_operator"
+                        [(zero_extract:SI (match_operand:SI 1 "register_operand" "r")
+                                          (match_operand:SI 2 "const_int_operand" "i")
+                                          (const_int 0))
+                         (const_int 0)])
+                      (label_ref (match_operand 3 "" ""))
+                      (pc)))
+   (clobber (match_scratch:SI 0 "=&a"))]
+  "IN_RANGE (INTVAL (operands[2]), 17, 31)"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+        (ashift:SI (match_dup 1)
+                   (match_dup 2)))
+   (set (pc)
+        (if_then_else (match_op_dup 4
+                        [(match_dup 0)
+                         (const_int 0)])
+                      (label_ref (match_dup 3))
+                      (pc)))]
+{
+  operands[2] = GEN_INT (32 - INTVAL (operands[2]));
+}
+  [(set_attr "type" "jump")
+   (set_attr "mode" "none")
+   (set (attr "length")
+        (if_then_else (match_test "TARGET_DENSITY
+                                   && INTVAL (operands[2]) == 31")
+                      (const_int 5)
+                      (const_int 6)))])
+
+(define_insn_and_split "*masktrue_const_negative_pow2"
+  [(set (pc)
+        (if_then_else (match_operator 4 "boolean_operator"
+                        [(and:SI (match_operand:SI 1 "register_operand" "r")
+                                 (match_operand:SI 2 "const_int_operand" "i"))
+                         (const_int 0)])
+                      (label_ref (match_operand 3 "" ""))
+                      (pc)))
+   (clobber (match_scratch:SI 0 "=&a"))]
+  "IN_RANGE (exact_log2 (-INTVAL (operands[2])), 12, 30)"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+        (lshiftrt:SI (match_dup 1)
+                     (match_dup 2)))
+   (set (pc)
+        (if_then_else (match_op_dup 4
+                        [(match_dup 0)
+                         (const_int 0)])
+                      (label_ref (match_dup 3))
+                      (pc)))]
+{
+  operands[2] = GEN_INT (floor_log2 (-INTVAL (operands[2])));
+}
+  [(set_attr "type" "jump")
+   (set_attr "mode" "none")
+   (set_attr "length" "6")])
+
+(define_insn_and_split "*masktrue_const_shifted_mask"
+  [(set (pc)
+        (if_then_else (match_operator 4 "boolean_operator"
+                        [(and:SI (match_operand:SI 1 "register_operand" "r")
+                                 (match_operand:SI 2 "shifted_mask_operand" "i"))
+                         (const_int 0)])
+                      (label_ref (match_operand 3 "" ""))
+                      (pc)))
+   (clobber (match_scratch:SI 0 "=&a"))]
+  ""
+  "#"
+  "reload_completed"
+  [(set (match_dup 0)
+        (zero_extract:SI (match_dup 1)
+                         (match_dup 5)
+                         (match_dup 2)))
+   (set (pc)
+        (if_then_else (match_op_dup 4
+                        [(match_dup 0)
+                         (const_int 0)])
+                      (label_ref (match_dup 3))
+                      (pc)))]
+{
+  HOST_WIDE_INT mask = INTVAL (operands[2]);
+  int shift = ctz_hwi (mask);
+  operands[2] = GEN_INT (shift);
+  operands[5] = GEN_INT (floor_log2 (((uint32_t)mask >> shift) + 1));
+}
+  [(set_attr "type" "jump")
+   (set_attr "mode" "none")
+   (set_attr "length" "6")])
+
 ;; Zero-overhead looping support.
 
-- 
2.20.1
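
Editorial sketch (not part of the posted patch): the three constant forms in the message above can each be replaced by a pair of shift or extract operations, and the zero tests by a single shift. The C below is only a model of that arithmetic under stated assumptions; all function names are hypothetical, and it says nothing about which Xtensa instructions the splitters finally select. Note that the second pattern's condition, as written in the patch, bounds exact_log2 (-C), which is the number of trailing zero bits K of the constant.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names only; nothing below is taken from the GCC sources. */

/* Count trailing zero bits of a nonzero 32-bit value. */
static int ctz32(uint32_t v)
{
    int n = 0;
    assert(v != 0);
    while ((v & 1u) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Form i: C = 2^N - 1 (17 <= N <= 31).
   x & C equals a left shift by 32-N followed by a logical right shift. */
static uint32_t and_low_mask(uint32_t x, int n)
{
    int sh = 32 - n;
    return (x << sh) >> sh;
}

/* Form ii: C = -(2^K) (12 <= K <= 31).
   x & C equals a logical right shift by K followed by a left shift. */
static uint32_t and_high_mask(uint32_t x, int k)
{
    return (x >> k) << k;
}

/* Form iii: C = (2^M - 1) << S.
   x & C equals extracting M bits at bit S, then shifting left by S. */
static uint32_t and_shifted_mask(uint32_t x, uint32_t c)
{
    int s = ctz32(c);
    uint32_t field = c >> s; /* 2^M - 1 */
    return ((x >> s) & field) << s;
}

/* Branch shortcuts: testing (x & C) == 0 needs only the first shift. */
static int low_mask_is_zero(uint32_t x, int n)
{
    return (x << (32 - n)) == 0;
}

static int high_mask_is_zero(uint32_t x, int k)
{
    return (x >> k) == 0;
}
```

For example, with C = 0x0001FFFF (N = 17) the pair of shifts by 15 keeps exactly the low 17 bits, so no literal-pool load of the constant is needed.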
* Re: [PATCH 4/4] xtensa: Optimize bitwise AND operation with some specific forms of constants
  2022-06-12  6:41 [PATCH 4/4] xtensa: Optimize bitwise AND operation with some specific forms of constants Takayuki 'January June' Suwa
@ 2022-06-13  3:49 ` Max Filippov
  2022-06-13 16:28   ` [PATCH v2 " Takayuki 'January June' Suwa
From: Max Filippov @ 2022-06-13  3:49 UTC
To: Takayuki 'January June' Suwa; +Cc: GCC Patches

Hi Suwa-san,

On Sat, Jun 11, 2022 at 11:43 PM Takayuki 'January June' Suwa
<jjsuwa_sys3175@yahoo.co.jp> wrote:
>
> This patch offers several insn-and-split patterns for bitwise AND with
> register and constant that cannot fit into a "MOVI Ax, simm12" instruction,
> but can be represented as:
>
> i.   1's least significant N bits and the others 0's (17 <= N <= 31)
> ii.  1's most significant N bits and the others 0's (12 <= N <= 31)
> iii. M 1's sequence of bits and trailing N 0's bits
>      (1 <= M <= 16, 1 <= N <= 30)
>
> And also offers shortcuts for conditional branch if each of the
> abovementioned operations is (not) equal to zero.
>
> gcc/ChangeLog:
>
>         * config/xtensa/predicates.md (shifted_mask_operand):
>         New predicate.
>         * config/xtensa/xtensa.md (*andsi3_const_pow2_minus_one):
>         New insn-and-split pattern.
>         (*andsi3_const_negative_pow2, *andsi3_const_shifted_mask,
>         *masktrue_const_pow2_minus_one, *masktrue_const_negative_pow2,
>         *masktrue_const_shifted_mask): Ditto.
> ---
>  gcc/config/xtensa/predicates.md |  11 +++
>  gcc/config/xtensa/xtensa.md     | 165 ++++++++++++++++++++++++++++++++
>  2 files changed, 176 insertions(+)

This change produces a bunch of regression test failures in big-endian
configuration:

FAIL: gcc.c-torture/execute/20020108-1.c -O1 execution test
FAIL: gcc.c-torture/execute/20020108-1.c -O2 execution test
FAIL: gcc.c-torture/execute/20020108-1.c -O3 -g execution test
FAIL: gcc.c-torture/execute/20020108-1.c -Os execution test
FAIL: gcc.c-torture/execute/20020108-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/20040629-1.c -O1 execution test
FAIL: gcc.c-torture/execute/20040629-1.c -O2 execution test
FAIL: gcc.c-torture/execute/20040629-1.c -O3 -g execution test
FAIL: gcc.c-torture/execute/20040629-1.c -Os execution test
FAIL: gcc.c-torture/execute/20040629-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/20040705-1.c -O1 execution test
FAIL: gcc.c-torture/execute/20040705-1.c -O2 execution test
FAIL: gcc.c-torture/execute/20040705-1.c -O3 -g execution test
FAIL: gcc.c-torture/execute/20040705-1.c -Os execution test
FAIL: gcc.c-torture/execute/20040705-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/20040705-2.c -O1 execution test
FAIL: gcc.c-torture/execute/20040705-2.c -O2 execution test
FAIL: gcc.c-torture/execute/20040705-2.c -O3 -g execution test
FAIL: gcc.c-torture/execute/20040705-2.c -Os execution test
FAIL: gcc.c-torture/execute/20040705-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/20040709-1.c -O1 execution test
FAIL: gcc.c-torture/execute/20040709-1.c -O2 execution test
FAIL: gcc.c-torture/execute/20040709-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: gcc.c-torture/execute/20040709-1.c -O3 -g execution test
FAIL: gcc.c-torture/execute/20040709-1.c -Os execution test
FAIL: gcc.c-torture/execute/20040709-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/20040709-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.c-torture/execute/20180921-1.c -O1 execution test
FAIL: gcc.c-torture/execute/20180921-1.c -O2 execution test
FAIL: gcc.c-torture/execute/20180921-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: gcc.c-torture/execute/20180921-1.c -O3 -g execution test
FAIL: gcc.c-torture/execute/20180921-1.c -Os execution test
FAIL: gcc.c-torture/execute/20180921-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/20180921-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.c-torture/execute/pr60454.c -O1 execution test
FAIL: gcc.c-torture/execute/pr60454.c -O2 execution test
FAIL: gcc.c-torture/execute/pr60454.c -O3 -g execution test
FAIL: gcc.c-torture/execute/pr60454.c -Os execution test
FAIL: gcc.c-torture/execute/pr60454.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/pr61306-2.c -O1 execution test
FAIL: gcc.c-torture/execute/pr64718.c -O1 execution test
FAIL: gcc.c-torture/execute/pr65215-1.c -O1 execution test
FAIL: gcc.c-torture/execute/pr65215-1.c -O2 execution test
FAIL: gcc.c-torture/execute/pr65215-1.c -O3 -g execution test
FAIL: gcc.c-torture/execute/pr65215-1.c -Os execution test
FAIL: gcc.c-torture/execute/pr65215-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/pr65215-1.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.c-torture/execute/pr65215-3.c -O1 execution test
FAIL: gcc.c-torture/execute/pr65215-4.c -O1 execution test
FAIL: gcc.c-torture/execute/pr65215-4.c -O2 execution test
FAIL: gcc.c-torture/execute/pr65215-4.c -O3 -g execution test
FAIL: gcc.c-torture/execute/pr65215-4.c -Os execution test
FAIL: gcc.c-torture/execute/pr65215-4.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/pr65215-4.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.c-torture/execute/pr79388.c -O1 execution test
FAIL: gcc.c-torture/execute/pr79388.c -O2 execution test
FAIL: gcc.c-torture/execute/pr79388.c -O3 -g execution test
FAIL: gcc.c-torture/execute/pr79388.c -Os execution test
FAIL: gcc.c-torture/execute/pr79388.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/pr93908.c -O1 execution test
FAIL: gcc.c-torture/execute/pr93908.c -O2 execution test
FAIL: gcc.c-torture/execute/pr93908.c -O3 -g execution test
FAIL: gcc.c-torture/execute/pr93908.c -Os execution test
FAIL: gcc.c-torture/execute/pr93908.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.c-torture/execute/pr93908.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.c-torture/execute/struct-ini-2.c -O1 execution test
FAIL: gcc.c-torture/execute/struct-ini-2.c -O2 execution test
FAIL: gcc.c-torture/execute/struct-ini-2.c -O3 -g execution test
FAIL: gcc.c-torture/execute/struct-ini-2.c -Os execution test
FAIL: gcc.c-torture/execute/struct-ini-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c -O1 execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c -O2 execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c -O3 -g execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c -Os execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/atomic/c11-atomic-exec-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.dg/20050826-1.c execution test
FAIL: gcc.dg/sso/s3.c -Wno-scalar-storage-order -O1 -fno-inline output pattern test
FAIL: gcc.dg/sso/t2.c -Wno-scalar-storage-order -O1 -fno-inline output pattern test
FAIL: gcc.dg/sso/t2.c -Wno-scalar-storage-order -O2 output pattern test
FAIL: gcc.dg/sso/t2.c -Wno-scalar-storage-order -O3 -finline-functions output pattern test
FAIL: gcc.dg/sso/t2.c -Wno-scalar-storage-order -Os output pattern test
FAIL: gcc.dg/sso/t2.c -Wno-scalar-storage-order -Og -g output pattern test
FAIL: gcc.dg/torture/pr30665-2.c -O2 execution test
FAIL: gcc.dg/torture/pr30665-2.c -O3 -g execution test
FAIL: gcc.dg/torture/pr30665-2.c -Os execution test
FAIL: gcc.dg/torture/pr30665-2.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
FAIL: gcc.dg/torture/pr30665-2.c -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects execution test
FAIL: gcc.dg/torture/pr69714.c -O1 execution test
FAIL: gcc.dg/torture/vshuf-v8qi.c -O2 execution test
FAIL: gcc.dg/tree-ssa/pr80803.c execution test
FAIL: gcc.dg/tree-ssa/pr80898-2.c execution test

E.g. for the test gcc.c-torture/execute/struct-ini-2.c
the following assembly code is generated now:

        .file   "struct-ini-2.c"
        .text
        .literal_position
        .literal .LC0, x
        .literal .LC2, 8192
        .literal .LC3, abort@PLT
        .literal .LC4, exit@PLT
        .align  4
        .global main
        .type   main, @function
main:
        entry   sp, 32
        l32r    a8, .LC0
        l16ui   a8, a8, 0
        l32r    a9, .LC2
        extui   a10, a8, 16, 4
        slli    a10, a10, 12
        extui   a9, a9, 0, 16
        beq     a10, a9, .L2
        l32r    a8, .LC3
        callx8  a8
.L2:
        movi    a9, 0xf0
        and     a9, a8, a9
        movi.n  a10, 0x30
        beq     a9, a10, .L3
        l32r    a8, .LC3
        callx8  a8
.L3:
        extui   a8, a8, 0, 4
        beqi    a8, 4, .L4
        l32r    a8, .LC3
        callx8  a8
.L4:
        movi.n  a10, 0
        l32r    a8, .LC4
        callx8  a8
        .size   main, .-main
        .global x
        .data
        .align  4
        .type   x, @object
        .size   x, 4
x:
        .byte   32
        .byte   52
        .zero   2
        .ident  "GCC: (GNU) 13.0.0 20220612 (experimental)"

and the following code was generated before this change:

        .file   "struct-ini-2.c"
        .text
        .literal_position
        .literal .LC0, x
        .literal .LC1, -4096
        .literal .LC2, 8192
        .literal .LC3, abort@PLT
        .literal .LC4, exit@PLT
        .align  4
        .global main
        .type   main, @function
main:
        entry   sp, 32
        l32r    a8, .LC0
        l16ui   a8, a8, 0
        l32r    a9, .LC2
        l32r    a10, .LC1
        and     a10, a8, a10
        extui   a9, a9, 0, 16
        beq     a10, a9, .L2
        l32r    a8, .LC3
        callx8  a8
.L2:
        movi    a9, 0xf0
        and     a9, a8, a9
        movi.n  a10, 0x30
        beq     a9, a10, .L3
        l32r    a8, .LC3
        callx8  a8
.L3:
        extui   a8, a8, 0, 4
        beqi    a8, 4, .L4
        l32r    a8, .LC3
        callx8  a8
.L4:
        movi.n  a10, 0
        l32r    a8, .LC4
        callx8  a8
        .size   main, .-main
        .global x
        .data
        .align  4
        .type   x, @object
        .size   x, 4
x:
        .byte   32
        .byte   52
        .zero   2
        .ident  "GCC: (GNU) 13.0.0 20220612 (experimental)"

-- 
Thanks.
-- Max
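
Editorial sketch (not part of the thread): the difference between the two listings can be checked by hand. The C model below assumes that EXTUI takes the field's start bit counted from the least significant bit, and that the big-endian halfword load of the bytes 32, 52 yields 0x2034; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of EXTUI: take `size` bits of `src` starting at bit `shift`,
   where bit 0 is the least significant bit. */
static uint32_t extui(uint32_t src, int shift, int size)
{
    assert(shift >= 0 && shift <= 31);
    assert(size >= 1 && size <= 16);
    return (src >> shift) & ((1u << size) - 1u);
}

/* Hypothetical wrapper for the "extui ..., shift, 4" + "slli ..., 12"
   pair seen in the new listing. */
static uint32_t extract_then_shift(uint32_t src, int shift)
{
    return extui(src, shift, 4) << 12;
}

/* The old sequence: AND with the literal -4096 (0xFFFFF000). */
static uint32_t and_with_literal(uint32_t src)
{
    return src & 0xFFFFF000u;
}
```

With src = 0x2034, the old sequence yields 0x2000 (equal to the literal 8192 it is compared against), extract_then_shift(src, 16) yields 0, and extract_then_shift(src, 12) yields 0x2000 again. So a start bit of 16 makes the first comparison fail and the test reaches abort.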
* [PATCH v2 4/4] xtensa: Optimize bitwise AND operation with some specific forms of constants
  2022-06-13  3:49 ` Max Filippov
@ 2022-06-13 16:28   ` Takayuki 'January June' Suwa
  2022-06-14  0:33     ` Max Filippov
From: Takayuki 'January June' Suwa @ 2022-06-13 16:28 UTC
To: Max Filippov; +Cc: GCC Patches

On 2022/06/13 12:49, Max Filippov wrote:
> Hi Suwa-san,
hi!

> This change produces a bunch of regression test failures in big-endian
> configuration:
bad news X(
that point is what i was a little worried about...

> E.g. for the test gcc.c-torture/execute/struct-ini-2.c
> the following assembly code is generated now:
> and the following code was generated before this change:
-	.literal .LC1, -4096
-	l32r	a10, .LC1
-	and	a10, a8, a10
+	extui	a10, a8, 16, 4	// wrong! must be 12, 4
+	slli	a10, a10, 12
and of course, '(zero_extract)' is endianness-sensitive.
(ref. 14.11 Bit-Fields, gcc-internals)

all the patches that i previously posted do not match or emit '(zero_extract)', except for this one.

===
This patch offers several insn-and-split patterns for bitwise AND with
register and constant that can be represented as:

i.   1's least significant N bits and the others 0's (17 <= N <= 31)
ii.  1's most significant N bits and the others 0's (12 <= N <= 31)
iii. M 1's sequence of bits and trailing N 0's bits, that cannot fit into a
	"MOVI Ax, simm12" instruction (1 <= M <= 16, 1 <= N <= 30)

And also offers shortcuts for conditional branch if each of the abovementioned
operations is (not) equal to zero.

gcc/ChangeLog:

	* config/xtensa/predicates.md (shifted_mask_operand):
	New predicate.
	* config/xtensa/xtensa.md (*andsi3_const_pow2_minus_one):
	New insn-and-split pattern.
	(*andsi3_const_negative_pow2, *andsi3_const_shifted_mask,
	*masktrue_const_pow2_minus_one, *masktrue_const_negative_pow2,
	*masktrue_const_shifted_mask): Ditto.
---
 gcc/config/xtensa/predicates.md |  10 ++
 gcc/config/xtensa/xtensa.md     | 179 ++++++++++++++++++++++++++++++++
 2 files changed, 189 insertions(+)

diff --git a/gcc/config/xtensa/predicates.md b/gcc/config/xtensa/predicates.md
index bcc83ada0ae..d63a6cf034c 100644
--- a/gcc/config/xtensa/predicates.md
+++ b/gcc/config/xtensa/predicates.md
@@ -52,6 +52,16 @@
             (match_test "xtensa_mask_immediate (INTVAL (op))"))
        (match_operand 0 "register_operand")))
 
+(define_predicate "shifted_mask_operand"
+  (match_code "const_int")
+{
+  HOST_WIDE_INT mask = INTVAL (op);
+  int shift = ctz_hwi (mask);
+
+  return IN_RANGE (shift, 1, 31)
+         && xtensa_mask_immediate ((uint32_t)mask >> shift);
+})
+
 (define_predicate "extui_fldsz_operand"
   (and (match_code "const_int")
        (match_test "IN_RANGE (INTVAL (op), 1, 16)")))
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index a4477e2207e..5d0f346b01a 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -645,6 +645,83 @@
    (set_attr "mode" "SI")
    (set_attr "length" "6")])
 
+(define_insn_and_split "*andsi3_const_pow2_minus_one"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+        (and:SI (match_operand:SI 1 "register_operand" "r")
+                (match_operand:SI 2 "const_int_operand" "i")))]
+  "IN_RANGE (exact_log2 (INTVAL (operands[2]) + 1), 17, 31)"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+        (ashift:SI (match_dup 1)
+                   (match_dup 2)))
+   (set (match_dup 0)
+        (lshiftrt:SI (match_dup 0)
+                     (match_dup 2)))]
+{
+  operands[2] = GEN_INT (32 - floor_log2 (INTVAL (operands[2]) + 1));
+}
+  [(set_attr "type" "arith")
+   (set_attr "mode" "SI")
+   (set (attr "length")
+        (if_then_else (match_test "TARGET_DENSITY
+                                   && INTVAL (operands[2]) == 0x7FFFFFFF")
+                      (const_int 5)
+                      (const_int 6)))])
+
+(define_insn_and_split "*andsi3_const_negative_pow2"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+        (and:SI (match_operand:SI 1 "register_operand" "r")
+                (match_operand:SI 2 "const_int_operand" "i")))]
+  "IN_RANGE (exact_log2 (-INTVAL (operands[2])), 12, 31)"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+        (lshiftrt:SI (match_dup 1)
+                     (match_dup 2)))
+   (set (match_dup 0)
+        (ashift:SI (match_dup 0)
+                   (match_dup 2)))]
+{
+  operands[2] = GEN_INT (floor_log2 (-INTVAL (operands[2])));
+}
+  [(set_attr "type" "arith")
+   (set_attr "mode" "SI")
+   (set_attr "length" "6")])
+
+(define_insn_and_split "*andsi3_const_shifted_mask"
+  [(set (match_operand:SI 0 "register_operand" "=a")
+        (and:SI (match_operand:SI 1 "register_operand" "r")
+                (match_operand:SI 2 "shifted_mask_operand" "i")))]
+  "! xtensa_simm12b (INTVAL (operands[2]))"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+        (zero_extract:SI (match_dup 1)
+                         (match_dup 3)
+                         (match_dup 4)))
+   (set (match_dup 0)
+        (ashift:SI (match_dup 0)
+                   (match_dup 2)))]
+{
+  HOST_WIDE_INT mask = INTVAL (operands[2]);
+  int shift = ctz_hwi (mask);
+  int mask_size = floor_log2 (((uint32_t)mask >> shift) + 1);
+  int mask_pos = shift;
+  if (BITS_BIG_ENDIAN)
+    mask_pos = (32 - (mask_size + shift)) & 0x1f;
+  operands[2] = GEN_INT (shift);
+  operands[3] = GEN_INT (mask_size);
+  operands[4] = GEN_INT (mask_pos);
+}
+  [(set_attr "type" "arith")
+   (set_attr "mode" "SI")
+   (set (attr "length")
+        (if_then_else (match_test "TARGET_DENSITY
+                                   && ctz_hwi (INTVAL (operands[2])) == 1")
+                      (const_int 5)
+                      (const_int 6)))])
+
 (define_insn "iorsi3"
   [(set (match_operand:SI 0 "register_operand" "=a")
         (ior:SI (match_operand:SI 1 "register_operand" "%r")
@@ -1649,6 +1726,108 @@
    (set_attr "mode" "none")
    (set_attr "length" "3")])
 
+(define_insn_and_split "*masktrue_const_pow2_minus_one"
+  [(set (pc)
+        (if_then_else (match_operator 3 "boolean_operator"
+                        [(and:SI (match_operand:SI 0 "register_operand" "r")
+                                 (match_operand:SI 1 "const_int_operand" "i"))
+                         (const_int 0)])
+                      (label_ref (match_operand 2 "" ""))
+                      (pc)))]
+  "IN_RANGE (exact_log2 (INTVAL (operands[1]) + 1), 17, 31)"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(set (match_dup 4)
+        (ashift:SI (match_dup 0)
+                   (match_dup 1)))
+   (set (pc)
+        (if_then_else (match_op_dup 3
+                        [(match_dup 4)
+                         (const_int 0)])
+                      (label_ref (match_dup 2))
+                      (pc)))]
+{
+  operands[1] = GEN_INT (32 - floor_log2 (INTVAL (operands[1]) + 1));
+  operands[4] = gen_reg_rtx (SImode);
+}
+  [(set_attr "type" "jump")
+   (set_attr "mode" "none")
+   (set (attr "length")
+        (if_then_else (match_test "TARGET_DENSITY
+                                   && INTVAL (operands[1]) == 0x7FFFFFFF")
+                      (const_int 5)
+                      (const_int 6)))])
+
+(define_insn_and_split "*masktrue_const_negative_pow2"
+  [(set (pc)
+        (if_then_else (match_operator 3 "boolean_operator"
+                        [(and:SI (match_operand:SI 0 "register_operand" "r")
+                                 (match_operand:SI 1 "const_int_operand" "i"))
+                         (const_int 0)])
+                      (label_ref (match_operand 2 "" ""))
+                      (pc)))]
+  "IN_RANGE (exact_log2 (-INTVAL (operands[1])), 12, 30)"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(set (match_dup 4)
+        (lshiftrt:SI (match_dup 0)
+                     (match_dup 1)))
+   (set (pc)
+        (if_then_else (match_op_dup 3
+                        [(match_dup 4)
+                         (const_int 0)])
+                      (label_ref (match_dup 2))
+                      (pc)))]
+{
+  operands[1] = GEN_INT (floor_log2 (-INTVAL (operands[1])));
+  operands[4] = gen_reg_rtx (SImode);
+}
+  [(set_attr "type" "jump")
+   (set_attr "mode" "none")
+   (set_attr "length" "6")])
+
+(define_insn_and_split "*masktrue_const_shifted_mask"
+  [(set (pc)
+        (if_then_else (match_operator 4 "boolean_operator"
+                        [(and:SI (match_operand:SI 0 "register_operand" "r")
+                                 (match_operand:SI 1 "shifted_mask_operand" "i"))
+                         (match_operand:SI 2 "const_int_operand" "i")])
+                      (label_ref (match_operand 3 "" ""))
+                      (pc)))]
+  "(INTVAL (operands[2]) & ((1 << ctz_hwi (INTVAL (operands[1]))) - 1)) == 0
+   && xtensa_b4const_or_zero ((uint32_t)INTVAL (operands[2]) >> ctz_hwi (INTVAL (operands[1])))"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(set (match_dup 6)
+        (zero_extract:SI (match_dup 0)
+                         (match_dup 5)
+                         (match_dup 1)))
+   (set (pc)
+        (if_then_else (match_op_dup 4
+                        [(match_dup 6)
+                         (match_dup 2)])
+                      (label_ref (match_dup 3))
+                      (pc)))]
+{
+  HOST_WIDE_INT mask = INTVAL (operands[1]);
+  int shift = ctz_hwi (mask);
+  int mask_size = floor_log2 (((uint32_t)mask >> shift) + 1);
+  int mask_pos = shift;
+  if (BITS_BIG_ENDIAN)
+    mask_pos = (32 - (mask_size + shift)) & 0x1f;
+  operands[1] = GEN_INT (mask_pos);
+  operands[2] = GEN_INT ((uint32_t)INTVAL (operands[2]) >> shift);
+  operands[5] = GEN_INT (mask_size);
+  operands[6] = gen_reg_rtx (SImode);
+}
+  [(set_attr "type" "jump")
+   (set_attr "mode" "none")
+   (set (attr "length")
+        (if_then_else (match_test "TARGET_DENSITY
+                                   && (uint32_t)INTVAL (operands[2]) >> ctz_hwi (INTVAL (operands[1])) == 0")
+                      (const_int 5)
+                      (const_int 6)))])
+
 ;; Zero-overhead looping support.
 
-- 
2.20.1
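
Editorial sketch (not part of the patch): the v2 split code converts the field position before building '(zero_extract)'. The C below mirrors that computation with a plain flag in place of BITS_BIG_ENDIAN. The second function is an assumption: it models an output routine that converts the zero_extract position back to an LSB-relative start bit with the same formula. That behaviour is consistent with the "16, 4" seen in the failing listing but is not shown anywhere in this thread. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Position operand for zero_extract, following the v2 split code:
   on a big-endian bit numbering the position counts from the MSB side. */
static int zero_extract_pos(int mask_size, int shift, int bits_big_endian)
{
    int pos = shift;
    if (bits_big_endian)
        pos = (32 - (mask_size + shift)) & 0x1f;
    return pos;
}

/* ASSUMPTION: model of how the final extract instruction's start bit
   (counted from the LSB) is derived from a zero_extract position. */
static int lsb_start_from_pos(int mask_size, int pos, int bits_big_endian)
{
    if (bits_big_endian)
        return (32 - (mask_size + pos)) & 0x1f;
    return pos & 0x1f;
}

/* (x & mask) == c, rewritten as a compare of the extracted field with
   c >> shift; valid only when the low `shift` bits of c are zero, which
   is what the v2 branch pattern's condition checks. */
static int field_equals(uint32_t x, int shift, int mask_size, uint32_t c)
{
    uint32_t field = (x >> shift) & ((1u << mask_size) - 1u);
    assert((c & ((1u << shift) - 1u)) == 0);
    return field == (c >> shift);
}
```

For the mask 0xF000 (shift 12, size 4) on big-endian, v1 passed position 12, which the assumed output conversion turns into start bit 16; v2 passes position 16, which converts back to the intended start bit 12.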
* Re: [PATCH v2 4/4] xtensa: Optimize bitwise AND operation with some specific forms of constants
  2022-06-13 16:28   ` [PATCH v2 " Takayuki 'January June' Suwa
@ 2022-06-14  0:33     ` Max Filippov
From: Max Filippov @ 2022-06-14  0:33 UTC
To: Takayuki 'January June' Suwa; +Cc: GCC Patches

On Mon, Jun 13, 2022 at 9:39 AM Takayuki 'January June' Suwa
<jjsuwa_sys3175@yahoo.co.jp> wrote:
>
> On 2022/06/13 12:49, Max Filippov wrote:
> > Hi Suwa-san,
> hi!
>
> > This change produces a bunch of regression test failures in big-endian
> > configuration:
> bad news X(
> that point is what i was a little worried about...
>
> > E.g. for the test gcc.c-torture/execute/struct-ini-2.c
> > the following assembly code is generated now:
> > and the following code was generated before this change:
> -       .literal .LC1, -4096
> -       l32r    a10, .LC1
> -       and     a10, a8, a10
> +       extui   a10, a8, 16, 4  // wrong! must be 12, 4
> +       slli    a10, a10, 12
> and of course, '(zero_extract)' is endianness-sensitive.
> (ref. 14.11 Bit-Fields, gcc-internals)
>
> all the patches that i previously posted do not match or emit '(zero_extract)', except for this one.
>
> ===
> This patch offers several insn-and-split patterns for bitwise AND with
> register and constant that can be represented as:
>
> i.   1's least significant N bits and the others 0's (17 <= N <= 31)
> ii.  1's most significant N bits and the others 0's (12 <= N <= 31)
> iii. M 1's sequence of bits and trailing N 0's bits, that cannot fit into a
>         "MOVI Ax, simm12" instruction (1 <= M <= 16, 1 <= N <= 30)
>
> And also offers shortcuts for conditional branch if each of the abovementioned
> operations is (not) equal to zero.
>
> gcc/ChangeLog:
>
>         * config/xtensa/predicates.md (shifted_mask_operand):
>         New predicate.
>         * config/xtensa/xtensa.md (*andsi3_const_pow2_minus_one):
>         New insn-and-split pattern.
>         (*andsi3_const_negative_pow2, *andsi3_const_shifted_mask,
>         *masktrue_const_pow2_minus_one, *masktrue_const_negative_pow2,
>         *masktrue_const_shifted_mask): Ditto.
> ---
>  gcc/config/xtensa/predicates.md |  10 ++
>  gcc/config/xtensa/xtensa.md     | 179 ++++++++++++++++++++++++++++++++
>  2 files changed, 189 insertions(+)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max