Message-ID: <87b80e93-0031-d847-9120-ceccd79c1a37@yahoo.co.jp>
Date: Tue, 24 May 2022 00:52:44 +0900
To: gcc-patches@gcc.gnu.org
From: Takayuki 'January June' Suwa
Subject: [PATCH v3 4/5] xtensa: Add setmemsi insn pattern

This patch introduces two kinds of setmemsi insn pattern, an unrolled loop
and a small loop, for a small fixed length and a constant initialization
value.

gcc/ChangeLog:

	* gcc/config/xtensa/xtensa-protos.h
	(xtensa_expand_block_set_unrolled_loop,
	xtensa_expand_block_set_small_loop): New prototypes.
	* gcc/config/xtensa/xtensa.cc (xtensa_sizeof_MOVI,
	xtensa_expand_block_set_unrolled_loop,
	xtensa_expand_block_set_small_loop): New functions.
	* gcc/config/xtensa/xtensa.md (setmemsi): New expansion pattern.
	* gcc/config/xtensa/xtensa.opt (mlongcalls): Add target mask.
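
For illustration only, the two expansion shapes can be pictured in plain C. This is a rough sketch of what the emitted code does at run time, not GCC code: `splat_byte` and `set_small_loop32` are hypothetical names, and the sketch assumes a 4-byte-aligned block whose length is a positive multiple of 4. The byte is first replicated across the store unit (as the patch does with the 0x0101U and 0x01010101U multipliers), then stored until the running pointer reaches the end address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Replicate one byte across a 1-, 2- or 4-byte store unit.
   Hypothetical helper; mirrors the multipliers used in the patch.  */
static uint32_t
splat_byte (uint8_t value, int align)
{
  switch (align)
    {
    case 2:
      return (uint16_t) (value * 0x0101U);
    case 4:
      return value * 0x01010101U;
    default:
      return value;
    }
}

/* "Small loop" shape for a 4-byte-aligned block: store one word,
   advance by 4, branch while the pointer differs from the end.  */
static void
set_small_loop32 (unsigned char *dst, size_t bytes, uint8_t value)
{
  uint32_t word = splat_byte (value, 4);
  unsigned char *end = dst + bytes;

  /* Assumed precondition: a non-empty, totally-aligned block.  */
  assert (bytes > 0 && bytes % 4 == 0);
  do
    {
      memcpy (dst, &word, sizeof word);   /* models one S32I */
      dst += 4;                           /* models ADDI */
    }
  while (dst != end);                     /* models BNE */
}
```

The unrolled shape is the same sequence of stores with the loop removed, followed by narrower stores for any remainder.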
---
 gcc/config/xtensa/xtensa-protos.h |   2 +
 gcc/config/xtensa/xtensa.cc       | 211 ++++++++++++++++++++++++++++++
 gcc/config/xtensa/xtensa.md       |  16 +++
 gcc/config/xtensa/xtensa.opt      |   2 +-
 4 files changed, 230 insertions(+), 1 deletion(-)

diff --git a/gcc/config/xtensa/xtensa-protos.h b/gcc/config/xtensa/xtensa-protos.h
index 4bc42da2320..30e4b54394a 100644
--- a/gcc/config/xtensa/xtensa-protos.h
+++ b/gcc/config/xtensa/xtensa-protos.h
@@ -41,6 +41,8 @@ extern void xtensa_expand_conditional_branch (rtx *, machine_mode);
 extern int xtensa_expand_conditional_move (rtx *, int);
 extern int xtensa_expand_scc (rtx *, machine_mode);
 extern int xtensa_expand_block_move (rtx *);
+extern int xtensa_expand_block_set_unrolled_loop (rtx *);
+extern int xtensa_expand_block_set_small_loop (rtx *);
 extern void xtensa_split_operand_pair (rtx *, machine_mode);
 extern int xtensa_emit_move_sequence (rtx *, machine_mode);
 extern rtx xtensa_copy_incoming_a7 (rtx);
diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index d2aabf38339..c7b54babc37 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -1373,6 +1373,217 @@ xtensa_expand_block_move (rtx *operands)
 }
 
 
+/* Try to expand a block set operation to a sequence of RTL move
+   instructions. If not optimizing, or if the block size is not a
+   constant, or if the block is too large, or if the value to
+   initialize the block with is not a constant, the expansion
+   fails and GCC falls back to calling memset().
+
+   operands[0] is the destination
+   operands[1] is the length
+   operands[2] is the initialization value
+   operands[3] is the alignment */
+
+static int
+xtensa_sizeof_MOVI (HOST_WIDE_INT imm)
+{
+  return (TARGET_DENSITY && IN_RANGE (imm, -32, 95)) ? 2 : 3;
+}
+
+int
+xtensa_expand_block_set_unrolled_loop (rtx *operands)
+{
+  rtx dst_mem = operands[0];
+  HOST_WIDE_INT bytes, value, align;
+  int expand_len, funccall_len;
+  rtx x, reg;
+  int offset;
+
+  if (!CONST_INT_P (operands[1]) || !CONST_INT_P (operands[2]))
+    return 0;
+
+  bytes = INTVAL (operands[1]);
+  if (bytes <= 0)
+    return 0;
+  value = (int8_t)INTVAL (operands[2]);
+  align = INTVAL (operands[3]);
+  if (align > MOVE_MAX)
+    align = MOVE_MAX;
+
+  /* Insn expansion: holding the init value.
+     Either MOV(.N) or L32R w/litpool. */
+  if (align == 1)
+    expand_len = xtensa_sizeof_MOVI (value);
+  else if (value == 0 || value == -1)
+    expand_len = TARGET_DENSITY ? 2 : 3;
+  else
+    expand_len = 3 + 4;
+  /* Insn expansion: a series of aligned memory stores.
+     Consist of S8I, S16I or S32I(.N). */
+  expand_len += (bytes / align) * (TARGET_DENSITY
+                                   && align == 4 ? 2 : 3);
+  /* Insn expansion: the remainder, sub-aligned memory stores.
+     A combination of S8I and S16I as needed. */
+  expand_len += ((bytes % align + 1) / 2) * 3;
+
+  /* Function call: preparing two arguments. */
+  funccall_len = xtensa_sizeof_MOVI (value);
+  funccall_len += xtensa_sizeof_MOVI (bytes);
+  /* Function call: calling memset(). */
+  funccall_len += TARGET_LONGCALLS ? (3 + 4 + 3) : 3;
+
+  /* Apply expansion bonus (2x) if optimizing for speed. */
+  if (optimize > 1 && !optimize_size)
+    funccall_len *= 2;
+
+  /* Decide whether to expand or not, based on the sum of the length
+     of instructions. */
+  if (expand_len > funccall_len)
+    return 0;
+
+  x = XEXP (dst_mem, 0);
+  if (!REG_P (x))
+    dst_mem = replace_equiv_address (dst_mem, force_reg (Pmode, x));
+  switch (align)
+    {
+    case 1:
+      break;
+    case 2:
+      value = (int16_t)((uint8_t)value * 0x0101U);
+      break;
+    case 4:
+      value = (int32_t)((uint8_t)value * 0x01010101U);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  reg = force_reg (SImode, GEN_INT (value));
+
+  offset = 0;
+  do
+    {
+      int unit_size = MIN (bytes, align);
+      machine_mode unit_mode = (unit_size >= 4 ? SImode :
+                               (unit_size >= 2 ? HImode :
+                                QImode));
+      unit_size = GET_MODE_SIZE (unit_mode);
+
+      emit_move_insn (adjust_address (dst_mem, unit_mode, offset),
+                      unit_mode == SImode ? reg
+                      : convert_to_mode (unit_mode, reg, true));
+
+      offset += unit_size;
+      bytes -= unit_size;
+    }
+  while (bytes > 0);
+
+  return 1;
+}
+
+int
+xtensa_expand_block_set_small_loop (rtx *operands)
+{
+  HOST_WIDE_INT bytes, value, align;
+  int expand_len, funccall_len;
+  rtx x, dst, end, reg;
+  machine_mode unit_mode;
+  rtx_code_label *label;
+
+  if (!CONST_INT_P (operands[1]) || !CONST_INT_P (operands[2]))
+    return 0;
+
+  bytes = INTVAL (operands[1]);
+  if (bytes <= 0)
+    return 0;
+  value = (int8_t)INTVAL (operands[2]);
+  align = INTVAL (operands[3]);
+  if (align > MOVE_MAX)
+    align = MOVE_MAX;
+
+  /* Totally-aligned block only. */
+  if (bytes % align != 0)
+    return 0;
+
+  /* If 4-byte aligned, small loop substitution is almost optimal, thus
+     limited to only offset to the end address for ADDI/ADDMI instruction. */
+  if (align == 4
+      && ! (bytes <= 127 || (bytes <= 32512 && bytes % 256 == 0)))
+    return 0;
+
+  /* If no 4-byte aligned, loop count should be treated as the constraint. */
+  if (align != 4
+      && bytes / align > ((optimize > 1 && !optimize_size) ? 8 : 15))
+    return 0;
+
+  /* Insn expansion: holding the init value.
+     Either MOV(.N) or L32R w/litpool. */
+  if (align == 1)
+    expand_len = xtensa_sizeof_MOVI (value);
+  else if (value == 0 || value == -1)
+    expand_len = TARGET_DENSITY ? 2 : 3;
+  else
+    expand_len = 3 + 4;
+  /* Insn expansion: Either ADDI(.N) or ADDMI for the end address. */
+  expand_len += bytes > 127 ? 3
+                            : (TARGET_DENSITY && bytes <= 15) ? 2 : 3;
+
+  /* Insn expansion: the loop body and branch instruction.
+     For store, one of S8I, S16I or S32I(.N).
+     For advance, ADDI(.N).
+     For branch, BNE. */
+  expand_len += (TARGET_DENSITY && align == 4 ? 2 : 3)
+                + (TARGET_DENSITY ? 2 : 3) + 3;
+
+  /* Function call: preparing two arguments. */
+  funccall_len = xtensa_sizeof_MOVI (value);
+  funccall_len += xtensa_sizeof_MOVI (bytes);
+  /* Function call: calling memset(). */
+  funccall_len += TARGET_LONGCALLS ? (3 + 4 + 3) : 3;
+
+  /* Apply expansion bonus (2x) if optimizing for speed. */
+  if (optimize > 1 && !optimize_size)
+    funccall_len *= 2;
+
+  /* Decide whether to expand or not, based on the sum of the length
+     of instructions. */
+  if (expand_len > funccall_len)
+    return 0;
+
+  x = XEXP (operands[0], 0);
+  if (!REG_P (x))
+    x = XEXP (replace_equiv_address (operands[0], force_reg (Pmode, x)), 0);
+  dst = gen_reg_rtx (SImode);
+  emit_move_insn (dst, x);
+  end = gen_reg_rtx (SImode);
+  emit_insn (gen_addsi3 (end, dst, operands[1] /* the length */));
+  switch (align)
+    {
+    case 1:
+      unit_mode = QImode;
+      break;
+    case 2:
+      value = (int16_t)((uint8_t)value * 0x0101U);
+      unit_mode = HImode;
+      break;
+    case 4:
+      value = (int32_t)((uint8_t)value * 0x01010101U);
+      unit_mode = SImode;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  reg = force_reg (unit_mode, GEN_INT (value));
+
+  label = gen_label_rtx ();
+  emit_label (label);
+  emit_move_insn (gen_rtx_MEM (unit_mode, dst), reg);
+  emit_insn (gen_addsi3 (dst, dst, GEN_INT (align)));
+  emit_cmp_and_jump_insns (dst, end, NE, const0_rtx, SImode, true, label);
+
+  return 1;
+}
+
+
 void
 xtensa_expand_nonlocal_goto (rtx *operands)
 {
diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
index 96e043b26b5..2d146b7995c 100644
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -1080,6 +1080,22 @@
   DONE;
 })
 
+;; Block sets
+
+(define_expand "setmemsi"
+  [(match_operand:BLK 0 "memory_operand")
+   (match_operand:SI 1 "")
+   (match_operand:SI 2 "")
+   (match_operand:SI 3 "const_int_operand")]
+  "!optimize_debug && optimize"
+{
+  if (xtensa_expand_block_set_unrolled_loop (operands))
+    DONE;
+  if (xtensa_expand_block_set_small_loop (operands))
+    DONE;
+  FAIL;
+})
+
 
 ;; Shift instructions.
 
diff --git a/gcc/config/xtensa/xtensa.opt b/gcc/config/xtensa/xtensa.opt
index c406297af0d..1fc68a3d994 100644
--- a/gcc/config/xtensa/xtensa.opt
+++ b/gcc/config/xtensa/xtensa.opt
@@ -27,7 +27,7 @@ Target Mask(FORCE_NO_PIC)
 Disable position-independent code (PIC) for use in OS kernel code.
 
 mlongcalls
-Target
+Target Mask(LONGCALLS)
 Use indirect CALLXn instructions for large programs.
 
 mtarget-align
-- 
2.20.1
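
The size comparison that decides whether the unrolled expansion is used can be tried outside the compiler. The following is a minimal stand-alone sketch in plain C that copies the arithmetic from xtensa_expand_block_set_unrolled_loop; `struct cost_opts` and the function names are hypothetical stand-ins for TARGET_DENSITY, TARGET_LONGCALLS and the `optimize > 1 && !optimize_size` test, and none of them are GCC interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the compiler state the patch consults.  */
struct cost_opts
{
  bool density;    /* models TARGET_DENSITY */
  bool longcalls;  /* models TARGET_LONGCALLS */
  bool speed;      /* models optimize > 1 && !optimize_size */
};

/* Byte length of a MOVI: 2 with the narrow form, else 3.  */
static int
sizeof_movi (const struct cost_opts *o, long imm)
{
  return (o->density && imm >= -32 && imm <= 95) ? 2 : 3;
}

/* Estimated bytes for setting up and calling memset(),
   doubled as an "expansion bonus" when optimizing for speed.  */
static int
funccall_len (const struct cost_opts *o, long bytes, int value)
{
  int len = sizeof_movi (o, (int8_t) value) + sizeof_movi (o, bytes);
  len += o->longcalls ? (3 + 4 + 3) : 3;
  return o->speed ? len * 2 : len;
}

/* Estimated bytes for the unrolled store sequence.  */
static int
unrolled_len (const struct cost_opts *o, long bytes, int value, int align)
{
  int v = (int8_t) value;
  int len;

  assert (align == 1 || align == 2 || align == 4);
  if (align == 1)
    len = sizeof_movi (o, v);
  else if (v == 0 || v == -1)
    len = o->density ? 2 : 3;
  else
    len = 3 + 4;                /* L32R plus a literal pool entry */
  /* Aligned stores, then the sub-aligned remainder.  */
  len += (int) (bytes / align) * (o->density && align == 4 ? 2 : 3);
  len += (int) ((bytes % align + 1) / 2) * 3;
  return len;
}

/* True when the sketch predicts the unrolled expansion is chosen.  */
static bool
would_unroll (const struct cost_opts *o, long bytes, int value, int align)
{
  return unrolled_len (o, bytes, value, align)
         <= funccall_len (o, bytes, value);
}
```

Under these assumptions, with the density option on and longcalls off, zeroing 16 bytes at 4-byte alignment is estimated at 10 bytes unrolled against 7 for the call, so it is expanded only when the 2x speed bonus raises the call estimate to 14; an 8-byte block (6 bytes unrolled) is expanded either way.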