Message-ID: <1a8df28d-4f9b-bc71-3bfc-90f8363d698c@yahoo.co.jp>
Date: Sun, 8 Jan 2023 13:11:31 +0900
Subject: Re: [PATCH] xtensa: Optimize bitwise splicing operation
To: Max Filippov
References: <1e8fab8f-c0bb-dfc6-5533-eba3bde49ea4.ref@yahoo.co.jp> <1e8fab8f-c0bb-dfc6-5533-eba3bde49ea4@yahoo.co.jp>
Cc: GCC Patches
From: Takayuki 'January June' Suwa

On 2023/01/08 6:53, Max Filippov wrote:
> On Fri, Jan 6, 2023 at 6:55 PM Takayuki 'January June' Suwa
> wrote:
>>
>> This patch optimizes the operation of cutting and splicing two register
>> values at a specified bit position, in other words, combining (bitwise
>> ORing) bits 0 through (C-1) of the register with bits C through 31
>> of the other, where C is the specified immediate integer 1 through 31.
>>
>> This typically applies to signedness copy of floating point number or
>> __builtin_return_address() if the windowed register ABI, and saves one
>> instruction compared to four shifts and a bitwise OR by the RTL
>> generation pass.
>
> While I indeed see this kind of change, e.g.:
> -       extui   a3, a3, 27, 5
> -       slli    a2, a2, 5
> -       srli    a2, a2, 5
> -       slli    a3, a3, 27
> -       or      a2, a2, a3
> +       slli    a2, a2, 5
> +       extui   a3, a3, 27, 5
> +       ssai    5
> +       src     a2, a3, a2
>
> I also see the following:
> -       movi.n  a6, -4
> -       and     a5, a5, a6
> -       extui   a3, a3, 0, 2
> -       or      a3, a3, a5
> +       srli    a5, a5, 2
> +       slli    a3, a3, 30
> +       ssai    30
> +       src     a3, a5, a3
>
> i.e. after the split there's the same number of instructions,
> but the new sequence is one byte longer than the original one
> because of the movi.n.
>
> Looking at a bunch of linux builds I observe a slight code size
> growth in call0 kernels and a slight code size reduction in
> windowed kernels.
>
>> gcc/ChangeLog:
>>
>>      * config/xtensa/xtensa.md (*splice_bits):
>>      New insn_and_split pattern.
>> ---
>>  gcc/config/xtensa/xtensa.md | 47 +++++++++++++++++++++++++++++++++++++
>>  1 file changed, 47 insertions(+)
>>
>> diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
>> index 0a26d3dccf4..36ec1b1918e 100644
>> --- a/gcc/config/xtensa/xtensa.md
>> +++ b/gcc/config/xtensa/xtensa.md
>> @@ -746,6 +746,53 @@
>>     (set_attr "mode"    "SI")
>>     (set_attr "length"  "3")])
>>
>> +(define_insn_and_split "*splice_bits"
>> +  [(set (match_operand:SI 0 "register_operand" "=a")
>> +       (ior:SI (and:SI (match_operand:SI 1 "register_operand" "r")
>> +                       (match_operand:SI 3 "const_int_operand" "i"))
>> +               (and:SI (match_operand:SI 2 "register_operand" "r")
>> +                       (match_operand:SI 4 "const_int_operand" "i"))))]
>> +
>> +  "!optimize_debug && optimize
>> +   && INTVAL (operands[3]) + INTVAL (operands[4]) == -1
>> +   && (exact_log2 (INTVAL (operands[3]) + 1) > 0
>> +       || exact_log2 (INTVAL (operands[4]) + 1) > 0)"
>> +  "#"
>> +  "&& can_create_pseudo_p ()"
>> +  [(set (match_dup 5)
>> +       (ashift:SI (match_dup 1)
>> +                  (match_dup 4)))
>> +   (set (match_dup 6)
>> +       (lshiftrt:SI (match_dup 2)
>> +                    (match_dup 3)))
>> +   (set (match_dup 0)
>> +       (ior:SI (lshiftrt:SI (match_dup 5)
>> +                            (match_dup 4))
>> +               (ashift:SI (match_dup 6)
>> +                          (match_dup 3))))]
>> +{
>> +  int shift;
>> +  if (INTVAL (operands[3]) < 0)
>> +    {
>> +      rtx x;
>> +      x = operands[1], operands[1] = operands[2], operands[2] = x;
>> +      x = operands[3], operands[3] = operands[4], operands[4] = x;
>> +    }
>> +  shift = floor_log2 (INTVAL (operands[3]) + 1);
>> +  operands[3] = GEN_INT (shift);
>> +  operands[4] = GEN_INT (32 - shift);
>> +  operands[5] = gen_reg_rtx (SImode);
>> +  operands[6] = gen_reg_rtx (SImode);
>> +}
>> +  [(set_attr "type"    "arith")
>> +   (set_attr "mode"    "SI")
>> +   (set (attr "length")
>> +       (if_then_else (match_test "TARGET_DENSITY
>> +                                  && (INTVAL (operands[3]) == 0x7FFFFFFF
>> +                                      || INTVAL (operands[4]) == 0x7FFFFFFF)")
>> +                     (const_int 11)
>> +                     (const_int 12)))])
>
> I wonder how the length could be 11 here? I always see 4 3-byte
> instructions generated by this pattern.
>

Sorry, I should have carried out a systematic test beforehand:

  #define TEST(c) \
    unsigned int test_ ## c (unsigned int a, unsigned int b) { \
      return (a & (-1U >> c)) | (b & ~(-1U >> c)); \
    }
  TEST(1)
  TEST(2)
  ...
  TEST(30)
  TEST(31)

Without this patch, the above compiles to the following sequences,
depending on c:

  a. between 1 and 15:  slli (or add.n) + extui + slli + srli + or
  b. 16:                extui + slli + extui + or
  c. between 17 and 20: srli + slli + extui + or
  d. between 21 and 31: movi(.n) + and + extui + or

Only case a. takes five instructions; cases b. through d. already take
four, so the four-instruction sequence produced by the split gains
nothing there.  Clearly, the patch should be restricted to apply only
to case a.