From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=lRYE=5F=yahoo.co.jp=jjsuwa_sys3175@sourceware.org>
Received: from sonicconh6001-vm2.mail.ssk.yahoo.co.jp (sonicconh6001-vm2.mail.ssk.yahoo.co.jp [182.22.37.11])
	by sourceware.org (Postfix) with ESMTPS id 328DB3858D33
	for <gcc-patches@gcc.gnu.org>; Sun,  8 Jan 2023 04:11:58 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 328DB3858D33
Authentication-Results: sourceware.org; dmarc=pass (p=quarantine dis=none) header.from=yahoo.co.jp
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=yahoo.co.jp
Received: from sonicgw.mail.yahoo.co.jp by sonicconh6001.mail.ssk.yahoo.co.jp with HTTP; Sun, 8 Jan 2023 04:11:52 +0000
Received: by smtphe6010.mail.ssk.ynwp.yahoo.co.jp (YJ Hermes SMTP Server) with ESMTPA ID 6f496db6d99808a65621b92b5bc106e5;
          Sun, 08 Jan 2023 13:11:49 +0900 (JST)
Message-ID: <1a8df28d-4f9b-bc71-3bfc-90f8363d698c@yahoo.co.jp>
Date: Sun, 8 Jan 2023 13:11:31 +0900
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101
 Thunderbird/102.6.1
Subject: Re: [PATCH] xtensa: Optimize bitwise splicing operation
To: Max Filippov <jcmvbkbc@gmail.com>
References: <1e8fab8f-c0bb-dfc6-5533-eba3bde49ea4.ref@yahoo.co.jp>
 <1e8fab8f-c0bb-dfc6-5533-eba3bde49ea4@yahoo.co.jp>
 <CAMo8Bf+N33djvSuL1XRO8WM0hwU9XBrt=NCVyDk+wrjuohDbZQ@mail.gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
From: Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp>
In-Reply-To: <CAMo8Bf+N33djvSuL1XRO8WM0hwU9XBrt=NCVyDk+wrjuohDbZQ@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
List-Id: <gcc-patches.gcc.gnu.org>

On 2023/01/08 6:53, Max Filippov wrote:
> On Fri, Jan 6, 2023 at 6:55 PM Takayuki 'January June' Suwa
> <jjsuwa_sys3175@yahoo.co.jp> wrote:
>>
>> This patch optimizes the operation of cutting and splicing two register
>> values at a specified bit position, in other words, combining (bitwise
>> ORing) bits 0 through (C-1) of the register with bits C through 31
>> of the other, where C is the specified immediate integer 1 through 31.
>>
>> This typically applies to copying the sign of a floating-point number, or
>> to __builtin_return_address() under the windowed register ABI, and saves
>> one instruction compared with the four shifts and the bitwise OR emitted
>> by the RTL generation pass.
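For illustration, here is a minimal C model of the transformation. It assumes that the Xtensa SRC instruction returns the low 32 bits of the 64-bit concatenation {hi, lo} shifted right by SAR, which SSAI sets. The helper names (splice_mask, model_src, splice_shift) are hypothetical. The shift form mirrors the four insns produced by the split.

```c
#include <stdint.h>

/* Reference (mask) form: bits 0..C-1 from lo_src, bits C..31 from hi_src.
   Valid for 1 <= C <= 31. */
static uint32_t splice_mask(uint32_t lo_src, uint32_t hi_src, unsigned c)
{
  uint32_t low_mask = (UINT32_C(1) << c) - 1;
  return (lo_src & low_mask) | (hi_src & ~low_mask);
}

/* Assumed model of Xtensa SRC: the low 32 bits of the 64-bit
   concatenation {hi, lo} shifted right by SAR (set here by SSAI). */
static uint32_t model_src(uint32_t hi, uint32_t lo, unsigned sar)
{
  uint64_t pair = ((uint64_t) hi << 32) | lo;
  return (uint32_t) (pair >> sar);
}

/* Shift form matching the split: a left shift, a logical right shift
   (extui or srli), ssai 32-C, then src. */
static uint32_t splice_shift(uint32_t lo_src, uint32_t hi_src, unsigned c)
{
  uint32_t t5 = lo_src << (32 - c);   /* low C bits moved to the top */
  uint32_t t6 = hi_src >> c;          /* high 32-C bits moved to the bottom */
  return model_src(t6, t5, 32 - c);   /* (t5 >> (32-C)) | (t6 << C) */
}
```

Max's first example below corresponds to C = 27. It is a left shift by 5 (slli a2, a2, 5), then extui a3, a3, 27, 5, then ssai 5 and src a2, a3, a2.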
> 
> While I indeed see this kind of change, e.g.:
> -       extui   a3, a3, 27, 5
> -       slli    a2, a2, 5
> -       srli    a2, a2, 5
> -       slli    a3, a3, 27
> -       or      a2, a2, a3
> +       slli    a2, a2, 5
> +       extui   a3, a3, 27, 5
> +       ssai    5
> +       src     a2, a3, a2
> 
> I also see the following:
> -       movi.n  a6, -4
> -       and     a5, a5, a6
> -       extui   a3, a3, 0, 2
> -       or      a3, a3, a5
> +       srli    a5, a5, 2
> +       slli    a3, a3, 30
> +       ssai    30
> +       src     a3, a5, a3
> 
> i.e. after the split there's the same number of instructions,
> but the new sequence is one byte longer than the original one
> because of the movi.n.
> 
> Looking at a bunch of linux builds I observe a slight code size
> growth in call0 kernels and a slight code size reduction in
> windowed kernels.
> 
>> gcc/ChangeLog:
>>
>>         * config/xtensa/xtensa.md (*splice_bits):
>>         New insn_and_split pattern.
>> ---
>>  gcc/config/xtensa/xtensa.md | 47 +++++++++++++++++++++++++++++++++++++
>>  1 file changed, 47 insertions(+)
>>
>> diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md
>> index 0a26d3dccf4..36ec1b1918e 100644
>> --- a/gcc/config/xtensa/xtensa.md
>> +++ b/gcc/config/xtensa/xtensa.md
>> @@ -746,6 +746,53 @@
>>     (set_attr "mode"    "SI")
>>     (set_attr "length"  "3")])
>>
>> +(define_insn_and_split "*splice_bits"
>> +  [(set (match_operand:SI 0 "register_operand" "=a")
>> +       (ior:SI (and:SI (match_operand:SI 1 "register_operand" "r")
>> +                       (match_operand:SI 3 "const_int_operand" "i"))
>> +               (and:SI (match_operand:SI 2 "register_operand" "r")
>> +                       (match_operand:SI 4 "const_int_operand" "i"))))]
>> +
>> +  "!optimize_debug && optimize
>> +   && INTVAL (operands[3]) + INTVAL (operands[4]) == -1
>> +   && (exact_log2 (INTVAL (operands[3]) + 1) > 0
>> +       || exact_log2 (INTVAL (operands[4]) + 1) > 0)"
>> +  "#"
>> +  "&& can_create_pseudo_p ()"
>> +  [(set (match_dup 5)
>> +       (ashift:SI (match_dup 1)
>> +                  (match_dup 4)))
>> +   (set (match_dup 6)
>> +       (lshiftrt:SI (match_dup 2)
>> +                    (match_dup 3)))
>> +   (set (match_dup 0)
>> +       (ior:SI (lshiftrt:SI (match_dup 5)
>> +                            (match_dup 4))
>> +               (ashift:SI (match_dup 6)
>> +                          (match_dup 3))))]
>> +{
>> +  int shift;
>> +  if (INTVAL (operands[3]) < 0)
>> +    {
>> +      rtx x;
>> +      x = operands[1], operands[1] = operands[2], operands[2] = x;
>> +      x = operands[3], operands[3] = operands[4], operands[4] = x;
>> +    }
>> +  shift = floor_log2 (INTVAL (operands[3]) + 1);
>> +  operands[3] = GEN_INT (shift);
>> +  operands[4] = GEN_INT (32 - shift);
>> +  operands[5] = gen_reg_rtx (SImode);
>> +  operands[6] = gen_reg_rtx (SImode);
>> +}
>> +  [(set_attr "type"    "arith")
>> +   (set_attr "mode"    "SI")
>> +   (set (attr "length")
>> +       (if_then_else (match_test "TARGET_DENSITY
>> +                                  && (INTVAL (operands[3]) == 0x7FFFFFFF
>> +                                      || INTVAL (operands[4]) == 0x7FFFFFFF)")
>> +                     (const_int 11)
>> +                     (const_int 12)))])
> 
> I wonder how the length could be 11 here? I always see 4 3-byte
> instructions generated by this pattern.
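The insn condition and the preparation statements of *splice_bits can be modeled in plain C as shown below. This is a hypothetical sketch with three simplifications. It treats the masks as sign-extended CONST_INT values. It uses a stand-in helper, exact_log2_i64, in place of GCC's exact_log2/floor_log2. It also ignores the optimize/optimize_debug checks. With the masks 3 and -4 from the second example above, the model gives C = 2. That matches the `slli a3, a3, 30` / `ssai 30` in that output.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for exact_log2: log2 (x) if x is a positive power of two,
   otherwise -1 (negative values never qualify). */
static int exact_log2_i64(int64_t x)
{
  if (x <= 0 || (x & (x - 1)) != 0)
    return -1;
  int n = 0;
  while (x >>= 1)
    n++;
  return n;
}

/* Hypothetical model of the *splice_bits condition and preparation.
   m3/m4 are the AND masks (operands 3 and 4) as sign-extended values.
   Returns C, the low-bit count that becomes operand 3 (operand 4 becomes
   32 - C), or -1 if the pattern does not match.  *swapped reports whether
   operands 1/2 and 3/4 are exchanged first. */
static int splice_bits_shift(int64_t m3, int64_t m4, bool *swapped)
{
  if (m3 + m4 != -1)                    /* masks must be complementary */
    return -1;
  if (!(exact_log2_i64(m3 + 1) > 0 || exact_log2_i64(m4 + 1) > 0))
    return -1;                          /* one mask must be 2^C - 1 */
  *swapped = m3 < 0;                    /* the high mask is negative */
  /* mask + 1 is an exact power of two here, so this equals floor_log2. */
  return exact_log2_i64((*swapped ? m4 : m3) + 1);
}
```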
> 

Sorry, I should have carried out a systematic test beforehand:

    #define TEST(c) \
      unsigned int test_ ## c (unsigned int a, unsigned int b) { \
        return (a & (-1U >> c)) | (b & ~(-1U >> c)); \
      }
    TEST(1)
    TEST(2)
      ...
    TEST(30)
    TEST(31)
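Note that TEST(c) takes the low 32 - c bits from a and the high c bits from b. It therefore corresponds to C = 32 - c in the patch description. Here is a small C check of that correspondence. The helper names are hypothetical, and it assumes a 32-bit unsigned int, as on Xtensa.

```c
#include <stdint.h>

/* Body of TEST(c), written with uint32_t for a 32-bit unsigned int. */
static uint32_t test_body(uint32_t a, uint32_t b, unsigned c)
{
  return (a & (UINT32_MAX >> c)) | (b & ~(UINT32_MAX >> c));
}

/* Patch description: bits 0..C-1 from one register, C..31 from the other. */
static uint32_t splice_low_bits(uint32_t lo_src, uint32_t hi_src,
                                unsigned low_bits)
{
  uint32_t low_mask = (UINT32_C(1) << low_bits) - 1;
  return (lo_src & low_mask) | (hi_src & ~low_mask);
}
```

So case a below (c between 1 and 15) means that 17 through 31 low-order bits are taken from a.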

Without this patch, compiling the above yields the following sequences,
depending on c:

 a. 1 through 15: slli (or add.n) + extui + slli + srli + or
 b. 16: extui + slli + extui + or
 c. 17 through 20: srli + slli + extui + or
 d. 21 through 31: movi(.n) + and + extui + or

Clearly, the patch should be restricted to apply only to case a.
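Tallying the sequences listed above, only case a takes five instructions. Cases b through d already take four, which is the same count as the two shifts, ssai and src produced by the split. Below is a hypothetical C sketch of the resulting profitability check, expressed in terms of TEST's c. It is not the actual revised pattern. In the pattern itself, the equivalent restriction would presumably be on the low-bit count computed in the preparation statements, requiring it to be 17 or more.

```c
#include <stdbool.h>

/* Instruction counts from the list above, for the unpatched compiler. */
static int baseline_insns(int c)
{
  if (c >= 1 && c <= 15)
    return 5;   /* case a: slli (or add.n) + extui + slli + srli + or */
  return 4;     /* cases b, c, d: four instructions each */
}

/* The split emits two shifts, ssai and src. */
enum { SPLICE_INSNS = 4 };

/* Hypothetical check in terms of TEST's c (1..31); c corresponds to a
   low-bit count of 32 - c in the split. */
static bool splice_saves_insn(int c)
{
  return baseline_insns(c) > SPLICE_INSNS;
}
```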