From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id A415B3858CDB for ; Wed, 24 May 2023 12:31:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A415B3858CDB Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4D0751042; Wed, 24 May 2023 05:32:04 -0700 (PDT) Received: from [10.2.78.54] (e120077-lin.cambridge.arm.com [10.2.78.54]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id C2FA03F840; Wed, 24 May 2023 05:31:18 -0700 (PDT) Message-ID: Date: Wed, 24 May 2023 13:31:17 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0 Subject: Re: Wrong cost computation / conclusion ins insn combine? Content-Language: en-GB To: Georg-Johann Lay , gcc@gcc.gnu.org References: <1ed99189-7db3-6b97-db6b-86ce295c075a@gjlay.de> From: "Richard Earnshaw (lists)" In-Reply-To: <1ed99189-7db3-6b97-db6b-86ce295c075a@gjlay.de> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-3491.5 required=5.0 tests=BAYES_00,BODY_8BITS,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,NICE_REPLY_A,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 23/05/2023 19:41, Georg-Johann Lay wrote: > For some time now I am staring at the following test case and what > combine does with it: > > typedef struct > { >     unsigned b0 : 1; >     unsigned b1 : 1; >     unsigned b2 : 1; >     unsigned b3 : 1; >     unsigned b4 : 1; >     unsigned b5 : 1; >     unsigned b6 : 1; >     unsigned b7 : 1; > } b_t; > > Prior to combine, there is: > > insn_cost 4 for    18: r52:QI = r24:QI > insn_cost 4 for     2: r47:QI = r52:QI > insn_cost 4 for     6: r48:QI = zero_extract(r47:QI,0x1,0) > insn_cost 4 for     7: r50:QI = 0x1 > insn_cost 4 for     8: r49:QI = r48:QI ^ r50:QI > insn_cost 8 for     9: zero_extract(r47:QI,0x1,0) = r49:QI > insn_cost 4 for    15: r24:QI = r47:QI > > So insn 6 extracts bit 0, insn 8 flips it, and insn 9 inserts > it as bit 0 again. > > Combine then starts looking for combinations, and at some point > comes up with: > > > Trying 7 -> 9: >     7: r50:QI = ~r47:QI >     9: zero_extract(r47:QI,0x1,0) = r50:QI > Successfully matched this instruction: > (set (zero_extract:QI (reg/v:QI 47 [ a ]) >                       (const_int 1 [0x1]) >                       (const_int 0 [0])) >      (not:QI (reg/v:QI 47 [ a ]))) > allowing combination of insns 7 and 9 > original costs 4 + 8 = 12 > replacement cost 12 > deferring deletion of insn with uid = 7. > modifying insn i3     9: zero_extract(r47:QI,0x1,0)=~r47:QI > deferring rescan insn with uid = 9. > > So the cost is 12 and this insn is accepted. But down the line, > combine cooks up this: > > Trying 2, 9 -> 15: >     2: r47:QI = r52:QI >     9: zero_extract(r47:QI,0x1,0) = ~r47:QI >    15: r24:QI = r47:QI > ... > Successfully matched this instruction: > (set (reg/i:QI 24 r24) >      (ior:QI (and:QI (reg:QI 52) >                      (const_int -2 [0xfffffffffffffffe])) >              (and:QI (reg/v:QI 47 [ a ]) >                      (const_int 1 [0x1])))) > allowing combination of insns 2, 9 and 15 > original costs 4 + 12 + 4 = 20 > replacement costs 4 + 12 = 16 > deferring deletion of insn with uid = 2. > modifying insn i2     9: r47:QI=~r52:QI > deferring rescan insn with uid = 9. > modifying insn i3    15: r24:QI=r52:QI&0xfffffffffffffffe|r47:QI&0x1 > deferring rescan insn with uid = 15. > > So this one has a cost of 16 which is more expensive than the first > combination. For example it still needs to flip the bit. > > So why is combine choosing the expensive replacement over the cheap one? Because it thinks it is cheaper. As your log shows, the calculation is: original costs 4 + 12 + 4 = 20 replacement costs 4 + 12 = 16 But the real problem is that two of the instructions in this example are simple register-register move operations which will likely be eliminated during register allocation anyway (due to coalescing) and this throws off the cost calculations. I've seen this sort of thing before; perhaps the best solution would be to override the cost of a simple (register to register) set and give it a cost of zero. Then we'd see that this new sequence is worse than the original. > > Also it combines hard-registers in the 2nd case, but not in the > 1st one.  So the costs get biased towards 2nd. > > Can someone explain why combine takes the more expensive solution? > > Target is avr, compiled with > > $ avr-gcc-14 bits.c -dumpbase "" -S -Os -fdump-rtl-combine-details -dp > > Johann R.