From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=G+7U=BN=arm.com=Richard.Earnshaw@sourceware.org>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id A415B3858CDB
	for <gcc@gcc.gnu.org>; Wed, 24 May 2023 12:31:19 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A415B3858CDB
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4D0751042;
	Wed, 24 May 2023 05:32:04 -0700 (PDT)
Received: from [10.2.78.54] (e120077-lin.cambridge.arm.com [10.2.78.54])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id C2FA03F840;
	Wed, 24 May 2023 05:31:18 -0700 (PDT)
Message-ID: <d73c079b-e808-3a4c-951c-3bd23aff4801@arm.com>
Date: Wed, 24 May 2023 13:31:17 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.11.0
Subject: Re: Wrong cost computation / conclusion ins insn combine?
Content-Language: en-GB
To: Georg-Johann Lay <avr@gjlay.de>, gcc@gcc.gnu.org
References: <1ed99189-7db3-6b97-db6b-86ce295c075a@gjlay.de>
From: "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com>
In-Reply-To: <1ed99189-7db3-6b97-db6b-86ce295c075a@gjlay.de>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-3491.5 required=5.0 tests=BAYES_00,BODY_8BITS,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,NICE_REPLY_A,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc.gcc.gnu.org>

On 23/05/2023 19:41, Georg-Johann Lay wrote:
> For some time now I am staring at the following test case and what
> combine does with it:
> 
> typedef struct
> {
>      unsigned b0 : 1;
>      unsigned b1 : 1;
>      unsigned b2 : 1;
>      unsigned b3 : 1;
>      unsigned b4 : 1;
>      unsigned b5 : 1;
>      unsigned b6 : 1;
>      unsigned b7 : 1;
> } b_t;
> 
> Prior to combine, there is:
> 
> insn_cost 4 for    18: r52:QI = r24:QI
> insn_cost 4 for     2: r47:QI = r52:QI
> insn_cost 4 for     6: r48:QI = zero_extract(r47:QI,0x1,0)
> insn_cost 4 for     7: r50:QI = 0x1
> insn_cost 4 for     8: r49:QI = r48:QI ^ r50:QI
> insn_cost 8 for     9: zero_extract(r47:QI,0x1,0) = r49:QI
> insn_cost 4 for    15: r24:QI = r47:QI
> 
> So insn 6 extracts bit 0, insn 8 flips it, and insn 9 inserts
> it as bit 0 again.
> 
> Combine then starts looking for combinations, and at some point
> comes up with:
> 
> 
> Trying 7 -> 9:
>      7: r50:QI = ~r47:QI
>      9: zero_extract(r47:QI,0x1,0) = r50:QI
> Successfully matched this instruction:
> (set (zero_extract:QI (reg/v:QI 47 [ a ])
>                        (const_int 1 [0x1])
>                        (const_int 0 [0]))
>       (not:QI (reg/v:QI 47 [ a ])))
> allowing combination of insns 7 and 9
> original costs 4 + 8 = 12
> replacement cost 12
> deferring deletion of insn with uid = 7.
> modifying insn i3     9: zero_extract(r47:QI,0x1,0)=~r47:QI
> deferring rescan insn with uid = 9.
> 
> So the cost is 12 and this insn is accepted. But down the line,
> combine cooks up this:
> 
> Trying 2, 9 -> 15:
>      2: r47:QI = r52:QI
>      9: zero_extract(r47:QI,0x1,0) = ~r47:QI
>     15: r24:QI = r47:QI
> ...
> Successfully matched this instruction:
> (set (reg/i:QI 24 r24)
>       (ior:QI (and:QI (reg:QI 52)
>                       (const_int -2 [0xfffffffffffffffe]))
>               (and:QI (reg/v:QI 47 [ a ])
>                       (const_int 1 [0x1]))))
> allowing combination of insns 2, 9 and 15
> original costs 4 + 12 + 4 = 20
> replacement costs 4 + 12 = 16
> deferring deletion of insn with uid = 2.
> modifying insn i2     9: r47:QI=~r52:QI
> deferring rescan insn with uid = 9.
> modifying insn i3    15: r24:QI=r52:QI&0xfffffffffffffffe|r47:QI&0x1
> deferring rescan insn with uid = 15.
> 
> So this one has a cost of 16 which is more expensive than the first
> combination. For example it still needs to flip the bit.
> 
> So why is combine choosing the expensive replacement over the cheap one?

Because it thinks it is cheaper.  As your log shows, the calculation is:

original costs 4 + 12 + 4 = 20
replacement costs 4 + 12 = 16

But the real problem is that two of the instructions in this example are 
simple register-register move operations which will likely be eliminated 
during register allocation anyway (due to coalescing) and this throws 
off the cost calculations.

I've seen this sort of thing before; perhaps the best solution would be 
to override the cost of a simple (register to register) set and give it 
a cost of zero.  Then we'd see that this new sequence is worse than the 
original.

> 
> Also it combines hard-registers in the 2nd case, but not in the
> 1st one.  So the costs get biased towards 2nd.
> 
> Can someone explain why combine takes the more expensive solution?
> 
> Target is avr, compiled with
> 
> $ avr-gcc-14 bits.c -dumpbase "" -S -Os -fdump-rtl-combine-details -dp
> 
> Johann

R.