Am 02.11.23 um 12:50 schrieb Roger Sayle:
> 
> This patch optimizes a few special cases in avr.md's *insv.any_shift.<mode>
> instruction.  This template handles tests for a single bit, where the result
> has only a single (different) bit set.  Currently this always requires a
> three-instruction sequence of a BST, a CLR and a BLD (plus any additional
> CLR instructions to clear the rest of the result bytes).
> The special cases considered here are those that can be done with only two
> instructions (plus CLRs); an ANDI preceded by either a MOV, a SHIFT or a
> SWAP.
> 
> Hence for C=1 in HImode, GCC with -O2 currently generates:
> 
>          bst r24,1
>          clr r24
>          clr r25
>          bld r24,0
> 
> with this patch, we now generate:
> 
>          lsr r24
>          andi r24,1
>          clr r25
> 
> Likewise, HImode C=4 now becomes:
> 
>          swap r24
>          andi r24,1
>          clr r25
> 
> and SImode C=8 now becomes:
> 
>          mov r22,r23
>          andi r22,1
>          clr r23
>          clr r24
>          clr r25
> 
> 
> I've not attempted to model the instruction length accurately for these
> special cases; the logic would be ugly, but it's safe to use the current
> (1 insn longer) length.
> 
> This patch has been (partially) tested with a cross-compiler to avr-elf
> hosted on x86_64, without a simulator, where the compile-only tests in
> the gcc testsuite show no regressions.  If someone could test this more
> thoroughly that would be great.
> 
> 
> 2023-11-02  Roger Sayle  <roger@nextmovesoftware.com>

CCing Andrew.

Hi, here is a version based on yours.

I am still unsure what to do with this insn; one approach would be
a post-reload split, which simplifies the pattern a bit.  However, where
the current pattern would use MOVW, a split version would need one
more instruction because it would emit two MOVs instead of one MOVW.

Splitting would improve the situation when not all of the output bytes
are used by the following code, though.

Maybe Andrew has an idea; he helped a lot to improve code generation
by fixing and tweaking the middle-end using AVR test cases such as those
for PR55181 or PR109907.

Anyway, here is a version that works out exact code lengths, and it
handles some more cases.
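
To make the special cases from the quoted mail concrete, here is a small
host-side sketch in C.  It models the 8-bit AVR instructions involved and
checks that the two-instruction sequences agree with the generic
BST / CLR / BLD result.  Reading C as the right-shift count, i.e.
(x >> C) & 1, is an assumption taken from the quoted assembly.  All
function names are made up for illustration; none of this is code from
the patch, and it covers only the three cases shown above, not the
additional ones.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side models of single AVR instructions on an 8-bit register.
   These helpers are illustrative only and not part of GCC.  */
static uint8_t avr_lsr (uint8_t r)  { return (uint8_t) (r >> 1); }
static uint8_t avr_swap (uint8_t r) { return (uint8_t) ((r << 4) | (r >> 4)); }
static uint8_t avr_andi (uint8_t r, uint8_t k) { return (uint8_t) (r & k); }

/* Generic sequence: BST reads bit SRC into T, CLR zeroes the result,
   BLD stores T into bit DST.  */
uint32_t generic_bst_clr_bld (uint32_t x, unsigned src, unsigned dst)
{
  uint32_t t = (x >> src) & 1u;   /* bst */
  uint32_t res = 0;               /* clr (all result bytes) */
  res |= t << dst;                /* bld */
  return res;
}

/* HImode, C=1:  lsr r24 / andi r24,1 / clr r25  */
uint16_t seq_hi_c1 (uint16_t x)
{
  uint8_t r24 = (uint8_t) (x & 0xff);
  r24 = avr_andi (avr_lsr (r24), 1);
  return r24;                     /* high byte r25 is cleared */
}

/* HImode, C=4:  swap r24 / andi r24,1 / clr r25  */
uint16_t seq_hi_c4 (uint16_t x)
{
  uint8_t r24 = (uint8_t) (x & 0xff);
  r24 = avr_andi (avr_swap (r24), 1);
  return r24;
}

/* SImode, C=8:  mov r22,r23 / andi r22,1 / clr r23..r25  */
uint32_t seq_si_c8 (uint32_t x)
{
  uint8_t r23 = (uint8_t) ((x >> 8) & 0xff);
  uint8_t r22 = r23;              /* mov */
  r22 = avr_andi (r22, 1);
  return r22;                     /* upper three bytes are cleared */
}

/* Return 1 if every modelled sequence matches the generic one.  */
int sequences_agree (void)
{
  for (uint32_t i = 0; i <= 0xffff; i++)
    {
      uint32_t x32 = i | (i << 16);
      if (seq_hi_c1 ((uint16_t) i) != generic_bst_clr_bld (i, 1, 0)
          || seq_hi_c4 ((uint16_t) i) != generic_bst_clr_bld (i, 4, 0)
          || seq_si_c8 (x32) != generic_bst_clr_bld (x32, 8, 0))
        return 0;
    }
  return 1;
}
```

The model says nothing about code length; it only shows why an ANDI after
a LSR, SWAP or MOV yields the same value as the three-instruction form.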

Also, I am not really sure whether test cases that assert certain
instruction sequences from the optimizers are a good idea or rather a
liability: The middle-end is not very good at generating reproducible code
across versions.  In particular, it's not uncommon that newer GCC
versions no longer find some optimizations.  So the attached patch just
has a dg-do run test without asserting anything about the exact code sequence.
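
As a rough illustration of what such a run-only check can look like,
here is a minimal sketch.  It is not the contents of insv-anyshift.c;
the function names, the chosen shift counts and the left-shift variant
are assumptions made for the example.  The idea is to compare noinline
shift-and-mask functions against a reference and never look at the
generated instructions.

```c
#include <assert.h>
#include <stdint.h>

#define NOINLINE __attribute__((noinline))

/* Reference result: bit SRC of X delivered in bit DST, all else zero.  */
static uint32_t ref_move_bit (uint32_t x, unsigned src, unsigned dst)
{
  return ((x >> src) & 1u) ? (UINT32_C (1) << dst) : 0;
}

/* Functions under test.  On AVR these have the shift-then-single-bit-mask
   shape that the insn is meant to match; whether a given GCC version
   actually matches it is deliberately not checked here.  */
NOINLINE uint16_t hi_shr1 (uint16_t x) { return (x >> 1) & 1; }
NOINLINE uint16_t hi_shr4 (uint16_t x) { return (x >> 4) & 1; }
NOINLINE uint32_t si_shr8 (uint32_t x) { return (x >> 8) & 1; }
NOINLINE uint16_t hi_shl3 (uint16_t x) { return (x << 3) & 0x80; }

/* Count inputs for which a function under test differs from the
   reference.  Only results are compared, not instruction sequences.  */
int count_mismatches (void)
{
  int bad = 0;

  for (uint32_t i = 0; i <= 0xffff; i++)
    {
      uint16_t x16 = (uint16_t) i;
      uint32_t x32 = i | (i << 16);

      bad += hi_shr1 (x16) != ref_move_bit (x16, 1, 0);
      bad += hi_shr4 (x16) != ref_move_bit (x16, 4, 0);
      bad += si_shr8 (x32) != ref_move_bit (x32, 8, 0);
      bad += hi_shl3 (x16) != ref_move_bit (x16, 4, 7);
    }

  return bad;
}
```

In an actual dg-do run test, main would call something like
count_mismatches and abort on a non-zero result, so the test keeps
passing even when a later GCC version picks a different sequence.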

Johann

--

Improve insn output for "*insv.any_shift.<mode>".

gcc/
	* config/avr/avr-protos.h (avr_out_insv): New proto.
	* config/avr/avr.md (adjust_len) [insv]: Add to define_attr.
	(*insv.any_shift.<mode>): Output using...
	* config/avr/avr.cc (avr_out_insv): ...this new function.
	(avr_adjust_insn_length) [ADJUST_LEN_INSV]: Handle new case.

gcc/testsuite/
	* gcc.target/avr/torture/insv-anyshift.c: New test.