Just to add a bit more color on this one...

It was originally observed (and isolated from)
_ZN11xalanc_1_1027XalanReferenceCountedObject12addReferenceEPS0_ and
reproduces both for AArch64 and RISC-V.

The basic block (annotated with dynamic instructions executed and
percentage of total dynamic instructions) looks as follows:

> 0x0000000000511488  4589868875  0.4638%  _ZN11xalanc_1_1027XalanReferenceCountedObject12addReferenceEPS0_
>   4518        lw      a4,8(a0)
>   0017029b    addiw   t0,a4,1
>   00552423    sw      t0,8(a0)
>   4685        addi    a3,zero,1
>   00d28363    beq     t0,a3,6         # 0x51149a

This change reduces the instruction count on RISC-V by one compressible
instruction (2 bytes) and on AArch64 by one instruction (4 bytes).
No execution-time improvement (measured on Neoverse-N1), as would be
expected.

--Philipp.

On Thu, 16 Mar 2023 at 17:41, Jeff Law wrote:
>
>
> On 3/16/23 09:27, Manolis Tsamis wrote:
> > For this C testcase:
> >
> > void g();
> > void f(unsigned int *a)
> > {
> >     if (++*a == 1)
> >         g();
> > }
> >
> > GCC will currently emit a comparison with 1 by using the value
> > of *a after the increment. This can be improved by comparing
> > against 0 and using the value before the increment. As a result
> > there is a potentially shorter dependency chain (no need to wait
> > for the result of the +1) and on targets with compare-with-zero
> > instructions the generated code is one instruction shorter.
> >
> > Example from AArch64:
> >
> > Before:
> >         ldr     w1, [x0]
> >         add     w1, w1, 1
> >         str     w1, [x0]
> >         cmp     w1, 1
> >         beq     .L4
> >         ret
> >
> > After:
> >         ldr     w1, [x0]
> >         add     w2, w1, 1
> >         str     w2, [x0]
> >         cbz     w1, .L4
> >         ret
> >
> > gcc/ChangeLog:
> >
> >         * tree-ssa-forwprop.cc (combine_cond_expr_cond):
> >         (forward_propagate_into_comparison_1): Optimize
> >         for zero comparisons.
> Deferring to gcc-14.  Though I'm generally supportive of normalizing to
> a comparison against zero when we safely can :-)
>
> jeff
>