* [PATCH] ifcvt: Improve noce_try_store_flag_mask [PR105314]
@ 2022-04-26 7:39 Jakub Jelinek
2022-04-26 7:58 ` Richard Biener
2022-07-19 10:53 ` Maciej W. Rozycki
0 siblings, 2 replies; 5+ messages in thread
From: Jakub Jelinek @ 2022-04-26 7:39 UTC (permalink / raw)
To: Richard Biener, Jeff Law, Eric Botcazou; +Cc: gcc-patches
Hi!
The following testcase regressed on riscv due to the splitting of critical
edges in the sink pass, similarly to x86_64 compared to GCC 11 we now swap
the edges, whether true or false edge goes to an empty forwarded bb.
From GIMPLE POV, those 2 forms are equivalent, but as can be seen here, for
some ifcvt opts it matters one way or another.
On this testcase, noce_try_store_flag_mask used to trigger and transformed
if (pseudo2) pseudo1 = 0;
into
pseudo1 &= -(pseudo2 == 0);
But with the swapped edges ifcvt actually sees
if (!pseudo2) pseudo3 = pseudo1; else pseudo3 = 0;
and noce_try_store_flag_mask punts. IMHO there is no reason why it
should punt those, it is equivalent to
pseudo3 = pseudo1 & -(pseudo2 == 0);
and especially if the target has 3 operand AND, it shouldn't be any more
costly (and even with 2 operand AND, it might very well happen that RA
can make it happen without any extra moves).
Initially I've just removed the rtx_equal_p calls from the conditions
and didn't add anything there, but that broke aarch64 bootstrap and
regressed some testcases on x86_64, where if_info->a or if_info->b could be
some larger expression that we can't force into a register.
Furthermore, the case where both if_info->a and if_info->b are constants is
better handled by other ifcvt optimizations like noce_try_store_flag
or noce_try_inverse_constants or noce_try_store_flag_constants.
So, I've restricted it to just a REG (perhaps SUBREG of REG might be ok too)
next to what has been handled previously.
Bootstrapped/regtested on {x86_64,i686,armv7hl,aarch64,powerpc64le}-linux,
and tested it on the testcase in a cross to riscv*-linux, ok for trunk?
2022-04-26 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/105314
* ifcvt.cc (noce_try_store_flag_mask): Don't require that the non-zero
operand is equal to if_info->x, instead use the non-zero operand
as one of the operands of AND with if_info->x as target.
* gcc.target/riscv/pr105314.c: New test.
--- gcc/ifcvt.cc.jj 2022-03-15 09:11:55.312988179 +0100
+++ gcc/ifcvt.cc 2022-04-25 17:37:11.278924377 +0200
@@ -1678,10 +1678,10 @@ noce_try_store_flag_mask (struct noce_if
reversep = 0;
if ((if_info->a == const0_rtx
- && rtx_equal_p (if_info->b, if_info->x))
+ && (REG_P (if_info->b) || rtx_equal_p (if_info->b, if_info->x)))
|| ((reversep = (noce_reversed_cond_code (if_info) != UNKNOWN))
&& if_info->b == const0_rtx
- && rtx_equal_p (if_info->a, if_info->x)))
+ && (REG_P (if_info->a) || rtx_equal_p (if_info->a, if_info->x))))
{
start_sequence ();
target = noce_emit_store_flag (if_info,
@@ -1689,7 +1689,7 @@ noce_try_store_flag_mask (struct noce_if
reversep, -1);
if (target)
target = expand_simple_binop (GET_MODE (if_info->x), AND,
- if_info->x,
+ reversep ? if_info->a : if_info->b,
target, if_info->x, 0,
OPTAB_WIDEN);
--- gcc/testsuite/gcc.target/riscv/pr105314.c.jj 2022-04-25 17:41:00.958736306 +0200
+++ gcc/testsuite/gcc.target/riscv/pr105314.c 2022-04-25 17:40:46.237940642 +0200
@@ -0,0 +1,12 @@
+/* PR rtl-optimization/105314 */
+/* { dg-do compile } *
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "\tbeq\t" } } */
+
+long
+foo (long a, long b, long c)
+{
+ if (c)
+ a = 0;
+ return a;
+}
Jakub
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] ifcvt: Improve noce_try_store_flag_mask [PR105314]
2022-04-26 7:39 [PATCH] ifcvt: Improve noce_try_store_flag_mask [PR105314] Jakub Jelinek
@ 2022-04-26 7:58 ` Richard Biener
2022-04-26 8:07 ` Jakub Jelinek
2022-07-19 10:53 ` Maciej W. Rozycki
1 sibling, 1 reply; 5+ messages in thread
From: Richard Biener @ 2022-04-26 7:58 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: Jeff Law, Eric Botcazou, gcc-patches
On Tue, 26 Apr 2022, Jakub Jelinek wrote:
> Hi!
>
> The following testcase regressed on riscv due to the splitting of critical
> edges in the sink pass, similarly to x86_64 compared to GCC 11 we now swap
> the edges, whether true or false edge goes to an empty forwarded bb.
> From GIMPLE POV, those 2 forms are equivalent, but as can be seen here, for
> some ifcvt opts it matters one way or another.
Do we have evidence that one or the other form is "better"? There's
the possibility to change CFG cleanup to process the edges in a
defined {true,false} or {false,true} order rather than edge number order.
Likewise we could order fallthru before EH or abnormal edges here.
It would be a bit intrusive (and come at a cost) since currently we
just iterate over all BBs, seeing if they are forwarders while ordering
would require a different iteration scheme. But doing all this might
also make the CFG cleanup result more stable with respect to IL
representation changes.
> On this testcase, noce_try_store_flag_mask used to trigger and transformed
> if (pseudo2) pseudo1 = 0;
> into
> pseudo1 &= -(pseudo2 == 0);
> But with the swapped edges ifcvt actually sees
> if (!pseudo2) pseudo3 = pseudo1; else pseudo3 = 0;
> and noce_try_store_flag_mask punts. IMHO there is no reason why it
> should punt those, it is equivalent to
> pseudo3 = pseudo1 & -(pseudo2 == 0);
> and especially if the target has 3 operand AND, it shouldn't be any more
> costly (and even with 2 operand AND, it might very well happen that RA
> can make it happen without any extra moves).
>
> Initially I've just removed the rtx_equal_p calls from the conditions
> and didn't add anything there, but that broke aarch64 bootstrap and
> regressed some testcases on x86_64, where if_info->a or if_info->b could be
> some larger expression that we can't force into a register.
> Furthermore, the case where both if_info->a and if_info->b are constants is
> better handled by other ifcvt optimizations like noce_try_store_flag
> or noce_try_inverse_constants or noce_try_store_flag_constants.
> So, I've restricted it to just a REG (perhaps SUBREG of REG might be ok too)
> next to what has been handled previously.
>
> Bootstrapped/regtested on {x86_64,i686,armv7hl,aarch64,powerpc64le}-linux,
> and tested it on the testcase in a cross to riscv*-linux, ok for trunk?
OK.
Thanks,
Richard.
> 2022-04-26 Jakub Jelinek <jakub@redhat.com>
>
> PR rtl-optimization/105314
> * ifcvt.cc (noce_try_store_flag_mask): Don't require that the non-zero
> operand is equal to if_info->x, instead use the non-zero operand
> as one of the operands of AND with if_info->x as target.
>
> * gcc.target/riscv/pr105314.c: New test.
>
> --- gcc/ifcvt.cc.jj 2022-03-15 09:11:55.312988179 +0100
> +++ gcc/ifcvt.cc 2022-04-25 17:37:11.278924377 +0200
> @@ -1678,10 +1678,10 @@ noce_try_store_flag_mask (struct noce_if
> reversep = 0;
>
> if ((if_info->a == const0_rtx
> - && rtx_equal_p (if_info->b, if_info->x))
> + && (REG_P (if_info->b) || rtx_equal_p (if_info->b, if_info->x)))
> || ((reversep = (noce_reversed_cond_code (if_info) != UNKNOWN))
> && if_info->b == const0_rtx
> - && rtx_equal_p (if_info->a, if_info->x)))
> + && (REG_P (if_info->a) || rtx_equal_p (if_info->a, if_info->x))))
> {
> start_sequence ();
> target = noce_emit_store_flag (if_info,
> @@ -1689,7 +1689,7 @@ noce_try_store_flag_mask (struct noce_if
> reversep, -1);
> if (target)
> target = expand_simple_binop (GET_MODE (if_info->x), AND,
> - if_info->x,
> + reversep ? if_info->a : if_info->b,
> target, if_info->x, 0,
> OPTAB_WIDEN);
>
> --- gcc/testsuite/gcc.target/riscv/pr105314.c.jj 2022-04-25 17:41:00.958736306 +0200
> +++ gcc/testsuite/gcc.target/riscv/pr105314.c 2022-04-25 17:40:46.237940642 +0200
> @@ -0,0 +1,12 @@
> +/* PR rtl-optimization/105314 */
> +/* { dg-do compile } *
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not "\tbeq\t" } } */
> +
> +long
> +foo (long a, long b, long c)
> +{
> + if (c)
> + a = 0;
> + return a;
> +}
>
> Jakub
>
>
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] ifcvt: Improve noce_try_store_flag_mask [PR105314]
2022-04-26 7:58 ` Richard Biener
@ 2022-04-26 8:07 ` Jakub Jelinek
2022-04-26 8:39 ` Richard Biener
0 siblings, 1 reply; 5+ messages in thread
From: Jakub Jelinek @ 2022-04-26 8:07 UTC (permalink / raw)
To: Richard Biener; +Cc: Jeff Law, Eric Botcazou, gcc-patches
On Tue, Apr 26, 2022 at 09:58:35AM +0200, Richard Biener wrote:
> > The following testcase regressed on riscv due to the splitting of critical
> > edges in the sink pass, similarly to x86_64 compared to GCC 11 we now swap
> > the edges, whether true or false edge goes to an empty forwarded bb.
> > From GIMPLE POV, those 2 forms are equivalent, but as can be seen here, for
> > some ifcvt opts it matters one way or another.
>
> Do we have evidence that one or the other form is "better"? There's
> the possibility to change CFG cleanup to process the edges in a
> defined {true,false} or {false,true} order rather than edge number order.
> Likewise we could order fallthru before EH or abnormal edges here.
> It would be a bit intrusive (and come at a cost) since currently we
> just iterate over all BBs, seeing if they are forwarders while ordering
> would require a different iteration scheme. But doing all this might
> also make the CFG cleanup result more stable with respect to IL
> representation changes.
I have the feeling that sometimes one order and sometimes another order is
better, sometimes one order means we reuse a pseudo, sometimes the other,
and in some cases reusing a pseudo disables ifcvt optimization, in another
is the only case where it works.
I think it is best to make ifcvt work with any orders, but not sure it will
be always possible. E.g. sometimes the reused pseudo means we need to
allocate an extra temporary, and while say RA can generate the same code
from it, during ifcvt the sequence now might have higher cost.
Also, we have the case of cond_move_process_if_block which doesn't bother
checking any costs, so it can force cases where we otherwise carefully punt
because of costs. I'm afraid to touch that this late and not really sure
I'll have time for it during stage1 either.
Jakub
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] ifcvt: Improve noce_try_store_flag_mask [PR105314]
2022-04-26 8:07 ` Jakub Jelinek
@ 2022-04-26 8:39 ` Richard Biener
0 siblings, 0 replies; 5+ messages in thread
From: Richard Biener @ 2022-04-26 8:39 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: Jeff Law, Eric Botcazou, gcc-patches
On Tue, 26 Apr 2022, Jakub Jelinek wrote:
> On Tue, Apr 26, 2022 at 09:58:35AM +0200, Richard Biener wrote:
> > > The following testcase regressed on riscv due to the splitting of critical
> > > edges in the sink pass, similarly to x86_64 compared to GCC 11 we now swap
> > > the edges, whether true or false edge goes to an empty forwarded bb.
> > > From GIMPLE POV, those 2 forms are equivalent, but as can be seen here, for
> > > some ifcvt opts it matters one way or another.
> >
> > Do we have evidence that one or the other form is "better"? There's
> > the possibility to change CFG cleanup to process the edges in a
> > defined {true,false} or {false,true} order rather than edge number order.
> > Likewise we could order fallthru before EH or abnormal edges here.
> > It would be a bit intrusive (and come at a cost) since currently we
> > just iterate over all BBs, seeing if they are forwarders while ordering
> > would require a different iteration scheme. But doing all this might
> > also make the CFG cleanup result more stable with respect to IL
> > representation changes.
>
> I have the feeling that sometimes one order and sometimes another order is
> better, sometimes one order means we reuse a pseudo, sometimes the other,
> and in some cases reusing a pseudo disables ifcvt optimization, in another
> is the only case where it works.
> I think it is best to make ifcvt work with any orders, but not sure it will
> be always possible. E.g. sometimes the reused pseudo means we need to
> allocate an extra temporary, and while say RA can generate the same code
> from it, during ifcvt the sequence now might have higher cost.
Agreed.
> Also, we have the case of cond_move_process_if_block which doesn't bother
> checking any costs, so it can force cases where we otherwise carefully punt
> because of costs. I'm afraid to touch that this late and not really sure
> I'll have time for it during stage1 either.
OK, I'm keeping sanitizing CFG cleanup on my TODO for stage1 since I
want to look at the critical edge split / forwarder removal anyway for
other reasons.
Richard.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] ifcvt: Improve noce_try_store_flag_mask [PR105314]
2022-04-26 7:39 [PATCH] ifcvt: Improve noce_try_store_flag_mask [PR105314] Jakub Jelinek
2022-04-26 7:58 ` Richard Biener
@ 2022-07-19 10:53 ` Maciej W. Rozycki
1 sibling, 0 replies; 5+ messages in thread
From: Maciej W. Rozycki @ 2022-07-19 10:53 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: Richard Biener, Jeff Law, Eric Botcazou, gcc-patches
Hi Jakub,
On Tue, 26 Apr 2022, Jakub Jelinek via Gcc-patches wrote:
> The following testcase regressed on riscv due to the splitting of critical
> edges in the sink pass, similarly to x86_64 compared to GCC 11 we now swap
> the edges, whether true or false edge goes to an empty forwarded bb.
> >From GIMPLE POV, those 2 forms are equivalent, but as can be seen here, for
> some ifcvt opts it matters one way or another.
[...]
> --- gcc/testsuite/gcc.target/riscv/pr105314.c.jj 2022-04-25 17:41:00.958736306 +0200
> +++ gcc/testsuite/gcc.target/riscv/pr105314.c 2022-04-25 17:40:46.237940642 +0200
> @@ -0,0 +1,12 @@
> +/* PR rtl-optimization/105314 */
> +/* { dg-do compile } *
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not "\tbeq\t" } } */
> +
> +long
> +foo (long a, long b, long c)
> +{
> + if (c)
> + a = 0;
> + return a;
> +}
It's not clear to me why we insist that a branch is to be unconditionally
avoided here. I think it has to be microarchitecture-specific and based
on my observations a branch will or will not be used depending on what the
specific microarchitecture sets the branch cost to (via BRANCH_COST), e.g.
if the cost is set to 1 (which none of our current microarchitectures do),
then a branch is preferred to a three-instruction fall-through sequence,
i.e.:
foo:
beq a2,zero,.L1
li a0,0
.L1:
ret
vs:
foo:
seqz a2,a2
neg a2,a2
and a0,a0,a2
ret
Also with `-mtune=sifive-7-series' code produced here is already as
follows:
foo:
beq a2,zero,1f; mv a0,zero; 1: # movcc
ret
and it only passes the test case due to sloppy coding in the machine
description (which I'll post a fix for separately and which will obviously
uncover the failure expected here) combined with the test's strictness
about using tabs for white space.
Therefore shouldn't the case be somehow restricted to a subset of RISC-V
microarchitectures, ones we know that do not want to see a branch here?
Maciej
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2022-07-19 10:53 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-26 7:39 [PATCH] ifcvt: Improve noce_try_store_flag_mask [PR105314] Jakub Jelinek
2022-04-26 7:58 ` Richard Biener
2022-04-26 8:07 ` Jakub Jelinek
2022-04-26 8:39 ` Richard Biener
2022-07-19 10:53 ` Maciej W. Rozycki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).