* [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.
@ 2017-06-19 13:47 Richard Earnshaw (lists)
2017-06-19 14:08 ` Segher Boessenkool
2017-06-30 9:03 ` Richard Earnshaw (lists)
0 siblings, 2 replies; 11+ messages in thread
From: Richard Earnshaw (lists) @ 2017-06-19 13:47 UTC (permalink / raw)
To: gcc-patches; +Cc: Segher Boessenkool
[-- Attachment #1: Type: text/plain, Size: 905 bytes --]
Many parallel set insns are of the form of a single set that also sets
the condition code flags. In this case the cost of such an insn is
normally the cost of the part that doesn't set the flags, since updating
the condition flags is simply a side effect.
At present all such insns are treated as having unknown cost (ie 0) and
combine assumes that such insns are infinitely more expensive than any
other insn sequence with a non-zero cost.
This patch addresses this problem by allowing insn_rtx_cost to ignore
the condition setting part of a PARALLEL iff there is exactly one
comparison set and one non-comparison set. If the only set operation is
a comparison we still use that as the basis of the insn cost.
* rtlanal.c (insn_rtx_cost): If a parallel contains exactly one
comparison set and one other set, use the cost of the
non-comparison set.
Bootstrapped on aarch64-none-linuxgnu
OK?
R.
[-- Attachment #2: insn-costs.patch --]
[-- Type: text/x-patch, Size: 1285 bytes --]
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index d9f57c3..5cae793 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -5260,23 +5260,41 @@ insn_rtx_cost (rtx pat, bool speed)
int i, cost;
rtx set;
- /* Extract the single set rtx from the instruction pattern.
- We can't use single_set since we only have the pattern. */
+ /* Extract the single set rtx from the instruction pattern. We
+ can't use single_set since we only have the pattern. We also
+ consider PARALLELs of a normal set and and a single comparison.
+ In that case we use the cost of the non-comparison SET operation,
+ which is most-likely to be the real cost of this operation. */
if (GET_CODE (pat) == SET)
set = pat;
else if (GET_CODE (pat) == PARALLEL)
{
set = NULL_RTX;
+ rtx comparison = NULL_RTX;
+
for (i = 0; i < XVECLEN (pat, 0); i++)
{
rtx x = XVECEXP (pat, 0, i);
if (GET_CODE (x) == SET)
{
- if (set)
- return 0;
- set = x;
+ if (GET_CODE (SET_SRC (x)) == COMPARE)
+ {
+ if (comparison)
+ return 0;
+ comparison = x;
+ }
+ else
+ {
+ if (set)
+ return 0;
+ set = x;
+ }
}
}
+
+ if (!set && comparison)
+ set = comparison;
+
if (!set)
return 0;
}
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.
2017-06-19 13:47 [rtlanal] Do a better job of costing parallel sets containing flag-setting operations Richard Earnshaw (lists)
@ 2017-06-19 14:08 ` Segher Boessenkool
2017-06-19 14:28 ` Richard Earnshaw (lists)
2017-06-19 14:45 ` Richard Earnshaw (lists)
2017-06-30 9:03 ` Richard Earnshaw (lists)
1 sibling, 2 replies; 11+ messages in thread
From: Segher Boessenkool @ 2017-06-19 14:08 UTC (permalink / raw)
To: Richard Earnshaw (lists); +Cc: gcc-patches
Hi!
On Mon, Jun 19, 2017 at 02:46:59PM +0100, Richard Earnshaw (lists) wrote:
> Many parallel set insns are of the form of a single set that also sets
> the condition code flags. In this case the cost of such an insn is
> normally the cost of the part that doesn't set the flags, since updating
> the condition flags is simply a side effect.
>
> At present all such insns are treated as having unknown cost (ie 0) and
> combine assumes that such insns are infinitely more expensive than any
> other insn sequence with a non-zero cost.
That's not what combine does: it optimistically assumes any combination
with unknown costs is an improvement.
> This patch addresses this problem by allowing insn_rtx_cost to ignore
> the condition setting part of a PARALLEL iff there is exactly one
> comparison set and one non-comparison set. If the only set operation is
> a comparison we still use that as the basis of the insn cost.
I'll test this on a zillion archs, see what the effect is.
Have you considered costing general parallels as well?
Segher
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.
2017-06-19 14:08 ` Segher Boessenkool
@ 2017-06-19 14:28 ` Richard Earnshaw (lists)
2017-06-19 15:06 ` Segher Boessenkool
2017-06-19 14:45 ` Richard Earnshaw (lists)
1 sibling, 1 reply; 11+ messages in thread
From: Richard Earnshaw (lists) @ 2017-06-19 14:28 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: gcc-patches
On 19/06/17 15:08, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Jun 19, 2017 at 02:46:59PM +0100, Richard Earnshaw (lists) wrote:
>> Many parallel set insns are of the form of a single set that also sets
>> the condition code flags. In this case the cost of such an insn is
>> normally the cost of the part that doesn't set the flags, since updating
>> the condition flags is simply a side effect.
>>
>> At present all such insns are treated as having unknown cost (ie 0) and
>> combine assumes that such insns are infinitely more expensive than any
>> other insn sequence with a non-zero cost.
>
> That's not what combine does: it optimistically assumes any combination
> with unknown costs is an improvement.
So try this testcase on ARM.
unsigned long x, y, z;
int b;
void test()
{
b = __builtin_sub_overflow (y,z, &x);
}
Without the patch, combine rips apart a compare and subtract insn
because it sees it as having cost zero and substitutes it with separate
compare and subtract insns.
ie before:
ldr r3, .L5
ldr r2, .L5+4
ldr r3, [r3]
ldr r2, [r2]
cmp r3, r2 <=====
movcs r0, #0
movcc r0, #1
ldr ip, .L5+8
ldr r1, .L5+12
sub r3, r3, r2 <=====
str r3, [ip]
str r0, [r1]
bx lr
after:
ldr r3, .L5
ldr r2, .L5+4
ldr r3, [r3]
ldr r2, [r2]
subs r3, r3, r2 <====
movcc r1, #1
movcs r1, #0
ldr r0, .L5+8
ldr r2, .L5+12
str r3, [r0]
str r1, [r2]
bx lr
The combine log before the patch shows:
allowing combination of insns 10 and 51
original costs 0 + 8 = 0
replacement costs 4 + 12 = 16
So it is clearly deciding that the original costs are greater than the
replacement costs.
>
>> This patch addresses this problem by allowing insn_rtx_cost to ignore
>> the condition setting part of a PARALLEL iff there is exactly one
>> comparison set and one non-comparison set. If the only set operation is
>> a comparison we still use that as the basis of the insn cost.
>
> I'll test this on a zillion archs, see what the effect is.
>
> Have you considered costing general parallels as well?
>
>
I thought about it but concluded that there's no generically correct
answer. It might be the max of all the individual sets or it might be
the sum, or it might be somewhere in between. For example on ARM the
load/store multiple operations are expressed as parallels, but their
cost will depend on how many loads/stores happen in parallel within the
hardware.
I think we'd need a new back-end hook to handle the other cases sensibly.
R.
> Segher
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.
2017-06-19 14:08 ` Segher Boessenkool
2017-06-19 14:28 ` Richard Earnshaw (lists)
@ 2017-06-19 14:45 ` Richard Earnshaw (lists)
2017-06-19 15:09 ` Segher Boessenkool
1 sibling, 1 reply; 11+ messages in thread
From: Richard Earnshaw (lists) @ 2017-06-19 14:45 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: gcc-patches
On 19/06/17 15:08, Segher Boessenkool wrote:
> Hi!
>
> On Mon, Jun 19, 2017 at 02:46:59PM +0100, Richard Earnshaw (lists) wrote:
>> Many parallel set insns are of the form of a single set that also sets
>> the condition code flags. In this case the cost of such an insn is
>> normally the cost of the part that doesn't set the flags, since updating
>> the condition flags is simply a side effect.
>>
>> At present all such insns are treated as having unknown cost (ie 0) and
>> combine assumes that such insns are infinitely more expensive than any
>> other insn sequence with a non-zero cost.
>
> That's not what combine does: it optimistically assumes any combination
> with unknown costs is an improvement.
Actually the logic is
int reject = old_cost > 0 && new_cost > old_cost;
So reject will never be true if old cost is zero.
R.
>
>> This patch addresses this problem by allowing insn_rtx_cost to ignore
>> the condition setting part of a PARALLEL iff there is exactly one
>> comparison set and one non-comparison set. If the only set operation is
>> a comparison we still use that as the basis of the insn cost.
>
> I'll test this on a zillion archs, see what the effect is.
>
> Have you considered costing general parallels as well?
>
>
> Segher
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.
2017-06-19 14:28 ` Richard Earnshaw (lists)
@ 2017-06-19 15:06 ` Segher Boessenkool
0 siblings, 0 replies; 11+ messages in thread
From: Segher Boessenkool @ 2017-06-19 15:06 UTC (permalink / raw)
To: Richard Earnshaw (lists); +Cc: gcc-patches
On Mon, Jun 19, 2017 at 03:28:20PM +0100, Richard Earnshaw (lists) wrote:
> > That's not what combine does: it optimistically assumes any combination
> > with unknown costs is an improvement.
>
> So try this testcase on ARM.
>
> unsigned long x, y, z;
> int b;
> void test()
> {
> b = __builtin_sub_overflow (y,z, &x);
> }
>
>
> Without the patch, combine rips apart a compare and subtract insn
> because it sees it as having cost zero and substitutes it with separate
> compare and subtract insns.
> The combine log before the patch shows:
>
> allowing combination of insns 10 and 51
> original costs 0 + 8 = 0
> replacement costs 4 + 12 = 16
Yes, this is a good example of a case where your patch helps. Thanks.
> So it is clearly deciding that the original costs are greater than the
> replacement costs.
No: it allows any combination with unknown cost (either old or new cost).
See combine_validate_cost.
> >> This patch addresses this problem by allowing insn_rtx_cost to ignore
> >> the condition setting part of a PARALLEL iff there is exactly one
> >> comparison set and one non-comparison set. If the only set operation is
> >> a comparison we still use that as the basis of the insn cost.
> >
> > I'll test this on a zillion archs, see what the effect is.
> >
> > Have you considered costing general parallels as well?
>
> I thought about it but concluded that there's no generically correct
> answer. It might be the max of all the individual sets or it might be
> the sum, or it might be somewhere in between. For example on ARM the
> load/store multiple operations are expressed as parallels, but their
> cost will depend on how many loads/stores happen in parallel within the
> hardware.
>
> I think we'd need a new back-end hook to handle the other cases sensibly.
And in general make insn_rtx_cost do something more sane than just looking
at a set_src_cost, yeah.
The problem is changing any of this without regressing some targets.
Of course we are in stage 1 now ;-)
Segher
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.
2017-06-19 14:45 ` Richard Earnshaw (lists)
@ 2017-06-19 15:09 ` Segher Boessenkool
2017-06-19 16:01 ` Richard Earnshaw (lists)
0 siblings, 1 reply; 11+ messages in thread
From: Segher Boessenkool @ 2017-06-19 15:09 UTC (permalink / raw)
To: Richard Earnshaw (lists); +Cc: gcc-patches
On Mon, Jun 19, 2017 at 03:45:23PM +0100, Richard Earnshaw (lists) wrote:
> >> At present all such insns are treated as having unknown cost (ie 0) and
> >> combine assumes that such insns are infinitely more expensive than any
> >> other insn sequence with a non-zero cost.
> >
> > That's not what combine does: it optimistically assumes any combination
> > with unknown costs is an improvement.
>
> Actually the logic is
>
> int reject = old_cost > 0 && new_cost > old_cost;
>
> So reject will never be true if old cost is zero.
Yes, exactly; and neither if new_cost is zero. If any cost is unknown
combine just hopes for the best.
Segher
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.
2017-06-19 15:09 ` Segher Boessenkool
@ 2017-06-19 16:01 ` Richard Earnshaw (lists)
2017-06-19 17:41 ` Segher Boessenkool
0 siblings, 1 reply; 11+ messages in thread
From: Richard Earnshaw (lists) @ 2017-06-19 16:01 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: gcc-patches
On 19/06/17 16:09, Segher Boessenkool wrote:
> On Mon, Jun 19, 2017 at 03:45:23PM +0100, Richard Earnshaw (lists) wrote:
>>>> At present all such insns are treated as having unknown cost (ie 0) and
>>>> combine assumes that such insns are infinitely more expensive than any
>>>> other insn sequence with a non-zero cost.
>>>
>>> That's not what combine does: it optimistically assumes any combination
>>> with unknown costs is an improvement.
>>
>> Actually the logic is
>>
>> int reject = old_cost > 0 && new_cost > old_cost;
>>
>> So reject will never be true if old cost is zero.
>
> Yes, exactly; and neither if new_cost is zero. If any cost is unknown
> combine just hopes for the best.
>
>
> Segher
>
Yeah, and I'm not suggesting we change the logic there (sorry if the
description was misleading). Instead I'm proposing that we handle more
cases for parallels to not return zero.
R.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.
2017-06-19 16:01 ` Richard Earnshaw (lists)
@ 2017-06-19 17:41 ` Segher Boessenkool
2017-06-20 12:55 ` Segher Boessenkool
0 siblings, 1 reply; 11+ messages in thread
From: Segher Boessenkool @ 2017-06-19 17:41 UTC (permalink / raw)
To: Richard Earnshaw (lists); +Cc: gcc-patches
On Mon, Jun 19, 2017 at 05:01:10PM +0100, Richard Earnshaw (lists) wrote:
> Yeah, and I'm not suggesting we change the logic there (sorry if the
> description was misleading). Instead I'm proposing that we handle more
> cases for parallels to not return zero.
Right. My test run is half way through, will have results later --
your change looks good to me, but it is always surprising whether
better costs help or not, or even *hurt* good code generation (things
are just too tightly tuned to the current behaviour, so some things
may need retuning).
Segher
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.
2017-06-19 17:41 ` Segher Boessenkool
@ 2017-06-20 12:55 ` Segher Boessenkool
0 siblings, 0 replies; 11+ messages in thread
From: Segher Boessenkool @ 2017-06-20 12:55 UTC (permalink / raw)
To: Richard Earnshaw (lists); +Cc: gcc-patches
On Mon, Jun 19, 2017 at 12:40:53PM -0500, Segher Boessenkool wrote:
> On Mon, Jun 19, 2017 at 05:01:10PM +0100, Richard Earnshaw (lists) wrote:
> > Yeah, and I'm not suggesting we change the logic there (sorry if the
> > description was misleading). Instead I'm proposing that we handle more
> > cases for parallels to not return zero.
>
> Right. My test run is half way through, will have results later --
> your change looks good to me, but it is always surprising whether
> better costs help or not, or even *hurt* good code generation (things
> are just too tightly tuned to the current behaviour, so some things
> may need retuning).
Everything built successfully (31 targets); --enable-checking=yes,rtl,tree
so it took a while, sorry.
The targets with any differences (table shows code size):
old patched
arm 11545709 11545797
powerpc 8442762 8442746
x86_64 10627428 10627363
Arm has very many differences, the others do not. For powerpc (which
is 32-bit, 64-bit showed no differences) most of the difference is
scheduling deciding to do things a bit differently, and most of it
in places where we have not-so-good costs anyway. For arm the effects
often cascade to bb-reorder making different decisions.
Anyway, all differences are small, it is not likely to hurt anything.
I support the patch, if that helps -- but I cannot approve it.
Segher
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.
2017-06-19 13:47 [rtlanal] Do a better job of costing parallel sets containing flag-setting operations Richard Earnshaw (lists)
2017-06-19 14:08 ` Segher Boessenkool
@ 2017-06-30 9:03 ` Richard Earnshaw (lists)
2017-06-30 15:20 ` Jeff Law
1 sibling, 1 reply; 11+ messages in thread
From: Richard Earnshaw (lists) @ 2017-06-30 9:03 UTC (permalink / raw)
To: gcc-patches
On 19/06/17 14:46, Richard Earnshaw (lists) wrote:
> Many parallel set insns are of the form of a single set that also sets
> the condition code flags. In this case the cost of such an insn is
> normally the cost of the part that doesn't set the flags, since updating
> the condition flags is simply a side effect.
>
> At present all such insns are treated as having unknown cost (ie 0) and
> combine assumes that such insns are infinitely more expensive than any
> other insn sequence with a non-zero cost.
>
> This patch addresses this problem by allowing insn_rtx_cost to ignore
> the condition setting part of a PARALLEL iff there is exactly one
> comparison set and one non-comparison set. If the only set operation is
> a comparison we still use that as the basis of the insn cost.
>
> * rtlanal.c (insn_rtx_cost): If a parallel contains exactly one
> comparison set and one other set, use the cost of the
> non-comparison set.
>
> Bootstrapped on aarch64-none-linuxgnu
>
> OK?
>
Ping?
R.
> R.
>
>
> insn-costs.patch
>
>
> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index d9f57c3..5cae793 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -5260,23 +5260,41 @@ insn_rtx_cost (rtx pat, bool speed)
> int i, cost;
> rtx set;
>
> - /* Extract the single set rtx from the instruction pattern.
> - We can't use single_set since we only have the pattern. */
> + /* Extract the single set rtx from the instruction pattern. We
> + can't use single_set since we only have the pattern. We also
> + consider PARALLELs of a normal set and and a single comparison.
> + In that case we use the cost of the non-comparison SET operation,
> + which is most-likely to be the real cost of this operation. */
> if (GET_CODE (pat) == SET)
> set = pat;
> else if (GET_CODE (pat) == PARALLEL)
> {
> set = NULL_RTX;
> + rtx comparison = NULL_RTX;
> +
> for (i = 0; i < XVECLEN (pat, 0); i++)
> {
> rtx x = XVECEXP (pat, 0, i);
> if (GET_CODE (x) == SET)
> {
> - if (set)
> - return 0;
> - set = x;
> + if (GET_CODE (SET_SRC (x)) == COMPARE)
> + {
> + if (comparison)
> + return 0;
> + comparison = x;
> + }
> + else
> + {
> + if (set)
> + return 0;
> + set = x;
> + }
> }
> }
> +
> + if (!set && comparison)
> + set = comparison;
> +
> if (!set)
> return 0;
> }
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [rtlanal] Do a better job of costing parallel sets containing flag-setting operations.
2017-06-30 9:03 ` Richard Earnshaw (lists)
@ 2017-06-30 15:20 ` Jeff Law
0 siblings, 0 replies; 11+ messages in thread
From: Jeff Law @ 2017-06-30 15:20 UTC (permalink / raw)
To: Richard Earnshaw (lists), gcc-patches
On 06/30/2017 03:03 AM, Richard Earnshaw (lists) wrote:
> On 19/06/17 14:46, Richard Earnshaw (lists) wrote:
>> Many parallel set insns are of the form of a single set that also sets
>> the condition code flags. In this case the cost of such an insn is
>> normally the cost of the part that doesn't set the flags, since updating
>> the condition flags is simply a side effect.
>>
>> At present all such insns are treated as having unknown cost (ie 0) and
>> combine assumes that such insns are infinitely more expensive than any
>> other insn sequence with a non-zero cost.
>>
>> This patch addresses this problem by allowing insn_rtx_cost to ignore
>> the condition setting part of a PARALLEL iff there is exactly one
>> comparison set and one non-comparison set. If the only set operation is
>> a comparison we still use that as the basis of the insn cost.
>>
>> * rtlanal.c (insn_rtx_cost): If a parallel contains exactly one
>> comparison set and one other set, use the cost of the
>> non-comparison set.
>>
>> Bootstrapped on aarch64-none-linuxgnu
>>
>> OK?
>>
>
> Ping?
Needs a ChangeLog, with that, OK. Ideally we'd have something which
verifies we're getting the cost right, but that's probably nontrivial to
do in a generic manner.
jeff
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2017-06-30 15:20 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-19 13:47 [rtlanal] Do a better job of costing parallel sets containing flag-setting operations Richard Earnshaw (lists)
2017-06-19 14:08 ` Segher Boessenkool
2017-06-19 14:28 ` Richard Earnshaw (lists)
2017-06-19 15:06 ` Segher Boessenkool
2017-06-19 14:45 ` Richard Earnshaw (lists)
2017-06-19 15:09 ` Segher Boessenkool
2017-06-19 16:01 ` Richard Earnshaw (lists)
2017-06-19 17:41 ` Segher Boessenkool
2017-06-20 12:55 ` Segher Boessenkool
2017-06-30 9:03 ` Richard Earnshaw (lists)
2017-06-30 15:20 ` Jeff Law
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).