public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug other/49095] New: Horrible code generation for trivial decrement with test
@ 2011-05-21 3:46 torvalds@linux-foundation.org
2011-05-21 10:12 ` [Bug rtl-optimization/49095] " rguenth at gcc dot gnu.org
` (12 more replies)
0 siblings, 13 replies; 14+ messages in thread
From: torvalds@linux-foundation.org @ 2011-05-21 3:46 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095
Summary: Horrible code generation for trivial decrement with
test
Product: gcc
Version: 4.5.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: torvalds@linux-foundation.org
This trivial code:
extern void fncall(void *);
int main(int argc, char **argv)
{
if (!--*argv)
fncall(argv);
return 0;
}
compiles into this ridiculous x86-64 assembly language:
movq (%rsi), %rax
subq $1, %rax
testq %rax, %rax
movq %rax, (%rsi)
je .L4
for the "decrement and test result" at -O2.
I'd have expected that any reasonable compiler would generate something like
decq (%rsi)
je .L4
instead, which would be smaller and faster (even a "subq $1" would be fine, but
the decq is one byte shorter).
The problem is more noticeable when the memory location is a structure offset,
when the "load+decrement+store" model really results in relatively much bigger
code due to the silly repetition of the memory address, for absolutely no
advantage.
Is there some way that I haven't found to make gcc use the rmw instructions?
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
2011-05-21 3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
@ 2011-05-21 10:12 ` rguenth at gcc dot gnu.org
2011-05-21 19:22 ` torvalds@linux-foundation.org
` (11 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-05-21 10:12 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target| |x86_64-*-*, i?86-*-*
Status|UNCONFIRMED |NEW
Keywords| |missed-optimization
Last reconfirmed| |2011.05.21 09:52:49
Component|other |rtl-optimization
Ever Confirmed|0 |1
--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-05-21 09:52:49 UTC ---
Confirmed (not using decq is because it is slower for some archs). On the
tree level we cannot do better than
D.2722_2 = *argv_1(D);
D.2723_3 = D.2722_2 + -1;
*argv_1(D) = D.2723_3;
if (D.2723_3 == 0B)
because we lack anything like direct operations on memory (and that's good).
On the RTL side combine tries to do
Trying 7, 8 -> 9:
Failed to match this instruction:
(parallel [
(set (mem/f:DI (reg/v/f:DI 63 [ argv ]) [2 *argv_1(D)+0 S8 A64])
(plus:DI (mem/f:DI (reg/v/f:DI 63 [ argv ]) [2 *argv_1(D)+0 S8
A64])
(const_int -1 [0xffffffffffffffff])))
(set (reg/f:DI 60 [ D.2723 ])
(plus:DI (mem/f:DI (reg/v/f:DI 63 [ argv ]) [2 *argv_1(D)+0 S8
A64])
(const_int -1 [0xffffffffffffffff])))
])
because we have a use of the decrement result in the comparison. It doesn't
try to combine this with the comparison though.
So this case is really special ;) Without the use of the decremented
value we get the desired subq $1, (%rsi).
Manually sinking the store to *argv into the if and the else yields
movq (%rsi), %rbx
subq $1, %rbx
je L4
L2:
movq %rbx, (%rsi)
...
L4:
LCFI4:
movq $0, (%rsi)
movq %rsi, %rdi
movq %rsi, 8(%rsp)
call _fncall
so at least we then can combine the decrement with the test ...
As usual combine doesn't like stores.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
2011-05-21 3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
2011-05-21 10:12 ` [Bug rtl-optimization/49095] " rguenth at gcc dot gnu.org
@ 2011-05-21 19:22 ` torvalds@linux-foundation.org
2011-05-21 21:33 ` torvalds@linux-foundation.org
` (10 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: torvalds@linux-foundation.org @ 2011-05-21 19:22 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095
--- Comment #2 from Linus Torvalds <torvalds@linux-foundation.org> 2011-05-21 18:41:15 UTC ---
(In reply to comment #1)
>
> On the RTL side combine tries to do
>
> Trying 7, 8 -> 9:
> Failed to match this instruction:
> (parallel [
> (set (mem/f:DI (reg/v/f:DI 63 [ argv ]) [2 *argv_1(D)+0 S8 A64])
> (plus:DI (mem/f:DI (reg/v/f:DI 63 [ argv ]) [2 *argv_1(D)+0 S8
> A64])
> (const_int -1 [0xffffffffffffffff])))
> (set (reg/f:DI 60 [ D.2723 ])
> (plus:DI (mem/f:DI (reg/v/f:DI 63 [ argv ]) [2 *argv_1(D)+0 S8
> A64])
> (const_int -1 [0xffffffffffffffff])))
> ])
>
> because we have a use of the decrement result in the comparison. It doesn't
> try to combine this with the comparison though.
Why isn't there a trivial pattern for the combination of "add+cmp0"? It sounds
like a peephole optimization to me.
> So this case is really special ;) Without the use of the decremented
> value we get the desired subq $1, (%rsi).
The whole notion of "decrement and check if zero" is just about as special as
mud.
And I realize that without the "check if zero" part I get the single rmw
instruction, but I was really hoping that gcc would get this kind of really
obvious code right. There is absolutely no question about what the correct
result is, and gcc simply doesn't generate it.
I'm used to gcc sometimes being confused by more complicated things (inline
asms, bitfields etc), but this is really basic code.
The load-store model is fine for a Pentium 4 - those things were not very good
at complex instructions. But it generates horribly big code, and modern x86
chips all want the "operate on memory" version.
> Manually sinking the store to *argv into the if and the else yields
Yeah. And that's pretty horrible.
> As usual combine doesn't like stores.
Is there some reason this can't just be a peephole pattern?
I really thought that gcc has done this before.
Linus
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
2011-05-21 3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
2011-05-21 10:12 ` [Bug rtl-optimization/49095] " rguenth at gcc dot gnu.org
2011-05-21 19:22 ` torvalds@linux-foundation.org
@ 2011-05-21 21:33 ` torvalds@linux-foundation.org
2011-05-27 10:50 ` jakub at gcc dot gnu.org
` (9 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: torvalds@linux-foundation.org @ 2011-05-21 21:33 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095
--- Comment #3 from Linus Torvalds <torvalds@linux-foundation.org> 2011-05-21 20:42:26 UTC ---
Hmm. Looking at that code generation, it strikes me that even with the odd load
store situation, why do we have that "test" instruction?
c: 8b 10 mov (%eax),%edx
e: 83 ea 01 sub $0x1,%edx
11: 85 d2 test %edx,%edx
13: 89 10 mov %edx,(%eax)
15: 74 09 je 20 <main+0x20>
iow, regardless of any complexities of the store, that "sub + test" is just
odd. Gcc knows to simplify that particular sequence in other situations, why
doesn't it simplify it here?
IOW, I can make gcc generate code like
c: 83 e8 01 sub $0x1,%eax
f: 75 07 jne 18 <main+0x18>
with no real problem when it's in registers. No "test" instruction after the
sub. Why does that store matter so much?
It looks like the combine is bring driven by the conditional branch, and then
when the previous instruction from the conditional branch is that store,
everything kind of goes to hell.
Would it be possible to have a peephole for the "arithmetic/logical +
compare-with-zero" case (causing us to just drop the compare), and then have a
separate peephole optimization that triggers the "load + op + store with dead
reg" and turns that into a "op to mem" case?
The MD files do make me confused, so maybe there is some fundamental limitation
to the peephole patterns that makes this impossible?
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
2011-05-21 3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
` (2 preceding siblings ...)
2011-05-21 21:33 ` torvalds@linux-foundation.org
@ 2011-05-27 10:50 ` jakub at gcc dot gnu.org
2011-05-27 12:30 ` jakub at gcc dot gnu.org
` (8 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-05-27 10:50 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
AssignedTo|unassigned at gcc dot |jakub at gcc dot gnu.org
|gnu.org |
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-05-27 10:35:52 UTC ---
Created attachment 24366
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24366
gcc47-pr49095.patch
Untested patch which solves this (for some cases) using peephole2.
The reason combiner can't do this is that there are no LOG_LINKS in between the
comparison and memory store, those two insns are independent and thus don't
ever show up together in the same try_combine call (together with their
dependencies). And we generate:
movq (%rsi), %rax
subq $1, %rax
testq %rax, %rax
movq %rax, (%rsi)
je .L4
instead of
movq (%rsi), %rax
subq $1, %rax
movq %rax, (%rsi)
je .L4
because in case of multiple uses, LOG_LINKS are on the first use, and the
memory store is before the test during combine (only later scheduling changes
it).
Thus the test has no LOG_LINKS at all and isn't being attempted to be merged
together with the subtraction.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
2011-05-21 3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
` (3 preceding siblings ...)
2011-05-27 10:50 ` jakub at gcc dot gnu.org
@ 2011-05-27 12:30 ` jakub at gcc dot gnu.org
2011-05-27 14:22 ` torvalds@linux-foundation.org
` (7 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-05-27 12:30 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #24366|0 |1
is obsolete| |
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-05-27 11:58:37 UTC ---
Created attachment 24368
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24368
gcc47-pr49095.patch
Improved patch with two new peephole2 patterns, which in addition to the
earlier:
reg0 = mem1; reg0 op= nonmem2; mem1 = reg0; if (reg0 != 0)
optimizes also:
reg0 op= mem1; mem1 = reg0; if (reg0 != 0)
and variant of the first, where all the operations are in QI or HImode, except
for op= which is in SImode.
Again, the patch has been tested just on this testcase so far.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
2011-05-21 3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
` (4 preceding siblings ...)
2011-05-27 12:30 ` jakub at gcc dot gnu.org
@ 2011-05-27 14:22 ` torvalds@linux-foundation.org
2011-05-27 14:55 ` jakub at gcc dot gnu.org
` (6 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: torvalds@linux-foundation.org @ 2011-05-27 14:22 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095
--- Comment #6 from Linus Torvalds <torvalds@linux-foundation.org> 2011-05-27 14:15:25 UTC ---
Jakub - the patch looks sane, although I don't currently have a gcc build
environment to actually test it with, and there is no way I'm going to claim
that I follow all the matches and understand why you have that third pattern
with SWI12 (but I'm guessing it's because the op and the test are being done in
the "explicit cast to integer" mode).
One thing I wanted to check, though: I hope everybody is aware that
op $x,mem
is not identically the same as
mov mem,reg
op $x, reg
mov reg,mem
test reg
WRT the carry flag. The "test" version will always clear the carry flag, while
the "op" version will obviously set it according to the "op". For the logical
ops, this isn't a problem (they also clear carry), but for "add" and "sub" this
transformation *will* result in a difference in the C flag.
Now, "carry" is meaningless when talking about compare with zero, so this
should all be trivially ok. But I bring it up because it is possible that gcc
perhaps still uses the "unsigned compare" versions with zero.
In particular, the thing to look out for would be code like
unsigned int *a;
if (--*a > 0)
do_something();
and if the comparison uses "jg" ("unsigned greater than") for the comparison,
it will get the wrong end result.
Now, unsigned compares against zero are always expressible as "ne" or "eq", so
this is not a fundamental problem. In fact, my trivial testing with a few cases
seems to imply that gcc already does that conversion to inequality, and you'd
obviously have exactly the same issue with eliding the "test" instruction for
the cases you already handle (when things are in registers).
So I think everything is fine - but I wanted to mention it explicitly in case
it makes you go "ooh, yes, there are cases when this is an issue"
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
2011-05-21 3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
` (5 preceding siblings ...)
2011-05-27 14:22 ` torvalds@linux-foundation.org
@ 2011-05-27 14:55 ` jakub at gcc dot gnu.org
2011-05-27 16:02 ` torvalds@linux-foundation.org
` (5 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-05-27 14:55 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095
--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-05-27 14:47:08 UTC ---
(In reply to comment #6)
> Jakub - the patch looks sane, although I don't currently have a gcc build
> environment to actually test it with, and there is no way I'm going to claim
> that I follow all the matches and understand why you have that third pattern
> with SWI12 (but I'm guessing it's because the op and the test are being done in
> the "explicit cast to integer" mode).
E.g. in the testcase from the patch, in hcharplus routine peephole2 sees:
(insn 27 6 28 2 (set (reg:QI 0 ax [orig:62 D.3491 ] [62])
(mem/c:QI (reg/v/f:DI 3 bx [orig:64 x ] [64]) [0 *x_1(D)+0 S1 A8]))
pr49095.c:63 66 {*movqi_internal}
(nil))
(insn 28 27 8 2 (parallel [
(set (reg:SI 0 ax [orig:62 D.3491 ] [62])
(plus:SI (reg:SI 0 ax [orig:62 D.3491 ] [62])
(const_int 24 [0x18])))
(clobber (reg:CC 17 flags))
]) pr49095.c:63 249 {*addsi_1}
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
(insn 8 28 9 2 (set (mem:QI (reg/v/f:DI 3 bx [orig:64 x ] [64]) [0 *x_1(D)+0 S1
A8])
(reg:QI 0 ax [orig:62 D.3491 ] [62])) pr49095.c:63 66 {*movqi_internal}
(nil))
(insn 9 8 10 2 (set (reg:CCZ 17 flags)
(compare:CCZ (reg:QI 0 ax [orig:62 D.3491 ] [62])
(const_int 0 [0]))) pr49095.c:63 0 {*cmpqi_ccno_1}
(expr_list:REG_DEAD (reg:QI 0 ax [orig:62 D.3491 ] [62])
(nil)))
It used to be normal QImode addition addqi_1_lea, but it got later on split
to make the code shorter:
;; Avoid redundant prefixes by splitting HImode arithmetic to SImode.
(define_split
[(set (match_operand 0 "register_operand" "")
(match_operator 3 "promotable_binary_operator"
[(match_operand 1 "register_operand" "")
(match_operand 2 "aligned_operand" "")]))
(clobber (reg:CC FLAGS_REG))]
"! TARGET_PARTIAL_REG_STALL && reload_completed
&& ((GET_MODE (operands[0]) == HImode
&& ((optimize_function_for_speed_p (cfun) && !TARGET_FAST_PREFIX)
/* ??? next two lines just !satisfies_constraint_K (...) */
|| !CONST_INT_P (operands[2])
|| satisfies_constraint_K (operands[2])))
|| (GET_MODE (operands[0]) == QImode
&& (TARGET_PROMOTE_QImode || optimize_function_for_size_p (cfun))))"
[(parallel [(set (match_dup 0)
(match_op_dup 3 [(match_dup 1) (match_dup 2)]))
(clobber (reg:CC FLAGS_REG))])]
"operands[0] = gen_lowpart (SImode, operands[0]);
operands[1] = gen_lowpart (SImode, operands[1]);
if (GET_CODE (operands[3]) != ASHIFT)
operands[2] = gen_lowpart (SImode, operands[2]);
PUT_MODE (operands[3], SImode);")
> One thing I wanted to check, though: I hope everybody is aware that
>
> op $x,mem
>
> is not identically the same as
>
> mov mem,reg
> op $x, reg
> mov reg,mem
> test reg
>
> WRT the carry flag. The "test" version will always clear the carry flag, while
> the "op" version will obviously set it according to the "op". For the logical
> ops, this isn't a problem (they also clear carry), but for "add" and "sub" this
> transformation *will* result in a difference in the C flag.
>
> Now, "carry" is meaningless when talking about compare with zero, so this
> should all be trivially ok. But I bring it up because it is possible that gcc
> perhaps still uses the "unsigned compare" versions with zero.
>
> In particular, the thing to look out for would be code like
>
> unsigned int *a;
>
> if (--*a > 0)
> do_something();
>
> and if the comparison uses "jg" ("unsigned greater than") for the comparison,
> it will get the wrong end result.
>
> Now, unsigned compares against zero are always expressible as "ne" or "eq", so
> this is not a fundamental problem. In fact, my trivial testing with a few cases
> seems to imply that gcc already does that conversion to inequality, and you'd
> obviously have exactly the same issue with eliding the "test" instruction for
> the cases you already handle (when things are in registers).
That ought to be handled by the
ix86_match_ccmode tests in the peepholes, using CCGOCmode for PLUS/MINUS and
CCNOmode for AND/IOR/XOR. I've just copied what the actual patterns these
peepholes will turn into use. The documentation about those modes says:
Add CCNO to indicate comparisons against zero that requires
Overflow flag to be unset. Sign bit test is used instead and
thus can be used to form "a&b>0" type of tests.
Add CCGOC to indicate comparisons against zero that allows
unspecified garbage in the Carry and Overflow flag. This
mode is used to simulate comparisons of (a-b) and (a+b)
against zero using sub/cmp/add operations.
and ix86_match_ccmode should check if the test type in which the flags register
is tested uses only the flags that will be set correctly by the operation. In
your testcase the comparison was done in CCZmode, which is documented as:
Add CCZ to indicate that only the Zero flag is valid.
and ix86_match_ccmode allows that for both CCNOmode and CCGOCmode requests, as
the zero flag is correct in that case. If that didn't work properly, we'd have
problems already without these peepholes, as {add,sub,and,ior,xor}<mode>_2
patterns often match during combine when the target is in register, not memory.
BTW, the patch bootstrapped/regtested on both x86_64-linux and i686-linux, I'm
just running second set of bootstrap without the patch to be able to compare
how much the patch changed code on gcc itself.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
2011-05-21 3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
` (6 preceding siblings ...)
2011-05-27 14:55 ` jakub at gcc dot gnu.org
@ 2011-05-27 16:02 ` torvalds@linux-foundation.org
2011-05-27 16:36 ` jakub at gcc dot gnu.org
` (4 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: torvalds@linux-foundation.org @ 2011-05-27 16:02 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095
--- Comment #8 from Linus Torvalds <torvalds@linux-foundation.org> 2011-05-27 15:32:21 UTC ---
(In reply to comment #7)
>
> BTW, the patch bootstrapped/regtested on both x86_64-linux and i686-linux, I'm
> just running second set of bootstrap without the patch to be able to compare
> how much the patch changed code on gcc itself.
I'd love to hear how well it works - I'm in the middle of a busy merge window
and about to leave for a trip to Japan, otherwise I'd just try to set up a gcc
build myself right now.
(Actually, I have a copy of a git tree, but that one fails with
/usr/bin/ld: ../libiberty/libiberty.a(argv.o): relocation R_X86_64_32S
against `_sch_istable' can not be used when making a shared object; recompile
with -fPIC
and due to the merge window I don't really have time to try to figure it out)
Thanks,
Linus
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
2011-05-21 3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
` (7 preceding siblings ...)
2011-05-27 16:02 ` torvalds@linux-foundation.org
@ 2011-05-27 16:36 ` jakub at gcc dot gnu.org
2011-05-27 16:52 ` torvalds@linux-foundation.org
` (3 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-05-27 16:36 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095
--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-05-27 16:33:33 UTC ---
.text sizes before/after the patch (in each case on identical sources, for
cc1plus I've reverted the patch afterwards and did make -j64 cc1plus in gcc/
subdir), the size changes aren't that big, but perhaps other projects use it
more often and/or something is hidden due to alignment.
32-bit before 32-bit after 64-bit before 64-bit after
libstdc++.so.6 0x71708 0x716e8 0x67ee6 0x67ec6
libgcj.so.12 0xa3ec58 0xa3eb98 0x90cd68 0x90cce8
cc1plus 0xe8a29c 0xe8a25c 0xdccf98 0xdccf08
In 64-bit cc1plus it only made a difference in:
c-decl.o
cfg.o
dbxout.o
ebitmap.o
final.o
ira-color.o
jump.o
regcprop.o
reg-stack.o
reload.o
tree-ssa-dom.o
tree-ssa-loop-ivopts.o
Anyway, I'll post the patch now and see what Uros says about it.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
2011-05-21 3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
` (8 preceding siblings ...)
2011-05-27 16:36 ` jakub at gcc dot gnu.org
@ 2011-05-27 16:52 ` torvalds@linux-foundation.org
2011-05-29 18:53 ` jakub at gcc dot gnu.org
` (2 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: torvalds@linux-foundation.org @ 2011-05-27 16:52 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095
--- Comment #10 from Linus Torvalds <torvalds@linux-foundation.org> 2011-05-27 16:48:52 UTC ---
(In reply to comment #9)
>
> 32-bit before 32-bit after 64-bit before 64-bit after
> libstdc++.so.6 0x71708 0x716e8 0x67ee6 0x67ec6
> libgcj.so.12 0xa3ec58 0xa3eb98 0x90cd68 0x90cce8
> cc1plus 0xe8a29c 0xe8a25c 0xdccf98 0xdccf08
Ok, that's much less noticeable than I was hoping for.
That said, even for the kernel, the only reason I noticed this problem was not
because I've seen a lot of them per se, but because the pattern showed up in a
very hot function. In fact, it's the same __rcu_read_unlock() function that I
made the otherwise unrelated bugzilla entry for inlining:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194
so it's quite possible that we don't have that many of them in the kernel
either.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
2011-05-21 3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
` (9 preceding siblings ...)
2011-05-27 16:52 ` torvalds@linux-foundation.org
@ 2011-05-29 18:53 ` jakub at gcc dot gnu.org
2011-05-29 18:57 ` jakub at gcc dot gnu.org
2011-05-29 19:09 ` torvalds@linux-foundation.org
12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-05-29 18:53 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095
--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-05-29 18:51:51 UTC ---
Author: jakub
Date: Sun May 29 18:51:48 2011
New Revision: 174413
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=174413
Log:
PR rtl-optimization/49095
* config/i386/predicates.md (plusminuslogic_operator): New predicate.
* config/i386/i386.md: Add peepholes for mem {+,-,&,|,^}= x; mem != 0.
* gcc.target/i386/pr49095.c: New test.
Added:
trunk/gcc/testsuite/gcc.target/i386/pr49095.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.md
trunk/gcc/config/i386/predicates.md
trunk/gcc/testsuite/ChangeLog
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
2011-05-21 3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
` (10 preceding siblings ...)
2011-05-29 18:53 ` jakub at gcc dot gnu.org
@ 2011-05-29 18:57 ` jakub at gcc dot gnu.org
2011-05-29 19:09 ` torvalds@linux-foundation.org
12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-05-29 18:57 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-05-29 18:52:43 UTC ---
Fixed.
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
2011-05-21 3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
` (11 preceding siblings ...)
2011-05-29 18:57 ` jakub at gcc dot gnu.org
@ 2011-05-29 19:09 ` torvalds@linux-foundation.org
12 siblings, 0 replies; 14+ messages in thread
From: torvalds@linux-foundation.org @ 2011-05-29 19:09 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095
--- Comment #13 from Linus Torvalds <torvalds@linux-foundation.org> 2011-05-29 18:56:44 UTC ---
Thanks guys.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2011-05-29 18:57 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-21 3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
2011-05-21 10:12 ` [Bug rtl-optimization/49095] " rguenth at gcc dot gnu.org
2011-05-21 19:22 ` torvalds@linux-foundation.org
2011-05-21 21:33 ` torvalds@linux-foundation.org
2011-05-27 10:50 ` jakub at gcc dot gnu.org
2011-05-27 12:30 ` jakub at gcc dot gnu.org
2011-05-27 14:22 ` torvalds@linux-foundation.org
2011-05-27 14:55 ` jakub at gcc dot gnu.org
2011-05-27 16:02 ` torvalds@linux-foundation.org
2011-05-27 16:36 ` jakub at gcc dot gnu.org
2011-05-27 16:52 ` torvalds@linux-foundation.org
2011-05-29 18:53 ` jakub at gcc dot gnu.org
2011-05-29 18:57 ` jakub at gcc dot gnu.org
2011-05-29 19:09 ` torvalds@linux-foundation.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).