public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug other/49095] New: Horrible code generation for trivial decrement with test
@ 2011-05-21  3:46 torvalds@linux-foundation.org
  2011-05-21 10:12 ` [Bug rtl-optimization/49095] " rguenth at gcc dot gnu.org
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: torvalds@linux-foundation.org @ 2011-05-21  3:46 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095

           Summary: Horrible code generation for trivial decrement with
                    test
           Product: gcc
           Version: 4.5.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: torvalds@linux-foundation.org


This trivial code:

  extern void fncall(void *);

  int main(int argc, char **argv)
  {
    if (!--*argv)
        fncall(argv);
    return 0;
  }

compiles into this ridiculous x86-64 assembly language:

    movq    (%rsi), %rax
    subq    $1, %rax
    testq    %rax, %rax
    movq    %rax, (%rsi)
    je    .L4

for the "decrement and test result" at -O2. 

I'd have expected that any reasonable compiler would generate something like

    decq    (%rsi)
    je    .L4

instead, which would be smaller and faster (even a "subq $1" would be fine, but
the decq is one byte shorter).

The problem is more noticeable when the memory location is a structure offset,
when the "load+decrement+store" model really results in relatively much bigger
code due to the silly repetition of the memory address, for absolutely no
advantage.

Is there some way that I haven't found to make gcc use the rmw instructions?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
  2011-05-21  3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
@ 2011-05-21 10:12 ` rguenth at gcc dot gnu.org
  2011-05-21 19:22 ` torvalds@linux-foundation.org
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-05-21 10:12 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*, i?86-*-*
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2011.05.21 09:52:49
          Component|other                       |rtl-optimization
     Ever Confirmed|0                           |1

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-05-21 09:52:49 UTC ---
Confirmed (not using decq is because it is slower for some archs).  On the
tree level we cannot do better than

  D.2722_2 = *argv_1(D);
  D.2723_3 = D.2722_2 + -1;
  *argv_1(D) = D.2723_3;
  if (D.2723_3 == 0B)

because we lack anything like direct operations on memory (and that's good).
On the RTL side combine tries to do

Trying 7, 8 -> 9:
Failed to match this instruction:
(parallel [
        (set (mem/f:DI (reg/v/f:DI 63 [ argv ]) [2 *argv_1(D)+0 S8 A64])
            (plus:DI (mem/f:DI (reg/v/f:DI 63 [ argv ]) [2 *argv_1(D)+0 S8
A64])
                (const_int -1 [0xffffffffffffffff])))
        (set (reg/f:DI 60 [ D.2723 ])
            (plus:DI (mem/f:DI (reg/v/f:DI 63 [ argv ]) [2 *argv_1(D)+0 S8
A64])
                (const_int -1 [0xffffffffffffffff])))
    ])

because we have a use of the decrement result in the comparison.  It doesn't
try to combine this with the comparison though.

So this case is really special ;)  Without the use of the decremented
value we get the desired subq $1, (%rsi).

Manually sinking the store to *argv into the if and the else yields

        movq    (%rsi), %rbx
        subq    $1, %rbx
        je      L4
L2:
        movq    %rbx, (%rsi)
...

L4:
LCFI4:
        movq    $0, (%rsi)
        movq    %rsi, %rdi
        movq    %rsi, 8(%rsp)
        call    _fncall

so at least we then can combine the decrement with the test ...

As usual combine doesn't like stores.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
  2011-05-21  3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
  2011-05-21 10:12 ` [Bug rtl-optimization/49095] " rguenth at gcc dot gnu.org
@ 2011-05-21 19:22 ` torvalds@linux-foundation.org
  2011-05-21 21:33 ` torvalds@linux-foundation.org
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: torvalds@linux-foundation.org @ 2011-05-21 19:22 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095

--- Comment #2 from Linus Torvalds <torvalds@linux-foundation.org> 2011-05-21 18:41:15 UTC ---
(In reply to comment #1)
>
> On the RTL side combine tries to do
> 
> Trying 7, 8 -> 9:
> Failed to match this instruction:
> (parallel [
>         (set (mem/f:DI (reg/v/f:DI 63 [ argv ]) [2 *argv_1(D)+0 S8 A64])
>             (plus:DI (mem/f:DI (reg/v/f:DI 63 [ argv ]) [2 *argv_1(D)+0 S8
> A64])
>                 (const_int -1 [0xffffffffffffffff])))
>         (set (reg/f:DI 60 [ D.2723 ])
>             (plus:DI (mem/f:DI (reg/v/f:DI 63 [ argv ]) [2 *argv_1(D)+0 S8
> A64])
>                 (const_int -1 [0xffffffffffffffff])))
>     ])
> 
> because we have a use of the decrement result in the comparison.  It doesn't
> try to combine this with the comparison though.

Why isn't there a trivial pattern for the combination of "add+cmp0"? It sounds
like a peephole optimization to me.

> So this case is really special ;)  Without the use of the decremented
> value we get the desired subq $1, (%rsi).

The whole notion of "decrement and check if zero" is just about as special as
mud. 

And I realize that without the "check if zero" part I get the single rmw
instruction, but I was really hoping that gcc would get this kind of really
obvious code right. There is absolutely no question about what the correct
result is, and gcc simply doesn't generate it.

I'm used to gcc sometimes being confused by more complicated things (inline
asms, bitfields etc), but this is really basic code.

The load-store model is fine for a Pentium 4 - those things were not very good
at complex instructions. But it generates horribly big code, and modern x86
chips all want the "operate on memory" version.

> Manually sinking the store to *argv into the if and the else yields

Yeah. And that's pretty horrible. 

> As usual combine doesn't like stores.

Is there some reason this can't just be a peephole pattern?

I really thought that gcc has done this before. 

                       Linus


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
  2011-05-21  3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
  2011-05-21 10:12 ` [Bug rtl-optimization/49095] " rguenth at gcc dot gnu.org
  2011-05-21 19:22 ` torvalds@linux-foundation.org
@ 2011-05-21 21:33 ` torvalds@linux-foundation.org
  2011-05-27 10:50 ` jakub at gcc dot gnu.org
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: torvalds@linux-foundation.org @ 2011-05-21 21:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095

--- Comment #3 from Linus Torvalds <torvalds@linux-foundation.org> 2011-05-21 20:42:26 UTC ---
Hmm. Looking at that code generation, it strikes me that even with the odd load
store situation, why do we have that "test" instruction?

   c:    8b 10                    mov    (%eax),%edx
   e:    83 ea 01                 sub    $0x1,%edx
  11:    85 d2                    test   %edx,%edx
  13:    89 10                    mov    %edx,(%eax)
  15:    74 09                    je     20 <main+0x20>

iow, regardless of any complexities of the store, that "sub + test" is just
odd. Gcc knows to simplify that particular sequence in other situations, why
doesn't it simplify it here?

IOW, I can make gcc generate code like

   c:    83 e8 01                 sub    $0x1,%eax
   f:    75 07                    jne    18 <main+0x18>

with no real problem when it's in registers. No "test" instruction after the
sub. Why does that store matter so much?

It looks like the combine is bring driven by the conditional branch, and then
when the previous instruction from the conditional branch is that store,
everything kind of goes to hell.

Would it be possible to have a peephole for the "arithmetic/logical +
compare-with-zero" case (causing us to just drop the compare), and then have a
separate peephole optimization that triggers the "load + op + store with dead
reg" and turns that into a "op to mem" case?

The MD files do make me confused, so maybe there is some fundamental limitation
to the peephole patterns that makes this impossible?


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
  2011-05-21  3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
                   ` (2 preceding siblings ...)
  2011-05-21 21:33 ` torvalds@linux-foundation.org
@ 2011-05-27 10:50 ` jakub at gcc dot gnu.org
  2011-05-27 12:30 ` jakub at gcc dot gnu.org
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-05-27 10:50 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |jakub at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-05-27 10:35:52 UTC ---
Created attachment 24366
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24366
gcc47-pr49095.patch

Untested patch which solves this (for some cases) using peephole2.

The reason combiner can't do this is that there are no LOG_LINKS in between the
comparison and memory store, those two insns are independent and thus don't
ever show up together in the same try_combine call (together with their
dependencies).  And we generate:
    movq    (%rsi), %rax
    subq    $1, %rax
    testq    %rax, %rax
    movq    %rax, (%rsi)
    je    .L4
instead of
    movq    (%rsi), %rax
    subq    $1, %rax
    movq    %rax, (%rsi)
    je    .L4
because in case of multiple uses, LOG_LINKS are on the first use, and the
memory store is before the test during combine (only later scheduling changes
it).
Thus the test has no LOG_LINKS at all and isn't being attempted to be merged
together with the subtraction.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
  2011-05-21  3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
                   ` (3 preceding siblings ...)
  2011-05-27 10:50 ` jakub at gcc dot gnu.org
@ 2011-05-27 12:30 ` jakub at gcc dot gnu.org
  2011-05-27 14:22 ` torvalds@linux-foundation.org
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-05-27 12:30 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #24366|0                           |1
        is obsolete|                            |

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-05-27 11:58:37 UTC ---
Created attachment 24368
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24368
gcc47-pr49095.patch

Improved patch with two new peephole2 patterns, which in addition to the
earlier:
reg0 = mem1; reg0 op= nonmem2; mem1 = reg0; if (reg0 != 0)
optimizes also:
reg0 op= mem1; mem1 = reg0; if (reg0 != 0)
and variant of the first, where all the operations are in QI or HImode, except
for op= which is in SImode.

Again, the patch has been tested just on this testcase so far.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
  2011-05-21  3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
                   ` (4 preceding siblings ...)
  2011-05-27 12:30 ` jakub at gcc dot gnu.org
@ 2011-05-27 14:22 ` torvalds@linux-foundation.org
  2011-05-27 14:55 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: torvalds@linux-foundation.org @ 2011-05-27 14:22 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095

--- Comment #6 from Linus Torvalds <torvalds@linux-foundation.org> 2011-05-27 14:15:25 UTC ---
Jakub - the patch looks sane, although I don't currently have a gcc build
environment to actually test it with, and there is no way I'm going to claim
that I follow all the matches and understand why you have that third pattern
with SWI12 (but I'm guessing it's because the op and the test are being done in
the "explicit cast to integer" mode).

One thing I wanted to check, though: I hope everybody is aware that

   op $x,mem

is not identically the same as

   mov mem,reg
   op $x, reg
   mov reg,mem
   test reg

WRT the carry flag. The "test" version will always clear the carry flag, while
the "op" version will obviously set it according to the "op". For the logical
ops, this isn't a problem (they also clear carry), but for "add" and "sub" this
transformation *will* result in a difference in the C flag.

Now, "carry" is meaningless when talking about compare with zero, so this
should all be trivially ok. But I bring it up because it is possible that gcc
perhaps still uses the "unsigned compare" versions with zero.

In particular, the thing to look out for would be code like

  unsigned int *a;

  if (--*a > 0)
    do_something();

and if the comparison uses "jg" ("unsigned greater than") for the comparison,
it will get the wrong end result.

Now, unsigned compares against zero are always expressible as "ne" or "eq", so
this is not a fundamental problem. In fact, my trivial testing with a few cases
seems to imply that gcc already does that conversion to inequality, and you'd
obviously have exactly the same issue with eliding the "test" instruction for
the cases you already handle (when things are in registers).

So I think everything is fine - but I wanted to mention it explicitly in case
it makes you go "ooh, yes, there are cases when this is an issue"


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
  2011-05-21  3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
                   ` (5 preceding siblings ...)
  2011-05-27 14:22 ` torvalds@linux-foundation.org
@ 2011-05-27 14:55 ` jakub at gcc dot gnu.org
  2011-05-27 16:02 ` torvalds@linux-foundation.org
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-05-27 14:55 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-05-27 14:47:08 UTC ---
(In reply to comment #6)
> Jakub - the patch looks sane, although I don't currently have a gcc build
> environment to actually test it with, and there is no way I'm going to claim
> that I follow all the matches and understand why you have that third pattern
> with SWI12 (but I'm guessing it's because the op and the test are being done in
> the "explicit cast to integer" mode).

E.g. in the testcase from the patch, in hcharplus routine peephole2 sees:
(insn 27 6 28 2 (set (reg:QI 0 ax [orig:62 D.3491 ] [62])
        (mem/c:QI (reg/v/f:DI 3 bx [orig:64 x ] [64]) [0 *x_1(D)+0 S1 A8]))
pr49095.c:63 66 {*movqi_internal}
     (nil))

(insn 28 27 8 2 (parallel [
            (set (reg:SI 0 ax [orig:62 D.3491 ] [62])
                (plus:SI (reg:SI 0 ax [orig:62 D.3491 ] [62])
                    (const_int 24 [0x18])))
            (clobber (reg:CC 17 flags))
        ]) pr49095.c:63 249 {*addsi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))

(insn 8 28 9 2 (set (mem:QI (reg/v/f:DI 3 bx [orig:64 x ] [64]) [0 *x_1(D)+0 S1
A8])
        (reg:QI 0 ax [orig:62 D.3491 ] [62])) pr49095.c:63 66 {*movqi_internal}
     (nil))

(insn 9 8 10 2 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:QI 0 ax [orig:62 D.3491 ] [62])
            (const_int 0 [0]))) pr49095.c:63 0 {*cmpqi_ccno_1}
     (expr_list:REG_DEAD (reg:QI 0 ax [orig:62 D.3491 ] [62])
        (nil)))

It used to be normal QImode addition addqi_1_lea, but it got later on split
to make the code shorter:

;; Avoid redundant prefixes by splitting HImode arithmetic to SImode.

(define_split
  [(set (match_operand 0 "register_operand" "")
        (match_operator 3 "promotable_binary_operator"
           [(match_operand 1 "register_operand" "")
            (match_operand 2 "aligned_operand" "")]))
   (clobber (reg:CC FLAGS_REG))]
  "! TARGET_PARTIAL_REG_STALL && reload_completed
   && ((GET_MODE (operands[0]) == HImode
        && ((optimize_function_for_speed_p (cfun) && !TARGET_FAST_PREFIX)
            /* ??? next two lines just !satisfies_constraint_K (...) */
            || !CONST_INT_P (operands[2])
            || satisfies_constraint_K (operands[2])))
       || (GET_MODE (operands[0]) == QImode               
           && (TARGET_PROMOTE_QImode || optimize_function_for_size_p (cfun))))"
  [(parallel [(set (match_dup 0)                          
                   (match_op_dup 3 [(match_dup 1) (match_dup 2)]))
              (clobber (reg:CC FLAGS_REG))])]             
  "operands[0] = gen_lowpart (SImode, operands[0]);       
   operands[1] = gen_lowpart (SImode, operands[1]);       
   if (GET_CODE (operands[3]) != ASHIFT)
     operands[2] = gen_lowpart (SImode, operands[2]);
   PUT_MODE (operands[3], SImode);")

> One thing I wanted to check, though: I hope everybody is aware that
> 
>    op $x,mem
> 
> is not identically the same as
> 
>    mov mem,reg
>    op $x, reg
>    mov reg,mem
>    test reg
> 
> WRT the carry flag. The "test" version will always clear the carry flag, while
> the "op" version will obviously set it according to the "op". For the logical
> ops, this isn't a problem (they also clear carry), but for "add" and "sub" this
> transformation *will* result in a difference in the C flag.
> 
> Now, "carry" is meaningless when talking about compare with zero, so this
> should all be trivially ok. But I bring it up because it is possible that gcc
> perhaps still uses the "unsigned compare" versions with zero.
> 
> In particular, the thing to look out for would be code like
> 
>   unsigned int *a;
> 
>   if (--*a > 0)
>     do_something();
> 
> and if the comparison uses "jg" ("unsigned greater than") for the comparison,
> it will get the wrong end result.
> 
> Now, unsigned compares against zero are always expressible as "ne" or "eq", so
> this is not a fundamental problem. In fact, my trivial testing with a few cases
> seems to imply that gcc already does that conversion to inequality, and you'd
> obviously have exactly the same issue with eliding the "test" instruction for
> the cases you already handle (when things are in registers).

That ought to be handled by the
ix86_match_ccmode tests in the peepholes, using CCGOCmode for PLUS/MINUS and
CCNOmode for AND/IOR/XOR.  I've just copied what the actual patterns these
peepholes will turn into use.  The documentation about those modes says:
   Add CCNO to indicate comparisons against zero that requires
   Overflow flag to be unset.  Sign bit test is used instead and
   thus can be used to form "a&b>0" type of tests.

   Add CCGOC to indicate comparisons against zero that allows
   unspecified garbage in the Carry and Overflow flag. This
   mode is used to simulate comparisons of (a-b) and (a+b)
   against zero using sub/cmp/add operations.
and ix86_match_ccmode should check if the test type in which the flags register
is tested uses only the flags that will be set correctly by the operation.  In
your testcase the comparison was done in CCZmode, which is documented as:
   Add CCZ to indicate that only the Zero flag is valid.
and ix86_match_ccmode allows that for both CCNOmode and CCGOCmode requests, as
the zero flag is correct in that case.  If that didn't work properly, we'd have
problems already without these peepholes, as {add,sub,and,ior,xor}<mode>_2
patterns often match during combine when the target is in register, not memory.

BTW, the patch bootstrapped/regtested on both x86_64-linux and i686-linux, I'm
just running second set of bootstrap without the patch to be able to compare
how much the patch changed code on gcc itself.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
  2011-05-21  3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
                   ` (6 preceding siblings ...)
  2011-05-27 14:55 ` jakub at gcc dot gnu.org
@ 2011-05-27 16:02 ` torvalds@linux-foundation.org
  2011-05-27 16:36 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: torvalds@linux-foundation.org @ 2011-05-27 16:02 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095

--- Comment #8 from Linus Torvalds <torvalds@linux-foundation.org> 2011-05-27 15:32:21 UTC ---
(In reply to comment #7)
>
> BTW, the patch bootstrapped/regtested on both x86_64-linux and i686-linux, I'm
> just running second set of bootstrap without the patch to be able to compare
> how much the patch changed code on gcc itself.

I'd love to hear how well it works - I'm in the middle of a busy merge window
and about to leave for a trip to Japan, otherwise I'd just try to set up a gcc
build myself right now.

(Actually, I have a copy of a git tree, but that one fails with

  /usr/bin/ld: ../libiberty/libiberty.a(argv.o): relocation R_X86_64_32S
against `_sch_istable' can not be used when making a shared object; recompile
with -fPIC

and due to the merge window I don't really have time to try to figure it out)

Thanks,

                     Linus


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
  2011-05-21  3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
                   ` (7 preceding siblings ...)
  2011-05-27 16:02 ` torvalds@linux-foundation.org
@ 2011-05-27 16:36 ` jakub at gcc dot gnu.org
  2011-05-27 16:52 ` torvalds@linux-foundation.org
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-05-27 16:36 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-05-27 16:33:33 UTC ---
.text sizes before/after the patch (in each case on identical sources, for
cc1plus I've reverted the patch afterwards and did make -j64 cc1plus in gcc/
subdir), the size changes aren't that big, but perhaps other projects use it
more often and/or something is hidden due to alignment.

        32-bit before    32-bit after    64-bit before    64-bit after
libstdc++.so.6    0x71708        0x716e8        0x67ee6        0x67ec6
libgcj.so.12    0xa3ec58    0xa3eb98    0x90cd68    0x90cce8
cc1plus        0xe8a29c    0xe8a25c    0xdccf98    0xdccf08

In 64-bit cc1plus it only made a difference in:
c-decl.o
cfg.o
dbxout.o
ebitmap.o
final.o
ira-color.o
jump.o
regcprop.o
reg-stack.o
reload.o
tree-ssa-dom.o
tree-ssa-loop-ivopts.o

Anyway, I'll post the patch now and see what Uros says about it.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
  2011-05-21  3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
                   ` (8 preceding siblings ...)
  2011-05-27 16:36 ` jakub at gcc dot gnu.org
@ 2011-05-27 16:52 ` torvalds@linux-foundation.org
  2011-05-29 18:53 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: torvalds@linux-foundation.org @ 2011-05-27 16:52 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095

--- Comment #10 from Linus Torvalds <torvalds@linux-foundation.org> 2011-05-27 16:48:52 UTC ---
(In reply to comment #9)
> 
>         32-bit before    32-bit after    64-bit before    64-bit after
> libstdc++.so.6    0x71708        0x716e8        0x67ee6        0x67ec6
> libgcj.so.12    0xa3ec58    0xa3eb98    0x90cd68    0x90cce8
> cc1plus        0xe8a29c    0xe8a25c    0xdccf98    0xdccf08

Ok, that's much less noticeable than I was hoping for.

That said, even for the kernel, the only reason I noticed this problem was not
because I've seen a lot of them per se, but because the pattern showed up in a
very hot function. In fact, it's the same __rcu_read_unlock() function that I
made the otherwise unrelated bugzilla entry for inlining: 

  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49194

so it's quite possible that we don't have that many of them in the kernel
either.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
  2011-05-21  3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
                   ` (9 preceding siblings ...)
  2011-05-27 16:52 ` torvalds@linux-foundation.org
@ 2011-05-29 18:53 ` jakub at gcc dot gnu.org
  2011-05-29 18:57 ` jakub at gcc dot gnu.org
  2011-05-29 19:09 ` torvalds@linux-foundation.org
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-05-29 18:53 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095

--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-05-29 18:51:51 UTC ---
Author: jakub
Date: Sun May 29 18:51:48 2011
New Revision: 174413

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=174413
Log:
    PR rtl-optimization/49095
    * config/i386/predicates.md (plusminuslogic_operator): New predicate.
    * config/i386/i386.md: Add peepholes for mem {+,-,&,|,^}= x; mem != 0.

    * gcc.target/i386/pr49095.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/pr49095.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.md
    trunk/gcc/config/i386/predicates.md
    trunk/gcc/testsuite/ChangeLog


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
  2011-05-21  3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
                   ` (10 preceding siblings ...)
  2011-05-29 18:53 ` jakub at gcc dot gnu.org
@ 2011-05-29 18:57 ` jakub at gcc dot gnu.org
  2011-05-29 19:09 ` torvalds@linux-foundation.org
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-05-29 18:57 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #12 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-05-29 18:52:43 UTC ---
Fixed.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug rtl-optimization/49095] Horrible code generation for trivial decrement with test
  2011-05-21  3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
                   ` (11 preceding siblings ...)
  2011-05-29 18:57 ` jakub at gcc dot gnu.org
@ 2011-05-29 19:09 ` torvalds@linux-foundation.org
  12 siblings, 0 replies; 14+ messages in thread
From: torvalds@linux-foundation.org @ 2011-05-29 19:09 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49095

--- Comment #13 from Linus Torvalds <torvalds@linux-foundation.org> 2011-05-29 18:56:44 UTC ---
Thanks guys.


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-05-29 18:57 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-21  3:46 [Bug other/49095] New: Horrible code generation for trivial decrement with test torvalds@linux-foundation.org
2011-05-21 10:12 ` [Bug rtl-optimization/49095] " rguenth at gcc dot gnu.org
2011-05-21 19:22 ` torvalds@linux-foundation.org
2011-05-21 21:33 ` torvalds@linux-foundation.org
2011-05-27 10:50 ` jakub at gcc dot gnu.org
2011-05-27 12:30 ` jakub at gcc dot gnu.org
2011-05-27 14:22 ` torvalds@linux-foundation.org
2011-05-27 14:55 ` jakub at gcc dot gnu.org
2011-05-27 16:02 ` torvalds@linux-foundation.org
2011-05-27 16:36 ` jakub at gcc dot gnu.org
2011-05-27 16:52 ` torvalds@linux-foundation.org
2011-05-29 18:53 ` jakub at gcc dot gnu.org
2011-05-29 18:57 ` jakub at gcc dot gnu.org
2011-05-29 19:09 ` torvalds@linux-foundation.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).