public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/51182] New: [ipa-iterations] running multiple passes of early IPA on a file produces different code when it shouldn't
@ 2011-11-16 23:12 matt at use dot net
  2011-11-16 23:13 ` [Bug middle-end/51182] " matt at use dot net
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: matt at use dot net @ 2011-11-16 23:12 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51182

             Bug #: 51182
           Summary: [ipa-iterations] running multiple passes of early IPA
                    on a file produces different code when it shouldn't
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: matt@use.net


Created attachment 25841
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25841
preprocessed source that produces the above code differences

As requested by Richard (http://gcc.gnu.org/ml/gcc-cvs/2011-11/msg00669.html),
I am testing the outstanding multiple-iterations patch and reporting cases where
multiple early IPA passes produce differences in code generation that should
probably be achieved in one pass (or not at all).

The attached file is from the open source pmccabe project. When compiling with
-O1, a second early IPA pass changes register allocation and eliminates a nop
instruction.

with -O1 --param eipa-iterations=1:
 2b3:   8d 6d 01                lea    0x1(%rbp),%ebp
 2b6:   48 98                   cltq   
 2b8:   48 8d 5c c3 f8          lea    -0x8(%rbx,%rax,8),%rbx
 2bd:   83 3d 00 00 00 00 00    cmpl   $0x0,0x0(%rip)        # 2c4 <main+0x180>
 2c4:   74 12                   je     2d8 <main+0x194>
 2c6:   48 89 de                mov    %rbx,%rsi
 2c9:   89 ef                   mov    %ebp,%edi
[...]
 429:   80 78 50 01             cmpb   $0x1,0x50(%rax)
 42d:   0f 1f 00                nopl   (%rax)
 430:   76 2e                   jbe    460 <stats_accumulate+0x4c>


with -O1 --param eipa-iterations=2:
 2b3:   44 8d 65 01             lea    0x1(%rbp),%r12d
 2b7:   48 98                   cltq   
 2b9:   48 8d 5c c3 f8          lea    -0x8(%rbx,%rax,8),%rbx
 2be:   83 3d 00 00 00 00 00    cmpl   $0x0,0x0(%rip)        # 2c5 <main+0x181>
 2c5:   74 13                   je     2da <main+0x196>
 2c7:   48 89 de                mov    %rbx,%rsi
 2ca:   44 89 e7                mov    %r12d,%edi
[...]

 42f:   80 78 50 01             cmpb   $0x1,0x50(%rax)
 433:   76 2e                   jbe    463 <stats_accumulate+0x49>

There are further, different changes at -O2, but I'll file those in a separate
bug once I get feedback on this one.
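
One way to check whether two such disassemblies differ in anything besides addresses and instruction encodings is to strip those fields before diffing. The following is a minimal sketch, assuming the listings come from `objdump -d`; the helper names `normalize` and `diff_listings` are hypothetical, and branch targets (which embed addresses) will still show up as differences.

```python
import difflib
import re

# One objdump -d instruction line: address, encoding bytes, instruction text.
# At least one extra whitespace character must separate bytes from the text.
INSN = re.compile(r"^\s*([0-9a-f]+):\s+((?:[0-9a-f]{2} )+)\s+(\S.*?)\s*$")


def normalize(listing):
    """Return instruction texts only, without addresses or encoding bytes."""
    insns = []
    for line in listing.splitlines():
        match = INSN.match(line)
        if match is None:
            continue  # skip labels, blank lines and "[...]" elisions
        text = match.group(3).split("#", 1)[0]  # drop objdump's "# addr" note
        insns.append(" ".join(text.split()))    # collapse runs of whitespace
    return insns


def diff_listings(a, b):
    """Unified diff of two listings after normalization."""
    return list(difflib.unified_diff(normalize(a), normalize(b),
                                     "iter1", "iter2", lineterm=""))
```

Applied to the two excerpts above, this would still report the `%ebp` versus `%r12d` change and the missing `nopl`, but not the shifted addresses.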



* [Bug middle-end/51182] [ipa-iterations] running multiple passes of early IPA on a file produces different code when it shouldn't
From: matt at use dot net @ 2011-11-16 23:13 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51182

--- Comment #1 from Matt Hargett <matt at use dot net> 2011-11-16 23:09:22 UTC ---
I see the same seemingly no-op register and instruction-order changes with
inflate.c from zlib as well. Adding more iterations has a ping-pong effect: the
output alternates between the two versions.


diff inflate.o.-O3.ipa-iterations2.dump inflate.o.-O3.ipa-iterations3.dump
2c2
< inflate.o.-O3.ipa-iterations2:     file format elf64-x86-64
---
> inflate.o.-O3.ipa-iterations3:     file format elf64-x86-64
897,898c897,898
<      d22:    31 db                    xor    %ebx,%ebx
<      d24:    45 31 d2                 xor    %r10d,%r10d
---
>      d22:	45 31 d2             	xor    %r10d,%r10d
>      d25:	31 db                	xor    %ebx,%ebx
1731c1731
<     19a9:    44 39 c7                 cmp    %r8d,%edi
---
>     19a9:	41 39 f8             	cmp    %edi,%r8d
2192,2193c2192,2193
<     20e0:    45 31 d2                 xor    %r10d,%r10d
<     20e3:    31 db                    xor    %ebx,%ebx
---
>     20e0:	31 db                	xor    %ebx,%ebx
>     20e2:	45 31 d2             	xor    %r10d,%r10d



* [Bug middle-end/51182] [ipa-iterations] running multiple passes of early IPA on a file produces different code when it shouldn't
From: rguenth at gcc dot gnu.org @ 2011-11-17 19:47 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51182

--- Comment #2 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-11-17 19:25:56 UTC ---
These kinds of changes are not interesting (and I doubt anyone will investigate
them). The interesting ones are code changes that make a difference in
performance.

By the way, with the most recent patch the code paths for one and two early
iterations are not the same (because of the separation into different IPA
phases).  This alone probably explains the (spurious) differences you see.
To eliminate them, make sure we take the three-IPA-phases path even with
just one iteration.



* [Bug middle-end/51182] [ipa-iterations] running multiple passes of early IPA on a file produces different code when it shouldn't
From: matt at use dot net @ 2011-11-18  2:19 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51182

--- Comment #3 from Matt Hargett <matt at use dot net> 2011-11-18 01:41:04 UTC ---
Created attachment 25850
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25850
preprocessed source that produces better-performing code with two iterations



* [Bug middle-end/51182] [ipa-iterations] running multiple passes of early IPA on a file produces different code when it shouldn't
From: matt at use dot net @ 2011-11-18  2:20 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51182

--- Comment #4 from Matt Hargett <matt at use dot net> 2011-11-18 01:43:32 UTC ---
Ah, okay. I read in your email that you were looking for evidence of bugs, and
the behaviour looked fishy to me. Regardless, here is a performance improvement
that perhaps should be achievable within one iteration.

Attached is the combined.i from pmccabe, which can be compiled and linked
directly into an executable (on a Debian/Ubuntu-ish amd64 system, anyway).

Using -O3 (or -Ofast), two iterations produce a binary that performs better
than one iteration does. Performance was measured at the macro level, based on
timings when run against tens of thousands of files while in single-user mode
on a ramdisk. Performance was also measured at the micro level by looking at
cache misses and branch misprediction rates using callgrind (a tool within
valgrind), with output below. The second iteration reduces I1 misses (3,869 to
3,054) as well as branch mispredictions (82,566 to 81,689). (Multiple
iterations at -O2 are more of a mixed bag at the micro level, for some reason,
and appear to have no macro-level performance impact.)


matt@matt-desktop:~/src/pmccabe-2.7$ valgrind --tool=callgrind --branch-sim=yes --cache-sim=yes ./pmccabe.o3i1.loop.whopr *.c test0[012][0123456]

==4119== 
==4119== Events    : Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw Bc Bcm Bi Bim
==4119== Collected : 10312284 2549768 1398563 3869 3209 1417 1285 2045 990 2534056 74514 208896 8052
==4119== 
==4119== I   refs:      10,312,284
==4119== I1  misses:         3,869
==4119== LLi misses:         1,285
==4119== I1  miss rate:        0.3%
==4119== LLi miss rate:        0.1%
==4119== 
==4119== D   refs:       3,948,331  (2,549,768 rd + 1,398,563 wr)
==4119== D1  misses:         4,626  (    3,209 rd +     1,417 wr)
==4119== LLd misses:         3,035  (    2,045 rd +       990 wr)
==4119== D1  miss rate:        0.1% (      0.1%   +       0.1%  )
==4119== LLd miss rate:        0.0% (      0.0%   +       0.0%  )
==4119== 
==4119== LL refs:            8,495  (    7,078 rd +     1,417 wr)
==4119== LL misses:          4,320  (    3,330 rd +       990 wr)
==4119== LL miss rate:         0.0% (      0.0%   +       0.0%  )
==4119== 
==4119== Branches:       2,742,952  (2,534,056 cond +   208,896 ind)
==4119== Mispredicts:       82,566  (   74,514 cond +     8,052 ind)
==4119== Mispred rate:         3.0% (      2.9%     +       3.8%   )


matt@matt-desktop:~/src/pmccabe-2.7$ valgrind --tool=callgrind --branch-sim=yes --cache-sim=yes ./pmccabe.o3i2.loop.whopr *.c test0[012][0123456]

==4122== 
==4122== Events    : Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw Bc Bcm Bi Bim
==4122== Collected : 10312147 2549768 1398563 3054 3209 1416 1286 2049 989 2534056 74071 208896 7618
==4122== 
==4122== I   refs:      10,312,147
==4122== I1  misses:         3,054
==4122== LLi misses:         1,286
==4122== I1  miss rate:        0.2%
==4122== LLi miss rate:        0.1%
==4122== 
==4122== D   refs:       3,948,331  (2,549,768 rd + 1,398,563 wr)
==4122== D1  misses:         4,625  (    3,209 rd +     1,416 wr)
==4122== LLd misses:         3,038  (    2,049 rd +       989 wr)
==4122== D1  miss rate:        0.1% (      0.1%   +       0.1%  )
==4122== LLd miss rate:        0.0% (      0.0%   +       0.0%  )
==4122== 
==4122== LL refs:            7,679  (    6,263 rd +     1,416 wr)
==4122== LL misses:          4,324  (    3,335 rd +       989 wr)
==4122== LL miss rate:         0.0% (      0.0%   +       0.0%  )
==4122== 
==4122== Branches:       2,742,952  (2,534,056 cond +   208,896 ind)
==4122== Mispredicts:       81,689  (   74,071 cond +     7,618 ind)
==4122== Mispred rate:         2.9% (      2.9%     +       3.6%   )
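
Because the rounded percentages printed by callgrind hide most of the difference between the two runs, it may help to compute relative changes from the raw event counts. The sketch below copies a few counters from the two "Collected" summaries above; the function names are hypothetical, and the choice of counters is only illustrative.

```python
# Counts copied from the "Collected" summaries above (one vs. two iterations).
ONE = {"Ir": 10312284, "I1mr": 3869, "Bcm": 74514, "Bim": 8052}
TWO = {"Ir": 10312147, "I1mr": 3054, "Bcm": 74071, "Bim": 7618}


def relative_change(before, after):
    """Percentage change from before to after (negative means a reduction)."""
    return 100.0 * (after - before) / before


def summarize(one, two):
    """Relative change per counter, plus total (cond + ind) mispredictions."""
    rows = {name: relative_change(one[name], two[name]) for name in one}
    rows["mispredicts"] = relative_change(one["Bcm"] + one["Bim"],
                                          two["Bcm"] + two["Bim"])
    return rows


for name, pct in summarize(ONE, TWO).items():
    print(f"{name:12s} {pct:+.2f}%")
```

On these numbers, I1 misses drop by roughly 21% and total mispredictions by roughly 1%, while the instruction count is essentially unchanged.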



