public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/50101] New: GCC 4.5 and 4.6 generate suboptimal code on ppc for countdown loops when the CTR register cannot be used
@ 2011-08-16 20:57 meissner at gcc dot gnu.org
  2011-08-16 21:22 ` [Bug rtl-optimization/50101] " meissner at gcc dot gnu.org
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: meissner at gcc dot gnu.org @ 2011-08-16 20:57 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50101

             Bug #: 50101
           Summary: GCC 4.5 and 4.6 generate suboptimal code on ppc for
                    countdown loops when the CTR register cannot be used
    Classification: Unclassified
           Product: gcc
           Version: 4.6.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: meissner@gcc.gnu.org
              Host: powerpc64-linux
            Target: powerpc64-linux
             Build: powerpc64-linux


When GCC switched over to the IRA register allocator in GCC 4.5, it made some
loops run slower on the PowerPC.  In particular, the PowerPC has a count
register (CTR) that the compiler can use for countdown loops with the
-fbranch-count-reg optimization.  However, if the CTR register is not available
in the loop, the compiler does not keep the loop index in a GPR.  Instead, it
loads the index value from the stack, decrements it, and stores it back to the
stack on every iteration.

For example, in the code:

int code[65536];

void mike (void)
{
  int j;
  long addr;

  for (j = 0; j < 65536; j+=4) {
    asm("mtctr %1" : "=c" (addr) : "r" (&code[j]));
    asm("bctrl" : : "c" (addr) : "lr" );
  }
}

GCC 4.3 (the SLES 11 SP1 host compiler) generates the following:

.L.mike:
        mflr 0
        ld 9,.LC0@toc(2)
        li 11,16384
        std 0,16(1)
        .p2align 4,,15
.L2:
#APP
 # 10 "test-ppc-ctr.c" 1
        mtctr 9
 # 0 "" 2
 # 11 "test-ppc-ctr.c" 1
        bctrl
 # 0 "" 2
#NO_APP
        addic. 11,11,-1
        addi 9,9,16
        bne 0,.L2
        ld 0,16(1)
        mtlr 0
        blr
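The 4.3 loop above keeps both pieces of loop state in GPRs. r11 holds the remaining trip count (`li 11,16384`, then `addic. 11,11,-1` sets CR0 for the `bne`), and r9 holds the address of `code[j]`, advanced by 16 bytes per iteration. Below is a rough, portable C sketch of that countdown shape, assuming a 4-byte `int`. `visit`, `mike_countup` and `mike_countdown` are hypothetical names, and `visit` only stands in for the `mtctr`/`bctrl` pair so the code runs on any host. It illustrates the shape of the loop, not what GCC literally emits.

```c
#include <stddef.h>

int code[65536];

/* Hypothetical stand-in for the mtctr/bctrl pair: it only records
   which address would have been called and how many calls were made.  */
static const int *last_target;
static size_t calls;

static void
visit (const int *target)
{
  last_target = target;
  calls++;
}

/* Source form from the report: j counts up by 4 from 0 below 65536.  */
void
mike_countup (void)
{
  for (int j = 0; j < 65536; j += 4)
    visit (&code[j]);
}

/* Countdown form mirroring the 4.3 output: the trip count
   (65536 / 4 = 16384) and the address each live in one local.
   Assumes sizeof (int) == 4, so 4 ints == 16 bytes.  */
void
mike_countdown (void)
{
  const int *addr = &code[0];   /* r9: ld 9,.LC0@toc(2) */
  long count = 65536 / 4;       /* r11: li 11,16384 */

  do
    {
      visit (addr);
      addr += 4;                /* addi 9,9,16 */
    }
  while (--count != 0);         /* addic. 11,11,-1 ; bne 0,.L2 */
}
```

Both functions call `visit` 16384 times, the last time on `&code[65532]`. The countdown form needs only one decrement-and-test per iteration, which is the work that -fbranch-count-reg would normally hand to the CTR.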

With a GCC 4.4-based compiler, such as the RHEL 6 host compiler, I get:

.L.mike:
        mflr 0
        ld 9,.LC0@toc(2)
        std 0,16(1)
        li 0,16384
        std 0,-16(1)
        .p2align 4,,15
.L2:
#APP
 # 10 "test-ppc-ctr.c" 1
        mtctr 9
 # 0 "" 2
 # 11 "test-ppc-ctr.c" 1
        bctrl
 # 0 "" 2
#NO_APP
        ld 0,-16(1)
        addi 9,9,16
        addic. 11,0,-1
        std 11,-16(1)
        bne 0,.L2
        ld 0,16(1)
        mtlr 0
        blr

Notice that it loads the loop index from the stack and stores it back on every
iteration.  If I use -fno-branch-count-reg, it generates code that keeps the
loop state in GPRs:

.L.mike:
        mflr 0
        ld 9,.LC0@toc(2)
        std 0,16(1)
        addis 0,9,0x4
        .p2align 4,,15
.L2:
#APP
 # 10 "test-ppc-ctr.c" 1
        mtctr 9
 # 0 "" 2
 # 11 "test-ppc-ctr.c" 1
        bctrl
 # 0 "" 2
#NO_APP
        addi 9,9,16
        cmpd 7,9,0
        bne 7,.L2
        ld 0,16(1)
        mtlr 0
        blr
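Without the CTR, the -fno-branch-count-reg code above drops the separate counter altogether. `addis 0,9,0x4` computes the end address once (0x4 << 16 = 0x40000 = 262144 bytes, i.e. 65536 four-byte `int`s past `code[0]`), and each iteration compares the running pointer against it with `cmpd`. A hedged C sketch of that pointer-compare shape follows, again assuming a 4-byte `int` and using the hypothetical names `visit` and `mike_ptrcompare`, where `visit` stands in for the `mtctr`/`bctrl` pair.

```c
#include <stddef.h>

int code[65536];

/* Hypothetical stand-in for the mtctr/bctrl pair: it only records
   which address would have been called and how many calls were made.  */
static const int *last_target;
static size_t calls;

static void
visit (const int *target)
{
  last_target = target;
  calls++;
}

/* Pointer-compare form mirroring the -fno-branch-count-reg output.
   Assumes sizeof (int) == 4, so 65536 ints span 0x40000 bytes.  */
void
mike_ptrcompare (void)
{
  const int *addr = &code[0];         /* r9: ld 9,.LC0@toc(2) */
  const int *end = &code[0] + 65536;  /* r0: addis 0,9,0x4 */

  do
    {
      visit (addr);
      addr += 4;                      /* addi 9,9,16 */
    }
  while (addr != end);                /* cmpd 7,9,0 ; bne 7,.L2 */
}
```

This form relies on the pointer hitting `end` exactly, which holds here because 65536 is a multiple of the step of 4; both registers stay live in GPRs with no stack traffic inside the loop.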

This is fixed in the GCC 4.7 development sources.  The fix was subversion
revision 171649, committed on March 28, 2011 by Vladimir Makarov
<vmakarov@redhat.com> as part of his large rewrite of the IRA register
allocator.

As an experiment, I built the SPEC CPU2006 benchmark suite with
-fno-branch-count-reg.  As expected, a number of benchmarks regress when the
count register optimization is disabled, but a few benchmarks get a large
speedup, which probably indicates that they are being mis-optimized.  The
benchmarks with the speedup include 464.h264ref (19.65% improvement),
434.zeusmp (17.92% improvement) and 459.GemsFDTD (13.02% improvement).




Thread overview: 13+ messages
2011-08-16 20:57 [Bug rtl-optimization/50101] New: GCC 4.5 and 4.6 generate suboptimal code on ppc for countdown loops when the CTR register cannot be used meissner at gcc dot gnu.org
2011-08-16 21:22 ` [Bug rtl-optimization/50101] " meissner at gcc dot gnu.org
2011-08-17  8:13 ` rguenth at gcc dot gnu.org
2011-08-17 15:46 ` meissner at gcc dot gnu.org
2011-08-17 15:47 ` meissner at gcc dot gnu.org
2011-08-17 15:48 ` meissner at gcc dot gnu.org
2011-08-17 16:32 ` meissner at gcc dot gnu.org
2011-08-18  9:01 ` rguenth at gcc dot gnu.org
2011-08-19 14:12 ` meissner at linux dot vnet.ibm.com
2011-08-19 15:54 ` law at redhat dot com
2011-12-15 20:53 ` [Bug rtl-optimization/50101] [4.5/4.6 regression] " pinskia at gcc dot gnu.org
2012-07-02 12:03 ` rguenth at gcc dot gnu.org
2013-04-12 16:18 ` [Bug rtl-optimization/50101] [4.6 " jakub at gcc dot gnu.org
