[Bug rtl-optimization/33716] New: gcc generates suboptimal code for long long shifts

public inbox for gcc-bugs@sourceware.org

From: felix-gcc at fefe dot de @ 2007-10-09 16:20 UTC
  To: gcc-bugs

Consider this function:

unsigned long long x(unsigned long long l) {
  return l >> 4;
}

gcc uses the shrd instruction here, which is much slower than doing the shift
"by hand" on at least the Athlon, Pentium 3, and VIA C3.  On the Core 2, shrd
appears to be faster.
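
For illustration, here is a minimal sketch of what "by hand" could mean at the
C level, assuming the 64-bit value is held as two 32-bit halves as on i386.
The name shr4_by_hand is hypothetical and the code is not taken from the
benchmark program. A compiler may still recognize the pattern and emit shrd,
so this only shows the arithmetic, not a guaranteed instruction sequence.

```c
#include <stdint.h>

/* Hypothetical helper: compute l >> 4 using only 32-bit operations. */
static uint64_t shr4_by_hand(uint64_t l) {
    uint32_t lo = (uint32_t)l;          /* low 32 bits  */
    uint32_t hi = (uint32_t)(l >> 32);  /* high 32 bits */

    /* The low half receives the bottom 4 bits of the high half in its
       top 4 bit positions; the high half is shifted on its own. */
    uint32_t new_lo = (lo >> 4) | (hi << 28);
    uint32_t new_hi = hi >> 4;

    return ((uint64_t)new_hi << 32) | new_lo;
}
```

For any input, this should return the same value as the plain l >> 4 in x().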

On my Athlon 64, I measured 350 cycles (by hand) vs 441 (shrd) for a loop of
100 iterations.
On my Core 2, I measured 672 cycles (by hand) vs 624 (shrd).

So, my suggestion is: if -march= is set to Pentium 3 or a non-Intel CPU, don't
emit the shrd/shrl sequence.

My benchmark program is at http://dl.fefe.de/shrd.c


-- 
           Summary: gcc generates suboptimal code for long long shifts
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: felix-gcc at fefe dot de
 GCC build triplet: i386-pc-linux-gnu
  GCC host triplet: i386-pc-linux-gnu
GCC target triplet: i386-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33716

