* [Bug rtl-optimization/23810] New: missed 64-bit shift+mask optimizations on 32-bit arch
@ 2005-09-11  0:18 gcc-bugzilla at gcc dot gnu dot org
From: gcc-bugzilla at gcc dot gnu dot org @ 2005-09-11  0:18 UTC (permalink / raw)
  To: gcc-bugs



(Sources are from CVS as of about 6AM US/Eastern time today.)

I'm testing out how well gcc optimizes some code for reversing bit
strings.  It appears that on x86 at least, double-word shifts followed
by masks that zero out all the bits that crossed the word boundary are
not optimized as well as they could be.

In the included file, compiled with "-O9 -fomit-frame-pointer",
functions rt and rt2 both result in assembly code that performs a
double-word shift, bringing two bits from the upper half of the
argument into the top of the lower half of the double-word value, and
then masks that word with 0x33333333, which zeros out those bits:

    rt:
	    movl	8(%esp), %edx
	    movl	4(%esp), %eax
	    shrdl	$2, %edx, %eax
	    shrl	$2, %edx
	    andl	$858993459, %eax
	    andl	$858993459, %edx
	    ret

Okay, in this case, the only optimization would be to make the shift
of the lower half not reference both %edx and %eax, and drop the
reference to the upper half from the RTL during optimization.  To
highlight the issue a little more, rt4 is like rt but only returns the
lower half.  Still, the upper half is read in from memory (and
shifted!) needlessly:

    rt4:
	    movl	8(%esp), %edx
	    movl	4(%esp), %eax
	    shrdl	$2, %edx, %eax
	    andl	$858993459, %eax
	    shrl	$2, %edx
	    ret
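
To spell out why the cross-word bits don't matter, here is a rough
source-level sketch of the word-wise form I'd hope the optimizer could
reach.  It is only an illustration, not anything GCC does today; the
names rt_ref, rt4_ref, rt_split and rt4_lo_only are made up for this
example, and it uses <stdint.h> rather than the typedefs in the test
file.

```c
#include <stdint.h>

/* Reference forms, matching rt and rt4 from the test case. */
static uint64_t rt_ref(uint64_t n) { return (n >> 2) & 0x3333333333333333ULL; }
static uint32_t rt4_ref(uint64_t n) { return (uint32_t)((n >> 2) & 0x33333333); }

/* Hypothetical word-wise rewrite of rt: each half is shifted on its own.
   The two bits that a double-word shift would bring from hi into bits
   31..30 of lo are cleared by the 0x33333333 mask anyway. */
static uint64_t rt_split(uint64_t n)
{
    uint32_t lo = (uint32_t)n;
    uint32_t hi = (uint32_t)(n >> 32);

    lo = (lo >> 2) & 0x33333333u;
    hi = (hi >> 2) & 0x33333333u;
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical rewrite of rt4: the upper half is never touched. */
static uint32_t rt4_lo_only(uint64_t n)
{
    return ((uint32_t)n >> 2) & 0x33333333u;
}
```

For every input, rt_split should agree with rt_ref and rt4_lo_only
with rt4_ref, since the mask has zeros in exactly the bit positions
that receive bits from the other word.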

Function left shows the same problem, shifting in the opposite
direction:

    left:
	    movl	4(%esp), %eax
	    movl	8(%esp), %edx
	    shldl	$2, %eax, %edx
	    sall	$2, %eax
	    andl	$-858993460, %edx
	    andl	$-858993460, %eax
	    ret

The "andl" of %edx with 0xcccccccc will clobber the bits brought in
from %eax.
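
The same argument in the other direction, again as a hedged sketch
rather than a description of the compiler: left_ref mirrors the left
function from the test case, and left_split is a hypothetical name for
the word-wise form.  The low two bits of each nibble of 0xcccccccc are
zero, so the two bits a double-word shift would carry from the lower
half into bits 1..0 of the upper half are discarded by the mask.

```c
#include <stdint.h>

/* Reference form, matching left from the test case. */
static uint64_t left_ref(uint64_t n)
{
    return (n << 2) & (0xFFFFFFFFFFFFFFFFULL & ~0x3333333333333333ULL);
}

/* Hypothetical word-wise rewrite: two independent single-word shifts. */
static uint64_t left_split(uint64_t n)
{
    uint32_t lo = (uint32_t)n;
    uint32_t hi = (uint32_t)(n >> 32);

    hi = (hi << 2) & 0xccccccccu;  /* bits from lo would land in bits 1..0 */
    lo = (lo << 2) & 0xccccccccu;
    return ((uint64_t)hi << 32) | lo;
}
```

With this form neither half needs shldl; a plain sall on each register
followed by the mask would be enough.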

I haven't got the hang of reading ppc assembly yet, but I think the
Mac OS X compiler (10.4.2 = "gcc version 4.0.0 (Apple Computer,
Inc. build 5026)") is missing similar optimizations.  I haven't tried
the CVS code on ppc.

Environment:
System: Linux kal-el 2.4.17 #4 SMP Sun Apr 6 16:25:37 EDT 2003 i686 GNU/Linux
Architecture: i686

	
host: i686-pc-linux-gnu
build: i686-pc-linux-gnu
target: i686-pc-linux-gnu
configured with: ../src/configure --enable-maintainer-mode --prefix=/u3/raeburn/gcc/linux/Install --enable-languages=c,c++,java,objc --no-create --no-recursion : (reconfigured) ../src/configure --prefix=/u3/raeburn/gcc/linux/Install

How-To-Repeat:

typedef unsigned long long uint64_t;
typedef unsigned long uint32_t;

uint64_t rt (uint64_t n) { return (n >> 2) & 0x3333333333333333ULL; }
uint64_t rt2 (uint64_t n) { return (n & (0x3333333333333333ULL << 2)) >> 2; }
uint32_t rt4 (uint64_t n) { return (n >> 2) & 0x33333333; }
uint64_t left(uint64_t n) {
  return (n << 2) & (0xFFFFFFFFFFFFFFFFULL & ~0x3333333333333333ULL);
}

-- 
           Summary: missed 64-bit shift+mask optimizations on 32-bit arch
           Product: gcc
           Version: 4.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: raeburn at raeburn dot org
                CC: gcc-bugs at gcc dot gnu dot org
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23810

