public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug optimization/14504] New: Missed Bit Twiddling Optimization
@ 2004-03-09 17:17 stl at caltech dot edu
  2004-03-09 17:35 ` [Bug optimization/14504] " pinskia at gcc dot gnu dot org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: stl at caltech dot edu @ 2004-03-09 17:17 UTC (permalink / raw)
  To: gcc-bugs

g++ 3.3.2 on i686-pc-linux-gnu with -s -O3 -fomit-frame-pointer compiles:

unsigned long cond_mask(bool flag, unsigned long mask, unsigned long target) {
    return flag ? target | mask : target & ~mask;
}

into:

cmpb    $0, 4(%esp)
movl    8(%esp), %eax
movl    12(%esp), %edx
je      .L2
orl     %edx, %eax
ret
.L2:
notl    %eax
andl    %edx, %eax
ret

This appears to be a straightforward translation of the code into assembly. 
With the same options, this:

unsigned long cond_mask(bool flag, unsigned long mask, unsigned long target) {
    return (mask | target ^ 0xFFFFFFFFUL + flag) ^ 0xFFFFFFFFUL + flag;
}

is compiled into:

movzbl  4(%esp), %edx
movl    12(%esp), %eax
movl    8(%esp), %ecx
decl    %edx
xorl    %edx, %eax
orl     %ecx, %eax
xorl    %edx, %eax
ret

Again this appears to be a straightforward translation of the code into 
assembly (instead of flag + 0xFFFFFFFFUL, it uses static_cast<unsigned long>
(flag) - 1, which is the same thing).

However, the rewritten code lacks a branch yet does the exact same thing.

g++ ought to be able to perform this reasonably simple transformation on its 
own - if, indeed, the transformation is desirable (which I think it is).

NB:
I do not know assembly, so I may have deleted important information from g++'s 
output. I don't think I have, however.

This case was suggested by ryanm@microsoft.com, who said "this is something I 
will never expect a compiler to be able to optimize for me.  ^_^". I wrote the 
transformed code; perhaps there is a cleverer way to transform it which I am 
not aware of.

-- 
           Summary: Missed Bit Twiddling Optimization
           Product: gcc
           Version: 3.3.2
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P2
         Component: optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: stl at caltech dot edu
                CC: gcc-bugs at gcc dot gnu dot org,ryanm at microsoft dot
                    com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14504


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug optimization/14504] Missed Bit Twiddling Optimization
  2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
@ 2004-03-09 17:35 ` pinskia at gcc dot gnu dot org
  2004-03-09 18:01 ` [Bug optimization/14504] New: " Falk Hueffner
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-03-09 17:35 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-03-09 17:35 -------
Confirmed.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
           Keywords|                            |pessimizes-code
   Last reconfirmed|0000-00-00 00:00:00         |2004-03-09 17:35:06
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14504


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Bug optimization/14504] New: Missed Bit Twiddling Optimization
  2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
  2004-03-09 17:35 ` [Bug optimization/14504] " pinskia at gcc dot gnu dot org
@ 2004-03-09 18:01 ` Falk Hueffner
  2004-03-11  0:09 ` [Bug optimization/14504] " kazu at cs dot umass dot edu
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Falk Hueffner @ 2004-03-09 18:01 UTC (permalink / raw)
  To: gcc-bugs

"stl at caltech dot edu" <gcc-bugzilla@gcc.gnu.org> writes:

> unsigned long cond_mask(bool flag, unsigned long mask, unsigned long target) {
>     return flag ? target | mask : target & ~mask;
> }
> 
> unsigned long cond_mask(bool flag, unsigned long mask, unsigned long target) {
>     return (mask | target ^ 0xFFFFFFFFUL + flag) ^ 0xFFFFFFFFUL + flag;
> }

This is not universally a win. For example, on an Alpha EV5, the first:

andnot  a2,a1,t1
or      a2,a1,v0
cmoveq  a0,t1,v0

takes 2 cycles, and the second:

lda     t1,-1(a0)
xor     a2,t1,v0
or      a1,v0,t0
xor     t0,t1,v0

takes 3 cycles and 1 more insn. I suspect on i386 it'd be similar if
you enable conditional moves.

-- 
	Falk


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug optimization/14504] Missed Bit Twiddling Optimization
  2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
  2004-03-09 17:35 ` [Bug optimization/14504] " pinskia at gcc dot gnu dot org
  2004-03-09 18:01 ` [Bug optimization/14504] New: " Falk Hueffner
@ 2004-03-11  0:09 ` kazu at cs dot umass dot edu
  2004-03-11 21:33 ` stl at caltech dot edu
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: kazu at cs dot umass dot edu @ 2004-03-11  0:09 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From kazu at cs dot umass dot edu  2004-03-11 00:09 -------
I've got a patch.

/* There are four cases to handle modulo whether FLAG appears
   positively and ordering of operands TARGET and MASK in the
   expression.

   With my patch, gcc -O2 -fomit-frame-pointer -mregparm=3 generates
   the following assembly code.  */

typedef _Bool bool;

unsigned long
cond_mask_0 (bool flag, unsigned long mask, unsigned long target)
{
  return flag ? target | mask : target & ~mask;
#if 0
	movzbl	%al, %eax
	decl	%eax
	xorl	%eax, %ecx
	orl	%edx, %ecx
	xorl	%ecx, %eax
#endif
}

unsigned long
cond_mask_1 (bool flag, unsigned long mask, unsigned long target)
{
  return flag ? target | ~mask : target & mask;
#if 0
	movzbl	%al, %eax
	negl	%eax
	xorl	%eax, %ecx
	andl	%edx, %ecx
	xorl	%ecx, %eax
#endif
}

unsigned long
cond_mask_2 (bool flag, unsigned long mask, unsigned long target)
{
  return flag ? target & mask : target | ~mask;
#if 0
	movzbl	%al, %eax
	decl	%eax
	xorl	%eax, %ecx
	andl	%edx, %ecx
	xorl	%ecx, %eax
#endif
}

unsigned long
cond_mask_3 (bool flag, unsigned long mask, unsigned long target)
{
  return flag ? target & ~mask : target | mask;
#if 0
	movzbl	%al, %eax
	negl	%eax
	xorl	%eax, %ecx
	orl	%edx, %ecx
	xorl	%ecx, %eax
#endif
}


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kazu at cs dot umass dot edu
         AssignedTo|unassigned at gcc dot gnu   |kazu at cs dot umass dot edu
                   |dot org                     |
             Status|NEW                         |ASSIGNED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14504


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug optimization/14504] Missed Bit Twiddling Optimization
  2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
                   ` (2 preceding siblings ...)
  2004-03-11  0:09 ` [Bug optimization/14504] " kazu at cs dot umass dot edu
@ 2004-03-11 21:33 ` stl at caltech dot edu
  2004-03-11 21:35 ` kazu at cs dot umass dot edu
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: stl at caltech dot edu @ 2004-03-11 21:33 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From stl at caltech dot edu  2004-03-11 21:33 -------
Neat. Is this patch for 3.3.x, 3.4.0, mainline, or some combination?

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14504


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug optimization/14504] Missed Bit Twiddling Optimization
  2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
                   ` (3 preceding siblings ...)
  2004-03-11 21:33 ` stl at caltech dot edu
@ 2004-03-11 21:35 ` kazu at cs dot umass dot edu
  2004-03-11 21:41 ` stl at caltech dot edu
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: kazu at cs dot umass dot edu @ 2004-03-11 21:35 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From kazu at cs dot umass dot edu  2004-03-11 21:35 -------
Mainline as this is not a regression or bug fix.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14504


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug optimization/14504] Missed Bit Twiddling Optimization
  2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
                   ` (4 preceding siblings ...)
  2004-03-11 21:35 ` kazu at cs dot umass dot edu
@ 2004-03-11 21:41 ` stl at caltech dot edu
  2004-03-11 22:10 ` falk at debian dot org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: stl at caltech dot edu @ 2004-03-11 21:41 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From stl at caltech dot edu  2004-03-11 21:41 -------
Ok. (I thought that 3.3.4 was open for non-invasive enhancements; now I see 
that it's open for non-invasive non-regression bugfixes.)

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14504


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug optimization/14504] Missed Bit Twiddling Optimization
  2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
                   ` (5 preceding siblings ...)
  2004-03-11 21:41 ` stl at caltech dot edu
@ 2004-03-11 22:10 ` falk at debian dot org
  2004-03-11 22:21 ` stl at caltech dot edu
  2004-03-11 22:29 ` kazu at cs dot umass dot edu
  8 siblings, 0 replies; 10+ messages in thread
From: falk at debian dot org @ 2004-03-11 22:10 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From falk at debian dot org  2004-03-11 22:10 -------
My comment posted to gcc-bugs didn't make it into bugzilla, so I'll repeat it.

This is not universally a win. For example, on an Alpha EV5, the first:

andnot  a2,a1,t1
or      a2,a1,v0
cmoveq  a0,t1,v0

takes 2 cycles, and the second:

lda     t1,-1(a0)
xor     a2,t1,v0
or      a1,v0,t0
xor     t0,t1,v0

takes 4 cycles and 1 more insn. I suspect on i386 it'd be similar if
you enable conditional moves.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14504


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug optimization/14504] Missed Bit Twiddling Optimization
  2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
                   ` (6 preceding siblings ...)
  2004-03-11 22:10 ` falk at debian dot org
@ 2004-03-11 22:21 ` stl at caltech dot edu
  2004-03-11 22:29 ` kazu at cs dot umass dot edu
  8 siblings, 0 replies; 10+ messages in thread
From: stl at caltech dot edu @ 2004-03-11 22:21 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From stl at caltech dot edu  2004-03-11 22:21 -------
Well, the Right Thing to do is to perform the transformation when it's a win, 
and also perform the /reverse transformation/ when it's a win, and not 
transform anything otherwise.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14504


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug optimization/14504] Missed Bit Twiddling Optimization
  2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
                   ` (7 preceding siblings ...)
  2004-03-11 22:21 ` stl at caltech dot edu
@ 2004-03-11 22:29 ` kazu at cs dot umass dot edu
  8 siblings, 0 replies; 10+ messages in thread
From: kazu at cs dot umass dot edu @ 2004-03-11 22:29 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From kazu at cs dot umass dot edu  2004-03-11 22:29 -------
Err, i386 is a two-address machine with only a few registers,
so cmove doesn't work as well in this case.

typedef _Bool bool;

unsigned long
cond_mask (bool flag, unsigned long mask, unsigned long target)
{
  unsigned long l = target | mask;
  unsigned long r = target & ~mask;
  return flag ? l : r;
#if 0
	pushl	%ebx
	movl	%eax, %ebx ; %bl = flag
	movl	%ecx, %eax
	orl	%edx, %eax ; %eax = target | mask
	notl	%edx       ; ~mask
	andl	%ecx, %edx ; %edx = target & ~mask
	testb	%bl, %bl
	popl	%ebx
	cmove	%edx, %eax
#endif
}


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14504


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2004-03-11 22:29 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
2004-03-09 17:35 ` [Bug optimization/14504] " pinskia at gcc dot gnu dot org
2004-03-09 18:01 ` [Bug optimization/14504] New: " Falk Hueffner
2004-03-11  0:09 ` [Bug optimization/14504] " kazu at cs dot umass dot edu
2004-03-11 21:33 ` stl at caltech dot edu
2004-03-11 21:35 ` kazu at cs dot umass dot edu
2004-03-11 21:41 ` stl at caltech dot edu
2004-03-11 22:10 ` falk at debian dot org
2004-03-11 22:21 ` stl at caltech dot edu
2004-03-11 22:29 ` kazu at cs dot umass dot edu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).