public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug optimization/14504] New: Missed Bit Twiddling Optimization
@ 2004-03-09 17:17 stl at caltech dot edu
2004-03-09 17:35 ` [Bug optimization/14504] " pinskia at gcc dot gnu dot org
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: stl at caltech dot edu @ 2004-03-09 17:17 UTC (permalink / raw)
To: gcc-bugs
g++ 3.3.2 on i686-pc-linux-gnu with -s -O3 -fomit-frame-pointer compiles:
unsigned long cond_mask(bool flag, unsigned long mask, unsigned long target) {
return flag ? target | mask : target & ~mask;
}
into:
cmpb $0, 4(%esp)
movl 8(%esp), %eax
movl 12(%esp), %edx
je .L2
orl %edx, %eax
ret
.L2:
notl %eax
andl %edx, %eax
ret
This appears to be a straightforward translation of the code into assembly.
With the same options, this:
unsigned long cond_mask(bool flag, unsigned long mask, unsigned long target) {
return (mask | target ^ 0xFFFFFFFFUL + flag) ^ 0xFFFFFFFFUL + flag;
}
is compiled into:
movzbl 4(%esp), %edx
movl 12(%esp), %eax
movl 8(%esp), %ecx
decl %edx
xorl %edx, %eax
orl %ecx, %eax
xorl %edx, %eax
ret
Again this appears to be a straightforward translation of the code into
assembly (instead of flag + 0xFFFFFFFFUL, it uses static_cast<unsigned long>
(flag) - 1, which is the same thing).
However, the rewritten code lacks a branch yet does the exact same thing.
g++ ought to be able to perform this reasonably simple transformation on its
own - if, indeed, the transformation is desirable (which I think it is).
NB:
I do not know assembly, so I may have deleted important information from g++'s
output. I don't think I have, however.
This case was suggested by ryanm@microsoft.com, who said "this is something I
will never expect a compiler to be able to optimize for me. ^_^". I wrote the
transformed code; perhaps there is a cleverer way to transform it which I am
not aware of.
--
Summary: Missed Bit Twiddling Optimization
Product: gcc
Version: 3.3.2
Status: UNCONFIRMED
Severity: enhancement
Priority: P2
Component: optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: stl at caltech dot edu
CC: gcc-bugs at gcc dot gnu dot org,ryanm at microsoft dot
com
GCC build triplet: i686-pc-linux-gnu
GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14504
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug optimization/14504] Missed Bit Twiddling Optimization
2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
@ 2004-03-09 17:35 ` pinskia at gcc dot gnu dot org
2004-03-09 18:01 ` [Bug optimization/14504] New: " Falk Hueffner
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-03-09 17:35 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From pinskia at gcc dot gnu dot org 2004-03-09 17:35 -------
Confirmed.
--
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever Confirmed| |1
Keywords| |pessimizes-code
Last reconfirmed|0000-00-00 00:00:00 |2004-03-09 17:35:06
date| |
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14504
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [Bug optimization/14504] New: Missed Bit Twiddling Optimization
2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
2004-03-09 17:35 ` [Bug optimization/14504] " pinskia at gcc dot gnu dot org
@ 2004-03-09 18:01 ` Falk Hueffner
2004-03-11 0:09 ` [Bug optimization/14504] " kazu at cs dot umass dot edu
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Falk Hueffner @ 2004-03-09 18:01 UTC (permalink / raw)
To: gcc-bugs
"stl at caltech dot edu" <gcc-bugzilla@gcc.gnu.org> writes:
> unsigned long cond_mask(bool flag, unsigned long mask, unsigned long target) {
> return flag ? target | mask : target & ~mask;
> }
>
> unsigned long cond_mask(bool flag, unsigned long mask, unsigned long target) {
> return (mask | target ^ 0xFFFFFFFFUL + flag) ^ 0xFFFFFFFFUL + flag;
> }
This is not universally a win. For example, on an Alpha EV5, the first:
andnot a2,a1,t1
or a2,a1,v0
cmoveq a0,t1,v0
takes 2 cycles, and the second:
lda t1,-1(a0)
xor a2,t1,v0
or a1,v0,t0
xor t0,t1,v0
takes 3 cycles and 1 more insn. I suspect on i386 it'd be similar if
you enable conditional moves.
--
Falk
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug optimization/14504] Missed Bit Twiddling Optimization
2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
2004-03-09 17:35 ` [Bug optimization/14504] " pinskia at gcc dot gnu dot org
2004-03-09 18:01 ` [Bug optimization/14504] New: " Falk Hueffner
@ 2004-03-11 0:09 ` kazu at cs dot umass dot edu
2004-03-11 21:33 ` stl at caltech dot edu
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: kazu at cs dot umass dot edu @ 2004-03-11 0:09 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From kazu at cs dot umass dot edu 2004-03-11 00:09 -------
I've got a patch.
/* There are four cases to handle modulo whether FLAG appears
positively and ordering of operands TARGET and MASK in the
expression.
With my patch, gcc -O2 -fomit-frame-pointer -mregparm=3 generates
the following assembly code. */
typedef _Bool bool;
unsigned long
cond_mask_0 (bool flag, unsigned long mask, unsigned long target)
{
return flag ? target | mask : target & ~mask;
#if 0
movzbl %al, %eax
decl %eax
xorl %eax, %ecx
orl %edx, %ecx
xorl %ecx, %eax
#endif
}
unsigned long
cond_mask_1 (bool flag, unsigned long mask, unsigned long target)
{
return flag ? target | ~mask : target & mask;
#if 0
movzbl %al, %eax
negl %eax
xorl %eax, %ecx
andl %edx, %ecx
xorl %ecx, %eax
#endif
}
unsigned long
cond_mask_2 (bool flag, unsigned long mask, unsigned long target)
{
return flag ? target & mask : target | ~mask;
#if 0
movzbl %al, %eax
decl %eax
xorl %eax, %ecx
andl %edx, %ecx
xorl %ecx, %eax
#endif
}
unsigned long
cond_mask_3 (bool flag, unsigned long mask, unsigned long target)
{
return flag ? target & ~mask : target | mask;
#if 0
movzbl %al, %eax
negl %eax
xorl %eax, %ecx
orl %edx, %ecx
xorl %ecx, %eax
#endif
}
--
What |Removed |Added
----------------------------------------------------------------------------
CC| |kazu at cs dot umass dot edu
AssignedTo|unassigned at gcc dot gnu |kazu at cs dot umass dot edu
|dot org |
Status|NEW |ASSIGNED
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14504
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug optimization/14504] Missed Bit Twiddling Optimization
2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
` (2 preceding siblings ...)
2004-03-11 0:09 ` [Bug optimization/14504] " kazu at cs dot umass dot edu
@ 2004-03-11 21:33 ` stl at caltech dot edu
2004-03-11 21:35 ` kazu at cs dot umass dot edu
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: stl at caltech dot edu @ 2004-03-11 21:33 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From stl at caltech dot edu 2004-03-11 21:33 -------
Neat. Is this patch for 3.3.x, 3.4.0, mainline, or some combination?
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14504
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug optimization/14504] Missed Bit Twiddling Optimization
2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
` (3 preceding siblings ...)
2004-03-11 21:33 ` stl at caltech dot edu
@ 2004-03-11 21:35 ` kazu at cs dot umass dot edu
2004-03-11 21:41 ` stl at caltech dot edu
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: kazu at cs dot umass dot edu @ 2004-03-11 21:35 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From kazu at cs dot umass dot edu 2004-03-11 21:35 -------
Mainline as this is not a regression or bug fix.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14504
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug optimization/14504] Missed Bit Twiddling Optimization
2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
` (4 preceding siblings ...)
2004-03-11 21:35 ` kazu at cs dot umass dot edu
@ 2004-03-11 21:41 ` stl at caltech dot edu
2004-03-11 22:10 ` falk at debian dot org
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: stl at caltech dot edu @ 2004-03-11 21:41 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From stl at caltech dot edu 2004-03-11 21:41 -------
Ok. (I thought that 3.3.4 was open for non-invasive enhancements; now I see
that it's open for non-invasive non-regression bugfixes.)
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14504
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug optimization/14504] Missed Bit Twiddling Optimization
2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
` (5 preceding siblings ...)
2004-03-11 21:41 ` stl at caltech dot edu
@ 2004-03-11 22:10 ` falk at debian dot org
2004-03-11 22:21 ` stl at caltech dot edu
2004-03-11 22:29 ` kazu at cs dot umass dot edu
8 siblings, 0 replies; 10+ messages in thread
From: falk at debian dot org @ 2004-03-11 22:10 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From falk at debian dot org 2004-03-11 22:10 -------
My comment posted to gcc-bugs didn't make it into bugzilla, so I'll repeat it.
This is not universally a win. For example, on an Alpha EV5, the first:
andnot a2,a1,t1
or a2,a1,v0
cmoveq a0,t1,v0
takes 2 cycles, and the second:
lda t1,-1(a0)
xor a2,t1,v0
or a1,v0,t0
xor t0,t1,v0
takes 4 cycles and 1 more insn. I suspect on i386 it'd be similar if
you enable conditional moves.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14504
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug optimization/14504] Missed Bit Twiddling Optimization
2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
` (6 preceding siblings ...)
2004-03-11 22:10 ` falk at debian dot org
@ 2004-03-11 22:21 ` stl at caltech dot edu
2004-03-11 22:29 ` kazu at cs dot umass dot edu
8 siblings, 0 replies; 10+ messages in thread
From: stl at caltech dot edu @ 2004-03-11 22:21 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From stl at caltech dot edu 2004-03-11 22:21 -------
Well, the Right Thing to do is to perform the transformation when it's a win,
and also perform the /reverse transformation/ when it's a win, and not
transform anything otherwise.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14504
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug optimization/14504] Missed Bit Twiddling Optimization
2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
` (7 preceding siblings ...)
2004-03-11 22:21 ` stl at caltech dot edu
@ 2004-03-11 22:29 ` kazu at cs dot umass dot edu
8 siblings, 0 replies; 10+ messages in thread
From: kazu at cs dot umass dot edu @ 2004-03-11 22:29 UTC (permalink / raw)
To: gcc-bugs
------- Additional Comments From kazu at cs dot umass dot edu 2004-03-11 22:29 -------
Err, i386 is a two-address machine with only a few registers,
so cmove doesn't work as well in this case.
typedef _Bool bool;
unsigned long
cond_mask (bool flag, unsigned long mask, unsigned long target)
{
unsigned long l = target | mask;
unsigned long r = target & ~mask;
return flag ? l : r;
#if 0
pushl %ebx
movl %eax, %ebx ; %bl = flag
movl %ecx, %eax
orl %edx, %eax ; %eax = target | mask
notl %edx ; ~mask
andl %ecx, %edx ; %edx = target & ~mask
testb %bl, %bl
popl %ebx
cmove %edx, %eax
#endif
}
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14504
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2004-03-11 22:29 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-03-09 17:17 [Bug optimization/14504] New: Missed Bit Twiddling Optimization stl at caltech dot edu
2004-03-09 17:35 ` [Bug optimization/14504] " pinskia at gcc dot gnu dot org
2004-03-09 18:01 ` [Bug optimization/14504] New: " Falk Hueffner
2004-03-11 0:09 ` [Bug optimization/14504] " kazu at cs dot umass dot edu
2004-03-11 21:33 ` stl at caltech dot edu
2004-03-11 21:35 ` kazu at cs dot umass dot edu
2004-03-11 21:41 ` stl at caltech dot edu
2004-03-11 22:10 ` falk at debian dot org
2004-03-11 22:21 ` stl at caltech dot edu
2004-03-11 22:29 ` kazu at cs dot umass dot edu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).