public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/58073] New: Suboptimal optimisation of ((x & 0x70) == 0x00 && (x & 0x70) == 0x10 && (x & 0x70) == 0x20) on x86_64
@ 2013-08-03 17:01 dhowells at redhat dot com
2013-08-03 17:06 ` [Bug c/58073] " dhowells at redhat dot com
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: dhowells at redhat dot com @ 2013-08-03 17:01 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58073
Bug ID: 58073
Summary: Suboptimal optimisation of ((x & 0x70) == 0x00 && (x &
0x70) == 0x10 && (x & 0x70) == 0x20) on x86_64
Product: gcc
Version: 4.8.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: dhowells at redhat dot com
Created attachment 30605
--> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30605&action=edit
Demonstration source code
When the attached demo code is compiled with gcc-4.8.1, two of the cases
optimise fine and the third case is optimised suboptimally - probably because
it initially matches the optimisation for the second case.
Going through the cases individually for 'shift' being 4:
(1) return (mask(d) == (0x0 << shift));
This is rendered as a single TEST instruction in x86_64 asm:
0: f6 07 70 testb $0x70,(%rdi)
3: 0f 94 c0 sete %al
6: c3 retq
which is good.
(2) return (mask(d) == (0x0 << shift) ||
mask(d) == (0x1 << shift));
This is also rendered as a single TEST instruction:
10: f6 07 60 testb $0x60,(%rdi)
13: 0f 94 c0 sete %al
16: c3 retq
which is again good. The problem comes with the third case:
(3) return (mask(d) == (0x0 << shift) ||
mask(d) == (0x1 << shift) ||
mask(d) == (0x2 << shift));
This is rendered as:
20: 8b 17 mov (%rdi),%edx
22: b8 01 00 00 00 mov $0x1,%eax
27: f6 c2 60 test $0x60,%dl
2a: 74 09 je 35 <foo3+0x15>
2c: 83 e2 70 and $0x70,%edx
2f: 83 fa 20 cmp $0x20,%edx
32: 0f 94 c0 sete %al
35: f3 c3 repz retq
which is odd. I would expect the thing to be turned into MOV, AND, CMP, SETE,
RETQ since the numbers it is checking for lie adjacent to each other, starting
from zero.
I think what has happened is that the first two comparisons matched the
optimisation for case (2) - resulting in three extra instructions.
The compilation command line was:
gcc -O2 -c foo.c -Wall && objdump -d foo.o
The compiler version:
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.1/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla
--enable-bootstrap --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin
--enable-initfini-array --enable-java-awt=gtk --disable-dssi
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
--enable-libgcj-multifile --enable-java-maintainer-mode
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib
--with-isl=/builddir/build/BUILD/gcc-4.8.1-20130603/obj-x86_64-redhat-linux/isl-install
--with-cloog=/builddir/build/BUILD/gcc-4.8.1-20130603/obj-x86_64-redhat-linux/cloog-install
--with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.1 20130603 (Red Hat 4.8.1-1) (GCC)
as supplied by Fedora: gcc-4.8.1-1.fc19.x86_64
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Bug c/58073] Suboptimal optimisation of ((x & 0x70) == 0x00 && (x & 0x70) == 0x10 && (x & 0x70) == 0x20) on x86_64
2013-08-03 17:01 [Bug c/58073] New: Suboptimal optimisation of ((x & 0x70) == 0x00 && (x & 0x70) == 0x10 && (x & 0x70) == 0x20) on x86_64 dhowells at redhat dot com
@ 2013-08-03 17:06 ` dhowells at redhat dot com
2021-06-04 22:18 ` [Bug tree-optimization/58073] Suboptimal optimisation of ((x & 0x70) == 0x00 || (x & 0x70) == 0x10 || " pinskia at gcc dot gnu.org
2021-06-04 22:32 ` [Bug tree-optimization/58073] Suboptimal optimisation of ((x & 0x70) == 0x00 || (x & 0x70) == 0x10 || (x & 0x70) == 0x20) pinskia at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: dhowells at redhat dot com @ 2013-08-03 17:06 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58073
--- Comment #1 from dhowells at redhat dot com <dhowells at redhat dot com> ---
Interestingly, the suboptimality shifts if the 'shift' value in the demo
program is changed to 0:
Going through the cases individually::
(1) return (mask(d) == (0x0 << shift));
This is rendered as a single TEST instruction in x86_64 asm:
0: f6 07 07 testb $0x7,(%rdi)
3: 0f 94 c0 sete %al
6: c3 retq
which is fine.
(2) return (mask(d) == (0x0 << shift) ||
mask(d) == (0x1 << shift));
is rendered as:
10: 8b 07 mov (%rdi),%eax
12: 83 e0 07 and $0x7,%eax
15: 83 f8 01 cmp $0x1,%eax
18: 0f 96 c0 setbe %al
1b: c3 retq
which is not what I'd expect. What happened to the single TEST instruction
that was produced for shift = 4?
And then:
(3) return (mask(d) == (0x0 << shift) ||
mask(d) == (0x1 << shift) ||
mask(d) == (0x2 << shift));
which is rendered as:
20: 8b 07 mov (%rdi),%eax
22: 83 e0 07 and $0x7,%eax
25: 83 f8 02 cmp $0x2,%eax
28: 0f 96 c0 setbe %al
2b: c3 retq
which is good (and which is similar to what I'd've expected case 3 to generate
when shift = 4).
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Bug tree-optimization/58073] Suboptimal optimisation of ((x & 0x70) == 0x00 || (x & 0x70) == 0x10 || (x & 0x70) == 0x20) on x86_64
2013-08-03 17:01 [Bug c/58073] New: Suboptimal optimisation of ((x & 0x70) == 0x00 && (x & 0x70) == 0x10 && (x & 0x70) == 0x20) on x86_64 dhowells at redhat dot com
2013-08-03 17:06 ` [Bug c/58073] " dhowells at redhat dot com
@ 2021-06-04 22:18 ` pinskia at gcc dot gnu.org
2021-06-04 22:32 ` [Bug tree-optimization/58073] Suboptimal optimisation of ((x & 0x70) == 0x00 || (x & 0x70) == 0x10 || (x & 0x70) == 0x20) pinskia at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-04 22:18 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58073
--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
#define shift 4
return ((mask(d) == (0x0 << shift)) ||
(mask(d) == (0x1 << shift)) ||
(mask(d) == (0x2 << shift)));
static inline unsigned mask(const struct dentry *dentry)
{
return dentry->d_flags & (0x7 << shift);
}
ifcombine does:
_4 = _5 & 112; // this was already there.
_8 = _4 == 16;
_7 = _4 == 0;
_9 = _7 | _8;
_10 = _4 == 32;
_11 = _9 | _10;
Which is correct.
reassoc1 does:
_4 = _5 & 112;
_8 = _4 == 16;
_1 = _4 & 4294967279; // -17 or ~16
_13 = _1 == 0;
_7 = _4 == 0;
_10 = _4 == 32;
_11 = _10 | _13;
and that is where it messes up, it misses reassocation of all three ands
together. And _4 & 4294967279 removes bit 7 from the original and.
Final output:
_4 = _5 & 112; // 0b1110000
_1 = _5 & 96; // 0b1100000
_13 = _1 == 0; // 0b0000000
_10 = _4 == 32;// 0b0100000
_11 = _10 | _13;
So we need have another pattern for something like this:
(bit_ior
(cmp (bit_and @0 INTEGER_CST@2) INTEGER_CST@3)
(cmp (bit_and @0 INTEGER_CST@4) INTEGER_CST@5))
And maybe even one like this (which will solve the issue sooner):
(bit_ior
(cmp (bit_and @0 INTEGER_CST@2) INTEGER_CST@3)
(cmp @0 INTEGER_CST@5)))
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Bug tree-optimization/58073] Suboptimal optimisation of ((x & 0x70) == 0x00 || (x & 0x70) == 0x10 || (x & 0x70) == 0x20)
2013-08-03 17:01 [Bug c/58073] New: Suboptimal optimisation of ((x & 0x70) == 0x00 && (x & 0x70) == 0x10 && (x & 0x70) == 0x20) on x86_64 dhowells at redhat dot com
2013-08-03 17:06 ` [Bug c/58073] " dhowells at redhat dot com
2021-06-04 22:18 ` [Bug tree-optimization/58073] Suboptimal optimisation of ((x & 0x70) == 0x00 || (x & 0x70) == 0x10 || " pinskia at gcc dot gnu.org
@ 2021-06-04 22:32 ` pinskia at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-04 22:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58073
--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #6)
>
> And maybe even one like this (which will solve the issue sooner):
>
> (bit_ior
> (cmp (bit_and @0 INTEGER_CST@2) INTEGER_CST@3)
> (cmp @0 INTEGER_CST@5)))
Note the final INTEGER_CST has to be 0 I think but you get the idea.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2021-06-04 22:32 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-03 17:01 [Bug c/58073] New: Suboptimal optimisation of ((x & 0x70) == 0x00 && (x & 0x70) == 0x10 && (x & 0x70) == 0x20) on x86_64 dhowells at redhat dot com
2013-08-03 17:06 ` [Bug c/58073] " dhowells at redhat dot com
2021-06-04 22:18 ` [Bug tree-optimization/58073] Suboptimal optimisation of ((x & 0x70) == 0x00 || (x & 0x70) == 0x10 || " pinskia at gcc dot gnu.org
2021-06-04 22:32 ` [Bug tree-optimization/58073] Suboptimal optimisation of ((x & 0x70) == 0x00 || (x & 0x70) == 0x10 || (x & 0x70) == 0x20) pinskia at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).