public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/58073] New: Suboptimal optimisation of ((x & 0x70) == 0x00 && (x & 0x70) == 0x10 && (x & 0x70) == 0x20) on x86_64
@ 2013-08-03 17:01 dhowells at redhat dot com
  2013-08-03 17:06 ` [Bug c/58073] " dhowells at redhat dot com
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: dhowells at redhat dot com @ 2013-08-03 17:01 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58073

            Bug ID: 58073
           Summary: Suboptimal optimisation of ((x & 0x70) == 0x00 && (x &
                    0x70) == 0x10 && (x & 0x70) == 0x20) on x86_64
           Product: gcc
           Version: 4.8.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dhowells at redhat dot com

Created attachment 30605
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30605&action=edit
Demonstration source code

When the attached demo code is compiled with gcc-4.8.1, two of the cases
optimise fine and the third case is optimised suboptimally - probably because
it initially matches the optimisation for the second case.

Going through the cases individually for 'shift' being 4:

 (1) return (mask(d) == (0x0 << shift));

This is rendered as a single TEST instruction in x86_64 asm:

   0:   f6 07 70                testb  $0x70,(%rdi)
   3:   0f 94 c0                sete   %al
   6:   c3                      retq   

which is good.

 (2) return (mask(d) == (0x0 << shift) ||
                mask(d) == (0x1 << shift));

This is also rendered as a single TEST instruction:

  10:   f6 07 60                testb  $0x60,(%rdi)
  13:   0f 94 c0                sete   %al
  16:   c3                      retq   

which is again good.  The problem comes with the third case:

 (3) return (mask(d) == (0x0 << shift) ||
                   mask(d) == (0x1 << shift) ||
                   mask(d) == (0x2 << shift));

This is rendered as:

  20:   8b 17                   mov    (%rdi),%edx
  22:   b8 01 00 00 00          mov    $0x1,%eax
  27:   f6 c2 60                test   $0x60,%dl
  2a:   74 09                   je     35 <foo3+0x15>
  2c:   83 e2 70                and    $0x70,%edx
  2f:   83 fa 20                cmp    $0x20,%edx
  32:   0f 94 c0                sete   %al
  35:   f3 c3                   repz retq 

which is odd.  I would expect the thing to be turned into MOV, AND, CMP, SETE,
RETQ since the numbers it is checking for lie adjacent to each other, starting
from zero.

I think what has happened is that the first two comparisons matched the
optimisation for case (2) - resulting in three extra instructions.

The compilation command line was:

gcc -O2 -c foo.c -Wall && objdump -d foo.o

The compiler version:

Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.1/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla
--enable-bootstrap --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-gnu-unique-object
--enable-linker-build-id --with-linker-hash-style=gnu
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin
--enable-initfini-array --enable-java-awt=gtk --disable-dssi
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
--enable-libgcj-multifile --enable-java-maintainer-mode
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib
--with-isl=/builddir/build/BUILD/gcc-4.8.1-20130603/obj-x86_64-redhat-linux/isl-install
--with-cloog=/builddir/build/BUILD/gcc-4.8.1-20130603/obj-x86_64-redhat-linux/cloog-install
--with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.1 20130603 (Red Hat 4.8.1-1) (GCC) 

as supplied by Fedora: gcc-4.8.1-1.fc19.x86_64


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug c/58073] Suboptimal optimisation of ((x & 0x70) == 0x00 && (x & 0x70) == 0x10 && (x & 0x70) == 0x20) on x86_64
  2013-08-03 17:01 [Bug c/58073] New: Suboptimal optimisation of ((x & 0x70) == 0x00 && (x & 0x70) == 0x10 && (x & 0x70) == 0x20) on x86_64 dhowells at redhat dot com
@ 2013-08-03 17:06 ` dhowells at redhat dot com
  2021-06-04 22:18 ` [Bug tree-optimization/58073] Suboptimal optimisation of ((x & 0x70) == 0x00 || (x & 0x70) == 0x10 || " pinskia at gcc dot gnu.org
  2021-06-04 22:32 ` [Bug tree-optimization/58073] Suboptimal optimisation of ((x & 0x70) == 0x00 || (x & 0x70) == 0x10 || (x & 0x70) == 0x20) pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: dhowells at redhat dot com @ 2013-08-03 17:06 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58073

--- Comment #1 from dhowells at redhat dot com <dhowells at redhat dot com> ---
Interestingly, the suboptimality shifts if the 'shift' value in the demo
program is changed to 0:

Going through the cases individually::

 (1) return (mask(d) == (0x0 << shift));

This is rendered as a single TEST instruction in x86_64 asm:

   0:   f6 07 07                testb  $0x7,(%rdi)
   3:   0f 94 c0                sete   %al
   6:   c3                      retq   

which is fine.

 (2) return (mask(d) == (0x0 << shift) ||
                mask(d) == (0x1 << shift));

is rendered as:

  10:   8b 07                   mov    (%rdi),%eax
  12:   83 e0 07                and    $0x7,%eax
  15:   83 f8 01                cmp    $0x1,%eax
  18:   0f 96 c0                setbe  %al
  1b:   c3                      retq   

which is not what I'd expect.  What happened to the single TEST instruction
that was produced for shift = 4?

And then:

 (3) return (mask(d) == (0x0 << shift) ||
                   mask(d) == (0x1 << shift) ||
                   mask(d) == (0x2 << shift));

which is rendered as:

  20:   8b 07                   mov    (%rdi),%eax
  22:   83 e0 07                and    $0x7,%eax
  25:   83 f8 02                cmp    $0x2,%eax
  28:   0f 96 c0                setbe  %al
  2b:   c3                      retq   

which is good (and which is similar to what I'd've expected case 3 to generate
when shift = 4).


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug tree-optimization/58073] Suboptimal optimisation of ((x & 0x70) == 0x00 || (x & 0x70) == 0x10 || (x & 0x70) == 0x20) on x86_64
  2013-08-03 17:01 [Bug c/58073] New: Suboptimal optimisation of ((x & 0x70) == 0x00 && (x & 0x70) == 0x10 && (x & 0x70) == 0x20) on x86_64 dhowells at redhat dot com
  2013-08-03 17:06 ` [Bug c/58073] " dhowells at redhat dot com
@ 2021-06-04 22:18 ` pinskia at gcc dot gnu.org
  2021-06-04 22:32 ` [Bug tree-optimization/58073] Suboptimal optimisation of ((x & 0x70) == 0x00 || (x & 0x70) == 0x10 || (x & 0x70) == 0x20) pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-04 22:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58073

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
#define shift 4

        return ((mask(d) == (0x0 << shift)) ||
                (mask(d) == (0x1 << shift)) ||
                (mask(d) == (0x2 << shift)));

static inline unsigned mask(const struct dentry *dentry)
{
        return dentry->d_flags & (0x7 << shift);
}



ifcombine does:
  _4 = _5 & 112; // this was already there.
  _8 = _4 == 16;
  _7 = _4 == 0;
  _9 = _7 | _8;
  _10 = _4 == 32;
  _11 = _9 | _10;

Which is correct.

reassoc1 does:

  _4 = _5 & 112;
  _8 = _4 == 16;
  _1 = _4 & 4294967279; // -17 or ~16
  _13 = _1 == 0;
  _7 = _4 == 0;
  _10 = _4 == 32;
  _11 = _10 | _13;

and that is where it messes up, it misses reassocation of all three ands
together. And _4 & 4294967279 removes bit 7 from the original and.


Final output:
  _4 = _5 & 112; // 0b1110000
  _1 = _5 & 96;  // 0b1100000
  _13 = _1 == 0; // 0b0000000
  _10 = _4 == 32;// 0b0100000
  _11 = _10 | _13;

So we need have another pattern for something like this:
(bit_ior
 (cmp (bit_and @0 INTEGER_CST@2) INTEGER_CST@3)
 (cmp (bit_and @0 INTEGER_CST@4) INTEGER_CST@5))

And maybe even one like this (which will solve the issue sooner):

(bit_ior
 (cmp (bit_and @0 INTEGER_CST@2) INTEGER_CST@3)
 (cmp @0 INTEGER_CST@5)))

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug tree-optimization/58073] Suboptimal optimisation of ((x & 0x70) == 0x00 || (x & 0x70) == 0x10 || (x & 0x70) == 0x20)
  2013-08-03 17:01 [Bug c/58073] New: Suboptimal optimisation of ((x & 0x70) == 0x00 && (x & 0x70) == 0x10 && (x & 0x70) == 0x20) on x86_64 dhowells at redhat dot com
  2013-08-03 17:06 ` [Bug c/58073] " dhowells at redhat dot com
  2021-06-04 22:18 ` [Bug tree-optimization/58073] Suboptimal optimisation of ((x & 0x70) == 0x00 || (x & 0x70) == 0x10 || " pinskia at gcc dot gnu.org
@ 2021-06-04 22:32 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-06-04 22:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58073

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #6)
> 
> And maybe even one like this (which will solve the issue sooner):
> 
> (bit_ior
>  (cmp (bit_and @0 INTEGER_CST@2) INTEGER_CST@3)
>  (cmp @0 INTEGER_CST@5)))

Note the final INTEGER_CST has to be 0 I think but you get the idea.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-06-04 22:32 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-03 17:01 [Bug c/58073] New: Suboptimal optimisation of ((x & 0x70) == 0x00 && (x & 0x70) == 0x10 && (x & 0x70) == 0x20) on x86_64 dhowells at redhat dot com
2013-08-03 17:06 ` [Bug c/58073] " dhowells at redhat dot com
2021-06-04 22:18 ` [Bug tree-optimization/58073] Suboptimal optimisation of ((x & 0x70) == 0x00 || (x & 0x70) == 0x10 || " pinskia at gcc dot gnu.org
2021-06-04 22:32 ` [Bug tree-optimization/58073] Suboptimal optimisation of ((x & 0x70) == 0x00 || (x & 0x70) == 0x10 || (x & 0x70) == 0x20) pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).