public inbox for gcc-bugs@sourceware.org
* [Bug other/103345] New: missed optimization: add/xor individual bytes to form a word
@ 2021-11-21 11:17 gcc at rjk dot terraraq.uk
  2021-11-21 12:44 ` [Bug tree-optimization/103345] " roger at nextmovesoftware dot com
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: gcc at rjk dot terraraq.uk @ 2021-11-21 11:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103345

            Bug ID: 103345
           Summary: missed optimization: add/xor individual bytes to form
                    a word
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gcc at rjk dot terraraq.uk
  Target Milestone: ---

All code generated with godbolt's idea of 'trunk'. See
https://godbolt.org/z/Wcj61PKKG

Source:

#include <stdint.h>

uint32_t load_le_32_or(const uint8_t *ptr)
{
  return ((uint32_t)ptr[0]) | ((uint32_t)ptr[1] << 8)
         | ((uint32_t)ptr[2] << 16) | ((uint32_t)ptr[3] << 24);
}

uint32_t load_le_32_add(const uint8_t *ptr)
{
  return ((uint32_t)ptr[0]) + ((uint32_t)ptr[1] << 8)
         + ((uint32_t)ptr[2] << 16) + ((uint32_t)ptr[3] << 24);
}


uint32_t load_le_32_xor(const uint8_t *ptr)
{
  return ((uint32_t)ptr[0]) ^ ((uint32_t)ptr[1] << 8)
         ^ ((uint32_t)ptr[2] << 16) ^ ((uint32_t)ptr[3] << 24);
}

The ^ version is admittedly a bit of an odd choice but the + version is a
reasonably natural way to write the code.


Code on gcc -O2:

load_le_32_or:
        mov     eax, DWORD PTR [rdi]
        ret
load_le_32_add:
        movzx   eax, BYTE PTR [rdi+1]
        movzx   edx, BYTE PTR [rdi+2]
        sal     eax, 8
        sal     edx, 16
        add     eax, edx
        movzx   edx, BYTE PTR [rdi]
        add     eax, edx
        movzx   edx, BYTE PTR [rdi+3]
        sal     edx, 24
        add     eax, edx
        ret
load_le_32_xor:
        movzx   eax, BYTE PTR [rdi+1]
        movzx   edx, BYTE PTR [rdi+2]
        sal     eax, 8
        sal     edx, 16
        xor     eax, edx
        movzx   edx, BYTE PTR [rdi]
        xor     eax, edx
        movzx   edx, BYTE PTR [rdi+3]
        sal     edx, 24
        xor     eax, edx
        ret


Code on clang -O2:

load_le_32_or:                          # @load_le_32_or
        mov     eax, dword ptr [rdi]
        ret
load_le_32_add:                         # @load_le_32_add
        mov     eax, dword ptr [rdi]
        ret
load_le_32_xor:                         # @load_le_32_xor
        mov     eax, dword ptr [rdi]
        ret


* [Bug tree-optimization/103345] missed optimization: add/xor individual bytes to form a word
  2021-11-21 11:17 [Bug other/103345] New: missed optimization: add/xor individual bytes to form a word gcc at rjk dot terraraq.uk
@ 2021-11-21 12:44 ` roger at nextmovesoftware dot com
  2021-11-22  8:41 ` roger at nextmovesoftware dot com
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: roger at nextmovesoftware dot com @ 2021-11-21 12:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103345

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
          Component|other                       |tree-optimization
     Ever confirmed|0                           |1
                 CC|                            |roger at nextmovesoftware dot com
   Last reconfirmed|                            |2021-11-21

--- Comment #1 from Roger Sayle <roger at nextmovesoftware dot com> ---
The ior form is perceived in tree-ssa's bswap pass implemented in
gimple-ssa-store-merging.c.
32 bit load in target endianness found at: _16 = MEM <unsigned int> [(const
uint8_t *)ptr_15(D)];
32-bit nop implementations found: 1
My guess is that it should be trivial to handle the PLUS_EXPR and BIT_XOR_EXPR
tree codes at the same time as BIT_IOR_EXPR.
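
Why the three tree codes can plausibly be treated alike for this pattern: each shifted byte occupies its own 8-bit lane, so no bit is set in more than one operand. With no shared bits, addition produces no carries and xor cancels nothing, so all three operators give the same value. The following is only an illustrative sketch; the helper names `ops_agree` and `bits_disjoint` are made up for this example and do not appear in GCC.

```c
#include <stdint.h>

/* Returns 1 when a | b, a + b and a ^ b all yield the same value.
   Hypothetical helper, for illustration only.  */
int ops_agree(uint32_t a, uint32_t b)
{
  return (a | b) == (a + b) && (a | b) == (a ^ b);
}

/* Returns 1 when no bit position is set in both a and b.  */
int bits_disjoint(uint32_t a, uint32_t b)
{
  return (a & b) == 0;
}
```

For the lanes used in the report, e.g. `ops_agree(0x78, 0x56 << 8)`, the operands are disjoint and the operators agree; once two operands share a set bit, they no longer do.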


* [Bug tree-optimization/103345] missed optimization: add/xor individual bytes to form a word
  2021-11-21 11:17 [Bug other/103345] New: missed optimization: add/xor individual bytes to form a word gcc at rjk dot terraraq.uk
  2021-11-21 12:44 ` [Bug tree-optimization/103345] " roger at nextmovesoftware dot com
@ 2021-11-22  8:41 ` roger at nextmovesoftware dot com
  2021-11-22 18:17 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: roger at nextmovesoftware dot com @ 2021-11-22  8:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103345

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |roger at nextmovesoftware dot com
             Status|NEW                         |ASSIGNED

--- Comment #2 from Roger Sayle <roger at nextmovesoftware dot com> ---
Patch proposed
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585117.html


* [Bug tree-optimization/103345] missed optimization: add/xor individual bytes to form a word
  2021-11-21 11:17 [Bug other/103345] New: missed optimization: add/xor individual bytes to form a word gcc at rjk dot terraraq.uk
  2021-11-21 12:44 ` [Bug tree-optimization/103345] " roger at nextmovesoftware dot com
  2021-11-22  8:41 ` roger at nextmovesoftware dot com
@ 2021-11-22 18:17 ` cvs-commit at gcc dot gnu.org
  2021-11-24  8:55 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-11-22 18:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103345

--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:a944b5dec3adb28ed199234d2116145ca9010d6a

commit r12-5453-ga944b5dec3adb28ed199234d2116145ca9010d6a
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Mon Nov 22 18:15:36 2021 +0000

    tree-optimization/103345: Improved load merging.

    This patch implements PR tree-optimization/103345 to merge adjacent
    loads when combined with addition or bitwise xor.  The current code
    in gimple-ssa-store-merging.c's find_bswap_or_nop already handles ior,
    so that all that's required is to treat PLUS_EXPR and BIT_XOR_EXPR in
    the same way as BIT_IOR_EXPR.  Many thanks to Andrew Pinski for
    pointing out that this also resolves PR target/98953.

    2021-11-22  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/ChangeLog
            PR tree-optimization/98953
            PR tree-optimization/103345
            * gimple-ssa-store-merging.c (find_bswap_or_nop_1): Handle
            BIT_XOR_EXPR and PLUS_EXPR the same as BIT_IOR_EXPR.
            (pass_optimize_bswap::execute): Likewise.

    gcc/testsuite/ChangeLog
            PR tree-optimization/98953
            PR tree-optimization/103345
            * gcc.dg/tree-ssa/pr98953.c: New test case.
            * gcc.dg/tree-ssa/pr103345.c: New test case.


* [Bug tree-optimization/103345] missed optimization: add/xor individual bytes to form a word
  2021-11-21 11:17 [Bug other/103345] New: missed optimization: add/xor individual bytes to form a word gcc at rjk dot terraraq.uk
                   ` (2 preceding siblings ...)
  2021-11-22 18:17 ` cvs-commit at gcc dot gnu.org
@ 2021-11-24  8:55 ` cvs-commit at gcc dot gnu.org
  2021-11-25 19:47 ` roger at nextmovesoftware dot com
  2021-11-30 10:37 ` cvs-commit at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-11-24  8:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103345

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:04eccbbe3d9a4e9d2f8f43dba8ac4cb686029fb2

commit r12-5492-g04eccbbe3d9a4e9d2f8f43dba8ac4cb686029fb2
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Wed Nov 24 09:54:44 2021 +0100

    bswap: Fix up symbolic merging for xor and plus [PR103376]

    On Mon, Nov 22, 2021 at 08:39:42AM -0000, Roger Sayle wrote:
    > This patch implements PR tree-optimization/103345 to merge adjacent
    > loads when combined with addition or bitwise xor.  The current code
    > in gimple-ssa-store-merging.c's find_bswap_or_nop already handles ior,
    > so that all that's required is to treat PLUS_EXPR and BIT_XOR_EXPR in
    > the same way as BIT_IOR_EXPR.

    Unfortunately they aren't exactly the same.  They work the same whenever
    at least one operand (or the corresponding byte in it) is known to be 0,
    0 | 0 = 0 ^ 0 = 0 + 0 = 0.  But for | also x | x = x for any other x,
    so perform_symbolic_merge has been accepting either that at least one
    of the bytes is 0 or that both are the same, but that is wrong for ^
    and +.

    The following patch fixes that by passing through the code of binary
    operation and allowing non-zero masked1 == masked2 through only
    for BIT_IOR_EXPR.

    Thinking more about it, perhaps we could do more for BIT_XOR_EXPR.
    We could allow masked1 == masked2 case for it, but would need to
    do something different than the
      n->n = n1->n | n2->n;
    we do on all the bytes together.
    In particular, for masked1 == masked2 if masked1 != 0 (well, for 0
    both variants are the same) and masked1 != 0xff we would need to
    clear the corresponding n->n byte instead of setting it to the input,
    as x ^ x = 0 (but if we don't know what x and y are, the result is
    also unknown).  Now, for plus it is much harder, because not only
    do we not know the result for non-zero operands, but it can also
    modify upper bytes.  So perhaps, when both masked1 and masked2 are
    non-zero for the current byte, set the resulting byte to 0xff
    (unknown) only if the byte above it is 0 in both operands, and set
    that upper byte to 0xff too.
    Also, even for | we could instead of return NULL just set the resulting
    byte to 0xff if it is different, perhaps it will be masked off later on.

    2021-11-24  Jakub Jelinek  <jakub@redhat.com>

            PR tree-optimization/103376
            * gimple-ssa-store-merging.c (perform_symbolic_merge): Add CODE
            argument.  If CODE is not BIT_IOR_EXPR, ensure that one of masked1
            or masked2 is 0.
            (find_bswap_or_nop_1, find_bswap_or_nop,
            imm_store_chain_info::try_coalesce_bswap): Adjust
            perform_symbolic_merge callers.

            * gcc.c-torture/execute/pr103376.c: New test.
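
The distinction drawn in this commit message can be seen in a small case where the same byte is used twice. The sketch below is hypothetical (the function names are invented here and this is not the pr103376.c test case). For p = {0x80, 0x01}, the or form yields 0x0180, the same as a 16-bit little-endian load, whereas the xor form yields 0x0100 (x ^ x = 0) and the add form yields 0x0200 (the carry reaches the upper byte), so only the or form may be merged into a load.

```c
#include <stdint.h>

/* Hypothetical illustration: byte 0 appears twice in each expression.  */

/* x | x == x, so this still equals a 16-bit little-endian load.  */
uint32_t dup_or(const uint8_t *p)
{
  return (uint32_t)p[0] | (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}

/* x ^ x == 0, so the low byte is cleared rather than preserved.  */
uint32_t dup_xor(const uint8_t *p)
{
  return (uint32_t)p[0] ^ (uint32_t)p[0] ^ ((uint32_t)p[1] << 8);
}

/* x + x == 2x, which can carry into the byte above.  */
uint32_t dup_add(const uint8_t *p)
{
  return (uint32_t)p[0] + (uint32_t)p[0] + ((uint32_t)p[1] << 8);
}
```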


* [Bug tree-optimization/103345] missed optimization: add/xor individual bytes to form a word
  2021-11-21 11:17 [Bug other/103345] New: missed optimization: add/xor individual bytes to form a word gcc at rjk dot terraraq.uk
                   ` (3 preceding siblings ...)
  2021-11-24  8:55 ` cvs-commit at gcc dot gnu.org
@ 2021-11-25 19:47 ` roger at nextmovesoftware dot com
  2021-11-30 10:37 ` cvs-commit at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: roger at nextmovesoftware dot com @ 2021-11-25 19:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103345

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Roger Sayle <roger at nextmovesoftware dot com> ---
This PR should now be fixed (missed optimization implemented) on mainline.


* [Bug tree-optimization/103345] missed optimization: add/xor individual bytes to form a word
  2021-11-21 11:17 [Bug other/103345] New: missed optimization: add/xor individual bytes to form a word gcc at rjk dot terraraq.uk
                   ` (4 preceding siblings ...)
  2021-11-25 19:47 ` roger at nextmovesoftware dot com
@ 2021-11-30 10:37 ` cvs-commit at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-11-30 10:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103345

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:92de188ea3d36ec012b6d42959d4722e42524256

commit r12-5616-g92de188ea3d36ec012b6d42959d4722e42524256
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Tue Nov 30 10:25:35 2021 +0000

    [Committed] PR testsuite/103477: Fix big-endian mistake in new test case.

    I missed a spot when adding the "#if __BYTE_ORDER__ == ..." guards to
    the new test case for PR tree-optimization/103345.  Committed as obvious.

    2021-11-30  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/testsuite/ChangeLog
            PR testsuite/103477
            * gcc.dg/tree-ssa/pr103345.c: Correct xor test for big-endian.
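
For readers unfamiliar with such guards, here is a rough sketch of what an endianness-guarded xor variant might look like. It is not the contents of pr103345.c; the function names are assumptions made for illustration. It relies on the `__BYTE_ORDER__`, `__ORDER_LITTLE_ENDIAN__` and `__ORDER_BIG_ENDIAN__` macros that GCC and Clang predefine, and picks the byte order so that the expression matches a native-endian 32-bit load on either kind of target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: combine four bytes with xor in the target's
   native byte order, so the result equals a plain 32-bit load.  */
uint32_t load_native_32_xor(const uint8_t *ptr)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return ((uint32_t)ptr[0]) ^ ((uint32_t)ptr[1] << 8)
         ^ ((uint32_t)ptr[2] << 16) ^ ((uint32_t)ptr[3] << 24);
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  return ((uint32_t)ptr[3]) ^ ((uint32_t)ptr[2] << 8)
         ^ ((uint32_t)ptr[1] << 16) ^ ((uint32_t)ptr[0] << 24);
#else
#error "unsupported byte order"
#endif
}

/* Reference: a native-endian load expressed with memcpy.  */
uint32_t load_memcpy_32(const uint8_t *ptr)
{
  uint32_t v;
  memcpy(&v, ptr, sizeof v);
  return v;
}
```

Without the guard, a little-endian-only expression would not correspond to a single native load on a big-endian target, which is the kind of mismatch the guards are meant to avoid.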

