public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/102391] New: Failure to optimize 2 8-bit loads into a single 16-bit load
From: gabravier at gmail dot com @ 2021-09-17 22:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102391

            Bug ID: 102391
           Summary: Failure to optimize 2 8-bit loads into a single 16-bit
                    load
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: gabravier at gmail dot com
  Target Milestone: ---

#include <stdint.h>

uint16_t HeaderReadU16LE(int offset, uint8_t *RomHeader)
{
    return RomHeader[offset] |
        (RomHeader[offset + 1] << 8);
}

This can be optimized into a single 16-bit load. At -O3, LLVM performs this
optimization, but GCC does not.

This winds up affecting the resulting assembly quite a bit:

AMD64 GCC:

HeaderReadU16LE:
  movsx rdi, edi
  movzx edx, BYTE PTR [rsi+1+rdi]
  movzx eax, BYTE PTR [rsi+rdi]
  sal edx, 8
  or eax, edx
  ret

AMD64 LLVM:

HeaderReadU16LE:
  movsxd rax, edi
  movzx eax, word ptr [rsi + rax]
  ret
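
Editorial sketch (not part of the original report): a common way to ask for a
single load explicitly is a fixed-size memcpy, which compilers typically lower
to one load instruction. The function names below are hypothetical, and the
byte-order check assumes the GCC/Clang predefined macro __BYTE_ORDER__; if the
macro is absent, the sketch assumes a little-endian host.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise form, equivalent to the function in the report. */
uint16_t read_u16le_bytes(int offset, const uint8_t *p)
{
    return (uint16_t)(p[offset] | (p[offset + 1] << 8));
}

/* Hypothetical alternative: copy two bytes into a uint16_t, then swap
 * them on big-endian hosts so the result is always the little-endian
 * interpretation of the two bytes. */
uint16_t read_u16le_memcpy(int offset, const uint8_t *p)
{
    uint16_t v;
    memcpy(&v, p + offset, sizeof v);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    v = (uint16_t)((v >> 8) | (v << 8));
#endif
    return v;
}
```

Both functions should return the same value for any in-bounds offset; the
report is about getting the compiler to derive the second form from the first.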


* [Bug tree-optimization/102391] Failure to optimize adjacent 8-bit loads into a single bigger load
From: gabravier at gmail dot com @ 2021-09-18  0:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102391

Gabriel Ravier <gabravier at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Failure to optimize 2 8-bit |Failure to optimize
                   |loads into a single 16-bit  |adjacent 8-bit loads into a
                   |load                        |single bigger load

--- Comment #1 from Gabriel Ravier <gabravier at gmail dot com> ---
Note: the same applies to larger sizes:

uint32_t HeaderReadU32LE(int offset, uint8_t *RomHeader)
{
    return RomHeader[offset] |
        (RomHeader[offset + 1] << 8) |
        (RomHeader[offset + 2] << 16) |
        (RomHeader[offset + 3] << 24);
}

On AMD64, GCC outputs this:

HeaderReadU32LE:
  movsx rdi, edi
  movzx eax, BYTE PTR [rsi+1+rdi]
  movzx edx, BYTE PTR [rsi+2+rdi]
  sal eax, 8
  sal edx, 16
  or eax, edx
  movzx edx, BYTE PTR [rsi+rdi]
  or eax, edx
  movzx edx, BYTE PTR [rsi+3+rdi]
  sal edx, 24
  or eax, edx
  ret

LLVM manages this:

HeaderReadU32LE:
  movsxd rax, edi
  mov eax, dword ptr [rsi + rax]
  ret
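
Editorial sketch (not part of the original comment): in ISO C, each uint8_t
operand is promoted to int before the shift, so shifting a byte of 0x80 or
more left by 24 produces a value that int cannot represent, which is undefined
behavior. A variant with explicit uint32_t casts avoids that. The function
name is hypothetical, and nothing here claims the casts change what GCC emits.

```c
#include <stdint.h>

/* Same little-endian 32-bit read as in the comment above, but every
 * byte is widened to uint32_t before shifting, so the top byte never
 * shifts into the sign bit of a promoted int. */
uint32_t read_u32le_casted(int offset, const uint8_t *p)
{
    return (uint32_t)p[offset]
         | ((uint32_t)p[offset + 1] << 8)
         | ((uint32_t)p[offset + 2] << 16)
         | ((uint32_t)p[offset + 3] << 24);
}
```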


* [Bug tree-optimization/102391] Failure to optimize adjacent 8-bit loads into a single bigger load
From: pinskia at gcc dot gnu.org @ 2021-09-18  0:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102391

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-09-18
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
GCC can figure out the case where offset = 0.

There might already be a duplicate of this bug, too.


* [Bug tree-optimization/102391] Failure to optimize adjacent 8-bit loads into a single bigger load
From: rguenth at gcc dot gnu.org @ 2021-09-20  8:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102391

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
The bswap pass is in principle able to handle these, but it sees:

  _1 = (sizetype) offset_12(D);
  _2 = RomHeader_13(D) + _1;
  _3 = *_2;
  _4 = (signed short) _3;
  _5 = _1 + 1;
  _6 = RomHeader_13(D) + _5;
  _7 = *_6;

so the constant offset is not forwarded to the MEM_REFs (an int vs. size_t
issue), and the bswap pass doesn't perform any fancy dataref analysis to spot
accesses that share a base and differ only by a constant offset (it could
simply use split_constant_offset on the found base, I guess, or invoke DR
analysis in BB mode).
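
Editorial sketch (not part of the original comment): one way to picture the
distinction is a source-level variant that computes the variable part of the
address once, so both loads are constant offsets (0 and 1) from the same
pointer. That is the shape of the offset = 0 case from comment #2. The
function and variable names are hypothetical, and whether GCC actually merges
the loads for this form is an assumption that this thread does not confirm.

```c
#include <stdint.h>

/* Hoist the variable offset into a single base pointer; the two byte
 * loads then differ only by a constant offset from that base. */
uint16_t read_u16le_hoisted(int offset, const uint8_t *rom_header)
{
    const uint8_t *p = rom_header + offset;
    return (uint16_t)(p[0] | (p[1] << 8));
}
```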


* [Bug tree-optimization/102391] Failure to optimize adjacent 8-bit loads into a single bigger load
From: pinskia at gcc dot gnu.org @ 2021-12-15 23:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102391

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Dup of bug 98953.

*** This bug has been marked as a duplicate of bug 98953 ***

