[Bug tree-optimization/111502] New: Suboptimal unaligned 2/4-byte memcpy on strict-align targets

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

From: "lasse.collin at tukaani dot org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/111502] New: Suboptimal unaligned 2/4-byte memcpy on strict-align targets
Date: Wed, 20 Sep 2023 17:51:16 +0000	[thread overview]
Message-ID: <bug-111502-4@http.gcc.gnu.org/bugzilla/> (raw)

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111502

            Bug ID: 111502
           Summary: Suboptimal unaligned 2/4-byte memcpy on strict-align
                    targets
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: lasse.collin at tukaani dot org
  Target Milestone: ---

I was playing with RISC-V GCC 12.2.0 from Arch Linux. I noticed
inefficient-looking assembly output in code that uses memcpy to access 32-bit
unaligned integers. I tried Godbolt with 16/32-bit integers and seems that the
same weirdness happens with RV32 & RV64 with GCC 13.2.0 and trunk, and also on
a few other targets. (Clang's output looks OK.)

For a little endian target:

#include <stdint.h>
#include <string.h>

uint32_t bytes16(const uint8_t *b)
{
    return (uint32_t)b[0]
        | ((uint32_t)b[1] << 8);
}

uint32_t copy16(const uint8_t *b)
{
    uint16_t v;
    memcpy(&v, b, sizeof(v));
    return v;
}

riscv64-linux-gnu-gcc -march=rv64gc -O2 -mtune=size

bytes16:
        lhu     a0,0(a0)
        ret

copy16:
        lhu     a0,0(a0)
        ret

That looks good because -mno-strict-align is the default.

After omitting -mtune=size, unaligned access isn't used (the output is the same
as with -mstrict-align):

riscv64-linux-gnu-gcc -march=rv64gc -O2

bytes16:
        lbu     a5,1(a0)
        lbu     a0,0(a0)
        slli    a5,a5,8
        or      a0,a5,a0
        ret

copy16:
        lbu     a4,0(a0)
        lbu     a5,1(a0)
        addi    sp,sp,-16
        sb      a4,14(sp)
        sb      a5,15(sp)
        lhu     a0,14(sp)
        addi    sp,sp,16
        jr      ra

bytes16 looks good but copy16 is weird: the bytes are copied to an aligned
location on stack and then loaded back.

On Godbolt it happens with GCC 13.2.0 on RV32, RV64, ARM64 (but only if using
-mstrict-align), MIPS64EL, and SPARC & SPARC64 (comparison needs big endian
bytes16). For ARM64 and MIPS64EL the oldest GCC on Godbolt is GCC 5.4 and the
same thing happens with that too.

32-bit reads with -O2 behave similarly. With -Os a call to memcpy is emitted
for copy32 but not for bytes32.

#include <stdint.h>
#include <string.h>

uint32_t bytes32(const uint8_t *b)
{
    return (uint32_t)b[0]
        | ((uint32_t)b[1] << 8)
        | ((uint32_t)b[2] << 16)
        | ((uint32_t)b[3] << 24);
}

uint32_t copy32(const uint8_t *b)
{
    uint32_t v;
    memcpy(&v, b, sizeof(v));
    return v;
}

riscv64-linux-gnu-gcc -march=rv64gc -O2

bytes32:
        lbu     a4,1(a0)
        lbu     a3,0(a0)
        lbu     a5,2(a0)
        lbu     a0,3(a0)
        slli    a4,a4,8
        or      a4,a4,a3
        slli    a5,a5,16
        or      a5,a5,a4
        slli    a0,a0,24
        or      a0,a0,a5
        sext.w  a0,a0
        ret

copy32:
        lbu     a2,0(a0)
        lbu     a3,1(a0)
        lbu     a4,2(a0)
        lbu     a5,3(a0)
        addi    sp,sp,-16
        sb      a2,12(sp)
        sb      a3,13(sp)
        sb      a4,14(sp)
        sb      a5,15(sp)
        lw      a0,12(sp)
        addi    sp,sp,16
        jr      ra

riscv64-linux-gnu-gcc -march=rv64gc -Os

bytes32:
        lbu     a4,1(a0)
        lbu     a5,0(a0)
        slli    a4,a4,8
        or      a4,a4,a5
        lbu     a5,2(a0)
        lbu     a0,3(a0)
        slli    a5,a5,16
        or      a5,a5,a4
        slli    a0,a0,24
        or      a0,a0,a5
        sext.w  a0,a0
        ret

copy32:
        addi    sp,sp,-32
        mv      a1,a0
        li      a2,4
        addi    a0,sp,12
        sd      ra,24(sp)
        call    memcpy@plt
        ld      ra,24(sp)
        lw      a0,12(sp)
        addi    sp,sp,32
        jr      ra

I probably cannot test any proposed fixes but I hope this report is still
useful. Thanks!

next             reply	other threads:[~2023-09-20 17:51 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-20 17:51 lasse.collin at tukaani dot org [this message]
2023-09-20 17:56 ` [Bug tree-optimization/111502] " andrew at sifive dot com
2023-09-20 18:37 ` [Bug target/111502] " lasse.collin at tukaani dot org
2023-09-20 18:42 ` [Bug middle-end/111502] " pinskia at gcc dot gnu.org
2023-09-20 18:49 ` pinskia at gcc dot gnu.org
2023-09-20 20:06 ` lasse.collin at tukaani dot org
2023-09-20 20:20 ` andrew at sifive dot com
2023-09-21  6:22 ` rguenth at gcc dot gnu.org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-111502-4@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).