public inbox for gcc-bugs@sourceware.org
From: "nate at thatsmathematics dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/110780] New: aarch64 NEON redundant displaced ld3
Date: Sun, 23 Jul 2023 20:58:15 +0000
Message-ID: <bug-110780-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110780

            Bug ID: 110780
           Summary: aarch64 NEON redundant displaced ld3
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: nate at thatsmathematics dot com
  Target Milestone: ---

Compile the following with gcc 14.0.0 20230723 on aarch64 with -O3:

#include <stdint.h>
void CSI2toBE12(uint8_t* pCSI2, uint8_t* pBE, uint8_t* pCSI2LineEnd)
{
    while (pCSI2 < pCSI2LineEnd) {
        pBE[0] = pCSI2[0];
        pBE[1] = ((pCSI2[2] & 0xf) << 4) | (pCSI2[1] >> 4);
        pBE[2] = ((pCSI2[1] & 0xf) << 4) | (pCSI2[2] >> 4);
        pCSI2 += 3;
        pBE += 3;
    }
}

Godbolt: https://godbolt.org/z/WshTPKzY5

In the inner loop (.L5 of the godbolt asm) we have

        ld3     {v25.16b - v27.16b}, [x3]
        add     x6, x3, 1
        // no intervening stores
        ld3     {v25.16b - v27.16b}, [x6]

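The second load is redundant.  ld3 de-interleaves with a stride of 3, so
loading from x3 + 1 puts into v25 and v26 exactly the bytes that the first
load already placed in v26 and v27 respectively.  Only the value loaded into
v27 is new, and it is not used in the subsequent code.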
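
To make the overlap concrete, here is a minimal scalar sketch of the
de-interleaving, not the compiler's actual code.  The names model_ld3 and
displaced_ld3_is_redundant are hypothetical and only model what
ld3 {va.16b - vc.16b}, [src] does with 48 consecutive bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of ld3 {va.16b - vc.16b}, [src]:
   reads 48 consecutive bytes and de-interleaves them with stride 3
   into three 16-byte "registers". */
void model_ld3(const uint8_t *src, uint8_t va[16], uint8_t vb[16],
               uint8_t vc[16])
{
    for (int i = 0; i < 16; i++) {
        va[i] = src[3 * i];
        vb[i] = src[3 * i + 1];
        vc[i] = src[3 * i + 2];
    }
}

/* Returns 1 if the load displaced by one byte yields, in its first two
   registers, exactly what the load at src already put in its second and
   third registers.  src must point to at least 49 readable bytes. */
int displaced_ld3_is_redundant(const uint8_t *src)
{
    uint8_t a0[16], b0[16], c0[16];
    uint8_t a1[16], b1[16], c1[16];

    model_ld3(src, a0, b0, c0);      /* models ld3 ..., [x3] */
    model_ld3(src + 1, a1, b1, c1);  /* models ld3 ..., [x6], x6 = x3 + 1 */

    return memcmp(a1, b0, 16) == 0 && memcmp(b1, c0, 16) == 0;
}
```

Lane i of the displaced load's first register is src[3*i + 1], which is lane
i of the first load's second register; the same holds one register further
along.  Only the third register of the displaced load (src[3*i + 3]) holds
anything new.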

This might also account for some of the extra complexity later in the
generated code: the last 48 bytes of the input can't be handled by this loop
(or else the second load would be out of bounds by one byte), so they must be
handled specially.
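
The effect on the trip count can be sketched with simple arithmetic.  This is
an illustrative model only, not how GCC computes its loop bounds, and
safe_vector_iterations is a hypothetical helper: it counts how many 48-byte
steps can run if each step reads bytes_read bytes from its base address
(49 with the displaced load, 48 without it).

```c
#include <stddef.h>

/* Number of 48-byte vector iterations that stay in bounds for an input
   of n bytes when each iteration reads bytes_read bytes starting at its
   base address (hypothetical helper, for illustration only). */
size_t safe_vector_iterations(size_t n, size_t bytes_read)
{
    if (n < bytes_read)
        return 0;
    /* Iteration k starts at 48*k and needs 48*k + bytes_read <= n. */
    return (n - bytes_read) / 48 + 1;
}
```

For a 96-byte input, this model gives one safe iteration when each step reads
49 bytes and two when each step reads 48, so under these assumptions the
displaced load pushes the final 48 bytes out of the vector loop.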

Thread overview: 3+ messages
2023-07-23 20:58 nate at thatsmathematics dot com [this message]
2023-07-23 21:05 ` [Bug tree-optimization/110780] " pinskia at gcc dot gnu.org
2023-07-24 19:12 ` rsandifo at gcc dot gnu.org