[Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations

From: "cvs-commit at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/114741] [14 regression] aarch64 sve: unnecessary fmov for scalar int bit operations
Date: Thu, 18 Apr 2024 10:49:22 +0000
Message-ID: <bug-114741-4-ooSCnja5YO@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-114741-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114741

--- Comment #9 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:a2f4be3dae04fa8606d1cc8451f0b9d450f7e6e6

commit r14-10014-ga2f4be3dae04fa8606d1cc8451f0b9d450f7e6e6
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Thu Apr 18 11:47:42 2024 +0100

    AArch64: remove reliance on register allocator for simd/gpreg costing. [PR114741]

    In PR114741 we see a codegen regression when SVE is enabled, where
    the simple testcase:

    void foo(unsigned v, unsigned *p)
    {
        *p = v & 1;
    }

    generates

    foo:
            fmov    s31, w0
            and     z31.s, z31.s, #1
            str     s31, [x1]
            ret

    instead of:

    foo:
            and     w0, w0, 1
            str     w0, [x1]
            ret
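
To reproduce the comparison above, the testcase can be put in a file of its own and compiled to assembly. This is a minimal sketch: the helper foo_check is hypothetical and not part of the commit, and the command line is an assumption, since the commit message does not state which options were used.

```c
#include <assert.h>

/* Testcase from the commit message: a scalar AND whose result is stored. */
void foo(unsigned v, unsigned *p)
{
    *p = v & 1;
}

/* Hypothetical helper, not part of the commit.  It only checks the value
   that foo stores.  Whether the compiler used an fmov plus a vector AND
   can only be seen in the generated assembly, not from this check. */
void foo_check(void)
{
    unsigned out = 0xdeadbeefu;

    foo(3u, &out);
    assert(out == 1u);

    foo(4u, &out);
    assert(out == 0u);
}
```

Something like `gcc -O2 -march=armv8.2-a+sve -S` on an AArch64 compiler should show which of the two sequences is emitted for foo. The flags are an assumption; any option set that enables SVE should exercise the same pattern.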

    This affects not just code size but also performance.  It is caused
    by the use of the ^ constraint modifier in the pattern <optab><mode>3.

    The documentation states that this modifier should only affect the
    alternative costing, in that a particular alternative is to be preferred
    unless a non-pseudo reload is needed.

    The pattern was trying to convey that whenever both r and w are required,
    it should prefer r unless a reload is needed.  This is because if a reload
    is needed then we can construct the constants more flexibly on the SIMD
    side.

    We were using this to simplify the implementation and to handle generic
    cases such as:

    double negabs (double x)
    {
       unsigned long long y;
       memcpy (&y, &x, sizeof(double));
       y = y | (1UL << 63);
       memcpy (&x, &y, sizeof(double));
       return x;
    }

    which don't go through an expander.
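
For readers who want to try negabs outside the compiler testsuite, here is a self-contained sketch of the same function with the headers it needs. Two things are additions and not from the commit: the shift uses 1ULL instead of 1UL so that it stays valid where long is 32 bits wide, and the helper negabs_check is hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <string.h>

/* Set the sign bit of x with integer bit operations, which yields -fabs(x). */
double negabs(double x)
{
    unsigned long long y;

    memcpy(&y, &x, sizeof(double));
    y = y | (1ULL << 63);   /* 1ULL (assumption): avoids shifting a 32-bit long */
    memcpy(&x, &y, sizeof(double));
    return x;
}

/* Hypothetical helper, not part of the commit: checks the result only. */
void negabs_check(void)
{
    assert(negabs(1.5) == -1.5);
    assert(negabs(-2.0) == -2.0);
    assert(signbit(negabs(0.0)));   /* +0.0 becomes -0.0 */
}
```

The check says nothing about which register file the OR is performed in; that still has to be read from the generated assembly.
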
    However, the implementation of ^ in the register allocator does not follow
    the documentation, in that it also has an effect during coloring.  During
    initial register class selection it applies a penalty to a class, similar
    to how ? does.

    In this example the penalty makes the use of GP regs expensive enough that
    it no longer considers them:

        r106: preferred FP_REGS, alternative NO_REGS, allocno FP_REGS
    ;;        3--> b  0: i   9 r106=r105&0x1
        :cortex_a53_slot_any:GENERAL_REGS+0(-1)FP_REGS+1(1)PR_LO_REGS+0(0)
                             PR_HI_REGS+0(0):model 4

    which is not the expected behavior.  For GCC 14 this is a conservative fix.

    1. We remove the ^ modifier from the logical optabs.

    2. In order not to regress copysign, we then move the copysign expansion
       to directly use the SIMD variant.  Since copysign only supports
       floating-point modes this is fine, and it no longer relies on the
       register allocator to select the right alternative.
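
The commit message does not show the copysign expander itself, so the following is only a C-level sketch of the bit manipulation involved, not the machine-description code. It assumes IEEE 754 binary64 doubles. The names copysign_bits, copysign_neg and copysign_check are hypothetical. The second function illustrates why an inclusive OR is enough when the sign operand is known to be negative, which is presumably the case behind the ChangeLog's mention of IOR.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copysign(x, y) written as integer bit operations.
   Keeps the magnitude bits of x and takes the sign bit from y. */
double copysign_bits(double x, double y)
{
    const uint64_t sign = UINT64_C(1) << 63;
    uint64_t xb, yb;

    memcpy(&xb, &x, sizeof xb);
    memcpy(&yb, &y, sizeof yb);
    xb = (xb & ~sign) | (yb & sign);
    memcpy(&x, &xb, sizeof x);
    return x;
}

/* Hypothetical helper: when the sign source is known to be negative,
   the general form above collapses to a single OR with the sign bit. */
double copysign_neg(double x)
{
    uint64_t xb;

    memcpy(&xb, &x, sizeof xb);
    xb |= UINT64_C(1) << 63;
    memcpy(&x, &xb, sizeof x);
    return x;
}

/* Hypothetical self-check, not part of the commit. */
void copysign_check(void)
{
    assert(copysign_bits(1.5, -2.0) == -1.5);
    assert(copysign_bits(-1.5, 2.0) == 1.5);
    assert(copysign_neg(3.0) == -3.0);
    assert(copysign_bits(3.0, -1.0) == copysign_neg(3.0));
}
```

Because the value is already a floating-point one, doing the OR on the SIMD side avoids moving it to a general-purpose register and back, which matches the reasoning in item 2.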

    It once again regresses the general case, but this case wasn't optimized in
    earlier GCCs either, so it's not a regression in GCC 14.  This change gives
    strictly better codegen than earlier GCCs and still optimizes the important
    cases.

    gcc/ChangeLog:

            PR target/114741
            * config/aarch64/aarch64.md (<optab><mode>3): Remove ^ from alt 2.
            (copysign<GPF:mode>3): Use SIMD version of IOR directly.

    gcc/testsuite/ChangeLog:

            PR target/114741
            * gcc.target/aarch64/fneg-abs_2.c: Update codegen.
            * gcc.target/aarch64/fneg-abs_4.c: xfail for now.
            * gcc.target/aarch64/pr114741.c: New test.
