[Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2


From: "cvs-commit at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
Date: Sun, 12 Mar 2023 18:44:06 +0000
Message-ID: <bug-108583-4-PhxRswyqxq@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-108583-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #30 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:f23dc726875c26f2c38dfded453aa9beba0b9be9

commit r13-6620-gf23dc726875c26f2c38dfded453aa9beba0b9be9
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Sun Mar 12 18:42:59 2023 +0000

    AArch64: Update div-bitmask to implement new optab instead of target hook [PR108583]

    This replaces the custom division hook with an implementation through
    add_highpart.  For NEON we implement the add highpart (addition followed by
    extraction of the high part of the result, in the same precision) as
    ADD + LSR.

    This representation allows us to easily optimize the sequence using
    existing sequences.  This gets us a pretty decent sequence using SRA:

            umull   v1.8h, v0.8b, v3.8b
            umull2  v0.8h, v0.16b, v3.16b
            add     v5.8h, v1.8h, v2.8h
            add     v4.8h, v0.8h, v2.8h
            usra    v1.8h, v5.8h, 8
            usra    v0.8h, v4.8h, 8
            uzp2    v1.16b, v1.16b, v0.16b
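
A scalar sketch of what one 16-bit lane of the sequence above computes may help. It assumes the divisor is 0xff and that v2 holds the constant 257; neither value is stated in this message, and the helper names are hypothetical. The check is limited to products of two 8-bit values, which is the range the UMULL results can take, so no 16-bit addition wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: one 16-bit lane of the ADD + USRA + UZP2 sequence.
   Assumes x is a product of two 8-bit values (x <= 255 * 255). */
static uint8_t div255_lane(uint16_t x)
{
    uint16_t t = (uint16_t)(x + 257);       /* add:  v2 is assumed to hold 257 */
    uint16_t s = (uint16_t)(x + (t >> 8));  /* usra: shift right by 8 and accumulate */
    return (uint8_t)(s >> 8);               /* uzp2: keep the high byte of the lane */
}

/* Returns the number of 8-bit by 8-bit products for which the lane model
   disagrees with ordinary division by 255. */
static int div255_mismatches(void)
{
    int bad = 0;
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++) {
            uint16_t x = (uint16_t)(a * b);
            if (div255_lane(x) != x / 255)
                bad++;
        }
    return bad;
}
```

Calling div255_mismatches() compares the model against x / 255 for every such product.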

    To get the optimal sequence, however, we match (a + ((b + c) >> n)), where
    n is half the precision of the mode of the operation, into addhn + uaddw,
    which is a generally good optimization on its own and gets us back to:

    .L4:
            ldr     q0, [x3]
            umull   v1.8h, v0.8b, v5.8b
            umull2  v0.8h, v0.16b, v5.16b
            addhn   v3.8b, v1.8h, v4.8h
            addhn   v2.8b, v0.8h, v4.8h
            uaddw   v1.8h, v1.8h, v3.8b
            uaddw   v0.8h, v0.8h, v2.8b
            uzp2    v1.16b, v1.16b, v0.16b
            str     q1, [x3], 16
            cmp     x3, x4
            bne     .L4
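
The equivalence behind this match can be sketched per lane in C. This is an illustrative model under the assumption of 16-bit lanes and n = 8, not the RTL pattern itself, and the function names are hypothetical. Because the shifted sum already fits in 8 bits, narrowing it with ADDHN and widening it again in UADDW loses nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Model of ADDHN on one 16-bit lane: add (wrapping), keep the high 8 bits. */
static uint8_t addhn16(uint16_t b, uint16_t c)
{
    return (uint8_t)((uint16_t)(b + c) >> 8);
}

/* Model of UADDW on one lane: zero-extend the 8-bit operand and add. */
static uint16_t uaddw16(uint16_t a, uint8_t narrow)
{
    return (uint16_t)(a + narrow);
}

/* The shape being matched, a + ((b + c) >> n), with n = 8. */
static uint16_t shift_plus16(uint16_t a, uint16_t b, uint16_t c)
{
    return (uint16_t)(a + ((uint16_t)(b + c) >> 8));
}

/* Counts disagreements between the two forms over a strided sample of
   16-bit inputs (a full sweep would be 2^48 combinations). */
static int addhn_uaddw_mismatches(void)
{
    int bad = 0;
    for (unsigned a = 0; a < 65536; a += 251)
        for (unsigned b = 0; b < 65536; b += 257)
            for (unsigned c = 0; c < 65536; c += 4099)
                if (uaddw16((uint16_t)a, addhn16((uint16_t)b, (uint16_t)c))
                    != shift_plus16((uint16_t)a, (uint16_t)b, (uint16_t)c))
                    bad++;
    return bad;
}
```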

    For SVE2 we optimize the initial sequence to the same ADD + LSR, which gets
    us:

    .L3:
            ld1b    z0.h, p0/z, [x0, x3]
            mul     z0.h, p1/m, z0.h, z2.h
            add     z1.h, z0.h, z3.h
            usra    z0.h, z1.h, #8
            lsr     z0.h, z0.h, #8
            st1b    z0.h, p0, [x0, x3]
            inch    x3
            whilelo p0.h, w3, w2
            b.any   .L3
    .L1:
            ret

    and to get the optimal sequence I match (a + b) >> n (same constraint on n)
    to addhnb, which gets us to:

    .L3:
            ld1b    z0.h, p0/z, [x0, x3]
            mul     z0.h, p1/m, z0.h, z2.h
            addhnb  z1.b, z0.h, z3.h
            addhnb  z0.b, z0.h, z1.h
            st1b    z0.h, p0, [x0, x3]
            inch    x3
            whilelo p0.h, w3, w2
            b.any   .L3
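
As a rough per-lane model of the two chained ADDHNB instructions: ADDHNB writes the high half of each sum to the even (bottom) narrow element and zeroes the odd one, so the same lane read back as 16 bits equals (a + b) >> 8. The sketch below assumes a divisor of 0xff and that z3 holds 257, which this message does not state; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of ADDHNB on one .h lane, viewed as a 16-bit value afterwards:
   high byte of the sum in the bottom byte, top byte zero. */
static uint16_t addhnb_lane(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint16_t)(a + b) >> 8);
}

/* Two chained ADDHNBs applied to a product x of two 8-bit values. */
static uint16_t sve2_div255_lane(uint16_t x)
{
    uint16_t t = addhnb_lane(x, 257);  /* first addhnb: z3 is assumed to hold 257 */
    return addhnb_lane(x, t);          /* second addhnb: adds the narrowed result back */
}

/* Returns the number of 8-bit by 8-bit products for which the model
   disagrees with ordinary division by 255. */
static int sve2_div255_mismatches(void)
{
    int bad = 0;
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++) {
            uint16_t x = (uint16_t)(a * b);
            if (sve2_div255_lane(x) != x / 255)
                bad++;
        }
    return bad;
}
```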

    There are multiple possible RTL representations for these optimizations.  I
    did not represent them using a zero_extend because we seem very
    inconsistent about this in the backend.  Since they are unspecs we won't
    match them from vector ops anyway.  I figured maintainers would prefer
    this, but my maintainer ouija board is still out for repairs :)

    There are no new tests, as new correctness tests were added to the mid-end
    and codegen tests for this already exist.

    gcc/ChangeLog:

            PR target/108583
            * config/aarch64/aarch64-simd.md (@aarch64_bitmask_udiv<mode>3): Remove.
            (*bitmask_shift_plus<mode>): New.
            * config/aarch64/aarch64-sve2.md (*bitmask_shift_plus<mode>): New.
            (@aarch64_bitmask_udiv<mode>3): Remove.
            * config/aarch64/aarch64.cc
            (aarch64_vectorize_can_special_div_by_constant,
            TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST): Removed.
            (TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT,
            aarch64_vectorize_preferred_div_as_shifts_over_mult): New.
