From: "cvs-commit at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/108583] [13 Regression] wrong code with vector division by uint16 at -O2
Date: Sun, 12 Mar 2023 18:44:06 +0000
Message-ID: <bug-108583-4-PhxRswyqxq@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-108583-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #30 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:f23dc726875c26f2c38dfded453aa9beba0b9be9

commit r13-6620-gf23dc726875c26f2c38dfded453aa9beba0b9be9
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Sun Mar 12 18:42:59 2023 +0000

    AArch64: Update div-bitmask to implement new optab instead of target hook [PR108583]

    This replaces the custom division hook with just an implementation through
    add_highpart.  For NEON we implement the add highpart (Addition + extraction of
    the upper highpart of the register in the same precision) as ADD + LSR.

    This representation allows us to easily optimize the sequence using existing
    sequences.
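As an editorial illustration of the ADD + LSR formulation described above, here is a minimal scalar sketch in C of one 16-bit lane. It assumes the division being optimized is by 0xff, that the dividend is the product of two 8-bit values, and that the added constant is 257; none of these values are stated in this message, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "add highpart" on one 16-bit lane:
   a wrapping 16-bit ADD followed by LSR #8. */
uint16_t add_highpart_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint16_t)(a + b) >> 8);
}

/* Divide the widened product p * q by 255 using two add-highpart steps:
   (x + ((x + 257) >> 8)) >> 8.  The constant 257 is an assumption. */
uint8_t div255_via_add_highpart(uint8_t p, uint8_t q)
{
    uint16_t x = (uint16_t)(p * q);

    /* The trick relies on x + 257 not wrapping in 16 bits.  This holds
       because x is at most 255 * 255 = 65025. */
    assert((uint32_t)x + 257u <= UINT16_MAX);

    uint16_t t = add_highpart_u16(x, 257);
    return (uint8_t)add_highpart_u16(x, t);
}

/* Compare against plain integer division for every pair of 8-bit inputs
   and return the number of disagreements. */
int div255_count_mismatches(void)
{
    int mismatches = 0;
    for (unsigned p = 0; p < 256; ++p)
        for (unsigned q = 0; q < 256; ++q)
            if (div255_via_add_highpart((uint8_t)p, (uint8_t)q) != (p * q) / 255u)
                ++mismatches;
    return mismatches;
}
```

The precondition in the assert is the interesting part of the sketch: the identity is only safe when the dividend came from a narrower type, so an arbitrary 16-bit dividend could wrap in the first addition.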

    This gets us a pretty decent sequence using SRA:

            umull   v1.8h, v0.8b, v3.8b
            umull2  v0.8h, v0.16b, v3.16b
            add     v5.8h, v1.8h, v2.8h
            add     v4.8h, v0.8h, v2.8h
            usra    v1.8h, v5.8h, 8
            usra    v0.8h, v4.8h, 8
            uzp2    v1.16b, v1.16b, v0.16b

    To get the most optimal sequence however we match (a + ((b + c) >> n)) where n
    is half the precision of the mode of the operation into addhn + uaddw which is
    a general good optimization on its own and gets us back to:

    .L4:
            ldr     q0, [x3]
            umull   v1.8h, v0.8b, v5.8b
            umull2  v0.8h, v0.16b, v5.16b
            addhn   v3.8b, v1.8h, v4.8h
            addhn   v2.8b, v0.8h, v4.8h
            uaddw   v1.8h, v1.8h, v3.8b
            uaddw   v0.8h, v0.8h, v2.8b
            uzp2    v1.16b, v1.16b, v0.16b
            str     q1, [x3], 16
            cmp     x3, x4
            bne     .L4

    For SVE2 we optimize the initial sequence to the same ADD + LSR which gets us:

    .L3:
            ld1b    z0.h, p0/z, [x0, x3]
            mul     z0.h, p1/m, z0.h, z2.h
            add     z1.h, z0.h, z3.h
            usra    z0.h, z1.h, #8
            lsr     z0.h, z0.h, #8
            st1b    z0.h, p0, [x0, x3]
            inch    x3
            whilelo p0.h, w3, w2
            b.any   .L3
    .L1:
            ret

    and to get the most optimal sequence I match (a + b) >> n (same constraint on n)
    to addhnb which gets us to:

    .L3:
            ld1b    z0.h, p0/z, [x0, x3]
            mul     z0.h, p1/m, z0.h, z2.h
            addhnb  z1.b, z0.h, z3.h
            addhnb  z0.b, z0.h, z1.h
            st1b    z0.h, p0, [x0, x3]
            inch    x3
            whilelo p0.h, w3, w2
            b.any   .L3

    There are multiple RTL representations possible for these optimizations, I did
    not represent them using a zero_extend because we seem very inconsistent in this
    in the backend.  Since they are unspecs we won't match them from vector ops
    anyway.  I figured maintainers would prefer this, but my maintainer ouija board
    is still out for repairs :)

    There are no new test as new correctness tests were added to the mid-end and
    the existing codegen tests for this already exist.

    gcc/ChangeLog:

            PR target/108583
            * config/aarch64/aarch64-simd.md (@aarch64_bitmask_udiv<mode>3): Remove.
            (*bitmask_shift_plus<mode>): New.
            * config/aarch64/aarch64-sve2.md (*bitmask_shift_plus<mode>): New.
            (@aarch64_bitmask_udiv<mode>3): Remove.
            * config/aarch64/aarch64.cc
            (aarch64_vectorize_can_special_div_by_constant,
            TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST): Removed.
            (TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT,
            aarch64_vectorize_preferred_div_as_shifts_over_mult): New.
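
Editorial sketch, not part of the commit: the pattern matches described in the commit message, (a + ((b + c) >> n)) to addhn + uaddw on NEON and (a + b) >> n to addhnb on SVE2, can be modelled per lane in scalar C for 16-bit elements with n = 8. The function names are hypothetical, the lane semantics are a simplified reading of the instructions, and the constant k stands for whatever value the compiler materializes in v4/z3 (not given in this message).

```c
#include <stdint.h>

/* ADDHN / ADDHNB on one lane: wrapping 16-bit add, keep the high 8 bits.
   For ADDHNB the narrowed value lands in the low byte of the 16-bit lane
   with the high byte zeroed, so it can be reused as a 16-bit operand. */
uint8_t model_addhn(uint16_t a, uint16_t b)
{
    return (uint8_t)((uint16_t)(a + b) >> 8);
}

/* UADDW on one lane: zero-extend the 8-bit operand and add it to the
   16-bit accumulator, wrapping at 16 bits. */
uint16_t model_uaddw(uint16_t acc, uint8_t n)
{
    return (uint16_t)(acc + n);
}

/* The matched expression itself: a + ((b + c) >> 8) in 16-bit arithmetic. */
uint16_t model_shift_plus(uint16_t a, uint16_t b, uint16_t c)
{
    return (uint16_t)(a + ((uint16_t)(b + c) >> 8));
}

/* NEON form: addhn, then uaddw, then uzp2 keeps the high byte of each lane. */
uint8_t model_neon_lane(uint16_t x, uint16_t k)
{
    return (uint8_t)(model_uaddw(x, model_addhn(x, k)) >> 8);
}

/* SVE2 form: two chained addhnb operations. */
uint8_t model_sve2_lane(uint16_t x, uint16_t k)
{
    return model_addhn(x, model_addhn(x, k));
}

/* Count lanes where the three formulations disagree over all 16-bit x. */
int model_count_mismatches(uint16_t k)
{
    int mismatches = 0;
    for (uint32_t i = 0; i <= UINT16_MAX; ++i) {
        uint16_t x = (uint16_t)i;
        uint8_t ref = (uint8_t)(model_shift_plus(x, x, k) >> 8);
        if (model_neon_lane(x, k) != ref || model_sve2_lane(x, k) != ref)
            ++mismatches;
    }
    return mismatches;
}
```

Under this model the three forms agree for every 16-bit input because each one performs the same two wrapping additions and two shifts; the instruction choice only changes how many operations the hardware needs.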