From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/107093] AVX512 mask operations not simplified in fully masked loop
Date: Mon, 24 Jul 2023 08:21:37 +0000
Message-ID: <bug-107093-4-K89aVO2ctS@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-107093-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
icelake is able to forward a masked store with an all-ones mask; Zen4 isn't able
to do that.  Other masked stores indeed do not forward.

There's a related problem when an outer loop causes a low-trip inner loop
to use masked loads/stores on vectors that then overlap:

outer iteration 1
   ... = .MASK_LOAD (p, {-1, -1, -1, -1, 0, 0, 0, 0});
   ...
   .MASK_STORE (p, val, {-1, -1, -1, -1, 0, 0, 0, 0});

outer iteration 2
   ... = .MASK_LOAD (p + delta, {-1, -1, -1, -1, 0, 0, 0, 0});
   ...
   .MASK_STORE (p + delta, val, {-1, -1, -1, -1, 0, 0, 0, 0});

with delta causing the next outer iteration to access the masked-out values
from the previous iteration.  That gets a STLF (store-to-load forwarding)
failure (obviously), but we now also need to wait for the masked store to
retire before the masked load of iteration 2 can be carried out.

We are hitting this case in SPEC CPU 2017 with masked epilogues (the
inner loop just iterates 4 times, vectorized with V8DFmode vectors).

Ideally the implementation (the CPU) would "shorten" loads/stores for
trailing sequences of zeros so this hazard doesn't occur.  Not sure if
that would be allowed by the x86 memory model though (I didn't find
anything specific there with respect to load/store masking).
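The overlap described above can be checked with a small byte-range model. This is only a sketch under stated assumptions: it assumes the conflict check compares the full 64-byte footprint of each V8DF access regardless of the mask, and it picks delta = 4 elements purely for illustration. The names `full_footprint`, `active_footprint` and `overlaps` are hypothetical and do not come from GCC or from any CPU documentation.

```c
#include <stdbool.h>
#include <stddef.h>

enum { LANES = 8, ACTIVE = 4 };  /* V8DF vector; inner loop runs 4 times */

/* Half-open byte range [lo, hi), relative to p. */
struct range { size_t lo, hi; };

/* Footprint of the whole vector access with the mask ignored.
   Assumption: this is what a conservative conflict check compares. */
struct range full_footprint(size_t elem_off)
{
    struct range r = { elem_off * sizeof(double),
                       (elem_off + LANES) * sizeof(double) };
    return r;
}

/* Footprint if the trailing masked-out lanes were dropped, i.e. the
   "shortened" access wished for in the comment. */
struct range active_footprint(size_t elem_off)
{
    struct range r = { elem_off * sizeof(double),
                       (elem_off + ACTIVE) * sizeof(double) };
    return r;
}

/* True when two half-open ranges share at least one byte. */
bool overlaps(struct range a, struct range b)
{
    return a.lo < b.hi && b.lo < a.hi;
}
```

With the store of iteration 1 at element offset 0 and the load of iteration 2 at offset 4, the full footprints overlap while the active footprints do not, which is the distinction the "shortening" idea relies on.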
ISTR store buffer entries are usually assigned at instruction issue time;
I'm not sure if the mask is resolved there or whether the size of the store
could be adjusted later when it is.  The implementation could also somehow
ignore the "conflict".

Note I didn't yet fully benchmark masked epilogues with
-mprefer-vector-width=512 on icelake or sapphire rapids; maybe Intel CPUs
are not affected by this issue.

The original issue in the description seems solved; we now generate the
following with the code generation variant that's now on trunk:

.L3:
        vmovapd b(%rax), %ymm0{%k1}
        movl    %edx, %ecx
        subl    $4, %edx
        kmovw   %edx, %k0
        vmulpd  %ymm3, %ymm0, %ymm1{%k1}{z}
        vmovapd %ymm1, a(%rax){%k1}
        vpbroadcastmw2d %k0, %xmm1
        addq    $32, %rax
        vpcmpud $6, %xmm2, %xmm1, %k1
        cmpw    $4, %cx
        ja      .L3

that avoids using the slow mask ops for loop control.  It oddly does

        subl    $4, %edx
        kmovw   %edx, %k0
        vpbroadcastmw2d %k0, %xmm1

with -march=cascadelake; with -march=znver4 I get the expected

        subl    $8, %edx
        vpbroadcastw    %edx, %xmm1

but I can reproduce the mask register "spill" with
-mprefer-vector-width=256.

We expand to

(insn 14 13 15 (set (reg:V4SI 96)
        (vec_duplicate:V4SI (reg:SI 93 [ _27 ]))) 8167
{*avx512vl_vec_dup_gprv4si}
     (nil))

I'll file a separate bug report for this.
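For readers following the listing, the loop-control scheme it uses (broadcast the remaining count, compare it against a lane-index vector to form the next mask) can be modelled lane by lane in scalar C. This is a rough sketch, not the test case from the bug description: the array size `N`, the function names, and the multiplier parameter `c` are assumptions made for illustration, and `VF = 4` matches the 256-bit vectors of doubles seen in the assembly.

```c
#include <stddef.h>

enum { N = 1024, VF = 4 };  /* hypothetical array size; 4 doubles per ymm */
double a[N], b[N];

/* Plain scalar form of the loop: a[i] = b[i] * c. */
void scale_scalar(unsigned n, double c)
{
    for (unsigned i = 0; i < n; ++i)
        a[i] = b[i] * c;
}

/* Lane-by-lane emulation of a fully masked vector loop.
   Requires n <= N.  No scalar epilogue is needed because the
   mask switches off the lanes past the end. */
void scale_masked(unsigned n, double c)
{
    unsigned remaining = n;
    size_t base = 0;

    while (remaining > 0) {
        for (unsigned lane = 0; lane < VF; ++lane) {
            /* Mask bit for this lane: models comparing the broadcast
               remaining count against the lane indices {0, 1, 2, 3}. */
            if (remaining > lane)
                a[base + lane] = b[base + lane] * c;
        }
        base += VF;
        /* Models the subtract of VF followed by the loop-exit compare. */
        remaining = remaining > VF ? remaining - VF : 0;
    }
}
```

Calling `scale_masked(6, 2.0)` runs two vector iterations: the first with all four lanes active, the second with only two, leaving `a[6]` and `a[7]` untouched.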