public inbox for gcc-bugs@sourceware.org
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/108410] x264 averaging loop not optimized well for avx512
Date: Fri, 09 Jun 2023 12:11:56 +0000
Message-ID: <bug-108410-4-DgfOUooFRv@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-108410-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, for the case where we can use the same type for the mask compare as we use for the IV (so we know we can represent all required values), we can elide the saturation. So for example

void foo (double * __restrict a, double *b, double *c, int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = b[i] + c[i];
}

can produce

        testl   %ecx, %ecx
        jle     .L5
        vmovdqa .LC0(%rip), %ymm3
        vpbroadcastd    %ecx, %ymm2
        xorl    %eax, %eax
        subl    $8, %ecx
        vpcmpud $6, %ymm3, %ymm2, %k1
        .p2align 4
        .p2align 3
.L3:
        vmovupd (%rsi,%rax), %zmm1{%k1}
        vmovupd (%rdx,%rax), %zmm0{%k1}
        movl    %ecx, %r8d
        vaddpd  %zmm1, %zmm0, %zmm2{%k1}{z}
        addl    $8, %r8d
        vmovupd %zmm2, (%rdi,%rax){%k1}
        vpbroadcastd    %ecx, %ymm2
        addq    $64, %rax
        subl    $8, %ecx
        vpcmpud $6, %ymm3, %ymm2, %k1
        cmpl    $8, %r8d
        ja      .L3
        vzeroupper
.L5:
        ret

That should work as long as the data size is larger than or equal to the IV size, which is hopefully the case for all FP testcases. The trick is going to be to make this visible to costing; I'm not sure we get to decide whether to use masking or not when we do not want to decide between vector sizes (the x86 backend picks the first successful one). For SVE it's either masking (with SVE modes) or not masking (with NEON modes), so it's decided based on the mode rather than by an additional knob.

Performance-wise the above is likely still slower than not using masking plus a masked epilog, but it would actually save on code size for -Os or -O2. Of course, for code size we might want to stick to SSE/AVX for the smaller encoding.
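A scalar C sketch of what the masked loop above computes may make the saturation argument more concrete. It is a hand-written emulation, not GCC output, and the names foo_masked and VF are hypothetical. Each 8-lane block enables a lane when the remaining element count, kept in the same 32-bit unsigned type as the IV, is greater than the lane index; that is the unsigned greater-than test done by vpcmpud $6. Because the loop only continues while the remaining count is greater than 8, subtracting 8 always leaves at least 1, so the unsigned count never wraps and no saturating subtraction is needed.

```c
#include <stddef.h>

#define VF 8  /* hypothetical: lanes per vector, 8 doubles in a 512-bit zmm */

/* Hypothetical scalar emulation of the masked AVX-512 loop for foo(). */
void foo_masked(double *restrict a, const double *b, const double *c, int n)
{
    if (n <= 0)                       /* testl / jle .L5 */
        return;
    unsigned remaining = (unsigned)n; /* same 32-bit width as the IV */
    size_t base = 0;
    for (;;) {
        for (unsigned lane = 0; lane < VF; ++lane) {
            /* Lane active iff remaining > lane (unsigned compare, as vpcmpud $6). */
            if (remaining > lane)
                a[base + lane] = b[base + lane] + c[base + lane];
        }
        if (remaining <= VF)          /* mirrors cmpl $8 / ja .L3 */
            break;
        remaining -= VF;              /* >= 1 here, so it cannot wrap */
        base += VF;
    }
}
```

With n = 13 this runs two blocks: all eight lanes in the first and lanes 0 to 4 in the second, so no element past a[12] is touched.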
Note that we have to watch out for all-zero masks in masked stores, since those are very slow (for a reason unknown to me). When a stmt is split into multiple vector stmts it's not uncommon (especially in the epilog) for one of them to get an all-zero mask. For the loop case and .MASK_STORE we emit branchy code for this, but we might want to avoid the situation by costing (and not use a masked loop/epilog in that case).
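The following hypothetical C sketch (not GCC code; part_mask and store_parts_guarded are made-up names) shows how an all-zero mask arises when one scalar stmt is split into several 8-lane vector stmts. It assumes each part's mask is derived from the remaining element count; with 5 elements left and two parts, the second part's mask is 0. The emulated store branches around such parts, which is the kind of branchy code described above for .MASK_STORE.

```c
#include <stdint.h>

#define VF 8  /* hypothetical: lanes per vector part (8 doubles per 512-bit vector) */

/* Mask for the v-th vector part of a stmt covering `remaining` elements:
   bit `lane` is set iff element v * VF + lane is still in range. */
uint8_t part_mask(unsigned remaining, unsigned v)
{
    uint8_t mask = 0;
    for (unsigned lane = 0; lane < VF; ++lane)
        if (v * VF + lane < remaining)
            mask |= (uint8_t)(1u << lane);
    return mask;
}

/* Emulated masked store of `nparts` vector parts that skips parts whose
   mask is all zero. Returns how many parts were actually stored. */
unsigned store_parts_guarded(double *dst, const double *src,
                             unsigned remaining, unsigned nparts)
{
    unsigned stored = 0;
    for (unsigned v = 0; v < nparts; ++v) {
        uint8_t mask = part_mask(remaining, v);
        if (mask == 0)
            continue;  /* branch around the all-zero masked store */
        for (unsigned lane = 0; lane < VF; ++lane)
            if (mask & (1u << lane))
                dst[v * VF + lane] = src[v * VF + lane];
        ++stored;
    }
    return stored;
}
```

For example, part_mask(5, 0) is 0x1F and part_mask(5, 1) is 0, so store_parts_guarded stores only one of the two parts.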