public inbox for gcc-bugs@sourceware.org
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/108410] x264 averaging loop not optimized well for avx512
Date: Wed, 18 Jan 2023 12:33:21 +0000
Message-ID: <bug-108410-4-u1nNWTYfCj@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-108410-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The naive masked epilogue (--param vect-partial-vector-usage=1 and support
for whileult as in a prototype I have) then looks like

        leal -1(%rdx), %eax
        cmpl $62, %eax
        jbe .L11
.L11:
        xorl %ecx, %ecx
        jmp .L4
.L4:
        movl %ecx, %eax
        subl %ecx, %edx
        addq %rax, %rsi
        addq %rax, %rdi
        addq %r8, %rax
        cmpl $64, %edx
        jl .L8
        kxorq %k1, %k1, %k1
        kxnorq %k1, %k1, %k1
.L7:
        vmovdqu8 (%rsi), %zmm0{%k1}{z}
        vmovdqu8 (%rdi), %zmm1{%k1}{z}
        vpavgb %zmm1, %zmm0, %zmm0
        vmovdqu8 %zmm0, (%rax){%k1}
.L21:
        vzeroupper
        ret
.L8:
        vmovdqa64 .LC0(%rip), %zmm1
        vpbroadcastb %edx, %zmm0
        vpcmpb $1, %zmm0, %zmm1, %k1
        jmp .L7

RTL isn't good at jump threading the mess caused by my ad-hoc whileult RTL
expansion; representing this at a higher level is probably the way to go.
What you should basically get for the epilogue (which is also used when the
main vectorized loop isn't entered) is:

        vmovdqa64 .LC0(%rip), %zmm1
        vpbroadcastb %edx, %zmm0
        vpcmpb $1, %zmm0, %zmm1, %k1
        vmovdqu8 (%rsi), %zmm0{%k1}{z}
        vmovdqu8 (%rdi), %zmm1{%k1}{z}
        vpavgb %zmm1, %zmm0, %zmm0
        vmovdqu8 %zmm0, (%rax){%k1}

That is, a compare of the vector { niter, niter, ... } with { 0, 1, 2, 3, ... }
producing the mask (which has a latency of 3 according to Agner Fog's tables),
followed by simply the masked vectorized code. You could probably code that in
assembly if you are interested in the (optimal) performance outcome.

For now we probably want the main loop vectorized traditionally, without
masking, because Intel has poor mask support and AMD has high latency on the
mask-producing compares.
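To make the mask construction concrete, here is a minimal scalar C model of the epilogue's mask and its masked body. It is a sketch, not GCC's implementation: the function names `epilogue_mask` and `masked_avg64` are hypothetical, and `.LC0` is assumed to hold the byte constants { 0, 1, ..., 63 }.

```c
#include <stdint.h>

/* Hypothetical scalar model of the three mask-producing instructions:
     vmovdqa64 .LC0(%rip), %zmm1   ; zmm1 = { 0, 1, 2, ..., 63 } (assumed)
     vpbroadcastb %edx, %zmm0      ; zmm0 = { niter, niter, ... }
     vpcmpb $1, %zmm0, %zmm1, %k1  ; k1[i] = zmm1[i] < zmm0[i] (signed bytes)
   Bit i of the result is set iff lane i is active. */
static uint64_t epilogue_mask(int niter)
{
    int8_t iota[64], bcast[64];
    uint64_t k = 0;
    for (int i = 0; i < 64; i++) {
        iota[i] = (int8_t)i;          /* the assumed .LC0 constant */
        bcast[i] = (int8_t)niter;     /* vpbroadcastb keeps only the low byte */
        if (iota[i] < bcast[i])       /* predicate 1 is "less than" */
            k |= UINT64_C(1) << i;
    }
    return k;
}

/* Hypothetical scalar model of the masked body: zero-masking loads ({z}),
   vpavgb ((x + y + 1) >> 1 on unsigned bytes) and a masked store that
   leaves inactive bytes of dst untouched.  Inactive source bytes are
   never read, mirroring the fault suppression of masked loads. */
static void masked_avg64(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                         uint64_t k)
{
    for (int i = 0; i < 64; i++) {
        int active = (int)((k >> i) & 1);
        uint8_t x = active ? a[i] : 0;               /* vmovdqu8 (%rsi), %zmm0{%k1}{z} */
        uint8_t y = active ? b[i] : 0;               /* vmovdqu8 (%rdi), %zmm1{%k1}{z} */
        uint8_t avg = (uint8_t)((x + y + 1) >> 1);   /* vpavgb */
        if (active)
            dst[i] = avg;                            /* vmovdqu8 %zmm0, (%rax){%k1} */
    }
}
```

For 0 <= niter < 64 the mask is simply the low niter bits set, so, for example, `epilogue_mask(5)` yields 0x1F and only the first five bytes of `dst` are written.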
But having a masked vectorized epilogue avoids the need for a scalar epilogue, saving code size, and avoids the need to vectorize the epilogue multiple times with narrower vectors (or to choose SSE vectors here). On Zen4 the above will of course use both 256-bit halves of each 512-bit op even when one half is fully masked (at least, I suppose that is the case).
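The loop shape being argued for can be illustrated with a plain C model: an unmasked main loop over full 64-byte vectors, followed by a single masked vector step for the remainder, with no scalar epilogue. This is a sketch under assumptions: `avg_ref` is a hypothetical stand-in for the x264 averaging loop (whose source is not quoted in this comment), and `vec_avg` and `avg_masked_epilogue` are hypothetical names that only model one 512-bit operation in scalar code.

```c
#include <stddef.h>
#include <stdint.h>

#define VF 64  /* bytes per 512-bit vector, i.e. the vectorization factor for uint8_t */

/* Hypothetical scalar reference for the averaging loop. */
static void avg_ref(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}

/* Model of one 64-lane vpavgb step under mask k; with k == all ones it is
   the unmasked main-loop body.  Inactive lanes are neither read nor written. */
static void vec_avg(uint8_t *dst, const uint8_t *a, const uint8_t *b, uint64_t k)
{
    for (int lane = 0; lane < VF; lane++)
        if ((k >> lane) & 1)
            dst[lane] = (uint8_t)((a[lane] + b[lane] + 1) >> 1);
}

/* Main loop: full vectors, no masking.  Epilogue: one masked vector step
   covering the 0..63 leftover elements; it is also the only step taken when
   n < 64 and the main loop is never entered.  No scalar epilogue exists. */
static void avg_masked_epilogue(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                                size_t n)
{
    size_t i = 0;
    for (; i + VF <= n; i += VF)
        vec_avg(dst + i, a + i, b + i, ~UINT64_C(0));
    size_t rem = n - i;                            /* 0 <= rem < 64 */
    /* Low rem bits set; an all-zero mask makes this step a no-op. */
    vec_avg(dst + i, a + i, b + i, (UINT64_C(1) << rem) - 1);
}
```

Compared with the traditional scheme, the remainder here costs exactly one extra vector step regardless of its size, instead of a scalar loop or a chain of narrower vectorized epilogues.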