public inbox for gcc-bugs@sourceware.org
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/108410] x264 averaging loop not optimized well for avx512
Date: Wed, 18 Jan 2023 12:33:21 +0000
Message-ID: <bug-108410-4-u1nNWTYfCj@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-108410-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108410

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The naive masked epilogue (--param vect-partial-vector-usage=1 and support
for whileult as in a prototype I have) then looks like

        leal    -1(%rdx), %eax
        cmpl    $62, %eax
        jbe     .L11

.L11:
        xorl    %ecx, %ecx
        jmp     .L4

.L4:
        movl    %ecx, %eax
        subl    %ecx, %edx
        addq    %rax, %rsi
        addq    %rax, %rdi
        addq    %r8, %rax
        cmpl    $64, %edx
        jl      .L8 
        kxorq   %k1, %k1, %k1
        kxnorq  %k1, %k1, %k1
.L7:
        vmovdqu8        (%rsi), %zmm0{%k1}{z}
        vmovdqu8        (%rdi), %zmm1{%k1}{z}
        vpavgb  %zmm1, %zmm0, %zmm0
        vmovdqu8        %zmm0, (%rax){%k1}
.L21:
        vzeroupper
        ret

.L8:
        vmovdqa64       .LC0(%rip), %zmm1
        vpbroadcastb    %edx, %zmm0
        vpcmpb  $1, %zmm0, %zmm1, %k1
        jmp     .L7

RTL isn't good at jump threading the mess caused by my ad-hoc whileult
RTL expansion - representing this at a higher level is probably the way
to go.  What you basically should get for the epilogue (also used
when the main vectorized loop isn't entered) is:

        vmovdqa64       .LC0(%rip), %zmm1
        vpbroadcastb    %edx, %zmm0
        vpcmpb  $1, %zmm0, %zmm1, %k1
        vmovdqu8        (%rsi), %zmm0{%k1}{z}
        vmovdqu8        (%rdi), %zmm1{%k1}{z}
        vpavgb  %zmm1, %zmm0, %zmm0
        vmovdqu8        %zmm0, (%rax){%k1}

that is, a compare of the vector { niter, niter, ... } with { 0, 1, 2, 3, ... }
producing the mask (that has a latency of 3 according to Agner) and then
simply the vectorized code masked.  You can probably hand-code that in assembly
if you're interested in the (optimal) performance outcome.
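
For readers who want to check the lane semantics without AVX-512 hardware,
here is a minimal scalar sketch in C of what that epilogue computes. The
function names (lane_mask, masked_avg_epilogue) are made up for illustration
and are not part of GCC or x264. It assumes niter < 64, i.e. fewer remaining
bytes than one 512-bit vector holds, and models vpavgb as the rounding
unsigned byte average (a + b + 1) >> 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of: vpcmpb $1 of { 0, 1, 2, ... } against the
   broadcast niter.  Bit `lane` of the result is set when lane < niter. */
static uint64_t lane_mask(unsigned niter)
{
    uint64_t k = 0;
    for (unsigned lane = 0; lane < 64; lane++)
        if (lane < niter)
            k |= (uint64_t)1 << lane;
    return k;
}

/* Hypothetical model of the masked load / vpavgb / masked store sequence.
   Inactive lanes are neither read nor written, matching the fault
   suppression of masked vmovdqu8. */
static void masked_avg_epilogue(uint8_t *dst, const uint8_t *a,
                                const uint8_t *b, unsigned niter)
{
    assert(niter < 64);  /* assumption: this is a true partial vector */
    uint64_t k = lane_mask(niter);
    for (unsigned lane = 0; lane < 64; lane++)
        if ((k >> lane) & 1)
            dst[lane] = (uint8_t)((a[lane] + b[lane] + 1) >> 1);
}
```

Calling masked_avg_epilogue(dst, a, b, 5) touches only dst[0..4] and
leaves the remaining bytes alone, which is what lets the masked epilogue
replace a scalar tail loop.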

For now we probably want to have the main loop traditionally vectorized
without masking because Intel has poor mask support and AMD has bad
latency on the mask-producing compares.  But having a masked vectorized
epilogue avoids the need for a scalar epilogue, saving code size, and
avoids the need to vectorize that multiple times (or choosing SSE vectors
here).  For Zen4 the above will of course utilize two 512-bit op halves
even when one is fully masked (well, at least I suppose that this is the case).
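
The overall shape argued for here (unmasked main loop, one masked epilogue
step, no scalar tail) can be sketched in plain C as follows. This is only a
scalar model under assumptions: VF, avg_block and avg_bytes are hypothetical
names, each avg_block call stands in for one 512-bit vector step, and the
averaging is assumed to be the rounding byte average that vpavgb performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VF = 64 };  /* bytes per 512-bit vector (assumed vector factor) */

/* One modelled vector step: average the lanes selected by mask k. */
static void avg_block(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                      uint64_t k)
{
    for (unsigned lane = 0; lane < VF; lane++)
        if ((k >> lane) & 1)
            dst[lane] = (uint8_t)((a[lane] + b[lane] + 1) >> 1);
}

void avg_bytes(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;

    /* Main loop: full vectors, all lanes active, no mask computation. */
    for (; i + VF <= n; i += VF)
        avg_block(dst + i, a + i, b + i, ~(uint64_t)0);

    /* Masked epilogue: at most one partial vector instead of a scalar loop. */
    size_t rem = n - i;
    assert(rem < VF);
    if (rem != 0)
        avg_block(dst + i, a + i, b + i, ((uint64_t)1 << rem) - 1);
}
```

With n = 70 this runs one full step for bytes 0..63 and one masked step
for bytes 64..69; with n < 64 the main loop is skipped and only the masked
step runs, matching the "also used when the main vectorized loop isn't
entered" case.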
