[Bug target/101096] New: AVX512 VPMOV instruction should be used to downconvert vectors

public inbox for gcc-bugs@sourceware.org

* [Bug target/101096] New: AVX512 VPMOV instruction should be used to downconvert vectors
From: ubizjak at gmail dot com @ 2021-06-16 14:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101096

            Bug ID: 101096
           Summary: AVX512 VPMOV instruction should be used to downconvert
                    vectors
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ubizjak at gmail dot com
  Target Milestone: ---

The following testcases should use the VPMOV down-convert instructions with AVX512VL:

void
foo (unsigned short* p1, unsigned short* p2, char* __restrict p3)
{
    for (int i = 0; i != 16; i++)
        p3[i] = p1[i] + p2[i];
    return;
}

void
foo1 (unsigned int* p1, unsigned int* p2, short* __restrict p3)
{
    for (int i = 0; i != 8; i++)
        p3[i] = p1[i] + p2[i];
    return;
}

gcc -O3 -mavx512vl:

foo:
        vpbroadcastw    .LC1(%rip), %xmm0
        vpand   16(%rsi), %xmm0, %xmm2
        vpand   (%rsi), %xmm0, %xmm1
        vpackuswb       %xmm2, %xmm1, %xmm1
        vpand   (%rdi), %xmm0, %xmm2
        vpand   16(%rdi), %xmm0, %xmm0
        vpackuswb       %xmm0, %xmm2, %xmm0
        vpaddb  %xmm0, %xmm1, %xmm0
        vmovdqu %xmm0, (%rdx)
        ret

foo1:
        vpbroadcastd    .LC3(%rip), %xmm0
        vpand   16(%rsi), %xmm0, %xmm2
        vpand   (%rsi), %xmm0, %xmm1
        vpackusdw       %xmm2, %xmm1, %xmm1
        vpand   (%rdi), %xmm0, %xmm2
        vpand   16(%rdi), %xmm0, %xmm0
        vpackusdw       %xmm0, %xmm2, %xmm0
        vpaddw  %xmm0, %xmm1, %xmm0
        vmovdqu %xmm0, (%rdx)
        ret
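
As a reading aid for the listings above: the generated code narrows each operand first and then adds in the narrow element type (vpaddb, vpaddw), while the source adds first and truncates on the store. The following is a minimal scalar sketch in C of why the two orders give the same low bits for the 16-bit to 8-bit case. The helper names are hypothetical and are not taken from GCC, and uint8_t stands in for the char destination.

```c
#include <assert.h>
#include <stdint.h>

/* Source order: add the 16-bit values, then keep the low 8 bits. */
static uint8_t add_then_narrow(uint16_t a, uint16_t b)
{
    return (uint8_t)(a + b);
}

/* Codegen order: keep the low 8 bits of each operand, then add
   modulo 256 (the per-lane behaviour of vpaddb). */
static uint8_t narrow_then_add(uint16_t a, uint16_t b)
{
    return (uint8_t)((uint8_t)a + (uint8_t)b);
}

/* Compare both orders over a strided sample of 16-bit inputs.
   Returns the number of pairs checked; asserts on any mismatch. */
static unsigned check_narrowing_orders(void)
{
    unsigned checked = 0;
    for (unsigned a = 0; a <= 0xffffu; a += 97)
        for (unsigned b = 0; b <= 0xffffu; b += 89) {
            assert(add_then_narrow((uint16_t)a, (uint16_t)b)
                   == narrow_then_add((uint16_t)a, (uint16_t)b));
            ++checked;
        }
    return checked;
}
```

Because the low bits of a sum depend only on the low bits of the operands, the narrowing step can come first, and the remaining question is which instruction performs it.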


* [Bug target/101096] AVX512 VPMOV instruction should be used to downconvert vectors
From: ubizjak at gmail dot com @ 2023-04-24  6:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101096

Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
Adding CC.


* [Bug target/101096] AVX512 VPMOV instruction should be used to downconvert vectors
From: crazylht at gmail dot com @ 2023-04-26  6:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101096

--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
For foo, after adding support for the down-convert instruction, the codegen
difference is as follows:

@@ -6,15 +6,17 @@
 foo:
 .LFB0:
        .cfi_startproc
-       movl    $255, %eax
-       vpbroadcastw    %eax, %xmm0
-       vpand   16(%rsi), %xmm0, %xmm2
-       vpand   (%rsi), %xmm0, %xmm1
-       vpackuswb       %xmm2, %xmm1, %xmm1
-       vpand   (%rdi), %xmm0, %xmm2
-       vpand   16(%rdi), %xmm0, %xmm0
-       vpackuswb       %xmm0, %xmm2, %xmm0
-       vpaddb  %xmm0, %xmm1, %xmm0
+       vmovdqu16       (%rsi), %xmm1
+       vmovdqu16       16(%rsi), %xmm0
+       vmovdqu16       16(%rdi), %xmm2
+       vpmovwb %xmm0, %xmm0
+       vpmovwb %xmm1, %xmm1
+       vpunpcklqdq     %xmm0, %xmm1, %xmm1
+       vmovdqu16       (%rdi), %xmm0
+       vpmovwb %xmm2, %xmm2
+       vpmovwb %xmm0, %xmm0
+       vpunpcklqdq     %xmm2, %xmm0, %xmm0
+       vpaddb  %xmm1, %xmm0, %xmm0
        vmovdqu8        %xmm0, (%rdx)

If the GCC vectorizer supported different vector lengths (so that we would not
need to down-convert and then pack), vpmovwb might be better. Without that, the
instruction count is more or less the same, but vpmovwb is more expensive than
vpand.
