public inbox for gcc-bugs@sourceware.org
* [Bug target/95488] New: Suboptimal multiplication codegen for v16qi
From: crazylht at gmail dot com @ 2020-06-03 3:00 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488
Bug ID: 95488
Summary: Suboptimal multiplication codegen for v16qi
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: crazylht at gmail dot com
Target Milestone: ---
Target: x86_64-*-* i?86-*-*
cat test.c
---
typedef unsigned char v16qi __attribute__ ((vector_size (16)));
v16qi
foo (v16qi a, v16qi b)
{
  return a * b;
}
---
gcc -O2 -march=skylake-avx512
---
foo(unsigned char __vector(16), unsigned char __vector(16)):
vpunpcklbw xmm3, xmm0, xmm0
vpunpcklbw xmm2, xmm1, xmm1
vpunpckhbw xmm0, xmm0, xmm0
vpunpckhbw xmm1, xmm1, xmm1
vpmullw xmm2, xmm2, xmm3
vpmullw xmm1, xmm1, xmm0
vmovdqa xmm3, XMMWORD PTR .LC0[rip]
vpand xmm0, xmm3, xmm2
vpand xmm3, xmm3, xmm1
vpackuswb xmm0, xmm0, xmm3
ret
.LC0:
.value 255
.value 255
.value 255
.value 255
.value 255
.value 255
.value 255
.value 255
---
icc generates
---
foo(unsigned char __vector(16), unsigned char __vector(16)):
vpmovzxbw ymm2, xmm0 #5.15
vpmovzxbw ymm3, xmm1 #5.15
vpmullw ymm4, ymm2, ymm3 #5.15
vpmovwb xmm0, ymm4 #5.15
vzeroupper #5.15
ret #5.15
---
We can do better in ix86_expand_vecop_qihi; the problem is how to get sign
information for an rtx operand.
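For readers comparing the two listings above, the per-lane arithmetic can be modeled in plain C. This is only a scalar sketch of what each instruction sequence computes for a single byte lane, not GCC or icc source; the names `unpack_mul_lane`, `zext_mul_lane` and `count_lane_mismatches` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of the GCC sequence: vpunpck{l,h}bw with both sources equal
   duplicates the byte into both halves of a 16-bit word.  */
uint8_t
unpack_mul_lane (uint8_t a, uint8_t b)
{
  uint16_t wa = (uint16_t) (a | (a << 8));
  uint16_t wb = (uint16_t) (b | (b << 8));
  uint16_t prod = (uint16_t) ((uint32_t) wa * wb);  /* vpmullw: low 16 bits */
  uint16_t masked = prod & 255;                     /* vpand with .LC0 */
  assert (masked <= 255);  /* so vpackuswb never has to saturate */
  return (uint8_t) masked;
}

/* One lane of the icc sequence: zero-extend, multiply, truncate.  */
uint8_t
zext_mul_lane (uint8_t a, uint8_t b)
{
  uint16_t wa = a;                                  /* vpmovzxbw */
  uint16_t wb = b;
  uint16_t prod = (uint16_t) ((uint32_t) wa * wb);  /* vpmullw */
  return (uint8_t) prod;                            /* vpmovwb truncates */
}

/* Exhaustively compare both models against a * b modulo 256.  */
int
count_lane_mismatches (void)
{
  int bad = 0;
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      {
        uint8_t expect = (uint8_t) (a * b);
        if (unpack_mul_lane ((uint8_t) a, (uint8_t) b) != expect)
          bad++;
        if (zext_mul_lane ((uint8_t) a, (uint8_t) b) != expect)
          bad++;
      }
  return bad;
}
```

Both models produce the same byte for every input pair, so the difference between the two listings is purely in instruction count, not in the result.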
* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: crazylht at gmail dot com @ 2020-06-03 3:26 UTC
--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
I think it's this TYPE_SIGN (TREE_TYPE (REG_EXPR (op1))).
* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: rguenth at gcc dot gnu.org @ 2020-06-03 6:13 UTC
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #1)
> I think it's this TYPE_SIGN (TREE_TYPE (REG_EXPR (op1))).
That's not reliable. Multiplication shouldn't care about sign?
* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: crazylht at gmail dot com @ 2020-06-03 7:08 UTC
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #2)
> (In reply to Hongtao.liu from comment #1)
> > I think it's this TYPE_SIGN (TREE_TYPE (REG_EXPR (op1))).
>
> That's not reliable. Mutliplication shouldn't care about sign?
We need to extend v16qi to v16hi first, and the extension does care about sign.
* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: crazylht at gmail dot com @ 2020-06-03 7:15 UTC
--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #3)
> (In reply to Richard Biener from comment #2)
> > (In reply to Hongtao.liu from comment #1)
> > > I think it's this TYPE_SIGN (TREE_TYPE (REG_EXPR (op1))).
> >
> > That's not reliable. Mutliplication shouldn't care about sign?
I think you're right: as long as we only care about the lower 8 bits, the sign
doesn't matter.
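The point about the lower 8 bits can be checked exhaustively. The following is a minimal scalar sketch, assuming a 16-bit multiply that keeps only the low half of the product (as vpmullw does); the helper names `zext8`, `sext8`, `mullw` and `count_low_byte_mismatches` are hypothetical and not taken from GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend a byte to 16 bits.  */
uint16_t
zext8 (uint8_t x)
{
  return x;
}

/* Sign-extend a byte to 16 bits, written without relying on
   implementation-defined narrowing conversions.  */
uint16_t
sext8 (uint8_t x)
{
  uint16_t r = x >= 128 ? (uint16_t) (0xFF00u | x) : x;
  assert ((r & 0xFF) == x);  /* extension never changes the low byte */
  return r;
}

/* Low 16 bits of a 16x16 multiply.  */
uint16_t
mullw (uint16_t a, uint16_t b)
{
  return (uint16_t) ((uint32_t) a * b);
}

/* Count input pairs where the low byte of the product depends on
   which extension was used.  */
int
count_low_byte_mismatches (void)
{
  int bad = 0;
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      {
        uint8_t lo_z = (uint8_t) mullw (zext8 ((uint8_t) a), zext8 ((uint8_t) b));
        uint8_t lo_s = (uint8_t) mullw (sext8 ((uint8_t) a), sext8 ((uint8_t) b));
        if (lo_z != lo_s)
          bad++;
      }
  return bad;
}
```

The two extensions differ only by a multiple of 256 in each operand, so the 16-bit products can differ in the high byte (for 0xFF times 2 they are 0x01FE and 0xFFFE) while the low byte is always the same.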
>
> We need to extend v16qi to v16hi first, extension does care about sign.
* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: crazylht at gmail dot com @ 2020-06-11 8:11 UTC
--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
Microbenchmark
----
cat test.c
#include <stdio.h>
#include <stdlib.h>
#include <x86intrin.h>

typedef char v16qi __attribute__ ((vector_size (16)));

/* The two implementations under test; their definitions are not shown.  */
extern v16qi interleave_mul (v16qi, v16qi);
extern v16qi extend_mul (v16qi, v16qi);

#define LOOP 30000000

int
main ()
{
  int i;
  unsigned long long start, end;
  unsigned long long diff;
  unsigned int aux;
  v16qi *p0;
  v16qi *p1;
  v16qi x, y;

  p0 = (v16qi *) malloc (LOOP * sizeof (*p0));
  p1 = (v16qi *) malloc (LOOP * sizeof (*p1));
  /* Each buffer is about 480 MB, so the allocation can fail.  */
  if (!p0 || !p1)
    return 1;

  for (i = 0; i < LOOP; i++)
    for (int j = 0; j != 16; j++)
      {
        p0[i][j] = 1 + i + j;
        /* Unsigned arithmetic: i * i overflows a signed int for large i.  */
        p1[i][j] = 1 + (unsigned) i * i + j * j;
      }

#if 1
  start = __rdtscp (&aux);
  for (i = 0; i < LOOP; i+=16)
    y = interleave_mul (p0[i], p1[i]);
  end = __rdtscp (&aux);
  diff = end - start;
  /* diff is unsigned long long, so print it with %llu.  */
  printf ("interleave_mul : %llu\n", diff);
#endif

#if 1
  start = __rdtscp (&aux);
  for (i = 0; i < LOOP; i+=16)
    x = extend_mul (p0[i], p1[i]);
  end = __rdtscp (&aux);
  diff = end - start;
  printf ("extend_mul : %llu\n", diff);
#endif

  free (p0);
  free (p1);
  return 0;
}
---
This shows a slight improvement, about 0.25% fewer cycles for extend_mul:
interleave_mul : 104180000
extend_mul : 103922083
* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: cvs-commit at gcc dot gnu.org @ 2020-06-15 1:47 UTC
--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:54cdb2f5a5b01a482d7cbce30e7b738558eecf59
commit r11-1301-g54cdb2f5a5b01a482d7cbce30e7b738558eecf59
Author: liuhongt <hongtao.liu@intel.com>
Date: Wed Jun 3 17:25:47 2020 +0800
Optimize multiplication for V8QI,V16QI,V32QI under TARGET_AVX512BW.
2020-06-13 Hongtao Liu <hongtao.liu@intel.com>
gcc/ChangeLog:
PR target/95488
* config/i386/i386-expand.c (ix86_expand_vecmul_qihi): New
function.
* config/i386/i386-protos.h (ix86_expand_vecmul_qihi): Declare.
* config/i386/sse.md (mul<mode>3): Drop mask_name since
there's no real simd int8 multiplication instruction with
mask. Also optimize it under TARGET_AVX512BW.
(mulv8qi3): New expander.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512bw-pr95488-1.c: New test.
* gcc.target/i386/avx512bw-pr95488-2.c: Ditto.
* gcc.target/i386/avx512vl-pr95488-1.c: Ditto.
* gcc.target/i386/avx512vl-pr95488-2.c: Ditto.
* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: crazylht at gmail dot com @ 2020-06-15 2:06 UTC
Hongtao.liu <crazylht at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED
--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
Fixed in GCC 11.
* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: hjl.tools at gmail dot com @ 2020-06-16 13:13 UTC
H.J. Lu <hjl.tools at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
   Last reconfirmed|                            |2020-06-16
         Resolution|FIXED                       |---
     Ever confirmed|0                           |1
--- Comment #8 from H.J. Lu <hjl.tools at gmail dot com> ---
-march=skylake-avx512 gave:
[hjl@gnu-cfl-2 gcc]$
/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/
/export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.target/i386/avx512bw-pr95488-1.c
-march=skylake-avx512 -fno-diagnostics-show-caret
-fno-diagnostics-show-line-numbers -fdiagnostics-color=never
-fdiagnostics-urls=never -O2 -ffat-lto-objects -fno-ident -S -o
avx512bw-pr95488-1.s
[hjl@gnu-cfl-2 gcc]$ cat avx512bw-pr95488-1.s
.file "avx512bw-pr95488-1.c"
.text
.p2align 4
.globl mul_512
.type mul_512, @function
mul_512:
.LFB0:
.cfi_startproc
vpunpcklbw %ymm0, %ymm0, %ymm3
vpunpcklbw %ymm1, %ymm1, %ymm2
vpunpckhbw %ymm0, %ymm0, %ymm0
vpunpckhbw %ymm1, %ymm1, %ymm1
vpmullw %ymm3, %ymm2, %ymm2
vpmullw %ymm0, %ymm1, %ymm1
vpshufb .LC0(%rip), %ymm2, %ymm0
vpshufb .LC1(%rip), %ymm1, %ymm1
vpor %ymm1, %ymm0, %ymm0
ret
.cfi_endproc
.LFE0:
.size mul_512, .-mul_512
.p2align 4
.globl umul_512
.type umul_512, @function
umul_512:
.LFB1:
.cfi_startproc
vpunpcklbw %ymm0, %ymm0, %ymm3
vpunpcklbw %ymm1, %ymm1, %ymm2
vpunpckhbw %ymm0, %ymm0, %ymm0
vpunpckhbw %ymm1, %ymm1, %ymm1
vpmullw %ymm3, %ymm2, %ymm2
vpmullw %ymm0, %ymm1, %ymm1
vpshufb .LC0(%rip), %ymm2, %ymm0
vpshufb .LC1(%rip), %ymm1, %ymm1
vpor %ymm1, %ymm0, %ymm0
ret
.cfi_endproc
.LFE1:
.size umul_512, .-umul_512
* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: crazylht at gmail dot com @ 2020-06-17 1:25 UTC
--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to H.J. Lu from comment #8)
> -march=skylake-avx512 gave:
>
> [hjl@gnu-cfl-2 gcc]$
> /export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/xgcc
> -B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/
> /export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.target/i386/avx512bw-
> pr95488-1.c -march=skylake-avx512 -fno-diagnostics-show-caret
> -fno-diagnostics-show-line-numbers -fdiagnostics-color=never
> -fdiagnostics-urls=never -O2 -ffat-lto-objects -fno-ident -S -o
> avx512bw-pr95488-1.s
> [hjl@gnu-cfl-2 gcc]$ cat avx512bw-pr95488-1.s
> .file "avx512bw-pr95488-1.c"
> .text
> .p2align 4
> .globl mul_512
> .type mul_512, @function
> mul_512:
> .LFB0:
> .cfi_startproc
> vpunpcklbw %ymm0, %ymm0, %ymm3
> vpunpcklbw %ymm1, %ymm1, %ymm2
> vpunpckhbw %ymm0, %ymm0, %ymm0
> vpunpckhbw %ymm1, %ymm1, %ymm1
> vpmullw %ymm3, %ymm2, %ymm2
> vpmullw %ymm0, %ymm1, %ymm1
> vpshufb .LC0(%rip), %ymm2, %ymm0
> vpshufb .LC1(%rip), %ymm1, %ymm1
> vpor %ymm1, %ymm0, %ymm0
> ret
> .cfi_endproc
> .LFE0:
> .size mul_512, .-mul_512
> .p2align 4
> .globl umul_512
> .type umul_512, @function
> umul_512:
> .LFB1:
> .cfi_startproc
> vpunpcklbw %ymm0, %ymm0, %ymm3
> vpunpcklbw %ymm1, %ymm1, %ymm2
> vpunpckhbw %ymm0, %ymm0, %ymm0
> vpunpckhbw %ymm1, %ymm1, %ymm1
> vpmullw %ymm3, %ymm2, %ymm2
> vpmullw %ymm0, %ymm1, %ymm1
> vpshufb .LC0(%rip), %ymm2, %ymm0
> vpshufb .LC1(%rip), %ymm1, %ymm1
> vpor %ymm1, %ymm0, %ymm0
> ret
> .cfi_endproc
> .LFE1:
> .size umul_512, .-umul_512
It's on purpose; maybe I'll add -mprefer-vector-width=512 to the testcase.
----
  /* Not generate zmm instruction when prefer 128/256 bit vector width.  */
  if (qimode == V32QImode
      && (TARGET_PREFER_AVX128 || TARGET_PREFER_AVX256))
    return false;
----
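As a rough illustration of what such a testcase could look like: the actual contents of avx512bw-pr95488-1.c are not shown in this thread, so the following is only a sketch. The function names mul_512 and umul_512 come from the assembly in comment #8; the 32-byte typedefs are inferred from the V32QImode check above, and the dg-options line and the `count_vector_mismatches` helper are assumptions added for illustration.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -mavx512bw -mprefer-vector-width=512" } */

#include <assert.h>

typedef char v32qi __attribute__ ((vector_size (32)));
typedef unsigned char v32uqi __attribute__ ((vector_size (32)));

v32qi
mul_512 (v32qi a, v32qi b)
{
  return a * b;
}

v32uqi
umul_512 (v32uqi a, v32uqi b)
{
  return a * b;
}

/* Hypothetical helper for standalone runs: returns the number of lanes
   whose product differs from the scalar result.  */
int
count_vector_mismatches (void)
{
  v32qi sa = { 0 }, sb = { 0 };
  v32uqi ua = { 0 }, ub = { 0 };
  int bad = 0;

  assert (sizeof (v32qi) == 32);
  for (int i = 0; i < 32; i++)
    {
      sa[i] = (char) (i - 16);  /* small values: signed products fit */
      sb[i] = 3;
      ua[i] = (unsigned char) (i * 9 + 1);
      ub[i] = (unsigned char) (255 - i);
    }

  v32qi sr = mul_512 (sa, sb);
  v32uqi ur = umul_512 (ua, ub);
  for (int i = 0; i < 32; i++)
    {
      if (sr[i] != (char) (sa[i] * sb[i]))
        bad++;
      if (ur[i] != (unsigned char) (ua[i] * ub[i]))
        bad++;
    }
  return bad;
}
```

With -mprefer-vector-width=512 the early return quoted above is not taken for V32QImode, so the extend-and-multiply path using zmm registers becomes eligible; without it, -march=skylake-avx512 prefers 256-bit vectors and the unpack sequence from comment #8 is expected.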
* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: crazylht at gmail dot com @ 2020-07-09 6:56 UTC
Hongtao.liu <crazylht at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |FIXED
--- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
Fixed in GCC 11.
* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: pinskia at gcc dot gnu.org @ 2021-08-21 18:26 UTC
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.0