public inbox for gcc-bugs@sourceware.org
* [Bug target/95488] New: Suboptimal multiplication codegen for v16qi
@ 2020-06-03  3:00 crazylht at gmail dot com
  2020-06-03  3:26 ` [Bug target/95488] " crazylht at gmail dot com
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: crazylht at gmail dot com @ 2020-06-03  3:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488

            Bug ID: 95488
           Summary: Suboptimal multiplication codegen for v16qi
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
  Target Milestone: ---
            Target: x86_64-*-* i?86-*-*

cat test.c

---
typedef unsigned char v16qi __attribute__ ((vector_size (16)));
v16qi
foo (v16qi a, v16qi b)
{
    return  a*b;
}
---

gcc -O2 -march=skylake-avx512

---
foo(unsigned char __vector(16), unsigned char __vector(16)):
        vpunpcklbw      xmm3, xmm0, xmm0
        vpunpcklbw      xmm2, xmm1, xmm1
        vpunpckhbw      xmm0, xmm0, xmm0
        vpunpckhbw      xmm1, xmm1, xmm1
        vpmullw xmm2, xmm2, xmm3
        vpmullw xmm1, xmm1, xmm0
        vmovdqa xmm3, XMMWORD PTR .LC0[rip]
        vpand   xmm0, xmm3, xmm2
        vpand   xmm3, xmm3, xmm1
        vpackuswb       xmm0, xmm0, xmm3
        ret
.LC0:
        .value  255
        .value  255
        .value  255
        .value  255
        .value  255
        .value  255
        .value  255
        .value  255
---

icc generates:
---
foo(unsigned char __vector(16), unsigned char __vector(16)):
        vpmovzxbw ymm2, xmm0                                    #5.15
        vpmovzxbw ymm3, xmm1                                    #5.15
        vpmullw   ymm4, ymm2, ymm3                              #5.15
        vpmovwb   xmm0, ymm4                                    #5.15
        vzeroupper                                              #5.15
        ret                                                     #5.15
---

We can do better in ix86_expand_vecop_qihi; the problem is how to get the sign
information for an rtx operand.
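To make the two listings above concrete, here is a scalar C model of both instruction sequences, checked against the vector-extension `foo` from test.c. It is only a sketch: the helper names `interleave_mul_model` and `extend_mul_model` are hypothetical, and each loop iteration models one 16-bit lane of the corresponding instruction, not the real register layout.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned char v16qi __attribute__ ((vector_size (16)));

/* The function from test.c: the generic vector extension multiplies
   lane by lane and keeps the low 8 bits of each product.  */
v16qi
foo (v16qi a, v16qi b)
{
  return a * b;
}

/* Hypothetical model of the GCC -O2 sequence (vpunpck[lh]bw, vpmullw,
   vpand, vpackuswb).  Unpacking a register with itself places each byte
   in both halves of a 16-bit word.  */
static void
interleave_mul_model (const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
  for (int i = 0; i < 16; i++)
    {
      uint16_t wa = (uint16_t) (a[i] | (a[i] << 8));
      uint16_t wb = (uint16_t) (b[i] | (b[i] << 8));
      /* vpmullw keeps the low 16 bits; multiply in 32-bit unsigned
         arithmetic to avoid signed int overflow in C.  */
      uint16_t prod = (uint16_t) ((uint32_t) wa * wb);
      /* vpand with 0x00ff; vpackuswb then cannot saturate because every
         masked word is already <= 255.  */
      out[i] = (uint8_t) (prod & 0x00ff);
    }
}

/* Hypothetical model of the icc sequence (vpmovzxbw, vpmullw, vpmovwb).  */
static void
extend_mul_model (const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
  for (int i = 0; i < 16; i++)
    {
      uint16_t wa = a[i];                               /* vpmovzxbw */
      uint16_t wb = b[i];
      uint16_t prod = (uint16_t) ((uint32_t) wa * wb);  /* vpmullw */
      out[i] = (uint8_t) prod;                          /* vpmovwb: truncate */
    }
}
```

Both models agree with `a*b` in every lane, which is why icc's extend, multiply, truncate form can replace the longer unpack, multiply, mask, pack sequence when `vpmovwb` (AVX-512BW) is available.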

* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: crazylht at gmail dot com @ 2020-06-03  3:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488

--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
I think it's this TYPE_SIGN (TREE_TYPE (REG_EXPR (op1))).

* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: rguenth at gcc dot gnu.org @ 2020-06-03  6:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #1)
> I think it's this TYPE_SIGN (TREE_TYPE (REG_EXPR (op1))).

That's not reliable.  Multiplication shouldn't care about sign?

* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: crazylht at gmail dot com @ 2020-06-03  7:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #2)
> (In reply to Hongtao.liu from comment #1)
> > I think it's this TYPE_SIGN (TREE_TYPE (REG_EXPR (op1))).
> 
> That's not reliable.  Multiplication shouldn't care about sign?

We need to extend v16qi to v16hi first, and the extension does care about sign.

* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: crazylht at gmail dot com @ 2020-06-03  7:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #3)
> (In reply to Richard Biener from comment #2)
> > (In reply to Hongtao.liu from comment #1)
> > > I think it's this TYPE_SIGN (TREE_TYPE (REG_EXPR (op1))).
> > 
> > That's not reliable.  Multiplication shouldn't care about sign?
I think you're right: as long as we only care about the lower 8 bits, the sign
doesn't matter.
> 
> We need to extend v16qi to v16hi first, extension does care about sign.
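Comments 3 and 4 can be checked exhaustively. The sketch below (helper names are illustrative, not GCC code) widens two bytes either by sign extension, as `vpmovsxbw` would, or by zero extension, as `vpmovzxbw` would, and multiplies them. The full 16-bit products can differ, but their low 8 bits never do, so the choice of extension does not matter for an 8-bit multiply.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit product after sign-extending both bytes (vpmovsxbw-style).  */
static uint16_t
mul16_sext (uint8_t a, uint8_t b)
{
  int16_t wa = (int8_t) a;   /* GCC defines this conversion as modulo 2^8 */
  int16_t wb = (int8_t) b;
  return (uint16_t) (wa * wb);
}

/* 16-bit product after zero-extending both bytes (vpmovzxbw-style).  */
static uint16_t
mul16_zext (uint8_t a, uint8_t b)
{
  uint16_t wa = a;
  uint16_t wb = b;
  return (uint16_t) ((uint32_t) wa * wb);
}

/* Count byte pairs whose low 8 bits differ between the two extensions.  */
static int
count_low8_mismatches (void)
{
  int mismatches = 0;
  for (int a = 0; a < 256; a++)
    for (int b = 0; b < 256; b++)
      if ((uint8_t) mul16_sext ((uint8_t) a, (uint8_t) b)
          != (uint8_t) mul16_zext ((uint8_t) a, (uint8_t) b))
        mismatches++;
  return mismatches;
}
```

For example, `0xFF * 0x02` gives `0xFFFE` with sign extension and `0x01FE` with zero extension; both end in `0xFE`, and `count_low8_mismatches ()` returns 0.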

* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: crazylht at gmail dot com @ 2020-06-11  8:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488

--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
Microbenchmark
----
cat test.c

#include <stdio.h>
#include <stdlib.h>
#include <x86intrin.h>

typedef char  v16qi  __attribute__ ((vector_size (16)));
extern v16qi interleave_mul (v16qi, v16qi);
extern v16qi extend_mul (v16qi, v16qi);

#define LOOP 30000000


int
main ()
{
  int i;
  unsigned long long start, end;
  unsigned long long diff;
  unsigned int aux;
  v16qi *p0;
  v16qi *p1;
  v16qi x, y;

  p0 = (v16qi *) malloc (LOOP *  sizeof (*p0));
  p1 = (v16qi *) malloc (LOOP *  sizeof (*p1));
  for (i = 0; i < LOOP; i++)
    for (int j = 0; j != 16; j++)
    {
      p0[i][j] = 1 + i + j;
      p1[i][j] = 1 + i * i + j * j;
    }

#if 1
  start = __rdtscp (&aux);
  for (i = 0; i < LOOP; i+=16)
    y = interleave_mul (p0[i], p1[i]);
  end = __rdtscp (&aux);
  diff = end - start;

  printf ("interleave_mul : %llu\n", diff);

#endif

#if 1
  start = __rdtscp (&aux);
  for (i = 0; i < LOOP; i+=16)
    x = extend_mul (p0[i], p1[i]);
  end = __rdtscp (&aux);
  diff = end - start;

  printf ("extend_mul :    %llu\n", diff);
#endif

  free (p0);
  free (p1);

  return 0;
}
---
This shows a slight improvement:

interleave_mul : 104180000
extend_mul :    103922083
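
To put the two totals in perspective: the timing loops step `i` by 16, so each loop makes `LOOP / 16` calls. The helpers below are hypothetical (not part of the benchmark) and turn the reported TSC totals into ticks per call and a relative difference. Each timed call also loads `p0[i]` and `p1[i]` from a large buffer, so these figures include memory and call overhead, not just the multiply.

```c
#include <assert.h>

#define LOOP 30000000
#define STRIDE 16

/* Number of calls made by `for (i = 0; i < LOOP; i += STRIDE)`.  */
static long long
num_calls (void)
{
  return (LOOP + STRIDE - 1) / STRIDE;
}

/* Average TSC ticks per call for a measured total.  */
static double
ticks_per_call (unsigned long long total)
{
  return (double) total / (double) num_calls ();
}

/* Relative reduction of AFTER compared with BEFORE, in percent.  */
static double
percent_faster (unsigned long long before, unsigned long long after)
{
  return 100.0 * ((double) before - (double) after) / (double) before;
}
```

With the reported totals this comes to 1,875,000 calls, about 55.56 versus 55.43 ticks per call, a difference of roughly 0.25%.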

* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: cvs-commit at gcc dot gnu.org @ 2020-06-15  1:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:54cdb2f5a5b01a482d7cbce30e7b738558eecf59

commit r11-1301-g54cdb2f5a5b01a482d7cbce30e7b738558eecf59
Author: liuhongt <hongtao.liu@intel.com>
Date:   Wed Jun 3 17:25:47 2020 +0800

    Optimize multiplication for V8QI,V16QI,V32QI under TARGET_AVX512BW.

    2020-06-13   Hongtao Liu  <hongtao.liu@intel.com>

    gcc/ChangeLog:
            PR target/95488
            * config/i386/i386-expand.c (ix86_expand_vecmul_qihi): New
            function.
            * config/i386/i386-protos.h (ix86_expand_vecmul_qihi): Declare.
            * config/i386/sse.md (mul<mode>3): Drop mask_name since
            there's no real simd int8 multiplication instruction with
            mask. Also optimize it under TARGET_AVX512BW.
            (mulv8qi3): New expander.

    gcc/testsuite/ChangeLog:
            * gcc.target/i386/avx512bw-pr95488-1.c: New test.
            * gcc.target/i386/avx512bw-pr95488-2.c: Ditto.
            * gcc.target/i386/avx512vl-pr95488-1.c: Ditto.
            * gcc.target/i386/avx512vl-pr95488-2.c: Ditto.

* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: crazylht at gmail dot com @ 2020-06-15  2:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
Fixed in GCC11.

* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: hjl.tools at gmail dot com @ 2020-06-16 13:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
   Last reconfirmed|                            |2020-06-16
         Resolution|FIXED                       |---
     Ever confirmed|0                           |1

--- Comment #8 from H.J. Lu <hjl.tools at gmail dot com> ---
-march=skylake-avx512 gave:

[hjl@gnu-cfl-2 gcc]$ /export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/xgcc \
  -B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/ \
  /export/gnu/import/git/sources/gcc/gcc/testsuite/gcc.target/i386/avx512bw-pr95488-1.c \
  -march=skylake-avx512 -fno-diagnostics-show-caret \
  -fno-diagnostics-show-line-numbers -fdiagnostics-color=never \
  -fdiagnostics-urls=never -O2 -ffat-lto-objects -fno-ident -S -o \
  avx512bw-pr95488-1.s
[hjl@gnu-cfl-2 gcc]$ cat avx512bw-pr95488-1.s
        .file   "avx512bw-pr95488-1.c"
        .text
        .p2align 4
        .globl  mul_512
        .type   mul_512, @function
mul_512:
.LFB0:
        .cfi_startproc
        vpunpcklbw      %ymm0, %ymm0, %ymm3
        vpunpcklbw      %ymm1, %ymm1, %ymm2
        vpunpckhbw      %ymm0, %ymm0, %ymm0
        vpunpckhbw      %ymm1, %ymm1, %ymm1
        vpmullw %ymm3, %ymm2, %ymm2
        vpmullw %ymm0, %ymm1, %ymm1
        vpshufb .LC0(%rip), %ymm2, %ymm0
        vpshufb .LC1(%rip), %ymm1, %ymm1
        vpor    %ymm1, %ymm0, %ymm0
        ret
        .cfi_endproc
.LFE0:
        .size   mul_512, .-mul_512
        .p2align 4
        .globl  umul_512
        .type   umul_512, @function
umul_512:
.LFB1:
        .cfi_startproc
        vpunpcklbw      %ymm0, %ymm0, %ymm3
        vpunpcklbw      %ymm1, %ymm1, %ymm2
        vpunpckhbw      %ymm0, %ymm0, %ymm0
        vpunpckhbw      %ymm1, %ymm1, %ymm1
        vpmullw %ymm3, %ymm2, %ymm2
        vpmullw %ymm0, %ymm1, %ymm1
        vpshufb .LC0(%rip), %ymm2, %ymm0
        vpshufb .LC1(%rip), %ymm1, %ymm1
        vpor    %ymm1, %ymm0, %ymm0
        ret
        .cfi_endproc
.LFE1:
        .size   umul_512, .-umul_512

* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: crazylht at gmail dot com @ 2020-06-17  1:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488

--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to H.J. Lu from comment #8)
>  -march=skylake-avx512 gave:

It's intentional; maybe I'll add -mprefer-vector-width=512 to the testcase.
----
  /* Not generate zmm instruction when prefer 128/256 bit vector width.  */
  if (qimode == V32QImode
      && (TARGET_PREFER_AVX128 || TARGET_PREFER_AVX256))
    return false;
----
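A hypothetical restatement of the quoted check in plain C shows why only V32QImode is affected. Widening N bytes to N 16-bit words doubles the vector width, and only the 32-element case ends up needing a 512-bit (zmm) register. The function names and the `prefer_width_bits` parameter are illustrative, not GCC internals, and the model ignores ISA requirements such as AVX-512BW.

```c
#include <assert.h>
#include <stdbool.h>

/* Width in bits of the 16-bit-element vector produced by zero-extending
   NELTS bytes (V8QI -> V8HI, V16QI -> V16HI, V32QI -> V32HI).  */
static int
widened_bits (int nelts)
{
  return nelts * 16;
}

/* Mirrors the quoted condition: refuse the extend-multiply-truncate path
   only when the widened vector needs a zmm register while a 128- or
   256-bit vector width is preferred.  Other modes are not restricted.  */
static bool
extend_path_allowed (int nelts, int prefer_width_bits)
{
  bool needs_zmm = widened_bits (nelts) == 512;
  bool prefer_narrow = prefer_width_bits == 128 || prefer_width_bits == 256;
  return !(needs_zmm && prefer_narrow);
}
```

This matches the listing in comment 8: the 32-byte mul_512 and umul_512 keep the unpack/vpmullw/vpshufb sequence, which the reply attributes to the narrower preferred vector width, hence the suggestion to add -mprefer-vector-width=512 to the testcase.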

* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: crazylht at gmail dot com @ 2020-07-09  6:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
Fixed in GCC11

* [Bug target/95488] Suboptimal multiplication codegen for v16qi
From: pinskia at gcc dot gnu.org @ 2021-08-21 18:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95488

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.0
