public inbox for gcc-bugs@sourceware.org
* [Bug target/113079] New: [x86] Fails to generate dot_prod instructions for 64-bit vector.
@ 2023-12-19  5:05 liuhongt at gcc dot gnu.org
  2023-12-19  5:21 ` [Bug target/113079] " liuhongt at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: liuhongt at gcc dot gnu.org @ 2023-12-19  5:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113079

            Bug ID: 113079
           Summary: [x86] Fails to generate dot_prod instructions for
                    64-bit vector.
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: liuhongt at gcc dot gnu.org
  Target Milestone: ---

int
foo (int n, unsigned char* p, char* pi)
{
    int sum = 0;
    for (int i = 0; i != 8; i++)
    {
        sum += p[i] * pi[i];
    }
    return sum;
}

We could use a 128-bit dot_prod instruction and clear the upper 64 bits.
Currently, GCC generates a long instruction sequence:

        vmovq   xmm0, QWORD PTR [rsi]
        vmovq   xmm2, QWORD PTR [rdx]
        vpmovzxbw       xmm1, xmm0
        vpsrlq  xmm0, xmm0, 32
        vpmovsxbw       xmm3, xmm2
        vpmullw xmm1, xmm1, xmm3
        vpsrlq  xmm2, xmm2, 32
        vpmovzxbw       xmm0, xmm0
        vpmovsxbw       xmm2, xmm2
        vpmullw xmm0, xmm0, xmm2
        vpmovsxwd       xmm2, xmm1
        vpsrlq  xmm1, xmm1, 32
        vpmovsxwd       xmm1, xmm1
        vpaddd  xmm2, xmm2, xmm1
        vpmovsxwd       xmm1, xmm0
        vpsrlq  xmm0, xmm0, 32
        vpmovsxwd       xmm0, xmm0
        vpaddd  xmm1, xmm1, xmm2
        vpxor   xmm2, xmm2, xmm2
        vpshufb xmm2, xmm2, XMMWORD PTR .LC1[rip]
        vpaddd  xmm0, xmm0, xmm1
        vpshufb xmm1, xmm0, XMMWORD PTR .LC0[rip]
        vpor    xmm1, xmm1, xmm2
        vpaddd  xmm0, xmm0, xmm1
        vmovd   eax, xmm0
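
A minimal scalar sketch of the idea, for illustration only. It assumes the 128-bit dot_prod instruction behaves like a u8 x s8 multiply-accumulate in which each 32-bit lane sums four adjacent byte products (as AVX-VNNI's vpdpbusd does; the report does not name an instruction). The names dot_prod_u8s8_128, foo_via_128 and foo_ref are hypothetical, and `signed char` stands in for the plain `char` of the report, which is signed on x86.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model (assumption) of a 128-bit u8 x s8 dot-product-accumulate:
   32-bit lane j accumulates a[4j..4j+3] * b[4j..4j+3].  */
void dot_prod_u8s8_128(int32_t acc[4], const uint8_t a[16], const int8_t b[16])
{
    for (int j = 0; j < 4; j++)
        for (int k = 0; k < 4; k++)
            acc[j] += (int32_t)a[4 * j + k] * (int32_t)b[4 * j + k];
}

/* The 8-element loop from the report, expressed through the 128-bit
   model: the 64-bit inputs sit in the low half, the upper half is zero.  */
int foo_via_128(const unsigned char *p, const signed char *pi)
{
    uint8_t a[16] = {0};
    int8_t b[16] = {0};
    int32_t acc[4] = {0};

    memcpy(a, p, 8);
    memcpy(b, pi, 8);
    dot_prod_u8s8_128(acc, a, b);

    /* Only the two low lanes hold products of the real 8 elements.  */
    return acc[0] + acc[1];
}

/* Reference: the plain loop from the report.  */
int foo_ref(const unsigned char *p, const signed char *pi)
{
    int sum = 0;
    for (int i = 0; i != 8; i++)
        sum += p[i] * pi[i];
    return sum;
}
```

Under these assumptions, foo_via_128 and foo_ref return the same value for any 8-byte inputs, which is the equivalence the shorter sequence would rely on.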


* [Bug target/113079] [x86] Fails to generate dot_prod instructions for 64-bit vector.
From: liuhongt at gcc dot gnu.org @ 2023-12-19  5:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113079

--- Comment #1 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #0)
> int
> foo (int n, unsigned char* p, char* pi)
> {
>     int sum = 0;
>     for (int i = 0; i != 8; i++)
>     {
>         sum += p[i] * pi[i];
>     }
>     return sum;
> }
> 
> We can use 128-bit dot_prod instruction + clean upper 64 bits.
Clearing the upper 64 bits is not needed: these are integer operations, so
operating on the upper 64 bits has no side effects.


* [Bug target/113079] [x86] Fails to generate dot_prod instructions for 64-bit vector.
From: rguenth at gcc dot gnu.org @ 2023-12-19  9:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113079

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-12-19

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
It includes a lane reduction, so we need those lanes to be zero, at least on
GIMPLE (if we did it there), because the tree codes do not specify which
lanes are reduced.  The actual x86 instruction is probably fine, so if
you make the V8QI operation available in the backend, that should work without
zeroing.
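
To make the lane-reduction point concrete, here is a small scalar sketch. It compares two possible groupings of 16 byte products into four 32-bit lanes: "adjacent" (lane j sums elements 4j..4j+3, which is the assumed behaviour of the x86 instruction) and "strided" (lane j sums elements j, j+4, j+8, j+12, a hypothetical alternative that an unspecified reduction would also permit). All function names are made up for this illustration.

```c
#include <stdint.h>

/* Adjacent grouping (assumed x86 behaviour): lane j sums 4j..4j+3.  */
void dot_adjacent(int32_t out[4], const uint8_t a[16], const int8_t b[16])
{
    for (int j = 0; j < 4; j++) {
        out[j] = 0;
        for (int k = 0; k < 4; k++)
            out[j] += (int32_t)a[4 * j + k] * b[4 * j + k];
    }
}

/* Strided grouping (hypothetical): lane j sums j, j+4, j+8, j+12.  */
void dot_strided(int32_t out[4], const uint8_t a[16], const int8_t b[16])
{
    for (int j = 0; j < 4; j++) {
        out[j] = 0;
        for (int k = 0; k < 4; k++)
            out[j] += (int32_t)a[j + 4 * k] * b[j + 4 * k];
    }
}

/* Sum of the first n lanes of a 4-lane result.  */
int32_t sum_lanes(const int32_t v[4], int n)
{
    int32_t s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}
```

With garbage in elements 8..15, the two low lanes of dot_adjacent still equal the 8-element dot product, while the low lanes of dot_strided do not. With elements 8..15 zeroed, the sum of all four lanes is correct under either grouping, which is why a grouping-agnostic representation needs the zeros.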


* [Bug target/113079] [x86] Fails to generate dot_prod instructions for 64-bit vector.
From: cvs-commit at gcc dot gnu.org @ 2024-05-07  7:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113079

--- Comment #3 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:fa911365490a7ca308878517a4af6189ffba7ed6

commit r15-235-gfa911365490a7ca308878517a4af6189ffba7ed6
Author: liuhongt <hongtao.liu@intel.com>
Date:   Wed Dec 20 11:43:25 2023 +0800

    Support dot_prod optabs for 64-bit vector.

    gcc/ChangeLog:

            PR target/113079
            * config/i386/mmx.md (usdot_prodv8qi): New expander.
            (sdot_prodv8qi): Ditto.
            (udot_prodv8qi): Ditto.
            (usdot_prodv4hi): Ditto.
            (udot_prodv4hi): Ditto.
            (sdot_prodv4hi): Ditto.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr113079.c: New test.
            * gcc.target/i386/pr113079-2.c: New test.
            * gcc.target/i386/sse4-pr113079-2.c: New test.
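
For orientation, the expander names follow the usual optab convention: usdot is unsigned x signed, sdot is signed x signed, udot is unsigned x unsigned. The loops below sketch the source-level shapes these correspond to for 64-bit vectors. They are illustrative only and are not the contents of the new test files; the function names are hypothetical, and whether a given loop is actually vectorized to a dot-product instruction depends on the compiler version and target flags.

```c
/* usdot shape for V8QI: unsigned char times signed char, int accumulator.  */
int usdot8(const unsigned char *a, const signed char *b)
{
    int sum = 0;
    for (int i = 0; i != 8; i++)
        sum += a[i] * b[i];
    return sum;
}

/* sdot shape for V8QI: signed char times signed char.  */
int sdot8(const signed char *a, const signed char *b)
{
    int sum = 0;
    for (int i = 0; i != 8; i++)
        sum += a[i] * b[i];
    return sum;
}

/* udot shape for V8QI: unsigned char times unsigned char.  */
unsigned udot8(const unsigned char *a, const unsigned char *b)
{
    unsigned sum = 0;
    for (int i = 0; i != 8; i++)
        sum += a[i] * b[i];
    return sum;
}

/* sdot shape for V4HI: four signed 16-bit elements.  */
int sdot4(const short *a, const short *b)
{
    int sum = 0;
    for (int i = 0; i != 4; i++)
        sum += a[i] * b[i];
    return sum;
}
```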


* [Bug target/113079] [x86] Fails to generate dot_prod instructions for 64-bit vector.
From: liuhongt at gcc dot gnu.org @ 2024-05-07  7:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113079

Hongtao Liu <liuhongt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #4 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
Fixed in GCC 15.
