public inbox for gcc-bugs@sourceware.org
* [Bug target/106060] New: Inefficient constant broadcast on x86_64
@ 2022-06-23  1:59 goldstein.w.n at gmail dot com
  2022-06-23  7:05 ` [Bug target/106060] " crazylht at gmail dot com
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: goldstein.w.n at gmail dot com @ 2022-06-23  1:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

            Bug ID: 106060
           Summary: Inefficient constant broadcast on x86_64
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: goldstein.w.n at gmail dot com
  Target Milestone: ---

```
#include <immintrin.h>

__m256i
shouldnt_have_movabs ()
{
  return _mm256_set1_epi8 (123);
}

__m256i
should_be_cmpeq_abs ()
{
  return _mm256_set1_epi8 (1);
}

__m256i
should_be_cmpeq_add ()
{
  return _mm256_set1_epi8 (-2);
}
```

Compiled with: '-O3 -march=x86-64-v3'

Results in:
```
Disassembly of section .text:

0000000000000000 <shouldnt_have_movabs>:
   0:   48 b8 7b 7b 7b 7b 7b    movabs $0x7b7b7b7b7b7b7b7b,%rax
   7:   7b 7b 7b
   a:   c4 e1 f9 6e c8          vmovq  %rax,%xmm1
   f:   c4 e2 7d 59 c1          vpbroadcastq %xmm1,%ymm0
  14:   c3                      retq
  15:   66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
  1c:   00 00 00 00

0000000000000020 <should_be_cmpeq_abs>:
  20:   48 b8 01 01 01 01 01    movabs $0x101010101010101,%rax
  27:   01 01 01
  2a:   c4 e1 f9 6e c8          vmovq  %rax,%xmm1
  2f:   c4 e2 7d 59 c1          vpbroadcastq %xmm1,%ymm0
  34:   c3                      retq
  35:   66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
  3c:   00 00 00 00

0000000000000040 <should_be_cmpeq_add>:
  40:   48 b8 fe fe fe fe fe    movabs $0xfefefefefefefefe,%rax
  47:   fe fe fe
  4a:   c4 e1 f9 6e c8          vmovq  %rax,%xmm1
  4f:   c4 e2 7d 59 c1          vpbroadcastq %xmm1,%ymm0
  54:   c3                      retq
```

Compiled with: '-O3 -march=x86-64-v4'

Results in:
```
0000000000000000 <shouldnt_have_movabs>:
   0:   48 b8 7b 7b 7b 7b 7b    movabs $0x7b7b7b7b7b7b7b7b,%rax
   7:   7b 7b 7b
   a:   62 f2 fd 28 7c c0       vpbroadcastq %rax,%ymm0
  10:   c3                      retq
  11:   66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
  18:   00 00 00 00
  1c:   0f 1f 40 00             nopl   0x0(%rax)

0000000000000020 <should_be_cmpeq_abs>:
  20:   48 b8 01 01 01 01 01    movabs $0x101010101010101,%rax
  27:   01 01 01
  2a:   62 f2 fd 28 7c c0       vpbroadcastq %rax,%ymm0
  30:   c3                      retq
  31:   66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
  38:   00 00 00 00
  3c:   0f 1f 40 00             nopl   0x0(%rax)

0000000000000040 <should_be_cmpeq_add>:
  40:   48 b8 fe fe fe fe fe    movabs $0xfefefefefefefefe,%rax
  47:   fe fe fe
  4a:   62 f2 fd 28 7c c0       vpbroadcastq %rax,%ymm0
  50:   c3                      retq
```


All functions / targets are suboptimal.

Generating the constants in `should_be_cmpeq_abs` and `should_be_cmpeq_add` can
be done without any lane-crossing broadcast.
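
The function names hint at the intended sequences: compare a register with itself to get all-ones, then apply a per-byte abs (giving 1) or a per-byte add of the register to itself (giving -2). A minimal scalar sketch of that arithmetic, assuming a 256-bit vector modelled as 32 signed bytes; the helper names are hypothetical and this is a model of the lane arithmetic, not the code GCC emits:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LANES 32 /* a 256-bit vector viewed as 32 signed bytes */

/* Model of comparing a register with itself for equality:
   every bit is set, so each byte lane reads as -1. */
void lanes_all_ones(int8_t v[LANES])
{
  for (int i = 0; i < LANES; i++)
    v[i] = -1;
}

/* Model of a per-byte absolute value: |-1| == 1 in every lane. */
void lanes_abs(int8_t v[LANES])
{
  for (int i = 0; i < LANES; i++)
    v[i] = (int8_t) abs(v[i]);
}

/* Model of a per-byte add of the vector to itself: -1 + -1 == -2. */
void lanes_add_self(int8_t v[LANES])
{
  for (int i = 0; i < LANES; i++)
    v[i] = (int8_t) (v[i] + v[i]);
}

/* Returns 1 if every lane holds x, 0 otherwise. */
int lanes_all_equal(const int8_t v[LANES], int8_t x)
{
  for (int i = 0; i < LANES; i++)
    if (v[i] != x)
      return 0;
  return 1;
}

/* Self-check: both two-step sequences give the constants from the report. */
void check_lane_model(void)
{
  int8_t v[LANES];

  lanes_all_ones(v);
  lanes_abs(v);
  assert(lanes_all_equal(v, 1));

  lanes_all_ones(v);
  lanes_add_self(v);
  assert(lanes_all_equal(v, -2));
}
```

Neither step moves data between lanes, which is why no broadcast is needed for these two constants.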

Constants like 123 shouldn't first be broadcast into an imm64. That
requires a 10-byte `movabs` and wastes space.
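
For reference, the `movabs` immediates in the listings are the 8-bit constant replicated into all eight bytes of a 64-bit value. A small sketch of that replication, assuming a hypothetical helper name `splat_byte_to_u64`; it only reproduces the immediates and says nothing about how GCC computes them internally:

```c
#include <assert.h>
#include <stdint.h>

/* Replicate one byte into all eight bytes of a 64-bit value.
   Multiplying by 0x0101...01 copies the byte into each position;
   no carries occur because the byte is below 256. */
uint64_t splat_byte_to_u64(int8_t b)
{
  return (uint64_t) (uint8_t) b * UINT64_C(0x0101010101010101);
}

/* Self-check against the three immediates shown in the disassembly. */
void check_splat(void)
{
  assert(splat_byte_to_u64(123) == UINT64_C(0x7b7b7b7b7b7b7b7b));
  assert(splat_byte_to_u64(1) == UINT64_C(0x0101010101010101));
  assert(splat_byte_to_u64(-2) == UINT64_C(0xfefefefefefefefe));
}
```

Such an immediate does not fit in 32 bits, which is why the 10-byte `movabs` encoding shows up in the listings.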


* [Bug target/106060] Inefficient constant broadcast on x86_64
From: crazylht at gmail dot com @ 2022-06-23  7:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
As I recall, this is intentional: it comes from r12-1958-gedafb35bdadf30
(related to PR100865).


* [Bug target/106060] Inefficient constant broadcast on x86_64
From: hjl.tools at gmail dot com @ 2022-06-23 15:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

--- Comment #2 from H.J. Lu <hjl.tools at gmail dot com> ---
Created attachment 53196
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53196&action=edit
A patch

This generates:

0000000000000000 <shouldnt_have_movabs>:
   0:   b8 7b 00 00 00          mov    $0x7b,%eax
   5:   c5 f9 6e c0             vmovd  %eax,%xmm0
   9:   c4 e2 7d 78 c0          vpbroadcastb %xmm0,%ymm0
   e:   c3                      ret
   f:   90                      nop

0000000000000010 <should_be_cmpeq_abs>:
  10:   b8 01 00 00 00          mov    $0x1,%eax
  15:   c5 f9 6e c0             vmovd  %eax,%xmm0
  19:   c4 e2 7d 78 c0          vpbroadcastb %xmm0,%ymm0
  1e:   c3                      ret
  1f:   90                      nop

0000000000000020 <should_be_cmpeq_add>:
  20:   b8 fe ff ff ff          mov    $0xfffffffe,%eax
  25:   c5 f9 6e c0             vmovd  %eax,%xmm0
  29:   c4 e2 7d 78 c0          vpbroadcastb %xmm0,%ymm0
  2e:   c3                      ret
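
In the listing above the 32-bit immediate differs per function (note `$0xfffffffe` for -2), but a byte broadcast only consumes the low byte of the source. A scalar sketch of that behaviour, assuming 32 byte lanes and a hypothetical helper name; it models the data flow only, not the instruction encodings:

```c
#include <assert.h>
#include <stdint.h>

#define LANES 32 /* a 256-bit vector viewed as 32 unsigned bytes */

/* Scalar model of a byte broadcast: copy the low byte of a 32-bit
   value into every lane; the upper 24 bits are ignored. */
void broadcast_low_byte(uint32_t eax, uint8_t out[LANES])
{
  uint8_t low = (uint8_t) (eax & 0xffu);
  for (int i = 0; i < LANES; i++)
    out[i] = low;
}

/* Self-check: 0xfffffffe broadcasts 0xfe, the byte pattern of -2. */
void check_broadcast(void)
{
  uint8_t v[LANES];

  broadcast_low_byte(0xfffffffeu, v);
  for (int i = 0; i < LANES; i++)
    assert(v[i] == (uint8_t) -2);
}
```

Because only the low byte matters, a 5-byte `mov` with a 32-bit immediate is enough here, where the earlier listings needed a 10-byte `movabs`.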


* [Bug target/106060] Inefficient constant broadcast on x86_64
From: pinskia at gcc dot gnu.org @ 2023-05-17 23:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-05-17

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed. I think HJL's patch definitely improves things, though I wonder if
they could be improved further.


* [Bug target/106060] Inefficient constant broadcast on x86_64
From: pinskia at gcc dot gnu.org @ 2023-05-17 23:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement


* [Bug target/106060] Inefficient constant broadcast on x86_64
From: roger at nextmovesoftware dot com @ 2024-01-14 11:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |roger at nextmovesoftware dot com
             Status|NEW                         |ASSIGNED

--- Comment #4 from Roger Sayle <roger at nextmovesoftware dot com> ---
I have a patch for better materialization of vector constants (including
cmpeq+abs and cmpeq+add), but now that we've transitioned from stage 3 (bug
fixing) to stage 4 (regression fixing), this will have to wait for GCC 15's
stage 1.  I'm happy to post the patch here or to gcc-patches, if anyone would
like to pre-review it and/or benchmark the proposed changes.


* [Bug target/106060] Inefficient constant broadcast on x86_64
From: roger at nextmovesoftware dot com @ 2024-02-16  9:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |15.0

--- Comment #5 from Roger Sayle <roger at nextmovesoftware dot com> ---
For the record (so it doesn't get lost) the final patch was posted at
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643973.html
and approved (for stage 1) at
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643996.html


* [Bug target/106060] Inefficient constant broadcast on x86_64
From: cvs-commit at gcc dot gnu.org @ 2024-05-07  6:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:79649a5dcd81bc05c0ba591068c9075de43bd417

commit r15-222-g79649a5dcd81bc05c0ba591068c9075de43bd417
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Tue May 7 07:14:40 2024 +0100

    PR target/106060: Improved SSE vector constant materialization on x86.

    This patch resolves PR target/106060 by providing efficient methods for
    materializing/synthesizing special "vector" constants on x86.  Currently
    there are three methods of materializing a vector constant; the most
    general is to load a vector from the constant pool, secondly "duplicated"
    constants can be synthesized by moving an integer between units and
    broadcasting (or shuffling it), and finally the special cases of the
    all-zeros vector and all-ones vectors can be loaded via a single SSE
    instruction.  This patch handles additional cases that can be synthesized
    in two instructions, loading an all-ones vector followed by another SSE
    instruction.  Following my recent patch for PR target/112992, there's
    conveniently a single place in i386-expand.cc where these special cases
    can be handled.

    Two examples are given in the original bugzilla PR for 106060.

    __m256i should_be_cmpeq_abs ()
    {
      return _mm256_set1_epi8 (1);
    }

    is now generated (with -O3 -march=x86-64-v3) as:

            vpcmpeqd        %ymm0, %ymm0, %ymm0
            vpabsb  %ymm0, %ymm0
            ret

    and

    __m256i should_be_cmpeq_add ()
    {
      return _mm256_set1_epi8 (-2);
    }

    is now generated as:

            vpcmpeqd        %ymm0, %ymm0, %ymm0
            vpaddb  %ymm0, %ymm0, %ymm0
            ret

    2024-05-07  Roger Sayle  <roger@nextmovesoftware.com>
                Hongtao Liu  <hongtao.liu@intel.com>

    gcc/ChangeLog
            PR target/106060
            * config/i386/i386-expand.cc (enum ix86_vec_bcast_alg): New.
            (struct ix86_vec_bcast_map_simode_t): New type for table below.
            (ix86_vec_bcast_map_simode): Table of SImode constants that may
            be efficiently synthesized by a ix86_vec_bcast_alg method.
            (ix86_vec_bcast_map_simode_cmp): New comparator for bsearch.
            (ix86_vector_duplicate_simode_const): Efficiently synthesize
            V4SImode and V8SImode constants that duplicate special constants.
            (ix86_vector_duplicate_value): Attempt to synthesize "special"
            vector constants using ix86_vector_duplicate_simode_const.
            * config/i386/i386.cc (ix86_rtx_costs) <case ABS>: ABS of a
            vector integer mode costs with a single SSE instruction.

    gcc/testsuite/ChangeLog
            PR target/106060
            * gcc.target/i386/auto-init-8.c: Update test case.
            * gcc.target/i386/avx512fp16-13.c: Likewise.
            * gcc.target/i386/pr100865-9a.c: Likewise.
            * gcc.target/i386/pr101796-1.c: Likewise.
            * gcc.target/i386/pr106060-1.c: New test case.
            * gcc.target/i386/pr106060-2.c: Likewise.
            * gcc.target/i386/pr106060-3.c: Likewise.
            * gcc.target/i386/pr70314.c: Update test case.
            * gcc.target/i386/vect-shiftv4qi.c: Likewise.
            * gcc.target/i386/vect-shiftv8qi.c: Likewise.
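
The ChangeLog above mentions a table of SImode constants keyed to a synthesis method and a comparator for bsearch. A minimal sketch of that general shape, with loud caveats: every name and every table row here is hypothetical. Only the byte-wise abs and add cases come from the commit message; the word and dword rows are arithmetic extrapolations from an all-ones vector, not taken from the patch, and the real table in i386-expand.cc may look quite different.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical method tags (not GCC's enum ix86_vec_bcast_alg). */
enum bcast_alg {
  BCAST_ABS_BYTES,  /* all-ones, then per-byte abs   */
  BCAST_ABS_WORDS,  /* all-ones, then per-word abs   */
  BCAST_ABS_DWORDS, /* all-ones, then per-dword abs  */
  BCAST_ADD_BYTES,  /* all-ones, then per-byte x + x  */
  BCAST_ADD_WORDS,  /* all-ones, then per-word x + x  */
  BCAST_ADD_DWORDS  /* all-ones, then per-dword x + x */
};

struct bcast_entry {
  uint32_t value; /* the constant as it appears in one 32-bit element */
  enum bcast_alg alg;
};

/* Must stay sorted by value (unsigned) for bsearch to work. */
static const struct bcast_entry bcast_map[] = {
  { 0x00000001u, BCAST_ABS_DWORDS },
  { 0x00010001u, BCAST_ABS_WORDS },
  { 0x01010101u, BCAST_ABS_BYTES },
  { 0xfefefefeu, BCAST_ADD_BYTES },
  { 0xfffefffeu, BCAST_ADD_WORDS },
  { 0xfffffffeu, BCAST_ADD_DWORDS },
};

/* bsearch comparator: key is a uint32_t, element is a bcast_entry. */
int bcast_entry_cmp(const void *key, const void *elt)
{
  uint32_t k = *(const uint32_t *) key;
  uint32_t v = ((const struct bcast_entry *) elt)->value;
  return (k > v) - (k < v);
}

/* Returns the matching entry, or NULL if the constant is not special. */
const struct bcast_entry *bcast_lookup(uint32_t value)
{
  return bsearch(&value, bcast_map,
                 sizeof bcast_map / sizeof bcast_map[0],
                 sizeof bcast_map[0], bcast_entry_cmp);
}

/* Self-check using the two constants from the bug report. */
void check_bcast_lookup(void)
{
  assert(bcast_lookup(0x01010101u)->alg == BCAST_ABS_BYTES); /* set1_epi8(1)  */
  assert(bcast_lookup(0xfefefefeu)->alg == BCAST_ADD_BYTES); /* set1_epi8(-2) */
  assert(bcast_lookup(0x7b7b7b7bu) == NULL);                 /* set1_epi8(123) */
}
```

A sorted table with a binary search keeps the lookup cheap and lets new special constants be added as rows; a constant that misses the table, such as 123 here, would fall back to one of the general methods.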


* [Bug target/106060] Inefficient constant broadcast on x86_64
From: roger at nextmovesoftware dot com @ 2024-05-12  9:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
      Known to work|                            |15.0
             Status|ASSIGNED                    |RESOLVED

--- Comment #7 from Roger Sayle <roger at nextmovesoftware dot com> ---
This has now been fixed on mainline for GCC 15.  There are still improvements
that can be made to vector constant materialization/initialization on x86_64,
but the issues/ideas described in this bugzilla PR are all now implemented. 
Thanks.
