public inbox for gcc-bugs@sourceware.org
* [Bug target/106060] New: Inefficient constant broadcast on x86_64
@ 2022-06-23 1:59 goldstein.w.n at gmail dot com
2022-06-23 7:05 ` [Bug target/106060] " crazylht at gmail dot com
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: goldstein.w.n at gmail dot com @ 2022-06-23 1:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060
Bug ID: 106060
Summary: Inefficient constant broadcast on x86_64
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: goldstein.w.n at gmail dot com
Target Milestone: ---
```
#include <immintrin.h>
__m256i
shouldnt_have_movabs ()
{
return _mm256_set1_epi8 (123);
}
__m256i
should_be_cmpeq_abs ()
{
return _mm256_set1_epi8 (1);
}
__m256i
should_be_cmpeq_add ()
{
return _mm256_set1_epi8 (-2);
}
```
Compiled with: '-O3 -march=x86-64-v3'
Results in:
```
Disassembly of section .text:
0000000000000000 <shouldnt_have_movabs>:
0: 48 b8 7b 7b 7b 7b 7b movabs $0x7b7b7b7b7b7b7b7b,%rax
7: 7b 7b 7b
a: c4 e1 f9 6e c8 vmovq %rax,%xmm1
f: c4 e2 7d 59 c1 vpbroadcastq %xmm1,%ymm0
14: c3 retq
15: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
1c: 00 00 00 00
0000000000000020 <should_be_cmpeq_abs>:
20: 48 b8 01 01 01 01 01 movabs $0x101010101010101,%rax
27: 01 01 01
2a: c4 e1 f9 6e c8 vmovq %rax,%xmm1
2f: c4 e2 7d 59 c1 vpbroadcastq %xmm1,%ymm0
34: c3 retq
35: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
3c: 00 00 00 00
0000000000000040 <should_be_cmpeq_add>:
40: 48 b8 fe fe fe fe fe movabs $0xfefefefefefefefe,%rax
47: fe fe fe
4a: c4 e1 f9 6e c8 vmovq %rax,%xmm1
4f: c4 e2 7d 59 c1 vpbroadcastq %xmm1,%ymm0
54: c3 retq
```
Compiled with: '-O3 -march=x86-64-v4'
Results in:
```
0000000000000000 <shouldnt_have_movabs>:
0: 48 b8 7b 7b 7b 7b 7b movabs $0x7b7b7b7b7b7b7b7b,%rax
7: 7b 7b 7b
a: 62 f2 fd 28 7c c0 vpbroadcastq %rax,%ymm0
10: c3 retq
11: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
18: 00 00 00 00
1c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000000020 <should_be_cmpeq_abs>:
20: 48 b8 01 01 01 01 01 movabs $0x101010101010101,%rax
27: 01 01 01
2a: 62 f2 fd 28 7c c0 vpbroadcastq %rax,%ymm0
30: c3 retq
31: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
38: 00 00 00 00
3c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000000040 <should_be_cmpeq_add>:
40: 48 b8 fe fe fe fe fe movabs $0xfefefefefefefefe,%rax
47: fe fe fe
4a: 62 f2 fd 28 7c c0 vpbroadcastq %rax,%ymm0
50: c3 retq
```
All functions / targets are suboptimal.
Generating 1 and -2 (the two cmpeq cases) can be done without any lane-crossing broadcast.
Constants like 123 shouldn't first be broadcast into an imm64.
That requires a 10-byte `movabs` and wastes
space.
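The cmpeq idea above can be checked with plain byte arithmetic. The following is a minimal scalar sketch, not compiler output: the function names are hypothetical, and each loop only models what the corresponding vector instruction does per byte lane (an all-ones compare result, a per-byte absolute value, and a per-byte add of the vector to itself).

```c
#include <stdint.h>

#define LANES 32 /* bytes in a 256-bit vector */

/* Models comparing a register with itself: every bit set, so each byte is 0xFF. */
static void all_ones(uint8_t v[LANES])
{
    for (int i = 0; i < LANES; i++)
        v[i] = 0xFF;
}

/* Models a per-byte absolute value: bytes with the sign bit set are negated. */
static void abs_bytes(uint8_t v[LANES])
{
    for (int i = 0; i < LANES; i++)
        if (v[i] & 0x80)
            v[i] = (uint8_t)(0u - v[i]);
}

/* Models a per-byte wrapping add of the vector to itself. */
static void add_self_bytes(uint8_t v[LANES])
{
    for (int i = 0; i < LANES; i++)
        v[i] = (uint8_t)(v[i] + v[i]);
}

/* Returns 1 when every lane holds the byte x, 0 otherwise. */
static int all_lanes_equal(const uint8_t v[LANES], uint8_t x)
{
    for (int i = 0; i < LANES; i++)
        if (v[i] != x)
            return 0;
    return 1;
}
```

Starting from all-ones, `abs_bytes` leaves 0x01 in every lane (the constant 1) and `add_self_bytes` leaves 0xFE in every lane (the constant -2 as a byte), so neither constant needs a general-purpose register or a broadcast.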
* [Bug target/106060] Inefficient constant broadcast on x86_64
From: crazylht at gmail dot com @ 2022-06-23 7:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060
Hongtao.liu <crazylht at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |crazylht at gmail dot com
--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
As I recall, this is on purpose: see r12-1958-gedafb35bdadf30, related to PR100865.
* [Bug target/106060] Inefficient constant broadcast on x86_64
From: hjl.tools at gmail dot com @ 2022-06-23 15:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060
--- Comment #2 from H.J. Lu <hjl.tools at gmail dot com> ---
Created attachment 53196
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53196&action=edit
A patch
This generates:
```
0000000000000000 <shouldnt_have_movabs>:
0: b8 7b 00 00 00 mov $0x7b,%eax
5: c5 f9 6e c0 vmovd %eax,%xmm0
9: c4 e2 7d 78 c0 vpbroadcastb %xmm0,%ymm0
e: c3 ret
f: 90 nop
0000000000000010 <should_be_cmpeq_abs>:
10: b8 01 00 00 00 mov $0x1,%eax
15: c5 f9 6e c0 vmovd %eax,%xmm0
19: c4 e2 7d 78 c0 vpbroadcastb %xmm0,%ymm0
1e: c3 ret
1f: 90 nop
0000000000000020 <should_be_cmpeq_add>:
20: b8 fe ff ff ff mov $0xfffffffe,%eax
25: c5 f9 6e c0 vmovd %eax,%xmm0
29: c4 e2 7d 78 c0 vpbroadcastb %xmm0,%ymm0
2e: c3 ret
```
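To make the difference between the two listings concrete, here is a small scalar sketch of both strategies. The helper names are hypothetical and the code only models the data flow. The first function pre-replicates the byte into a 64-bit immediate, as the `movabs` sequences do. The second models a byte broadcast, where only the low byte of the 32-bit source reaches the lanes.

```c
#include <stdint.h>

/* Replicate one byte into all eight bytes of a 64-bit value,
   i.e. the immediate that the movabs sequences load. */
static uint64_t replicate_byte_u64(uint8_t b)
{
    return (uint64_t)b * UINT64_C(0x0101010101010101);
}

/* Scalar model of a byte broadcast from a 32-bit register:
   only the low byte of the source is copied to each of the 32 lanes. */
static void broadcast_low_byte(uint32_t src, uint8_t out[32])
{
    for (int i = 0; i < 32; i++)
        out[i] = (uint8_t)(src & 0xFF);
}
```

Because the byte broadcast ignores the upper bytes of the source, a 5-byte `mov` with a 32-bit immediate is enough. Going by the offsets in the listings, each function shrinks from 21 bytes (the -march=x86-64-v3 output in the original report) to 15 bytes.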
* [Bug target/106060] Inefficient constant broadcast on x86_64
From: pinskia at gcc dot gnu.org @ 2023-05-17 23:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
Last reconfirmed| |2023-05-17
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Confirmed. I think HJL's patch definitely improves things, though I wonder if
they could be improved further.
* [Bug target/106060] Inefficient constant broadcast on x86_64
From: pinskia at gcc dot gnu.org @ 2023-05-17 23:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
* [Bug target/106060] Inefficient constant broadcast on x86_64
From: roger at nextmovesoftware dot com @ 2024-01-14 11:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060
Roger Sayle <roger at nextmovesoftware dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|unassigned at gcc dot gnu.org |roger at nextmovesoftware dot com
Status|NEW |ASSIGNED
--- Comment #4 from Roger Sayle <roger at nextmovesoftware dot com> ---
I have a patch for better materialization of vector constants (including
cmpeq+abs and cmpeq+add), but now that we've transitioned from stage 3 (bug
fixing) to stage 4 (regression fixing), this will have to wait for GCC 15's
stage 1. I'm happy to post the patch here or to gcc-patches, if anyone would
like to pre-review it and/or benchmark the proposed changes.
* [Bug target/106060] Inefficient constant broadcast on x86_64
From: roger at nextmovesoftware dot com @ 2024-02-16 9:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060
Roger Sayle <roger at nextmovesoftware dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |15.0
--- Comment #5 from Roger Sayle <roger at nextmovesoftware dot com> ---
For the record (so it doesn't get lost) the final patch was posted at
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643973.html
and approved (for stage 1) at
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643996.html
* [Bug target/106060] Inefficient constant broadcast on x86_64
From: cvs-commit at gcc dot gnu.org @ 2024-05-07 6:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060
--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:
https://gcc.gnu.org/g:79649a5dcd81bc05c0ba591068c9075de43bd417
commit r15-222-g79649a5dcd81bc05c0ba591068c9075de43bd417
Author: Roger Sayle <roger@nextmovesoftware.com>
Date: Tue May 7 07:14:40 2024 +0100
PR target/106060: Improved SSE vector constant materialization on x86.
This patch resolves PR target/106060 by providing efficient methods for
materializing/synthesizing special "vector" constants on x86. Currently
there are three methods of materializing a vector constant; the most
general is to load a vector from the constant pool, secondly "duplicated"
constants can be synthesized by moving an integer between units and
broadcasting (or shuffling it), and finally the special cases of the
all-zeros vector and all-ones vectors can be loaded via a single SSE
instruction. This patch handles additional cases that can be synthesized
in two instructions, loading an all-ones vector followed by another SSE
instruction. Following my recent patch for PR target/112992, there's
conveniently a single place in i386-expand.cc where these special cases
can be handled.
Two examples are given in the original bugzilla PR for 106060.
```
__m256i should_be_cmpeq_abs ()
{
  return _mm256_set1_epi8 (1);
}
```
is now generated (with -O3 -march=x86-64-v3) as:
```
vpcmpeqd %ymm0, %ymm0, %ymm0
vpabsb %ymm0, %ymm0
ret
```
and
```
__m256i should_be_cmpeq_add ()
{
  return _mm256_set1_epi8 (-2);
}
```
is now generated as:
```
vpcmpeqd %ymm0, %ymm0, %ymm0
vpaddb %ymm0, %ymm0, %ymm0
ret
```
2024-05-07 Roger Sayle <roger@nextmovesoftware.com>
Hongtao Liu <hongtao.liu@intel.com>
gcc/ChangeLog
PR target/106060
* config/i386/i386-expand.cc (enum ix86_vec_bcast_alg): New.
(struct ix86_vec_bcast_map_simode_t): New type for table below.
(ix86_vec_bcast_map_simode): Table of SImode constants that may
be efficiently synthesized by a ix86_vec_bcast_alg method.
(ix86_vec_bcast_map_simode_cmp): New comparator for bsearch.
(ix86_vector_duplicate_simode_const): Efficiently synthesize
V4SImode and V8SImode constants that duplicate special constants.
(ix86_vector_duplicate_value): Attempt to synthesize "special"
vector constants using ix86_vector_duplicate_simode_const.
* config/i386/i386.cc (ix86_rtx_costs) <case ABS>: ABS of a
vector integer mode costs with a single SSE instruction.
gcc/testsuite/ChangeLog
PR target/106060
* gcc.target/i386/auto-init-8.c: Update test case.
* gcc.target/i386/avx512fp16-13.c: Likewise.
* gcc.target/i386/pr100865-9a.c: Likewise.
* gcc.target/i386/pr101796-1.c: Likewise.
* gcc.target/i386/pr106060-1.c: New test case.
* gcc.target/i386/pr106060-2.c: Likewise.
* gcc.target/i386/pr106060-3.c: Likewise.
* gcc.target/i386/pr70314.c: Update test case.
* gcc.target/i386/vect-shiftv4qi.c: Likewise.
* gcc.target/i386/vect-shiftv8qi.c: Likewise.
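The ChangeLog describes a table of SImode constants searched with a bsearch comparator. A minimal sketch of that lookup pattern follows. All names here (`bcast_alg`, `bcast_entry`, `bcast_table`, `lookup_bcast`) are hypothetical, and the six entries are only the values that follow arithmetically from applying a per-element abs or self-add to an all-ones vector at byte, word and dword width. This is not GCC's actual table or code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical algorithm tags: start from all-ones, then apply one operation. */
enum bcast_alg {
    ALG_ABS_BYTES,   /* per-byte abs:   0x01 in every byte      */
    ALG_ABS_WORDS,   /* per-word abs:   0x0001 in every word    */
    ALG_ABS_DWORDS,  /* per-dword abs:  0x00000001              */
    ALG_ADD_BYTES,   /* per-byte x+x:   0xfe in every byte      */
    ALG_ADD_WORDS,   /* per-word x+x:   0xfffe in every word    */
    ALG_ADD_DWORDS   /* per-dword x+x:  0xfffffffe              */
};

struct bcast_entry {
    uint32_t value;      /* 32-bit pattern that is duplicated across the vector */
    enum bcast_alg alg;  /* how to synthesize it */
};

/* Must stay sorted by value (unsigned) for bsearch. */
static const struct bcast_entry bcast_table[] = {
    { 0x00000001u, ALG_ABS_DWORDS },
    { 0x00010001u, ALG_ABS_WORDS  },
    { 0x01010101u, ALG_ABS_BYTES  },
    { 0xfefefefeu, ALG_ADD_BYTES  },
    { 0xfffefffeu, ALG_ADD_WORDS  },
    { 0xfffffffeu, ALG_ADD_DWORDS },
};

/* Comparator for bsearch: key is a uint32_t, element is a table entry. */
static int bcast_cmp(const void *key, const void *elem)
{
    uint32_t k = *(const uint32_t *)key;
    uint32_t v = ((const struct bcast_entry *)elem)->value;
    return (k > v) - (k < v);
}

/* Returns the matching entry, or NULL when the constant has no special form. */
static const struct bcast_entry *lookup_bcast(uint32_t value)
{
    return bsearch(&value, bcast_table,
                   sizeof bcast_table / sizeof bcast_table[0],
                   sizeof bcast_table[0], bcast_cmp);
}
```

With this sketch, `lookup_bcast(0x01010101u)` finds the cmpeq+abs byte case and `lookup_bcast(0xfefefefeu)` the cmpeq+add byte case, while a value such as `0x7b7b7b7bu` returns NULL and would fall back to the general broadcast path.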
* [Bug target/106060] Inefficient constant broadcast on x86_64
From: roger at nextmovesoftware dot com @ 2024-05-12 9:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106060
Roger Sayle <roger at nextmovesoftware dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Known to work| |15.0
Status|ASSIGNED |RESOLVED
--- Comment #7 from Roger Sayle <roger at nextmovesoftware dot com> ---
This has now been fixed on mainline for GCC 15. There are still improvements
that can be made to vector constant materialization/initialization on x86_64,
but the issues/ideas described in this bugzilla PR are all now implemented.
Thanks.