public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/109771] New: Unnecessary pblendw for vectorized or
@ 2023-05-08 13:58 chfast at gmail dot com
2023-05-08 16:01 ` [Bug rtl-optimization/109771] " pinskia at gcc dot gnu.org
2023-05-08 16:03 ` [Bug tree-optimization/109771] " pinskia at gcc dot gnu.org
0 siblings, 2 replies; 3+ messages in thread
From: chfast at gmail dot com @ 2023-05-08 13:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109771
Bug ID: 109771
Summary: Unnecessary pblendw for vectorized or
Product: gcc
Version: 13.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: chfast at gmail dot com
Target Milestone: ---
I have an example of vectorization of a struct of four 64-bit words (a
representation of a 256-bit integer). The implementation is just a for loop
with a trip count of 4. In isolation this is vectorized cleanly; however, when
combined with some non-trivial control flow and additional wrapper functions,
the final assembly contains odd, unnecessary pblendw instructions.
pblendw xmm1, xmm3, 240 (GCC 13, x86-64-v2)
movlpd xmm1, QWORD PTR [rdi+16] (GCC 13, x86-64-v1)
shufpd xmm1, xmm3, 2 (GCC 12)
I believe this is a regression in GCC 13, because in a larger context GCC 12
optimized this code "correctly". However, that difference was lost during
test-case reduction.
https://godbolt.org/z/jzK44h3js
cpp:
struct u256 {
    unsigned long w[4];
};

inline u256 or_(u256 x, u256 y) {
    u256 z;
    for (int i = 0; i < 4; ++i)
        z.w[i] = x.w[i] | y.w[i];
    return z;
}

inline void or_to(u256& z, u256 y) { z = or_(z, y); }

void op_or(u256* t) { or_to(t[1], t[0]); }

void test(u256* t) {
    void* tbl[]{&&CLOBBER, &&OR};
CLOBBER:
    goto *0;
OR:
    op_or(t);
    goto *0;
}
x86-64-v2 asm:
test(u256*):
        xorl    %eax, %eax
        jmp     *%rax
        movdqu  32(%rdi), %xmm3
        movdqu  (%rdi), %xmm1
        movdqu  16(%rdi), %xmm2
        movdqu  48(%rdi), %xmm0
        por     %xmm3, %xmm1
        movups  %xmm1, 32(%rdi)
        movdqa  %xmm2, %xmm1
        pblendw $240, %xmm0, %xmm1
        pblendw $240, %xmm2, %xmm0
        por     %xmm1, %xmm0
        movups  %xmm0, 48(%rdi)
        jmp     *%rax
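
To make the immediate in the two pblendw instructions concrete, here is a minimal software model of the instruction. It is a sketch only: V8x16 and the plain function pblendw are hypothetical helper names, not intrinsics or GCC internals. For each of the eight 16-bit words, a set bit in the immediate takes the word from the source operand and a clear bit keeps the destination word.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper type: one 128-bit register viewed as eight 16-bit words.
using V8x16 = std::array<std::uint16_t, 8>;

// Software model of SSE4.1 PBLENDW (illustration only, not an intrinsic):
//   bit i of imm8 set   -> result word i comes from src
//   bit i of imm8 clear -> result word i comes from dst
constexpr V8x16 pblendw(V8x16 dst, V8x16 src, unsigned imm8) {
    V8x16 r{};
    for (int i = 0; i < 8; ++i)
        r[i] = ((imm8 >> i) & 1u) ? src[i] : dst[i];
    return r;
}

// 240 == 0xF0: words 0..3 (the low 64 bits) are kept from dst and
// words 4..7 (the high 64 bits) are taken from src.
static_assert(pblendw({0, 1, 2, 3, 4, 5, 6, 7},
                      {10, 11, 12, 13, 14, 15, 16, 17}, 240)[3] == 3,
              "low half comes from dst");
static_assert(pblendw({0, 1, 2, 3, 4, 5, 6, 7},
                      {10, 11, 12, 13, 14, 15, 16, 17}, 240)[4] == 14,
              "high half comes from src");
```

Under this model the two blends in the listing only exchange the high 64-bit halves of %xmm2 and %xmm0 before the por, which does not change the result of a lane-wise OR.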
* [Bug rtl-optimization/109771] Unnecessary pblendw for vectorized or
From: pinskia at gcc dot gnu.org @ 2023-05-08 16:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109771
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
-march=x86-64-v2
* [Bug tree-optimization/109771] Unnecessary pblendw for vectorized or
From: pinskia at gcc dot gnu.org @ 2023-05-08 16:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109771
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|rtl-optimization |tree-optimization
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
vect_y_w_0_5.51_14 = MEM <vector(2) long unsigned int> [(void *)t_4(D)];
vect_y_w_0_5.52_10 = MEM <vector(2) long unsigned int> [(void *)t_4(D) + 16B];
vect_y_w_0_5.54_24 = MEM <vector(2) long unsigned int> [(void *)t_4(D) + 48B];
vect_y_w_0_5.55_6 = VEC_PERM_EXPR <vect_y_w_0_5.52_10, vect_y_w_0_5.54_24, { 0, 3 }>;
vect_x_w_0_36.60_44 = MEM <vector(2) long unsigned int> [(void *)t_4(D) + 32B];
vect_y_w_0_5.62_21 = VEC_PERM_EXPR <vect_y_w_0_5.54_24, vect_y_w_0_5.52_10, { 0, 3 }>;
vect__25.63_20 = vect_y_w_0_5.51_14 | vect_x_w_0_36.60_44;
vect__25.63_19 = vect_y_w_0_5.55_6 | vect_y_w_0_5.62_21;
MEM <vector(2) long unsigned int> [(long unsigned int *)t_4(D) + 32B] = vect__25.63_20;
MEM <vector(2) long unsigned int> [(long unsigned int *)t_4(D) + 48B] = vect__25.63_19;
vs
vect_y_w_0_10.25_44 = MEM <vector(2) long unsigned int> [(long unsigned int *)t_2(D)];
vect_y_w_0_10.26_46 = MEM <vector(2) long unsigned int> [(long unsigned int *)t_2(D) + 16B];
x = MEM[(const struct u256 &)t_2(D) + 32];
vect__6.14_9 = MEM <vector(2) long unsigned int> [(long unsigned int *)&x];
vect__8.18_14 = vect__6.14_9 | vect_y_w_0_10.25_44;
MEM <vector(2) long unsigned int> [(long unsigned int *)&z] = vect__8.18_14;
vect__6.14_11 = MEM <vector(2) long unsigned int> [(long unsigned int *)&x + 16B];
vect__8.18_36 = vect__6.14_11 | vect_y_w_0_10.26_46;
MEM <vector(2) long unsigned int> [(long unsigned int *)&z + 16B] = vect__8.18_36;
x ={v} {CLOBBER(eol)};
MEM[(struct u256 *)t_2(D) + 32B] = z;
z ={v} {CLOBBER(eol)};
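
As a rough illustration of why the two VEC_PERM_EXPRs in the first dump are redundant, the sketch below models a two-lane permute with selector { 0, 3 } and checks that OR-ing the two swapped permutes gives the same lanes as a plain OR. All names here (V2, vec_perm, vec_or, or_with_perms) are hypothetical helpers for this example, not GCC internals, and the model assumes selector values 0..1 index the first operand and 2..3 the second.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper type: a vector(2) long unsigned int.
using V2 = std::array<std::uint64_t, 2>;

// Model of a two-lane VEC_PERM_EXPR <a, b, sel> (assumption: selector
// values 0..1 pick a lane of a, values 2..3 pick a lane of b).
constexpr V2 vec_perm(V2 a, V2 b, std::array<unsigned, 2> sel) {
    V2 r{};
    for (int i = 0; i < 2; ++i)
        r[i] = sel[i] < 2 ? a[sel[i]] : b[sel[i] - 2];
    return r;
}

// Lane-wise bitwise OR.
constexpr V2 vec_or(V2 x, V2 y) { return {x[0] | y[0], x[1] | y[1]}; }

// The shape seen in the dump: the OR of two permutes whose operands
// are swapped, both using the selector { 0, 3 }.
constexpr V2 or_with_perms(V2 a, V2 b) {
    return vec_or(vec_perm(a, b, {0, 3}), vec_perm(b, a, {0, 3}));
}

// { a[0], b[1] } | { b[0], a[1] } has the same lanes as a | b.
static_assert(or_with_perms({0x0F, 0xF000}, {0x30, 0x0C00})[0] == 0x3F, "lane 0");
static_assert(or_with_perms({0x0F, 0xF000}, {0x30, 0x0C00})[1] == 0xFC00, "lane 1");
```

Because OR is commutative per lane, swapping lane 1 between the two inputs cannot change the result, so the permutes (and the pblendw instructions they expand to) could in principle be dropped.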