* [Bug rtl-optimization/115021] New: [14/15 regression] unnecessary spill for vpternlog
From: liuhongt at gcc dot gnu.org @ 2024-05-10 3:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021
Bug ID: 115021
Summary: [14/15 regression] unnecessary spill for vpternlog
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: liuhongt at gcc dot gnu.org
Target Milestone: ---
typedef signed char v16qi __attribute__ ((__vector_size__ (16)));
v16qi foo (v16qi x) { return x >> 5; }
With -march=x86-64-v4 -O2, GCC 13.2 generates:
foo(signed char __vector(16)):
mov eax, 4
vpsraw xmm2, xmm0, 5
vpbroadcastb xmm1, eax
mov eax, 7
vpbroadcastb xmm3, eax
vmovdqa xmm0, xmm1
vpternlogd xmm0, xmm2, xmm3, 120
vpsubb xmm0, xmm0, xmm1
ret
GCC 14.1 generates:
foo(signed char __vector(16)):
mov eax, 67372036
vpsraw xmm2, xmm0, 5
vpbroadcastd xmm1, eax
mov eax, 117901063
vpbroadcastd xmm3, eax
vmovdqa xmm0, xmm1
vmovdqa XMMWORD PTR [rsp-24], xmm3
vpternlogd xmm0, xmm2, XMMWORD PTR [rsp-24], 120
vpsubb xmm0, xmm0, xmm1
ret
There's an extra spill: xmm3 is stored to [rsp-24] and then read back as the
memory operand of vpternlogd.
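As a reading aid for the two listings: the immediates in the GCC 14.1 output appear to be the byte constants 4 and 7 from the GCC 13.2 output, replicated into a 32-bit scalar so that vpbroadcastd can fill the vector instead of vpbroadcastb. A minimal C sketch of that arithmetic follows; splat_byte and check_immediates are illustrative names, not anything from GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Replicate one byte into all four bytes of a 32-bit value. This is the
   scalar a dword broadcast needs in order to fill a vector with that byte.
   (Hypothetical helper, for illustration only.) */
uint32_t splat_byte(uint8_t b)
{
    return (uint32_t)b * UINT32_C(0x01010101);
}

/* The two immediates in the GCC 14.1 listing are 4 and 7 replicated. */
void check_immediates(void)
{
    assert(splat_byte(4) == UINT32_C(67372036));   /* 0x04040404 */
    assert(splat_byte(7) == UINT32_C(117901063));  /* 0x07070707 */
}
```

So the constants themselves are unchanged between the two compilers; only the broadcast element width differs.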
* [Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog
From: rguenth at gcc dot gnu.org @ 2024-05-10 12:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.2
           Priority|P3                          |P2
* [Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog
From: roger at nextmovesoftware dot com @ 2024-05-10 15:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021
Roger Sayle <roger at nextmovesoftware dot com> changed:
           What    |Removed                      |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org|roger at nextmovesoftware dot com
   Last reconfirmed|                             |2024-05-10
                 CC|                             |roger at nextmovesoftware dot com
     Ever confirmed|0                            |1
             Status|UNCONFIRMED                  |NEW
--- Comment #1 from Roger Sayle <roger at nextmovesoftware dot com> ---
I have a patch for x86 ternlog handling that changes the output for this
testcase (without the pending change to optimize V8QI shifts) to:
foo: movl $67372036, %eax
vpsraw $5, %xmm0, %xmm0
vpbroadcastd %eax, %xmm1
vpternlogd $108, .LC0(%rip), %xmm1, %xmm0
vpsubb %xmm1, %xmm0, %xmm0
ret
.align 16
.LC0:
.byte 7
.byte 7
.byte 7
.byte 7
.byte 7
.byte 7
.byte 7
.byte 7
.byte 7
.byte 7
.byte 7
.byte 7
.byte 7
.byte 7
.byte 7
.byte 7
which at least doesn't construct the vector with a broadcast, and then "spill"
it to the stack before reading it back from memory. I've no idea if this is
optimal, but it's certainly better than the current "spill".
I'm curious about what has changed to make this code (register allocation)
regress since GCC 13. It was a patch of mine that changed broadcastb to
broadcastd, but that shouldn't have affected reload/register preferencing.
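For readers comparing the immediates 120 and 108 used in this thread, here is a small bitwise model of vpternlog on one 32-bit lane. It assumes the usual definition in which, per bit position, the destination bit, the second-operand bit and the third-operand bit form a 3-bit index into imm8, with the destination bit most significant. ternlog32 and check_ternlog_immediates are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of vpternlogd on a single 32-bit lane. a is the destination
   (also the first input), b and c are the two sources in Intel operand
   order. Each result bit is imm8[(a_bit << 2) | (b_bit << 1) | c_bit]. */
uint32_t ternlog32(uint8_t imm8, uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        unsigned idx = (((a >> i) & 1u) << 2)
                     | (((b >> i) & 1u) << 1)
                     |  ((c >> i) & 1u);
        r |= (uint32_t)((imm8 >> idx) & 1u) << i;
    }
    return r;
}

/* Under this model, 120 (0x78) is a ^ (b & c) and 108 (0x6c) is
   b ^ (a & c). The samples 0 and 0xffffffff cover all 8 truth-table rows. */
void check_ternlog_immediates(void)
{
    const uint32_t s[] = { 0u, 0xffffffffu, 0x04040404u,
                           0x07070707u, 0xdeadbeefu };
    const size_t n = sizeof s / sizeof s[0];
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++) {
                assert(ternlog32(120, s[i], s[j], s[k])
                       == (s[i] ^ (s[j] & s[k])));
                assert(ternlog32(108, s[i], s[j], s[k])
                       == (s[j] ^ (s[i] & s[k])));
            }
}
```

Read this way, both forms compute 4 ^ (x & 7) on the shifted input. With immediate 120 the destination holds the vector of 4s; with immediate 108 the destination holds the shifted x and the vector of 7s comes from memory, which would explain why the patched output needs no extra vmovdqa copy.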
* [Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog
From: roger at nextmovesoftware dot com @ 2024-05-10 17:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021
--- Comment #2 from Roger Sayle <roger at nextmovesoftware dot com> ---
Here's a reduced test case that should be unaffected by the pending changes to
how V8QI shifts are expanded. Note that the final "t -= t4" is required to
convince the register allocator to "spill".
typedef signed char v16qi __attribute__ ((__vector_size__ (16)));
// sign-extend low 3 bits to a byte.
v16qi foo (v16qi x) {
v16qi t7 = (v16qi){7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7};
v16qi t4 = (v16qi){4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4};
v16qi t = x & t7;
t ^= t4;
t -= t4;
return t;
}
which produces:
foo: movl $67372036, %eax
vmovdqa %xmm0, %xmm2
vpbroadcastd %eax, %xmm1
movl $117901063, %eax
vpbroadcastd %eax, %xmm3
vmovdqa %xmm1, %xmm0
vmovdqa %xmm3, -24(%rsp)
vmovdqa -24(%rsp), %xmm4
vpternlogd $120, %xmm2, %xmm4, %xmm0
vpsubb %xmm1, %xmm0, %xmm0
ret
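A scalar sketch of what the reduced test case computes per byte may help when reading the assembly. It also checks, for every byte value, that applying the same and/xor/sub steps to the top three bits gives the arithmetic shift x >> 5 from the original report. The plain byte shift below is only a stand-in for the vpsraw word shift followed by the mask, and sext3, asr8 and check_shift_by_5 are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar form of the reduced test case: sign-extend the low 3 bits of x. */
int sext3(uint8_t x)
{
    int t = x & 7;   /* keep the 3-bit field, 0..7        */
    t ^= 4;          /* flip the field's sign bit         */
    t -= 4;          /* remove the bias, giving -4..3     */
    return t;
}

/* Arithmetic right shift of a signed byte without relying on the
   implementation-defined behaviour of >> on negative values. */
int asr8(int8_t x, int n)
{
    return x < 0 ? ~(~x >> n) : x >> n;
}

/* For every byte, moving bits 5..7 down to bits 0..2 and sign-extending
   them matches an arithmetic shift right by 5. */
void check_shift_by_5(void)
{
    for (int v = -128; v <= 127; v++) {
        uint8_t bits = (uint8_t)v;
        assert(sext3((uint8_t)(bits >> 5)) == asr8((int8_t)v, 5));
    }
}
```

The final subtraction is the "t -= t4" step; it reuses the vector of 4s after the vpternlogd, which fits the note above that this step is needed to provoke the spill.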
* [Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog
From: lin1.hu at intel dot com @ 2024-05-20 7:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021
Hu Lin <lin1.hu at intel dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lin1.hu at intel dot com
--- Comment #3 from Hu Lin <lin1.hu at intel dot com> ---
I found that, after commit f55cdce3f8dd8503e080e35be59c5f5390f6d95e, IRA
allocates memory for the third source operand of vpternlog. As a result, the
generated code becomes:
.cfi_startproc
movl $4, %eax
vpsraw $5, %xmm0, %xmm2
vpbroadcastb %eax, %xmm1
movl $7, %eax
vpbroadcastb %eax, %xmm3
vmovdqa %xmm1, %xmm0
vpternlogd $120, %xmm3, %xmm2, %xmm0
vmovdqa %xmm3, -24(%rsp)
vpsubb %xmm1, %xmm0, %xmm0
ret
Commit 6a67fdcb3f0cc8be47b49ddd246d0c50c3770800 then changes the vector type
from v16qi to v4si, so the movv4si can no longer be combined with the vpternlog
in postreload, which gives the result you see now.
* [Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog
From: liuhongt at gcc dot gnu.org @ 2024-05-20 7:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021
--- Comment #4 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
(In reply to Hu Lin from comment #3)
> I found that, after commit f55cdce3f8dd8503e080e35be59c5f5390f6d95e, IRA
> allocates memory for the third source operand of vpternlog. As a result, the
> generated code becomes:
>
> .cfi_startproc
> movl $4, %eax
> vpsraw $5, %xmm0, %xmm2
> vpbroadcastb %eax, %xmm1
> movl $7, %eax
> vpbroadcastb %eax, %xmm3
> vmovdqa %xmm1, %xmm0
> vpternlogd $120, %xmm3, %xmm2, %xmm0
> vmovdqa %xmm3, -24(%rsp)
> vpsubb %xmm1, %xmm0, %xmm0
> ret
>
> Commit 6a67fdcb3f0cc8be47b49ddd246d0c50c3770800 then changes the vector type
> from v16qi to v4si, so the movv4si can no longer be combined with the
> vpternlog in postreload, which gives the result you see now.
To clarify: the extra spill is caused by r14-4944-gf55cdce3f8dd85;
r14-7026-g6a67fdcb3f0cc8 only causes an extra mov instruction (which is not a
big deal).
* [Bug rtl-optimization/115021] [14/15 regression] unnecessary spill for vpternlog
From: liuhongt at gcc dot gnu.org @ 2024-06-13 14:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115021
--- Comment #5 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
It's fixed by r15-1100-gec985bc97a0157.