public inbox for gcc-bugs@sourceware.org
* [Bug ipa/95790] New: Incorrect static target dispatch
From: yyc1992 at gmail dot com @ 2020-06-20 19:08 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790
Bug ID: 95790
Summary: Incorrect static target dispatch
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: ipa
Assignee: unassigned at gcc dot gnu.org
Reporter: yyc1992 at gmail dot com
CC: marxin at gcc dot gnu.org
Target Milestone: ---
The indirection elimination code currently only checks whether the target of the
specific version matches, but doesn't check whether all the targets match.
Adapted from
https://github.com/gcc-mirror/gcc/commit/b8ce8129a560f64f8b2855c4a3812b7c3c0ebf3f#diff-e2d535917af8555baad2e9c8749e96a5
```
__attribute__ ((target ("default")))
static unsigned foo(const char *buf, unsigned size) {
    return 1;
}
__attribute__ ((target ("avx")))
static unsigned foo(const char *buf, unsigned size) {
    return 2;
}
__attribute__ ((target ("avx512f")))
static unsigned foo(const char *buf, unsigned size) {
    return 3;
}
__attribute__ ((target ("default")))
unsigned bar() {
    char buf[4096];
    unsigned acc = 0;
    for (int i = 0; i < sizeof(buf); i++) {
        acc += foo(&buf[i], 1);
    }
    return acc;
}
__attribute__ ((target ("avx")))
unsigned bar() {
    char buf[4096];
    unsigned acc = 0;
    for (int i = 0; i < sizeof(buf); i++) {
        acc += foo(&buf[i], 1);
    }
    return acc;
}
```
With the optimization disabled (which is possible by adding a `flatten` attribute
to the functions, triggering PR95780 and PR95778), a resolver function is
automatically generated for foo like
```
.text
.LHOTB0:
.p2align 4
.type _ZL3fooPKcj.resolver, @function
_ZL3fooPKcj.resolver:
subq $8, %rsp
call __cpu_indicator_init@PLT
movq __cpu_model@GOTPCREL(%rip), %rax
movl 12(%rax), %eax
testb $-128, %ah
je .L8
leaq _ZL3fooPKcj.avx512f(%rip), %rax
.L7:
addq $8, %rsp
ret
.section .text.unlikely
.type _ZL3fooPKcj.resolver.cold, @function
_ZL3fooPKcj.resolver.cold:
.L8:
testb $2, %ah
leaq _ZL3fooPKcj.avx(%rip), %rdx
leaq _ZL3fooPKcj(%rip), %rax
cmovne %rdx, %rax
jmp .L7
.text
.size _ZL3fooPKcj.resolver, .-_ZL3fooPKcj.resolver
.section .text.unlikely
.size _ZL3fooPKcj.resolver.cold, .-_ZL3fooPKcj.resolver.cold
.LCOLDE0:
.text
.LHOTE0:
.type _Z11_ZL3fooPKcjPKcj, @gnu_indirect_function
.set _Z11_ZL3fooPKcjPKcj,_ZL3fooPKcj.resolver
```
and the calls from bar go through the PLT. This is the correct behavior
(albeit sub-optimal, since the default version of bar could call the default
version of foo directly) and allows the avx512f version of foo to be called
from the avx version of bar on a processor that supports avx512f.
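For reference, the generated `_ZL3fooPKcj.resolver` behaves roughly like the hand-written GNU ifunc below. This is only a sketch: `foo_default`, `foo_avx`, `foo_avx512f` and `foo_resolver` are hypothetical names, it assumes GCC on x86-64 with an ifunc-capable toolchain (e.g. glibc), and GCC's own resolver reads `__cpu_model` directly, as in the listing above.

```cpp
// Hypothetical hand-written versions of foo (names are illustrative only).
static unsigned foo_default(const char *, unsigned) { return 1; }

__attribute__ ((target ("avx")))
static unsigned foo_avx(const char *, unsigned) { return 2; }

__attribute__ ((target ("avx512f")))
static unsigned foo_avx512f(const char *, unsigned) { return 3; }

using foo_fn = unsigned (*)(const char *, unsigned);

// Resolver: may run while the dynamic loader binds foo, before
// constructors, so the CPU model is initialized explicitly.
extern "C" foo_fn foo_resolver() {
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        return foo_avx512f;
    if (__builtin_cpu_supports("avx"))
        return foo_avx;
    return foo_default;
}

// foo is a GNU indirect function: every call is bound to whatever
// foo_resolver returned for the CPU the program is running on.
unsigned foo(const char *buf, unsigned size)
    __attribute__ ((ifunc ("foo_resolver")));
```

Since every call is bound through the resolver, the version of foo that runs depends on the CPU, not on the target of the calling version of bar.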
With the optimization enabled, however, the calls to foo are inlined into bar
and the avx512f version of foo is never used.
In a sense, this is a regression caused by commit
b8ce8129a560f64f8b2855c4a3812b7c3c0ebf3f.
It will also affect my fix for PR95780 and PR95778:
https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548631.html
From: hjl.tools at gmail dot com @ 2020-06-20 19:22 UTC
H.J. Lu <hjl.tools at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Status|UNCONFIRMED |WAITING
Last reconfirmed| |2020-06-20
--- Comment #1 from H.J. Lu <hjl.tools at gmail dot com> ---
Please show exactly how to reproduce it.
From: yyc1992 at gmail dot com @ 2020-06-20 19:25 UTC
--- Comment #2 from Yichao Yu <yyc1992 at gmail dot com> ---
The C++ code attached above produces the following incorrect code with
`g++ -O2 -S`:
```
.file "a.c"
.text
.p2align 4
.globl _Z3barv
.type _Z3barv, @function
_Z3barv:
.LFB3:
.cfi_startproc
movl $4096, %eax
ret
.cfi_endproc
.LFE3:
.size _Z3barv, .-_Z3barv
.p2align 4
.globl _Z3barv.avx
.type _Z3barv.avx, @function
_Z3barv.avx:
.LFB4:
.cfi_startproc
movl $8192, %eax
ret
.cfi_endproc
.LFE4:
.size _Z3barv.avx, .-_Z3barv.avx
.ident "GCC: (GNU) 10.1.0"
.section .note.GNU-stack,"",@progbits
```
Triggering the bug PR95778 with
```
__attribute__ ((flatten,target ("default")))
static unsigned foo(const char *buf, unsigned size) {
    return 1;
}
__attribute__ ((flatten,target ("avx")))
static unsigned foo(const char *buf, unsigned size) {
    return 2;
}
__attribute__ ((flatten,target ("avx512f")))
static unsigned foo(const char *buf, unsigned size) {
    return 3;
}
__attribute__ ((target ("default")))
unsigned bar() {
    char buf[4096];
    unsigned acc = 0;
    for (int i = 0; i < sizeof(buf); i++) {
        acc += foo(&buf[i], 1);
    }
    return acc;
}
__attribute__ ((target ("avx")))
unsigned bar() {
    char buf[4096];
    unsigned acc = 0;
    for (int i = 0; i < sizeof(buf); i++) {
        acc += foo(&buf[i], 1);
    }
    return acc;
}
```
produces the correct code.
From: yyc1992 at gmail dot com @ 2020-06-20 19:26 UTC
--- Comment #3 from Yichao Yu <yyc1992 at gmail dot com> ---
And here is the assembly showing the correct dispatch:
```
.file "a.c"
.text
.p2align 4
.type _ZL3fooPKcj, @function
_ZL3fooPKcj:
.LFB0:
.cfi_startproc
movl $1, %eax
ret
.cfi_endproc
.LFE0:
.size _ZL3fooPKcj, .-_ZL3fooPKcj
.p2align 4
.type _ZL3fooPKcj.avx, @function
_ZL3fooPKcj.avx:
.LFB1:
.cfi_startproc
movl $2, %eax
ret
.cfi_endproc
.LFE1:
.size _ZL3fooPKcj.avx, .-_ZL3fooPKcj.avx
.p2align 4
.type _ZL3fooPKcj.avx512f, @function
_ZL3fooPKcj.avx512f:
.LFB2:
.cfi_startproc
movl $3, %eax
ret
.cfi_endproc
.LFE2:
.size _ZL3fooPKcj.avx512f, .-_ZL3fooPKcj.avx512f
.section .text.unlikely,"ax",@progbits
.LCOLDB0:
.text
.LHOTB0:
.p2align 4
.type _ZL3fooPKcj.resolver, @function
_ZL3fooPKcj.resolver:
.LFB6:
.cfi_startproc
subq $8, %rsp
.cfi_def_cfa_offset 16
call __cpu_indicator_init@PLT
movq __cpu_model@GOTPCREL(%rip), %rax
movl 12(%rax), %eax
testb $-128, %ah
je .L8
leaq _ZL3fooPKcj.avx512f(%rip), %rax
.L7:
addq $8, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.section .text.unlikely
.cfi_startproc
.type _ZL3fooPKcj.resolver.cold, @function
_ZL3fooPKcj.resolver.cold:
.LFSB6:
.L8:
.cfi_def_cfa_offset 16
testb $2, %ah
leaq _ZL3fooPKcj.avx(%rip), %rdx
leaq _ZL3fooPKcj(%rip), %rax
cmovne %rdx, %rax
jmp .L7
.cfi_endproc
.LFE6:
.text
.size _ZL3fooPKcj.resolver, .-_ZL3fooPKcj.resolver
.section .text.unlikely
.size _ZL3fooPKcj.resolver.cold, .-_ZL3fooPKcj.resolver.cold
.LCOLDE0:
.text
.LHOTE0:
.type _Z11_ZL3fooPKcjPKcj, @gnu_indirect_function
.set _Z11_ZL3fooPKcjPKcj,_ZL3fooPKcj.resolver
.p2align 4
.globl _Z3barv
.type _Z3barv, @function
_Z3barv:
.LFB3:
.cfi_startproc
pushq %r12
.cfi_def_cfa_offset 16
.cfi_offset 12, -16
xorl %r12d, %r12d
pushq %rbp
.cfi_def_cfa_offset 24
.cfi_offset 6, -24
pushq %rbx
.cfi_def_cfa_offset 32
.cfi_offset 3, -32
subq $4112, %rsp
.cfi_def_cfa_offset 4144
movq %fs:40, %rax
movq %rax, 4104(%rsp)
xorl %eax, %eax
movq %rsp, %rbx
leaq 4096(%rsp), %rbp
.p2align 4,,10
.p2align 3
.L12:
movq %rbx, %rdi
movl $1, %esi
addq $1, %rbx
call _Z11_ZL3fooPKcjPKcj@PLT
addl %eax, %r12d
cmpq %rbp, %rbx
jne .L12
movq 4104(%rsp), %rax
subq %fs:40, %rax
jne .L16
addq $4112, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 32
movl %r12d, %eax
popq %rbx
.cfi_def_cfa_offset 24
popq %rbp
.cfi_def_cfa_offset 16
popq %r12
.cfi_def_cfa_offset 8
ret
.L16:
.cfi_restore_state
call __stack_chk_fail@PLT
.cfi_endproc
.LFE3:
.size _Z3barv, .-_Z3barv
.p2align 4
.globl _Z3barv.avx
.type _Z3barv.avx, @function
_Z3barv.avx:
.LFB4:
.cfi_startproc
pushq %r12
.cfi_def_cfa_offset 16
.cfi_offset 12, -16
xorl %r12d, %r12d
pushq %rbp
.cfi_def_cfa_offset 24
.cfi_offset 6, -24
pushq %rbx
.cfi_def_cfa_offset 32
.cfi_offset 3, -32
subq $4112, %rsp
.cfi_def_cfa_offset 4144
movq %fs:40, %rax
movq %rax, 4104(%rsp)
xorl %eax, %eax
movq %rsp, %rbx
leaq 4096(%rsp), %rbp
.p2align 4,,10
.p2align 3
.L18:
movq %rbx, %rdi
movl $1, %esi
addq $1, %rbx
call _Z11_ZL3fooPKcjPKcj@PLT
addl %eax, %r12d
cmpq %rbp, %rbx
jne .L18
movq 4104(%rsp), %rax
subq %fs:40, %rax
jne .L22
addq $4112, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 32
movl %r12d, %eax
popq %rbx
.cfi_def_cfa_offset 24
popq %rbp
.cfi_def_cfa_offset 16
popq %r12
.cfi_def_cfa_offset 8
ret
.L22:
.cfi_restore_state
call __stack_chk_fail@PLT
.cfi_endproc
.LFE4:
.size _Z3barv.avx, .-_Z3barv.avx
.ident "GCC: (GNU) 10.1.0"
.section .note.GNU-stack,"",@progbits
```
From: hjl.tools at gmail dot com @ 2020-06-20 20:33 UTC
--- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Yichao Yu from comment #2)
> The C++ code attached above produces the following incorrect code with `g++
> -O2 -S`
>
> .file "a.c"
> .text
> .p2align 4
> .globl _Z3barv
> .type _Z3barv, @function
> _Z3barv:
> .LFB3:
> .cfi_startproc
> movl $4096, %eax
> ret
> .cfi_endproc
> .LFE3:
> .size _Z3barv, .-_Z3barv
> .p2align 4
> .globl _Z3barv.avx
> .type _Z3barv.avx, @function
> _Z3barv.avx:
> .LFB4:
> .cfi_startproc
> movl $8192, %eax
> ret
> .cfi_endproc
> .LFE4:
> .size _Z3barv.avx, .-_Z3barv.avx
> .ident "GCC: (GNU) 10.1.0"
> .section .note.GNU-stack,"",@progbits
>
It looks correct to me. What is the issue?
From: yyc1992 at gmail dot com @ 2020-06-20 20:59 UTC
--- Comment #5 from Yichao Yu <yyc1992 at gmail dot com> ---
It’s wrong when running on a target that has avx512f. The unoptimized version
will call the correct foo, but the optimized version won’t.
As I said, this is an issue when the full sets of targets differ between the
callee and the caller.
From: hjl.tools at gmail dot com @ 2020-06-20 22:36 UTC
--- Comment #6 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Yichao Yu from comment #5)
> It’s wrong when running on a target that has avx512f. The unoptimuzed
> version will call the correct foo but the unoptimized case won’t.
>
> As I said, this is an issue when the total targets are different between the
> callee and caller.
Your testcase has nested function multi-versioning. I don't think it works
at all. I opened PR 95793.
From: yyc1992 at gmail dot com @ 2020-06-21 0:02 UTC
--- Comment #7 from Yichao Yu <yyc1992 at gmail dot com> ---
> Your testcase has nested function multi-versioning. I don't think it works
> at all. I opened PR 95793.
I'm sorry, but what is nested function multi-versioning? And what's the
difference between the test case here and the one in PR95793?
From: yyc1992 at gmail dot com @ 2020-06-21 0:12 UTC
--- Comment #8 from Yichao Yu <yyc1992 at gmail dot com> ---
And the reason I reported this as a mis-optimization rather than something
completely unsupported is that the following code
```
#include <stdio.h>
// #define disable_opt __attribute__((flatten))
#define disable_opt
disable_opt __attribute__ ((target ("default")))
static unsigned foo(const char *buf, unsigned size) {
    return 1;
}
disable_opt __attribute__ ((target ("avx")))
static unsigned foo(const char *buf, unsigned size) {
    return 2;
}
disable_opt __attribute__ ((target ("avx2")))
static unsigned foo(const char *buf, unsigned size) {
    return 3;
}
__attribute__ ((target ("default")))
unsigned bar() {
    char buf[4096];
    unsigned acc = 0;
    for (int i = 0; i < sizeof(buf); i++) {
        acc += foo(&buf[i], 1);
    }
    return acc;
}
__attribute__ ((target ("avx")))
unsigned bar() {
    char buf[4096];
    unsigned acc = 0;
    for (int i = 0; i < sizeof(buf); i++) {
        acc += foo(&buf[i], 1);
    }
    return acc;
}
int main()
{
    printf("%u\n", bar());
    return 0;
}
```
when compiled with `#define disable_opt`, prints the wrong answer `8192` on my
avx2 laptop. OTOH, with `#define disable_opt __attribute__((flatten))`, which
disables the inlining by triggering the bug, it prints the correct result, 12288.
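Both numbers follow from the loop count: bar calls foo 4096 times (`sizeof(buf)`), so the result is 4096 times the value returned by whichever version of foo runs. The helpers below are a hypothetical model of that arithmetic (they are not part of the test case) and assume that a machine with avx2 also has avx.

```cpp
// Hypothetical model of the results above; bar's loop runs
// sizeof(buf) == 4096 times.
constexpr unsigned kIterations = 4096;

// Correct dispatch: every call to foo goes through foo's resolver
// (avx2 > avx > default). Assumes has_avx2 implies has_avx.
constexpr unsigned expected_bar(bool has_avx, bool has_avx2) {
    return kIterations * (has_avx2 ? 3u : has_avx ? 2u : 1u);
}

// With the mis-optimization, each bar version gets the foo version with
// the same target inlined, so foo's avx2 version is never reached.
constexpr unsigned miscompiled_bar(bool has_avx) {
    return kIterations * (has_avx ? 2u : 1u);
}

static_assert(expected_bar(true, true) == 12288, "correct result on an avx2 machine");
static_assert(miscompiled_bar(true) == 8192, "wrong result on an avx2 machine");
```

On an avx-only machine the two agree (8192), which is why the bug only shows up when the CPU supports a foo version that bar has no matching version for.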
Other ways of forcing an independent dispatch also work, such as the following
one, which calls foo through a volatile function pointer.
```
#include <stdio.h>
__attribute__ ((target ("default")))
static unsigned _foo(const char *buf, unsigned size) {
    return 1;
}
__attribute__ ((target ("avx")))
static unsigned _foo(const char *buf, unsigned size) {
    return 2;
}
__attribute__ ((target ("avx2")))
static unsigned _foo(const char *buf, unsigned size) {
    return 3;
}
// The volatile load keeps the compiler from folding calls through this
// pointer into direct calls to one particular _foo version.
static unsigned (* volatile foo)(const char *buf, unsigned size) = _foo;
__attribute__ ((target ("default")))
unsigned bar() {
    char buf[4096];
    unsigned acc = 0;
    for (int i = 0; i < sizeof(buf); i++) {
        acc += foo(&buf[i], 1);
    }
    return acc;
}
__attribute__ ((target ("avx")))
unsigned bar() {
    char buf[4096];
    unsigned acc = 0;
    for (int i = 0; i < sizeof(buf); i++) {
        acc += foo(&buf[i], 1);
    }
    return acc;
}
int main()
{
    printf("%u\n", bar());
    return 0;
}
```
I think this suggests that the most basic codegen, without the optimization,
clearly works, and that this usage (whether or not it is nested
multiversioning) is not something that's simply unsupported. Rather, it's only
the optimization that's wrong.
From: rguenth at gcc dot gnu.org @ 2020-06-22 8:12 UTC
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |wrong-code
Status|WAITING |NEW
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think "nested" MV should just work; we just have to be careful when
optimizing dispatch between them. That is, if I understand correctly what
nested MV is supposed to be (calling an MV function from an MVed function
with a different target).
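One way to make "careful" concrete is the toy model below (C++17). It is not GCC's implementation: the feature bits, `Version`, `VersionSet`, `resolve` and `static_target` are all hypothetical, and it assumes each feature implies the previous ones (avx512f implies avx2 implies avx). The rule it encodes is that a caller version may call one callee version directly only if, on every machine where the caller's resolver picks that caller version, the callee's resolver picks the same callee version. For the test case in this report, `bar.avx` fails the check (an avx512f machine selects `foo.avx512f`), while the default bar can call the default foo directly, as the original report notes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Toy feature bits (hypothetical, not GCC's internal representation).
enum : std::uint32_t { AVX = 1u << 0, AVX2 = 1u << 1, AVX512F = 1u << 2 };

// Assumed machine "levels": AVX512F implies AVX2 implies AVX, so only
// these four feature combinations are considered.
const std::uint32_t kMachines[] = {0, AVX, AVX | AVX2, AVX | AVX2 | AVX512F};

struct Version {
    std::string name;     // target string, e.g. "avx512f", "avx", "default"
    std::uint32_t needs;  // feature bits the version requires
};

// Listed from highest to lowest dispatch priority, ending with "default".
using VersionSet = std::vector<Version>;

// Index of the version an ifunc-style resolver selects on machine m.
std::size_t resolve(const VersionSet &vs, std::uint32_t m) {
    assert(!vs.empty() && vs.back().needs == 0);
    for (std::size_t i = 0; i < vs.size(); ++i)
        if ((vs[i].needs & m) == vs[i].needs)
            return i;
    return vs.size() - 1;  // not reached: "default" always matches
}

// Callee version that caller version c may call directly, or nullopt if
// the call must keep going through the callee's resolver.
std::optional<std::string> static_target(const VersionSet &caller, std::size_t c,
                                         const VersionSet &callee) {
    std::optional<std::size_t> pick;
    for (std::uint32_t m : kMachines) {
        if (resolve(caller, m) != c)
            continue;  // caller version c never runs on this machine
        std::size_t p = resolve(callee, m);
        if (pick && *pick != p)
            return std::nullopt;  // callee choice depends on the machine
        pick = p;
    }
    if (!pick)
        return std::nullopt;  // caller version c is never selected
    return callee[*pick].name;
}
```

When caller and callee have identical version sets, every caller version maps to the same-named callee version, which is the case the existing optimization seems to assume.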