public inbox for gcc-bugs@sourceware.org
* [Bug ipa/95790] New: Incorrect static target dispatch
From: yyc1992 at gmail dot com @ 2020-06-20 19:08 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790

            Bug ID: 95790
           Summary: Incorrect static target dispatch
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yyc1992 at gmail dot com
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---

The indirection elimination code currently only checks that the target of the specific version matches; it doesn't check whether all the targets match.

Modifying from
https://github.com/gcc-mirror/gcc/commit/b8ce8129a560f64f8b2855c4a3812b7c3c0ebf3f#diff-e2d535917af8555baad2e9c8749e96a5

```
__attribute__ ((target ("default")))
static unsigned foo(const char *buf, unsigned size) {
    return 1;
}

__attribute__ ((target ("avx")))
static unsigned foo(const char *buf, unsigned size) {
    return 2;
}

__attribute__ ((target ("avx512f")))
static unsigned foo(const char *buf, unsigned size) {
    return 3;
}

__attribute__ ((target ("default")))
unsigned bar() {
    char buf[4096];
    unsigned acc = 0;
    for (int i = 0; i < sizeof(buf); i++) {
        acc += foo(&buf[i], 1);
    }
    return acc;
}

__attribute__ ((target ("avx")))
unsigned bar() {
    char buf[4096];
    unsigned acc = 0;
    for (int i = 0; i < sizeof(buf); i++) {
        acc += foo(&buf[i], 1);
    }
    return acc;
}
```

With the optimization disabled, which is possible by adding a flatten attribute to the functions and triggering PR95780 and PR95778, a resolver function is automatically generated for foo like

```
    .text
.LHOTB0:
    .p2align 4
    .type   _ZL3fooPKcj.resolver, @function
_ZL3fooPKcj.resolver:
    subq    $8, %rsp
    call    __cpu_indicator_init@PLT
    movq    __cpu_model@GOTPCREL(%rip), %rax
    movl    12(%rax), %eax
    testb   $-128, %ah
    je      .L8
    leaq    _ZL3fooPKcj.avx512f(%rip), %rax
.L7:
    addq    $8, %rsp
    ret
    .section        .text.unlikely
    .type   _ZL3fooPKcj.resolver.cold, @function
_ZL3fooPKcj.resolver.cold:
.L8:
    testb   $2, %ah
    leaq    _ZL3fooPKcj.avx(%rip), %rdx
    leaq    _ZL3fooPKcj(%rip), %rax
    cmovne  %rdx, %rax
    jmp     .L7
    .text
    .size   _ZL3fooPKcj.resolver, .-_ZL3fooPKcj.resolver
    .section        .text.unlikely
    .size   _ZL3fooPKcj.resolver.cold, .-_ZL3fooPKcj.resolver.cold
.LCOLDE0:
    .text
.LHOTE0:
    .type   _Z11_ZL3fooPKcjPKcj, @gnu_indirect_function
    .set    _Z11_ZL3fooPKcjPKcj,_ZL3fooPKcj.resolver
```

and the calls from bar go through the PLT. This is the correct behavior (albeit sub-optimal, since the default could call the default directly) and allows the avx512f version of foo to be called on the correct processor from the avx version of bar.

With the optimization enabled, however, the calls to foo are inlined into bar and the avx512f version is never used.

This is something of a regression caused by b8ce8129a560f64f8b2855c4a3812b7c3c0ebf3f. It'll also affect my fix for PR95780 and PR95778.
https://gcc.gnu.org/pipermail/gcc-patches/2020-June/548631.html
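The difference between the two behaviors described in the report can be illustrated with a small portable model. This is only a sketch under stated assumptions: it does not use GCC's function multi-versioning, the feature bits and every name in it (`Feature`, `foo_resolved`, `bar_avx_dynamic`, `bar_avx_static`) are hypothetical, and the priority order (avx512f, then avx, then default) is read off the resolver assembly above.

```cpp
// Hypothetical model of the dispatch in the report. It does not use GCC's
// real function multi-versioning, so it builds with any C++11 compiler.
#include <cstdint>

// Assumed feature bits for illustration only (not GCC's __cpu_model layout).
enum Feature : std::uint32_t { kAvx = 1u << 0, kAvx512f = 1u << 1 };

// The three versions of foo from the test case.
constexpr unsigned foo_default() { return 1; }
constexpr unsigned foo_avx()     { return 2; }
constexpr unsigned foo_avx512f() { return 3; }

// Models the generated resolver: pick the highest-priority version of foo
// that the running CPU supports.
constexpr unsigned foo_resolved(std::uint32_t cpu) {
    return (cpu & kAvx512f) ? foo_avx512f()
         : (cpu & kAvx)     ? foo_avx()
                            : foo_default();
}

// bar calls foo once per byte of a 4096-byte buffer. Every call returns the
// same value, so the accumulated sum is written as a product here.
constexpr unsigned kCalls = 4096;

// The avx version of bar with calls still going through the resolver.
constexpr unsigned bar_avx_dynamic(std::uint32_t cpu) {
    return kCalls * foo_resolved(cpu);
}

// The avx version of bar after foo has been bound to its avx version.
constexpr unsigned bar_avx_static() {
    return kCalls * foo_avx();
}
```

In this model, `bar_avx_dynamic(kAvx | kAvx512f)` evaluates to 12288 while `bar_avx_static()` evaluates to 8192, and the two only agree on a CPU without AVX-512F. That is the discrepancy the report is about: the avx version of bar is also what runs on an AVX-512F machine, because bar itself has no avx512f version.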
* [Bug ipa/95790] Incorrect static target dispatch
From: hjl.tools at gmail dot com @ 2020-06-20 19:22 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790

H.J. Lu <hjl.tools at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |WAITING
   Last reconfirmed|                            |2020-06-20

--- Comment #1 from H.J. Lu <hjl.tools at gmail dot com> ---
Please show exactly how to reproduce it.
* [Bug ipa/95790] Incorrect static target dispatch
From: yyc1992 at gmail dot com @ 2020-06-20 19:25 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790

--- Comment #2 from Yichao Yu <yyc1992 at gmail dot com> ---
The C++ code attached above produces the following incorrect code with `g++ -O2 -S`

    .file   "a.c"
    .text
    .p2align 4
    .globl  _Z3barv
    .type   _Z3barv, @function
_Z3barv:
.LFB3:
    .cfi_startproc
    movl    $4096, %eax
    ret
    .cfi_endproc
.LFE3:
    .size   _Z3barv, .-_Z3barv
    .p2align 4
    .globl  _Z3barv.avx
    .type   _Z3barv.avx, @function
_Z3barv.avx:
.LFB4:
    .cfi_startproc
    movl    $8192, %eax
    ret
    .cfi_endproc
.LFE4:
    .size   _Z3barv.avx, .-_Z3barv.avx
    .ident  "GCC: (GNU) 10.1.0"
    .section        .note.GNU-stack,"",@progbits

Triggering the bug PR95778 with

__attribute__ ((flatten,target ("default")))
static unsigned foo(const char *buf, unsigned size) {
    return 1;
}

__attribute__ ((flatten,target ("avx")))
static unsigned foo(const char *buf, unsigned size) {
    return 2;
}

__attribute__ ((flatten,target ("avx512f")))
static unsigned foo(const char *buf, unsigned size) {
    return 3;
}

__attribute__ ((target ("default")))
unsigned bar() {
    char buf[4096];
    unsigned acc = 0;
    for (int i = 0; i < sizeof(buf); i++) {
        acc += foo(&buf[i], 1);
    }
    return acc;
}

__attribute__ ((target ("avx")))
unsigned bar() {
    char buf[4096];
    unsigned acc = 0;
    for (int i = 0; i < sizeof(buf); i++) {
        acc += foo(&buf[i], 1);
    }
    return acc;
}

produces the correct code.
* [Bug ipa/95790] Incorrect static target dispatch
From: yyc1992 at gmail dot com @ 2020-06-20 19:26 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790

--- Comment #3 from Yichao Yu <yyc1992 at gmail dot com> ---
And the assembly showing the correct dispatch is

    .file   "a.c"
    .text
    .p2align 4
    .type   _ZL3fooPKcj, @function
_ZL3fooPKcj:
.LFB0:
    .cfi_startproc
    movl    $1, %eax
    ret
    .cfi_endproc
.LFE0:
    .size   _ZL3fooPKcj, .-_ZL3fooPKcj
    .p2align 4
    .type   _ZL3fooPKcj.avx, @function
_ZL3fooPKcj.avx:
.LFB1:
    .cfi_startproc
    movl    $2, %eax
    ret
    .cfi_endproc
.LFE1:
    .size   _ZL3fooPKcj.avx, .-_ZL3fooPKcj.avx
    .p2align 4
    .type   _ZL3fooPKcj.avx512f, @function
_ZL3fooPKcj.avx512f:
.LFB2:
    .cfi_startproc
    movl    $3, %eax
    ret
    .cfi_endproc
.LFE2:
    .size   _ZL3fooPKcj.avx512f, .-_ZL3fooPKcj.avx512f
    .section        .text.unlikely,"ax",@progbits
.LCOLDB0:
    .text
.LHOTB0:
    .p2align 4
    .type   _ZL3fooPKcj.resolver, @function
_ZL3fooPKcj.resolver:
.LFB6:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    call    __cpu_indicator_init@PLT
    movq    __cpu_model@GOTPCREL(%rip), %rax
    movl    12(%rax), %eax
    testb   $-128, %ah
    je      .L8
    leaq    _ZL3fooPKcj.avx512f(%rip), %rax
.L7:
    addq    $8, %rsp
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc
    .section        .text.unlikely
    .cfi_startproc
    .type   _ZL3fooPKcj.resolver.cold, @function
_ZL3fooPKcj.resolver.cold:
.LFSB6:
.L8:
    .cfi_def_cfa_offset 16
    testb   $2, %ah
    leaq    _ZL3fooPKcj.avx(%rip), %rdx
    leaq    _ZL3fooPKcj(%rip), %rax
    cmovne  %rdx, %rax
    jmp     .L7
    .cfi_endproc
.LFE6:
    .text
    .size   _ZL3fooPKcj.resolver, .-_ZL3fooPKcj.resolver
    .section        .text.unlikely
    .size   _ZL3fooPKcj.resolver.cold, .-_ZL3fooPKcj.resolver.cold
.LCOLDE0:
    .text
.LHOTE0:
    .type   _Z11_ZL3fooPKcjPKcj, @gnu_indirect_function
    .set    _Z11_ZL3fooPKcjPKcj,_ZL3fooPKcj.resolver
    .p2align 4
    .globl  _Z3barv
    .type   _Z3barv, @function
_Z3barv:
.LFB3:
    .cfi_startproc
    pushq   %r12
    .cfi_def_cfa_offset 16
    .cfi_offset 12, -16
    xorl    %r12d, %r12d
    pushq   %rbp
    .cfi_def_cfa_offset 24
    .cfi_offset 6, -24
    pushq   %rbx
    .cfi_def_cfa_offset 32
    .cfi_offset 3, -32
    subq    $4112, %rsp
    .cfi_def_cfa_offset 4144
    movq    %fs:40, %rax
    movq    %rax, 4104(%rsp)
    xorl    %eax, %eax
    movq    %rsp, %rbx
    leaq    4096(%rsp), %rbp
    .p2align 4,,10
    .p2align 3
.L12:
    movq    %rbx, %rdi
    movl    $1, %esi
    addq    $1, %rbx
    call    _Z11_ZL3fooPKcjPKcj@PLT
    addl    %eax, %r12d
    cmpq    %rbp, %rbx
    jne     .L12
    movq    4104(%rsp), %rax
    subq    %fs:40, %rax
    jne     .L16
    addq    $4112, %rsp
    .cfi_remember_state
    .cfi_def_cfa_offset 32
    movl    %r12d, %eax
    popq    %rbx
    .cfi_def_cfa_offset 24
    popq    %rbp
    .cfi_def_cfa_offset 16
    popq    %r12
    .cfi_def_cfa_offset 8
    ret
.L16:
    .cfi_restore_state
    call    __stack_chk_fail@PLT
    .cfi_endproc
.LFE3:
    .size   _Z3barv, .-_Z3barv
    .p2align 4
    .globl  _Z3barv.avx
    .type   _Z3barv.avx, @function
_Z3barv.avx:
.LFB4:
    .cfi_startproc
    pushq   %r12
    .cfi_def_cfa_offset 16
    .cfi_offset 12, -16
    xorl    %r12d, %r12d
    pushq   %rbp
    .cfi_def_cfa_offset 24
    .cfi_offset 6, -24
    pushq   %rbx
    .cfi_def_cfa_offset 32
    .cfi_offset 3, -32
    subq    $4112, %rsp
    .cfi_def_cfa_offset 4144
    movq    %fs:40, %rax
    movq    %rax, 4104(%rsp)
    xorl    %eax, %eax
    movq    %rsp, %rbx
    leaq    4096(%rsp), %rbp
    .p2align 4,,10
    .p2align 3
.L18:
    movq    %rbx, %rdi
    movl    $1, %esi
    addq    $1, %rbx
    call    _Z11_ZL3fooPKcjPKcj@PLT
    addl    %eax, %r12d
    cmpq    %rbp, %rbx
    jne     .L18
    movq    4104(%rsp), %rax
    subq    %fs:40, %rax
    jne     .L22
    addq    $4112, %rsp
    .cfi_remember_state
    .cfi_def_cfa_offset 32
    movl    %r12d, %eax
    popq    %rbx
    .cfi_def_cfa_offset 24
    popq    %rbp
    .cfi_def_cfa_offset 16
    popq    %r12
    .cfi_def_cfa_offset 8
    ret
.L22:
    .cfi_restore_state
    call    __stack_chk_fail@PLT
    .cfi_endproc
.LFE4:
    .size   _Z3barv.avx, .-_Z3barv.avx
    .ident  "GCC: (GNU) 10.1.0"
    .section        .note.GNU-stack,"",@progbits
* [Bug ipa/95790] Incorrect static target dispatch
From: hjl.tools at gmail dot com @ 2020-06-20 20:33 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790

--- Comment #4 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Yichao Yu from comment #2)
> The C++ code attached above produces the following incorrect code with `g++
> -O2 -S`
>
>     .file   "a.c"
>     .text
>     .p2align 4
>     .globl  _Z3barv
>     .type   _Z3barv, @function
> _Z3barv:
> .LFB3:
>     .cfi_startproc
>     movl    $4096, %eax
>     ret
>     .cfi_endproc
> .LFE3:
>     .size   _Z3barv, .-_Z3barv
>     .p2align 4
>     .globl  _Z3barv.avx
>     .type   _Z3barv.avx, @function
> _Z3barv.avx:
> .LFB4:
>     .cfi_startproc
>     movl    $8192, %eax
>     ret
>     .cfi_endproc
> .LFE4:
>     .size   _Z3barv.avx, .-_Z3barv.avx
>     .ident  "GCC: (GNU) 10.1.0"
>     .section        .note.GNU-stack,"",@progbits
>

It looks correct to me. What is the issue?
* [Bug ipa/95790] Incorrect static target dispatch
From: yyc1992 at gmail dot com @ 2020-06-20 20:59 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790

--- Comment #5 from Yichao Yu <yyc1992 at gmail dot com> ---
It’s wrong when running on a target that has avx512f. The unoptimized version will call the correct foo but the optimized case won’t.

As I said, this is an issue when the total targets are different between the callee and caller.
* [Bug ipa/95790] Incorrect static target dispatch
From: hjl.tools at gmail dot com @ 2020-06-20 22:36 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790

--- Comment #6 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Yichao Yu from comment #5)
> It’s wrong when running on a target that has avx512f. The unoptimized
> version will call the correct foo but the optimized case won’t.
>
> As I said, this is an issue when the total targets are different between the
> callee and caller.

Your testcase has nested function multi-versioning. I don't think it works at all. I opened PR 95793.
* [Bug ipa/95790] Incorrect static target dispatch
From: yyc1992 at gmail dot com @ 2020-06-21 0:02 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790

--- Comment #7 from Yichao Yu <yyc1992 at gmail dot com> ---
> Your testcase has nested function multi-versioning. I don't think it works at all. I opened PR 95793.

I'm sorry, but what is nested function multi-versioning? And what's the difference between the test case here and the one in PR95793?
* [Bug ipa/95790] Incorrect static target dispatch
From: yyc1992 at gmail dot com @ 2020-06-21 0:12 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790

--- Comment #8 from Yichao Yu <yyc1992 at gmail dot com> ---
And the reason I reported this as a mis-optimization rather than something completely unsupported is that the following code

```
#include <stdio.h>

// #define disable_opt __attribute__((flatten))
#define disable_opt

disable_opt __attribute__ ((target ("default")))
static unsigned foo(const char *buf, unsigned size) {
    return 1;
}

disable_opt __attribute__ ((target ("avx")))
static unsigned foo(const char *buf, unsigned size) {
    return 2;
}

disable_opt __attribute__ ((target ("avx2")))
static unsigned foo(const char *buf, unsigned size) {
    return 3;
}

__attribute__ ((target ("default")))
unsigned bar() {
    char buf[4096];
    unsigned acc = 0;
    for (int i = 0; i < sizeof(buf); i++) {
        acc += foo(&buf[i], 1);
    }
    return acc;
}

__attribute__ ((target ("avx")))
unsigned bar() {
    char buf[4096];
    unsigned acc = 0;
    for (int i = 0; i < sizeof(buf); i++) {
        acc += foo(&buf[i], 1);
    }
    return acc;
}

int main()
{
    printf("%u\n", bar());
    return 0;
}
```

when compiled with `#define disable_opt`, prints the wrong answer `8192` on my avx2 laptop. OTOH, with `#define disable_opt __attribute__((flatten))` to disable the inlining using the bug, it prints the correct result of 12288.

Other ways to force an independent dispatch, like the following using a volatile slot, also work.

```
#include <stdio.h>

__attribute__ ((target ("default")))
static unsigned _foo(const char *buf, unsigned size) {
    return 1;
}

__attribute__ ((target ("avx")))
static unsigned _foo(const char *buf, unsigned size) {
    return 2;
}

__attribute__ ((target ("avx2")))
static unsigned _foo(const char *buf, unsigned size) {
    return 3;
}

static unsigned (* volatile foo)(const char *buf, unsigned size) = _foo;

__attribute__ ((target ("default")))
unsigned bar() {
    char buf[4096];
    unsigned acc = 0;
    for (int i = 0; i < sizeof(buf); i++) {
        acc += foo(&buf[i], 1);
    }
    return acc;
}

__attribute__ ((target ("avx")))
unsigned bar() {
    char buf[4096];
    unsigned acc = 0;
    for (int i = 0; i < sizeof(buf); i++) {
        acc += foo(&buf[i], 1);
    }
    return acc;
}

int main()
{
    printf("%u\n", bar());
    return 0;
}
```

I think this suggests that the most basic codegen without optimization is clearly working and that this usage (be it nested multiversioning or not) isn't something that's just not supported. Rather, it's only the optimization that's wrong.
* [Bug ipa/95790] Incorrect static target dispatch
From: rguenth at gcc dot gnu.org @ 2020-06-22 8:12 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95790

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code
             Status|WAITING                     |NEW

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think "nested" MV should just work, we just have to be careful when optimizing dispatch between them. Well, if I understand correctly what nested MV should be (calling a MV function from a MVed function with a different target).
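One conservative reading of "being careful" here, following the original report's point that the optimization should check whether all the targets match, can be sketched as below. This is an assumption-laden illustration and not GCC's implementation: the bitmask encoding, the `Target` enum, and the helpers `uncovered_versions` and `can_bind_statically` are all hypothetical names introduced for this sketch.

```cpp
// Hypothetical sketch of a conservative safety check; this is not GCC's code.
#include <cstdint>

// Assumed bitmask encoding of the targets a function is versioned for.
enum Target : std::uint32_t {
    kDefault = 1u << 0,
    kAvx     = 1u << 1,
    kAvx512f = 1u << 2,
};

// Callee versions that have no counterpart among the caller's versions.
// The callee's resolver may select one of these at run time while some
// caller version is executing, so a direct call from caller version T to
// callee version T could bypass it.
constexpr std::uint32_t uncovered_versions(std::uint32_t caller,
                                           std::uint32_t callee) {
    return callee & ~caller;
}

// Conservative rule: drop the indirection only when caller and callee are
// versioned for exactly the same set of targets.
constexpr bool can_bind_statically(std::uint32_t caller,
                                   std::uint32_t callee) {
    return caller == callee;
}
```

For the test case in this thread, bar is versioned for `kDefault | kAvx` and foo for `kDefault | kAvx | kAvx512f`, so `uncovered_versions` yields `kAvx512f` and `can_bind_statically` is false. Requiring identical sets is stricter than necessary in some cases; it is chosen here only because it is simple to reason about.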