public inbox for gcc-bugs@sourceware.org
* [Bug target/97005] New: [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c -O0 execution test
@ 2020-09-09 23:38 vries at gcc dot gnu.org
2022-02-06 8:04 ` [Bug target/97005] " vries at gcc dot gnu.org
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2020-09-09 23:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005
Bug ID: 97005
Summary: [nvptx] FAIL:
c-c++-common/torture/builtin-arith-overflow-15.c -O0
execution test
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---
Minimized to:
...
$ cat builtin-arith-overflow-15.c
int
main (void)
{
  signed char r;
  unsigned char y = (unsigned char) 0x80;
  if (__builtin_sub_overflow ((unsigned char)0,
                              (unsigned char)y,
                              &r))
    __builtin_abort ();
  return 0;
}
...
Compile like this:
...
$ ./build-gcc/gcc/xgcc \
-B./build-gcc/gcc/ \
builtin-arith-overflow-15.c \
-O0 \
-L./build-gcc/nvptx-none/./newlib \
-mmainkernel \
-o ./builtin-arith-overflow-15.exe
...
Run:
...
$ ./install/bin/nvptx-none-run ./builtin-arith-overflow-15.exe
nvptx-run: error getting kernel result: an illegal instruction was encountered
(CUDA_ERROR_ILLEGAL_INSTRUCTION, 715)
...
* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c -O0 execution test
From: vries at gcc dot gnu.org @ 2022-02-06 8:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005
--- Comment #1 from Tom de Vries <vries at gcc dot gnu.org> ---
Created attachment 52359
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52359&action=edit
Cuda reproducer
* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c -O0 execution test
From: vries at gcc dot gnu.org @ 2022-02-06 8:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005
--- Comment #2 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #1)
> Created attachment 52359 [details]
> Cuda reproducer
Filed at https://developer.nvidia.com/nvidia_bug/3527713 as "cvt.u32.u16
sign-extends instead of zero-extends".
* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c -O0 execution test
From: jakub at gcc dot gnu.org @ 2022-02-06 10:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jakub at gcc dot gnu.org
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Is some workaround possible, like instead of emitting cvt.u32.u16 do
cvt.u32.s16 and add explicit and? Do other zero extends work correctly?
* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c -O0 execution test
From: vries at gcc dot gnu.org @ 2022-02-07 8:43 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005
--- Comment #4 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #3)
> Is some workaround possible, like instead of emitting cvt.u32.u16 do
> cvt.u32.s16 and add explicit and?
This already works:
...
diff --git a/builtin-arith-overflow-15/src.cu b/builtin-arith-overflow-15/src.cu
index 7a2535f..96f5f1e 100644
--- a/builtin-arith-overflow-15/src.cu
+++ b/builtin-arith-overflow-15/src.cu
@@ -46,6 +46,7 @@ hello (unsigned int *output)
 //"mov.u16 r33,0xff80;"
 "cvt.u32.u16 r35,r33;"
+ "and.b32 r35,r35,0x0000ffff;"
 //"mov.u32 r35, 0x0000ff80;"
 "st.u32 [rp], r35;"
...
> Do other zero extends work correctly?
I've rewritten the example to use cvt.u64.u32, and that one passes fine.
But cvt.u64.u16 runs into the same problem.
* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c -O0 execution test
From: jakub at gcc dot gnu.org @ 2022-02-07 8:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005
--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
What about u16.u8, u32.u8 and u64.u8 zero extensions?
If it is just hi -> {si,di} zext, then we could take HImode out of the
(define_insn "zero_extend<mode>si2"
  [(set (match_operand:SI 0 "nvptx_register_operand" "=R,R")
        (zero_extend:SI (match_operand:QHIM 1 "nvptx_nonimmediate_operand" "R,m")))]
  ""
  "@
   %.\\tcvt.u32.u%T1\\t%0, %1;
   %.\\tld%A1.u%T1\\t%0, %1;"
  [(set_attr "subregs_ok" "true")])

(define_insn "zero_extend<mode>di2"
  [(set (match_operand:DI 0 "nvptx_register_operand" "=R,R")
        (zero_extend:DI (match_operand:QHSIM 1 "nvptx_nonimmediate_operand" "R,m")))]
  ""
  "@
   %.\\tcvt.u64.u%T1\\t%0, %1;
   %.\\tld%A1%u1\\t%0, %1;"
  [(set_attr "subregs_ok" "true")])
iterators and add patterns for the hisi and hidi that would do the and
afterwards for the cvt case.
* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c -O0 execution test
From: vries at gcc dot gnu.org @ 2022-02-07 8:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005
--- Comment #6 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #5)
> What about u16.u8, u32.u8 and u64.u8 zero extensions?
ptx has no .u8 registers, so there's no straightforward translation of the
example.
* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c -O0 execution test
From: vries at gcc dot gnu.org @ 2022-02-07 8:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005
--- Comment #7 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #6)
> (In reply to Jakub Jelinek from comment #5)
> > What about u16.u8, u32.u8 and u64.u8 zero extensions?
>
> ptx has no .u8 registers, so there's no straightforward translation of the
> example.
Sorry, I misremembered: .u8 does exist in ptx, but it is very restricted
(mostly ld, st, and cvt). So the sub insn doesn't exist in a u8 mode.
* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c -O0 execution test
From: vries at gcc dot gnu.org @ 2022-02-07 11:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005
--- Comment #8 from Tom de Vries <vries at gcc dot gnu.org> ---
I've tried the workaround (posting here only the patch for trunchiqi2, the
pattern that was actually triggered):
...
@@ -424,9 +436,21 @@
   [(set (match_operand:QI 0 "nvptx_nonimmediate_operand" "=R,m")
         (truncate:QI (match_operand:HI 1 "nvptx_register_operand" "R,R")))]
   ""
-  "@
-   %.\\tcvt%t0.u16\\t%0, %1;
-   %.\\tst%A0.u8\\t%0, %1;"
+  {
+    if (which_alternative == 1)
+      return "%.\\tst%A0.u8\\t%0, %1;";
+
+    const char *cvt = "%.\\tcvt%t0.u16\\t%0, %1;";
+    if (1)
+      {
+        /* Workaround https://developer.nvidia.com/nvidia_bug/3527713. */
+        output_asm_insn ("%.\\tcvt.s32.s16\\t%0, %1;", operands);
+        output_asm_insn ("%.\\tand.b32\\t%0, %0,0x0000ffff;", operands);
+        return "";
+      }
+
+    return cvt;
+  }
   [(set_attr "subregs_ok" "true")])

 (define_insn "truncsi<mode>2"
...
but it didn't work for the test-case from comment 0.
Something that does seem to work for both cases, as well as for the
unreduced builtin-arith-overflow-15.c:
...
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 6c399dea1908..c33903688a5d 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -507,7 +507,13 @@
         (minus:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
                      (match_operand:HSDIM 2 "nvptx_register_operand" "R")))]
   ""
-  "%.\\tsub%t0\\t%0, %1, %2;")
+  {
+    if (GET_MODE (operands[0]) == HImode)
+      /* Workaround https://developer.nvidia.com/nvidia_bug/3527713. */
+      return "%.\\tsub.s16\\t%0, %1, %2;";
+
+    return "%.\\tsub%t0\\t%0, %1, %2;";
+  })

 (define_insn "mul<mode>3"
   [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
...
* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c -O0 execution test
From: cvs-commit at gcc dot gnu.org @ 2022-02-10 8:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005
--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tom de Vries <vries@gcc.gnu.org>:
https://gcc.gnu.org/g:5b2d679bbbcc2b976c6e228ba63afdf67c33164e
commit r12-7170-g5b2d679bbbcc2b976c6e228ba63afdf67c33164e
Author: Tom de Vries <tdevries@suse.de>
Date: Mon Feb 7 14:12:34 2022 +0100
[nvptx] Workaround sub.u16 driver JIT bug
There's a nvidia driver JIT bug that mishandles this code (minimized from
builtin-arith-overflow-15.c):
...
int main (void) {
  signed char r;
  unsigned char y = (unsigned char) 0x80;
  if (__builtin_sub_overflow ((unsigned char)0, (unsigned char)y, &r))
    __builtin_abort ();
  return 0;
}
...
which at ptx level minimizes to:
...
mov.u16 r22, 0x0080;
st.local.u16 [frame_var],r22;
ld.local.u16 r32,[frame_var];
sub.u16 r33,0x0000,r32;
cvt.u32.u16 r35,r33;
...
where we expect r35 == 0x0000ff80 but get instead 0xffffff80, and where
using nvptx-none-run -O0 fixes the problem. [ See also
https://github.com/vries/nvidia-bugs/tree/master/builtin-arith-overflow-15 . ]
Try to workaround the bug by using sub.s16 instead of sub.u16.
Tested on nvptx.
gcc/ChangeLog:
2022-02-07 Tom de Vries <tdevries@suse.de>
PR target/97005
* config/nvptx/nvptx.md (define_insn "sub<mode>3"): Workaround
driver JIT bug by using sub.s16 instead of sub.u16.
* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c -O0 execution test
From: vries at gcc dot gnu.org @ 2022-02-10 8:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005
Tom de Vries <vries at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Target Milestone|--- |12.0
Status|UNCONFIRMED |RESOLVED
--- Comment #10 from Tom de Vries <vries at gcc dot gnu.org> ---
Worked around by "[nvptx] Workaround sub.u16 driver JIT bug".
* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c -O0 execution test
From: vries at gcc dot gnu.org @ 2022-02-24 20:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005
--- Comment #11 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #2)
> (In reply to Tom de Vries from comment #1)
> > Created attachment 52359 [details]
> > Cuda reproducer
>
> Filed at https://developer.nvidia.com/nvidia_bug/3527713 as "cvt.u32.u16
> sign-extends instead of zero-extends".
Update from NVIDIA: a fix is being tested.