public inbox for gcc-bugs@sourceware.org
* [Bug target/97005] New: [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c   -O0  execution test
@ 2020-09-09 23:38 vries at gcc dot gnu.org
  2022-02-06  8:04 ` [Bug target/97005] " vries at gcc dot gnu.org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2020-09-09 23:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005

            Bug ID: 97005
           Summary: [nvptx] FAIL:
                    c-c++-common/torture/builtin-arith-overflow-15.c   -O0
                     execution test
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Minimized to:
...
$ cat builtin-arith-overflow-15.c
int
main (void)
{
  signed char r;
  unsigned char y = (unsigned char) 0x80;

  if (__builtin_sub_overflow ((unsigned char)0,
                              (unsigned char)y,
                              &r))
    __builtin_abort ();

  return 0;
}
...

Compile like this:
...
$ ./build-gcc/gcc/xgcc \
    -B./build-gcc/gcc/ \
    builtin-arith-overflow-15.c \
    -O0  \
    -L./build-gcc/nvptx-none/./newlib \
    -mmainkernel \
    -o ./builtin-arith-overflow-15.exe
...

Run:
...
$ ./install/bin/nvptx-none-run ./builtin-arith-overflow-15.exe 
nvptx-run: error getting kernel result: an illegal instruction was encountered (CUDA_ERROR_ILLEGAL_INSTRUCTION, 715)
...
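
For reference, the minimized test expects no overflow: the exact result of 0 - 0x80 is -128, which is representable in signed char. A minimal host-side sketch of that expectation follows; the helper name sub_u8_to_s8 is hypothetical and not part of the testsuite.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of the GCC testsuite): subtract two
   unsigned char values, store the result in a signed char, and report
   whether the mathematically exact result fails to fit.  */
bool
sub_u8_to_s8 (unsigned char a, unsigned char b, signed char *r)
{
  return __builtin_sub_overflow (a, b, r);
}
```

On a correctly behaving target, sub_u8_to_s8 (0, 0x80, &r) returns false and leaves r == -128, so the test only reaches __builtin_abort when the result is miscomputed.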

* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c   -O0  execution test
From: vries at gcc dot gnu.org @ 2022-02-06  8:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005

--- Comment #1 from Tom de Vries <vries at gcc dot gnu.org> ---
Created attachment 52359
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52359&action=edit
Cuda reproducer

* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c   -O0  execution test
From: vries at gcc dot gnu.org @ 2022-02-06  8:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005

--- Comment #2 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #1)
> Created attachment 52359 [details]
> Cuda reproducer

Filed at https://developer.nvidia.com/nvidia_bug/3527713 as "cvt.u32.u16
sign-extends instead of zero-extends".

* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c   -O0  execution test
From: jakub at gcc dot gnu.org @ 2022-02-06 10:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Is some workaround possible, like instead of emitting cvt.u32.u16 do
cvt.u32.s16 and add explicit and?  Do other zero extends work correctly?

* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c   -O0  execution test
From: vries at gcc dot gnu.org @ 2022-02-07  8:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005

--- Comment #4 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #3)
> Is some workaround possible, like instead of emitting cvt.u32.u16 do
> cvt.u32.s16 and add explicit and?

This already works:
...
diff --git a/builtin-arith-overflow-15/src.cu b/builtin-arith-overflow-15/src.cu
index 7a2535f..96f5f1e 100644
--- a/builtin-arith-overflow-15/src.cu
+++ b/builtin-arith-overflow-15/src.cu
@@ -46,6 +46,7 @@ hello (unsigned int *output)
     //"mov.u16 r33,0xff80;"

     "cvt.u32.u16 r35,r33;"
+    "and.b32 r35,r35,0x0000ffff;"
     //"mov.u32 r35, 0x0000ff80;"

     "st.u32 [rp], r35;"
...

> Do other zero extends work correctly?

I've rewritten the example to use cvt.u64.u32, and that one passes fine.
But cvt.u64.u16 runs into the same problem.
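
To spell out what the and.b32 mask achieves, here is a host-side C model of the two extensions. It is an illustrative sketch rather than PTX, and the function names are hypothetical. Zero-extending 0xff80 should give 0x0000ff80, the observed behaviour matches a sign-extension (0xffffff80), and masking with 0x0000ffff recovers the zero-extended value in either case.

```c
#include <stdint.h>

/* Model of the intended cvt.u32.u16: zero-extend 16 bits to 32.  */
uint32_t
zext16 (uint16_t x)
{
  return (uint32_t) x;
}

/* Model of the observed (buggy) behaviour: sign-extend 16 bits to 32.  */
uint32_t
sext16 (uint16_t x)
{
  /* Portable sign-extension: flip the sign bit, then subtract its weight.  */
  return ((uint32_t) x ^ 0x8000u) - 0x8000u;
}

/* Model of the workaround: whatever the conversion did to the upper
   half, clear it.  */
uint32_t
mask16 (uint32_t x)
{
  return x & 0x0000ffffu;
}
```

With these definitions, mask16 (sext16 (0xff80)) equals zext16 (0xff80), which is the effect the added and.b32 has in the reproducer.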

* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c   -O0  execution test
From: jakub at gcc dot gnu.org @ 2022-02-07  8:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
What about u16.u8, u32.u8 and u64.u8 zero extensions?
If it is just hi -> {si,di} zext, then we could take HImode out of the
(define_insn "zero_extend<mode>si2"
  [(set (match_operand:SI 0 "nvptx_register_operand" "=R,R")
        (zero_extend:SI (match_operand:QHIM 1 "nvptx_nonimmediate_operand" "R,m")))]
  ""
  "@
   %.\\tcvt.u32.u%T1\\t%0, %1;
   %.\\tld%A1.u%T1\\t%0, %1;"
  [(set_attr "subregs_ok" "true")])

(define_insn "zero_extend<mode>di2"
  [(set (match_operand:DI 0 "nvptx_register_operand" "=R,R")
        (zero_extend:DI (match_operand:QHSIM 1 "nvptx_nonimmediate_operand" "R,m")))]
  ""
  "@  
   %.\\tcvt.u64.u%T1\\t%0, %1;
   %.\\tld%A1%u1\\t%0, %1;"
  [(set_attr "subregs_ok" "true")])
iterators and add patterns for the hisi and hidi that would do the and
afterwards for the cvt case.

* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c   -O0  execution test
From: vries at gcc dot gnu.org @ 2022-02-07  8:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005

--- Comment #6 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #5)
> What about u16.u8, u32.u8 and u64.u8 zero extensions?

ptx has no .u8 registers, so there's no straightforward translation of the
example.

* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c   -O0  execution test
From: vries at gcc dot gnu.org @ 2022-02-07  8:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005

--- Comment #7 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #6)
> (In reply to Jakub Jelinek from comment #5)
> > What about u16.u8, u32.u8 and u64.u8 zero extensions?
> 
> ptx has no .u8 registers, so there's no straightforward translation of the
> example.

Um, sorry, I misremembered, that's not true, it does exist, but it's very
restricted: mostly ld, st, and cvt.  So the sub insn doesn't exist in a u8
mode.

* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c   -O0  execution test
From: vries at gcc dot gnu.org @ 2022-02-07 11:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005

--- Comment #8 from Tom de Vries <vries at gcc dot gnu.org> ---
I've tried the workaround (posting here only the patch for trunchiqi2, the
pattern that was actually triggered):
...
@@ -424,9 +436,21 @@
   [(set (match_operand:QI 0 "nvptx_nonimmediate_operand" "=R,m")
        (truncate:QI (match_operand:HI 1 "nvptx_register_operand" "R,R")))]
   ""
-  "@
-   %.\\tcvt%t0.u16\\t%0, %1;
-   %.\\tst%A0.u8\\t%0, %1;"
+  {
+    if (which_alternative == 1)
+      return "%.\\tst%A0.u8\\t%0, %1;";
+
+    const char *cvt = "%.\\tcvt%t0.u16\\t%0, %1;";
+    if (1)
+      {
+        /* Workaround https://developer.nvidia.com/nvidia_bug/3527713.  */
+        output_asm_insn ("%.\\tcvt.s32.s16\\t%0, %1;", operands);
+        output_asm_insn ("%.\\tand.b32\\t%0, %0,0x0000ffff;", operands);
+        return "";
+      }
+
+    return cvt;
+  }
   [(set_attr "subregs_ok" "true")])

 (define_insn "truncsi<mode>2"
...
but it didn't work for the test-case from comment 0.

Something that does seem to work for both cases, and the unreduced
builtin-arith-overflow-15.c:
...
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 6c399dea1908..c33903688a5d 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -507,7 +507,13 @@
        (minus:HSDIM (match_operand:HSDIM 1 "nvptx_register_operand" "R")
                     (match_operand:HSDIM 2 "nvptx_register_operand" "R")))]
   ""
-  "%.\\tsub%t0\\t%0, %1, %2;")
+  {
+    if (GET_MODE (operands[0]) == HImode)
+      /* Workaround https://developer.nvidia.com/nvidia_bug/3527713.  */
+      return "%.\\tsub.s16\\t%0, %1, %2;";
+
+    return "%.\\tsub%t0\\t%0, %1, %2;";
+  })

 (define_insn "mul<mode>3"
   [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
...
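
A note on why swapping sub.u16 for sub.s16 should not change results: wrapping two's-complement subtraction produces the same 16 result bits whether the operands are read as signed or unsigned. The C model below sketches that claim, assuming neither PTX instruction saturates; the function names are hypothetical.

```c
#include <stdint.h>

/* 16-bit subtraction with operands read as unsigned (models sub.u16).  */
uint16_t
sub_as_u16 (uint16_t a, uint16_t b)
{
  return (uint16_t) ((uint32_t) a - (uint32_t) b);
}

/* 16-bit subtraction with operands read as signed (models sub.s16).
   The result is returned as raw bits so both models can be compared.  */
uint16_t
sub_as_s16 (uint16_t a, uint16_t b)
{
  /* Portable sign-extension of the 16-bit patterns to 32 bits.  */
  int32_t sa = (int32_t) (a ^ 0x8000) - 0x8000;
  int32_t sb = (int32_t) (b ^ 0x8000) - 0x8000;
  return (uint16_t) (uint32_t) (sa - sb);
}
```

Both models give 0xff80 for 0x0000 - 0x0080, the value from the reduced test case, so the change only affects which instruction the driver JIT sees.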

* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c   -O0  execution test
From: cvs-commit at gcc dot gnu.org @ 2022-02-10  8:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005

--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tom de Vries <vries@gcc.gnu.org>:

https://gcc.gnu.org/g:5b2d679bbbcc2b976c6e228ba63afdf67c33164e

commit r12-7170-g5b2d679bbbcc2b976c6e228ba63afdf67c33164e
Author: Tom de Vries <tdevries@suse.de>
Date:   Mon Feb 7 14:12:34 2022 +0100

    [nvptx] Workaround sub.u16 driver JIT bug

    There's a nvidia driver JIT bug that mishandles this code (minimized from
    builtin-arith-overflow-15.c):
    ...
    int main (void) {
      signed char r;
      unsigned char y = (unsigned char) 0x80;
      if (__builtin_sub_overflow ((unsigned char)0, (unsigned char)y, &r))
        __builtin_abort ();
      return 0;
    }
    ...
    which at ptx level minimizes to:
    ...
      mov.u16 r22, 0x0080;
      st.local.u16 [frame_var],r22;
      ld.local.u16 r32,[frame_var];
      sub.u16 r33,0x0000,r32;
      cvt.u32.u16 r35,r33;
    ...
    where we expect r35 == 0x0000ff80 but get instead 0xffffff80, and where using
    nvptx-none-run -O0 fixes the problem.  [ See also
    https://github.com/vries/nvidia-bugs/tree/master/builtin-arith-overflow-15 . ]

    Try to workaround the bug by using sub.s16 instead of sub.u16.

    Tested on nvptx.

    gcc/ChangeLog:

    2022-02-07  Tom de Vries  <tdevries@suse.de>

            PR target/97005
            * config/nvptx/nvptx.md (define_insn "sub<mode>3"): Workaround
            driver JIT bug by using sub.s16 instead of sub.u16.

* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c   -O0  execution test
From: vries at gcc dot gnu.org @ 2022-02-10  8:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005

Tom de Vries <vries at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
   Target Milestone|---                         |12.0
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #10 from Tom de Vries <vries at gcc dot gnu.org> ---
Worked around by "[nvptx] Workaround sub.u16 driver JIT bug".

* [Bug target/97005] [nvptx] FAIL: c-c++-common/torture/builtin-arith-overflow-15.c   -O0  execution test
From: vries at gcc dot gnu.org @ 2022-02-24 20:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97005

--- Comment #11 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #2)
> (In reply to Tom de Vries from comment #1)
> > Created attachment 52359 [details]
> > Cuda reproducer
> 
> Filed at https://developer.nvidia.com/nvidia_bug/3527713 as "cvt.u32.u16
> sign-extends instead of zero-extends".

Update from nvidia: Fix being tested.
