public inbox for gcc-patches@gcc.gnu.org
* [PATCH] nvptx: Add suppport for __builtin_nvptx_brev instrinsic.
@ 2023-05-06 16:04 Roger Sayle
  2023-05-19 21:36 ` Jeff Law
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Roger Sayle @ 2023-05-06 16:04 UTC (permalink / raw)
  To: 'GCC Patches'; +Cc: 'Tom de Vries'


[-- Attachment #1.1: Type: text/plain, Size: 1365 bytes --]

 

This patch adds support for (a pair of) bit reversal intrinsics
__builtin_nvptx_brev and __builtin_nvptx_brevll which perform 32-bit
and 64-bit bit reversal (using nvptx's brev instruction) matching
the __brev and __brevll intrinsics provided by NVidia's nvcc compiler.
https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html

This patch has been tested on nvptx-none with make and make -k check
with no new failures.  Ok for mainline?


2023-05-06  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/nvptx/nvptx.cc (nvptx_expand_brev): Expand target
        builtin for bit reversal using brev instruction.
        (enum nvptx_builtins): Add NVPTX_BUILTIN_BREV and
        NVPTX_BUILTIN_BREVLL.
        (nvptx_init_builtins): Define "brev" and "brevll".
        (nvptx_expand_builtin): Expand NVPTX_BUILTIN_BREV and
        NVPTX_BUILTIN_BREVLL via nvptx_expand_brev function.
        * doc/extend.texi (Nvidia PTX Built-in Functions): New
        section, document __builtin_nvptx_brev{,ll}.

gcc/testsuite/ChangeLog
        * gcc.target/nvptx/brev-1.c: New 32-bit test case.
        * gcc.target/nvptx/brev-2.c: Likewise.
        * gcc.target/nvptx/brevll-1.c: New 64-bit test case.
        * gcc.target/nvptx/brevll-2.c: Likewise.

 

 

Thanks in advance,

Roger
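
For readers without an nvptx target to hand, the semantics that the 32-bit
builtin is expected to have can be sketched in portable C.  This is only a
host-side illustration and not part of the patch: the names ref_brev32 and
ref_brev32_selfcheck are hypothetical, and the swap-by-halves technique is
a standard one rather than anything the nvptx backend does (the backend
simply emits the brev instruction).  The checked values are taken from the
brev-2.c test case in the attached patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side reference, not part of the patch: reverse the
   bit order of a 32-bit value by swapping progressively larger groups
   (single bits, pairs, nibbles, bytes, then the two 16-bit halves).  */
static uint32_t
ref_brev32 (uint32_t x)
{
  x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
  x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
  x = ((x >> 4) & 0x0f0f0f0fu) | ((x & 0x0f0f0f0fu) << 4);
  x = ((x >> 8) & 0x00ff00ffu) | ((x & 0x00ff00ffu) << 8);
  return (x >> 16) | (x << 16);
}

/* A few of the input/output pairs used by brev-2.c.  */
static void
ref_brev32_selfcheck (void)
{
  assert (ref_brev32 (0x00000000u) == 0x00000000u);
  assert (ref_brev32 (0xffffffffu) == 0xffffffffu);
  assert (ref_brev32 (0x00000001u) == 0x80000000u);
  assert (ref_brev32 (0x01234567u) == 0xe6a2c480u);
  assert (ref_brev32 (0xdeadbeefu) == 0xf77db57bu);
}
```

Bit reversal is its own inverse, which is why the test case checks each of
the larger constants in both directions.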

--

 


[-- Attachment #2: patchbr_n.txt --]
[-- Type: text/plain, Size: 14105 bytes --]

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 89349da..1b99fca 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -6047,6 +6047,29 @@ nvptx_expand_shuffle (tree exp, rtx target, machine_mode mode, int ignore)
   return target;
 }
 
+/* Expander for the bit reverse builtins.  */
+
+static rtx
+nvptx_expand_brev (tree exp, rtx target, machine_mode mode, int ignore)
+{
+  if (ignore)
+    return target;
+  
+  rtx arg = expand_expr (CALL_EXPR_ARG (exp, 0),
+			 NULL_RTX, mode, EXPAND_NORMAL);
+  if (!REG_P (arg))
+    arg = copy_to_mode_reg (mode, arg);
+  if (!target)
+    target = gen_reg_rtx (mode);
+  rtx pat;
+  if (mode == SImode)
+    pat = gen_bitrevsi2 (target, arg);
+  else
+    pat = gen_bitrevdi2 (target, arg);
+  emit_insn (pat);
+  return target;
+}
+
 const char *
 nvptx_output_red_partition (rtx dst, rtx offset)
 {
@@ -6164,6 +6187,8 @@ enum nvptx_builtins
   NVPTX_BUILTIN_BAR_RED_AND,
   NVPTX_BUILTIN_BAR_RED_OR,
   NVPTX_BUILTIN_BAR_RED_POPC,
+  NVPTX_BUILTIN_BREV,
+  NVPTX_BUILTIN_BREVLL,
   NVPTX_BUILTIN_MAX
 };
 
@@ -6292,6 +6317,9 @@ nvptx_init_builtins (void)
   DEF (BAR_RED_POPC, "bar_red_popc",
        (UINT, UINT, UINT, UINT, UINT, NULL_TREE));
 
+  DEF (BREV, "brev", (UINT, UINT, NULL_TREE));
+  DEF (BREVLL, "brevll", (LLUINT, LLUINT, NULL_TREE));
+
 #undef DEF
 #undef ST
 #undef UINT
@@ -6339,6 +6367,10 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget),
     case NVPTX_BUILTIN_BAR_RED_POPC:
       return nvptx_expand_bar_red (exp, target, mode, ignore);
 
+    case NVPTX_BUILTIN_BREV:
+    case NVPTX_BUILTIN_BREVLL:
+      return nvptx_expand_brev (exp, target, mode, ignore);
+
     default: gcc_unreachable ();
     }
 }
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ac47680..871f0cf 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -14682,6 +14682,7 @@ instructions, but allow the compiler to schedule those calls.
 * Other MIPS Built-in Functions::
 * MSP430 Built-in Functions::
 * NDS32 Built-in Functions::
+* Nvidia PTX Built-in Functions::
 * Basic PowerPC Built-in Functions::
 * PowerPC AltiVec/VSX Built-in Functions::
 * PowerPC Hardware Transactional Memory Built-in Functions::
@@ -17941,6 +17942,20 @@ Enable global interrupt.
 Disable global interrupt.
 @enddefbuiltin
 
+@node Nvidia PTX Built-in Functions
+@subsection Nvidia PTX Built-in Functions
+
+These built-in functions are available for the Nvidia PTX target:
+
+@defbuiltin{unsigned int __builtin_nvptx_brev (unsigned int @var{x})}
+Reverse the bit order of a 32-bit unsigned integer.
+Disable global interrupt.
+@enddefbuiltin
+
+@defbuiltin{unsigned long long __builtin_nvptx_brevll (unsigned long long @var{x})}
+Reverse the bit order of a 64-bit unsigned integer.
+@enddefbuiltin
+
 @node Basic PowerPC Built-in Functions
 @subsection Basic PowerPC Built-in Functions
 
diff --git a/gcc/testsuite/gcc.target/nvptx/brev-1.c b/gcc/testsuite/gcc.target/nvptx/brev-1.c
new file mode 100644
index 0000000..fbb4fff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/brev-1.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+unsigned int foo(unsigned int x)
+{
+  return __builtin_nvptx_brev(x);
+}
+
+/* { dg-final { scan-assembler "brev.b32" } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/brev-2.c b/gcc/testsuite/gcc.target/nvptx/brev-2.c
new file mode 100644
index 0000000..9d0defe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/brev-2.c
@@ -0,0 +1,94 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+unsigned int bitreverse32(unsigned int x)
+{
+  return __builtin_nvptx_brev(x);
+}
+
+int main(void)
+{
+  if (bitreverse32(0x00000000) != 0x00000000)
+    __builtin_abort();
+  if (bitreverse32(0xffffffff) != 0xffffffff)
+    __builtin_abort();
+
+  if (bitreverse32(0x00000001) != 0x80000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000002) != 0x40000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000004) != 0x20000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000008) != 0x10000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000010) != 0x08000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000020) != 0x04000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000040) != 0x02000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000080) != 0x01000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000100) != 0x00800000)
+    __builtin_abort();
+  if (bitreverse32(0x00000200) != 0x00400000)
+    __builtin_abort();
+  if (bitreverse32(0x00000400) != 0x00200000)
+    __builtin_abort();
+  if (bitreverse32(0x00000800) != 0x00100000)
+    __builtin_abort();
+  if (bitreverse32(0x00001000) != 0x00080000)
+    __builtin_abort();
+  if (bitreverse32(0x00002000) != 0x00040000)
+    __builtin_abort();
+  if (bitreverse32(0x00004000) != 0x00020000)
+    __builtin_abort();
+  if (bitreverse32(0x00008000) != 0x00010000)
+    __builtin_abort();
+  if (bitreverse32(0x00010000) != 0x00008000)
+    __builtin_abort();
+  if (bitreverse32(0x00020000) != 0x00004000)
+    __builtin_abort();
+  if (bitreverse32(0x00040000) != 0x00002000)
+    __builtin_abort();
+  if (bitreverse32(0x00080000) != 0x00001000)
+    __builtin_abort();
+  if (bitreverse32(0x00100000) != 0x00000800)
+    __builtin_abort();
+  if (bitreverse32(0x00200000) != 0x00000400)
+    __builtin_abort();
+  if (bitreverse32(0x00400000) != 0x00000200)
+    __builtin_abort();
+  if (bitreverse32(0x00800000) != 0x00000100)
+    __builtin_abort();
+  if (bitreverse32(0x01000000) != 0x00000080)
+    __builtin_abort();
+  if (bitreverse32(0x02000000) != 0x00000040)
+    __builtin_abort();
+  if (bitreverse32(0x04000000) != 0x00000020)
+    __builtin_abort();
+  if (bitreverse32(0x08000000) != 0x00000010)
+    __builtin_abort();
+  if (bitreverse32(0x10000000) != 0x00000008)
+    __builtin_abort();
+  if (bitreverse32(0x20000000) != 0x00000004)
+    __builtin_abort();
+  if (bitreverse32(0x40000000) != 0x00000002)
+    __builtin_abort();
+  if (bitreverse32(0x80000000) != 0x00000001)
+    __builtin_abort();
+
+  if (bitreverse32(0x01234567) != 0xe6a2c480)
+    __builtin_abort();
+  if (bitreverse32(0xe6a2c480) != 0x01234567)
+    __builtin_abort();
+  if (bitreverse32(0xdeadbeef) != 0xf77db57b)
+    __builtin_abort();
+  if (bitreverse32(0xf77db57b) != 0xdeadbeef)
+    __builtin_abort();
+  if (bitreverse32(0xcafebabe) != 0x7d5d7f53)
+    __builtin_abort();
+  if (bitreverse32(0x7d5d7f53) != 0xcafebabe)
+    __builtin_abort();
+  return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/nvptx/brevll-1.c b/gcc/testsuite/gcc.target/nvptx/brevll-1.c
new file mode 100644
index 0000000..7009d5f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/brevll-1.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+unsigned long foo(unsigned long x)
+{
+  return __builtin_nvptx_brevll(x);
+}
+
+/* { dg-final { scan-assembler "brev.b64" } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/brevll-2.c b/gcc/testsuite/gcc.target/nvptx/brevll-2.c
new file mode 100644
index 0000000..56054b1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/brevll-2.c
@@ -0,0 +1,154 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+unsigned long long bitreverse64(unsigned long long x)
+{
+  return __builtin_nvptx_brevll(x);
+}
+
+int main(void)
+{
+  if (bitreverse64(0x0000000000000000ll) != 0x0000000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0xffffffffffffffffll) != 0xffffffffffffffffll)
+    __builtin_abort();
+
+  if (bitreverse64(0x0000000000000001ll) != 0x8000000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000002ll) != 0x4000000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000004ll) != 0x2000000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000008ll) != 0x1000000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000010ll) != 0x0800000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000020ll) != 0x0400000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000040ll) != 0x0200000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000080ll) != 0x0100000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000100ll) != 0x0080000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000200ll) != 0x0040000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000400ll) != 0x0020000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000800ll) != 0x0010000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000001000ll) != 0x0008000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000002000ll) != 0x0004000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000004000ll) != 0x0002000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000008000ll) != 0x0001000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000010000ll) != 0x0000800000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000020000ll) != 0x0000400000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000040000ll) != 0x0000200000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000080000ll) != 0x0000100000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000100000ll) != 0x0000080000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000200000ll) != 0x0000040000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000400000ll) != 0x0000020000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000800000ll) != 0x0000010000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000001000000ll) != 0x0000008000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000002000000ll) != 0x0000004000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000004000000ll) != 0x0000002000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000008000000ll) != 0x0000001000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000010000000ll) != 0x0000000800000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000020000000ll) != 0x0000000400000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000040000000ll) != 0x0000000200000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000080000000ll) != 0x0000000100000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000100000000ll) != 0x0000000080000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000200000000ll) != 0x0000000040000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000400000000ll) != 0x0000000020000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000800000000ll) != 0x0000000010000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000001000000000ll) != 0x0000000008000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000002000000000ll) != 0x0000000004000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000004000000000ll) != 0x0000000002000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000008000000000ll) != 0x0000000001000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000010000000000ll) != 0x0000000000800000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000020000000000ll) != 0x0000000000400000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000040000000000ll) != 0x0000000000200000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000080000000000ll) != 0x0000000000100000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000100000000000ll) != 0x0000000000080000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000200000000000ll) != 0x0000000000040000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000400000000000ll) != 0x0000000000020000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000800000000000ll) != 0x0000000000010000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0001000000000000ll) != 0x0000000000008000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0002000000000000ll) != 0x0000000000004000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0004000000000000ll) != 0x0000000000002000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0008000000000000ll) != 0x0000000000001000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0010000000000000ll) != 0x0000000000000800ll)
+    __builtin_abort();
+  if (bitreverse64(0x0020000000000000ll) != 0x0000000000000400ll)
+    __builtin_abort();
+  if (bitreverse64(0x0040000000000000ll) != 0x0000000000000200ll)
+    __builtin_abort();
+  if (bitreverse64(0x0080000000000000ll) != 0x0000000000000100ll)
+    __builtin_abort();
+  if (bitreverse64(0x0100000000000000ll) != 0x0000000000000080ll)
+    __builtin_abort();
+  if (bitreverse64(0x0200000000000000ll) != 0x0000000000000040ll)
+    __builtin_abort();
+  if (bitreverse64(0x0400000000000000ll) != 0x0000000000000020ll)
+    __builtin_abort();
+  if (bitreverse64(0x0800000000000000ll) != 0x0000000000000010ll)
+    __builtin_abort();
+  if (bitreverse64(0x1000000000000000ll) != 0x0000000000000008ll)
+    __builtin_abort();
+  if (bitreverse64(0x2000000000000000ll) != 0x0000000000000004ll)
+    __builtin_abort();
+  if (bitreverse64(0x4000000000000000ll) != 0x0000000000000002ll)
+    __builtin_abort();
+  if (bitreverse64(0x8000000000000000ll) != 0x0000000000000001ll)
+    __builtin_abort();
+
+  if (bitreverse64(0x0123456789abcdefll) != 0xf7b3d591e6a2c480ll)
+    __builtin_abort();
+  if (bitreverse64(0xf7b3d591e6a2c480ll) != 0x0123456789abcdefll)
+    __builtin_abort();
+  if (bitreverse64(0xdeadbeefcafebabell) != 0x7d5d7f53f77db57bll)
+    __builtin_abort();
+  if (bitreverse64(0x7d5d7f53f77db57bll) != 0xdeadbeefcafebabell)
+    __builtin_abort();
+  return 0;
+}
+


* Re: [PATCH] nvptx: Add suppport for __builtin_nvptx_brev instrinsic.
  2023-05-06 16:04 [PATCH] nvptx: Add suppport for __builtin_nvptx_brev instrinsic Roger Sayle
@ 2023-05-19 21:36 ` Jeff Law
  2023-11-15 14:28 ` nvptx: Extend 'brev' test cases (was: [PATCH] nvptx: Add suppport for __builtin_nvptx_brev instrinsic) Thomas Schwinge
  2023-11-15 14:36 ` nvptx: Fix copy'n'paste-o in '__builtin_nvptx_brev' description " Thomas Schwinge
  2 siblings, 0 replies; 4+ messages in thread
From: Jeff Law @ 2023-05-19 21:36 UTC (permalink / raw)
  To: gcc-patches



On 5/6/23 10:04, Roger Sayle wrote:
>   
> 
> This patch adds support for (a pair of) bit reversal intrinsics
> __builtin_nvptx_brev and __builtin_nvptx_brevll which perform 32-bit
> and 64-bit bit reversal (using nvptx's brev instruction) matching
> the __brev and __brevll intrinsics provided by NVidia's nvcc compiler.
> https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html
>
> This patch has been tested on nvptx-none with make and make -k check
> with no new failures.  Ok for mainline?
>
>
> 2023-05-06  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>          * config/nvptx/nvptx.cc (nvptx_expand_brev): Expand target
>          builtin for bit reversal using brev instruction.
>          (enum nvptx_builtins): Add NVPTX_BUILTIN_BREV and
>          NVPTX_BUILTIN_BREVLL.
>          (nvptx_init_builtins): Define "brev" and "brevll".
>          (nvptx_expand_builtin): Expand NVPTX_BUILTIN_BREV and
>          NVPTX_BUILTIN_BREVLL via nvptx_expand_brev function.
>          * doc/extend.texi (Nvidia PTX Built-in Functions): New
>          section, document __builtin_nvptx_brev{,ll}.
>
> gcc/testsuite/ChangeLog
>          * gcc.target/nvptx/brev-1.c: New 32-bit test case.
>          * gcc.target/nvptx/brev-2.c: Likewise.
>          * gcc.target/nvptx/brevll-1.c: New 64-bit test case.
>          * gcc.target/nvptx/brevll-2.c: Likewise.
OK
jeff


* nvptx: Extend 'brev' test cases (was: [PATCH] nvptx: Add suppport for __builtin_nvptx_brev instrinsic)
  2023-05-06 16:04 [PATCH] nvptx: Add suppport for __builtin_nvptx_brev instrinsic Roger Sayle
  2023-05-19 21:36 ` Jeff Law
@ 2023-11-15 14:28 ` Thomas Schwinge
  2023-11-15 14:36 ` nvptx: Fix copy'n'paste-o in '__builtin_nvptx_brev' description " Thomas Schwinge
  2 siblings, 0 replies; 4+ messages in thread
From: Thomas Schwinge @ 2023-11-15 14:28 UTC (permalink / raw)
  To: Roger Sayle, gcc-patches; +Cc: Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 1565 bytes --]

Hi!

On 2023-05-06T17:04:57+0100, "Roger Sayle" <roger@nextmovesoftware.com> wrote:
> This patch adds support for (a pair of) bit reversal intrinsics
> __builtin_nvptx_brev and __builtin_nvptx_brevll which perform 32-bit
> and 64-bit bit reversal (using nvptx's brev instruction) matching
> the __brev and __brevll intrinsics provided by NVidia's nvcc compiler.
> https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html
>
> This patch has been tested on nvptx-none with make and make -k check
> with no new failures.  Ok for mainline?

(That got pushed in commit c09471fbc7588db2480f036aa56a2403d3c03ae5
"nvptx: Add suppport for __builtin_nvptx_brev instrinsic".)

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nvptx/brev-1.c
> +[...]

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nvptx/brev-2.c
> +[...]

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nvptx/brevll-1.c
> +[...]

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nvptx/brevll-2.c
> +[...]

Pushed to master branch commit 61c45c055a5ccfc59463c21ab057dece822d973c
"nvptx: Extend 'brev' test cases", see attached.  That's in order to
observe effects of a later patch, and also to exercise the new nvptx
'check-function-bodies' a bit.
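
As a rough aid when reading the 64-bit test vectors, the expected values
can be cross-checked on a host with a plain bit-by-bit loop.  This is a
sketch only and not part of the commit: ref_brev64 is a hypothetical
helper name, and the loop is deliberately naive rather than a model of
what the nvptx backend emits.  The values checked in the comment-free
usage below come from the brevll-2 test cases.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side reference, not part of the commit: build the
   result by shifting the input's bits out from the least significant
   end and into the result from the same end, 64 times.  */
static uint64_t
ref_brev64 (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 64; i++)
    {
      r = (r << 1) | (x & 1);
      x >>= 1;
    }
  return r;
}

/* A few of the input/output pairs used by the brevll-2 test cases.  */
static void
ref_brev64_selfcheck (void)
{
  assert (ref_brev64 (0x0000000000000001ull) == 0x8000000000000000ull);
  assert (ref_brev64 (0x0123456789abcdefull) == 0xf7b3d591e6a2c480ull);
  assert (ref_brev64 (0xdeadbeefcafebabeull) == 0x7d5d7f53f77db57bull);
}
```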


Regards
 Thomas



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-nvptx-Extend-brev-test-cases.patch --]
[-- Type: text/x-diff, Size: 17637 bytes --]

From 61c45c055a5ccfc59463c21ab057dece822d973c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Mon, 4 Sep 2023 23:06:27 +0200
Subject: [PATCH] nvptx: Extend 'brev' test cases

In order to observe effects of a later patch, extend the 'brev' test cases
added in commit c09471fbc7588db2480f036aa56a2403d3c03ae5
"nvptx: Add suppport for __builtin_nvptx_brev instrinsic".

	gcc/testsuite/
	* gcc.target/nvptx/brev-1.c: Extend.
	* gcc.target/nvptx/brev-2.c: Rename to...
	* gcc.target/nvptx/brev-2-O2.c: ... this, and extend.  Copy to...
	* gcc.target/nvptx/brev-2-O0.c: ... this, and adapt for '-O0'.
	* gcc.target/nvptx/brevll-1.c: Extend.
	* gcc.target/nvptx/brevll-2.c: Rename to...
	* gcc.target/nvptx/brevll-2-O2.c: ... this, and extend.  Copy to...
	* gcc.target/nvptx/brevll-2-O0.c: ... this, and adapt for '-O0'.
---
 gcc/testsuite/gcc.target/nvptx/brev-1.c       |  12 +-
 gcc/testsuite/gcc.target/nvptx/brev-2-O0.c    | 129 ++++++++++++
 .../nvptx/{brev-2.c => brev-2-O2.c}           |  27 +++
 gcc/testsuite/gcc.target/nvptx/brevll-1.c     |  12 +-
 gcc/testsuite/gcc.target/nvptx/brevll-2-O0.c  | 189 ++++++++++++++++++
 .../nvptx/{brevll-2.c => brevll-2-O2.c}       |  27 +++
 6 files changed, 392 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/brev-2-O0.c
 rename gcc/testsuite/gcc.target/nvptx/{brev-2.c => brev-2-O2.c} (80%)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/brevll-2-O0.c
 rename gcc/testsuite/gcc.target/nvptx/{brevll-2.c => brevll-2-O2.c} (90%)

diff --git a/gcc/testsuite/gcc.target/nvptx/brev-1.c b/gcc/testsuite/gcc.target/nvptx/brev-1.c
index fbb4fff1e59..af875dd4dcc 100644
--- a/gcc/testsuite/gcc.target/nvptx/brev-1.c
+++ b/gcc/testsuite/gcc.target/nvptx/brev-1.c
@@ -1,8 +1,16 @@
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies {**} {} } } */
+
 unsigned int foo(unsigned int x)
 {
   return __builtin_nvptx_brev(x);
 }
-
-/* { dg-final { scan-assembler "brev.b32" } } */
+/*
+** foo:
+**	...
+**	mov\.u32	(%r[0-9]+), %ar0;
+**	brev\.b32	%value, \1;
+**	st\.param\.u32	\[%value_out\], %value;
+**	ret;
+*/
diff --git a/gcc/testsuite/gcc.target/nvptx/brev-2-O0.c b/gcc/testsuite/gcc.target/nvptx/brev-2-O0.c
new file mode 100644
index 00000000000..ca011ebf472
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/brev-2-O0.c
@@ -0,0 +1,129 @@
+/* { dg-do run } */
+/* { dg-options "-O0" } */
+/* { dg-additional-options -save-temps } */
+/* { dg-final { check-function-bodies {**} {} } } */
+
+inline __attribute__((always_inline))
+unsigned int bitreverse32(unsigned int x)
+{
+  return __builtin_nvptx_brev(x);
+}
+
+int main(void)
+{
+  if (bitreverse32(0x00000000) != 0x00000000)
+    __builtin_abort();
+  if (bitreverse32(0xffffffff) != 0xffffffff)
+    __builtin_abort();
+
+  if (bitreverse32(0x00000001) != 0x80000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000002) != 0x40000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000004) != 0x20000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000008) != 0x10000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000010) != 0x08000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000020) != 0x04000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000040) != 0x02000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000080) != 0x01000000)
+    __builtin_abort();
+  if (bitreverse32(0x00000100) != 0x00800000)
+    __builtin_abort();
+  if (bitreverse32(0x00000200) != 0x00400000)
+    __builtin_abort();
+  if (bitreverse32(0x00000400) != 0x00200000)
+    __builtin_abort();
+  if (bitreverse32(0x00000800) != 0x00100000)
+    __builtin_abort();
+  if (bitreverse32(0x00001000) != 0x00080000)
+    __builtin_abort();
+  if (bitreverse32(0x00002000) != 0x00040000)
+    __builtin_abort();
+  if (bitreverse32(0x00004000) != 0x00020000)
+    __builtin_abort();
+  if (bitreverse32(0x00008000) != 0x00010000)
+    __builtin_abort();
+  if (bitreverse32(0x00010000) != 0x00008000)
+    __builtin_abort();
+  if (bitreverse32(0x00020000) != 0x00004000)
+    __builtin_abort();
+  if (bitreverse32(0x00040000) != 0x00002000)
+    __builtin_abort();
+  if (bitreverse32(0x00080000) != 0x00001000)
+    __builtin_abort();
+  if (bitreverse32(0x00100000) != 0x00000800)
+    __builtin_abort();
+  if (bitreverse32(0x00200000) != 0x00000400)
+    __builtin_abort();
+  if (bitreverse32(0x00400000) != 0x00000200)
+    __builtin_abort();
+  if (bitreverse32(0x00800000) != 0x00000100)
+    __builtin_abort();
+  if (bitreverse32(0x01000000) != 0x00000080)
+    __builtin_abort();
+  if (bitreverse32(0x02000000) != 0x00000040)
+    __builtin_abort();
+  if (bitreverse32(0x04000000) != 0x00000020)
+    __builtin_abort();
+  if (bitreverse32(0x08000000) != 0x00000010)
+    __builtin_abort();
+  if (bitreverse32(0x10000000) != 0x00000008)
+    __builtin_abort();
+  if (bitreverse32(0x20000000) != 0x00000004)
+    __builtin_abort();
+  if (bitreverse32(0x40000000) != 0x00000002)
+    __builtin_abort();
+  if (bitreverse32(0x80000000) != 0x00000001)
+    __builtin_abort();
+
+  if (bitreverse32(0x01234567) != 0xe6a2c480)
+    __builtin_abort();
+  if (bitreverse32(0xe6a2c480) != 0x01234567)
+    __builtin_abort();
+  if (bitreverse32(0xdeadbeef) != 0xf77db57b)
+    __builtin_abort();
+  if (bitreverse32(0xf77db57b) != 0xdeadbeef)
+    __builtin_abort();
+  if (bitreverse32(0xcafebabe) != 0x7d5d7f53)
+    __builtin_abort();
+  if (bitreverse32(0x7d5d7f53) != 0xcafebabe)
+    __builtin_abort();
+
+  return 0;
+}
+/*
+** main:
+**	...
+**	mov\.u32	(%r[0-9]+), 0;
+**	st\.u32	(\[%frame[+0-9]*\]), \1;
+**	ld\.u32	(%r[0-9]+), \2;
+**	brev\.b32	(%r[0-9]+), \3;
+**	setp\.[^.]+\.u32	%r[0-9]+, \4, 0;
+**	...
+**	mov\.u32	(%r[0-9]+), -1;
+**	st\.u32	(\[%frame[+0-9]*\]), \5;
+**	ld\.u32	(%r[0-9]+), \6;
+**	brev\.b32	(%r[0-9]+), \7;
+**	setp\.[^.]+\.u32	%r[0-9]+, \8, -1;
+**	...
+**	mov\.u32	(%r[0-9]+), 1;
+**	st\.u32	(\[%frame[+0-9]*\]), \9;
+**	ld\.u32	(%r[0-9]+), \10;
+**	brev\.b32	(%r[0-9]+), \11;
+**	setp\.[^.]+\.u32	%r[0-9]+, \12, -2147483648;
+**	...
+**	mov\.u32	(%r[0-9]+), 2;
+**	st\.u32	(\[%frame[+0-9]*\]), \13;
+**	ld\.u32	(%r[0-9]+), \14;
+**	brev\.b32	(%r[0-9]+), \15;
+**	setp\.[^.]+\.u32	%r[0-9]+, \16, 1073741824;
+**	...
+*/
+
+/* { dg-final { scan-assembler-times {\tbrev\.b32\t} 40 } } */
+/* { dg-final { scan-assembler {\mabort\M} } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/brev-2.c b/gcc/testsuite/gcc.target/nvptx/brev-2-O2.c
similarity index 80%
rename from gcc/testsuite/gcc.target/nvptx/brev-2.c
rename to gcc/testsuite/gcc.target/nvptx/brev-2-O2.c
index 9d0defe80bb..e35052208d0 100644
--- a/gcc/testsuite/gcc.target/nvptx/brev-2.c
+++ b/gcc/testsuite/gcc.target/nvptx/brev-2-O2.c
@@ -1,5 +1,9 @@
 /* { dg-do run } */
 /* { dg-options "-O2" } */
+/* { dg-additional-options -save-temps } */
+/* { dg-final { check-function-bodies {**} {} } } */
+
+inline __attribute__((always_inline))
 unsigned int bitreverse32(unsigned int x)
 {
   return __builtin_nvptx_brev(x);
@@ -89,6 +93,29 @@ int main(void)
     __builtin_abort();
   if (bitreverse32(0x7d5d7f53) != 0xcafebabe)
     __builtin_abort();
+
   return 0;
 }
+/*
+** main:
+**	...
+**	mov\.u32	(%r[0-9]+), 0;
+**	brev\.b32	(%r[0-9]+), \1;
+**	setp\.[^.]+\.u32	%r[0-9]+, \2, 0;
+**	...
+**	mov\.u32	(%r[0-9]+), -1;
+**	brev\.b32	(%r[0-9]+), \3;
+**	setp\.[^.]+\.u32	%r[0-9]+, \4, -1;
+**	...
+**	mov\.u32	(%r[0-9]+), 1;
+**	brev\.b32	(%r[0-9]+), \5;
+**	setp\.[^.]+\.u32	%r[0-9]+, \6, -2147483648;
+**	...
+**	mov\.u32	(%r[0-9]+), 2;
+**	brev\.b32	(%r[0-9]+), \7;
+**	setp\.[^.]+\.u32	%r[0-9]+, \8, 1073741824;
+**	...
+*/
 
+/* { dg-final { scan-assembler-times {\tbrev\.b32\t} 40 } } */
+/* { dg-final { scan-assembler {\mabort\M} } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/brevll-1.c b/gcc/testsuite/gcc.target/nvptx/brevll-1.c
index 7009d5f5f8c..0b03fee9292 100644
--- a/gcc/testsuite/gcc.target/nvptx/brevll-1.c
+++ b/gcc/testsuite/gcc.target/nvptx/brevll-1.c
@@ -1,8 +1,16 @@
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies {**} {} } } */
+
 unsigned long foo(unsigned long x)
 {
   return __builtin_nvptx_brevll(x);
 }
-
-/* { dg-final { scan-assembler "brev.b64" } } */
+/*
+** foo:
+**	...
+**	mov\.u64	(%r[0-9]+), %ar0;
+**	brev\.b64	%value, \1;
+**	st\.param\.u64	\[%value_out\], %value;
+**	ret;
+*/
diff --git a/gcc/testsuite/gcc.target/nvptx/brevll-2-O0.c b/gcc/testsuite/gcc.target/nvptx/brevll-2-O0.c
new file mode 100644
index 00000000000..32bbfbf7ad6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/brevll-2-O0.c
@@ -0,0 +1,189 @@
+/* { dg-do run } */
+/* { dg-options "-O0" } */
+/* { dg-additional-options -save-temps } */
+/* { dg-final { check-function-bodies {**} {} } } */
+
+inline __attribute__((always_inline))
+unsigned long long bitreverse64(unsigned long long x)
+{
+  return __builtin_nvptx_brevll(x);
+}
+
+int main(void)
+{
+  if (bitreverse64(0x0000000000000000ll) != 0x0000000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0xffffffffffffffffll) != 0xffffffffffffffffll)
+    __builtin_abort();
+
+  if (bitreverse64(0x0000000000000001ll) != 0x8000000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000002ll) != 0x4000000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000004ll) != 0x2000000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000008ll) != 0x1000000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000010ll) != 0x0800000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000020ll) != 0x0400000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000040ll) != 0x0200000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000080ll) != 0x0100000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000100ll) != 0x0080000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000200ll) != 0x0040000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000400ll) != 0x0020000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000000800ll) != 0x0010000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000001000ll) != 0x0008000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000002000ll) != 0x0004000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000004000ll) != 0x0002000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000008000ll) != 0x0001000000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000010000ll) != 0x0000800000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000020000ll) != 0x0000400000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000040000ll) != 0x0000200000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000080000ll) != 0x0000100000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000100000ll) != 0x0000080000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000200000ll) != 0x0000040000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000400000ll) != 0x0000020000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000000800000ll) != 0x0000010000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000001000000ll) != 0x0000008000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000002000000ll) != 0x0000004000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000004000000ll) != 0x0000002000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000008000000ll) != 0x0000001000000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000010000000ll) != 0x0000000800000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000020000000ll) != 0x0000000400000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000040000000ll) != 0x0000000200000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000080000000ll) != 0x0000000100000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000100000000ll) != 0x0000000080000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000200000000ll) != 0x0000000040000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000400000000ll) != 0x0000000020000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000000800000000ll) != 0x0000000010000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000001000000000ll) != 0x0000000008000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000002000000000ll) != 0x0000000004000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000004000000000ll) != 0x0000000002000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000008000000000ll) != 0x0000000001000000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000010000000000ll) != 0x0000000000800000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000020000000000ll) != 0x0000000000400000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000040000000000ll) != 0x0000000000200000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000080000000000ll) != 0x0000000000100000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000100000000000ll) != 0x0000000000080000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000200000000000ll) != 0x0000000000040000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000400000000000ll) != 0x0000000000020000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0000800000000000ll) != 0x0000000000010000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0001000000000000ll) != 0x0000000000008000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0002000000000000ll) != 0x0000000000004000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0004000000000000ll) != 0x0000000000002000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0008000000000000ll) != 0x0000000000001000ll)
+    __builtin_abort();
+  if (bitreverse64(0x0010000000000000ll) != 0x0000000000000800ll)
+    __builtin_abort();
+  if (bitreverse64(0x0020000000000000ll) != 0x0000000000000400ll)
+    __builtin_abort();
+  if (bitreverse64(0x0040000000000000ll) != 0x0000000000000200ll)
+    __builtin_abort();
+  if (bitreverse64(0x0080000000000000ll) != 0x0000000000000100ll)
+    __builtin_abort();
+  if (bitreverse64(0x0100000000000000ll) != 0x0000000000000080ll)
+    __builtin_abort();
+  if (bitreverse64(0x0200000000000000ll) != 0x0000000000000040ll)
+    __builtin_abort();
+  if (bitreverse64(0x0400000000000000ll) != 0x0000000000000020ll)
+    __builtin_abort();
+  if (bitreverse64(0x0800000000000000ll) != 0x0000000000000010ll)
+    __builtin_abort();
+  if (bitreverse64(0x1000000000000000ll) != 0x0000000000000008ll)
+    __builtin_abort();
+  if (bitreverse64(0x2000000000000000ll) != 0x0000000000000004ll)
+    __builtin_abort();
+  if (bitreverse64(0x4000000000000000ll) != 0x0000000000000002ll)
+    __builtin_abort();
+  if (bitreverse64(0x8000000000000000ll) != 0x0000000000000001ll)
+    __builtin_abort();
+
+  if (bitreverse64(0x0123456789abcdefll) != 0xf7b3d591e6a2c480ll)
+    __builtin_abort();
+  if (bitreverse64(0xf7b3d591e6a2c480ll) != 0x0123456789abcdefll)
+    __builtin_abort();
+  if (bitreverse64(0xdeadbeefcafebabell) != 0x7d5d7f53f77db57bll)
+    __builtin_abort();
+  if (bitreverse64(0x7d5d7f53f77db57bll) != 0xdeadbeefcafebabell)
+    __builtin_abort();
+
+  return 0;
+}
+/*
+** main:
+**	...
+**	mov\.u64	(%r[0-9]+), 0;
+**	st\.u64	(\[%frame[+0-9]*\]), \1;
+**	ld\.u64	(%r[0-9]+), \2;
+**	brev\.b64	(%r[0-9]+), \3;
+**	setp\.[^.]+\.u64	%r[0-9]+, \4, 0;
+**	...
+**	mov\.u64	(%r[0-9]+), -1;
+**	st\.u64	(\[%frame[+0-9]*\]), \5;
+**	ld\.u64	(%r[0-9]+), \6;
+**	brev\.b64	(%r[0-9]+), \7;
+**	setp\.[^.]+\.u64	%r[0-9]+, \8, -1;
+**	...
+**	mov\.u64	(%r[0-9]+), 1;
+**	st\.u64	(\[%frame[+0-9]*\]), \9;
+**	ld\.u64	(%r[0-9]+), \10;
+**	brev\.b64	(%r[0-9]+), \11;
+**	setp\.[^.]+\.u64	%r[0-9]+, \12, -9223372036854775808;
+**	...
+**	mov\.u64	(%r[0-9]+), 2;
+**	st\.u64	(\[%frame[+0-9]*\]), \13;
+**	ld\.u64	(%r[0-9]+), \14;
+**	brev\.b64	(%r[0-9]+), \15;
+**	setp\.[^.]+\.u64	%r[0-9]+, \16, 4611686018427387904;
+**	...
+*/
+
+/* { dg-final { scan-assembler-times {\tbrev\.b64\t} 70 } } */
+/* { dg-final { scan-assembler {\mabort\M} } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/brevll-2.c b/gcc/testsuite/gcc.target/nvptx/brevll-2-O2.c
similarity index 90%
rename from gcc/testsuite/gcc.target/nvptx/brevll-2.c
rename to gcc/testsuite/gcc.target/nvptx/brevll-2-O2.c
index 56054b1e92a..cbfda1b9601 100644
--- a/gcc/testsuite/gcc.target/nvptx/brevll-2.c
+++ b/gcc/testsuite/gcc.target/nvptx/brevll-2-O2.c
@@ -1,5 +1,9 @@
 /* { dg-do run } */
 /* { dg-options "-O2" } */
+/* { dg-additional-options -save-temps } */
+/* { dg-final { check-function-bodies {**} {} } } */
+
+inline __attribute__((always_inline))
 unsigned long long bitreverse64(unsigned long long x)
 {
   return __builtin_nvptx_brevll(x);
@@ -149,6 +153,29 @@ int main(void)
     __builtin_abort();
   if (bitreverse64(0x7d5d7f53f77db57bll) != 0xdeadbeefcafebabell)
     __builtin_abort();
+
   return 0;
 }
+/*
+** main:
+**	...
+**	mov\.u64	(%r[0-9]+), 0;
+**	brev\.b64	(%r[0-9]+), \1;
+**	setp\.[^.]+\.u64	%r[0-9]+, \2, 0;
+**	...
+**	mov\.u64	(%r[0-9]+), -1;
+**	brev\.b64	(%r[0-9]+), \3;
+**	setp\.[^.]+\.u64	%r[0-9]+, \4, -1;
+**	...
+**	mov\.u64	(%r[0-9]+), 1;
+**	brev\.b64	(%r[0-9]+), \5;
+**	setp\.[^.]+\.u64	%r[0-9]+, \6, -9223372036854775808;
+**	...
+**	mov\.u64	(%r[0-9]+), 2;
+**	brev\.b64	(%r[0-9]+), \7;
+**	setp\.[^.]+\.u64	%r[0-9]+, \8, 4611686018427387904;
+**	...
+*/
 
+/* { dg-final { scan-assembler-times {\tbrev\.b64\t} 70 } } */
+/* { dg-final { scan-assembler {\mabort\M} } } */
-- 
2.34.1



* nvptx: Fix copy'n'paste-o in '__builtin_nvptx_brev' description (was: [PATCH] nvptx: Add suppport for __builtin_nvptx_brev instrinsic)
  2023-05-06 16:04 [PATCH] nvptx: Add suppport for __builtin_nvptx_brev instrinsic Roger Sayle
  2023-05-19 21:36 ` Jeff Law
  2023-11-15 14:28 ` nvptx: Extend 'brev' test cases (was: [PATCH] nvptx: Add suppport for __builtin_nvptx_brev instrinsic) Thomas Schwinge
@ 2023-11-15 14:36 ` Thomas Schwinge
  2 siblings, 0 replies; 4+ messages in thread
From: Thomas Schwinge @ 2023-11-15 14:36 UTC (permalink / raw)
  To: Roger Sayle, gcc-patches; +Cc: Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 1820 bytes --]

Hi!

On 2023-05-06T17:04:57+0100, "Roger Sayle" <roger@nextmovesoftware.com> wrote:
> This patch adds support for (a pair of) bit reversal intrinsics
> __builtin_nvptx_brev and __builtin_nvptx_brevll which perform 32-bit
> and 64-bit bit reversal (using nvptx's brev instruction) matching
> the __brev and __brevll intrinsics provided by NVidia's nvcc compiler.
> https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html

(That got pushed in commit c09471fbc7588db2480f036aa56a2403d3c03ae5
"nvptx: Add suppport for __builtin_nvptx_brev instrinsic".)

> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi

> @@ -17941,6 +17942,20 @@ Enable global interrupt.
>  Disable global interrupt.
>  @enddefbuiltin
>
> +@node Nvidia PTX Built-in Functions
> +@subsection Nvidia PTX Built-in Functions
> +
> +These built-in functions are available for the Nvidia PTX target:
> +
> +@defbuiltin{unsigned int __builtin_nvptx_brev (unsigned int @var{x})}
> +Reverse the bit order of a 32-bit unsigned integer.
> +Disable global interrupt.

That last line is a copy'n'paste-o.  I've pushed the removal to master
branch in commit 4450984d0a18cd4e352d396231ba2c457d20feea
"nvptx: Fix copy'n'paste-o in '__builtin_nvptx_brev' description", see
attached.

> +@enddefbuiltin
> +
> +@defbuiltin{unsigned long long __builtin_nvptx_brevll (unsigned long long @var{x})}
> +Reverse the bit order of a 64-bit unsigned integer.
> +@enddefbuiltin
> +
>  @node Basic PowerPC Built-in Functions
>  @subsection Basic PowerPC Built-in Functions


Regards
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company; Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Commercial register: München, HRB 106955

[-- Attachment #2: 0001-nvptx-Fix-copy-n-paste-o-in-__builtin_nvptx_brev-des.patch --]
[-- Type: text/x-diff, Size: 1047 bytes --]

From 4450984d0a18cd4e352d396231ba2c457d20feea Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Mon, 4 Sep 2023 17:20:28 +0200
Subject: [PATCH] nvptx: Fix copy'n'paste-o in '__builtin_nvptx_brev'
 description

Minor fix-up for commit c09471fbc7588db2480f036aa56a2403d3c03ae5
"nvptx: Add suppport for __builtin_nvptx_brev instrinsic".

	gcc/
	* doc/extend.texi (Nvidia PTX Built-in Functions): Fix
	copy'n'paste-o in '__builtin_nvptx_brev' description.
---
 gcc/doc/extend.texi | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 406ccc9bc75..a95121b0124 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -18471,7 +18471,6 @@ These built-in functions are available for the Nvidia PTX target:
 
 @defbuiltin{unsigned int __builtin_nvptx_brev (unsigned int @var{x})}
 Reverse the bit order of a 32-bit unsigned integer.
-Disable global interrupt.
 @enddefbuiltin
 
 @defbuiltin{unsigned long long __builtin_nvptx_brevll (unsigned long long @var{x})}
-- 
2.34.1



