[Bug target/98321] New: [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float'

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug target/98321] New: [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float'
@ 2020-12-16 14:39 tschwinge at gcc dot gnu.org
  2020-12-17 16:30 ` [Bug target/98321] " vries at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2020-12-16 14:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98321

            Bug ID: 98321
           Summary: [nvptx] 'atom.add.f32' for atomic add of 32-bit
                    'float'
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tschwinge at gcc dot gnu.org
                CC: vries at gcc dot gnu.org
  Target Milestone: ---
            Target: nvptx

Consider:

    TYPE f(TYPE a, TYPE b)
    {
      #pragma acc atomic update
      a += b;

      return a;
    }

Compiling always with '-fopenacc', for '-DTYPE=int'/'-DTYPE=long' I do see the
expected 'atom.add.u32'/'atom.add.u64', but for '-DTYPE=float' I do not see the
expected 'atom.add.f32' but instead an 'atom.cas.b32' loop.  (I understand that
'-DTYPE=double': 'atom.add.f64' depends on PTX 5.0, SM 6.0 support.)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/98321] [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float'
  2020-12-16 14:39 [Bug target/98321] New: [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float' tschwinge at gcc dot gnu.org
@ 2020-12-17 16:30 ` vries at gcc dot gnu.org
  2020-12-17 20:18 ` tschwinge at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: vries at gcc dot gnu.org @ 2020-12-17 16:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98321

--- Comment #1 from Tom de Vries <vries at gcc dot gnu.org> ---
Ok, let's first make a runnable test-case:
...
$ cat src/libgomp/testsuite/libgomp.oacc-c/test.c
#include <stdio.h>

#define TYPE float

TYPE a = 1;
TYPE b = 2;

int
main (void)
{

  printf ("A: %f\n", a);

#pragma acc parallel num_gangs (1) num_workers (1) copy (a, b)
#pragma acc atomic update
  a += b;

  printf ("A: %f\n", a);

  return !(a == 3);
}
...

Indeed we see the cas, but that has nothing to do with support in the nvptx
port:
...
                atom.cas.b32    %r29, [%r25], %r22, %r28;                       
...

This appears already at ompexp on the host, where we expand:
...
  #pragma omp atomic_load relaxed
    D.2555 = *D.2568

  <bb 4> :
  D.2557 = D.2555 + b.1;                                                        
  #pragma omp atomic_store relaxed (D.2557)
...
into:
...
  D.2583 = __atomic_load_4 (D.2582, 0);
  D.2584 = D.2583;

  <bb 4> :
  D.2585 = VIEW_CONVERT_EXPR<float>(D.2584);
  D.2586 = D.2585 + b.1;
  D.2587 = VIEW_CONVERT_EXPR<unsigned int>(D.2586);
  D.2588 = __sync_val_compare_and_swap_4 (D.2582, D.2584, D.2587);
...

This is part of a generic problem with offloading, where choices are made in
the host compiler which are suboptimal or even unsupported in the offload
compiler.

Ideally this should be addressed in the host compiler.

It may be possible to address this in the nvptx port by trying to detect the
unoptimal pattern and converting it to the optimal atom.add.f32.  But
ultimately that's a workaround, and it's better to fix this at the source.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/98321] [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float'
  2020-12-16 14:39 [Bug target/98321] New: [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float' tschwinge at gcc dot gnu.org
  2020-12-17 16:30 ` [Bug target/98321] " vries at gcc dot gnu.org
@ 2020-12-17 20:18 ` tschwinge at gcc dot gnu.org
  2020-12-17 22:15 ` vries at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2020-12-17 20:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98321

--- Comment #2 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
Thanks for having a look.


(In reply to Tom de Vries from comment #1)
> Ok, let's first make a runnable test-case:
> ...
> $ cat src/libgomp/testsuite/libgomp.oacc-c/test.c
> [...]
> Indeed we see the cas, but that has nothing to do with support in the nvptx
> port:
> ...
>                 atom.cas.b32    %r29, [%r25], %r22, %r28;                   
> 
> ...
> 
> This appears already at ompexp on the host, where we expand:
> [...]
> This is part of a generic problem with offloading, where choices are made in
> the host compiler which are suboptimal or even unsupported in the offload
> compiler.

Yes, I'm aware of that problem -- and we should do something about it.

> Ideally this should be addressed in the host compiler.

(Strike the "ideally"?)

> It may be possible to address this in the nvptx port by trying to detect the
> unoptimal pattern and converting it to the optimal atom.add.f32.  But
> ultimately that's a workaround, and it's better to fix this at the source.

I agree; don't see much point in investing effort in such a workaround (which
doesn't sound easy either).


However, my report was specifically for the nvptx target compiler.  Just
compile with 'nvptx-gcc -fopenacc -S' the code I posed, and compare
'-DTYPE=int'/'-DTYPE=long' vs. '-DTYPE=float'.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/98321] [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float'
  2020-12-16 14:39 [Bug target/98321] New: [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float' tschwinge at gcc dot gnu.org
  2020-12-17 16:30 ` [Bug target/98321] " vries at gcc dot gnu.org
  2020-12-17 20:18 ` tschwinge at gcc dot gnu.org
@ 2020-12-17 22:15 ` vries at gcc dot gnu.org
  2020-12-18  8:31 ` tschwinge at gcc dot gnu.org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: vries at gcc dot gnu.org @ 2020-12-17 22:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98321

--- Comment #3 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Thomas Schwinge from comment #2)
> However, my report was specifically for the nvptx target compiler.  Just
> compile with 'nvptx-gcc -fopenacc -S' the code I posed, and compare
> '-DTYPE=int'/'-DTYPE=long' vs. '-DTYPE=float'.


Ah, I was not aware of usage of openacc beyond the offloading setup.

For my understanding, is this just a way for you to easily reproduce some
problem really occurring elsewhere, or is this actually used for something?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/98321] [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float'
  2020-12-16 14:39 [Bug target/98321] New: [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float' tschwinge at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2020-12-17 22:15 ` vries at gcc dot gnu.org
@ 2020-12-18  8:31 ` tschwinge at gcc dot gnu.org
  2020-12-18 16:34 ` vries at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2020-12-18  8:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98321

--- Comment #4 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #3)
> (In reply to Thomas Schwinge from comment #2)
> > However, my report was specifically for the nvptx target compiler.  Just
> > compile with 'nvptx-gcc -fopenacc -S' the code I posed, and compare
> > '-DTYPE=int'/'-DTYPE=long' vs. '-DTYPE=float'.
> 
> Ah, I was not aware of usage of openacc beyond the offloading setup.

;-D

> For my understanding, is this just a way for you to easily reproduce some
> problem really occurring elsewhere, or is this actually used for something?

No, I don't think this has any practical use other than testing.

I had been looking into how/when PTX 'atom' is used for reductions, and first
had a look what the back end currently might emit at all, found SDIM
'atomic_fetch_add<mode>', and SF 'atomic_fetch_addsf'.  I tried to get these
used via '(void) __atomic_fetch_add (&a, b, __ATOMIC_RELAXED);', which works
fine for integer types, but 'error: operand type ‘float *’ is incompatible with
argument 1 of ‘__atomic_fetch_add’' (didn't research the rationale behind
that), so resorted to 'acc atomic'.  Further analysis to be done.  (Can
floating-point type atomic generally not be supported, given that
'__atomic_fetch_add' rejects it?  Is OMP atomic handling doing something wrong
for these even for nvptx target (real, not via offloading)?  Is something wrong
in the nvptx back end?)

This isn't important right now; I just filed the issue as I'd found it.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/98321] [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float'
  2020-12-16 14:39 [Bug target/98321] New: [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float' tschwinge at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2020-12-18  8:31 ` tschwinge at gcc dot gnu.org
@ 2020-12-18 16:34 ` vries at gcc dot gnu.org
  2021-07-06 11:18 ` tschwinge at gcc dot gnu.org
  2022-02-20  9:25 ` vries at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: vries at gcc dot gnu.org @ 2020-12-18 16:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98321

--- Comment #5 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Thomas Schwinge from comment #4)
> I had been looking into how/when PTX 'atom' is used for reductions, and
> first had a look what the back end currently might emit at all, found SDIM
> 'atomic_fetch_add<mode>', and SF 'atomic_fetch_addsf'.

Ack.

> I tried to get these
> used via '(void) __atomic_fetch_add (&a, b, __ATOMIC_RELAXED);', which works
> fine for integer types, but 'error: operand type ‘float *’ is incompatible
> with argument 1 of ‘__atomic_fetch_add’' (didn't research the rationale
> behind that), so resorted to 'acc atomic'.
> Further analysis to be done. 
> (Can floating-point type atomic generally not be supported, given that
> '__atomic_fetch_add' rejects it?  Is OMP atomic handling doing something
> wrong for these even for nvptx target (real, not via offloading)?  Is
> something wrong in the nvptx back end?)
> 

I don't know the rationale either, but at least it looks like documented
behaviour, both for the builtin and the pattern.

I don't see the backend doing anything wrong.

> This isn't important right now; I just filed the issue as I'd found it.

Ack, understood.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/98321] [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float'
  2020-12-16 14:39 [Bug target/98321] New: [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float' tschwinge at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2020-12-18 16:34 ` vries at gcc dot gnu.org
@ 2021-07-06 11:18 ` tschwinge at gcc dot gnu.org
  2022-02-20  9:25 ` vries at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2021-07-06 11:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98321

Thomas Schwinge <tschwinge at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-07-06
     Ever confirmed|0                           |1

--- Comment #6 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
Seems this will be a bigger task:
<http://mid.mail-archive.com/878s2kya3e.fsf@euler.schwinge.homeip.net> "Atomic
operations on floating-point data types".

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/98321] [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float'
  2020-12-16 14:39 [Bug target/98321] New: [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float' tschwinge at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2021-07-06 11:18 ` tschwinge at gcc dot gnu.org
@ 2022-02-20  9:25 ` vries at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: vries at gcc dot gnu.org @ 2022-02-20  9:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98321

Tom de Vries <vries at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

--- Comment #7 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Thomas Schwinge from comment #6)
> Seems this will be a bigger task:
> <http://mid.mail-archive.com/878s2kya3e.fsf@euler.schwinge.homeip.net>
> "Atomic operations on floating-point data types".

Link above is broken.  I found an email with this $subject at
https://gcc.gnu.org/pipermail/gcc/2021-July/236673.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-02-20  9:25 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-16 14:39 [Bug target/98321] New: [nvptx] 'atom.add.f32' for atomic add of 32-bit 'float' tschwinge at gcc dot gnu.org
2020-12-17 16:30 ` [Bug target/98321] " vries at gcc dot gnu.org
2020-12-17 20:18 ` tschwinge at gcc dot gnu.org
2020-12-17 22:15 ` vries at gcc dot gnu.org
2020-12-18  8:31 ` tschwinge at gcc dot gnu.org
2020-12-18 16:34 ` vries at gcc dot gnu.org
2021-07-06 11:18 ` tschwinge at gcc dot gnu.org
2022-02-20  9:25 ` vries at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).