public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/98140] New: Reused register by xsmincdp leads to wrong NaN propagation on Power9
@ 2020-12-04 12:42 alexander.grund@tu-dresden.de
  2020-12-04 16:09 ` [Bug target/98140] " alexander.grund@tu-dresden.de
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: alexander.grund@tu-dresden.de @ 2020-12-04 12:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98140

            Bug ID: 98140
           Summary: Reused register by xsmincdp leads to wrong NaN
                    propagation on Power9
           Product: gcc
           Version: 8.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: alexander.grund@tu-dresden.de
  Target Milestone: ---

Created attachment 49679
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49679&action=edit
(preprocessed) source code to reproduce issue

Summary: xsmincdp instructions are generated in a form like `xsmincdp b,a,b`
for code that looks like `(a > b) ? b : a`

I was debugging an issue in PyTorch
(https://github.com/pytorch/pytorch/issues/48591) where I encountered the
following problem:

A clamp function is used which looks like this:

c[i] = a[i] < min_vec[i] ? min_vec[i] : (a[i] > max_vec[i] ? max_vec[i] :
a[i]);

This is used in very complex code using multiple levels of C++ templates,
lambdas and such and uses a combination of manually unrolled loops and
unroll-friendly loops (i.e. the above is called in a loop with a fixed trip
count of 8)

The generated ASM code has this (using objdump):
    c[i] = a[i] < min_vec[i] ? min_vec[i] : (a[i] > max_vec[i] ? max_vec[i] :
a[i]);
   8e970:       20 00 fe cb     lfd     f31,32(r30)
   8e974:       00 f8 9c ff     fcmpu   cr7,f28,f31
   8e978:       0c 00 9c 41     blt     cr7,8e984 
   8e97c:       40 00 fe cb     lfd     f31,64(r30)
   8e980:       40 fc fc f3     xsmincdp vs31,vs28,vs31
   8e984:       28 00 9e cb     lfd     f28,40(r30)


So I assume f28/vs28 contains a[i] and vs31 contains max_vec[i], so the
instruction generated looks like `xsmincdp max_vec,a,max_vec` which on NaN will
return max_vec. However in the source code a should be returned due to the
condition evaluating to false when a NaN is involved.

Reproducing this is tricky, as it depends on many conditions. From my
observations I assume some register pressure is required and even some other
function also calling that code, so maybe some side effects from there. Using
GCC 10.2.0 I wasn't able to reproduce this as the codegen is slightly
different: Seemingly it notices that max_vec contains the same value for all i
and reuses a single register:
     324:       00 70 1f fc     fcmpu   cr0,f31,f14
     328:       90 f8 a0 fe     fmr     f21,f31
     32c:       08 00 81 41     bgt     334
     330:       40 74 be f2     xsmincdp vs21,vs30,vs14
     334:       00 78 1f fc     fcmpu   cr0,f31,f15
     338:       90 f8 c0 fe     fmr     f22,f31
     33c:       08 00 81 41     bgt     344
     340:       40 7c de f2     xsmincdp vs22,vs30,vs15

I'm attaching some source code which can be compiled using PyTorch 1.7.0 and 2
examples of preprocessed code which yield the above when compiled using `g++ 
-mcpu=power9 -g -std=gnu++14 -O3`

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug target/98140] Reused register by xsmincdp leads to wrong NaN propagation on Power9
  2020-12-04 12:42 [Bug target/98140] New: Reused register by xsmincdp leads to wrong NaN propagation on Power9 alexander.grund@tu-dresden.de
@ 2020-12-04 16:09 ` alexander.grund@tu-dresden.de
  2024-04-25  5:40 ` guojiufu at gcc dot gnu.org
  2024-05-14  7:04 ` guojiufu at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: alexander.grund@tu-dresden.de @ 2020-12-04 16:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98140

--- Comment #1 from Alexander Grund <alexander.grund@tu-dresden.de> ---
It looks like this was fixed in 10.1 by this commit
https://github.com/gcc-mirror/gcc/commit/37e0df8a9be5a8232f4ccb73cdadb02121ba523f

However the codegen looks worse:

     390:       20 00 9e c3     lfs     f28,32(r30)
     394:       00 e0 9e ff     fcmpu   cr7,f30,f28
     398:       10 00 9c 41     blt     cr7,3a8
     39c:       40 00 9e c3     lfs     f28,64(r30)
     3a0:       58 e0 1e f0     xscmpgtdp vs0,vs30,vs28
     3a4:       30 e0 9e f3     xxsel   vs28,vs30,vs28,vs0

When switching an (a > b) ? b : a to an xsmincdp, the arguments must be swapped
to honor the NaN rules. This would allow this to be used for the `HONOR_NANS
(compare_mode)` case. However it still ignores signed zeros. Maybe xsmindp
would be a better fit as it preserves the signed zeros. Only downside I see is
that it converts sNan to qNan which may be an issue.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug target/98140] Reused register by xsmincdp leads to wrong NaN propagation on Power9
  2020-12-04 12:42 [Bug target/98140] New: Reused register by xsmincdp leads to wrong NaN propagation on Power9 alexander.grund@tu-dresden.de
  2020-12-04 16:09 ` [Bug target/98140] " alexander.grund@tu-dresden.de
@ 2024-04-25  5:40 ` guojiufu at gcc dot gnu.org
  2024-05-14  7:04 ` guojiufu at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2024-04-25  5:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98140

Jiu Fu Guo <guojiufu at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |guojiufu at gcc dot gnu.org

--- Comment #2 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Alexander Grund from comment #1)
> It looks like this was fixed in 10.1 by this commit
> https://github.com/gcc-mirror/gcc/commit/
> 37e0df8a9be5a8232f4ccb73cdadb02121ba523f
...
> `HONOR_NANS (compare_mode)` case. However it still ignores signed zeros.

'xsmincdp' may be fine for zeros, it seems '!HONOR_SIGNED_ZEROS' is not needed.

> Maybe xsmindp would be a better fit as it preserves the signed zeros. Only
> downside I see is that it converts sNan to qNan which may be an issue.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug target/98140] Reused register by xsmincdp leads to wrong NaN propagation on Power9
  2020-12-04 12:42 [Bug target/98140] New: Reused register by xsmincdp leads to wrong NaN propagation on Power9 alexander.grund@tu-dresden.de
  2020-12-04 16:09 ` [Bug target/98140] " alexander.grund@tu-dresden.de
  2024-04-25  5:40 ` guojiufu at gcc dot gnu.org
@ 2024-05-14  7:04 ` guojiufu at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2024-05-14  7:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98140

--- Comment #3 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Jiu Fu Guo from comment #2)
> (In reply to Alexander Grund from comment #1)
> > It looks like this was fixed in 10.1 by this commit
> > https://github.com/gcc-mirror/gcc/commit/
> > 37e0df8a9be5a8232f4ccb73cdadb02121ba523f
> ...
> > `HONOR_NANS (compare_mode)` case. However it still ignores signed zeros.
> 
> 'xsmincdp' may be fine for zeros, it seems '!HONOR_SIGNED_ZEROS' is not
> needed.
> 
> > Maybe xsmindp would be a better fit as it preserves the signed zeros. Only
> > downside I see is that it converts sNan to qNan which may be an issue.
Oh, sorry. for -0 and +0, xsmincdp returns the 2nd operand,  xsmindp returns
-0.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-05-14  7:04 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-04 12:42 [Bug target/98140] New: Reused register by xsmincdp leads to wrong NaN propagation on Power9 alexander.grund@tu-dresden.de
2020-12-04 16:09 ` [Bug target/98140] " alexander.grund@tu-dresden.de
2024-04-25  5:40 ` guojiufu at gcc dot gnu.org
2024-05-14  7:04 ` guojiufu at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).