public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/98140] New: Reused register by xsmincdp leads to wrong NaN propagation on Power9
@ 2020-12-04 12:42 alexander.grund@tu-dresden.de
2020-12-04 16:09 ` [Bug target/98140] " alexander.grund@tu-dresden.de
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: alexander.grund@tu-dresden.de @ 2020-12-04 12:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98140
Bug ID: 98140
Summary: Reused register by xsmincdp leads to wrong NaN
propagation on Power9
Product: gcc
Version: 8.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: alexander.grund@tu-dresden.de
Target Milestone: ---
Created attachment 49679
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49679&action=edit
(preprocessed) source code to reproduce issue
Summary: xsmincdp instructions are generated in a form like `xsmincdp b,a,b`
for code that looks like `(a > b) ? b : a`
I was debugging an issue in PyTorch
(https://github.com/pytorch/pytorch/issues/48591) where I encountered the
following problem:
A clamp function is used which looks like this:
c[i] = a[i] < min_vec[i] ? min_vec[i] : (a[i] > max_vec[i] ? max_vec[i] :
a[i]);
This is used in very complex code using multiple levels of C++ templates,
lambdas and such and uses a combination of manually unrolled loops and
unroll-friendly loops (i.e. the above is called in a loop with a fixed trip
count of 8)
The generated ASM code has this (using objdump):
c[i] = a[i] < min_vec[i] ? min_vec[i] : (a[i] > max_vec[i] ? max_vec[i] :
a[i]);
8e970: 20 00 fe cb lfd f31,32(r30)
8e974: 00 f8 9c ff fcmpu cr7,f28,f31
8e978: 0c 00 9c 41 blt cr7,8e984
8e97c: 40 00 fe cb lfd f31,64(r30)
8e980: 40 fc fc f3 xsmincdp vs31,vs28,vs31
8e984: 28 00 9e cb lfd f28,40(r30)
So I assume f28/vs28 contains a[i] and vs31 contains max_vec[i], so the
instruction generated looks like `xsmincdp max_vec,a,max_vec` which on NaN will
return max_vec. However in the source code a should be returned due to the
condition evaluating to false when a NaN is involved.
Reproducing this is tricky, as it depends on many conditions. From my
observations I assume some register pressure is required and even some other
function also calling that code, so maybe some side effects from there. Using
GCC 10.2.0 I wasn't able to reproduce this as the codegen is slightly
different: Seemingly it notices that max_vec contains the same value for all i
and reuses a single register:
324: 00 70 1f fc fcmpu cr0,f31,f14
328: 90 f8 a0 fe fmr f21,f31
32c: 08 00 81 41 bgt 334
330: 40 74 be f2 xsmincdp vs21,vs30,vs14
334: 00 78 1f fc fcmpu cr0,f31,f15
338: 90 f8 c0 fe fmr f22,f31
33c: 08 00 81 41 bgt 344
340: 40 7c de f2 xsmincdp vs22,vs30,vs15
I'm attaching some source code which can be compiled using PyTorch 1.7.0 and 2
examples of preprocessed code which yield the above when compiled using `g++
-mcpu=power9 -g -std=gnu++14 -O3`
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Bug target/98140] Reused register by xsmincdp leads to wrong NaN propagation on Power9
2020-12-04 12:42 [Bug target/98140] New: Reused register by xsmincdp leads to wrong NaN propagation on Power9 alexander.grund@tu-dresden.de
@ 2020-12-04 16:09 ` alexander.grund@tu-dresden.de
2024-04-25 5:40 ` guojiufu at gcc dot gnu.org
2024-05-14 7:04 ` guojiufu at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: alexander.grund@tu-dresden.de @ 2020-12-04 16:09 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98140
--- Comment #1 from Alexander Grund <alexander.grund@tu-dresden.de> ---
It looks like this was fixed in 10.1 by this commit
https://github.com/gcc-mirror/gcc/commit/37e0df8a9be5a8232f4ccb73cdadb02121ba523f
However the codegen looks worse:
390: 20 00 9e c3 lfs f28,32(r30)
394: 00 e0 9e ff fcmpu cr7,f30,f28
398: 10 00 9c 41 blt cr7,3a8
39c: 40 00 9e c3 lfs f28,64(r30)
3a0: 58 e0 1e f0 xscmpgtdp vs0,vs30,vs28
3a4: 30 e0 9e f3 xxsel vs28,vs30,vs28,vs0
When switching an (a > b) ? b : a to an xsmincdp, the arguments must be swapped
to honor the NaN rules. This would allow this to be used for the `HONOR_NANS
(compare_mode)` case. However it still ignores signed zeros. Maybe xsmindp
would be a better fit as it preserves the signed zeros. Only downside I see is
that it converts sNan to qNan which may be an issue.
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Bug target/98140] Reused register by xsmincdp leads to wrong NaN propagation on Power9
2020-12-04 12:42 [Bug target/98140] New: Reused register by xsmincdp leads to wrong NaN propagation on Power9 alexander.grund@tu-dresden.de
2020-12-04 16:09 ` [Bug target/98140] " alexander.grund@tu-dresden.de
@ 2024-04-25 5:40 ` guojiufu at gcc dot gnu.org
2024-05-14 7:04 ` guojiufu at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2024-04-25 5:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98140
Jiu Fu Guo <guojiufu at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |guojiufu at gcc dot gnu.org
--- Comment #2 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Alexander Grund from comment #1)
> It looks like this was fixed in 10.1 by this commit
> https://github.com/gcc-mirror/gcc/commit/
> 37e0df8a9be5a8232f4ccb73cdadb02121ba523f
...
> `HONOR_NANS (compare_mode)` case. However it still ignores signed zeros.
'xsmincdp' may be fine for zeros, it seems '!HONOR_SIGNED_ZEROS' is not needed.
> Maybe xsmindp would be a better fit as it preserves the signed zeros. Only
> downside I see is that it converts sNan to qNan which may be an issue.
^ permalink raw reply [flat|nested] 4+ messages in thread
* [Bug target/98140] Reused register by xsmincdp leads to wrong NaN propagation on Power9
2020-12-04 12:42 [Bug target/98140] New: Reused register by xsmincdp leads to wrong NaN propagation on Power9 alexander.grund@tu-dresden.de
2020-12-04 16:09 ` [Bug target/98140] " alexander.grund@tu-dresden.de
2024-04-25 5:40 ` guojiufu at gcc dot gnu.org
@ 2024-05-14 7:04 ` guojiufu at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: guojiufu at gcc dot gnu.org @ 2024-05-14 7:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98140
--- Comment #3 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Jiu Fu Guo from comment #2)
> (In reply to Alexander Grund from comment #1)
> > It looks like this was fixed in 10.1 by this commit
> > https://github.com/gcc-mirror/gcc/commit/
> > 37e0df8a9be5a8232f4ccb73cdadb02121ba523f
> ...
> > `HONOR_NANS (compare_mode)` case. However it still ignores signed zeros.
>
> 'xsmincdp' may be fine for zeros, it seems '!HONOR_SIGNED_ZEROS' is not
> needed.
>
> > Maybe xsmindp would be a better fit as it preserves the signed zeros. Only
> > downside I see is that it converts sNan to qNan which may be an issue.
Oh, sorry. for -0 and +0, xsmincdp returns the 2nd operand, xsmindp returns
-0.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-05-14 7:04 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-04 12:42 [Bug target/98140] New: Reused register by xsmincdp leads to wrong NaN propagation on Power9 alexander.grund@tu-dresden.de
2020-12-04 16:09 ` [Bug target/98140] " alexander.grund@tu-dresden.de
2024-04-25 5:40 ` guojiufu at gcc dot gnu.org
2024-05-14 7:04 ` guojiufu at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).