public inbox for gcc-bugs@sourceware.org
* [Bug c/52889] New: incorrect sign of _mm_nmsub_XX intrinsics in fma4intrin.h
@ 2012-04-06 16:31 MathiasPuetz at gmx dot de
  2023-12-17  2:08 ` [Bug target/52889] " pinskia at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: MathiasPuetz at gmx dot de @ 2012-04-06 16:31 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52889

             Bug #: 52889
           Summary: incorrect sign of _mm_nmsub_XX intrinsics in
                    fma4intrin.h
    Classification: Unclassified
           Product: gcc
           Version: 4.6.2
            Status: UNCONFIRMED
          Severity: critical
          Priority: P3
         Component: c
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: MathiasPuetz@gmx.de


Created attachment 27106
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27106
corrected fma4intrin.h include file

All _mm_nmsub_xx and _mm256_nmsub_xx intrinsic definitions for FMA4
instructions in the fma4intrin.h header file are incorrect.

nmsub(a,b,c) should compute the equivalent of -( a*b - c) = c - a*b.

However the fma4intrin.h file maps

   nmsub(a,b,c) ->  madd(-(a),b,-(c)) -> -a*b - c

i.e. the sign in front of the c operand is erroneous.
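
As a rough illustration of the two formulas above, here is a scalar sketch in C. The function names are hypothetical and the code only models the arithmetic described in this report; it is not taken from fma4intrin.h and does not use the FMA4 instructions themselves.

```c
/* Scalar models of the two formulas in the report. The function names are
   hypothetical; this is not code from fma4intrin.h. */

/* What the report expects nmsub(a,b,c) to compute: -(a*b - c) = c - a*b. */
float nmsub_expected_in_report(float a, float b, float c)
{
    return c - a * b;
}

/* What the report says the header maps to: madd(-(a), b, -(c)) = -a*b - c. */
float nmsub_as_mapped(float a, float b, float c)
{
    return (-a) * b + (-c);
}
```

For a = 1, b = 2, c = 3 the first function returns 1 and the second returns -5, so the two readings differ by 2*c.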

The impact of this bug is that code which actively uses the _mm_nmsub_xx
intrinsics gives incorrect results.

The attached fma4intrin.h file has all signs properly corrected
and can be used as drop-in replacement for fma4intrin.h in GCC 4.6.2.
The bug is also present in 4.6.1.

I have not checked the correctness of the earlier GCC 4.5.x versions or the newer GCC 4.7.
However the fma4intrin.h mapping has changed from 4.5 -> 4.6,
which might likely have introduced the error.

Best regards,
Mathias Puetz / Cray Inc.



* [Bug target/52889] incorrect sign of _mm_nmsub_XX intrinsics in fma4intrin.h
  2012-04-06 16:31 [Bug c/52889] New: incorrect sign of _mm_nmsub_XX intrinsics in fma4intrin.h MathiasPuetz at gmx dot de
@ 2023-12-17  2:08 ` pinskia at gcc dot gnu.org
  2023-12-17  2:16 ` pinskia at gcc dot gnu.org
  2023-12-17 11:00 ` MathiasPuetz at gmx dot de
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-12-17  2:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52889

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|critical                    |normal

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I am not sure how important this is, since FMA4 was only supported on a few AMD
chips and is no longer implemented on newer AMD chips (since Zen 1).
I will double check to see if the definition was fixed though ...


* [Bug target/52889] incorrect sign of _mm_nmsub_XX intrinsics in fma4intrin.h
  2012-04-06 16:31 [Bug c/52889] New: incorrect sign of _mm_nmsub_XX intrinsics in fma4intrin.h MathiasPuetz at gmx dot de
  2023-12-17  2:08 ` [Bug target/52889] " pinskia at gcc dot gnu.org
@ 2023-12-17  2:16 ` pinskia at gcc dot gnu.org
  2023-12-17 11:00 ` MathiasPuetz at gmx dot de
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-12-17  2:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52889

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I tested:
```
#include <x86intrin.h>
extern  __m128 a,b,c;
void foo(){
   a = _mm_nmsub_ps(a,b,c);
}
```
GCC correctly produces:
foo:
        vmovaps a(%rip), %xmm1
        vmovaps b(%rip), %xmm2
        vfnmsubps       c(%rip), %xmm2, %xmm1, %xmm0
        vmovaps %xmm0, a(%rip)
        ret

> However the fma4intrin.h mapping has changed from 4.5 -> 4.6,
> which might likely have introduced the error.

No, r0-103863-g89509419968e2b (which was included in GCC 4.6.0) fixed the
definition.


* [Bug target/52889] incorrect sign of _mm_nmsub_XX intrinsics in fma4intrin.h
  2012-04-06 16:31 [Bug c/52889] New: incorrect sign of _mm_nmsub_XX intrinsics in fma4intrin.h MathiasPuetz at gmx dot de
  2023-12-17  2:08 ` [Bug target/52889] " pinskia at gcc dot gnu.org
  2023-12-17  2:16 ` pinskia at gcc dot gnu.org
@ 2023-12-17 11:00 ` MathiasPuetz at gmx dot de
  2 siblings, 0 replies; 4+ messages in thread
From: MathiasPuetz at gmx dot de @ 2023-12-17 11:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52889

--- Comment #3 from MathiasPuetz at gmx dot de ---
Hi Andrew,
I only vaguely remember this after 11 (!) years.
The generated code looks OK at first sight.
However, the reference doc (e.g.
https://www.cs.ucr.edu/~csong/cs153/refs/amd64-vol4-media.pdf 581) shows a
different operand order for the vfnmsubpd instruction than the output of the
GNU assembler (vfnmsubps dest,a,b,c vs. c,b,a,dest).
This could just be a peculiarity of the GNU assembly mnemonic definitions.
You should check the results of the code, though:
A=1
B=2
C=3
should return 3-1*2=1 according to the AMD ref guide. Just looking at the code
won’t tell.
I remember that I checked with the Intel compiler as well and got the expected
result using the same intrinsic code. When I tried with GNU, I didn’t, which
caused me to investigate.
I can’t tell whether the AMD ref guide is actually correct. Maybe there
was an erratum that I am not aware of, and Intel rectified this in their
intrinsic definitions.
If the code doesn’t produce the expected result, you would need to talk to AMD
to get to the bottom of this (the mistake might be in their docs).
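
One way to interpret the outcome of such a run is sketched below in plain C. It assumes the observed value has already been obtained by executing `_mm_nmsub_ps` on FMA4-capable hardware, which this sketch does not do; the enum and function names are hypothetical and only label the two sign conventions discussed in this thread.

```c
/* Hypothetical labels for the two sign conventions discussed in this thread. */
enum nmsub_convention {
    NMSUB_C_MINUS_AB,      /* c - a*b, the reading in the original report */
    NMSUB_NEG_AB_MINUS_C,  /* -(a*b) - c, the other reading               */
    NMSUB_UNKNOWN          /* observed value matches neither formula      */
};

/* Absolute difference of two floats, written out to avoid needing libm. */
float abs_diff(float x, float y)
{
    return (x > y) ? (x - y) : (y - x);
}

/* Report which convention an observed result agrees with.
   If both formulas give the same value (e.g. c == 0), the first label wins,
   so inputs should be chosen so that the two formulas differ. */
enum nmsub_convention classify_nmsub(float a, float b, float c, float observed)
{
    const float tol = 1e-6f;  /* assumed tolerance; exact for small integers */
    if (abs_diff(observed, c - a * b) <= tol)
        return NMSUB_C_MINUS_AB;
    if (abs_diff(observed, -(a * b) - c) <= tol)
        return NMSUB_NEG_AB_MINUS_C;
    return NMSUB_UNKNOWN;
}
```

With the inputs 1, 2 and 3, an observed 1 would be classified as `NMSUB_C_MINUS_AB` and an observed -5 as `NMSUB_NEG_AB_MINUS_C`, so these inputs do distinguish the two readings.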

Anyway, AMD still supports FMA4 code on their latest Epyc CPUs.
Most new binaries won’t run into the issue, as newer compilers would rather
generate AVX256/512 instructions, which are faster on newer hardware. So it’s
not 100% obsolete, but it’s indeed unlikely that someone would practically run
into this except for comparing some old benchmarks after so many years.

Mathias
ParTec AG

