public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result
@ 2022-09-10 18:58 jhllawrence963 at gmail dot com
  2022-09-10 19:07 ` [Bug target/106902] Program compiled with -O3 -mfma " pinskia at gcc dot gnu.org
                   ` (27 more replies)
  0 siblings, 28 replies; 29+ messages in thread
From: jhllawrence963 at gmail dot com @ 2022-09-10 18:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

            Bug ID: 106902
           Summary: Program compiled with -O3 -fmfa produces different
                    result
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jhllawrence963 at gmail dot com
  Target Milestone: ---

Created attachment 53560
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53560&action=edit
Sample C++ program

Compiling the attached sample program with g++ -mfma -O3 and executing it leads
to the wrong output starting with GCC version 11.1. The expected output is
approximately 0.905017, but the actual output is -415762. GCC 10.4 and earlier
work as expected. Compiling with other optimization flags or with -mno-fma
works as expected too.

About the program:
It starts with an array of 1s, performs a local average for each element, then
prints one result from the middle of the array. The algorithm has been reduced
to remove code that is not needed to reproduce the bug, which is why the
expected output is not exactly 1. The sample still contains extra code that is
not relevant to the bug, but removing it makes the bug no longer reproducible.
The relevant parts have been commented with "FIXME". I'm not 100% certain, but
there appears to be some loss of precision which gets compounded because the
result of one loop iteration is used as an input to the next iterations. The
program output becomes more incorrect as the input array size increases.

GCC Version:
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure
--enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-bootstrap
--prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/
--with-build-config=bootstrap-lto --with-linker-hash-style=gnu
--with-system-zlib --enable-__cxa_atexit --enable-cet=auto
--enable-checking=release --enable-clocale=gnu --enable-default-pie
--enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object
--enable-libstdcxx-backtrace --enable-link-serialization=1
--enable-linker-build-id --enable-lto --enable-multilib --enable-plugin
--enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch
--disable-werror
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (GCC)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug target/106902] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
@ 2022-09-10 19:07 ` pinskia at gcc dot gnu.org
  2022-09-12  8:01 ` [Bug target/106902] [11/12/13 Regression] " rguenth at gcc dot gnu.org
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-09-10 19:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note that -mfma should only increase precision: the multiply and add are done
in infinite precision before a single rounding step. So if you depend on exact
rounding after each operation, you need to use the -ffp-contract=off option.
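
As a rough illustration of the single rounding step described above, the
sketch below compares a multiply-then-add with two roundings against a
fused-like evaluation with one. It is only an approximation under stated
assumptions: it uses float operands and double arithmetic (the product of two
floats is exact in double) instead of a real fma call, so it does not depend
on FMA hardware or compiler flags. The function names are hypothetical and do
not come from the attached testcase.

```c
/* Product of two floats with an explicit rounding step. The volatile
   store keeps the compiler from fusing it into a later addition. */
static float rounded_product(float a, float b)
{
        volatile float p = a * b;
        return p;
}

/* a * b + c with one rounding after the multiply and one after the add. */
static float mul_then_add(float a, float b, float c)
{
        volatile float r = rounded_product(a, b) + c;
        return r;
}

/* Stand-in for a fused multiply-add (an assumption for illustration, not
   a full fmaf replacement): the product is exact in double, so no
   rounding happens between the multiply and the add. */
static float fused_like(float a, float b, float c)
{
        double exact_product = (double)a * (double)b;
        return (float)(exact_product + (double)c);
}
```

With a = b = 0x1.001p0f and c = -0x1.002p0f, mul_then_add returns 0 because
the product was already rounded to 0x1.002p0f, while fused_like returns
0x1p-24, the round-off tail of the product.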


* [Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
  2022-09-10 19:07 ` [Bug target/106902] Program compiled with -O3 -mfma " pinskia at gcc dot gnu.org
@ 2022-09-12  8:01 ` rguenth at gcc dot gnu.org
  2022-09-12 14:08 ` marxin at gcc dot gnu.org
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-09-12  8:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Target Milestone|---                         |11.4
      Known to fail|                            |11.2.0, 12.2.0
           Priority|P3                          |P2
      Known to work|                            |10.3.0
           Keywords|                            |needs-bisection, wrong-code
             Target|X86_64                      |x86_64-*-*
                 CC|                            |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
            Summary|Program compiled with -O3   |[11/12/13 Regression]
                   |-mfma produces different    |Program compiled with -O3
                   |result                      |-mfma produces different
                   |                            |result
   Last reconfirmed|                            |2022-09-12

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  -fno-tree-slp-vectorize avoids the issue.


* [Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
  2022-09-10 19:07 ` [Bug target/106902] Program compiled with -O3 -mfma " pinskia at gcc dot gnu.org
  2022-09-12  8:01 ` [Bug target/106902] [11/12/13 Regression] " rguenth at gcc dot gnu.org
@ 2022-09-12 14:08 ` marxin at gcc dot gnu.org
  2022-09-12 14:10 ` [Bug target/106902] [11/12 " marxin at gcc dot gnu.org
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: marxin at gcc dot gnu.org @ 2022-09-12 14:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
Fixed on master with r13-1450-gd2a898666609452e.


* [Bug target/106902] [11/12 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (2 preceding siblings ...)
  2022-09-12 14:08 ` marxin at gcc dot gnu.org
@ 2022-09-12 14:10 ` marxin at gcc dot gnu.org
  2022-09-13  7:06 ` [Bug target/106902] [11/12/13 " rguenth at gcc dot gnu.org
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: marxin at gcc dot gnu.org @ 2022-09-12 14:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[11/12/13 Regression]       |[11/12 Regression] Program
                   |Program compiled with -O3   |compiled with -O3 -mfma
                   |-mfma produces different    |produces different result
                   |result                      |
                 CC|                            |linkw at gcc dot gnu.org

--- Comment #4 from Martin Liška <marxin at gcc dot gnu.org> ---
Started with r11-4637-gf5e18dd9c7dacc96.


* [Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (3 preceding siblings ...)
  2022-09-12 14:10 ` [Bug target/106902] [11/12 " marxin at gcc dot gnu.org
@ 2022-09-13  7:06 ` rguenth at gcc dot gnu.org
  2022-09-14 15:20 ` amonakov at gcc dot gnu.org
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-09-13  7:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[11/12 Regression] Program  |[11/12/13 Regression]
                   |compiled with -O3 -mfma     |Program compiled with -O3
                   |produces different result   |-mfma produces different
                   |                            |result
             Blocks|                            |53947

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Martin Liška from comment #4)
> Fixed on master with r13-1450-gd2a898666609452e.
> Started with r11-4637-gf5e18dd9c7dacc96.

I believe both are unrelated.  The fix possibly caused a missed optimization
while the cause exposed some opportunity.

More analysis is needed here.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations


* [Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (4 preceding siblings ...)
  2022-09-13  7:06 ` [Bug target/106902] [11/12/13 " rguenth at gcc dot gnu.org
@ 2022-09-14 15:20 ` amonakov at gcc dot gnu.org
  2022-09-15  9:33 ` amonakov at gcc dot gnu.org
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-09-14 15:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #6 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
This is a lovely showcase of how optimizations cooperatively produce something
unexpected.

TL;DR: SLP introduces redundant computations and then fma formation contracts
some (but not all) of those, dramatically reducing numerical stability. In
principle that's similar to incorrectly "optimizing"

double f(double x)
{
  double y = x * x;
  return y - y;
}

(which is guaranteed to return either NaN or 0) to

double f(double x)
{
  return fma(x, x, -(x * x));
}

which returns the round-off tail of x * x (or NaN). I think there's already
another bug with a similar root cause.

In this bug, we begin with (note, all following examples are supposed to be
compiled without fma contraction, i.e. -O0, plain -O2, or -O2 -ffp-contract=off
if your target has fma):

#include <math.h>
#include <stdio.h>

double one = 1;

double b1 = 0x1.70e906b54fe4fp+1;
double b2 = -0x1.62adb4752c14ep+1;
double b3 = 0x1.c7001a6f3bd8p-1;
double B = 0x1.29c9034e7cp-13;

void f1(void)
{
        double x1 = 0, x2 = 0, x3 = 0;

        for (int i = 0; i < 99; ) {
                double t = B * one + x1 * b1 + x2 * b2 + x3 * b3;
                printf("%d %g\t%a\n", i++, t, t);

                x3 = x2, x2 = x1, x1 = t;
        }
}

predcom unrolls by 3 to get rid of moves:

void f2(void)
{
        double x1 = 0, x2 = 0, x3 = 0;

        for (int i = 0; i < 99; ) {
                x3 = B * one + x1 * b1 + x2 * b2 + x3 * b3;
                printf("%d %g\t%a\n", i++, x3, x3);

                x2 = B * one + x3 * b1 + x1 * b2 + x2 * b3;
                printf("%d %g\t%a\n", i++, x2, x2);

                x1 = B * one + x2 * b1 + x3 * b2 + x1 * b3;
                printf("%d %g\t%a\n", i++, x1, x1);
        }
}

SLP introduces some redundant vector computations:

typedef double f64v2 __attribute__((vector_size(16)));

void f3(void)
{
        double x1 = 0, x2 = 0, x3 = 0;

        f64v2 x32 = { 0 }, x21 = { 0 };

        for (int i = 0; i < 99; ) {
                x3 = B * one + x21[1] * b1 + x2 * b2 + x3 * b3;

                f64v2 x13b1 = { x21[1] * b1, x3 * b1 };

                x32 = B * one + x13b1 + x21 * b2 + x32 * b3;

                x2 = B * one + x3 * b1 + x1 * b2 + x2 * b3;

                f64v2 x13b2 = { b2 * x1, b2 * x32[0] };

                x21 = B * one + x32 * b1 + x13b2 + x21 * b3;

                x1 = B * one + x2 * b1 + x32[0] * b2 + x1 * b3;

                printf("%d %g\t%a\n", i++, x32[0], x32[0]);
                printf("%d %g\t%a\n", i++, x32[1], x32[1]);
                printf("%d %g\t%a\n", i++, x21[1], x21[1]);
        }
}

Note that this is still bit-identical to the initial function. But then
tree-ssa-math-opts "randomly" forms some FMAs:

f64v2 vfma(f64v2 x, f64v2 y, f64v2 z)
{
        return (f64v2){ fma(x[0], y[0], z[0]), fma(x[1], y[1], z[1]) };
}

void f4(void)
{
        f64v2 vone = { one, one }, vB = { B, B };
        f64v2 vb1 = { b1, b1 }, vb2 = { b2, b2 }, vb3 = { b3, b3 };

        double x1 = 0, x2 = 0, x3 = 0;

        f64v2 x32 = { 0 }, x21 = { 0 };

        for (int i = 0; i < 99; ) {
                x3 = fma(b3, x3, fma(b2, x2, fma(B, one, x21[1] * b1)));

                f64v2 x13b1 = { x21[1] * b1, x3 * b1 };

                x32 = vfma(vb3, x32, vfma(vb2, x21, vfma(vB, vone, x13b1)));

                x2 = fma(b3, x2, b2 * x1 + fma(B, one, x3 * b1));

                f64v2 x13b2 = { b2 * x1, b2 * x32[0] };

                x21 = vfma(vb3, x21, x13b2 + vfma(vB, vone, x32 * vb1));

                x1 = fma(b3, x1, b2 * x32[0] + fma(B, one, b1 * x2));

                printf("%d %g\t%a\n", i++, x32[0], x32[0]);
                printf("%d %g\t%a\n", i++, x32[1], x32[1]);
                printf("%d %g\t%a\n", i++, x21[1], x21[1]);
        }
}

and here some of the redundantly computed values are computed differently
depending on where rounding after multiplication was omitted. Somehow this is
enough to make the computation explode numerically.
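
The earlier claim that the unrolled form is still bit-identical to the
original loop can be checked with a small harness. The sketch below is an
assumption-laden rework of f1 and f2, not the IR GCC produces: it stores
results in arrays instead of printing them, and it routes every product
through a hypothetical mul helper with a volatile store so that no FMA can be
formed regardless of compiler flags. The helper names (mul, step, run_rolled,
run_unrolled, forms_agree) are made up for this illustration.

```c
#include <string.h>

#define N 99

static const double one = 1;
static const double b1 = 0x1.70e906b54fe4fp+1;
static const double b2 = -0x1.62adb4752c14ep+1;
static const double b3 = 0x1.c7001a6f3bd8p-1;
static const double B = 0x1.29c9034e7cp-13;

/* Multiply with an explicit rounding step; the volatile store keeps the
   compiler from fusing the product into a following addition. */
static double mul(double a, double b)
{
        volatile double p = a * b;
        return p;
}

/* One step of the recurrence from f1, summed left to right. */
static double step(double x1, double x2, double x3)
{
        return mul(B, one) + mul(x1, b1) + mul(x2, b2) + mul(x3, b3);
}

/* Original form: shift the three state variables after each step. */
static void run_rolled(double out[N])
{
        double x1 = 0, x2 = 0, x3 = 0;

        for (int i = 0; i < N; i++) {
                double t = step(x1, x2, x3);
                out[i] = t;
                x3 = x2, x2 = x1, x1 = t;
        }
}

/* predcom-style form: unrolled by 3, the variables rotate roles instead
   of being moved. */
static void run_unrolled(double out[N])
{
        double x1 = 0, x2 = 0, x3 = 0;

        for (int i = 0; i < N; ) {
                x3 = step(x1, x2, x3); out[i++] = x3;
                x2 = step(x3, x1, x2); out[i++] = x2;
                x1 = step(x2, x3, x1); out[i++] = x1;
        }
}

/* Returns 1 when both forms produce the same bit patterns. */
static int forms_agree(void)
{
        double a[N], b[N];

        run_rolled(a);
        run_unrolled(b);
        return memcmp(a, b, sizeof a) == 0;
}
```

As long as every copy of a value is computed with the same roundings,
forms_agree() returns 1. The failure described above comes from redundant
copies of the same value being rounded differently.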


* [Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (5 preceding siblings ...)
  2022-09-14 15:20 ` amonakov at gcc dot gnu.org
@ 2022-09-15  9:33 ` amonakov at gcc dot gnu.org
  2022-09-17 18:19 ` jhllawrence963 at gmail dot com
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-09-15  9:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

--- Comment #7 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Lawrence, thank you for the nice work reducing the testcase. For RawTherapee
the recommended course of action would be to compile everything with
-ffp-contract=off, then manually reintroduce use of fma in
performance-sensitive places by testing the FP_FAST_FMA macro to know if
hardware fma is available. This way you'll know that all systems without fma
get the same results, and all systems with fma also get the same results (but
different from the former).

For example, my function 'f1' could be adapted like this:

void f1(void)
{
        double x1 = 0, x2 = 0, x3 = 0;

        for (int i = 0; i < 99; ) {
                double t;
#ifdef FP_FAST_FMA
                t = fma(x1, b1, fma(x2, b2, fma(x3, b3, B * one)));
#else
                t = B * one + x1 * b1 + x2 * b2 + x3 * b3;
#endif
                printf("%d %g\t%a\n", i++, t, t);

                x3 = x2, x2 = x1, x1 = t;
        }
}


* [Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (6 preceding siblings ...)
  2022-09-15  9:33 ` amonakov at gcc dot gnu.org
@ 2022-09-17 18:19 ` jhllawrence963 at gmail dot com
  2022-09-17 18:23 ` jhllawrence963 at gmail dot com
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: jhllawrence963 at gmail dot com @ 2022-09-17 18:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

Lawrence Lee <jhllawrence963 at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #53560|0                           |1
        is obsolete|                            |

--- Comment #8 from Lawrence Lee <jhllawrence963 at gmail dot com> ---
Created attachment 53588
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53588&action=edit
Updated Sample Program


* [Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (7 preceding siblings ...)
  2022-09-17 18:19 ` jhllawrence963 at gmail dot com
@ 2022-09-17 18:23 ` jhllawrence963 at gmail dot com
  2022-09-19  7:08 ` rguenth at gcc dot gnu.org
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: jhllawrence963 at gmail dot com @ 2022-09-17 18:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

--- Comment #9 from Lawrence Lee <jhllawrence963 at gmail dot com> ---
Thank you Alexander for the recommendation.

I don't know if this helps, but I updated the sample code to make the issue
reproducible with the GCC trunk build available on godbolt.org. I just
introduced an unused function parameter.


* [Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (8 preceding siblings ...)
  2022-09-17 18:23 ` jhllawrence963 at gmail dot com
@ 2022-09-19  7:08 ` rguenth at gcc dot gnu.org
  2022-09-19  7:14 ` amonakov at gcc dot gnu.org
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-09-19  7:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
           Keywords|wrong-code                  |
         Resolution|---                         |INVALID

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Thanks Alexander for the analysis.  The situation seems to be impossible to
avoid in general so I think it isn't a bug but just very unfortunate and the
suggested fixes sound correct.


* [Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (9 preceding siblings ...)
  2022-09-19  7:08 ` rguenth at gcc dot gnu.org
@ 2022-09-19  7:14 ` amonakov at gcc dot gnu.org
  2022-09-19  7:25 ` rguenth at gcc dot gnu.org
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-09-19  7:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

--- Comment #11 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Can we move -ffp-contract=fast under the -ffast-math umbrella and default to
-ffp-contract=on/off?

Isn't it easy now to implement -ffp-contract=on by a GENERIC-only match.pd
rule?


* [Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (10 preceding siblings ...)
  2022-09-19  7:14 ` amonakov at gcc dot gnu.org
@ 2022-09-19  7:25 ` rguenth at gcc dot gnu.org
  2022-09-19  8:14 ` amonakov at gcc dot gnu.org
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-09-19  7:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jsm28 at gcc dot gnu.org

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #11)
> Can we move -ffp-contract=fast under the -ffast-math umbrella and default to
> -ffp-contract=on/off?

That's probably a question for the frontend maintainers.

> Isn't it easy now to implement -ffp-contract=on by a GENERIC-only match.pd
> rule?

You mean in the frontend only for -ffp-contract=on?  Maybe, I suppose FE
specific folding would also work in that case.  One would also need to read
the fine print in the language standards again as to whether FP contraction
allows forming an FMA for

 double tem = a * b;
 double res = tem + c;

or across inlined function call boundaries which we'll happily do.

Of course for the testcase at hand it's all in
a single statement and no parens specify association (in case parens also
matter here, like in Fortran).  The fortran frontend adds PAREN_EXPRs
as association barriers, which would also prevent FMAs from being formed.


* [Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (11 preceding siblings ...)
  2022-09-19  7:25 ` rguenth at gcc dot gnu.org
@ 2022-09-19  8:14 ` amonakov at gcc dot gnu.org
  2022-09-19  9:44 ` rguenth at gcc dot gnu.org
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-09-19  8:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

--- Comment #13 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #12)
> > Isn't it easy now to implement -ffp-contract=on by a GENERIC-only match.pd
> > rule?
> 
> You mean in the frontend only for -ffp-contract=on?

Yes. 

> Maybe, I suppose FE
> specific folding would also work in that case.  One would also need to read
> the fine prints in the language standards again as to whether FP contraction
> allows to form FMA for
> 
>  double tem = a * b;
>  double res = tem + c;
> 
> or across inlined function call boundaries which we'll happily do.

In C contraction is allowed only within an expression (hence a difference
between -ffp-contract=fast vs. -ffp-contract=on).

The original testcase was in C++, I think C++ does not specify it, but
hopefully we'd aim to implement the same semantics as for C.

> Of course for the testcase at hand it's all in
> a single statement and no parens specify association (in case parens also
> matter here, like in Fortran).  The fortran frontend adds PAREN_EXPRs
> as association barriers which also would prevent FMAs to be formed.

Please note that in this testcase GCC is breaking language semantics by
computing the same value in two different ways, and then using different
computed values in dependent computations. This could not have happened in the
abstract machine (there's a singular assignment in the original program, which
is then used in subsequent iterations of the loop).


* [Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (12 preceding siblings ...)
  2022-09-19  8:14 ` amonakov at gcc dot gnu.org
@ 2022-09-19  9:44 ` rguenth at gcc dot gnu.org
  2022-09-27 18:31 ` amonakov at gcc dot gnu.org
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-09-19  9:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|INVALID                     |---
             Status|RESOLVED                    |NEW

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #13)
> (In reply to Richard Biener from comment #12)
> > Of course for the testcase at hand it's all in
> > a single statement and no parens specify association (in case parens also
> > matter here, like in Fortran).  The fortran frontend adds PAREN_EXPRs
> > as association barriers which also would prevent FMAs to be formed.
> 
> Please note that in this testcase GCC is breaking language semantics by
> computing the same value in two different ways, and then using different
> computed values in dependent computations. This could not have happened in
> the abstract machine (there's a singular assignment in the original program,
> which is then used in subsequent iterations of the loop).

Hmm, OK.  I think we have a separate bug report for this kind of thing.  I can't
seem to reproduce any vectorization for your smaller example though.

For SLP vectorization the main source of "duplication" is when we have
unvectorized uses of an SSA name (aka a LIVE def) and cannot use a lane
extract but retain the scalar computations.  In the updated Sample Program
this happens once but the corresponding subgraph is then not profitable
to vectorize for me, so it must be something else.


* [Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (13 preceding siblings ...)
  2022-09-19  9:44 ` rguenth at gcc dot gnu.org
@ 2022-09-27 18:31 ` amonakov at gcc dot gnu.org
  2022-09-29  6:41 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-09-27 18:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

--- Comment #15 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #14)
> I can't
> seem to reproduce any vectorization for your smaller example though.

My small C samples omit some detail as they were meant to illustrate what
happened in the IR. Is that a problem?

By the way, I noticed that tree-ssa-math-opts incorrectly handles
-ffp-contract:

  if (FLOAT_TYPE_P (type)
      && flag_fp_contract_mode == FP_CONTRACT_OFF)
    return false;

It should be 'flag_fp_contract_mode != FP_CONTRACT_FAST' instead (the pass
doesn't have any idea about expression boundaries). It dates back to
g:1694907238eb
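
To make the difference between the quoted check and the suggested one
concrete, the sketch below models both as standalone predicates. The enum is
a hypothetical stand-in declared only for this example (it is not copied from
the GCC sources), the function names are made up, and the FLOAT_TYPE_P part
of the real condition is left out.

```c
/* Hypothetical stand-in for the three -ffp-contract settings. */
enum fp_contract_mode {
        FP_CONTRACT_OFF,
        FP_CONTRACT_ON,
        FP_CONTRACT_FAST
};

/* Models the quoted check: only OFF stops the pass from forming FMAs. */
static int may_form_fma_current(enum fp_contract_mode mode)
{
        return mode != FP_CONTRACT_OFF;
}

/* Models the suggested check: the pass cannot see expression boundaries,
   so it forms FMAs only under FAST. */
static int may_form_fma_suggested(enum fp_contract_mode mode)
{
        return mode == FP_CONTRACT_FAST;
}
```

The two predicates agree for OFF and FAST and differ only for ON, which is
the mode whose expression-boundary rule the pass has no way to honor.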


* [Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (14 preceding siblings ...)
  2022-09-27 18:31 ` amonakov at gcc dot gnu.org
@ 2022-09-29  6:41 ` rguenth at gcc dot gnu.org
  2022-09-29 11:28 ` amonakov at gcc dot gnu.org
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-09-29  6:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #15)
> (In reply to Richard Biener from comment #14)
> > I can't
> > seem to reproduce any vectorization for your smaller example though.
> 
> My small C samples omit some detail as they were meant to illustrate what
> happened in the IR. Is that a problem?

Not a problem - it would have made life easier of course ;)

> By the way, I noticed that tree-ssa-math-opts incorrectly handles
> -ffp-contract:
> 
>   if (FLOAT_TYPE_P (type)
>       && flag_fp_contract_mode == FP_CONTRACT_OFF)
>     return false;
> 
> It should be 'flag_fp_contract_mode != FP_CONTRACT_FAST' instead (the pass
> doesn't have any idea about expression boundaries). It dates back to
> g:1694907238eb

Ah - feel free to fix that (I think such a change would be obvious, even better
when accompanied by a comment).  I do think that since the only way to
preserve expression boundaries is by PAREN_EXPR that the middle-end
shouldn't care about FAST vs. ON (well, it cannot), but the language
frontends need to make sure to emit PAREN_EXPRs for =ON and omit them for
=FAST.

Since we don't implement ON the above check should indeed be changed until
that's fixed.


* [Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (15 preceding siblings ...)
  2022-09-29  6:41 ` rguenth at gcc dot gnu.org
@ 2022-09-29 11:28 ` amonakov at gcc dot gnu.org
  2022-09-29 13:39 ` rguenther at suse dot de
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-09-29 11:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

--- Comment #17 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #16)
> I do think that since the only way to
> preserve expression boundaries is by PAREN_EXPR

Yes, but...

>  that the middle-end
> shouldn't care about FAST vs. ON (well, it cannot), but the language
> frontends need to ensure to emit PAREN_EXPRs for =ON and omit them for
> =FAST.

this will also prevent reassociation across statements. Doing FMA
contraction in the frontends via a match.pd rule doesn't have this drawback.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (16 preceding siblings ...)
  2022-09-29 11:28 ` amonakov at gcc dot gnu.org
@ 2022-09-29 13:39 ` rguenther at suse dot de
  2022-09-30  6:17 ` amonakov at gcc dot gnu.org
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: rguenther at suse dot de @ 2022-09-29 13:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

--- Comment #18 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 29 Sep 2022, amonakov at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902
> 
> --- Comment #17 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #16)
> > I do think that since the only way to
> > preserve expression boundaries is by PAREN_EXPR
> 
> Yes, but...
> 
> >  that the middle-end
> > shouldn't care about FAST vs. ON (well, it cannot), but the language
> > frontends need to ensure to emit PAREN_EXPRs for =ON and omit them for
> > =FAST.
> 
> this will also prevent reassociation across statements too. Doing FMA
> contraction in the frontends via a match.pd rule doesn't have this drawback.

True - but does that catch the cases people are interested in and that are
allowed by the FP contraction rules?  I'm thinking of

 x = a*b + c*d + e + f;

with -fassociative-math we can form two FMAs here?  Of course with
strict IEEE compliance but allowed FP contraction we can only
do FMA (a, b, c*d) + e + f, right?  Does that mean -ffp-contract=on
only makes sense in absence of any other -ffast-math flags?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (17 preceding siblings ...)
  2022-09-29 13:39 ` rguenther at suse dot de
@ 2022-09-30  6:17 ` amonakov at gcc dot gnu.org
  2023-05-11 17:32 ` [Bug target/106902] [11/12/13/14 " amonakov at gcc dot gnu.org
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: amonakov at gcc dot gnu.org @ 2022-09-30  6:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

--- Comment #19 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #18)
> True - but does that catch the cases people are interested and are
> allowed by the FP contraction rules?  I'm thinking of
> 
>  x = a*b + c*d + e + f;
> 
> with -fassociative-math we can form two FMAs here?

Yes; it might be reasonable to limit the match.pd rule to
-fno-associative-math, leaving mul/adds as-is for tree-ssa-math-opts to
recombine otherwise.

>  Of course with
> strict IEEE compliance but allowed FP contraction we can only
> do FMA (a, b, c*d) + e + f, right?

I think so.

>  Does that mean -ffp-contract=on
> only makes sense in absence of any other -ffast-math flags?

Well, the proposal was to make -ffp-contract=fast an '-ffast-math' flag, not
=on. I don't want to judge if the '-ffp-contract=on -ffast-math' combination is
reasonable or not, because -ffast-math is by itself quite nonsensical already.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug target/106902] [11/12/13/14 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (18 preceding siblings ...)
  2022-09-30  6:17 ` amonakov at gcc dot gnu.org
@ 2023-05-11 17:32 ` amonakov at gcc dot gnu.org
  2023-05-12  6:27 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: amonakov at gcc dot gnu.org @ 2023-05-11 17:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

--- Comment #20 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
I missed it the first time around, but placing PAREN_EXPR around the complete
expression won't work: nothing will prevent GCC from duplicating evaluations of
the sub-expressions, and then randomly forming FMAs like here. It would just
bury this class of bugs deeper.

Now that we are in stage1, can we make some kind of progress here? Is there any
buy-in for:

1. Implementing fp-contract=on via GENERIC folding?
2. Defaulting to fp-contract=on instead of fp-contract=fast under -std=gnu*?
3. Enabling fp-contract=fast under -ffast-math?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug target/106902] [11/12/13/14 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (19 preceding siblings ...)
  2023-05-11 17:32 ` [Bug target/106902] [11/12/13/14 " amonakov at gcc dot gnu.org
@ 2023-05-12  6:27 ` rguenth at gcc dot gnu.org
  2023-05-17 18:49 ` amonakov at gcc dot gnu.org
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-12  6:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #20)
> I missed it the first time around, but placing PAREN_EXPR around the
> complete expression won't work: nothing will prevent GCC from duplicating
> evaluations of the sub-expressions, and then randomly forming FMAs like
> here. It would just bury this class of bugs deeper.

Hmm, true.

> Now that we are in stage1, can we make some kind of progress here? Is there
> any buy-in for:
> 
> 1. Implementing fp-contract=on via GENERIC folding?
> 2. Defaulting to fp-contract=on instead of fp-contract=fast under -std=gnu*?
> 3. Enabling fp-contract=fast under -ffast-math?

Sounds reasonable.  Though I wouldn't use GENERIC folding but instead
some folding-like code in c-family/ that for example would get invoked
by genericization or via the gimplification hook?  If we'd add GENERIC
folding in fold-const.cc or match.pd the chance is that it will pick up
FMAs "late".

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug target/106902] [11/12/13/14 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (20 preceding siblings ...)
  2023-05-12  6:27 ` rguenth at gcc dot gnu.org
@ 2023-05-17 18:49 ` amonakov at gcc dot gnu.org
  2023-05-17 18:54 ` pinskia at gcc dot gnu.org
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: amonakov at gcc dot gnu.org @ 2023-05-17 18:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

--- Comment #22 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Created attachment 55105
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55105&action=edit
patch 1/3

(In reply to Richard Biener from comment #21)
> 
> Sounds reasonable.  Though I wouldn't use GENERIC folding but instead
> some folding-like code in c-family/ that for example would get invoked
> by genericization or via the gimplification hook?  If we'd add GENERIC
> folding in fold-const.cc or match.pd the chance is that it will pick up
> FMAs "late".

Agreed, thank you. I'm working on it. The attached patch implements this via
c_gimplify_expr and passes bootstrap+regtest under 'configure
--with-cpu=znver2' (i.e. with fma available by default).

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug target/106902] [11/12/13/14 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (21 preceding siblings ...)
  2023-05-17 18:49 ` amonakov at gcc dot gnu.org
@ 2023-05-17 18:54 ` pinskia at gcc dot gnu.org
  2023-05-18  5:53 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-17 18:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

--- Comment #23 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #22)
> Created attachment 55105 [details]
> patch 1/3
> 
> (In reply to Richard Biener from comment #21)
> > 
> > Sounds reasonable.  Though I wouldn't use GENERIC folding but instead
> > some folding-like code in c-family/ that for example would get invoked
> > by genericization or via the gimplification hook?  If we'd add GENERIC
> > folding in fold-const.cc or match.pd the chance is that it will pick up
> > FMAs "late".
> 
> Agreed, thank you. I'm working on it. The attached patch implements this via
> c_gimplify_expr and passes bootstrap+regtest under 'configure
> --with-cpu=znver2' (i.e. with fma available by default).

Hmm, seems like this should not be in the C family but in the generic part of
the gimplifier, because IIRC Fortran has similar rules, though IIRC the Fortran
front end emits PAREN_EXPR a lot more, which improves the situation there ...

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug target/106902] [11/12/13/14 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (22 preceding siblings ...)
  2023-05-17 18:54 ` pinskia at gcc dot gnu.org
@ 2023-05-18  5:53 ` rguenth at gcc dot gnu.org
  2023-05-18  8:31 ` amonakov at gcc dot gnu.org
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-05-18  5:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

--- Comment #24 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #23)
> (In reply to Alexander Monakov from comment #22)
> > Created attachment 55105 [details]
> > patch 1/3
> > 
> > (In reply to Richard Biener from comment #21)
> > > 
> > > Sounds reasonable.  Though I wouldn't use GENERIC folding but instead
> > > some folding-like code in c-family/ that for example would get invoked
> > > by genericization or via the gimplification hook?  If we'd add GENERIC
> > > folding in fold-const.cc or match.pd the chance is that it will pick up
> > > FMAs "late".
> > 
> > Agreed, thank you. I'm working on it. The attached patch implements this via
> > c_gimplify_expr and passes bootstrap+regtest under 'configure
> > --with-cpu=znver2' (i.e. with fma available by default).
> 
> Hmm, seems like this should not be in the C family but the generic part of
> gimplifier. Because IIRC Fortran has similar rules but IIRC fortran
> front-end emits PAREN_EXPR a lot more which improves the situation there ...

The actual worker can be put into generic code but frontends need to set
the rules here I think as they might differ slightly.

As for the patch, it looks good.  I wonder if we want to check for OPTIMIZE_BOTH
though, since at least when no extra negations are required the contraction
should also be a win when optimizing for size?

Also I wondered about the PROP_gimple_any check - do we get into the
gimplification langhook after lowering?  I see we are not resetting the
langhook after lowering (only in free-lang-data, but that only runs with LTO).
We probably at least should gate the langhook invocation in the gimplifier
with what you added in the patch or specify whether the gimplifier is
invoked from the middle-end via the gimplifier context.

If we go for c-family only the genericize entry could be another place to
handle this.

Did you run into any of NON_LVALUE / C_MAYBE_CONST wrappings of the
multiplication btw?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug target/106902] [11/12/13/14 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (23 preceding siblings ...)
  2023-05-18  5:53 ` rguenth at gcc dot gnu.org
@ 2023-05-18  8:31 ` amonakov at gcc dot gnu.org
  2023-05-18 16:03 ` amonakov at gcc dot gnu.org
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: amonakov at gcc dot gnu.org @ 2023-05-18  8:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

--- Comment #25 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #24)
> As of the patch it looks good, I wonder if we want to check for OPTIMIZE_BOTH
> though since at least when no extra negations are required the contraction
> should also be a win when optimizing for size?

Makes sense, I'll change that (current target hooks always return true for
fma).

> Also I wondered about the PROP_gimple_any check - do we get into the
> gimplification langhook after lowering?  I see we are not resetting the
> langhook after lowering (only in free-lang-data, but that only runs with
> LTO).

Yes, that surprised me. I caught it when analyzing ICE on slp-50.c testcase.

> We probably at least should gate the langhook invocation in the gimplifier
> with what you added in the patch or specify whether the gimplifier is
> invoked from the middle-end via the gimplifier context.

Perhaps. I'll add a comment that we want to handle -ffp-contract=on strictly
during initial gimplification, to hash this out further on gcc-patches, if
necessary.  

> If we go for c-family only the genericize entry could be another place to
> handle this.

That seems less convenient to me. Is IFN_FMA representable as a tree?

> Did you run into any of NON_LVALUE / C_MAYBE_CONST wrappings of the
> multiplication btw?

No, I'm not familiar with those, so I didn't try to construct corresponding
testcases.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug target/106902] [11/12/13/14 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (24 preceding siblings ...)
  2023-05-18  8:31 ` amonakov at gcc dot gnu.org
@ 2023-05-18 16:03 ` amonakov at gcc dot gnu.org
  2023-05-18 16:52 ` rguenther at suse dot de
  2023-05-29 10:07 ` jakub at gcc dot gnu.org
  27 siblings, 0 replies; 29+ messages in thread
From: amonakov at gcc dot gnu.org @ 2023-05-18 16:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

--- Comment #26 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
> > Did you run into any of NON_LVALUE / C_MAYBE_CONST wrappings of the
> > multiplication btw?
> 
> No, I'm not familiar with those, so I didn't try to construct corresponding
> testcases.

I had a look now. My understanding is they are eliminated in c_fully_fold, so
c_gimplify_expr will not encounter those trees.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug target/106902] [11/12/13/14 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (25 preceding siblings ...)
  2023-05-18 16:03 ` amonakov at gcc dot gnu.org
@ 2023-05-18 16:52 ` rguenther at suse dot de
  2023-05-29 10:07 ` jakub at gcc dot gnu.org
  27 siblings, 0 replies; 29+ messages in thread
From: rguenther at suse dot de @ 2023-05-18 16:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

--- Comment #27 from rguenther at suse dot de <rguenther at suse dot de> ---
> On 18.05.2023 at 10:31, amonakov at gcc dot gnu.org <gcc-bugzilla@gcc.gnu.org> wrote:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902
> 
> --- Comment #25 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #24)
>> As of the patch it looks good, I wonder if we want to check for OPTIMIZE_BOTH
>> though since at least when no extra negations are required the contraction
>> should also be a win when optimizing for size?
> 
> Makes sense, I'll change that (current target hooks always return true for
> fma).
> 
>> Also I wondered about the PROP_gimple_any check - do we get into the
>> gimplification langhook after lowering?  I see we are not resetting the
>> langhook after lowering (only in free-lang-data, but that only runs with
>> LTO).
> 
> Yes, that surprised me. I caught it when analyzing ICE on slp-50.c testcase.
> 
>> We probably at least should gate the langhook invocation in the gimplifier
>> with what you added in the patch or specify whether the gimplifier is
>> invoked from the middle-end via the gimplifier context.
> 
> Perhaps. I'll add a comment that we want to handle -ffp-contract=on strictly
> during initial gimplification, to hash this out further on gcc-patches, if
> necessary.  
> 
>> If we go for c-family only the genericize entry could be another place to
>> handle this.
> 
> That seems less convenient to me. Is IFN_FMA representable as a tree?

Yes, that’s possible.  Let’s see if others have an opinion on the mailing list.

>> Did you run into any of NON_LVALUE / C_MAYBE_CONST wrappings of the
>> multiplication btw?
> 
> No, I'm not familiar with those, so I didn't try to construct corresponding
> testcases.
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Bug target/106902] [11/12/13/14 Regression] Program compiled with -O3 -mfma produces different result
  2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
                   ` (26 preceding siblings ...)
  2023-05-18 16:52 ` rguenther at suse dot de
@ 2023-05-29 10:07 ` jakub at gcc dot gnu.org
  27 siblings, 0 replies; 29+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-05-29 10:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|11.4                        |11.5

--- Comment #28 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 11.4 is being released, retargeting bugs to GCC 11.5.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-05-29 10:07 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa produces different result jhllawrence963 at gmail dot com
2022-09-10 19:07 ` [Bug target/106902] Program compiled with -O3 -mfma " pinskia at gcc dot gnu.org
2022-09-12  8:01 ` [Bug target/106902] [11/12/13 Regression] " rguenth at gcc dot gnu.org
2022-09-12 14:08 ` marxin at gcc dot gnu.org
2022-09-12 14:10 ` [Bug target/106902] [11/12 " marxin at gcc dot gnu.org
2022-09-13  7:06 ` [Bug target/106902] [11/12/13 " rguenth at gcc dot gnu.org
2022-09-14 15:20 ` amonakov at gcc dot gnu.org
2022-09-15  9:33 ` amonakov at gcc dot gnu.org
2022-09-17 18:19 ` jhllawrence963 at gmail dot com
2022-09-17 18:23 ` jhllawrence963 at gmail dot com
2022-09-19  7:08 ` rguenth at gcc dot gnu.org
2022-09-19  7:14 ` amonakov at gcc dot gnu.org
2022-09-19  7:25 ` rguenth at gcc dot gnu.org
2022-09-19  8:14 ` amonakov at gcc dot gnu.org
2022-09-19  9:44 ` rguenth at gcc dot gnu.org
2022-09-27 18:31 ` amonakov at gcc dot gnu.org
2022-09-29  6:41 ` rguenth at gcc dot gnu.org
2022-09-29 11:28 ` amonakov at gcc dot gnu.org
2022-09-29 13:39 ` rguenther at suse dot de
2022-09-30  6:17 ` amonakov at gcc dot gnu.org
2023-05-11 17:32 ` [Bug target/106902] [11/12/13/14 " amonakov at gcc dot gnu.org
2023-05-12  6:27 ` rguenth at gcc dot gnu.org
2023-05-17 18:49 ` amonakov at gcc dot gnu.org
2023-05-17 18:54 ` pinskia at gcc dot gnu.org
2023-05-18  5:53 ` rguenth at gcc dot gnu.org
2023-05-18  8:31 ` amonakov at gcc dot gnu.org
2023-05-18 16:03 ` amonakov at gcc dot gnu.org
2023-05-18 16:52 ` rguenther at suse dot de
2023-05-29 10:07 ` jakub at gcc dot gnu.org
