[Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

From: "amonakov at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/106902] [11/12/13 Regression] Program compiled with -O3 -mfma produces different result
Date: Wed, 14 Sep 2022 15:20:13 +0000	[thread overview]
Message-ID: <bug-106902-4-h85nPs5LeF@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-106902-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106902

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #6 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
This is a lovely showcase how optimizations cooperatively produce something
unexpected.

TL;DR: SLP introduces redundant computations and then fma formation contracts
some (but not all) of those, dramatically reducing numerical stability. In
principle that's similar to incorrectly "optimizing"

double f(double x)
{
  double y = x * x;
  return y - y;
}

(which is guaranteed to return either NaN or 0) to

double f(double x)
{
  return fma(x, x, -(x * x));
}

which returns the round-off tail of x * x (or NaN). I think there's already
another bug with a similar root cause.

In this bug, we begin with (note, all following examples are supposed to be
compiled without fma contraction, i.e. -O0, plain -O2, or -O2 -ffp-contract=off
if your target has fma):

#include <math.h>
#include <stdio.h>

double one = 1;

double b1 = 0x1.70e906b54fe4fp+1;
double b2 = -0x1.62adb4752c14ep+1;
double b3 = 0x1.c7001a6f3bd8p-1;
double B = 0x1.29c9034e7cp-13;

void f1(void)
{
        double x1 = 0, x2 = 0, x3 = 0;

        for (int i = 0; i < 99; ) {
                double t = B * one + x1 * b1 + x2 * b2 + x3 * b3;
                printf("%d %g\t%a\n", i++, t, t);

                x3 = x2, x2 = x1, x1 = t;
        }
}

predcom unrolls by 3 to get rid of moves:

void f2(void)
{
        double x1 = 0, x2 = 0, x3 = 0;

        for (int i = 0; i < 99; ) {
                x3 = B * one + x1 * b1 + x2 * b2 + x3 * b3;
                printf("%d %g\t%a\n", i++, x3, x3);

                x2 = B * one + x3 * b1 + x1 * b2 + x2 * b3;
                printf("%d %g\t%a\n", i++, x2, x2);

                x1 = B * one + x2 * b1 + x3 * b2 + x1 * b3;
                printf("%d %g\t%a\n", i++, x1, x1);
        }
}

SLP introduces some redundant vector computations:

typedef double f64v2 __attribute__((vector_size(16)));

void f3(void)
{
        double x1 = 0, x2 = 0, x3 = 0;

        f64v2 x32 = { 0 }, x21 = { 0 };

        for (int i = 0; i < 99; ) {
                x3 = B * one + x21[1] * b1 + x2 * b2 + x3 * b3;

                f64v2 x13b1 = { x21[1] * b1, x3 * b1 };

                x32 = B * one + x13b1 + x21 * b2 + x32 * b3;

                x2 = B * one + x3 * b1 + x1 * b2 + x2 * b3;

                f64v2 x13b2 = { b2 * x1, b2 * x32[0] };

                x21 = B * one + x32 * b1 + x13b2 + x21 * b3;

                x1 = B * one + x2 * b1 + x32[0] * b2 + x1 * b3;

                printf("%d %g\t%a\n", i++, x32[0], x32[0]);
                printf("%d %g\t%a\n", i++, x32[1], x32[1]);
                printf("%d %g\t%a\n", i++, x21[1], x21[1]);
        }
}

Note that this is still bit-identical to the initial function. But then
tree-ssa-math-opts "randomly" forms some FMAs:

f64v2 vfma(f64v2 x, f64v2 y, f64v2 z)
{
        return (f64v2){ fma(x[0], y[0], z[0]), fma(x[1], y[1], z[1]) };
}

void f4(void)
{
        f64v2 vone = { one, one }, vB = { B, B };
        f64v2 vb1 = { b1, b1 }, vb2 = { b2, b2 }, vb3 = { b3, b3 };

        double x1 = 0, x2 = 0, x3 = 0;

        f64v2 x32 = { 0 }, x21 = { 0 };

        for (int i = 0; i < 99; ) {
                x3 = fma(b3, x3, fma(b2, x2, fma(B, one, x21[1] * b1)));

                f64v2 x13b1 = { x21[1] * b1, x3 * b1 };

                x32 = vfma(vb3, x32, vfma(vb2, x21, vfma(vB, vone, x13b1)));

                x2 = fma(b3, x2, b2 * x1 + fma(B, one, x3 * b1));

                f64v2 x13b2 = { b2 * x1, b2 * x32[0] };

                x21 = vfma(vb3, x21, x13b2 + vfma(vB, vone, x32 * vb1));

                x1 = fma(b3, x1, b2 * x32[0] + fma(B, one, b1 * x2));

                printf("%d %g\t%a\n", i++, x32[0], x32[0]);
                printf("%d %g\t%a\n", i++, x32[1], x32[1]);
                printf("%d %g\t%a\n", i++, x21[1], x21[1]);
        }
}

and here some of the redundantly computed values are computed differently
depending on where rounding after multiplication was omitted. Somehow this is
enough to make the computation explode numerically.

next prev parent reply	other threads:[~2022-09-14 15:20 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-10 18:58 [Bug tree-optimization/106902] New: Program compiled with -O3 -fmfa " jhllawrence963 at gmail dot com
2022-09-10 19:07 ` [Bug target/106902] Program compiled with -O3 -mfma " pinskia at gcc dot gnu.org
2022-09-12  8:01 ` [Bug target/106902] [11/12/13 Regression] " rguenth at gcc dot gnu.org
2022-09-12 14:08 ` marxin at gcc dot gnu.org
2022-09-12 14:10 ` [Bug target/106902] [11/12 " marxin at gcc dot gnu.org
2022-09-13  7:06 ` [Bug target/106902] [11/12/13 " rguenth at gcc dot gnu.org
2022-09-14 15:20 ` amonakov at gcc dot gnu.org [this message]
2022-09-15  9:33 ` amonakov at gcc dot gnu.org
2022-09-17 18:19 ` jhllawrence963 at gmail dot com
2022-09-17 18:23 ` jhllawrence963 at gmail dot com
2022-09-19  7:08 ` rguenth at gcc dot gnu.org
2022-09-19  7:14 ` amonakov at gcc dot gnu.org
2022-09-19  7:25 ` rguenth at gcc dot gnu.org
2022-09-19  8:14 ` amonakov at gcc dot gnu.org
2022-09-19  9:44 ` rguenth at gcc dot gnu.org
2022-09-27 18:31 ` amonakov at gcc dot gnu.org
2022-09-29  6:41 ` rguenth at gcc dot gnu.org
2022-09-29 11:28 ` amonakov at gcc dot gnu.org
2022-09-29 13:39 ` rguenther at suse dot de
2022-09-30  6:17 ` amonakov at gcc dot gnu.org
2023-05-11 17:32 ` [Bug target/106902] [11/12/13/14 " amonakov at gcc dot gnu.org
2023-05-12  6:27 ` rguenth at gcc dot gnu.org
2023-05-17 18:49 ` amonakov at gcc dot gnu.org
2023-05-17 18:54 ` pinskia at gcc dot gnu.org
2023-05-18  5:53 ` rguenth at gcc dot gnu.org
2023-05-18  8:31 ` amonakov at gcc dot gnu.org
2023-05-18 16:03 ` amonakov at gcc dot gnu.org
2023-05-18 16:52 ` rguenther at suse dot de
2023-05-29 10:07 ` jakub at gcc dot gnu.org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-106902-4-h85nPs5LeF@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).