From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 7A27D3857031; Tue, 17 Nov 2020 09:21:11 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 7A27D3857031
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/97832] AoSoA complex caxpy-like loops:
 AVX2+FMA -Ofast 7 times slower than -O3
Date: Tue, 17 Nov 2020 09:21:11 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 10.2.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: component
Message-ID: <bug-97832-4-IIX5Y1ewqL@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-97832-4@http.gcc.gnu.org/bugzilla/>
References: <bug-97832-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 17 Nov 2020 09:21:11 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D97832

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ah, thanks - that helps.  So we're re-associating from

  *_89 =3D (((*_89) - (f_re_34 * x_re_82)) - (f_im_35 * x_im_88));
  *_91 =3D (((*_91) + (f_im_35 * x_re_82)) - (f_re_34 * x_im_88));

to

  *_89 =3D ((*_89) - ((f_re_34 * x_re_82) + (f_im_35 * x_im_88)));
  *_91 =3D (((*_91) + (f_im_35 * x_re_82)) - (f_re_34 * x_im_88));

that makes the operations unbalanced.  This is (a - b) - c -> a - (b + c)
as we're optimizing this as a + -b + -c.

Even smaller testcase:

double a[1024], b[1024], c[1024];

void foo()
{
  for (int i =3D 0; i < 256; ++i)
    {
      a[2*i] =3D a[2*i] + b[2*i] - c[2*i];
      a[2*i+1] =3D a[2*i+1] - b[2*i+1] - c[2*i+1];
    }
}

here ranks end up associating the expr as (-b + -c) + a and negate
re-propagation goes (-b - c) + a -> -(b + c) + a -> a - (b + c)
which is all sensible in isolation.

You could say that associating as (-b + -c) + a is worse than
(a + -b) + -c in this respect.  Ranks are

Rank for _8 is 327683 (a)
Rank for _13 is 327684 (-b)
Rank for _21 is 327684 (-c)

where the rank is one more for the negated values because of the
negate operation.  While heuristically ignoring negates for rank
propagation to make all ranks equal helps this new testcase it
doesn't help for the larger two.

It might still be a generally sound heuristic improvement though.

For the effects on vectorization I think we need to do sth in the
vectorizer itself, for example linearizing expressions.  The
first reassoc pass is supposed to do this but then negate
re-propagation undoes it in this case - which maybe points to
it that needs fixing, somehow associating a not negated operand
first.=