From mboxrd@z Thu Jan 1 00:00:00 1970
Received: by sourceware.org (Postfix, from userid 48) id 7A27D3857031; Tue, 17 Nov 2020 09:21:11 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 7A27D3857031
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/97832] AoSoA complex caxpy-like loops: AVX2+FMA -Ofast 7 times slower than -O3
Date: Tue, 17 Nov 2020 09:21:11 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 10.2.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: component
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
X-List-Received-Date: Tue, 17 Nov 2020 09:21:11 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |tree-optimization

--- Comment #4 from Richard Biener ---
Ah, thanks - that helps.  So we're re-associating from

  *_89 = (((*_89) - (f_re_34 * x_re_82)) - (f_im_35 * x_im_88));
  *_91 = (((*_91) + (f_im_35 * x_re_82)) - (f_re_34 * x_im_88));

to

  *_89 = ((*_89) - ((f_re_34 * x_re_82) + (f_im_35 * x_im_88)));
  *_91 = (((*_91) + (f_im_35 * x_re_82)) - (f_re_34 * x_im_88));

that makes the operations unbalanced.
This is (a - b) - c -> a - (b + c) as we're optimizing this as
a + -b + -c.

Even smaller testcase:

double a[1024], b[1024], c[1024];

void foo()
{
  for (int i = 0; i < 256; ++i)
    {
      a[2*i] = a[2*i] + b[2*i] - c[2*i];
      a[2*i+1] = a[2*i+1] - b[2*i+1] - c[2*i+1];
    }
}

here ranks end up associating the expr as (-b + -c) + a and negate
re-propagation goes (-b - c) + a -> -(b + c) + a -> a - (b + c)
which is all sensible in isolation.

You could say that associating as (-b + -c) + a is worse than
(a + -b) + -c in this respect.  Ranks are

Rank for _8 is 327683 (a)
Rank for _13 is 327684 (-b)
Rank for _21 is 327684 (-c)

where the rank is one more for the negated values because of the
negate operation.

While heuristically ignoring negates for rank propagation to make all
ranks equal helps this new testcase, it doesn't help for the larger two.
It might still be a generally sound heuristic improvement though.

For the effects on vectorization I think we need to do something in the
vectorizer itself, for example linearizing expressions.  The first reassoc
pass is supposed to do this but then negate re-propagation undoes it in
this case, which maybe points to that being what needs fixing, somehow
associating a non-negated operand first.