From: "hubicka at kam dot mff.cuni.cz"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/103797] Clang vectorized LightPixel while GCC does not
Date: Wed, 22 Dec 2021 11:08:18 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103797

--- Comment #4 from hubicka at kam dot mff.cuni.cz ---
> -E and remove not needed code.
>
> > The
> > declarations are quite convoluted, but the function is well isolated and
> > easy to inspect from full one...
>
> Do we speak about:
> https://github.com/mozilla/gecko-dev/blob/bd25b1ca76dd5d323ffc69557f6cf759ba76ba23/gfx/2d/FilterNodeSoftware.cpp#L3670-L3691
> ?
Yes.
>
> It should be possible creating a synthetical test that does the same (and lives
> in a loop, right?).
Well, I tried that for a while and got a bit lost (the code got vectorized
either by both GCC and clang or by neither). There are more cases where we
have over a 50% regression relative to the clang build in the gfx code, so I
think I will first try to reproduce those locally and perf them to see whether
there is a broader pattern here.

The relevant code is:

uint32_t mozilla::gfx::{anonymous}::SpecularLightingSoftware::LightPixel
(struct SpecularLightingSoftware * const this, const struct Point3D & aNormal,
const struct Point3D & aVectorToLight, uint32_t aColor)
{
  [local count: 118111600]:
  _48 = MEM[(const struct BasePoint3D *)aVectorToLight_25(D)].D.75826.D.75829.z;
  _49 = _48 + 1.0e+0;
  _50 = MEM[(const struct BasePoint3D *)aVectorToLight_25(D)].D.75826.D.75829.y;
  _51 = _50 + 0.0;
  _52 = MEM[(const struct BasePoint3D *)aVectorToLight_25(D)].D.75826.D.75829.x;
  _53 = _52 + 0.0;
  _80 = _53 * _53;
  _82 = _51 * _51;
  _83 = _80 + _82;
  _85 = _49 * _49;
  _86 = _83 + _85;
  if (_86 u>= 0.0)
    goto ; [99.95%]
  else
    goto ; [0.05%]

  [local count: 118052545]:
  _87 = .SQRT (_86);
  goto ; [100.00%]

  [local count: 59055]:
  _29 = __builtin_sqrtf (_86);

  [local count: 118111600]:
  # _30 = PHI <_29(4), _87(3)>
  _88 = _53 / _30;
  _89 = _51 / _30;
  _90 = _49 / _30;
  _41 = MEM[(const struct BasePoint3D *)aNormal_26(D)].D.75826.D.75829.x;
  _39 = _41 * _88;
  _37 = MEM[(const struct BasePoint3D *)aNormal_26(D)].D.75826.D.75829.y;
  _33 = _37 * _89;
  _27 = _33 + _39;
  _45 = MEM[(const struct BasePoint3D *)aNormal_26(D)].D.75826.D.75829.z;
  _46 = _45 * _90;
  _47 = _27 + _46;
  if (_47 >= 0.0)
    goto ; [59.00%]
  else
    goto ; [41.00%]

With -Ofast it gets a bit more streamlined:

  [local count: 118111600]:
  _48 = MEM[(const struct BasePoint3D *)aVectorToLight_25(D)].D.75826.D.75829.z;
  _49 = _48 + 1.0e+0;
  _50 = MEM[(const struct BasePoint3D *)aVectorToLight_25(D)].D.75826.D.75829.y;
  _51 = MEM[(const struct BasePoint3D *)aVectorToLight_25(D)].D.75826.D.75829.x;
  powmult_78 = _51 * _51;
  powmult_80 = _50 * _50;
  _81 = powmult_78 + powmult_80;
  powmult_83 = _49 * _49;
  _84 = _81 + powmult_83;
  _85 = __builtin_sqrtf (_84);
  _86 = _51 / _85;
  _87 = _50 / _85;
  _88 = _49 / _85;
  _41 = MEM[(const struct BasePoint3D *)aNormal_26(D)].D.75826.D.75829.x;
  _39 = _41 * _86;
  _37 = MEM[(const struct BasePoint3D *)aNormal_26(D)].D.75826.D.75829.y;
  _33 = _37 * _87;
  _27 = _33 + _39;
  _45 = MEM[(const struct BasePoint3D *)aNormal_26(D)].D.75826.D.75829.z;
  _46 = _45 * _88;
  _47 = _27 + _46;
  if (_47 >= 0.0)
    goto ; [59.00%]
  else
    goto ; [41.00%]

But I do not quite see in the SLP dump why this is not considered for
vectorization. I am attaching the dump.

Honza
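
For reference, the arithmetic visible in the dumps can be written back as plain C++. This is only a sketch reconstructed from the GIMPLE, not code taken from the Gecko source: the Point3D struct and the name HalfwayDotNormal are hypothetical stand-ins, and the float type is assumed from the __builtin_sqrtf call. Whether it reproduces the missed SLP vectorization has not been checked.

```cpp
#include <cmath>

// Hypothetical stand-in for mozilla::gfx::Point3D; only x, y, z matter here.
struct Point3D {
  float x, y, z;
};

// Mirrors the dump: add (0, 0, 1) to the light vector, normalize the sum,
// then take its dot product with the normal (the value called _47 above).
static float HalfwayDotNormal(const Point3D& aNormal,
                              const Point3D& aVectorToLight) {
  float hx = aVectorToLight.x + 0.0f;
  float hy = aVectorToLight.y + 0.0f;
  float hz = aVectorToLight.z + 1.0f;

  // Length of the summed vector; corresponds to the sqrt in the dump.
  float len = std::sqrt(hx * hx + hy * hy + hz * hz);
  hx /= len;
  hy /= len;
  hz /= len;

  // Same summation order as the GIMPLE: y term + x term, then z term.
  return aNormal.y * hy + aNormal.x * hx + aNormal.z * hz;
}
```

The three multiplies, three divides by a common value, and the final three-term sum are the groups one would expect an SLP vectorizer to look at; calling the function from a per-pixel loop would be the natural next step for a synthetic test.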