From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/98854] [11 Regression] cray benchmark is about 15% slower since r11-4428-g4a369d199bf2f34e
Date: Wed, 27 Jan 2021 13:46:43 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 11.0
X-Bugzilla-Severity: normal
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 11.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98854

--- Comment #3 from Richard Biener ---
OK, one can see it with BB vectorization enabled vs. disabled.

Bad:

Samples: 7K of event 'cycles:u', Event count (approx.): 7540324763
Overhead  Samples  Command  Shared Object  Symbol
  53.11%     3711  a.out    a.out          [.] shade
  25.39%     1774  a.out    a.out          [.] trace
  18.16%     1271  a.out    a.out          [.] render_scanline
   1.56%      109  a.out    libm-2.26.so   [.] __ieee754_pow_sse2

Good:

Samples: 6K of event 'cycles:u', Event count (approx.): 6673802579
Overhead  Samples  Command  Shared Object  Symbol
  61.21%     3857  a.out    a.out          [.] shade
  20.44%     1288  a.out    a.out          [.] trace
  14.42%      912  a.out    a.out          [.] render_scanline
   1.81%      114  a.out    libm-2.26.so   [.] __ieee754_pow_sse2

With added -fwhole-program we have

c-ray-mt.c:624:18: optimized: basic block part vectorized using 32 byte vectors
c-ray-mt.c:372:13: optimized: basic block part vectorized using 32 byte vectors
c-ray-mt.c:372:13: optimized: basic block part vectorized using 32 byte vectors
c-ray-mt.c:432:9: optimized: basic block part vectorized using 32 byte vectors
c-ray-mt.c:656:7: optimized: basic block part vectorized using 32 byte vectors
c-ray-mt.c:656:7: optimized: basic block part vectorized using 32 byte vectors
c-ray-mt.c:265:23: optimized: basic block part vectorized using 32 byte vectors

:372 is bad and then :656

For the first we vectorize a store

   [local count: 31445960]:
  # nearest_obj_239 = PHI ...
  _816 = {nearest_sp_pos_x_lsm.258_78, nearest_sp_pos_y_lsm.259_174,
nearest_sp_pos_z_lsm.260_201, nearest_sp_normal_x_lsm.261_200};
  _820 = {nearest_sp_normal_y_lsm.262_122, nearest_sp_normal_z_lsm.263_293,
nearest_sp_vref_x_lsm.264_124, nearest_sp_vref_y_lsm.265_148};
  iter_231 = iter_363->next;
  if (iter_231 != 0B)
    goto ; [89.00%]
  else
    goto ; [11.00%]

   [local count: 27986904]:
  goto ; [100.00%]

   [local count: 3459055]:
  # nearest_sp_dist_lsm.257_228 = PHI
  # nearest_sp_pos_x_lsm.258_226 = PHI
  # nearest_sp_normal_y_lsm.262_343 = PHI
  # nearest_sp_vref_x_lsm.264_238 = PHI
  # nearest_sp_vref_y_lsm.265_237 = PHI
  # nearest_sp_vref_z_lsm.266_236 = PHI
  # nearest_sp_pos_y_lsm.259_342 = PHI
  # nearest_sp_normal_x_lsm.261_351 = PHI
  # nearest_sp_pos_z_lsm.260_304 = PHI
  # nearest_obj_197 = PHI
  # nearest_sp_normal_z_lsm.263_821 = PHI
  # vect_nearest_sp_pos_x_lsm.258_226.268_815 = PHI <_816(26)>
  # vect_nearest_sp_pos_x_lsm.258_226.268_814 = PHI <_820(26)>
  nearest_sp.vref.z = nearest_sp_vref_z_lsm.266_236;
  MEM [(double *)&nearest_sp] = vect_nearest_sp_pos_x_lsm.258_226.268_815;
  _812 = &nearest_sp.pos.x + 32;
  MEM [(double *)_812] = vect_nearest_sp_pos_x_lsm.258_226.268_814;

but we insert the vector CTOR on a path that's more often executed than
the use.  And since there's no sinking pass after vectorization nothing
fixes this up.
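
To make the placement problem concrete, here is a minimal source-level sketch of the two shapes. It is not the c-ray code and not GCC output: struct node, pack_in_loop and pack_sunk are hypothetical names, and a plain array of four doubles stands in for the 32 byte vector. Going by the local counts in the dump (31445960 for the block holding the CTORs versus 3459055 for the block holding the stores), the pack is built roughly nine times for every time it is used.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list node, not a c-ray type: f[0..3] stand in for four of the
   nearest_sp fields, and dist is assumed to be non-negative. */
struct node {
    double f[4];
    double dist;
    const struct node *next;
};

/* Shape of the vectorized code: the scalars a..d play the role of the *_lsm
   temporaries, and the 4-element pack (the vector CTOR) is rebuilt in every
   iteration although only its final value is ever stored. */
void pack_in_loop(const struct node *list, double out[4])
{
    double a = 0.0, b = 0.0, c = 0.0, d = 0.0;
    double best = -1.0;                 /* negative means "nothing found yet" */
    double packed[4] = {0.0, 0.0, 0.0, 0.0};

    for (const struct node *it = list; it != NULL; it = it->next) {
        if (best < 0.0 || it->dist < best) {
            best = it->dist;
            a = it->f[0]; b = it->f[1]; c = it->f[2]; d = it->f[3];
        }
        /* Hot path: runs once per list element. */
        packed[0] = a; packed[1] = b; packed[2] = c; packed[3] = d;
    }
    memcpy(out, packed, sizeof packed);
}

/* What sinking would give: the same pack, built once on the loop exit path
   right before its only use. */
void pack_sunk(const struct node *list, double out[4])
{
    double a = 0.0, b = 0.0, c = 0.0, d = 0.0;
    double best = -1.0;

    for (const struct node *it = list; it != NULL; it = it->next) {
        if (best < 0.0 || it->dist < best) {
            best = it->dist;
            a = it->f[0]; b = it->f[1]; c = it->f[2]; d = it->f[3];
        }
    }
    /* Cold path: runs once per call. */
    double packed[4] = {a, b, c, d};
    memcpy(out, packed, sizeof packed);
}
```

Both functions store the same four values; only the place where the pack is built differs. An optimizing compiler may well turn the first form into the second when given this source, so the sketch only shows the shape of the issue and is not a reproducer for the regression.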