From: "jamborm at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/108629] New: 549.fotonik3d_r regresses 15-24% at -O2 -flto -march=x86-64-v3 since r13-1203-g038b077689bb53
Date: Wed, 01 Feb 2023 13:01:52 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108629

            Bug ID: 108629
           Summary: 549.fotonik3d_r regresses 15-24% at -O2 -flto
                    -march=x86-64-v3 since r13-1203-g038b077689bb53
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: rsandifo at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

When benchmarking trunk revision 99ea0d76116 I noticed a 24%
regression on Zen4 and Zen3 machines and 16% on a Zen2 and an Intel
CascadeLake when running 549.fotonik3d_r from SPEC 2017 FPrate suite
built with options -O2 -g -march=x86-64-v3 -flto=32 compared to the
binary produced by GCC 12.

The number of branches reported by perf stat between GCC 12 and the
aforementioned trunk revision on the Zen3 machine jumped by 90%.

The symbol profile changed from:

  Overhead  Samples  Shared object           Name
    33.23%    40078  fotonik3d_r_peak.gcc12  __upml_mod_MOD_upml_updatee_simple.lto_priv.0
    27.74%    33471  fotonik3d_r_peak.gcc12  __upml_mod_MOD_upml_updateh
    17.50%    21114  fotonik3d_r_peak.gcc12  __material_mod_MOD_mat_updatee
     9.52%    11493  fotonik3d_r_peak.gcc12  __update_mod_MOD_updateh
     9.49%    11445  fotonik3d_r_peak.gcc12  __power_mod_MOD_power_dft

To:

  Overhead  Samples  Shared object           Name
    26.68%    39825  fotonik3d_r_peak.trunk  __upml_mod_MOD_upml_updatee_simple.lto_priv.0
    22.35%    33368  fotonik3d_r_peak.trunk  __upml_mod_MOD_upml_updateh
    13.99%    20892  fotonik3d_r_peak.trunk  __material_mod_MOD_mat_updatee
    13.96%    20816  fotonik3d_r_peak.trunk  __power_mod_MOD_power_dft
    11.51%    17164  libgcc_s.so.1           __muldc3
     8.60%    12840  fotonik3d_r_peak.trunk  __update_mod_MOD_updateh

On the Zen3 machine at least, I have bisected this to:

commit 038b077689bb5310386b04d40a2cea234f01e6aa
Author: Richard Sandiford
Date:   Wed Jun 22 11:27:15 2022 +0100

    data-ref: Improve non-loop disambiguation [PR106019]

    When dr_may_alias_p is called without a loop context, it tries
    to use the tree-affine interface to calculate the difference
    between the two addresses and use that difference to check whether
    the gap between the accesses is known at compile time.

    However, as the example in the PR shows, this doesn't expand
    SSA_NAMEs and so can easily be defeated by things like reassociation.

    One fix would have been to use aff_combination_expand to expand
    the SSA_NAMEs, but we'd then need some way of maintaining the
    associated cache.  This patch instead reuses the innermost_loop_behavior
    fields (which exist even when no loop context is provided).

    It might still be useful to do the aff_combination_expand thing
    too, if an example turns out to need it.

    gcc/
            PR tree-optimization/106019
            * tree-data-ref.cc (dr_may_alias_p): Try using the
            innermost_loop_behavior to disambiguate non-loop queries.

    gcc/testsuite/
            PR tree-optimization/106019
            * gcc.dg/vect/bb-slp-pr106019.c: New test.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)