From: "jamborm at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114238] New: Multiple 554.roms_r run-time regressions (4%-20%) since r14-9193-ga0b1798042d033
Date: Tue, 05 Mar 2024 13:03:10 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114238

            Bug ID: 114238
           Summary: Multiple 554.roms_r run-time regressions (4%-20%) since
                    r14-9193-ga0b1798042d033
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux, aarch64-linux
            Target: x86_64-linux, aarch64-linux

Our LNT instance has detected that the run time of benchmark 554.roms_r
from the SPEC 2017 FPrate suite regressed on all machines, in most
configurations, by 4-20%. For example:

Plain -O2 -flto on AMD Zen 3 regressed by 14%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=470.537.0

On Zen 2, the -O2 -flto regression is the worst, 20%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=298.537.0

-Ofast -march=native -flto on AMD Zen 4 regressed by 7%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=959.537.0

-Ofast -march=native on AMD Zen 2 regressed by 17%:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.537.0

It also happens on Intel Skylake:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=800.537.0

and on AArch64:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=587.537.0

There are smaller regressions in the PGO configurations too.

I have bisected the Zen 3 -O2 -flto case to r14-9193-ga0b1798042d033
(Richard Biener: tree-optimization/114074 - CHREC multiplication and
undefined overflow). I have then verified that the Zen 4 -Ofast
-march=native -flto and Zen 2 -Ofast -march=native regressions were also
introduced by it:

commit a0b1798042d033fd2cc2c806afbb77875dd2909b
Author: Richard Biener
Date:   Mon Feb 26 13:33:21 2024 +0100

    tree-optimization/114074 - CHREC multiplication and undefined overflow

    When folding a multiply CHRECs are handled like {a, +, b} * c
    is {a*c, +, b*c} but that isn't generally correct when overflow
    invokes undefined behavior.  The following uses unsigned arithmetic
    unless either a is zero or a and b have the same sign.

    I've used simple early outs for INTEGER_CSTs and otherwise use
    a range-query since we lack a tree_expr_nonpositive_p and
    get_range_pos_neg isn't a good fit.

            PR tree-optimization/114074
            * tree-chrec.h (chrec_convert_rhs): Default at_stmt arg to NULL.
            * tree-chrec.cc (chrec_fold_multiply): Canonicalize inputs.
            Handle poly vs. non-poly multiplication correctly with respect
            to undefined behavior on overflow.

            * gcc.dg/torture/pr114074.c: New testcase.
            * gcc.dg/pr68317.c: Adjust expected location of diagnostic.
            * gcc.dg/vect/vect-early-break_119-pr114068.c: Do not expect loop
            to be vectorized.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
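
For readers unfamiliar with the folding rule quoted in the commit message, here is a minimal sketch of why {a, +, b} * c cannot always be rewritten as {a*c, +, b*c} in a signed type. It is not GCC code. The 8-bit width, the values of a, b and c, the two-iteration loop, and the helper names fits and wrap are all assumptions chosen purely for illustration. With a and b of opposite sign, every value of the original expression fits in the type, but the folded step b*c does not; doing the same computation in wrapping (unsigned-style) arithmetic still reproduces the original values.

```python
# Illustrative model only: an 8-bit signed type stands in for the real
# integer type. Width, values and helper names are assumptions.
BITS = 8
LO = -(1 << (BITS - 1))      # -128
HI = (1 << (BITS - 1)) - 1   # 127


def fits(x):
    """True if x is representable in the signed type (no overflow)."""
    return LO <= x <= HI


def wrap(x):
    """Reduce x modulo 2**BITS and reinterpret it as a signed value."""
    x &= (1 << BITS) - 1
    return x - (1 << BITS) if x > HI else x


# CHREC {a, +, b} takes the value a + b*i in iteration i.
a, b, c = 60, -120, 2        # a and b have opposite signs
iterations = range(2)        # the loop runs for i = 0 and i = 1

# Original expression: (a + b*i) * c. No intermediate value overflows.
original = [(a + b * i) * c for i in iterations]
original_ok = all(
    fits(b * i) and fits(a + b * i) and fits((a + b * i) * c)
    for i in iterations
)

# Folded form {a*c, +, b*c}: the step b*c alone is out of range, so
# computing it in the signed type would be undefined behavior.
folded_step = b * c
folded_step_ok = fits(folded_step)

# The same folded form in wrapping arithmetic gives the original values.
wrapped = [wrap(wrap(a * c) + wrap(b * c) * i) for i in iterations]

print(original, original_ok, folded_step, folded_step_ok, wrapped)
```

In this model the original values are 120 and -120, while the folded step would be -240, which is outside the 8-bit range. This matches the commit's statement that unsigned arithmetic is used unless a is zero or a and b have the same sign. Whether that conversion is what costs 554.roms_r its performance is not established in this report.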