From: "jamborm at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/94406] New: 503.bwaves_r is 11% slower on Zen2 CPUs than GCC 9 with -Ofast -march=native
Date: Mon, 30 Mar 2020 15:57:46 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94406

            Bug ID: 94406
           Summary: 503.bwaves_r is 11% slower on Zen2 CPUs than GCC 9 with -Ofast -march=native
           Product: gcc
           Version: 10.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: andre.simoesdiasvieira at arm dot com
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

SPEC 2017 FPrate benchmark 503.bwaves_r compiled with -Ofast -march=native -mtune=native runs 11% slower on AMD Zen2 CPUs when built with trunk (revision abe13e1847f) than when compiled with GCC 9.2.

Bisecting led to commit:

  commit 1297712fb4af6c6bfd827e0f0a9695b14669f87d
  Author: Andre Vieira
  Date:   Thu Oct 31 09:49:47 2019 +0000

      [vect]Make vect-epilogues-nomask=1 default

      This patch turns epilogue vectorization on by default for all targets.

      From-SVN: r277659

If we use current trunk but build also with option --param vect-epilogues-nomask=0 we get run-time on par with GCC 9. This is also the reason why generic march/tuning or building with -mprefer-vector-width=128 currently results in faster code than simple -march=native.

Interestingly, I do not see this issue on an Intel Cascade Lake Server CPU, even though the epilogue is created there too - judging by CFG of the hottest function which looks the same.

And I am not sure to what extent it tells anything at all, but I accidentally also perf'ed load-to-store-stall events and in the slow version, the reported "samples" was 10% higher and the reported "event count" shot up 2.8 times(!).


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
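
For readers unfamiliar with the transformation named in the bisected commit, here is a rough, hand-written C sketch of the loop shape that epilogue vectorization conceptually produces: a main loop over wide blocks, a narrower vectorized epilogue for part of the remainder, and a scalar tail. The function name axpy_sketch, the loop body, and the widths 4 and 2 (standing in for 256-bit and 128-bit vectors of doubles) are illustrative assumptions only. This is not code from 503.bwaves_r, and it is not what GCC actually emits.

```c
#include <stddef.h>

/* Hypothetical illustration: y[i] += a * x[i], laid out the way a
   vectorizer with epilogue vectorization might structure the loop.
   Each fixed-trip inner loop stands in for one vector operation. */
enum { MAIN_WIDTH = 4, EPILOGUE_WIDTH = 2 };

void axpy_sketch(size_t n, double a, const double *x, double *y)
{
    size_t i = 0;

    /* Main vectorized loop: MAIN_WIDTH elements per iteration. */
    for (; i + MAIN_WIDTH <= n; i += MAIN_WIDTH)
        for (size_t j = 0; j < MAIN_WIDTH; j++)
            y[i + j] += a * x[i + j];

    /* Vectorized epilogue: the leftover elements are processed with a
       narrower vector width instead of going straight to scalar code. */
    for (; i + EPILOGUE_WIDTH <= n; i += EPILOGUE_WIDTH)
        for (size_t j = 0; j < EPILOGUE_WIDTH; j++)
            y[i + j] += a * x[i + j];

    /* Scalar tail: whatever remains after the epilogue. */
    for (; i < n; i++)
        y[i] += a * x[i];
}
```

With --param vect-epilogues-nomask=0, as tried in the report, the middle stage would be absent and the remainder would run in scalar code only. The sketch shows structure only and says nothing about why the extra stage is slower on Zen2.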