From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id E0B813861812; Fri, 12 Feb 2021 23:32:35 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org E0B813861812
From: "jamborm at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/99083] New: Big run-time regressions of 519.lbm_r with LTO
Date: Fri, 12 Feb 2021 23:32:35 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: jamborm at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter cc blocked
 target_milestone cf_gcchost cf_gcctarget
Message-ID: <bug-99083-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
X-List-Received-Date: Fri, 12 Feb 2021 23:32:36 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99083

            Bug ID: 99083
           Summary: Big run-time regressions of 519.lbm_r with LTO
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: ubizjak at gmail dot com
            Blocks: 26163
  Target Milestone: ---
              Host: x86_64-linux
            Target: x86_64-linux

On AMD Zen2 CPUs, 519.lbm_r is 62.12% slower when built with -O2 and
-flto than when built without LTO.  It is also 62.12% slower than when
built by GCC 10 with the same two options.  My measurements match those
from LNT on a different Zen2 machine:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=325.477.0&plot.1=312.477.0&plot.2=349.477.0&plot.3=278.477.0&plot.4=401.477.0&plot.5=298.477.0

With -Ofast -march=native, the LTO build of the benchmark is also
slower than the non-LTO one, by 8.07% on Zen2 and 6.06% on Zen3.  The
Zen2 case has also been caught by LNT:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=295.477.0&plot.1=293.477.0&plot.2=287.477.0&plot.3=286.477.0&

I have bisected both of these regressions (on Zen2s) to:

  commit 4c61e35f20fe2ffeb9421dbd6f26c767a234a4a0
  Author: Uros Bizjak <ubizjak@gmail.com>
  Date:   Wed Dec 9 21:06:07 2020 +0100

      i386: Remove REG_ALLOC_ORDER definition

      REG_ALLOC_ORDER just defines what the default is set to.

      2020-12-09  Uroš Bizjak  <ubizjak@gmail.com>

      gcc/
              * config/i386/i386.h (REG_ALLOC_ORDER): Remove

...which looks like it was supposed to be a no-op, but I looked at the
-O2 LTO case and the assembly generated by this commit definitely
differs from the assembly produced by the previous one in instruction
selection, spilling and even some scheduling.  For example, I see
hunks like:

@@ -994,10 +996,10 @@
        movapd  %xmm13, %xmm9
        movsd   96(%rsp), %xmm13
        subsd   %xmm12, %xmm9
-       movsd   256(%rsp), %xmm12
+       movq    %rbx, %xmm12
+       mulsd   %xmm6, %xmm12
        movsd   %xmm5, 15904(%rdx)
        movsd   72(%rax), %xmm5
-       mulsd   %xmm6, %xmm12
        mulsd   %xmm0, %xmm9
        subsd   %xmm10, %xmm5
        movsd   216(%rsp), %xmm10

The -Ofast native LTO assemblies also differ.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)