From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/102178] [12 Regression] SPECFP 2006 470.lbm regressions on AMD Zen CPUs after r12-897-gde56f95afaaa22
Date: Thu, 10 Feb 2022 07:45:34 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178

--- Comment #29 from Richard Biener ---
(In reply to Vladimir Makarov from comment #28)
> Could somebody benchmark the following patch on zen2 470.lbm.
>
> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> index 9cee17479ba..76619aca8eb 100644
> --- a/gcc/lra-constraints.cc
> +++ b/gcc/lra-constraints.cc
> @@ -5084,7 +5089,9 @@ lra_constraints (bool first_p)
>                      (x, lra_get_allocno_class (i)) == NO_REGS))
>                  || contains_symbol_ref_p (x))))
>            ira_reg_equiv[i].defined_p = false;
> -        if (contains_reg_p (x, false, true))
> +        if (contains_reg_p (x, false, true)
> +            || (CONST_DOUBLE_P (x)
> +                && maybe_ge (GET_MODE_SIZE (GET_MODE (x)), 8)))
>            ira_reg_equiv[i].profitable_p = false;
>        if (get_equiv (reg) != reg)
>          bitmap_ior_into (equiv_insn_bitmap,
>                           &lra_reg_info[i].insn_bitmap);
>
> If it improves the performance, I'll commit this patch.
>
> The expander unconditionally uses the memory pool for double constants.
> I think the analogous treatment could be done for equiv double constants
> in LRA.
>
> As far as I know, only x86_64 permits 64-bit constants as immediates for
> moving them into general regs.  As double fp operations are not done in
> general regs in most cases, the constants have to be moved into fp regs,
> and this is costly, as Jan wrote.  So it makes sense to prohibit using
> equiv double constant values in LRA unconditionally.  If in the future
> we have a target that can move a double immediate into fp regs, we can
> introduce a target hook to deal with equiv double constants.  But right
> now I think there is no need for a new hook.

Code generation changes quite a bit; with the patch the offending function
is 16 bytes larger.  I see no large immediate moves to GPRs anymore, but
there is still a lot of spilling of XMMs to GPRs.
Performance is unchanged by the patch:

470.lbm    13740    128    107 S    13740    128    107 S
470.lbm    13740    128    107 *    13740    128    107 S
470.lbm    13740    128    107 S    13740    128    107 *

I've used

diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 9cee17479ba..a0ec608c056 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -5084,7 +5084,9 @@ lra_constraints (bool first_p)
                     (x, lra_get_allocno_class (i)) == NO_REGS))
                 || contains_symbol_ref_p (x))))
           ira_reg_equiv[i].defined_p = false;
-        if (contains_reg_p (x, false, true))
+        if (contains_reg_p (x, false, true)
+            || (CONST_DOUBLE_P (x)
+                && maybe_ge (GET_MODE_SIZE (GET_MODE (x)), UNITS_PER_WORD)))
           ira_reg_equiv[i].profitable_p = false;
       if (get_equiv (reg) != reg)
         bitmap_ior_into (equiv_insn_bitmap,
                          &lra_reg_info[i].insn_bitmap);

note UNITS_PER_WORD vs. the literal 8.

Without knowing much of this code, I wonder if we can check whether the
move will be to a reg in GENERAL_REGS?  That is, do we know whether there
are (besides some special constants like zero) immediate moves to the
destination register class?

That said, given the result on LBM I'd not change this at this point.
Honza wanted to look at the move pattern to try to mitigate the GPR
spilling of XMMs.  I do think that we need to take costs into account at
some point and get rid of the reload-style hand-waving with !?* in the
move patterns.