From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 6F58D384AB42; Thu, 11 Apr 2024 06:54:31 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 6F58D384AB42
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1712818471;
	bh=YaOigfRkg9tA9d3K6ffupR5NasJNpB+vlcK7V27wldI=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=Mn9sAtHQOkBxU9bpqx1LtK2/4dxOWFmSIgAUm/y1QjSM/eWM+U5dMFfruYdu3ZVcH
	 0PCy8GiUsd4TqkpyGAQ7k109wI42Fp0eewcvZPuAKeA/xS6Vj/Hr3MjHz90/nxG93f
	 VqjLvEt2FBcCoRJKHjC+mNVJToNqxLu4xUAgCr+A=
From: "ubizjak at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/114591] [12/13/14 Regression] register allocators
 introduce an extra load operation since gcc-12
Date: Thu, 11 Apr 2024 06:54:30 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 13.2.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Who: ubizjak at gmail dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 12.4
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-114591-4-mwwHaDEKTE@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-114591-4@http.gcc.gnu.org/bugzilla/>
References: <bug-114591-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591
--- Comment #13 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Hongtao Liu from comment #12)
> short a;
> short c;
> short d;
> void
> foo (short b, short f)
> {
>   c = b + a;
>   d = f + a;
> }
>
> foo(short, short):
>         addw    a(%rip), %di
>         addw    a(%rip), %si
>         movw    %di, c(%rip)
>         movw    %si, d(%rip)
>         ret
>
> this one is bad since gcc10.1 and there's no subreg, The problem is if the
> operand is used by more than 1 insn, and they all support separate m
> constraint, mem_cost is quite small(just 1, reg move cost is 2), and this
> makes RA more inclined to propagate memory across insns. I guess RA assumes
> the separate m means the insn only support memory_operand?

I don't see this as problematic. IIRC, there was a discussion in the past that
a couple of (two?) memory accesses to the same location close to each other can
be faster (so at -O2, not -Os) than preloading the value into a register first.
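
To make the two alternatives concrete, here is a minimal C sketch. The name
foo_preload is hypothetical and not part of the bug report; it only spells out
the "load once, then use the register" form at the source level. The compiler
may well emit identical code for both functions, so this illustrates the
choice the RA faces rather than a way to force either outcome.

```c
short a;
short c;
short d;

/* Form from comment #12: 'a' is read in two statements, so each add
   may be emitted with a memory operand (addw a(%rip), ...). */
void
foo (short b, short f)
{
  c = b + a;
  d = f + a;
}

/* Hypothetical variant: 'a' is copied into a local first. Nothing here
   obliges the compiler to keep a single load; it is only the source-level
   counterpart of preloading the value into a register. */
void
foo_preload (short b, short f)
{
  short t = a;
  c = b + t;
  d = f + t;
}
```

Both functions store the same results; any difference would show up only in
the generated assembly (for example when comparing -O2 and -Os output).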

In contrast, the example from comment #11 already has the correct value in
%eax, so there is no need to reload it from memory, even in a narrower mode.