From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 73C50384AB53; Thu, 11 Apr 2024 07:28:17 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 73C50384AB53
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1712820497;
	bh=6SNtzk6BUQjg7wPNDZDPAu1WjH92b1SoV1jTyjdoaN8=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=uo4U6FPr0Vg042WeVHIKmSaAfcDH4hky0YfzhJf4GcB6Q14CbT8rk2pvo1JqBf+O2
	 25m9Dak7ltyEMS9j3nvG9/lnyUhy5Sh8GLzf25+Eo6kuj6tbxl6xxhYfVFafHycwhS
	 gUTWgY7xXGs8Yin94lB1JJlvXwV5vLbcEluKhNQw=
From: "liuhongt at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/114591] [12/13/14 Regression] register allocators
 introduce an extra load operation since gcc-12
Date: Thu, 11 Apr 2024 07:28:16 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 13.2.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Severity: normal
X-Bugzilla-Who: liuhongt at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 12.4
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-114591-4-aBnvgxGPQl@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-114591-4@http.gcc.gnu.org/bugzilla/>
References: <bug-114591-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591
--- Comment #15 from Hongtao Liu <liuhongt at gcc dot gnu.org> ---
> I don't see this as problematic. IIRC, there was a discussion in the past
> that a couple (two?) memory accesses from the same location close to each
> other can be faster (so, -O2, not -Os) than preloading the value to the
> register first.
At least for memory in a vector mode, it's better to preload the value into
a register first.
>
> In contrast, the example from the Comment #11 already has the correct value
> in %eax, so there is no need to reload it again from memory, even in a
> narrower mode.

So the question is why cse can't handle an access to the same memory in a
narrower mode; maybe it's because of the zero_extend in the first load. cse
does seem able to handle a simple MEM that was loaded in a wider mode.
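
To make the pattern concrete, here is a minimal sketch in C. It is a
hypothetical illustration, not the testcase from this bug or from Comment #11;
the function names are made up, and the RTL forms mentioned in the comments
are only roughly what an x86 target would be expected to produce.

```c
#include <stdint.h>
#include <string.h>

/* Runtime endianness probe, so the sketch stays correct on any host. */
static int is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Two accesses to the same location: a 16-bit load widened to 32 bits,
   roughly (zero_extend:SI (mem:HI ...)), followed by an 8-bit access to
   the same address, roughly (mem:QI ...).  */
unsigned two_loads(const uint16_t *p)
{
    unsigned wide = *p;          /* widening load */
    unsigned char narrow;
    memcpy(&narrow, p, 1);       /* narrower access to the same memory */
    return wide + narrow;
}

/* What reusing the already-loaded value amounts to: the narrow value is
   taken from the register holding the wide value, with no second load.  */
unsigned one_load(const uint16_t *p)
{
    unsigned wide = *p;
    unsigned char narrow = is_little_endian()
                               ? (unsigned char)wide
                               : (unsigned char)(wide >> 8);
    return wide + narrow;
}
```

Both functions return the same value for any input, so the second memory
access in two_loads is redundant once the widened value is in a register.
Whether cse actually removes it depends on the target and GCC version.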

4952      /* See if a MEM has already been loaded with a widening operation;
4953         if it has, we can use a subreg of that.  Many CISC machines
4954         also have such operations, but this is only likely to be
4955         beneficial on these machines.  */