From: "liuhongt at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/114591] [12/13/14 Regression] register allocators introduce an extra load operation since gcc-12
Date: Thu, 11 Apr 2024 07:28:16 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 13.2.0
X-Bugzilla-Keywords: missed-optimization, ra
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Target-Milestone: 12.4

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114591

--- Comment #15 from Hongtao Liu ---
> I don't see this as problematic. IIRC, there was a discussion in the past
> that a couple (two?) memory accesses from the same location close to each
> other can be faster (so, -O2, not -Os) than preloading the value to the
> register first.

At least for memory accessed in vector mode, it's better to preload the value into a register first.
>
> In contrast, the example from the Comment #11 already has the correct value
> in %eax, so there is no need to reload it again from memory, even in a
> narrower mode.

So the question is why CSE can't handle the same memory location accessed in a narrower mode. Maybe it's because there's a zero_extend in the first load. CSE does seem able to handle the simple case of a MEM already loaded in a wider mode (cse.cc, lines 4952-4955):

      /* See if a MEM has already been loaded with a widening operation;
         if it has, we can use a subreg of that.  Many CISC machines
         also have such operations, but this is only likely to be
         beneficial on these machines.  */
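As a hedged illustration of the reuse discussed above, here is a hypothetical toy model in C. The `machine`, `load` and `store8` names are invented for this sketch and do not correspond to anything in GCC. The model remembers the address and width of one zero-extending load and serves a later, narrower load of the same address from the register's low part (the analogue of a lowpart subreg) instead of reading memory again. It assumes a little-endian target and a single cached value, and it conservatively invalidates the cache on any store. Real CSE operates on RTL and is far more general.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model (not GCC code): byte-addressed memory plus one
   "register" that remembers which address it was zero-extend-loaded from. */
typedef struct {
    uint8_t mem[16];   /* simulated memory, little-endian byte order */
    int mem_reads;     /* number of memory reads actually performed */
    int cached_addr;   /* address of the cached load, -1 if none */
    int cached_width;  /* width in bytes of the cached load */
    uint64_t reg;      /* register holding the zero-extended value */
} machine;

/* Read WIDTH bytes at ADDR as an unsigned little-endian integer,
   i.e. a load followed by a zero_extend to 64 bits. */
static uint64_t raw_load_zext(machine *m, int addr, int width) {
    uint64_t v = 0;
    assert(width >= 1 && width <= 8 && addr >= 0 && addr + width <= 16);
    for (int i = width - 1; i >= 0; i--)
        v = (v << 8) | m->mem[addr + i];
    m->mem_reads++;
    return v;
}

/* Load WIDTH bytes at ADDR.  If the same address was already loaded with a
   zero-extending load at least as wide, return the register's low part
   (little-endian lowpart) without touching memory again. */
static uint64_t load(machine *m, int addr, int width) {
    if (m->cached_addr == addr && m->cached_width >= width) {
        uint64_t mask = width == 8 ? UINT64_MAX
                                   : (((uint64_t)1 << (8 * width)) - 1);
        return m->reg & mask;
    }
    m->reg = raw_load_zext(m, addr, width);
    m->cached_addr = addr;
    m->cached_width = width;
    return m->reg;
}

/* Any store invalidates the cached value (no alias analysis in this toy). */
static void store8(machine *m, int addr, uint8_t v) {
    assert(addr >= 0 && addr < 16);
    m->mem[addr] = v;
    m->cached_addr = -1;
}
```

For example, with bytes `34 12 cd ab` at address 0, a 4-byte load returns `0xabcd1234` and performs one memory read; a following 2-byte load of address 0 returns `0x1234` from the register without a second read.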