From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id CEF633858D32; Fri, 12 Jan 2024 19:06:40 +0000 (GMT)
From: "roger at nextmovesoftware dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/111267] [14 Regression] Codegen regression
 from i386 argument passing changes
Date: Fri, 12 Jan 2024 19:06:40 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: roger at nextmovesoftware dot com
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 14.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-111267-4-sr7m1KcBDz@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-111267-4@http.gcc.gnu.org/bugzilla/>
References: <bug-111267-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111267
--- Comment #6 from Roger Sayle <roger at nextmovesoftware dot com> ---
Sorry for the delay in replying to Jakub's questions/comments.  Yes, using a
define_insn_and_split in the backend fixes/works around the issue (and I agree
that your implementation/refinement in comment #5 is better than mine in
comment #2), but I have a feeling that this approach isn't the ideal solution.
Nothing about this split is specific to these x86 instructions, or even to the
i386 backend.

A more generic fix might be to teach combine.cc that it can split a parallel
of two independent sets, with no interdependencies, into two insns if the
total cost of the two new instructions is less than that of the original two,
i.e. a 2 insn -> 2 insn combination.
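To make the proposed 2 -> 2 check concrete, here is a toy model in Python. It
is only a sketch: RtlSet, independent and try_split_2_to_2 are hypothetical
names, not anything in combine.cc, and the costs are invented numbers standing
in for whatever the target's cost hooks would return. "Independent" is taken
to mean that the two sets write different registers and neither reads the
other's destination, which is what makes running them one after the other
equivalent to the PARALLEL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RtlSet:
    """One (set DEST SRC) inside a PARALLEL, reduced to what the check needs."""
    dest: str          # register written, e.g. "r1"
    uses: frozenset    # registers read by SRC
    cost: int          # hypothetical cost of this set as a standalone insn


def independent(a: RtlSet, b: RtlSet) -> bool:
    """No interdependencies: distinct destinations, and neither reads the other's."""
    return a.dest != b.dest and a.dest not in b.uses and b.dest not in a.uses


def try_split_2_to_2(parallel: list, original_cost: int) -> Optional[list]:
    """Split a two-set PARALLEL into two insns if that is strictly cheaper.

    original_cost is the summed cost of the two insns that combine merged.
    Returns the two replacement insns, or None to keep the originals.
    """
    if len(parallel) != 2:
        return None
    a, b = parallel
    if not independent(a, b):
        return None
    if a.cost + b.cost < original_cost:
        return [a, b]
    return None


# Hypothetical costs: insn2's read of r1 was simplified to a plain copy of r2,
# so the pair drops from 4 + 3 to 4 + 1.
pair = [RtlSet("r1", frozenset({"r2", "r3"}), 4), RtlSet("r4", frozenset({"r2"}), 1)]
result = try_split_2_to_2(pair, original_cost=4 + 3)
```

Requiring a strict cost decrease keeps the transformation from churning on
pairs that merely break even.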

But even this doesn't feel like the perfect approach...  the reason combine
doesn't already support 2->2 combinations is that they're not normally
required; these types of problems are usually handled by GCSE, CSE or PRE (or
some other pass?).

The pattern is that insn1 sets REG1 to a complicated expression, and REG1 is
live in several locations, so this instruction can't be eliminated.  However,
if the definition of REG1 is propagated into insn2, which sets REG2, this
second instruction can be significantly simplified.  This feels like a classic
(non-)constant propagation problem.  I'm thinking perhaps want_to_gcse_p (or
somewhere similar) could be tweaked.

For people just joining the discussion (hopefully Jeff or a Richard):

(set (REG:DI 1) (concat:DI (REG:SI 2) (REG:SI 3)))
...
(set (REG:SI 4) (low_part (REG:DI 1)))

can be simplified so that the second assignment becomes just:
(set (REG:SI 4) (REG:SI 2))
and similarly high_part of (REG:DI 1) becomes (REG:SI 3).  These don't even
need to be in the same basic block.
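A toy version of that propagation might look like the following Python. It
assumes SSA-like pseudos (each register has at most one definition, and that
definition dominates all of its uses), which is why basic-block boundaries
don't matter here; the tuple encoding and the names simplify_use and
propagate are hypothetical, not GCC's rtx API. Note that insn1 is left alone,
since REG1 may still be live elsewhere.

```python
from collections import Counter

# Toy RTL using the simplified notation above (not GCC's rtx structures):
#   ("reg", n)  ("concat", low, high)  ("low_part", x)  ("high_part", x)
# An insn is (dest_regno, src_expr).  Assumes SSA-like pseudos: each register
# has at most one definition, and that definition dominates all of its uses.


def simplify_use(src, defs, counts):
    """Fold low_part/high_part of a concat-defined register to the matching half."""
    if src[0] in ("low_part", "high_part") and src[1][0] == "reg":
        definition = defs.get(src[1][1])
        if definition is not None and definition[0] == "concat":
            part = definition[1] if src[0] == "low_part" else definition[2]
            # Substitute only a register that is never redefined, so it still
            # holds the same value at the use (possibly in another block).
            if part[0] == "reg" and counts[part[1]] <= 1:
                return part
    return src


def propagate(insns):
    """Rewrite uses; definitions are kept, since their registers may be live elsewhere."""
    counts = Counter(dest for dest, _ in insns)
    defs = {dest: src for dest, src in insns if counts[dest] == 1}
    return [(dest, simplify_use(src, defs, counts)) for dest, src in insns]


insns = [
    (1, ("concat", ("reg", 2), ("reg", 3))),
    (4, ("low_part", ("reg", 1))),
    (5, ("high_part", ("reg", 1))),
]
for dest, src in propagate(insns):
    print(dest, src)
```

After the rewrite, insns 4 and 5 read registers 2 and 3 directly, while the
concat definition of register 1 is unchanged.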

In reality, "concat" is a large, ugly expression, and high_part/low_part are
actually SUBREGs (or could be TRUNCATE or SHIFT+TRUNCATE), but the theory
should remain the same.
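The identities still hold once the real RTL is spelled out, which can be
checked with plain integer arithmetic. The sketch below assumes, for
illustration only, that the concatenation is written as
(ior:DI (ashift:DI (zero_extend:DI high) 32) (zero_extend:DI low)); the
backend's actual expression may differ. It also relies on x86 being
little-endian, so the lowpart SUBREG:SI of a DImode value selects the same
bits 0-31 as TRUNCATE.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def concat_di(low, high):
    """Assumed spelling of the concat:
    (ior:DI (ashift:DI (zero_extend:DI high) 32) (zero_extend:DI low))."""
    return (((high & MASK32) << 32) | (low & MASK32)) & MASK64


def truncate_si(x):
    """(truncate:SI x): keep bits 0-31 (the same bits as the lowpart SUBREG on x86)."""
    return x & MASK32


def lshiftrt_di(x, n):
    """(lshiftrt:DI x n): logical right shift of a 64-bit value."""
    return (x & MASK64) >> n


samples = [0, 1, 0x7FFFFFFF, 0x80000000, 0xDEADBEEF, MASK32]
for low in samples:
    for high in samples:
        pair = concat_di(low, high)
        assert truncate_si(pair) == low                    # low_part  -> low operand
        assert truncate_si(lshiftrt_di(pair, 32)) == high  # high_part -> high operand
```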

I'm trying to figure out which pass (or cselib?) is normally responsible for
handling this type of pseudo-reg propagation.

But the define_insn_and_split certainly papers over this deficiency in the
middle-end's RTL optimizers and fixes this (very) specific case/regression.