From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 8CE333858D20; Tue, 23 Jan 2024 03:27:19 +0000 (GMT)
From: "olegendo at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/113533] [14 Regression] Code generation
 regression after change for pr111267
Date: Tue, 23 Jan 2024 03:27:19 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: olegendo at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 14.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-113533-4-ymRSLixphF@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-113533-4@http.gcc.gnu.org/bugzilla/>
References: <bug-113533-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533
--- Comment #11 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Roger Sayle from comment #10)

> I've found an interesting table of SH cycle counts (for different CPUs) at
> http://www.shared-ptr.com/sh_insns.html

Yeah, I know.  I did that ;)

> In my proposed patch, the address cost (1) when optimizing for size attempts
> to return the additional size of an instruction based on the addressing
> mode.  For register and reg+reg addressing modes there is no size increase
> (overhead), and for addressing modes with displacements, and displacements to
> address pointers, there is a cost.

AFAIR, I added the 'sh_address_cost' function.  The intention was/is to
encourage/discourage usage of certain address modes based on their side effects
and impact on the surrounding code.  All insns/addr modes have the same length
and basically the same execution time.  However, e.g. @(reg+reg) has a constraint
on 'r0' usage, so I weighted that more heavily.  If something could use
@(reg+disp) as an alternative, that'd be better in some cases (not sure if
such optimizations are actually done...).

> (2) when optimizing for speed, address
> cost remains between 0 and 3, and is used to prioritize between (equivalent
> numbers of) instructions.  Normally, rtx_costs are defined in terms of
> COSTS_N_INSNS, which multiplies by 4.  Hence on many platforms a single
> instruction that references memory may be encoded as COSTS_N_INSNS(1)+1 (or
> a more complex addressing mode as COSTS_N_INSNS(1)+2) to show that this is
> disfavored relative to a single instruction that doesn't reference memory,
> COSTS_N_INSNS(1)+0.

That's actually what sh_rtx_costs was supposed to do as well.  I think in the
usual cases it does that; it's just that apparently I screwed up the
{SIGN|ZERO}_EXTEND case for mem loads, and it only shows up now, many years
later.

It's still not entirely clear to me why we would want to squash the costs of
addresses to 0 when optimizing for size.  What effect does it have on the
generated code?  I can't imagine how it could possibly result in smaller
code.

With your patch, in the case of SIGN_EXTEND with a mem operand, the address
cost would be 0 with -Os, which would give COSTS_N_INSNS(1) for a reg operand
as well as for a mem operand.  So both insns are equally weighted and could be
considered interchangeable.  And we might run into this type of regression
again if some (future) optimization decides that it can interchange/substitute
insns of the same cost...


> For example, SH currently reports multiplications as a single cycle operation,

That doesn't seem to be the case.  It's supposed to be using the function
'multcosts' in sh.cc, which returns a cost of at least '2'.  Note that on SH1
and SH2 there is no dynamic (barrel) shift.  So some multiplications could
actually be faster than stitched shift sequences.


> sh_rtx_costs doesn't distinguish the machine mode, so the costs of SImode
> multiplications are the same as DImode multiplications.

I guess this is because SH doesn't have real DImode multiplication (64 x 64 ->
64/128 bit).  It can only do 32 x 32 -> 64 bit widening multiplication.  Any
real DImode multiplication will result in either an expanded sequence to
calculate the sum of partial products or a libcall, AFAIR.