From: "olegendo at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/113533] [14 Regression] Code generation regression after change for pr111267
Date: Tue, 23 Jan 2024 03:27:19 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 14.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113533

--- Comment #11 from Oleg Endo ---
(In reply to Roger Sayle from comment #10)
> I've found an interesting table of SH cycle counts (for different CPUs) at
> http://www.shared-ptr.com/sh_insns.html

Yeah, I know.  I did that ;)

> In my proposed patch, the address cost (1) when optimizing for size attempts
> to return the additional size of an instruction based on the addressing
> mode.
> For register, and reg+reg addressing modes there is no size increase
> (overhead), and for addressing modes with displacements, and displacements to
> address pointers, there is a cost.

AFAIR, I've added the 'sh_address_cost' function.  The intention was/is to
encourage/discourage usage of certain address modes based on the side effects
and impact on the surrounding code.  All insns/addr modes have the same length
and basically the same execution time.  However, e.g. @(reg+reg) has a
constraint on 'r0' usage, so I weighted that heavier.  If there's anything that
could use @(reg+disp) as an alternative, that'd be better in some cases.  (Not
sure if such optimizations are actually done...)

> (2) when optimizing for speed, address
> cost remains between 0 and 3, and is used to prioritize between (equivalent
> numbers of) instructions.  Normally, rtx_costs are defined in terms of
> COSTS_N_INSNS, which multiplies by 4.  Hence on many platforms a single
> instruction that references memory may be encoded as COSTS_N_INSNS(1)+1 (or
> a more complex addressing mode as COSTS_N_INSNS(1)+2) to show that this is
> disfavored to a single instruction that doesn't reference memory,
> COSTS_N_INSNS(1)+0.

That's actually what sh_rtx_costs was supposed to do as well.  I think in the
usual cases it does that, only that apparently I've screwed up the
{SIGN|ZERO}_EXTEND for the case of the mem load and it shows up only now, many
years later.

It's still not entirely clear to me why we would want to squash the costs of
addresses to 0 when optimizing for size.  What effect does it have on the
generated code?  I can't imagine how it could possibly produce smaller code.

With your patch, in the case of the SIGN_EXTEND with a mem operand, it would
make the address cost 0 with -Os, which would return COSTS_N_INSNS(1) for the
reg operand as well as the mem operand.  So both insns are equally weighted and
could be considered interchangeable.
And we might bump into this type of regression again, if some (future)
optimization decides that it can interchange/substitute insns of the same
cost...

> For example, SH currently reports multiplications as a single cycle operation,

That doesn't seem to be the case.  It's supposed to be using the function
'multcosts' in sh.cc, which returns a cost of at least '2'.  Note that on SH1
and SH2 there is no dynamic (barrel) shift.  So actually some multiplications
could be faster than stitched shifts.

> sh_rtx_costs doesn't distinguish the machine mode, so the costs of SImode
> multiplications are the same as DImode multiplications.

I guess this is because SH doesn't have real DImode multiplication (64 x 64 ->
64/128 bit).  It can only do 32 x 32 -> 64 bit widening multiplication.  Any
real DImode multiplication will result in either an expanded sequence to
calculate the sum of partial products or a libcall, AFAIR.