From: "klepikov.alex+bugs at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/49263] SH Target: underutilized "TST #imm, R0" instruction
Date: Sun, 28 May 2023 10:24:22 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 4.6.1
X-Bugzilla-Severity: enhancement
X-Bugzilla-Status: REOPENED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: olegendo at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #43 from Alexander Klepikov ---
> > Thank you! I have an idea. If it's impossible to defer initial optimization,
> > maybe it's possible to emit some intermediate insn and catch it and optimize
> > later?
> 
> This is basically what is supposed to be happening there already.

Well, not really. Look at what happens during the expand pass when 'ashrsi3' is
expanded.
The function 'expand_ashiftrt' is called, and at the end it explicitly emits
3 insns:

  wrk = gen_reg_rtx (Pmode);
  //This one
  emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]);
  sprintf (func, "__ashiftrt_r4_%d", value);
  rtx lab = function_symbol (wrk, func, SFUNC_STATIC).lab;
  //This one
  emit_insn (gen_ashrsi3_n (GEN_INT (value), wrk, lab));
  //And this one
  emit_move_insn (operands[0], gen_rtx_REG (SImode, 4));

As far as I understand, these insns could be caught later by a peephole and
converted to a 'tstsi_t' insn, as is done for other, much simpler insn
sequences.

What I'm thinking about is emitting only one, say 'compound', insn, which could
then be split later, somewhere in the split pass, into the function call, i.e.
into those 3 insns.

I wrote test code that emits only one bogus insn. This insn expands to pure asm
code. Of course, that asm code is invalid, because it is impossible to place a
libcall label at the end of the function with pure asm code injection. But then
everything that could be converted to 'tst' is converted to 'tst'. And all pure
right shifts are converted to invalid asm code, of course.

That's why I am thinking about the possibility of emitting some intermediate
insn at the expand pass that will defer its real expansion. But I still don't
know how to do it right, or even whether it is possible.

By the way, right shift for integers expands to only one 'lshiftrt' insn, and
that's why it can be caught and converted to 'tst'.

> 
> However, it's a bit of a dilemma.
> 
> 1) If we don't have a dynamic shift insn and we smash the constant shift
> into individual 
> stitching shifts early, it might open some new optimization opportunities,
> e.g. by sharing intermediate shift results.  Not sure though if that
> actually happens in practice though.
> 
> 2) Whether to use the dynamic shift insn or emit a function call or use
> stitching shifts sequence, it all has an impact on register allocation and
> other instruction use.
> This can be problematic during the course of RTL
> optimization passes.
> 
> 3) Even if we have a dynamic shift, sometimes it's more beneficial to emit a
> shorter stitching shift sequence.  Which one is better depends on the
> surrounding code.  I don't think there is anything good there to make the
> proper choice.
> 
> Some other shift related PRs: PR 54089, PR 65317, PR 67691, PR 67869, PR
> 52628, PR 58017

Thank you for your time and detailed explanations! I agree with you on all
points. Software cannot be perfect, and it's OK for GCC not to be super
optimized, so this part is better left intact.

By the way, I tried linking the library into my project and found that the
linker is smart enough to pull in only the necessary library functions, even
without LTO. So the size increase is about 100 or 200 bytes, which is
acceptable. Thank you very much for your help!
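
For reference, here is a minimal sketch of the kind of source pattern this
discussion is about. It is not taken from GCC or from the test code mentioned
above; the function names are hypothetical and exist only for illustration.
It shows that testing a single bit through a signed right shift gives the same
answer as testing it through a mask, and the mask form is the one that could
in principle map to "TST #imm, R0" when the mask fits the instruction's
immediate field. Whether a particular GCC version actually emits 'tst' for
either form depends on the version and options, so please treat this as an
assumption to check against the generated assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: test bit n by shifting right and masking.
   Right-shifting a negative value is implementation-defined in C;
   GCC documents it as an arithmetic shift.  */
static int bit_via_shift(int32_t x, unsigned n)
{
    return (x >> n) & 1;
}

/* Hypothetical helper: the same test written as a mask, which is the
   shape a "test bits" instruction works on.  */
static int bit_via_mask(int32_t x, unsigned n)
{
    return ((uint32_t)x & (UINT32_C(1) << n)) != 0;
}

/* Count the (value, bit) pairs on which the two forms disagree.  */
static unsigned count_mismatches(void)
{
    static const int32_t samples[] = {
        0, 1, -1, 0x40, 0x7fffffff, INT32_MIN, 0x12345678, -0x12345678
    };
    unsigned mismatches = 0;

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        for (unsigned n = 0; n < 32; n++)
            if (bit_via_shift(samples[i], n) != bit_via_mask(samples[i], n))
                mismatches++;

    return mismatches;
}
```

Compiling both helpers for an SH target and comparing the assembly is one way
to see whether the shift form ends up as a libcall or as a 'tst'.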