From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id B58293858CDA; Sun, 28 May 2023 10:24:24 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B58293858CDA
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1685269464;
	bh=Pqmlo6x5ont1mbA0tFWCXM6My9NyYJw3JkGmFtJCSaQ=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=ie8iuNmgqKDPDq7lFE2Anu1+dtmUhXm1z+qG/nwd2Va/jG9YbKhulvOZj0WRTnJWo
	 rOK7gXp97Cr5DKaiEyq5MpK1PpFPUnvLiUnsT7JWAjFY2cEyDEoSojQJoZM70DE526
	 P4ewIUYjBg9rOtRruVTsfHOWyLiRN4mOkPUQiXUc=
From: "klepikov.alex+bugs at gmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/49263] SH Target: underutilized "TST #imm, R0"
 instruction
Date: Sun, 28 May 2023 10:24:22 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 4.6.1
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: klepikov.alex+bugs at gmail dot com
X-Bugzilla-Status: REOPENED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: olegendo at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-49263-4-I6txl8NYw2@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-49263-4@http.gcc.gnu.org/bugzilla/>
References: <bug-49263-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49263

--- Comment #43 from Alexander Klepikov <klepikov.alex+bugs at gmail dot com> ---
> > Thank you! I have an idea. If it's impossible to defer initial optimization,
> > maybe it's possible to emit some intermediate insn and catch it and optimize
> > later?
>
> This is basically what is supposed to be happening there already.

Well, not really. Look at what happens during the expand pass when 'ashrsi3' is
expanded. The function 'expand_ashiftrt' is called, and at the end it
explicitly emits 3 insns:

  wrk = gen_reg_rtx (Pmode);

  //This one
  emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]);

  sprintf (func, "__ashiftrt_r4_%d", value);
  rtx lab = function_symbol (wrk, func, SFUNC_STATIC).lab;

  //This one
  emit_insn (gen_ashrsi3_n (GEN_INT (value), wrk, lab));

  //And this one
  emit_move_insn (operands[0], gen_rtx_REG (SImode, 4));

As far as I understand, these insns could be caught later by a peephole and
converted to a 'tstsi_t' insn, as is done for other, much simpler insn
sequences.

What I'm thinking about is emitting only one, say 'compound', insn, which
could then be split later, somewhere in the split pass, into the function call,
i.e. into those 3 insns.

I wrote test code that emits only one bogus insn. This insn expands to pure asm
code. Of course, that asm code is invalid, because it is impossible to place a
libcall label at the end of the function with pure asm code injection. But then
everything that could be converted to 'tst' is converted to 'tst'. And all pure
right shifts are converted to invalid asm code, of course.

That's why I am thinking about the possibility of emitting some intermediate
insn at the expand pass that will defer its real expansion. But I still don't
know how to do it right, or even whether it is possible.

By the way, right shift for integers expands to only one 'lshiftrt' insn, and
that's why it can be caught and converted to 'tst'.
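
To make the pattern concrete, here is a minimal C sketch of the source-level
forms involved. The helper names are hypothetical and are not taken from GCC or
its testsuite. It only shows that testing a bit after a right shift gives the
same answer as masking the unshifted value, which is the equivalence such a
conversion relies on. It assumes a compiler like GCC, where right-shifting a
negative int is an arithmetic shift (ISO C leaves this implementation-defined).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only.  */

/* Test bit N after an arithmetic right shift (signed operand).
   Assumes '>>' on a negative value is an arithmetic shift, as in GCC.  */
static int bit_via_ashr (int32_t x, int n)
{
  assert (n >= 0 && n < 32);
  return (x >> n) & 1;
}

/* Test bit N after a logical right shift (unsigned operand).  */
static int bit_via_lshr (uint32_t x, int n)
{
  assert (n >= 0 && n < 32);
  return (x >> n) & 1;
}

/* Test bit N by masking the unshifted value; this is the shape that
   a 'tst' instruction computes directly.  */
static int bit_via_mask (uint32_t x, int n)
{
  assert (n >= 0 && n < 32);
  return (x & (UINT32_C (1) << n)) != 0;
}

/* Count disagreements between the three forms over a few samples.  */
static int count_mismatches (void)
{
  static const uint32_t samples[] = {
    0u, 1u, 0x10u, 0x7fffffffu, 0x80000000u, 0xffffffffu, 0xdeadbeefu
  };
  int bad = 0;

  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    for (int n = 0; n < 32; n++)
      {
        int expect = bit_via_mask (samples[i], n);
        if (bit_via_lshr (samples[i], n) != expect)
          bad++;
        if (bit_via_ashr ((int32_t) samples[i], n) != expect)
          bad++;
      }
  return bad;
}
```

Since all three forms agree, the shift is not needed when only one bit is
tested. Note that 'TST #imm, R0' takes an 8-bit immediate, so only masks that
fit in 8 bits can use the immediate form.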

>
> However, it's a bit of a dilemma.
>
> 1) If we don't have a dynamic shift insn and we smash the constant shift
> into individual
> stitching shifts early, it might open some new optimization opportunities,
> e.g. by sharing intermediate shift results.  Not sure though if that
> actually happens in practice though.
>
> 2) Whether to use the dynamic shift insn or emit a function call or use
> stitching shifts sequence, it all has an impact on register allocation and
> other instruction use.  This can be problematic during the course of RTL
> optimization passes.
>=20
> 3) Even if we have a dynamic shift, sometimes it's more beneficial to emit a
> shorter stitching shift sequence.  Which one is better depends on the
> surrounding code.  I don't think there is anything good there to make the
> proper choice.
>
> Some other shift related PRs: PR 54089, PR 65317, PR 67691, PR 67869, PR
> 52628, PR 58017

Thank you for your time and detailed explanations! I agree with you on all
points. Software cannot be perfect and it's OK for GCC not to be super
optimized, so this part is better left intact.

By the way, I tried linking the library into my project and found that the
linker is smart enough to link only the necessary library functions, even
without LTO. So the increase in size is about 100 or 200 bytes, which is
acceptable. Thank you very much for your help!