From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 7D09B384AB45; Wed, 15 May 2024 07:05:20 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 7D09B384AB45
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1715756720;
	bh=pPFkRO9yKMJwmLwnnPXJ5S3uGG1FIacKKeLnhvn4BLo=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=W9XWp4P8cI9S8JORBf56H+27ftIEi1iml9eWJtCcrVJUvZIJxZRfEbiChSuSU08Rf
	 8nE0KGex5dq+RP6X59cEvr7ldLXaSDF+uPjsbvE5HP+eVOkqK6K+AuWT/iO9wS7wh4
	 AFLsr/Fh7A+hueEHrIUmXxRoAiYFPlz8aw0EelNQ=
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug ipa/115097] Strange suboptimal codegen specifically at -O2 when
 copying struct type
Date: Wed, 15 May 2024 07:05:20 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: ipa
X-Bugzilla-Version: 15.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: component cc
Message-ID: <bug-115097-4-ZsyBjcxj1X@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-115097-4@http.gcc.gnu.org/bugzilla/>
References: <bug-115097-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115097

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |ipa
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
So actually it seems that the reason is ICF plus inlining:

t.ii:2:3: optimized: Semantic equality hit:A test1(A&)/0->A test2(A&&)/1
t.ii:2:3: optimized: Assembler symbol names:_Z5test1R1A/0->_Z5test2O1A/1
t.ii:2:3: optimized: Semantic equality hit:A test1(A&)/0->A test3(const A&)/2
t.ii:2:3: optimized: Assembler symbol names:_Z5test1R1A/0->_Z5test3RK1A/2
t.ii:2:3: optimized: Semantic equality hit:A test1(A&)/0->A test4(const A&&)/3
t.ii:2:3: optimized: Assembler symbol names:_Z5test1R1A/0->_Z5test4OK1A/3
optimized:  Inlined A test1(A&)/4 into A test2(A&&)/1 which now has time
4.000000 and size 5, net change of -1.
optimized:  Inlined A test1(A&)/5 into A test3(const A&)/2 which now has time
4.000000 and size 5, net change of -1.
optimized:  Inlined A test1(A&)/6 into A test4(const A&&)/3 which now has time
4.000000 and size 5, net change of -1.

For some reason we "optimize" the functions to the following in IPA ICF:

struct A test4 (const struct A & a)
{
  struct A retval.6;

  <bb 2> [local count: 1073741824]:
  retval.6 = test1 (a_2(D)); [tail call]
  return retval.6;

}


struct A test3 (const struct A & a)
{
  struct A retval.5;

  <bb 2> [local count: 1073741824]:
  retval.5 = test1 (a_2(D)); [tail call]
  return retval.5;

}


struct A test2 (struct A & a)
{
  struct A retval.4;

  <bb 2> [local count: 1073741824]:
  retval.4 = test1 (a_2(D)); [tail call]
  return retval.4;

}

and then we inline them back, introducing the extra copy.  Why do we use
tail-calls here instead of aliases?  Why do we lack cost modeling here?
Why do we inline back?  It looks like a pointless exercise to me ...
With -fdisable-ipa-inline we get

_Z5test2O1A:
.LFB5:
        .cfi_startproc
        jmp     _Z5test1R1A

so that's at least reasonable and what's expected I suppose.  So one
could argue the bug is in the inliner, which introduces the extra
copy (IIRC there's a bug about this), but still.
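
For reference, a minimal sketch of the kind of input that yields the four
functions in the dumps above.  The function names and parameter types are
read off the mangled symbols (_Z5test1R1A, _Z5test2O1A, _Z5test3RK1A,
_Z5test4OK1A); the members of A and the copies_agree() helper are
assumptions for illustration, since the testcase isn't quoted in this
comment.

```cpp
#include <utility>

// Assumed payload: the real layout of A is not shown in this comment.
struct A {
    long a, b, c, d;
};

// All four bodies just copy their argument, so after lowering they are
// semantically identical; that is what lets ICF merge test2..test4
// into test1.
A test1(A &a)        { return a; }
A test2(A &&a)       { return a; }
A test3(const A &a)  { return a; }
A test4(const A &&a) { return a; }

// Hypothetical helper: field-by-field comparison of two A values.
static bool same(const A &x, const A &y) {
    return x.a == y.a && x.b == y.b && x.c == y.c && x.d == y.d;
}

// Hypothetical helper: checks that all four entry points return the
// same copy, i.e. merging them does not change observable behaviour.
bool copies_agree() {
    A x{1, 2, 3, 4};
    const A y = x;
    A r1 = test1(x);
    A r2 = test2(std::move(x));   // x is trivially copyable, so it is unchanged
    A r3 = test3(y);
    A r4 = test4(std::move(y));   // std::move on a const object gives const A&&
    return same(r1, r2) && same(r1, r3) && same(r1, r4);
}
```

Compiling something like this with -O2 -S, once as-is and once with
-fdisable-ipa-inline, should let one compare the inlined-back copy against
the plain jmp shown above; whether this exact struct reproduces the
extra copy is not verified here.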