From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/107263] Memcpy not elided when initializing struct
Date: Mon, 17 Oct 2022 07:43:39 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107263

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Last reconfirmed|                            |2022-10-17
          Version|unknown                     |13.0
   Ever confirmed|0                           |1
           Status|UNCONFIRMED                 |NEW
               CC|                            |jamborm at gcc dot gnu.org

--- Comment #1 from Richard Biener ---
Confirmed.
The frontend leaves us with

  <>;
  <next >>>>>;
  < = *(const struct Foo &) &tmp) >>>>>;

and ESRA sees

  <bb 2> :
  tmp = {};
  _1 = f_4(D)->next;
  tmp.next = _1;
  *f_4(D) = tmp;
  tmp ={v} {CLOBBER(eol)};
  return;

ESRA somewhat senselessly does

  <bb 2> :
  tmp = {};
  tmp$next_8 = 0B;
  _1 = f_4(D)->next;
  tmp$next_9 = _1;
  tmp.next = tmp$next_9;
  *f_4(D) = tmp;
  tmp ={v} {CLOBBER(eol)};
  return;

It doesn't scalarize the array because that's too large.  I would guess
that Clang doesn't split the initializer and thus its aggregate copy
propagation manages to elide 'tmp'.

We don't have a good place to perform the desired optimization; the split
initialization of 'tmp' certainly complicates things.  In principle it
would be SRA's job, since I think it does most of the necessary analysis;
it just lacks knowledge of how to re-materialize *f_4(D) efficiently at
the point of the aggregate assignment.  It has

Candidate (2384): tmp
Too big to totally scalarize: tmp (UID: 2384)
Created a replacement for tmp offset: 0, size: 64: tmp$nextD.2425
Access trees for tmp (UID: 2384):
access { base = (2384)'tmp', offset = 0, size = 4736, expr = tmp,
    type = struct Foo, reverse = 0, grp_read = 1, grp_write = 1,
    grp_assignment_read = 1, grp_assignment_write = 1, grp_scalar_read = 0,
    grp_scalar_write = 0, grp_total_scalarization = 0, grp_hint = 0,
    grp_covered = 0, grp_unscalarizable_region = 0, grp_unscalarized_data = 1,
    grp_same_access_path = 1, grp_partial_lhs = 0, grp_to_be_replaced = 0,
    grp_to_be_debug_replaced = 0}
* access { base = (2384)'tmp', offset = 0, size = 64, expr = tmp.next,
    type = struct Foo *, reverse = 0, grp_read = 1, grp_write = 1,
    grp_assignment_read = 1, grp_assignment_write = 1, grp_scalar_read = 0,
    grp_scalar_write = 1, grp_total_scalarization = 0, grp_hint = 0,
    grp_covered = 1, grp_unscalarizable_region = 0, grp_unscalarized_data = 0,
    grp_same_access_path = 1, grp_partial_lhs = 0, grp_to_be_replaced = 1,
    grp_to_be_debug_replaced = 0}

but it fails to record that for the size-4736 write there is a clearing
store performed that is cheap to re-materialize (and no new variables need
to be created).  SRA could probably track writes of constants only that
way, avoiding the creation of scalar replacements.
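For reference, a minimal sketch of the kind of source that produces the
GIMPLE above; the actual testcase is not quoted in this comment, so the
struct layout (array size chosen to reach 4736 bits on an LP64 target) and
the function name are assumptions:

```cpp
#include <cassert>

struct Foo {
    Foo *next;      // the one field that survives the reset
    int data[146];  // assumed size: large enough that SRA refuses
                    // total scalarization (8 + 584 bytes = 4736 bits)
};

// Resets *f but keeps the 'next' link.  The frontend materializes a
// temporary 'tmp', zero-initializes it, copies f->next into it, and
// finally copies the whole temporary back into *f -- the aggregate copy
// that ideally would be elided into "clear *f, then store next".
void reset(Foo *f) {
    Foo tmp{};           // tmp = {}
    tmp.next = f->next;  // tmp.next = _1
    *f = tmp;            // *f_4(D) = tmp  (the copy GCC fails to elide)
}
```

The hoped-for transform would re-materialize the split initialization at
the aggregate assignment: a clearing store of *f_4(D) followed by the
single field store, so 'tmp' never needs to exist in memory.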