From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/107263] Memcpy not elided when initializing struct
Date: Mon, 17 Oct 2022 07:43:39 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107263

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Last reconfirmed|                            |2022-10-17
          Version|unknown                     |13.0
   Ever confirmed|0                           |1
           Status|UNCONFIRMED                 |NEW
               CC|                            |jamborm at gcc dot gnu.org

--- Comment #1 from Richard Biener ---
Confirmed.
The frontend leaves us with

  <>;
  <next >>>>>;
  < = *(const struct Foo &) &tmp) >>>>>;

and ESRA sees

  <bb 2> :
  tmp = {};
  _1 = f_4(D)->next;
  tmp.next = _1;
  *f_4(D) = tmp;
  tmp ={v} {CLOBBER(eol)};
  return;

ESRA somewhat senselessly does

  <bb 2> :
  tmp = {};
  tmp$next_8 = 0B;
  _1 = f_4(D)->next;
  tmp$next_9 = _1;
  tmp.next = tmp$next_9;
  *f_4(D) = tmp;
  tmp ={v} {CLOBBER(eol)};
  return;

It doesn't scalarize the array because that's too large.  I would guess
that Clang doesn't split the initializer and thus its aggregate copy
propagation manages to elide 'tmp'.

We don't have a good place to perform the desired optimization; the split
initialization of 'tmp' certainly complicates things.  In principle it
would be SRA's job, since I think it does most of the necessary analysis;
it just lacks knowledge of how to re-materialize *f_4(D) efficiently at
the point of the aggregate assignment.  It has

Candidate (2384): tmp
Too big to totally scalarize: tmp (UID: 2384)
Created a replacement for tmp offset: 0, size: 64: tmp$nextD.2425
Access trees for tmp (UID: 2384):
access { base = (2384)'tmp', offset = 0, size = 4736, expr = tmp,
    type = struct Foo, reverse = 0, grp_read = 1, grp_write = 1,
    grp_assignment_read = 1, grp_assignment_write = 1, grp_scalar_read = 0,
    grp_scalar_write = 0, grp_total_scalarization = 0, grp_hint = 0,
    grp_covered = 0, grp_unscalarizable_region = 0, grp_unscalarized_data = 1,
    grp_same_access_path = 1, grp_partial_lhs = 0, grp_to_be_replaced = 0,
    grp_to_be_debug_replaced = 0}
* access { base = (2384)'tmp', offset = 0, size = 64, expr = tmp.next,
    type = struct Foo *, reverse = 0, grp_read = 1, grp_write = 1,
    grp_assignment_read = 1, grp_assignment_write = 1, grp_scalar_read = 0,
    grp_scalar_write = 1, grp_total_scalarization = 0, grp_hint = 0,
    grp_covered = 1, grp_unscalarizable_region = 0, grp_unscalarized_data = 0,
    grp_same_access_path = 1, grp_partial_lhs = 0, grp_to_be_replaced = 1,
    grp_to_be_debug_replaced = 0}

but it fails to record that for the size-4736 write there is a clearing
store performed that is cheap to re-materialize (and no new variables need
to be created).  SRA could probably track writes of constants only that
way, avoiding the creation of scalar replacements.
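For reference, a minimal sketch of the kind of source that produces the
GIMPLE above; the actual testcase is not quoted in this comment, so the
struct layout (array size chosen to reach 4736 bits on an LP64 target) and
the function name are assumptions:

```cpp
#include <cassert>

struct Foo {
    Foo *next;      // the one field that survives the reset
    int data[146];  // assumed size: large enough that SRA refuses
                    // total scalarization (8 + 584 bytes = 4736 bits)
};

// Resets *f but keeps the 'next' link.  The frontend materializes a
// temporary 'tmp', zero-initializes it, copies f->next into it, and
// finally copies the whole temporary back into *f -- the aggregate copy
// that ideally would be elided into "clear *f, then store next".
void reset(Foo *f) {
    Foo tmp{};           // tmp = {}
    tmp.next = f->next;  // tmp.next = _1
    *f = tmp;            // *f_4(D) = tmp  (the copy GCC fails to elide)
}
```

The hoped-for transform would re-materialize the split initialization at
the aggregate assignment: a clearing store of *f_4(D) followed by the
single field store, so 'tmp' never needs to exist in memory.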