From mboxrd@z Thu Jan 1 00:00:00 1970
Received: by sourceware.org (Postfix, from userid 48) id 0C4BE39540C4; Wed, 7 Jul 2021 13:59:36 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 0C4BE39540C4
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/99728] code pessimization when using wrapper classes around SIMD types
Date: Wed, 07 Jul 2021 13:59:35 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 10.2.1
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: cc
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
X-List-Received-Date: Wed, 07 Jul 2021 13:59:36 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99728

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jason at gcc dot gnu.org

--- Comment #17 from Richard Biener ---
So we have

  val$v_62 = *d_28(D).lam1D.32701.vD.32579;
  *d_28(D).lam1D.32701 = *d_28(D).lam2D.32702;
  *d_28(D).lam2D.32702.vD.32579 = _34;

I believe that the SRA pass has the best analysis capabilities to eventually
decompose aggregate copies into register pieces (with cost considerations).
In particular it knows (but without flow info) what kind of types sub-accesses
use.
We want the aggregate copy replaced with pieces that match the rest of the
accesses (here because of LIM's restrictions).  In particular we'd like to use
'vector double' typed accesses here, something the middle-end usually avoids
for block copies, which is what aggregate copies are to the middle-end.

That said, it would be _much_ easier if the frontend, with its
language-specific semantic knowledge, could avoid doing block copies for such
simple wrappers and instead perform (recursively) memberwise copy (for
single-member aggregates).

Of course the simple fix in source is to add

  Tvsimple &operator=(const Tvsimple &other) { v = other.v; return *this; }

producing optimal code.

Jason - would you consider this premature "optimization" in the C++ frontend?
It doesn't seem that there's an operator= synthesized, instead we directly emit

  <lam1 = *(const struct Tvsimple &) &d->lam2) >>>>>;

from

  d.lam1 = d.lam2;

from build_over_call, which has a series of optimizations at

  else if (DECL_ASSIGNMENT_OPERATOR_P (fn)
           && DECL_OVERLOADED_OPERATOR_IS (fn, NOP_EXPR)
           && trivial_fn_p (fn))
    {
...
      if (is_really_empty_class (type, /*ignore_vptr*/true))
        {
          /* Avoid copying empty classes.  */
          val = build2 (COMPOUND_EXPR, type, arg, to);
          suppress_warning (val, OPT_Wunused);
        }
      else if (tree_int_cst_equal (TYPE_SIZE (type), TYPE_SIZE (as_base)))
        {
          if (is_std_init_list (type)
              && conv_binds_ref_to_prvalue (convs[1]))
            warning_at (loc, OPT_Winit_list_lifetime,
                        "assignment from temporary %<initializer_list%> does "
                        "not extend the lifetime of the underlying array");
          arg = cp_build_fold_indirect_ref (arg);
          val = build2 (MODIFY_EXPR, TREE_TYPE (to), to, arg);

so we handle empty classes; maybe we can also handle single data-member
classes (not sure how exactly to test for this - walking TYPE_FIELDS
repeatedly for each considered assignment would be slow I guess).
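
For reference, the source-level workaround can be sketched roughly as below.
This is not the testcase from the PR, only a minimal illustration under
assumptions: vec2d (a GCC/Clang vector_size type), TvImplicit, Data, step,
run and make_vec are hypothetical names made up for the sketch; only
Tvsimple, v, lam1 and lam2 come from the comment.  The loop mirrors the
load / aggregate copy / store pattern in the GIMPLE quoted above.  Both
wrappers compute the same values; whether the generated code actually differs
depends on compiler version and flags, so compare the assembly to check.

```cpp
#include <cassert>

// Assumption: GCC/Clang vector extension standing in for 'vector double'.
typedef double vec2d __attribute__((vector_size(16)));

// Hypothetical helper to build a vector value.
static vec2d make_vec(double a, double b) {
  vec2d r = {a, b};
  return r;
}

// Wrapper that relies on the implicit (trivial) copy assignment.  The
// frontend emits a plain aggregate assignment for it, which the middle-end
// treats as a block copy.
struct TvImplicit {
  vec2d v;
};

// Wrapper with the explicit memberwise operator= suggested in the comment.
// The copy is now a vector-typed assignment of the single member.
struct Tvsimple {
  vec2d v;
  Tvsimple() = default;
  Tvsimple(const Tvsimple &) = default;
  Tvsimple &operator=(const Tvsimple &other) { v = other.v; return *this; }
};

// Hypothetical container with the two members named in the GIMPLE dump.
template <typename T>
struct Data {
  T lam1, lam2;
};

// Hypothetical loop with the same access pattern as the dump:
// vector load, wrapper copy, vector store.
template <typename T>
void step(Data<T> &d, int n) {
  for (int i = 0; i < n; ++i) {
    vec2d val = d.lam1.v;       // vector-typed load from lam1
    d.lam1 = d.lam2;            // wrapper copy (block copy or memberwise)
    d.lam2.v = val + d.lam2.v;  // vector-typed store to lam2
  }
}

// Hypothetical driver: fixed start values, n iterations.
template <typename T>
Data<T> run(int n) {
  Data<T> d;
  d.lam1.v = make_vec(1.0, 2.0);
  d.lam2.v = make_vec(3.0, 4.0);
  step(d, n);
  return d;
}
```

Calling run<TvImplicit>(n) and run<Tvsimple>(n) and diffing the assembly for
the two step instantiations (for example at -O2) is one way to see whether
the block copy survives for the implicit variant.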