From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/99728] code pessimization when using wrapper classes around SIMD types
Date: Wed, 24 Mar 2021 08:52:38 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 10.2.1
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Changed-Fields: cc bug_status everconfirmed component cf_reconfirmed_on
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99728

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
          Component|c++                         |tree-optimization
   Last reconfirmed|                            |2021-03-24

--- Comment #9 from Richard Biener ---
The issue for store-motion is that we see an aggregate copy:

Unanalyzed memory reference 0: *d_28(D).lam1 = *d_28(D).lam2;

  __MEM (d_28(D)).lam1 = __MEM (d_28(D)).lam2;
  __MEM (d_28(D)).lam2.v = _38;
  il_36 = il_74 + 1ul;

there is another PR about those Unanalyzed refs preventing LIM/SM but then
getting rid of those aggregate copies would be nice as well since many passes
do not like them.  I suppose 'vtype' in this case has a FP mode which
prevents us from simplistic folding of this (unless we'd always expand
those to FP load/store sequences).  Indeed, we're copying

 type unit-size align:256 warn_if_not_align:0 symtab:0 alias-set 1
    canonical-type 0x7ffff437dbd0
    fields public external autoinline decl_3 QI t.C:3:8 align:16
        warn_if_not_align:0 context
        full-name "constexpr Tvsimple& Tvsimple::operator=(Tvsimple&&) noexcept ()"
        not-really-extern chain > context
    full-name "struct Tvsimple"
    needs-constructor X() X(constX&) this=(X&) n_parents=0 use_template=0
    interface-unknown
    pointer_to_this reference_to_this chain >

OK, so for a simple

struct X { double x; };
void foo (struct X *x, struct X *y)
{
  *x = *y;
}

we do generate x87 FP load/store insns and do not transfer bytes.  Probably
OK from a C language perspective but questionable on the GIMPLE side
(we've been there before).

So one thing we can experiment with is to gimplify those aggregate copies
to register load/store when the aggregates have been assigned non-BLKmode
by the target.  This might of course confuse SRA which means that SRA itself
might be a better place to perform this optimization.

[mind struct { double; double; } on x86 gets TImode for example]
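
For illustration only, here is a source-level sketch of the three ways the
copy in foo() above could be carried out.  This is not GCC code and says
nothing about what any pass actually emits; the function names
(copy_aggregate, copy_scalar, copy_bytes, copies_agree) are hypothetical and
exist only for this example.  copy_scalar is roughly what "register
load/store" would look like if written by hand, and copy_bytes is the plain
byte transfer.

```c
#include <string.h>

/* Single-member aggregate from the example above.  */
struct X { double x; };

/* Aggregate copy: one struct assignment, as in foo().  */
void copy_aggregate (struct X *dst, const struct X *src)
{
  *dst = *src;
}

/* Hand-written scalar load followed by a scalar store of the only member.  */
void copy_scalar (struct X *dst, const struct X *src)
{
  double tmp = src->x;
  dst->x = tmp;
}

/* Byte-wise transfer that does not go through a floating-point value.  */
void copy_bytes (struct X *dst, const struct X *src)
{
  memcpy (dst, src, sizeof *dst);
}

/* Returns 1 when all three copies yield byte-identical objects.
   struct X has no padding, so memcmp compares only the double.  */
int copies_agree (double v)
{
  struct X src = { v }, a, b, c;
  copy_aggregate (&a, &src);
  copy_scalar (&b, &src);
  copy_bytes (&c, &src);
  return memcmp (&a, &c, sizeof a) == 0
         && memcmp (&b, &c, sizeof b) == 0;
}
```

For ordinary finite values such as copies_agree (1.5) the three variants are
indistinguishable.  The "do not transfer bytes" remark presumably matters for
bit patterns that an FP load/store may not preserve exactly (signaling NaNs
on x87 being the usual suspect); that is an assumption on my side and is not
exercised by this sketch.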