From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114908] fails to optimize avx2 in-register permute written with std::experimental::simd
Date: Thu, 02 May 2024 11:12:24 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114908

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mkretz at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
             Target|x86-64-v3                   |x86_64-*-*

--- Comment #2 from Richard Biener ---
The memcpy() calls are definitely a hindrance.
I suppose that update-address-taken could replace some of them with
BIT_INSERT_EXPRs, but it doesn't handle any calls right now.  Replacing
the memcpy on its own would be possible (special-casing just the
"sub-vector" case), turning

  __builtin_memcpy (&__r, &data, 24);

into

  _1 = __r;
  _2 = data;
  _3 = VEC_PERM <_2, _1, {0, 1, 2, 7 }>;
  __r = _3;

or, when copying a single element, into a BIT_INSERT_EXPR.  OTOH that's
not good if __r stays in memory: the whole-vector store might be good for
avoiding store-to-load forwarding (STLF) failures, but the whole-vector
read will be bad for the same reason.  The update-address-taken pass would
know that __r and data become registers.

We already have a similar case involving ATOMIC_COMPARE_EXCHANGE that has
delayed processing requiring register arguments.  It might or might not be
a good example of how to deal with this.