From mboxrd@z Thu Jan 1 00:00:00 1970
Received: by sourceware.org (Postfix, from userid 48) id 31CFD3858D1E; Mon, 6 May 2024 10:00:01 +0000 (GMT)
From: "mkretz at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114908] fails to optimize avx2 in-register permute written with std::experimental::simd
Date: Mon, 06 May 2024 10:00:01 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: mkretz at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
Content-Type: text/plain; charset="UTF-8"
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114908

--- Comment #5 from Matthias Kretz (Vir) ---
https://godbolt.org/z/P6cfbjT9f

#include <stdint.h>

typedef uint64_t T;
typedef T V [[gnu::vector_size(32)]];

typedef struct simd4 { V data; } simd4;
typedef struct simd1 { T data; } simd1;
typedef struct tup3_1 { simd4 a; simd1 b; } tup3_1;

simd1 load1(const T* ptr)
{
  simd1 ret = {ptr[0]};
  return ret;
}

simd4 load3(const T* ptr)
{
  simd4 ret = {};
  __builtin_memcpy(&ret, ptr, 3 * sizeof(T));
  return ret;
}

tup3_1 split3_1(simd4 x)
{
  const T* ptr = (T*)&x;
  tup3_1 ret = {load3(ptr), load1(ptr + 3)};
  return ret;
}

simd4 concat1_3(simd1 a, simd4 b)
{
  simd4 ret = {};
  char* ptr = (char*)&ret;
  __builtin_memcpy(ptr, &a, sizeof(T));
  __builtin_memcpy(ptr + sizeof(T), &b, 3 * sizeof(T));
  return ret;
}

simd4 perm(simd4 data)
{
  tup3_1 carry = split3_1(data);
  simd1 zero = {};
  return concat1_3(zero, carry.a);
}
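
For reference, a sketch of what perm() above computes, assuming I read the test case correctly: lane 0 of the result is zero and lanes 1..3 hold the input's lanes 0..2, while the input's lane 3 is dropped. The names perm_memcpy, perm_reference and check_equivalent are hypothetical and not part of the reported test case. perm_memcpy is only a condensed restatement of the split3_1/concat1_3 sequence, and the vector type is spelled with __attribute__ instead of [[gnu::vector_size]] so that it also builds before C23.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t T;
/* Same 4 x 64-bit vector type as in the test case (assumed equivalent
   to the [[gnu::vector_size(32)]] spelling). */
typedef T V __attribute__((vector_size(32)));

/* Condensed restatement of perm(): start from an all-zero vector and
   copy the first three input lanes to byte offset sizeof(T). */
V perm_memcpy(V data)
{
    V ret = {0};
    memcpy((char *)&ret + sizeof(T), &data, 3 * sizeof(T));
    return ret;
}

/* The same permutation written lane by lane:
   result = {0, data[0], data[1], data[2]}. */
V perm_reference(V data)
{
    V out = {0, data[0], data[1], data[2]};
    return out;
}

/* Asserts that both formulations agree on every lane for one input. */
int check_equivalent(V data)
{
    V a = perm_memcpy(data);
    V b = perm_reference(data);
    for (int i = 0; i < 4; ++i)
        assert(a[i] == b[i]);
    return 1;
}
```

This only pins down the intended lane movement; it makes no claim about which instruction sequence either form compiles to.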