From: "pinskia at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/98532] Use load/store pairs for 2-element vector in memory permutes
Date: Fri, 12 May 2023 05:34:38 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 11.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98532

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|pinskia at gcc dot gnu.org  |unassigned at gcc dot gnu.org
      Known to work|                            |12.1.0
             Status|ASSIGNED                    |NEW

--- Comment #3 from Andrew Pinski ---
Starting in GCC 12 we produce:

  vect__1.5_10 = *a_4(D);
  vect__2.6_11 = VEC_PERM_EXPR ;
  *b_6(D) = vect__2.6_11;

        ldr     q0, [x0]
        ext     v0.16b, v0.16b, v0.16b, #8
        str     q0, [x1]

At the RTL level, combine tries:

Trying 8 -> 9:
    8: r96:V2DI=unspec[r92:V2DI,r92:V2DI,0x1] 237
      REG_DEAD r92:V2DI
    9: [r98:DI]=r96:V2DI
      REG_DEAD r98:DI
      REG_DEAD r96:V2DI
Failed to match this instruction:
(set (mem:V2DI (reg:DI 98) [1 *b_6(D)+0 S16 A128])
    (unspec:V2DI [
            (reg:V2DI 92 [ vect__1.5 ]) repeated x2
            (const_int 1 [0x1])
        ] UNSPEC_EXT))
Trying 7, 8 -> 9:
    7: r92:V2DI=[r97:DI]
      REG_DEAD r97:DI
    8: r96:V2DI=unspec[r92:V2DI,r92:V2DI,0x1] 237
      REG_DEAD r92:V2DI
    9: [r98:DI]=r96:V2DI
      REG_DEAD r98:DI
      REG_DEAD r96:V2DI
Failed to match this instruction:
(set (mem:V2DI (reg:DI 98) [1 *b_6(D)+0 S16 A128])
    (unspec:V2DI [
            (mem:V2DI (reg:DI 97) [1 *a_4(D)+0 S16 A128]) repeated x2
            (const_int 1 [0x1])
        ] UNSPEC_EXT))

Maybe the aarch64 backend could have a pattern that matches the last combined RTL (7, 8 -> 9) and then expands it into a load pair/store pair with reversed registers.
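The `ext v0.16b, v0.16b, v0.16b, #8` above rotates the 16-byte register by 8 bytes, which for a single source simply swaps the two 64-bit lanes. The C sketch below illustrates why a load pair/store pair with reversed registers is an equivalent lowering. It assumes the GCC/Clang generic vector extension; `v2di`, `perm`, `perm_pairs` and `check_swap` are hypothetical names, and `perm` is only one way to obtain a similar lane-swap permute, not the test case from this report (which the comment does not show). The `ldp`/`stp` comments describe the intended shape of the suggested lowering, not what any compiler is guaranteed to emit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Generic vector of two 64-bit lanes (V2DI in RTL terms). */
typedef int64_t v2di __attribute__((vector_size(16)));

/* Hypothetical reproducer: swap the two lanes of *a and store them to *b.
   On AArch64 GCC 12 this kind of permute is what ldr/ext #8/str implements. */
void perm(v2di *a, v2di *b)
{
#if defined(__clang__)
    *b = __builtin_shufflevector(*a, *a, 1, 0);
#else
    *b = __builtin_shuffle(*a, (v2di){1, 0});
#endif
}

/* Same result via two 64-bit loads and two 64-bit stores in reversed order,
   i.e. the shape of e.g. "ldp x2, x3, [x0]; stp x3, x2, [x1]". */
void perm_pairs(v2di *a, v2di *b)
{
    int64_t lo, hi;
    memcpy(&lo, (const char *)a, sizeof lo);      /* lane 0 (x2) */
    memcpy(&hi, (const char *)a + 8, sizeof hi);  /* lane 1 (x3) */
    memcpy((char *)b, &hi, sizeof hi);            /* store lane 1 first */
    memcpy((char *)b + 8, &lo, sizeof lo);        /* then lane 0 */
}

/* Checks that both versions turn {x0, x1} into {x1, x0}. */
void check_swap(int64_t x0, int64_t x1)
{
    v2di in = {x0, x1};
    v2di out_vec, out_pair;
    perm(&in, &out_vec);
    perm_pairs(&in, &out_pair);
    assert(out_vec[0] == x1 && out_vec[1] == x0);
    assert(memcmp(&out_vec, &out_pair, sizeof out_vec) == 0);
}
```

Because both 64-bit values are loaded before either is stored, the pair form stays correct even when `a` and `b` point to the same object, which is the same property the single `ldr`/`str` sequence has.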