From: "rsandifo at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/95967] New: Poor aarch64 vector constructor code when using arm_neon.h
Date: Mon, 29 Jun 2020 15:20:13 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95967

            Bug ID: 95967
           Summary: Poor aarch64 vector constructor code when using arm_neon.h
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
        Depends on: 95962
            Blocks: 95958
  Target Milestone: ---
            Target: aarch64*-*-*

We generate poor code for the attached functions:

f1:
        movi    v4.4s, 0
        ins     v4.s[0], v0.s[0]
        ins     v4.s[1], v1.s[0]
        ins     v4.s[2], v2.s[0]
        mov     v0.16b, v4.16b
        ins     v0.s[3], v3.s[0]
        ret
f2:
        dup     v0.4s, v0.s[0]
        ins     v0.s[1], v1.s[0]
        ins     v0.s[2], v2.s[0]
        ins     v0.s[3], v3.s[0]
        ret
f3:
        sub     sp, sp, #16
        stp     s0, s1, [sp]
        stp     s2, s3, [sp, 8]
        ldr     q0, [sp]
        add     sp, sp, 16
        ret
g1:
        movi    v0.4s, 0
        ld1     {v0.s}[0], [x0]
        ld1     {v0.s}[1], [x1]
        ld1     {v0.s}[2], [x2]
        ld1     {v0.s}[3], [x3]
        ret
g2:
        ld1r    {v0.4s}, [x0]
        ld1     {v0.s}[1], [x1]
        ld1     {v0.s}[2], [x2]
        ld1     {v0.s}[3], [x3]
        ret
g3:
        sub     sp, sp, #16
        ldr     s0, [x3]
        ldr     s3, [x0]
        ldr     s2, [x1]
        ldr     s1, [x2]
        stp     s3, s2, [sp]
        stp     s1, s0, [sp, 8]
        ldr     q0, [sp]
        add     sp, sp, 16
        ret

All three f functions should generate:

        mov     v0.s[1], v1.s[0]
        mov     v0.s[2], v2.s[0]
        mov     v0.s[3], v3.s[0]
        ret

and all three g functions should generate:

        ldr     s0, [x0]
        ld1     { v0.s }[1], [x1]
        ld1     { v0.s }[2], [x2]
        ld1     { v0.s }[3], [x3]
        ret

which is what current Clang does.

Getting the right code for f3 and g3 depends on the fix for PR95962.
There's a reasonable chance that PR95962 will be enough on its own
to fix f3 and g3, but I included them just in case it isn't.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95958
[Bug 95958] [meta-bug] Inefficient arm_neon.h code for AArch64
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95962
[Bug 95962] Inefficient code for simple arm_neon.h iota operation
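
The attachment with the six functions is not reproduced in this message. The sketch below is only a guess at their shape, inferred from the assembly in the report: float arguments arriving in s0-s3 for the f functions, pointer arguments in x0-x3 for the g functions, and the zero / dup / stack-array starting points visible in each listing. The function bodies, the parameter names and the non-AArch64 fallback shims are all assumptions, not the reporter's code.

```c
/* Hypothetical reconstruction; the real attachment may differ.
   On AArch64, compile with something like "gcc -O2 -S" to compare output. */
#if defined(__aarch64__)
#include <arm_neon.h>
#else
/* Fallback shims (an assumption, for non-AArch64 hosts only), built on the
   GCC/Clang vector extension so that the sketch still compiles and runs. */
typedef float float32x4_t __attribute__((vector_size(16)));
static inline float32x4_t vdupq_n_f32(float x) {
  return (float32x4_t){x, x, x, x};
}
static inline float32x4_t vsetq_lane_f32(float x, float32x4_t v, int lane) {
  v[lane] = x;
  return v;
}
static inline float vgetq_lane_f32(float32x4_t v, int lane) {
  return v[lane];
}
static inline float32x4_t vld1q_f32(const float *p) {
  return (float32x4_t){p[0], p[1], p[2], p[3]};
}
static inline float32x4_t vld1q_dup_f32(const float *p) {
  return vdupq_n_f32(*p);
}
static inline float32x4_t vld1q_lane_f32(const float *p, float32x4_t v,
                                         int lane) {
  v[lane] = *p;
  return v;
}
#endif

/* f1: start from a zero vector, then insert all four lanes. */
float32x4_t f1(float a, float b, float c, float d) {
  float32x4_t res = vdupq_n_f32(0.0f);
  res = vsetq_lane_f32(a, res, 0);
  res = vsetq_lane_f32(b, res, 1);
  res = vsetq_lane_f32(c, res, 2);
  return vsetq_lane_f32(d, res, 3);
}

/* f2: broadcast the first element, then overwrite lanes 1-3. */
float32x4_t f2(float a, float b, float c, float d) {
  float32x4_t res = vdupq_n_f32(a);
  res = vsetq_lane_f32(b, res, 1);
  res = vsetq_lane_f32(c, res, 2);
  return vsetq_lane_f32(d, res, 3);
}

/* f3: build a temporary array and load it as one vector. */
float32x4_t f3(float a, float b, float c, float d) {
  float tmp[4] = {a, b, c, d};
  return vld1q_f32(tmp);
}

/* g1: zero vector, then load each lane from memory. */
float32x4_t g1(const float *a, const float *b, const float *c,
               const float *d) {
  float32x4_t res = vdupq_n_f32(0.0f);
  res = vld1q_lane_f32(a, res, 0);
  res = vld1q_lane_f32(b, res, 1);
  res = vld1q_lane_f32(c, res, 2);
  return vld1q_lane_f32(d, res, 3);
}

/* g2: load-and-broadcast the first element, then load lanes 1-3. */
float32x4_t g2(const float *a, const float *b, const float *c,
               const float *d) {
  float32x4_t res = vld1q_dup_f32(a);
  res = vld1q_lane_f32(b, res, 1);
  res = vld1q_lane_f32(c, res, 2);
  return vld1q_lane_f32(d, res, 3);
}

/* g3: gather the four scalars into an array, then load it as a vector. */
float32x4_t g3(const float *a, const float *b, const float *c,
               const float *d) {
  float tmp[4] = {*a, *b, *c, *d};
  return vld1q_f32(tmp);
}
```

All six functions return the same vector {a, b, c, d}, which is why the report expects a single instruction sequence for each group regardless of how the vector was spelled in the source.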