From mboxrd@z Thu Jan 1 00:00:00 1970
From: "ramana at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/107533] New: Inefficient code sequence for fp16 testcase on aarch64
Date: Sat, 05 Nov 2022 07:42:43 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107533

            Bug ID: 107533
           Summary: Inefficient code sequence for fp16 testcase on aarch64
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ramana at gcc dot gnu.org
  Target Milestone: ---

Derived from PR92999

struct phalf {
    __fp16 first;
    __fp16 second;
};

struct phalf phalf_copy(struct phalf* src) __attribute__((noinline));

struct phalf phalf_copy(struct phalf* src) {
    return *src;
}

Compiling for AArch64 with a recent enough compiler produces:

phalf_copy:
        ldr     w0, [x0]
        ubfx    x1, x0, 0, 16
        lsr     w0, w0, 16
        dup     v0.4h, w1
        dup     v1.4h, w0
        ret

Couldn't it just be:

        ldr     h0, [x0]
        ldr     h1, [x0, 2]

IIRC this is in base v8 rather than v8.2

regards
Ramana
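
For anyone who wants to check that the two sequences move the same bits, here is a minimal, portable sketch. It is not taken from GCC or from the PR. The names `phalf_bits`, `split_via_word`, `split_via_halfword_loads` and `sequences_agree` are made up for illustration, `uint16_t` stands in for `__fp16` so that it builds on hosts without half-precision support, and it assumes a little-endian host (as on typical AArch64 configurations). It models only the integer side of each sequence, not the `dup` into the vector registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit-pattern stand-in for the report's struct phalf. */
struct phalf_bits {
    uint16_t first;
    uint16_t second;
};

/* Models the reported sequence: one 32-bit load, then a split.
   Assumes a little-endian host, so the low half is `first`. */
static struct phalf_bits split_via_word(const void *src)
{
    uint32_t w;
    struct phalf_bits out;

    /* The single word load only covers the struct if it has no padding. */
    assert(sizeof(struct phalf_bits) == sizeof w);

    memcpy(&w, src, sizeof w);              /* ldr  w0, [x0]      */
    out.first  = (uint16_t)(w & 0xffffu);   /* ubfx x1, x0, 0, 16 */
    out.second = (uint16_t)(w >> 16);       /* lsr  w0, w0, 16    */
    return out;
}

/* Models the suggested sequence: two independent 16-bit loads. */
static struct phalf_bits split_via_halfword_loads(const void *src)
{
    const unsigned char *p = src;
    struct phalf_bits out;

    memcpy(&out.first,  p,     sizeof out.first);    /* ldr h0, [x0]    */
    memcpy(&out.second, p + 2, sizeof out.second);   /* ldr h1, [x0, 2] */
    return out;
}

/* Returns 1 when both models produce the same pair of halves. */
static int sequences_agree(struct phalf_bits in)
{
    struct phalf_bits a = split_via_word(&in);
    struct phalf_bits b = split_via_halfword_loads(&in);
    return a.first == b.first && a.second == b.second;
}
```

Calling `sequences_agree` with, say, `{ 0x3c00, 0xc000 }` (the half-precision encodings of 1.0 and -2.0) should return 1 on a little-endian host, which is consistent with the two halfword loads being a drop-in replacement for the load, extract and shift.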