From: "rsandifo at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/95964] New: AArch64 arm_neon.h arithmetic functions lack appropriate attributes
Date: Mon, 29 Jun 2020 13:03:16 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95964

            Bug ID: 95964
           Summary: AArch64 arm_neon.h arithmetic functions lack
                    appropriate attributes
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
            Blocks: 95958
  Target Milestone: ---
            Target: aarch64*-*-*

For:

---------------------------------------
#include <arm_neon.h>
#include <vector>

std::vector<float32x4_t> a, b, c;

void
foo (size_t n)
{
  for (size_t i = 0; i < n; ++i)
    a[i] = vfmaq_f32(a[i], b[i], c[i]);
}
---------------------------------------

we generate code that loads the start of a, b and c in every iteration
of the loop:

---------------------------------------
        .cfi_startproc
        cbz     x0, .L4
        adrp    x3, .LANCHOR0
        add     x3, x3, :lo12:.LANCHOR0
        mov     x2, 0
        .p2align 3,,7
.L6:
        ldr     x4, [x3]
        lsl     x1, x2, 4
        ldr     x6, [x3, 24]
        add     x2, x2, 1
        ldr     x5, [x3, 48]
        ldr     q0, [x4, x1]
        ldr     q2, [x6, x1]
        ldr     q1, [x5, x1]
        fmla    v0.4s, v2.4s, v1.4s
        str     q0, [x4, x1]
        cmp     x0, x2
        bne     .L6
.L4:
        ret
        .cfi_endproc
---------------------------------------

The problem is that __builtin_aarch64_fmav4sf and similar operations
are treated as general functions that can read memory, write memory,
and call other functions.  If the intrinsic is replaced by arithmetic
then the start addresses are hoisted, as expected.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95958
[Bug 95958] [meta-bug] Inefficient arm_neon.h code for AArch64
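
The report's closing observation, that replacing the intrinsic with arithmetic lets the start addresses be hoisted, could be sketched roughly as below. This is not taken from the report: the type name v4sf and the function name foo_arith are made up for illustration, and v4sf is a GCC/Clang generic vector type used as a stand-in for float32x4_t so that the sketch also builds on non-AArch64 hosts. Whether the multiply and add end up as a single fmla depends on the -ffp-contract setting, so the result may differ in rounding from vfmaq_f32.

```cpp
#include <cstddef>
#include <vector>

// Assumption: a generic 4 x float vector type standing in for float32x4_t.
typedef float v4sf __attribute__ ((vector_size (16)));

std::vector<v4sf> a, b, c;

// Same loop shape as foo in the report, but with vfmaq_f32 replaced by
// plain vector arithmetic.  The loop body contains no opaque call, so the
// compiler can see that the vectors' start pointers are not reassigned.
void
foo_arith (std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    a[i] = a[i] + b[i] * c[i];
}
```

Comparing the assembly for foo and foo_arith at the same optimization level (for example with -O2 -S) is one way to check whether the loads of the start addresses stay inside the loop.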