From: "rsandifo at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/95962] New: Inefficient code for simple arm_neon.h iota operation
Date: Mon, 29 Jun 2020 12:49:23 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95962

            Bug ID: 95962
           Summary: Inefficient code for simple arm_neon.h iota operation
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
            Blocks: 95958
  Target Milestone: ---
            Target: aarch64*-*-*

For:

  #include <arm_neon.h>

  int32x4_t
  foo (void)
  {
    int32_t array[] = { 0, 1, 2, 3 };
    return vld1q_s32 (array);
  }

we produce:

  foo:
  .LFB4217:
          .cfi_startproc
          sub     sp, sp, #16
          .cfi_def_cfa_offset 16
          mov     x0, 2
          mov     x1, 4294967296
          movk    x0, 0x3, lsl 32
          stp     x1, x0, [sp]
          ldr     q0, [sp]
          add     sp, sp, 16
          .cfi_def_cfa_offset 0
          ret

In contrast, clang produces essentially perfect code:

          adrp    x8, .LCPI0_0
          ldr     q0, [x8, :lo12:.LCPI0_0]
          ret

I think the problem is a combination of two things:

- __builtin_aarch64_ld1v4si & co. are treated as general functions
  rather than pure functions, so in principle they could write to the
  given address.  This stops us promoting the array to a constant.

- The loads could be reduced to native gimple-level operations,
  at least on little-endian targets.

IMO this is a bug rather than an enhancement.  Intrinsics only exist to
optimise code, and what GCC is doing falls short of what users
should reasonably expect.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95958
[Bug 95958] [meta-bug] Inefficient arm_neon.h code for AArch64
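
To illustrate the second point in the report (expressing the load as an
operation the compiler understands natively), here is a minimal sketch.
It is not the GCC implementation of vld1q_s32 and not a proposed patch.
It assumes the GNU C vector extension (accepted by GCC and Clang), and
the names v4si, load_v4si, iota_via_load and iota_via_literal are
hypothetical stand-ins chosen for this example.  It also does not model
the big-endian lane-ordering concern behind the "at least on
little-endian targets" caveat.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: GNU C vector extension.  v4si is a stand-in for
   int32x4_t so that the sketch does not depend on arm_neon.h.  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Hypothetical stand-in for vld1q_s32.  The load is written as a
   memcpy, which the compiler knows only reads from SRC, unlike a
   call to a builtin that is treated as a general function.  */
static inline v4si
load_v4si (const int32_t *src)
{
  v4si result;
  memcpy (&result, src, sizeof (result));
  return result;
}

/* Same shape as foo () in the report: build a local array, then
   load it into a vector.  */
v4si
iota_via_load (void)
{
  int32_t array[] = { 0, 1, 2, 3 };
  return load_v4si (array);
}

/* The value written directly as a vector constant, for comparison.  */
v4si
iota_via_literal (void)
{
  return (v4si) { 0, 1, 2, 3 };
}
```

Both functions return the same four lanes, so comparing their assembly
at -O2 is one way to check whether a given compiler sees through the
array-plus-load form.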