From: "rsandifo at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/95969] New: Use of __builtin_aarch64_im_lane_boundsi in AArch64 arm_neon.h interferes with gimple optimisation
Date: Mon, 29 Jun 2020 15:50:24 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95969

            Bug ID: 95969
           Summary: Use of __builtin_aarch64_im_lane_boundsi in AArch64 arm_neon.h
                    interferes with gimple optimisation
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
            Blocks: 95958
  Target Milestone: ---
            Target: aarch64*-*-*

For:

----------------------------------------------------
#include <arm_neon.h>

void
f (float32x4_t **ptr)
{
  float32x4_t res = vsetq_lane_f32 (0.0f, **ptr, 0);
  **ptr = res;
}
----------------------------------------------------

the final gimple .optimized dump looks like:

----------------------------------------------------
  float32x4_t __vec;
  float32x4_t * _1;
  __Float32x4_t _2;
  float32x4_t * _3;

  <bb 2> [local count: 1073741824]:
  _1 = *ptr_5(D);
  _2 = *_1;
  __builtin_aarch64_im_lane_boundsi (16, 4, 0);
  __vec_8 = BIT_INSERT_EXPR <_2, 0.0, 0>;
  _3 = *ptr_5(D);
  *_3 = __vec_8;
  return;
----------------------------------------------------

where we still have two loads from *ptr.  This is because
__builtin_aarch64_im_lane_boundsi has no attributes: it's modelled as
a general function that could do pretty much anything.

Although we fix this testcase in the RTL optimisers, it's easy for the
issue to cause unoptimised code in larger, more realistic testcases.

The problem is similar to PR95964.  The difficulty is that here we have
to model the function as having some kind of side-effect, otherwise it
will simply get optimised away.

Ideally we'd fix this by implementing the intrinsics directly in the
compiler and doing the checks in the frontend via
TARGET_CHECK_BUILTIN_CALL.  That's obviously a big change though.
Until then, we should optimise away calls whose arguments are already
correct so that they don't clog up the IL.

If not returning a value makes it harder to fold the call for some
reason, perhaps an alternative would be to pass a vector value through
a dummy const function, e.g.:

  _4 = __builtin_aarch64_... (_2, 16, 4, 0);
  __vec_8 = BIT_INSERT_EXPR <_4, 0.0, 0>;

That might not be necessary though -- haven't checked.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95958
[Bug 95958] [meta-bug] Inefficient arm_neon.h code for AArch64
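For comparison, here is a portable model of the testcase that can be built on any GCC target. It swaps arm_neon.h for GCC's generic vector extension. The names v4sf and f_generic are hypothetical and do not come from the report. The BIT_INSERT_EXPR in the dump is simply "replace lane 0", which is written here as res[0] = 0.0f. Nothing between the two dereferences of ptr can write memory. So one would expect -O2 -fdump-tree-optimized to show a single load of *ptr for this version. That expectation has not been checked here.

```c
/* ASSUMPTION: GCC's generic vector extension stands in for
   float32x4_t so that this compiles on any GCC target.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical portable analogue of f() from the report: lane 0 is
   set directly instead of via vsetq_lane_f32, so no opaque lane-check
   call sits between the load of *ptr and the store through it.  */
void
f_generic (v4sf **ptr)
{
  v4sf res = **ptr;
  res[0] = 0.0f;    /* What vsetq_lane_f32 (0.0f, **ptr, 0) computes.  */
  **ptr = res;
}
```

Diffing this version's dump against the arm_neon.h version isolates the effect of the attribute-less __builtin_aarch64_im_lane_boundsi call.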
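Here is a minimal sketch of the decision behind "optimise away calls whose arguments are already correct", written as a plain C predicate. It assumes the three arguments of __builtin_aarch64_im_lane_boundsi are (total vector size in bytes, element size in bytes, lane index). That reading is consistent with 16 = sizeof (float32x4_t), 4 = sizeof (float) and lane 0 in the dump. The function name lane_check_foldable is hypothetical. In GCC the fold would presumably live in the target's gimple folding hook (TARGET_GIMPLE_FOLD_BUILTIN) and would delete the call statement. This sketch only models when deletion is safe.

```c
#include <stdbool.h>

/* Hypothetical model of the decision a gimple fold of
   __builtin_aarch64_im_lane_boundsi would make.  ASSUMPTION: the
   arguments are (total vector size in bytes, element size in bytes,
   lane index), e.g. (16, 4, 0) for lane 0 of a float32x4_t.
   Returns true if the call is redundant and could be deleted.  */
bool
lane_check_foldable (long total_size, long elt_size, long lane)
{
  /* Bad sizes: keep the call so that the later check still sees it
     and can report the problem.  */
  if (total_size <= 0 || elt_size <= 0)
    return false;

  long nunits = total_size / elt_size;

  /* A lane that is already known to be in range needs no diagnostic,
     so the call carries no information and only clogs up the IL.  */
  return lane >= 0 && lane < nunits;
}
```

For the call in the dump, lane_check_foldable (16, 4, 0) is true, so the call could go. An out-of-range lane such as (16, 4, 4) returns false. Keeping the call in that case lets the existing check still diagnose it.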
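The "dummy const function" alternative can be illustrated at the C level, as a rough sketch under stated assumptions. A const function promises not to read or write memory, so it cannot force *ptr to be reloaded. Because its result feeds the lane insertion, it is also not dead code, so it is not "simply optimised away". The names lane_check_passthrough and f_passthrough are hypothetical. This models only the dataflow. A real builtin would check the lane at compile time, whereas this function ignores its lane arguments.

```c
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical stand-in for the "dummy const function" idea.  The const
   attribute promises that the call neither reads nor writes memory, so
   it cannot force *ptr to be reloaded.  Because the caller uses the
   result, the call is not dead code and is not deleted.  noinline keeps
   the call visible in the IL for the purposes of this sketch.  The lane
   arguments are unused here: a real builtin would check them at
   compile time instead.  */
static __attribute__ ((const, noinline)) v4sf
lane_check_passthrough (v4sf vec, int total_size, int elt_size, int lane)
{
  (void) total_size;
  (void) elt_size;
  (void) lane;
  return vec;
}

void
f_passthrough (v4sf **ptr)
{
  /* Data dependence: the insertion uses the "checked" value, mirroring
     _4 = __builtin_aarch64_... (_2, 16, 4, 0) in the report.  */
  v4sf checked = lane_check_passthrough (**ptr, 16, 4, 0);
  checked[0] = 0.0f;
  **ptr = checked;
}
```

The trade-off is that every intrinsic's result must be threaded through the check. In return, the check no longer acts as a memory barrier in gimple.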