public inbox for gcc-bugs@sourceware.org
* [Bug target/95969] New: Use of __builtin_aarch64_im_lane_boundsi in AArch64 arm_neon.h interferes with gimple optimisation
@ 2020-06-29 15:50 rsandifo at gcc dot gnu.org
  2021-09-02  5:31 ` [Bug target/95969] " pinskia at gcc dot gnu.org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: rsandifo at gcc dot gnu.org @ 2020-06-29 15:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95969

            Bug ID: 95969
           Summary: Use of __builtin_aarch64_im_lane_boundsi in AArch64
                    arm_neon.h interferes with gimple optimisation
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rsandifo at gcc dot gnu.org
            Blocks: 95958
  Target Milestone: ---
            Target: aarch64*-*-*

For:

----------------------------------------------------
#include <arm_neon.h>

void
f (float32x4_t **ptr)
{
  float32x4_t res = vsetq_lane_f32 (0.0f, **ptr, 0);
  **ptr = res;
}
----------------------------------------------------

the final gimple .optimized dump looks like:

----------------------------------------------------
  float32x4_t __vec;
  float32x4_t * _1;
  __Float32x4_t _2;
  float32x4_t * _3;

  <bb 2> [local count: 1073741824]:
  _1 = *ptr_5(D);
  _2 = *_1;
  __builtin_aarch64_im_lane_boundsi (16, 4, 0);
  __vec_8 = BIT_INSERT_EXPR <_2, 0.0, 0>;
  _3 = *ptr_5(D);
  *_3 = __vec_8;
  return;
----------------------------------------------------

where we still have two loads from *ptr.  This is because
__builtin_aarch64_im_lane_boundsi has no attributes:
it's modelled as a general function that could do pretty
much anything.
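
As a rough illustration of why an attribute-free call blocks the
second load from being merged with the first, here is a minimal
sketch in plain C.  All names (value_a, value_b, global_slot,
opaque_call, read_twice) are hypothetical stand-ins, not anything
from GCC or arm_neon.h.  When the compiler cannot see what the callee
does, it has to assume the call might behave like opaque_call and
change what *ptr points to:

```c
/* Hypothetical stand-ins; none of these names come from GCC.  */
static int value_a = 1, value_b = 2;
static int *global_slot = &value_a;

/* If the compiler knows nothing about a call, it must assume the
   call could do something like this: redirect the pointer.  */
void opaque_call (void)
{
  global_slot = &value_b;
}

int read_twice (int **ptr)
{
  int first = **ptr;    /* loads *ptr, then the pointed-to value */
  opaque_call ();       /* may have changed *ptr */
  int second = **ptr;   /* so *ptr has to be loaded again */
  return first * 10 + second;
}
```

With ptr set to &global_slot the two reads see different objects, so
reusing the first load of *ptr would be wrong in general.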

Although we fix this testcase in the RTL optimisers, it's easy
for the issue to cause unoptimised code in larger, more realistic
testcases.

The problem is similar to PR95964.  The difficulty is that
here we have to model the function as having some kind of
side-effect, otherwise it will simply get optimised away.

Ideally we'd fix this by implementing the intrinsics directly
in the compiler and doing the checks in the frontend via
TARGET_CHECK_BUILTIN_CALL.  That's obviously a big change
though.  Until then, we should optimise away calls whose
arguments are already correct so that they don't clog
up the IL.
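
The check that such a fold would evaluate at compile time can be
sketched as below.  This assumes, going by the (16, 4, 0) call in the
dump above, that the three arguments are the vector size in bytes,
the element size in bytes and the lane index; lane_in_bounds is a
hypothetical helper name, not a GCC function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if LANE is a valid index into a vector
   of VEC_BYTES bytes made of ELEM_BYTES-byte elements.  */
static bool
lane_in_bounds (size_t vec_bytes, size_t elem_bytes, size_t lane)
{
  if (elem_bytes == 0)
    return false;
  return lane < vec_bytes / elem_bytes;
}
```

Under that assumption the call in the dump checks lane 0 of a
four-element vector, which is in range, so the call could be
dropped once its arguments are known constants.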

If not returning a value makes it harder to fold the call
for some reason, perhaps an alternative would be to pass
a vector value through a dummy const function, e.g.:

  _4 = __builtin_aarch64_... (_2, 16, 4, 0);
  __vec_8 = BIT_INSERT_EXPR <_4, 0.0, 0>;

That might not be necessary though -- haven't checked.
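
For what it's worth, the shape of that alternative can be sketched in
plain C.  This is only an illustration under assumptions: struct
vec4f stands in for float32x4_t, and checked_passthrough and
set_lane0_to_zero are hypothetical names.  The point is that the
call is kept alive by the data dependence on its result rather than
by a side-effect, while the const attribute tells the optimisers it
does not touch memory:

```c
/* Hypothetical stand-in for float32x4_t.  */
struct vec4f { float lane[4]; };

/* Hypothetical dummy check: returns VEC unchanged.  A real version
   would diagnose an out-of-range lane at compile time.  */
__attribute__ ((const))
static struct vec4f
checked_passthrough (struct vec4f vec, int vec_bytes, int elem_bytes,
                     int lane_index)
{
  (void) vec_bytes;
  (void) elem_bytes;
  (void) lane_index;
  return vec;
}

static struct vec4f
set_lane0_to_zero (struct vec4f vec)
{
  /* The insert uses the call's result, so the call cannot simply
     be deleted as dead.  */
  struct vec4f checked = checked_passthrough (vec, 16, 4, 0);
  checked.lane[0] = 0.0f;
  return checked;
}
```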


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95958
[Bug 95958] [meta-bug] Inefficient arm_neon.h code for AArch64


