[Bug target/97906] New: [ARM NEON] Missed optimization in lowering to vcage

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug target/97906] New: [ARM NEON] Missed optimization in lowering to vcage
@ 2020-11-19 11:32 prathamesh3492 at gcc dot gnu.org
  2020-11-19 13:48 ` [Bug target/97906] " rguenth at gcc dot gnu.org
  2021-06-21  9:12 ` cvs-commit at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: prathamesh3492 at gcc dot gnu.org @ 2020-11-19 11:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97906

            Bug ID: 97906
           Summary: [ARM NEON] Missed optimization in lowering to vcage
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: prathamesh3492 at gcc dot gnu.org
  Target Milestone: ---

Hi,
Similar to PR97872 and PR97903, for following test-case:

#include <arm_neon.h>

uint32x2_t f1(float32x2_t a, float32x2_t b)
{
  return vabs_f32 (a) >= vabs_f32 (b);
}

uint32x2_t f2(float32x2_t a, float32x2_t b)
{
  return (uint32x2_t) __builtin_neon_vcagev2sf (a, b);
}

Code-gen:

f2:
        vacge.f32  d0, d0, d1
        bx         lr

f1:
        vabs.f32        d0, d0
        vabs.f32        d1, d1
        sub     sp, sp, #8
        vmov.32 r3, d0[0]
        vmov    s13, r3
        vmov.32 r3, d1[0]
        vmov    s12, r3
        vmov.32 r3, d1[1]
        vcmpe.f32       s12, s13
        vmov    s14, r3
        vmov.32 r3, d0[1]
        vmrs    APSR_nzcv, FPSCR
        vmov    s15, r3
        ite     ls
        movls   r3, #-1
        movhi   r3, #0
        vcmpe.f32       s14, s15
        str     r3, [sp]
        vmrs    APSR_nzcv, FPSCR
        ite     ls
        movls   r3, #-1
        movhi   r3, #0
        str     r3, [sp, #4]
        vldr    d0, [sp]
        add     sp, sp, #8
        @ sp needed
        bx      lr

For f1, it is initially lowered to:

f1 (float32x2_t a, float32x2_t b)
{
  vector(2) <signed-boolean:32> _1;
  vector(2) int _2;
  uint32x2_t _6;
  __simd64_float32_t _7;
  __simd64_float32_t _8;

  <bb 2> [local count: 1073741824]:
  _8 = __builtin_neon_vabsv2sf (a_4(D));
  _7 = __builtin_neon_vabsv2sf (b_5(D));
  _1 = _7 <= _8;
  _2 = VEC_COND_EXPR <_1, { -1, -1 }, { 0, 0 }>;
  _6 = VIEW_CONVERT_EXPR<uint32x2_t>(_2);
  return _6;
}

and veclower seems to "scalarize" the cond_expr op:

f1 (float32x2_t a, float32x2_t b)
{
  vector(2) int _2;
  uint32x2_t _6;
  __simd64_float32_t _7;
  __simd64_float32_t _8;
  float _11;
  float _12;
  int _13;
  float _14;
  float _15;
  int _16;

  <bb 2> [local count: 1073741824]:
  _8 = __builtin_neon_vabsv2sf (a_4(D));
  _7 = __builtin_neon_vabsv2sf (b_5(D));
  _11 = BIT_FIELD_REF <_7, 32, 0>;
  _12 = BIT_FIELD_REF <_8, 32, 0>;
  _13 = _11 <= _12 ? -1 : 0;
  _14 = BIT_FIELD_REF <_7, 32, 32>;
  _15 = BIT_FIELD_REF <_8, 32, 32>;
  _16 = _14 <= _15 ? -1 : 0;
  _2 = {_13, _16};
  _6 = VIEW_CONVERT_EXPR<uint32x2_t>(_2);
  return _6;

}

Thanks,
Prathamesh

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Bug target/97906] [ARM NEON] Missed optimization in lowering to vcage
  2020-11-19 11:32 [Bug target/97906] New: [ARM NEON] Missed optimization in lowering to vcage prathamesh3492 at gcc dot gnu.org
@ 2020-11-19 13:48 ` rguenth at gcc dot gnu.org
  2021-06-21  9:12 ` cvs-commit at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-11-19 13:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97906

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
That means the initial lowering is "bad" or the target doesn't have a way to
do this compare.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Bug target/97906] [ARM NEON] Missed optimization in lowering to vcage
  2020-11-19 11:32 [Bug target/97906] New: [ARM NEON] Missed optimization in lowering to vcage prathamesh3492 at gcc dot gnu.org
  2020-11-19 13:48 ` [Bug target/97906] " rguenth at gcc dot gnu.org
@ 2021-06-21  9:12 ` cvs-commit at gcc dot gnu.org
  1 sibling, 0 replies; 3+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-06-21  9:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97906

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Prathamesh Kulkarni
<prathamesh3492@gcc.gnu.org>:

https://gcc.gnu.org/g:29a539a675b8ffd8e20fd3926d6ba0482ea0f275

commit r12-1671-g29a539a675b8ffd8e20fd3926d6ba0482ea0f275
Author: prathamesh.kulkarni <prathamesh.kulkarni@linaro.org>
Date:   Mon Jun 21 14:38:32 2021 +0530

    arm/97906: Adjust neon_vca patterns to use GLTE instead of GTGE iterator.

    gcc/ChangeLog:
            PR target/97906
            * config/arm/iterators.md (NEON_VACMP): Remove.
            * config/arm/neon.md (neon_vca<cmp_op><mode>): Use GLTE instead of
GTGE
            iterator.
            (neon_vca<cmp_op><mode>_insn): Likewise.
            (neon_vca<cmp_op_unsp><mode>_insn_unspec): Use NEON_VAGLTE instead
of
            NEON_VACMP.

    gcc/testsuite/ChangeLog:
            PR target/97906
            * gcc.target/arm/simd/pr97906.c: New test.

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-06-21  9:12 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-19 11:32 [Bug target/97906] New: [ARM NEON] Missed optimization in lowering to vcage prathamesh3492 at gcc dot gnu.org
2020-11-19 13:48 ` [Bug target/97906] " rguenth at gcc dot gnu.org
2021-06-21  9:12 ` cvs-commit at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).