public inbox for gcc-bugs@sourceware.org
* [Bug target/98730] New: vceqzq_p64 does not generate vceq with immediate 0
From: clyon at gcc dot gnu.org @ 2021-01-18 14:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98730

            Bug ID: 98730
           Summary: vceqzq_p64 does not generate vceq with immediate 0
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: clyon at gcc dot gnu.org
  Target Milestone: ---

The vceqzq_p64 intrinsic was introduced by commit r11-6719
(g:63999d751df9bcde4ab9107edb4c635d274b248d) and is defined as:

uint64x2_t
vceqzq_p64 (poly64x2_t __a)
{
  poly64x2_t __b = vreinterpretq_p64_u32 (vdupq_n_u32 (0));
  return vceqq_p64 (__a, __b);
}

which is similar to what vceqz_p64 does:
uint64x1_t
vceqz_p64 (poly64x1_t __a)
{
  poly64x1_t __b = vreinterpret_p64_u32 (vdup_n_u32 (0));
  return vceq_p64 (__a, __b);
}

vceqzq_p64 uses vceqq_p64, which is defined as:
uint64x2_t
vceqq_p64 (poly64x2_t __a, poly64x2_t __b)
{
  poly64_t __high_a = vget_high_p64 (__a);
  poly64_t __high_b = vget_high_p64 (__b);
  uint64x1_t __high = vceq_p64 (__high_a, __high_b);

  poly64_t __low_a = vget_low_p64 (__a);
  poly64_t __low_b = vget_low_p64 (__b);
  uint64x1_t __low = vceq_p64 (__low_a, __low_b);
  return vcombine_u64 (__low, __high);
}
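To make the decomposition concrete, here is a portable C model of what vceqq_p64 and vceqzq_p64 compute, assuming the ACLE semantics (each 64-bit result lane is all ones when the inputs are equal, zero otherwise). The u64x2_model type and all model_* functions are hypothetical stand-ins, not arm_neon.h API; plain uint64_t lanes replace poly64 values, whose equality is bitwise.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 2 x 64-bit vector: lane[0] plays the role
   of vget_low_p64, lane[1] the role of vget_high_p64. */
typedef struct {
  uint64_t lane[2];
} u64x2_model;

/* Model of vceq_p64 on one 64-bit lane: all ones if equal, else 0. */
static uint64_t model_vceq_p64(uint64_t a, uint64_t b)
{
  return a == b ? UINT64_MAX : 0;
}

/* Model of vcombine_u64: low half goes to lane 0, high half to lane 1. */
static u64x2_model model_vcombine_u64(uint64_t low, uint64_t high)
{
  u64x2_model r = {{low, high}};
  return r;
}

/* Same shape as vceqq_p64 above: split, compare each half, recombine. */
static u64x2_model model_vceqq_p64(u64x2_model a, u64x2_model b)
{
  uint64_t high = model_vceq_p64(a.lane[1], b.lane[1]);
  uint64_t low = model_vceq_p64(a.lane[0], b.lane[0]);
  return model_vcombine_u64(low, high);
}

/* Same shape as vceqzq_p64 above: compare against an all-zero vector. */
static u64x2_model model_vceqzq_p64(u64x2_model a)
{
  u64x2_model zero = {{0, 0}};
  return model_vceqq_p64(a, zero);
}
```

The model only captures the values computed; it says nothing about which instructions GCC selects, which is what this report is about.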


Unlike vceqz_p64, vceqzq_p64 does not use the vceq alternative with an
immediate, as shown by the vceqzq_p64.c testcase, which generates:
        ldr     r3, .L3
        vmov.i32        q10, #0  @ v4si
        vld1.64 {d16-d17}, [r3:64]
        vceq.i32        d18, d17, d21
        vceq.i32        d16, d16, d21
        vpmin.u32       d18, d18, d18
        vpmin.u32       d16, d16, d16
        vmov.f64        d17, d18        @ int
        vstr    d16, [r3, #16]
        vstr    d17, [r3, #24]
        bx      lr


By comparison, vceqz_p64 generates:
        ldr     r3, .L3
        vldr.64 d16, [r3]       @ int
        vceq.i32        d16, d16, #0
        vpmin.u32       d16, d16, d16
        vstr.64 d16, [r3, #8]   @ int
        bx      lr
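Both listings build each 64-bit comparison out of 32-bit operations: vceq.i32 compares the two 32-bit halves (against #0 or a zeroed register), and vpmin.u32 of a register with itself collapses the two masks so that the 64-bit lane is all ones only when both halves matched. A minimal C model of that idiom, with hypothetical helper names that only illustrate the arithmetic, not GCC's implementation:

```c
#include <stdint.h>

/* Model of vceq.i32 on one 32-bit lane: all ones if equal, else 0. */
static uint32_t model_vceq_i32(uint32_t a, uint32_t b)
{
  return a == b ? UINT32_MAX : 0;
}

/* Model of "vpmin.u32 dX, dX, dX" on a D register holding {m0, m1}:
   both result lanes become min(m0, m1), returned as one 64-bit value. */
static uint64_t model_vpmin_self(uint32_t m0, uint32_t m1)
{
  uint32_t m = m0 < m1 ? m0 : m1;
  return ((uint64_t)m << 32) | m;
}

/* 64-bit equality from two 32-bit compares plus a pairwise minimum:
   the result is all ones only if both 32-bit halves compared equal. */
static uint64_t model_vceq64_via_i32(uint64_t a, uint64_t b)
{
  uint32_t lo = model_vceq_i32((uint32_t)a, (uint32_t)b);
  uint32_t hi = model_vceq_i32((uint32_t)(a >> 32), (uint32_t)(b >> 32));
  return model_vpmin_self(lo, hi);
}
```

For example, a = 0x100000000 against b = 0 matches in the low half but not the high half, so the minimum of the masks, and hence the result, is 0.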



The reload trace for vceqzq_p64 says:
Choosing alt 0 in insn 19:  (0) =w  (1) w  (2) w {neon_vceqv2si_insn}
 alt=0,overall=0,losers=0,rld_nregs=0
Choosing alt 0 in insn 15:  (0) =w  (1) w  (2) w {neon_vceqv2si_insn}
 alt=0,overall=0,losers=0,rld_nregs=0

(insn 19 8 15 2 (set (reg:V2SI 48 d16 [orig:128 _18 ] [128])
        (neg:V2SI (eq:V2SI (reg:V2SI 48 d16 [orig:139 v1 ] [139])
                (reg:V2SI 54 d19 [ _5+8 ])))) "/home/christophe.lyon/src/GCC/builds/gcc-fsf-git-neon-intrinsics/tools/lib/gcc/arm-none-linux-gnueabihf/11.0.0/include/arm_neon.h":2404:22 1650 {neon_vceqv2si_insn}
     (expr_list:REG_EQUAL (neg:V2SI (eq:V2SI (subreg:V2SI (reg:DI 48 d16 [orig:139 v1 ] [139]) 0)
                (const_vector:V2SI [
                        (const_int 0 [0]) repeated x2
                    ])))
        (nil)))
(insn 15 19 20 2 (set (reg:V2SI 50 d17 [orig:121 _11 ] [121])
        (neg:V2SI (eq:V2SI (reg:V2SI 50 d17 [orig:141 v2 ] [141])
                (reg:V2SI 54 d19 [ _5+8 ])))) "/home/christophe.lyon/src/GCC/builds/gcc-fsf-git-neon-intrinsics/tools/lib/gcc/arm-none-linux-gnueabihf/11.0.0/include/arm_neon.h":2404:22 1650 {neon_vceqv2si_insn}
     (expr_list:REG_EQUAL (neg:V2SI (eq:V2SI (subreg:V2SI (reg:DI 50 d17 [orig:141 v2 ] [141]) 0)
                (const_vector:V2SI [
                        (const_int 0 [0]) repeated x2
                    ])))
        (nil)))


* [Bug target/98730] vceqzq_p64 does not generate vceq with immediate 0
From: clyon at gcc dot gnu.org @ 2021-01-18 14:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98730

--- Comment #1 from Christophe Lyon <clyon at gcc dot gnu.org> ---
Why does it choose alternative 0 instead of alternative 1, which matches a
vector of constant zeros?


* [Bug target/98730] vceqzq_p64 does not generate vceq with immediate 0
From: rsandifo at gcc dot gnu.org @ 2021-01-21 15:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98730

rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #2 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Christophe Lyon from comment #1)
> Why does it choose alternative 0 instead of 1 which matches a vector of
> constant zeros?
I wonder if it's an rtx costing problem.  What do the patterns
look like in .expand?  If the operands are still registers there,
what does .fwprop say?  It should be able to replace the registers
with zero if the target says that that's worthwhile.


* [Bug target/98730] vceqzq_p64 does not generate vceq with immediate 0
From: clyon at gcc dot gnu.org @ 2021-01-21 15:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98730

--- Comment #3 from Christophe Lyon <clyon at gcc dot gnu.org> ---
At expand time, we have:
(insn 13 12 14 2 (set (reg:V2SI 121 [ _11 ])
        (neg:V2SI (eq:V2SI (subreg:V2SI (reg:DI 116 [ _6 ]) 0)
                (subreg:V2SI (reg:DI 118 [ _8 ]) 0)))) "arm_neon.h":2404:22 1649 {neon_vceqv2si_insn}
     (nil))
(insn 17 16 18 2 (set (reg:V2SI 128 [ _18 ])
        (neg:V2SI (eq:V2SI (subreg:V2SI (reg:DI 124 [ _14 ]) 0)
                (subreg:V2SI (reg:DI 125 [ _15 ]) 0)))) "arm_neon.h":2404:22 1649 {neon_vceqv2si_insn}
     (nil))

In fwprop1:
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
df_worklist_dataflow_doublequeue: n_basic_blocks 3 n_edges 2 count 3 (    1)
df_worklist_dataflow_doublequeue: n_basic_blocks 3 n_edges 2 count 3 (    1)
change not profitable (cost 0 -> cost 4)
rescanning insn with uid = 12.
verify found no changes in insn with uid = 12.
rescanning insn with uid = 13.
verify found no changes in insn with uid = 13.
rescanning insn with uid = 16.
verify found no changes in insn with uid = 16.
rescanning insn with uid = 17.
verify found no changes in insn with uid = 17.
change not profitable (cost 0 -> cost 12)
change not profitable (cost 0 -> cost 4)
rescanning insn with uid = 13.
verify found no changes in insn with uid = 13.
change not profitable (cost 0 -> cost 4)
change not profitable (cost 80 -> cost 84)
change not profitable (cost 80 -> cost 84)
change not profitable (cost 80 -> cost 84)
rescanning insn with uid = 17.
verify found no changes in insn with uid = 17.
change not profitable (cost 80 -> cost 84)
change not profitable (cost 80 -> cost 84)

and at the end of fwprop1, insns 13 and 17 are:
(insn 13 8 14 2 (set (reg:V2SI 121 [ _11 ])
        (neg:V2SI (eq:V2SI (subreg:V2SI (reg:V2DI 113 [ v2.0_1 ]) 8)
                (subreg:V2SI (reg:V4SI 114 [ _4 ]) 8)))) "arm_neon.h":2404:22 1649 {neon_vceqv2si_insn}
     (expr_list:REG_EQUAL (neg:V2SI (eq:V2SI (subreg:V2SI (reg:V2DI 113 [ v2.0_1 ]) 8)
                (const_vector:V2SI [
                        (const_int 0 [0]) repeated x2
                    ])))
        (nil)))
(insn 17 14 18 2 (set (reg:V2SI 128 [ _18 ])
        (neg:V2SI (eq:V2SI (subreg:V2SI (reg:V2DI 113 [ v2.0_1 ]) 0)
                (subreg:V2SI (reg:V4SI 114 [ _4 ]) 8)))) "arm_neon.h":2404:22 -1
     (expr_list:REG_EQUAL (neg:V2SI (eq:V2SI (subreg:V2SI (reg:V2DI 113 [ v2.0_1 ]) 0)
                (const_vector:V2SI [
                        (const_int 0 [0]) repeated x2
                    ])))
        (nil)))


* [Bug target/98730] vceqzq_p64 does not generate vceq with immediate 0
From: rsandifo at gcc dot gnu.org @ 2021-01-21 17:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98730

--- Comment #4 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
Hmm, yeah, looks like it might be a cost issue then.
arm_rtx_costs_internal seems to give CONST_VECTOR
a cost of 1 or 4 instructions, whereas a zero CONST_VECTOR
is free in this context.

Although that should be fixed, another approach would
be to lower vget_high_* and vget_low_* at the gimple
level if the argument is a constant.
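The costing explanation above can be illustrated with a toy model. It assumes, as the "change not profitable (cost X -> cost Y)" messages suggest, that fwprop refuses a propagation when it raises the cost of the instruction's source, that a register operand is free, and that the zero CONST_VECTOR was charged as one instruction. COSTS_N_INSNS mirrors GCC's macro of that name (one instruction is 4 units), which is consistent with the +4 deltas in the fwprop1 trace; every other name here is hypothetical and not GCC code.

```c
#include <stdbool.h>

/* Same scaling as GCC's COSTS_N_INSNS macro: one instruction == 4 units. */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical kinds for the second operand of the comparison. */
enum operand_kind { OPERAND_REG, OPERAND_ZERO_VECTOR };

/* Toy operand cost.  zero_is_free models the later Neon-only change
   (a zero CONST_VECTOR in a comparison costs nothing); without it the
   constant is charged as one instruction. */
static int operand_cost(enum operand_kind k, bool zero_is_free)
{
  if (k == OPERAND_REG)
    return 0;
  return zero_is_free ? 0 : COSTS_N_INSNS(1);
}

/* Toy profitability test: replacing the register with the zero vector
   is accepted only if it does not make the source more expensive. */
static bool substitution_profitable(int base_cost, bool zero_is_free)
{
  int old_cost = base_cost + operand_cost(OPERAND_REG, zero_is_free);
  int new_cost = base_cost + operand_cost(OPERAND_ZERO_VECTOR, zero_is_free);
  return new_cost <= old_cost;
}
```

With zero_is_free false the model reproduces the "0 -> 4" and "80 -> 84" rejections; with it true the substitution goes through, which is the effect the fix to arm_rtx_costs_internal aims for.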


* [Bug target/98730] vceqzq_p64 does not generate vceq with immediate 0
From: cvs-commit at gcc dot gnu.org @ 2021-01-28 17:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98730

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Christophe Lyon <clyon@gcc.gnu.org>:

https://gcc.gnu.org/g:31a0ab9213f780d2fa1da6e4879df214c0f247f9

commit r11-6961-g31a0ab9213f780d2fa1da6e4879df214c0f247f9
Author: Christophe Lyon <christophe.lyon@linaro.org>
Date:   Thu Jan 28 17:55:45 2021 +0000

    arm: Adjust cost of vector of constant zero

    Neon vector comparisons have a dedicated version for comparing with
    constant zero, which means that a zero operand is free.

    Adjust the cost in arm_rtx_costs_internal accordingly, for Neon only,
    since MVE does not support this.

    2021-01-28  Christophe Lyon  <christophe.lyon@linaro.org>

            gcc/
            PR target/98730
            * config/arm/arm.c (arm_rtx_costs_internal): Adjust cost of vector
            of constant zero for comparisons.

            gcc/testsuite/
            PR target/98730
            * gcc.target/arm/simd/vceqzq_p64.c: Update expected result.


* [Bug target/98730] vceqzq_p64 does not generate vceq with immediate 0
From: clyon at gcc dot gnu.org @ 2021-01-28 17:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98730

Christophe Lyon <clyon at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #6 from Christophe Lyon <clyon at gcc dot gnu.org> ---
Now fixed on trunk.

