public inbox for gcc-bugs@sourceware.org
* [Bug target/98730] New: vceqzq_p64 does not generate vceq with immediate 0
@ 2021-01-18 14:32 clyon at gcc dot gnu.org
2021-01-18 14:33 ` [Bug target/98730] " clyon at gcc dot gnu.org
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: clyon at gcc dot gnu.org @ 2021-01-18 14:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98730
Bug ID: 98730
Summary: vceqzq_p64 does not generate vceq with immediate 0
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: clyon at gcc dot gnu.org
Target Milestone: ---
The vceqzq_p64 intrinsic was introduced with commit r11-6719
(g:63999d751df9bcde4ab9107edb4c635d274b248d),
defined as:
vceqzq_p64 (poly64x2_t __a)
{
  poly64x2_t __b = vreinterpretq_p64_u32 (vdupq_n_u32 (0));
  return vceqq_p64 (__a, __b);
}
which is similar to what vceqz_p64 does:
vceqz_p64 (poly64x1_t __a)
{
  poly64x1_t __b = vreinterpret_p64_u32 (vdup_n_u32 (0));
  return vceq_p64 (__a, __b);
}
vceqzq_p64 uses vceqq_p64 which is defined as:
vceqq_p64 (poly64x2_t __a, poly64x2_t __b)
{
  poly64_t __high_a = vget_high_p64 (__a);
  poly64_t __high_b = vget_high_p64 (__b);
  uint64x1_t __high = vceq_p64 (__high_a, __high_b);
  poly64_t __low_a = vget_low_p64 (__a);
  poly64_t __low_b = vget_low_p64 (__b);
  uint64x1_t __low = vceq_p64 (__low_a, __low_b);
  return vcombine_u64 (__low, __high);
}
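For readers without an Arm toolchain at hand, the layering described above
(vceqzq_p64 calls vceqq_p64, which compares the two 64-bit halves with
vceq_p64 and recombines them) can be modelled in portable C. This is only a
scalar sketch of the intended semantics, not the arm_neon.h code: the
model_* names and the struct type are hypothetical stand-ins, and it assumes
each 64-bit result lane is all-ones when the comparison holds and zero
otherwise.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector of two 64-bit lanes.
   This is NOT the arm_neon.h poly64x2_t / uint64x2_t type.  */
typedef struct { uint64_t lane[2]; } model_u64x2;

/* Model of vceq_p64: all-ones if the two 64-bit values are equal.  */
uint64_t model_vceq_p64(uint64_t a, uint64_t b)
{
    return a == b ? UINT64_MAX : 0;
}

/* Model of vceqq_p64: compare the high and low halves separately,
   then combine them with the low half in lane 0.  */
model_u64x2 model_vceqq_p64(model_u64x2 a, model_u64x2 b)
{
    model_u64x2 r;
    r.lane[1] = model_vceq_p64(a.lane[1], b.lane[1]);  /* high half */
    r.lane[0] = model_vceq_p64(a.lane[0], b.lane[0]);  /* low half */
    return r;
}

/* Model of vceqzq_p64: compare against a vector of zeros.  */
model_u64x2 model_vceqzq_p64(model_u64x2 a)
{
    model_u64x2 zero = { { 0, 0 } };
    return model_vceqq_p64(a, zero);
}
```

In this model the zero vector is built once and then split per half, which
mirrors why the compiler ends up with the zero in a register rather than as
an immediate on each comparison.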
Unlike vceqz_p64, vceqzq_p64 does not use the vceq alternative with an
immediate, as is shown by the vceqzq_p64.c testcase, which generates:
        ldr r3, .L3
        vmov.i32 q10, #0 @ v4si
        vld1.64 {d16-d17}, [r3:64]
        vceq.i32 d18, d17, d21
        vceq.i32 d16, d16, d21
        vpmin.u32 d18, d18, d18
        vpmin.u32 d16, d16, d16
        vmov.f64 d17, d18 @ int
        vstr d16, [r3, #16]
        vstr d17, [r3, #24]
        bx lr
By comparison, vceqz_p64 generates:
        ldr r3, .L3
        vldr.64 d16, [r3] @ int
        vceq.i32 d16, d16, #0
        vpmin.u32 d16, d16, d16
        vstr.64 d16, [r3, #8] @ int
        bx lr
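Both sequences implement a 64-bit equality test with 32-bit lane operations:
vceq.i32 sets each 32-bit lane to all-ones when it matches, and vpmin.u32 of
a register with itself leaves the minimum of the two lanes in both lanes, so
the 64-bit result is all-ones only if both halves matched. A lane-level
sketch of the vceqz_p64 sequence in portable C follows; the model_* names
and the struct are hypothetical, and this models the instruction semantics
as I understand them rather than anything taken from GCC.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit D register seen as two 32-bit lanes.  */
typedef struct { uint32_t lane[2]; } model_d32;

/* vceq.i32 Dd, Dn, #0: a lane becomes all-ones if it is zero.  */
model_d32 model_vceq_i32_zero(model_d32 n)
{
    model_d32 r;
    r.lane[0] = n.lane[0] == 0 ? UINT32_MAX : 0;
    r.lane[1] = n.lane[1] == 0 ? UINT32_MAX : 0;
    return r;
}

/* vpmin.u32 Dd, Dn, Dm: pairwise unsigned minimum.  The low result
   lane comes from the pair in Dn, the high one from the pair in Dm.  */
model_d32 model_vpmin_u32(model_d32 n, model_d32 m)
{
    model_d32 r;
    r.lane[0] = n.lane[0] < n.lane[1] ? n.lane[0] : n.lane[1];
    r.lane[1] = m.lane[0] < m.lane[1] ? m.lane[0] : m.lane[1];
    return r;
}

/* The vceq.i32 / vpmin.u32 pair applied to one 64-bit value.  */
uint64_t model_vceqz_p64_sequence(uint64_t x)
{
    model_d32 d = { { (uint32_t) x, (uint32_t) (x >> 32) } };
    d = model_vceq_i32_zero(d);
    d = model_vpmin_u32(d, d);
    return ((uint64_t) d.lane[1] << 32) | d.lane[0];
}
```

The only difference in the vceqzq_p64 output is that the zero operand lives
in a register (loaded by vmov.i32) instead of being the #0 immediate.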
The reload trace for vceqzq_p64 says:
Choosing alt 0 in insn 19: (0) =w (1) w (2) w {neon_vceqv2si_insn}
alt=0,overall=0,losers=0,rld_nregs=0
Choosing alt 0 in insn 15: (0) =w (1) w (2) w {neon_vceqv2si_insn}
alt=0,overall=0,losers=0,rld_nregs=0
(insn 19 8 15 2 (set (reg:V2SI 48 d16 [orig:128 _18 ] [128])
(neg:V2SI (eq:V2SI (reg:V2SI 48 d16 [orig:139 v1 ] [139])
(reg:V2SI 54 d19 [ _5+8 ]))))
"/home/christophe.lyon/src/GCC/builds/gcc-fsf-git-neon-intrinsics/tools/lib/gcc/arm-none-linux-gnueabihf/11.0.0/include/arm_neon.h":2404:22
1650 {neon_vceqv2si_insn}
(expr_list:REG_EQUAL (neg:V2SI (eq:V2SI (subreg:V2SI (reg:DI 48 d16
[orig:139 v1 ] [139]) 0)
(const_vector:V2SI [
(const_int 0 [0]) repeated x2
])))
(nil)))
(insn 15 19 20 2 (set (reg:V2SI 50 d17 [orig:121 _11 ] [121])
(neg:V2SI (eq:V2SI (reg:V2SI 50 d17 [orig:141 v2 ] [141])
(reg:V2SI 54 d19 [ _5+8 ]))))
"/home/christophe.lyon/src/GCC/builds/gcc-fsf-git-neon-intrinsics/tools/lib/gcc/arm-none-linux-gnueabihf/11.0.0/include/arm_neon.h":2404:22
1650 {neon_vceqv2si_insn}
(expr_list:REG_EQUAL (neg:V2SI (eq:V2SI (subreg:V2SI (reg:DI 50 d17
[orig:141 v2 ] [141]) 0)
(const_vector:V2SI [
(const_int 0 [0]) repeated x2
])))
(nil)))
* [Bug target/98730] vceqzq_p64 does not generate vceq with immediate 0
From: clyon at gcc dot gnu.org @ 2021-01-18 14:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98730
--- Comment #1 from Christophe Lyon <clyon at gcc dot gnu.org> ---
Why does it choose alternative 0 instead of 1 which matches a vector of
constant zeros?
* [Bug target/98730] vceqzq_p64 does not generate vceq with immediate 0
From: rsandifo at gcc dot gnu.org @ 2021-01-21 15:08 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98730
rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org
--- Comment #2 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
(In reply to Christophe Lyon from comment #1)
> Why does it choose alternative 0 instead of 1 which matches a vector of
> constant zeros?
I wonder if it's an rtx costing problem. What do the patterns
look like in .expand? If the operands are still registers there,
what does .fwprop say? It should be able to replace the registers
with zero if the target says that that's worthwhile.
* [Bug target/98730] vceqzq_p64 does not generate vceq with immediate 0
From: clyon at gcc dot gnu.org @ 2021-01-21 15:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98730
--- Comment #3 from Christophe Lyon <clyon at gcc dot gnu.org> ---
At expand time, we have:
(insn 13 12 14 2 (set (reg:V2SI 121 [ _11 ])
(neg:V2SI (eq:V2SI (subreg:V2SI (reg:DI 116 [ _6 ]) 0)
(subreg:V2SI (reg:DI 118 [ _8 ]) 0)))) "arm_neon.h":2404:22
1649 {neon_vceqv2si_insn}
(nil))
(insn 17 16 18 2 (set (reg:V2SI 128 [ _18 ])
(neg:V2SI (eq:V2SI (subreg:V2SI (reg:DI 124 [ _14 ]) 0)
(subreg:V2SI (reg:DI 125 [ _15 ]) 0)))) "arm_neon.h":2404:22
1649 {neon_vceqv2si_insn}
(nil))
In fwprop1:
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
df_worklist_dataflow_doublequeue: n_basic_blocks 3 n_edges 2 count 3 ( 1)
df_worklist_dataflow_doublequeue: n_basic_blocks 3 n_edges 2 count 3 ( 1)
change not profitable (cost 0 -> cost 4)
rescanning insn with uid = 12.
verify found no changes in insn with uid = 12.
rescanning insn with uid = 13.
verify found no changes in insn with uid = 13.
rescanning insn with uid = 16.
verify found no changes in insn with uid = 16.
rescanning insn with uid = 17.
verify found no changes in insn with uid = 17.
change not profitable (cost 0 -> cost 12)
change not profitable (cost 0 -> cost 4)
rescanning insn with uid = 13.
verify found no changes in insn with uid = 13.
change not profitable (cost 0 -> cost 4)
change not profitable (cost 80 -> cost 84)
change not profitable (cost 80 -> cost 84)
change not profitable (cost 80 -> cost 84)
rescanning insn with uid = 17.
verify found no changes in insn with uid = 17.
change not profitable (cost 80 -> cost 84)
change not profitable (cost 80 -> cost 84)
and insns 13 and 17 are:
(insn 13 8 14 2 (set (reg:V2SI 121 [ _11 ])
(neg:V2SI (eq:V2SI (subreg:V2SI (reg:V2DI 113 [ v2.0_1 ]) 8)
(subreg:V2SI (reg:V4SI 114 [ _4 ]) 8)))) "arm_neon.h":2404:22
1649 {neon_vceqv2si_insn}
(expr_list:REG_EQUAL (neg:V2SI (eq:V2SI (subreg:V2SI (reg:V2DI 113 [
v2.0_1 ]) 8)
(const_vector:V2SI [
(const_int 0 [0]) repeated x2
])))
(nil)))
(insn 17 14 18 2 (set (reg:V2SI 128 [ _18 ])
(neg:V2SI (eq:V2SI (subreg:V2SI (reg:V2DI 113 [ v2.0_1 ]) 0)
(subreg:V2SI (reg:V4SI 114 [ _4 ]) 8)))) "arm_neon.h":2404:22
-1
(expr_list:REG_EQUAL (neg:V2SI (eq:V2SI (subreg:V2SI (reg:V2DI 113 [
v2.0_1 ]) 0)
(const_vector:V2SI [
(const_int 0 [0]) repeated x2
])))
(nil)))
* [Bug target/98730] vceqzq_p64 does not generate vceq with immediate 0
From: rsandifo at gcc dot gnu.org @ 2021-01-21 17:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98730
--- Comment #4 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
Hmm, yeah, looks like it might be a cost issue then.
arm_rtx_costs_internal seems to give CONST_VECTOR
a cost of 1 or 4 instructions, whereas a zero CONST_VECTOR
is free in this context.
Although that should be fixed, another approach would
be to lower vget_high_* and vget_low_* at the gimple
level if the argument is a constant.
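
To make the cost argument concrete, here is a toy model in C of the decision
seen in the fwprop dump ("change not profitable (cost 0 -> cost 4)"). It is
not GCC code: the model_* names are hypothetical, the rule that a
replacement is rejected when it raises the cost is an assumption about how
the pass decides, and only the factor of 4 per instruction mirrors GCC's
COSTS_N_INSNS macro.

```c
#include <stdbool.h>

/* Mirrors GCC's COSTS_N_INSNS, which scales an instruction count by 4.  */
#define MODEL_COSTS_N_INSNS(n) ((n) * 4)

/* Hypothetical operand kinds for the second input of the comparison.  */
enum model_operand { MODEL_REG, MODEL_ZERO_VECTOR };

/* Cost of the operand.  A register is free.  A zero vector costs one
   instruction unless the target reports it as free in a comparison.  */
int model_operand_cost(enum model_operand op, bool zero_vector_is_free)
{
    if (op == MODEL_REG)
        return 0;
    return zero_vector_is_free ? 0 : MODEL_COSTS_N_INSNS(1);
}

/* Assumed rule: a substitution is kept only if it does not raise cost.  */
bool model_change_profitable(int old_cost, int new_cost)
{
    return new_cost <= old_cost;
}
```

With the zero vector costed at one instruction, replacing the register goes
from cost 0 to cost 4 and is rejected, matching the dump. If the zero vector
is reported as free, the same substitution goes from 0 to 0 and is accepted,
which would let the immediate alternative of vceq be used.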
* [Bug target/98730] vceqzq_p64 does not generate vceq with immediate 0
From: cvs-commit at gcc dot gnu.org @ 2021-01-28 17:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98730
--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Christophe Lyon <clyon@gcc.gnu.org>:
https://gcc.gnu.org/g:31a0ab9213f780d2fa1da6e4879df214c0f247f9
commit r11-6961-g31a0ab9213f780d2fa1da6e4879df214c0f247f9
Author: Christophe Lyon <christophe.lyon@linaro.org>
Date: Thu Jan 28 17:55:45 2021 +0000
arm: Adjust cost of vector of constant zero
Neon vector comparisons have a dedicated version when comparing with
constant zero: it means its cost is free.
Adjust the cost in arm_rtx_costs_internal accordingly, for Neon only,
since MVE does not support this.
2021-01-28 Christophe Lyon <christophe.lyon@linaro.org>
gcc/
PR target/98730
* config/arm/arm.c (arm_rtx_costs_internal): Adjust cost of vector
of constant zero for comparisons.
gcc/testsuite/
PR target/98730
* gcc.target/arm/simd/vceqzq_p64.c: Update expected result.
* [Bug target/98730] vceqzq_p64 does not generate vceq with immediate 0
From: clyon at gcc dot gnu.org @ 2021-01-28 17:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98730
Christophe Lyon <clyon at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED
--- Comment #6 from Christophe Lyon <clyon at gcc dot gnu.org> ---
Now fixed on trunk