public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/104479] New: [12 Regression] cond_op is combined without considering single_use
@ 2022-02-10 6:45 crazylht at gmail dot com
From: crazylht at gmail dot com @ 2022-02-10 6:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104479
Bug ID: 104479
Summary: [12 Regression] cond_op is combined without
considering single_use
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: crazylht at gmail dot com
Target Milestone: ---
Host: x86_64-pc-linux-gnu
Target: x86_64-*-* i?86-*-*
cat test.c
void
mc_weight (unsigned int* __restrict dst, unsigned int* __restrict src,
int i_width,int i_scale, unsigned int* __restrict y)
{
for(int x = 0; x < i_width; x++)
dst[x] = src[x] >> 3 > 255 ? src[x] >> 3 : y[x];
}
gcc -march=icelake-server -O3
gcc 11.2:
vpsrld ymm0, YMMWORD PTR [rsi+rax], 3
vpcmpud k1, ymm0, ymm2, 2
vmovdqu32 ymm1{k1}, YMMWORD PTR [r8+rax]
vpcmpud k1, ymm0, ymm2, 6
vpblendmd ymm0{k1}, ymm1, ymm0
vmovdqu YMMWORD PTR [rcx+rax], ymm0
gcc 12:
vmovdqu ymm1, YMMWORD PTR [rsi+rax]
vpsrld ymm2, ymm1, 3
vpcmpud k1, ymm2, ymm3, 2
vmovdqu32 ymm0{k1}, YMMWORD PTR [r8+rax]
vpcmpud k1, ymm2, ymm3, 6
vmovdqa ymm2, ymm0
vpsrld ymm2{k1}, ymm1, 3
vmovdqu YMMWORD PTR [rcx+rax], ymm2
This is caused by the following patterns in match.pd:
---------------cut----------------
(for uncond_op (UNCOND_BINARY)
cond_op (COND_BINARY)
(simplify
(vec_cond @0 (view_convert? (uncond_op@4 @1 @2)) @3)
(with { tree op_type = TREE_TYPE (@4); }
(if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
&& is_truth_type_for (op_type, TREE_TYPE (@0)))
(view_convert (cond_op @0 @1 @2 (view_convert:op_type @3))))))
(simplify
(vec_cond @0 @1 (view_convert? (uncond_op@4 @2 @3)))
(with { tree op_type = TREE_TYPE (@4); }
(if (vectorized_internal_fn_supported_p (as_internal_fn (cond_op), op_type)
&& is_truth_type_for (op_type, TREE_TYPE (@0)))
(view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1)))))))
---------------end-------------------
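uncond_op + vec_cond is combined into cond_op without considering that the
result of uncond_op may have other uses, which causes suboptimal codegen: in the
gcc 12 output the shift is executed twice (once unconditionally for the
comparison, once as the masked vpsrld) instead of being blended once.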
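To make the cost concrete, here is a scalar C sketch (an illustration only, not GCC code) that models one 256-bit vector of the loop body in the two shapes seen above: the gcc 11.2 shape, which shifts once and blends, and the gcc 12 shape, where the shift still feeds the comparison and the vec_cond has been folded into a masked conditional shift (the cond_op). The helpers `vshr`, `vcond_shr`, `lower_blend`, `lower_cond_op` and the `shift_ops` counter are hypothetical names introduced for this sketch.

```c
#include <assert.h>

#define LANES 8 /* one 256-bit ymm vector of 32-bit lanes */

/* Hypothetical counter of vector shift operations, for illustration.  */
static int shift_ops;

/* Unconditional vector shift: every lane is shifted.  */
static void vshr (unsigned *out, const unsigned *a, int s)
{
  assert (s >= 0 && s < 32); /* shifting by >= width is undefined in C */
  shift_ops++;
  for (int i = 0; i < LANES; i++)
    out[i] = a[i] >> s;
}

/* Models a cond_op shift: lanes with mask[i] set get a[i] >> s,
   the other lanes keep else_val[i] (merge masking).  */
static void vcond_shr (unsigned *out, const int *mask, const unsigned *a,
                       int s, const unsigned *else_val)
{
  shift_ops++;
  for (int i = 0; i < LANES; i++)
    out[i] = mask[i] ? a[i] >> s : else_val[i];
}

/* gcc 11.2 shape: t = src >> 3; dst = vec_cond (t > 255, t, y).  */
static void lower_blend (unsigned *dst, const unsigned *src, const unsigned *y)
{
  unsigned t[LANES];
  vshr (t, src, 3);
  for (int i = 0; i < LANES; i++)
    dst[i] = t[i] > 255 ? t[i] : y[i]; /* the blend (vpblendmd) */
}

/* gcc 12 shape: t = src >> 3 is still needed for the compare, and the
   vec_cond became cond_op (mask, src, 3, y), which shifts src again.  */
static void lower_cond_op (unsigned *dst, const unsigned *src, const unsigned *y)
{
  unsigned t[LANES];
  int mask[LANES];
  vshr (t, src, 3);
  for (int i = 0; i < LANES; i++)
    mask[i] = t[i] > 255;
  vcond_shr (dst, mask, src, 3, y);
}
```

Both functions produce identical dst values for any input; the only difference is that `lower_cond_op` performs two vector shifts where `lower_blend` performs one, because the unconditional shift result has a second use (the comparison) and cannot be removed.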
* [Bug tree-optimization/104479] [12 Regression] cond_op is combined without considering single_use
From: rguenth at gcc dot gnu.org @ 2022-02-10 7:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104479
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |12.0
CC| |rguenth at gcc dot gnu.org,
| |rsandifo at gcc dot gnu.org
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
Last reconfirmed| |2022-02-10
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed. When uncond_op is expensive (there's *div amongst them) that's
definitely unwanted. OTOH when it is cheap then combining will reduce
latency.
GIMPLE wise it's a neutral transform if uncond_op is not single-use unless
we need two v_c_es.
In the assembly it's masked vpsrld vs. masked vpblendmd, it's not entirely
clear why one should be slower than the other (but yes, blends are usually
very cheap and also not resource constrained).
* [Bug tree-optimization/104479] [12 Regression] cond_op is combined without considering single_use
From: crazylht at gmail dot com @ 2022-02-10 8:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104479
--- Comment #2 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #1)
> Confirmed. When uncond_op is expensive (there's *div amongst them) that's
> definitely unwanted. OTOH when it is cheap then combining will reduce
> latency.
>
> GIMPLE wise it's a neutral transform if uncond_op is not single-use unless
> we need two v_c_es.
We can leave it to rtl combine/fwprop which will consider rtx_cost for them.
* [Bug tree-optimization/104479] [12 Regression] cond_op is combined without considering single_use
From: rguenther at suse dot de @ 2022-02-10 8:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104479
--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 10 Feb 2022, crazylht at gmail dot com wrote:
> We can leave it to rtl combine/fwprop which will consider rtx_cost for them.
That certainly makes sense for the !single_use case.
* [Bug tree-optimization/104479] [12 Regression] cond_op is combined without considering single_use
From: cvs-commit at gcc dot gnu.org @ 2022-02-11 7:48 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104479
--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:165947fecf4d78c7effb0f1ee15e6942d8dce4ea
commit r12-7193-g165947fecf4d78c7effb0f1ee15e6942d8dce4ea
Author: liuhongt <hongtao.liu@intel.com>
Date: Thu Feb 10 15:42:13 2022 +0800
Add single_use to simplification (uncond_op + vec_cond -> cond_op).
gcc/ChangeLog:
PR tree-optimization/104479
* match.pd (uncond_op + vec_cond -> cond_op): Add single_use
for the dest of uncond_op.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr104479.c: New test.
* gcc.target/i386/cond_op_shift_w-1.c: Adjust testcase.
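The ChangeLog describes the fix as adding single_use for the dest of uncond_op, i.e. the fold now fires only when the vec_cond is the sole consumer of the uncond_op result. The following C sketch models that guard on a toy representation; `struct value`, its `num_uses` field and `try_fold_to_cond_op` are hypothetical stand-ins, not GCC's internal API, and the two boolean parameters merely stand in for the existing vectorized_internal_fn_supported_p and is_truth_type_for checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the value produced by uncond_op (hypothetical).  */
struct value
{
  int num_uses; /* number of statements that read this value */
};

enum fold_result { KEEP_VEC_COND, FOLD_TO_COND_OP };

/* Decide whether vec_cond (mask, uncond_op (a, b), else) may become
   cond_op (mask, a, b, else).  */
static enum fold_result
try_fold_to_cond_op (const struct value *uncond_result,
                     bool ifn_supported, bool mask_type_ok)
{
  /* The vec_cond itself reads the value, so it has at least one use.  */
  assert (uncond_result->num_uses >= 1);
  if (!ifn_supported || !mask_type_ok)
    return KEEP_VEC_COND;
  /* The added guard: if other statements also use the result, uncond_op
     must be computed anyway, and folding would duplicate it.  */
  if (uncond_result->num_uses != 1)
    return KEEP_VEC_COND;
  return FOLD_TO_COND_OP;
}
```

In test.c the shift result feeds both the comparison and the vec_cond, so it has two uses and the sketch keeps the vec_cond, matching the gcc 11.2 codegen. As suggested in comment #2, whether to combine in the multi-use case is then left to RTL combine/fwprop, which can weigh rtx_cost.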
* [Bug tree-optimization/104479] [12 Regression] cond_op is combined without considering single_use
From: crazylht at gmail dot com @ 2022-02-11 7:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104479
Hongtao.liu <crazylht at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |FIXED
--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
Fixed.