* [Bug tree-optimization/100089] [11 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
2021-04-15 6:35 [Bug tree-optimization/100089] New: [11 Performance regression ] 30% for denbench/mp2decoddata2 with -O3 crazylht at gmail dot com
@ 2021-04-15 7:17 ` rguenth at gcc dot gnu.org
2021-04-27 11:40 ` [Bug tree-optimization/100089] [11/12 " jakub at gcc dot gnu.org
` (16 subsequent siblings)
17 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-04-15 7:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-04-15
             Status|UNCONFIRMED                 |NEW
            Summary|[11 Performance regression  |[11 Regression] 30%
                   |] 30% for                   |performance regression for
                   |denbench/mp2decoddata2 with |denbench/mp2decoddata2 with
                   |-O3                         |-O3
   Target Milestone|---                         |11.0
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Indeed, loop vectorization throws if-converted bodies at the BB vectorizer as a
last resort (because BB vectorization doesn't do if-conversion itself). But
the BB vectorizer then uses the if-converted scalar code as the thing to
cost against (costing against the not if-converted loop body isn't really
possible). To quote:

      /* If we applied if-conversion then try to vectorize the
         BB of innermost loops.
         ??? Ideally BB vectorization would learn to vectorize
         control flow by applying if-conversion on-the-fly, the
         following retains the if-converted loop body even when
         only non-if-converted parts took part in BB vectorization.  */
      if (flag_tree_slp_vectorize != 0
          && loop_vectorized_call
          && ! loop->inner)
        {

As a "hack" we could try to scalar-cost the always-executed part of
the not if-converted loop body and apply the full bias of this cost
vs. the scalar cost of the if-converted body to the scalar cost of the
BB vectorization. But that's really apples-to-oranges in the end
(as it is now).

Maybe we can cost the whole partly vectorized loop body in this mode
and compare it against the scalar cost of the original loop. But even
the loop vectorizer costs the if-converted scalar loop, so it is off as well.

Long-term, if-conversion needs to be integrated with vectorization so we
can at least keep track of which stmts were originally executed conditionally
and which were not.

Short-term, I'm not sure we can do much. Doing SLP on the if-converted
body does help in quite a few cases.
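
To make the costing problem concrete, here is a small standalone C++ sketch (not GCC code; the function names and the arithmetic are invented for illustration). It contrasts a branchy loop body with its if-converted form. In the second form the conditional work runs on every iteration and a select, the analogue of a COND_EXPR, picks the result:

```cpp
#include <vector>

// Branchy form: the multiply-add only runs when the condition holds.
long long sum_branchy(const std::vector<int>& a, int threshold) {
    long long sum = 0;
    for (int x : a) {
        if (x > threshold)
            sum += static_cast<long long>(x) * x + 3LL * x;
    }
    return sum;
}

// If-converted form: the multiply-add is computed on every iteration and
// a select (the analogue of a COND_EXPR) decides whether it contributes.
long long sum_if_converted(const std::vector<int>& a, int threshold) {
    long long sum = 0;
    for (int x : a) {
        long long t = static_cast<long long>(x) * x + 3LL * x;  // now unconditional
        sum += (x > threshold) ? t : 0;                         // select
    }
    return sum;
}
```

Both functions return the same value. They differ only in how much work runs per iteration when the condition is rarely true, and a scalar cost taken from the if-converted body cannot see that difference.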
* [Bug tree-optimization/100089] [11/12 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
From: jakub at gcc dot gnu.org @ 2021-04-27 11:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|11.0 |11.2
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 11.1 has been released, retargeting bugs to GCC 11.2.
* [Bug tree-optimization/100089] [11/12 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
From: rsandifo at gcc dot gnu.org @ 2021-05-12 8:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
--- Comment #3 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
Is this really a costing issue, or should we instead reject the
BB fallback if it leaves any scalar COND_EXPRs around? This would
be similar to the way that we reject IFN_MASK_LOAD/STORE calls,
except that the COND_EXPR tests would only apply to unvectorised
statements and so would need to be tested after SLP discovery
rather than before it. (Ideally IFN_MASK_LOAD/STORE would work
like that too.)
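
The check suggested here can be sketched as a toy model. The types `ToyKind` and `ToyStmt` and the function name are hypothetical stand-ins, not GCC's data structures. The sketch only shows the ordering: the scan runs after SLP discovery and looks only at statements that stayed scalar.

```cpp
#include <vector>

// Hypothetical statement kinds; real GIMPLE has many more.
enum class ToyKind { Plain, CondExpr };

struct ToyStmt {
    ToyKind kind;
    bool vectorized;  // assumed to be set by an earlier SLP discovery step
};

// Returns true when the BB fallback should be rejected because a
// COND_EXPR-like statement was left unvectorized.
bool reject_bb_fallback(const std::vector<ToyStmt>& body) {
    for (const ToyStmt& s : body) {
        if (!s.vectorized && s.kind == ToyKind::CondExpr)
            return true;
    }
    return false;
}
```

A vectorized select does not trigger the rejection; only a select that remains scalar does.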
* [Bug tree-optimization/100089] [11/12 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
From: rguenth at gcc dot gnu.org @ 2021-05-12 8:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to rsandifo@gcc.gnu.org from comment #3)
> Is this really a costing issue, or should we instead reject the
> BB fallback if it leaves any scalar COND_EXPRs around? This would
> be similar to the way that we reject IFN_MASK_LOAD/STORE calls,
> except that the COND_EXPR tests would only apply to unvectorised
> statements and so would need to be tested after SLP discovery
> rather than before it. (Ideally IFN_MASK_LOAD/STORE would work
> like that too.)
I suppose we could do that, but then I'm not sure how exactly we'd do it ;)
Good idea anyway.
* [Bug tree-optimization/100089] [11/12 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
From: rguenth at gcc dot gnu.org @ 2021-07-28 7:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|11.2 |11.3
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 11.2 is being released, retargeting bugs to GCC 11.3
* [Bug tree-optimization/100089] [11/12 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
From: rguenth at gcc dot gnu.org @ 2021-08-23 12:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
Status|NEW |ASSIGNED
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Working on this now.
* [Bug tree-optimization/100089] [11/12 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
From: cvs-commit at gcc dot gnu.org @ 2021-08-24 1:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:819b7c3a339e3bdaf85cd55954c5536bd98aae09
commit r12-3103-g819b7c3a339e3bdaf85cd55954c5536bd98aae09
Author: liuhongt <hongtao.liu@intel.com>
Date: Wed Aug 4 16:39:31 2021 +0800
Disable slp in loop vectorizer when cost model is very-cheap.
Performance impact for the commit with option:
-march=x86-64 -O2 -ftree-vectorize -fvect-cost-model=very-cheap
SPEC2017 fprate
503.bwaves_r BuildSame
507.cactuBSSN_r -0.04
508.namd_r 0.14
510.parest_r -0.54
511.povray_r 0.10
519.lbm_r BuildSame
521.wrf_r 0.64
526.blender_r -0.32
527.cam4_r 0.17
538.imagick_r 0.09
544.nab_r BuildSame
549.fotonik3d_r BuildSame
554.roms_r BuildSame
997.specrand_fr -0.09
Geometric mean: 0.02
SPEC2017 intrate
500.perlbench_r 0.26
502.gcc_r 0.21
505.mcf_r -0.09
520.omnetpp_r BuildSame
523.xalancbmk_r BuildSame
525.x264_r -0.41
531.deepsjeng_r BuildSame
541.leela_r 0.13
548.exchange2_r BuildSame
557.xz_r BuildSame
999.specrand_ir BuildSame
Geometric mean: 0.02
EEMBC: no regression, only improvement or build the same, the below is
improved benchmarks.
mp2decoddata1 7.59
mp2decoddata2 31.80
mp2decoddata3 12.15
mp2decoddata4 11.16
mp2decoddata5 11.19
mp2decoddata1 7.06
mp2decoddata2 24.12
mp2decoddata3 10.83
mp2decoddata4 10.04
mp2decoddata5 10.07
gcc/ChangeLog:
PR tree-optimization/100089
* tree-vectorizer.c (try_vectorize_loop_1): Disable slp in
loop vectorizer when cost model is very-cheap.
* [Bug tree-optimization/100089] [11/12 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
From: rguenth at gcc dot gnu.org @ 2021-08-24 9:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
So if we agree on a sane way to cost branchy code on the scalar side then it
should be possible to compare the scalar cost of the not if-converted inner
loop body against the full partially vectorized and if-converted inner loop
body.

vect_bb_vectorization_profitable_p would have to add the cost of the scalar
stmts not covered by vectorization. This set is conveniently available as
the set of stmts not having the visited flag set before we clear it here:

    vect_bb_vectorization_profitable_p (bb_vec_info bb_vinfo,
                                        vec<slp_instance> slp_instances)
    {
    ...
      /* Unset visited flag.  */
      stmt_info_for_cost *cost;
      FOR_EACH_VEC_ELT (scalar_costs, i, cost)
        gimple_set_visited (cost->stmt_info->stmt, false);

So we'd need to walk over all stmts in the BB and add the cost of the
not marked stmts to the vector cost. We'd want to force a single
SLP "subgraph" in this mode to avoid going over the whole block
multiple times.
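
A toy version of that accounting, with invented types and cost numbers rather than GCC's `stmt_info_for_cost` machinery, might look as follows. It assumes each statement carries a scalar cost and a visited flag that is set only for statements covered by vectorization, and that the caller supplies the scalar cost of the original, not if-converted loop body.

```cpp
#include <vector>

// Hypothetical per-statement record; not a GCC type.
struct ToyCostedStmt {
    int scalar_cost;
    bool visited;  // true if the statement is covered by an SLP instance
};

// Cost of the partially vectorized block: the vector cost plus the cost
// of every scalar statement that vectorization left behind.
int partially_vectorized_cost(int vector_cost,
                              const std::vector<ToyCostedStmt>& bb) {
    int total = vector_cost;
    for (const ToyCostedStmt& s : bb) {
        if (!s.visited)
            total += s.scalar_cost;
    }
    return total;
}

// Compare against the scalar cost of the original loop body.
bool toy_profitable(int vector_cost, const std::vector<ToyCostedStmt>& bb,
                    int original_scalar_cost) {
    return partially_vectorized_cost(vector_cost, bb) < original_scalar_cost;
}
```

Walking the block once per costing call is what makes a single SLP subgraph attractive here: with several subgraphs costed separately, the same leftover statements would be visited repeatedly.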
* [Bug tree-optimization/100089] [11/12 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
From: rguenth at gcc dot gnu.org @ 2021-08-24 10:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 51350
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51350&action=edit
patch
This implements scanning for not vectorized COND_EXPRs with
-fvect-cost-model=very-cheap when vectorizing if-converted loop bodies. It
also gives vect_bb_vectorization_profitable_p enough info (namely the
original, not if-converted loop body) to eventually do something fancier.
* [Bug tree-optimization/100089] [11/12 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
From: cvs-commit at gcc dot gnu.org @ 2021-08-24 12:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
--- Comment #10 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:9216ee6d1195d48388f825cf1b072e570129cbbe
commit r12-3116-g9216ee6d1195d48388f825cf1b072e570129cbbe
Author: Richard Biener <rguenther@suse.de>
Date: Tue Aug 24 12:25:25 2021 +0200
tree-optimization/100089 - avoid leaving scalar if-converted code around
This avoids leaving scalar if-converted code around for the case
of BB vectorizing an if-converted loop body when using the very-cheap
cost model. In this case we scan not vectorized scalar stmts in
the basic-block vectorized for COND_EXPRs and force the vectorization
to be marked as not profitable.
The patch also makes sure to always consider all BB vectorization
subgraphs together for costing purposes when vectorizing an
if-converted loop body.
2021-08-24 Richard Biener <rguenther@suse.de>
PR tree-optimization/100089
* tree-vectorizer.h (vect_slp_bb): Rename to ...
(vect_slp_if_converted_bb): ... this and get the original
loop as new argument.
* tree-vectorizer.c (try_vectorize_loop_1): Revert previous fix,
pass original loop to vect_slp_if_converted_bb.
* tree-vect-slp.c (vect_bb_vectorization_profitable_p):
If orig_loop was passed scan the not vectorized stmts
for COND_EXPRs and force not profitable if found.
(vect_slp_region): Pass down all SLP instances to costing
if orig_loop was specified.
(vect_slp_bbs): Pass through orig_loop.
(vect_slp_bb): Rename to ...
(vect_slp_if_converted_bb): ... this and get the original
loop as new argument.
(vect_slp_function): Adjust.
* [Bug tree-optimization/100089] [11/12 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
From: rguenth at gcc dot gnu.org @ 2021-08-24 12:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
So this fixes it for -fvect-cost-model=very-cheap. One could argue that we
should enable the code for all cost models, which would fix the -O3 regression
(and be backportable to the branch).

I'll try experimenting with "fancy" costing of the stray vectorizations. I'll
also note that the scanning for load/store (and other ifns not supported)
could be handled in the costing as well (but the costing would then also need
to run for -fno-vect-cost-model, with the result simply ignored if not forced).
I'm talking about:

      bool require_loop_vectorize = false;
      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
           !gsi_end_p (gsi); gsi_next (&gsi))
        {
          gimple *stmt = gsi_stmt (gsi);
          gcall *call = dyn_cast <gcall *> (stmt);
          if (call && gimple_call_internal_p (call))
            {
              internal_fn ifn = gimple_call_internal_fn (call);
              if (ifn == IFN_MASK_LOAD || ifn == IFN_MASK_STORE
                  /* Don't keep the if-converted parts when the ifn with
                     specific type is not supported by the backend.  */
                  || (direct_internal_fn_p (ifn)
                      && !direct_internal_fn_supported_p
                           (call, OPTIMIZE_FOR_SPEED)))
                {
                  require_loop_vectorize = true;
                  break;
                }
            }
* [Bug tree-optimization/100089] [11/12 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
From: rguenth at gcc dot gnu.org @ 2021-08-30 12:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
Bug 100089 depends on bug 102128, which changed state.
Bug 102128 Summary: [12 Regression] Huge performance drop for 519.lbm_r since r12-3116-g9216ee6d1195d48388f825cf1b072e570129cbbe
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102128
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution|--- |FIXED
* [Bug tree-optimization/100089] [11/12 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
From: rguenth at gcc dot gnu.org @ 2021-08-31 10:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
Bug 100089 depends on bug 102142, which changed state.
Bug 102142 Summary: [12 Regression] ICE Segmentation fault since r12-3222-g89f33f44addbf985
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102142
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution|--- |FIXED
* [Bug tree-optimization/100089] [11/12 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
From: rguenth at gcc dot gnu.org @ 2022-01-21 12:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P2
* [Bug tree-optimization/100089] [11/12 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
From: cvs-commit at gcc dot gnu.org @ 2022-01-21 13:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:199cd0e0f8744ca1e61a95987b2d020a592a46d9
commit r12-6795-g199cd0e0f8744ca1e61a95987b2d020a592a46d9
Author: Richard Biener <rguenther@suse.de>
Date: Fri Jan 21 13:29:06 2022 +0100
tree-optimization/100089 - BB vectorization of if-converted loop bodies
The PR complains that when we only partially BB vectorize an
if-converted loop body that this can leave unvectorized code
unconditionally executed and thus effectively slow down code.
For -O2 we already mitigated the issue by not doing BB vectorization
when not all if-converted stmts were covered but the issue is
present with -O3 as well. Thus the following simply extends the
fix to cover all but the unlimited cost models. It is after all
very likely that we vectorize some stmts, if only a single
paired store.
2022-01-21 Richard Biener <rguenther@suse.de>
PR tree-optimization/100089
* tree-vect-slp.cc (vect_slp_region): Reject BB vectorization
of if-converted loops with unvectorized COND_EXPRs for
all but the unlimited cost models.
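
The change in policy between the two fixes can be summarized in a small sketch. The enum mirrors the names of the `-fvect-cost-model` settings, but the types and functions are hypothetical and do not reproduce the code in `vect_slp_region`.

```cpp
// Hypothetical mirror of the -fvect-cost-model choices.
enum class ToyCostModel { Unlimited, Dynamic, Cheap, VeryCheap };

// Policy after the earlier fix (r12-3116): only the very-cheap model
// rejects BB vectorization that leaves scalar COND_EXPRs behind.
bool rejects_after_first_fix(ToyCostModel model, bool has_unvectorized_cond_expr) {
    return has_unvectorized_cond_expr && model == ToyCostModel::VeryCheap;
}

// Policy after this commit: every model except unlimited rejects it.
bool rejects_after_second_fix(ToyCostModel model, bool has_unvectorized_cond_expr) {
    return has_unvectorized_cond_expr && model != ToyCostModel::Unlimited;
}
```

Under the second policy the dynamic and cheap models behave like very-cheap for this check, which matches the stated aim of covering the -O3 case as well.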
* [Bug tree-optimization/100089] [11 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
From: rguenth at gcc dot gnu.org @ 2022-01-21 13:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[11/12 Regression] 30%      |[11 Regression] 30%
                   |performance regression for  |performance regression for
                   |denbench/mp2decoddata2 with |denbench/mp2decoddata2 with
                   |-O3                         |-O3
      Known to fail|                            |11.2.1
      Known to work|                            |12.0
--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
Now fixed for GCC 12 even with -O3.
* [Bug tree-optimization/100089] [11 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
From: rguenth at gcc dot gnu.org @ 2022-04-21 7:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|11.3 |11.4
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 11.3 is being released, retargeting bugs to GCC 11.4.
* [Bug tree-optimization/100089] [11 Regression] 30% performance regression for denbench/mp2decoddata2 with -O3
From: jakub at gcc dot gnu.org @ 2023-05-29 10:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100089
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|11.4 |11.5
--- Comment #15 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 11.4 is being released, retargeting bugs to GCC 11.5.