public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/65206] New: Vectorized version of loop is removed.
@ 2015-02-25 14:50 ysrumyan at gmail dot com
2015-02-25 14:55 ` [Bug tree-optimization/65206] " ysrumyan at gmail dot com
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: ysrumyan at gmail dot com @ 2015-02-25 14:50 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206
Bug ID: 65206
Summary: Vectorized version of loop is removed.
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: ysrumyan at gmail dot com
I noticed that vectorized version of loop is deleted although compiler reports
that it was successfully vectorized:
t1.c:7:3: note: LOOP VECTORIZED
but after we can see in vect-dump:
Removing basic block 4
basic block 4, loop depth 1
pred: 45
# i_16 = PHI <i_73(45)>
# ivtmp_15 = PHI <ivtmp_76(45)>
# vectp_a1.11_116 = PHI <vectp_a1.12_114(45)>
# vectp_a2.19_125 = PHI <vectp_a2.20_123(45)>
# vectp_a3.22_130 = PHI <vectp_a3.23_128(45)>
# vectp_a1.25_136 = PHI <vectp_a1.26_134(45)>
# vectp_a3.34_147 = PHI <vectp_a3.35_145(45)>
# ivtmp_38 = PHI <0(45)>
vect__5.13_118 = MEM[(float *)vectp_a1.11_116];
_5 = a1[i_16];
_31 = &a2[i_16];
vect__ifc__32.14_122 = VEC_COND_EXPR <vect__5.13_118 >= vect_cst_.15_119,
vect_cst_.16_120, vect_cst_.17_121>;
_ifc__32 = _5 >= x_6(D) ? 4294967295 : 0;
vect__7.18_127 = MASK_LOAD (vectp_a2.19_125, 0B, vect__ifc__32.14_122);
_7 = 0.0;
_33 = &a3[i_16];
vect__8.21_132 = MASK_LOAD (vectp_a3.22_130, 0B, vect__ifc__32.14_122);
_8 = 0.0;
vect__9.24_133 = vect__7.18_127 + vect__8.21_132;
_9 = _7 + _8;
_34 = &a1[i_16];
MASK_STORE (vectp_a1.25_136, 0B, vect__ifc__32.14_122, vect__9.24_133);
vect__11.27_140 = vect_cst_.28_37 + vect_cst_.29_139;
_11 = x_6(D) + 1.0e+0;
_35 = &a3[i_16];
vect__ifc__36.30_144 = VEC_COND_EXPR <vect__5.13_118 >= vect_cst_.31_141,
vect_cst_.32_142, vect_cst_.33_143>;
_ifc__36 = _5 >= x_6(D) ? 0 : 4294967295;
...
Test and compile options will be attached.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug tree-optimization/65206] Vectorized version of loop is removed.
2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
@ 2015-02-25 14:55 ` ysrumyan at gmail dot com
2015-02-25 15:08 ` rguenth at gcc dot gnu.org
` (9 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: ysrumyan at gmail dot com @ 2015-02-25 14:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206
--- Comment #1 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
Created attachment 34867
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34867&action=edit
test-case to reproduce
Test needs to be compiled with -Ofast -m64 -mcore-avx2 options.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug tree-optimization/65206] Vectorized version of loop is removed.
2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
2015-02-25 14:55 ` [Bug tree-optimization/65206] " ysrumyan at gmail dot com
@ 2015-02-25 15:08 ` rguenth at gcc dot gnu.org
2015-02-25 15:11 ` rguenth at gcc dot gnu.org
` (8 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-25 15:08 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
We apply versioning for aliasing but compute it as always aliasing in some way,
thus the runtime check gets immediately folded and thus the vectorized loop
removed:
t.c:7:3: note: create runtime check for data references a1[i_16] and *_34
t.c:7:3: note: created 1 versioning for alias checks.
t.c:7:3: note: loop versioned for vectorization because of possible aliasing
...
but I see the alias runtime check nowhere.
The DRs are
(gdb) p debug_data_reference (dr_a.dr)
#(Data Ref:
# bb: 4
# stmt: _5 = a1[i_16];
# ref: a1[i_16];
# base_object: a1;
# Access function 0: {0, +, 1}_1
#)
$17 = void
(gdb) p debug_data_reference (dr_b.dr)
#(Data Ref:
# bb: 4
# stmt: MASK_STORE (_34, 0B, _ifc__32, _9);
# ref: *_34;
# base_object: MEM[(float *)&a1];
# Access function 0: {0B, +, 4}_1
#)
so maybe the code doing masked loads/stores updates the DRs in a way that
will later confuse runtime alias checking. Or for some reason it doesn't
update it enough to make data-dependence analysis handle it.
Clearly this is a must-dependence (but with known distance), so sth
that data dependence analysis should handle and sth that the runtime
alias checking isn't handling.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug tree-optimization/65206] Vectorized version of loop is removed.
2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
2015-02-25 14:55 ` [Bug tree-optimization/65206] " ysrumyan at gmail dot com
2015-02-25 15:08 ` rguenth at gcc dot gnu.org
@ 2015-02-25 15:11 ` rguenth at gcc dot gnu.org
2015-02-25 15:30 ` rguenth at gcc dot gnu.org
` (7 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-25 15:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
Status|UNCONFIRMED |NEW
Last reconfirmed| |2015-02-25
CC| |jakub at gcc dot gnu.org
Blocks| |53947
Ever confirmed|0 |1
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed. CCing Jakub who implemented this stuff.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug tree-optimization/65206] Vectorized version of loop is removed.
2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
` (2 preceding siblings ...)
2015-02-25 15:11 ` rguenth at gcc dot gnu.org
@ 2015-02-25 15:30 ` rguenth at gcc dot gnu.org
2015-02-25 15:33 ` rguenth at gcc dot gnu.org
` (6 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-25 15:30 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'm talking about
(compute_affine_dependence
stmt_a: _5 = a1[i_16];
stmt_b: MASK_STORE (_34, 0B, _ifc__32, _9);
) -> dependence analysis failed
somehow it works for
(compute_affine_dependence
stmt_a: _8 = MASK_LOAD (_33, 0B, _ifc__32);
stmt_b: MASK_STORE (_35, 0B, _ifc__36, _11);
(analyze_overlapping_iterations
(chrec_a = {0B, +, 4}_1)
(chrec_b = {0B, +, 4}_1)
(overlap_iterations_a = [0])
(overlap_iterations_b = [0]))
)
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug tree-optimization/65206] Vectorized version of loop is removed.
2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
` (3 preceding siblings ...)
2015-02-25 15:30 ` rguenth at gcc dot gnu.org
@ 2015-02-25 15:33 ` rguenth at gcc dot gnu.org
2015-02-26 12:15 ` rguenth at gcc dot gnu.org
` (5 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-25 15:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #4)
> I'm talking about
>
> (compute_affine_dependence
> stmt_a: _5 = a1[i_16];
> stmt_b: MASK_STORE (_34, 0B, _ifc__32, _9);
> ) -> dependence analysis failed
Ah - this is pointer vs. array access, thus different "base". A generic issue
that can for example be "solved" by trying to force a pointer base for
a non-pointer base DR when analysis fails and one is based on a pointer.
Hmm.
Mine.
> somehow it works for
>
> (compute_affine_dependence
> stmt_a: _8 = MASK_LOAD (_33, 0B, _ifc__32);
> stmt_b: MASK_STORE (_35, 0B, _ifc__36, _11);
> (analyze_overlapping_iterations
> (chrec_a = {0B, +, 4}_1)
> (chrec_b = {0B, +, 4}_1)
> (overlap_iterations_a = [0])
> (overlap_iterations_b = [0]))
> )
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug tree-optimization/65206] Vectorized version of loop is removed.
2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
` (4 preceding siblings ...)
2015-02-25 15:33 ` rguenth at gcc dot gnu.org
@ 2015-02-26 12:15 ` rguenth at gcc dot gnu.org
2015-02-26 12:38 ` rguenth at gcc dot gnu.org
` (4 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-26 12:15 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 34882
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34882&action=edit
hack just for the masked load/store case
Incomplete special-casing for the masked load/store case. We need to mark
the masked load/store IFN calls somehow to mark the forwarding as valid.
A "real" fix would "duplicate" dr->indices to always have an alternate
analysis for *&dr->ref in case dr->ref is not a pointer evolution. We
could then pick the one with compatible dr->indices.base_object. Of course
that may still not handle all cases, if the ptr evolution is not enough
(like for having a[2][j] vs. *(&a[3][j])). But at least it could help in
some general cases.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug tree-optimization/65206] Vectorized version of loop is removed.
2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
` (5 preceding siblings ...)
2015-02-26 12:15 ` rguenth at gcc dot gnu.org
@ 2015-02-26 12:38 ` rguenth at gcc dot gnu.org
2021-09-08 11:08 ` rguenth at gcc dot gnu.org
` (3 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-26 12:38 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
For the masked load/store case we could also simply put the "real" memory
access
in place of the pointer argument. To make that valid GIMPLE we could wrap it
inside a fake VIEW_CONVERT_EXPR for example - like one converting to a
char[sizeof (ref)], this would make it appear as aggregate. Of course when
we inspect the masked load/store we'd have to strip that VIEW_CONVERT_EXPR
again. But I don't think it would harm anyone seeing that VIEW_CONVERT_EXPR.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug tree-optimization/65206] Vectorized version of loop is removed.
2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
` (6 preceding siblings ...)
2015-02-26 12:38 ` rguenth at gcc dot gnu.org
@ 2021-09-08 11:08 ` rguenth at gcc dot gnu.org
2021-09-20 6:51 ` [Bug tree-optimization/65206] vectorized version of loop is removed, dependence analysis fails for *&a[i] vs a[j] cvs-commit at gcc dot gnu.org
` (2 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-09-08 11:08 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |rguenth at gcc dot gnu.org
--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
*** Bug 101548 has been marked as a duplicate of this bug. ***
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug tree-optimization/65206] vectorized version of loop is removed, dependence analysis fails for *&a[i] vs a[j]
2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
` (7 preceding siblings ...)
2021-09-08 11:08 ` rguenth at gcc dot gnu.org
@ 2021-09-20 6:51 ` cvs-commit at gcc dot gnu.org
2021-09-20 6:51 ` rguenth at gcc dot gnu.org
2022-03-29 12:57 ` rguenth at gcc dot gnu.org
10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-09-20 6:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206
--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:f92901a508305f291fcf2acae0825379477724de
commit r12-3677-gf92901a508305f291fcf2acae0825379477724de
Author: Richard Biener <rguenther@suse.de>
Date: Wed Sep 8 14:42:31 2021 +0200
tree-optimization/65206 - dependence analysis on mixed pointer/array
This adds the capability to analyze the dependence of mixed
pointer/array accesses. The example is from where using a masked
load/store creates the pointer-based access when an otherwise
unconditional access is array based. Other examples would include
accesses to an array mixed with accesses from inlined helpers
that work on pointers.
The idea is quite simple and old - analyze the data-ref indices
as if the reference was pointer-based. The following change does
this by changing dr_analyze_indices to work on the indices
sub-structure and storing an alternate indices substructure in
each data reference. That alternate set of indices is analyzed
lazily by initialize_data_dependence_relation when it fails to
match-up the main set of indices of two data references.
initialize_data_dependence_relation is refactored into a head
and a tail worker and changed to work on one of the indices
structures and thus away from using DR_* access macros which
continue to reference the main indices substructure.
There are quite some vectorization and loop distribution opportunities
unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r,
510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
544.nab_r see amendments in what they report with -fopt-info-loop while
the rest of the specrate set sees no changes there. Measuring runtime
for the set where changes were reported reveals nothing off-noise
besides 511.povray_r which seems to regress slightly for me
(on a Zen2 machine with -Ofast -march=native).
2021-09-08 Richard Biener <rguenther@suse.de>
PR tree-optimization/65206
* tree-data-ref.h (struct data_reference): Add alt_indices,
order it last.
* tree-data-ref.c (free_data_ref): Release alt_indices.
(dr_analyze_indices): Work on struct indices and get DR_REF as
tree.
(create_data_ref): Adjust.
(initialize_data_dependence_relation): Split into head
and tail. When the base objects fail to match up try
again with pointer-based analysis of indices.
* tree-vectorizer.c (vec_info_shared::check_datarefs): Do
not compare the lazily computed alternate set of indices.
* gcc.dg/torture/20210916.c: New testcase.
* gcc.dg/vect/pr65206.c: Likewise.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug tree-optimization/65206] vectorized version of loop is removed, dependence analysis fails for *&a[i] vs a[j]
2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
` (8 preceding siblings ...)
2021-09-20 6:51 ` [Bug tree-optimization/65206] vectorized version of loop is removed, dependence analysis fails for *&a[i] vs a[j] cvs-commit at gcc dot gnu.org
@ 2021-09-20 6:51 ` rguenth at gcc dot gnu.org
2022-03-29 12:57 ` rguenth at gcc dot gnu.org
10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-09-20 6:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Known to work| |12.0
Resolution|--- |FIXED
Status|ASSIGNED |RESOLVED
Known to fail| |11.2.1
Target Milestone|--- |12.0
--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed for GCC 12.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug tree-optimization/65206] vectorized version of loop is removed, dependence analysis fails for *&a[i] vs a[j]
2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
` (9 preceding siblings ...)
2021-09-20 6:51 ` rguenth at gcc dot gnu.org
@ 2022-03-29 12:57 ` rguenth at gcc dot gnu.org
10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-03-29 12:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
*** Bug 69732 has been marked as a duplicate of this bug. ***
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2022-03-29 12:58 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
2015-02-25 14:55 ` [Bug tree-optimization/65206] " ysrumyan at gmail dot com
2015-02-25 15:08 ` rguenth at gcc dot gnu.org
2015-02-25 15:11 ` rguenth at gcc dot gnu.org
2015-02-25 15:30 ` rguenth at gcc dot gnu.org
2015-02-25 15:33 ` rguenth at gcc dot gnu.org
2015-02-26 12:15 ` rguenth at gcc dot gnu.org
2015-02-26 12:38 ` rguenth at gcc dot gnu.org
2021-09-08 11:08 ` rguenth at gcc dot gnu.org
2021-09-20 6:51 ` [Bug tree-optimization/65206] vectorized version of loop is removed, dependence analysis fails for *&a[i] vs a[j] cvs-commit at gcc dot gnu.org
2021-09-20 6:51 ` rguenth at gcc dot gnu.org
2022-03-29 12:57 ` rguenth at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).