public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/65206] New: Vectorized version of loop is removed.
@ 2015-02-25 14:50 ysrumyan at gmail dot com
  2015-02-25 14:55 ` [Bug tree-optimization/65206] " ysrumyan at gmail dot com
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: ysrumyan at gmail dot com @ 2015-02-25 14:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206

            Bug ID: 65206
           Summary: Vectorized version of loop is removed.
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ysrumyan at gmail dot com

I noticed that the vectorized version of the loop is deleted although the
compiler reports that it was successfully vectorized:
t1.c:7:3: note: LOOP VECTORIZED

but afterwards we can see in the vect dump:

Removing basic block 4
basic block 4, loop depth 1
 pred:       45
# i_16 = PHI <i_73(45)>
# ivtmp_15 = PHI <ivtmp_76(45)>
# vectp_a1.11_116 = PHI <vectp_a1.12_114(45)>
# vectp_a2.19_125 = PHI <vectp_a2.20_123(45)>
# vectp_a3.22_130 = PHI <vectp_a3.23_128(45)>
# vectp_a1.25_136 = PHI <vectp_a1.26_134(45)>
# vectp_a3.34_147 = PHI <vectp_a3.35_145(45)>
# ivtmp_38 = PHI <0(45)>
vect__5.13_118 = MEM[(float *)vectp_a1.11_116];
_5 = a1[i_16];
_31 = &a2[i_16];
vect__ifc__32.14_122 = VEC_COND_EXPR <vect__5.13_118 >= vect_cst_.15_119,
vect_cst_.16_120, vect_cst_.17_121>;
_ifc__32 = _5 >= x_6(D) ? 4294967295 : 0;
vect__7.18_127 = MASK_LOAD (vectp_a2.19_125, 0B, vect__ifc__32.14_122);
_7 = 0.0;
_33 = &a3[i_16];
vect__8.21_132 = MASK_LOAD (vectp_a3.22_130, 0B, vect__ifc__32.14_122);
_8 = 0.0;
vect__9.24_133 = vect__7.18_127 + vect__8.21_132;
_9 = _7 + _8;
_34 = &a1[i_16];
MASK_STORE (vectp_a1.25_136, 0B, vect__ifc__32.14_122, vect__9.24_133);
vect__11.27_140 = vect_cst_.28_37 + vect_cst_.29_139;
_11 = x_6(D) + 1.0e+0;
_35 = &a3[i_16];
vect__ifc__36.30_144 = VEC_COND_EXPR <vect__5.13_118 >= vect_cst_.31_141,
vect_cst_.32_142, vect_cst_.33_143>;
_ifc__36 = _5 >= x_6(D) ? 0 : 4294967295;
...

The test case and compile options will be attached.
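
For readers without the attachment, the dump above is consistent with a loop
of roughly the following shape. This is a hypothetical reconstruction from the
GIMPLE, not the attached test case: the array names a1, a2, a3 and the value x
come from the dump, while the bound N, the function name foo, and the exact
source form are assumptions.

```c
#define N 1024   /* assumed trip count; the real bound is not shown in the dump */

float a1[N], a2[N], a3[N];

/* Hypothetical reconstruction: the condition a1[i] >= x guards the loads of
   a2[i] and a3[i] and the store to a1[i]; the inverted condition guards the
   store to a3[i].  After if-conversion the guarded accesses show up as
   MASK_LOAD / MASK_STORE calls, while the load of a1[i] in the condition
   stays an ordinary array access.  */
void foo(float x)
{
    for (int i = 0; i < N; i++) {
        if (a1[i] >= x)
            a1[i] = a2[i] + a3[i];
        else
            a3[i] = x + 1.0f;
    }
}
```

The point of interest is that a1[i] is read unconditionally and written
conditionally in the same iteration, which gives the pair of references
discussed in the rest of the thread.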



* [Bug tree-optimization/65206] Vectorized version of loop is removed.
  2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
@ 2015-02-25 14:55 ` ysrumyan at gmail dot com
  2015-02-25 15:08 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: ysrumyan at gmail dot com @ 2015-02-25 14:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206

--- Comment #1 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
Created attachment 34867
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34867&action=edit
test-case to reproduce

The test needs to be compiled with the -Ofast -m64 -march=core-avx2 options.



* [Bug tree-optimization/65206] Vectorized version of loop is removed.
  2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
  2015-02-25 14:55 ` [Bug tree-optimization/65206] " ysrumyan at gmail dot com
@ 2015-02-25 15:08 ` rguenth at gcc dot gnu.org
  2015-02-25 15:11 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-25 15:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
We apply versioning for aliasing but then compute the two references as
always aliasing, so the runtime check gets folded immediately and the
vectorized loop is removed:

t.c:7:3: note: create runtime check for data references a1[i_16] and *_34
t.c:7:3: note: created 1 versioning for alias checks.
t.c:7:3: note: loop versioned for vectorization because of possible aliasing
...

but I don't see the runtime alias check anywhere.

The DRs are

(gdb) p debug_data_reference (dr_a.dr)
#(Data Ref: 
#  bb: 4 
#  stmt: _5 = a1[i_16];
#  ref: a1[i_16];
#  base_object: a1;
#  Access function 0: {0, +, 1}_1
#)
$17 = void
(gdb) p debug_data_reference (dr_b.dr)
#(Data Ref: 
#  bb: 4 
#  stmt: MASK_STORE (_34, 0B, _ifc__32, _9);
#  ref: *_34;
#  base_object: MEM[(float *)&a1];
#  Access function 0: {0B, +, 4}_1
#)

so maybe the code doing masked loads/stores updates the DRs in a way that
will later confuse runtime alias checking.  Or for some reason it doesn't
update it enough to make data-dependence analysis handle it.

Clearly this is a must-dependence (but with known distance), so something
that data dependence analysis should handle and something that the runtime
alias checking isn't handling.
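
The effect described above can be pictured in plain C. Versioning for aliasing
guards the vector loop with a runtime overlap test on the two references; when
both references start at the same address, that test always reports an overlap
and the guarded vector version is dead. A minimal sketch, assuming a simple
half-open byte-range overlap test (the helper names ranges_overlap and
use_vector_version are hypothetical and do not correspond to GCC functions):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: do the byte ranges [a, a+len) and [b, b+len) overlap? */
static int ranges_overlap(const void *a, const void *b, size_t len)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    return pa < pb + len && pb < pa + len;
}

/* Returns 1 if the vectorized loop version may run, 0 if the scalar
   fallback must be used.  This models the guard that loop versioning
   places in front of the two loop copies.  */
static int use_vector_version(const float *load_base, const float *store_base,
                              size_t n)
{
    return !ranges_overlap(load_base, store_base, n * sizeof(float));
}
```

With load_base and store_base both equal to a1, use_vector_version is 0 for
every n greater than zero, which matches the observation that the check folds
and the vectorized copy of the loop is removed.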



* [Bug tree-optimization/65206] Vectorized version of loop is removed.
  2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
  2015-02-25 14:55 ` [Bug tree-optimization/65206] " ysrumyan at gmail dot com
  2015-02-25 15:08 ` rguenth at gcc dot gnu.org
@ 2015-02-25 15:11 ` rguenth at gcc dot gnu.org
  2015-02-25 15:30 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-25 15:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2015-02-25
                 CC|                            |jakub at gcc dot gnu.org
             Blocks|                            |53947
     Ever confirmed|0                           |1

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Confirmed.  CCing Jakub who implemented this stuff.



* [Bug tree-optimization/65206] Vectorized version of loop is removed.
  2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
                   ` (2 preceding siblings ...)
  2015-02-25 15:11 ` rguenth at gcc dot gnu.org
@ 2015-02-25 15:30 ` rguenth at gcc dot gnu.org
  2015-02-25 15:33 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-25 15:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'm talking about

(compute_affine_dependence
  stmt_a: _5 = a1[i_16];
  stmt_b: MASK_STORE (_34, 0B, _ifc__32, _9);
) -> dependence analysis failed

somehow it works for

(compute_affine_dependence
  stmt_a: _8 = MASK_LOAD (_33, 0B, _ifc__32);
  stmt_b: MASK_STORE (_35, 0B, _ifc__36, _11);
(analyze_overlapping_iterations
  (chrec_a = {0B, +, 4}_1)
  (chrec_b = {0B, +, 4}_1)
  (overlap_iterations_a = [0])
  (overlap_iterations_b = [0]))
)



* [Bug tree-optimization/65206] Vectorized version of loop is removed.
  2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
                   ` (3 preceding siblings ...)
  2015-02-25 15:30 ` rguenth at gcc dot gnu.org
@ 2015-02-25 15:33 ` rguenth at gcc dot gnu.org
  2015-02-26 12:15 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-25 15:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #4)
> I'm talking about
> 
> (compute_affine_dependence
>   stmt_a: _5 = a1[i_16];
>   stmt_b: MASK_STORE (_34, 0B, _ifc__32, _9);
> ) -> dependence analysis failed

Ah - this is pointer vs. array access, thus different "base".  A generic issue
that can for example be "solved" by trying to force a pointer base for
a non-pointer base DR when analysis fails and one is based on a pointer.

Hmm.

Mine.

> somehow it works for
> 
> (compute_affine_dependence
>   stmt_a: _8 = MASK_LOAD (_33, 0B, _ifc__32);
>   stmt_b: MASK_STORE (_35, 0B, _ifc__36, _11);
> (analyze_overlapping_iterations
>   (chrec_a = {0B, +, 4}_1)
>   (chrec_b = {0B, +, 4}_1)
>   (overlap_iterations_a = [0])
>   (overlap_iterations_b = [0]))
> )
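
To make the "different base" problem concrete: the two data references
describe the same addresses, but one is expressed as an element index into the
array a1 (access function {0, +, 1}) and the other as a byte offset from a
pointer to a1 (access function {0B, +, 4}). The sketch below shows that
scaling the index-based function by the element size yields the pointer-based
one. The struct and function names are illustrative only and are not GCC
internals.

```c
#include <stddef.h>

/* Toy model of an affine access function {init, +, step}. */
struct affine {
    long init;
    long step;
};

/* Re-express an element-indexed access as a byte offset from the array start. */
static struct affine to_byte_offsets(struct affine idx, size_t elem_size)
{
    struct affine r = { idx.init * (long)elem_size, idx.step * (long)elem_size };
    return r;
}

/* Value of the access function in iteration i. */
static long affine_at(struct affine f, long i)
{
    return f.init + f.step * i;
}

static float a1[16];

/* Returns 1 if the array-style reference a1[i] and the pointer-style
   reference at byte offset ptr_ref(i) from &a1 name the same location.  */
static int same_location(long i)
{
    struct affine array_ref = { 0, 1 };                       /* a1[i] */
    struct affine ptr_ref = to_byte_offsets(array_ref, sizeof(float));
    char *base = (char *)a1;
    return (void *)&a1[affine_at(array_ref, i)]
           == (void *)(base + affine_at(ptr_ref, i));
}
```

Once both references are written in the byte-offset form they share a base and
an access function, so their dependence distance can be computed directly.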



* [Bug tree-optimization/65206] Vectorized version of loop is removed.
  2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
                   ` (4 preceding siblings ...)
  2015-02-25 15:33 ` rguenth at gcc dot gnu.org
@ 2015-02-26 12:15 ` rguenth at gcc dot gnu.org
  2015-02-26 12:38 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-26 12:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 34882
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34882&action=edit
hack just for the masked load/store case

Incomplete special-casing for the masked load/store case.  We need to mark
the masked load/store IFN calls somehow to mark the forwarding as valid.

A "real" fix would "duplicate" dr->indices to always have an alternate
analysis for *&dr->ref in case dr->ref is not a pointer evolution.  We
could then pick the one with compatible dr->indices.base_object.  Of course
that may still not handle all cases, if the ptr evolution is not enough
(like for having a[2][j] vs. *(&a[3][j])).  But at least it could help in
some general cases.



* [Bug tree-optimization/65206] Vectorized version of loop is removed.
  2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
                   ` (5 preceding siblings ...)
  2015-02-26 12:15 ` rguenth at gcc dot gnu.org
@ 2015-02-26 12:38 ` rguenth at gcc dot gnu.org
  2021-09-08 11:08 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2015-02-26 12:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
For the masked load/store case we could also simply put the "real" memory
access in place of the pointer argument.  To make that valid GIMPLE we could
wrap it inside a fake VIEW_CONVERT_EXPR for example - like one converting to a
char[sizeof (ref)], this would make it appear as aggregate.  Of course when
we inspect the masked load/store we'd have to strip that VIEW_CONVERT_EXPR
again.  But I don't think it would harm anyone seeing that VIEW_CONVERT_EXPR.



* [Bug tree-optimization/65206] Vectorized version of loop is removed.
  2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
                   ` (6 preceding siblings ...)
  2015-02-26 12:38 ` rguenth at gcc dot gnu.org
@ 2021-09-08 11:08 ` rguenth at gcc dot gnu.org
  2021-09-20  6:51 ` [Bug tree-optimization/65206] vectorized version of loop is removed, dependence analysis fails for *&a[i] vs a[j] cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-09-08 11:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
*** Bug 101548 has been marked as a duplicate of this bug. ***


* [Bug tree-optimization/65206] vectorized version of loop is removed,  dependence analysis fails for *&a[i] vs a[j]
  2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
                   ` (7 preceding siblings ...)
  2021-09-08 11:08 ` rguenth at gcc dot gnu.org
@ 2021-09-20  6:51 ` cvs-commit at gcc dot gnu.org
  2021-09-20  6:51 ` rguenth at gcc dot gnu.org
  2022-03-29 12:57 ` rguenth at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-09-20  6:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206

--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:f92901a508305f291fcf2acae0825379477724de

commit r12-3677-gf92901a508305f291fcf2acae0825379477724de
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Sep 8 14:42:31 2021 +0200

    tree-optimization/65206 - dependence analysis on mixed pointer/array

    This adds the capability to analyze the dependence of mixed
    pointer/array accesses.  The example is from where using a masked
    load/store creates the pointer-based access when an otherwise
    unconditional access is array based.  Other examples would include
    accesses to an array mixed with accesses from inlined helpers
    that work on pointers.

    The idea is quite simple and old - analyze the data-ref indices
    as if the reference was pointer-based.  The following change does
    this by changing dr_analyze_indices to work on the indices
    sub-structure and storing an alternate indices substructure in
    each data reference.  That alternate set of indices is analyzed
    lazily by initialize_data_dependence_relation when it fails to
    match-up the main set of indices of two data references.
    initialize_data_dependence_relation is refactored into a head
    and a tail worker and changed to work on one of the indices
    structures and thus away from using DR_* access macros which
    continue to reference the main indices substructure.

    There are quite some vectorization and loop distribution opportunities
    unleashed in SPEC CPU 2017, notably 520.omnetpp_r, 548.exchange2_r,
    510.parest_r, 511.povray_r, 521.wrf_r, 526.blender_r, 527.cam4_r and
    544.nab_r see amendments in what they report with -fopt-info-loop while
    the rest of the specrate set sees no changes there.  Measuring runtime
    for the set where changes were reported reveals nothing off-noise
    besides 511.povray_r which seems to regress slightly for me
    (on a Zen2 machine with -Ofast -march=native).

    2021-09-08  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/65206
            * tree-data-ref.h (struct data_reference): Add alt_indices,
            order it last.
            * tree-data-ref.c (free_data_ref): Release alt_indices.
            (dr_analyze_indices): Work on struct indices and get DR_REF as
            tree.
            (create_data_ref): Adjust.
            (initialize_data_dependence_relation): Split into head
            and tail.  When the base objects fail to match up try
            again with pointer-based analysis of indices.
            * tree-vectorizer.c (vec_info_shared::check_datarefs): Do
            not compare the lazily computed alternate set of indices.

            * gcc.dg/torture/20210916.c: New testcase.
            * gcc.dg/vect/pr65206.c: Likewise.
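
The control flow described in the commit message can be modelled outside GCC.
The sketch below is a toy model, not GCC code: every type and function name in
it (toy_indices, toy_data_ref, toy_init_ddr and so on) is hypothetical, and
the "pointer-based analysis" is reduced to scaling an element index to a byte
offset. It shows only the idea of trying the main indices first and computing
the alternate, pointer-based indices lazily when the bases do not match up.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an indices substructure: a base plus {init, +, step}. */
struct toy_indices {
    const void *base;   /* object the access function is relative to */
    int is_pointer;     /* 1: byte offsets from a pointer, 0: element index */
    long init, step;
    int computed;
};

struct toy_data_ref {
    struct toy_indices indices;      /* main set, filled in eagerly */
    struct toy_indices alt_indices;  /* pointer-based set, filled in lazily */
    size_t elem_size;
};

static struct toy_data_ref toy_make_ref(const void *base, int is_pointer,
                                        long init, long step, size_t elem_size)
{
    struct toy_data_ref dr;
    memset(&dr, 0, sizeof dr);
    dr.indices.base = base;
    dr.indices.is_pointer = is_pointer;
    dr.indices.init = init;
    dr.indices.step = step;
    dr.indices.computed = 1;
    dr.elem_size = elem_size;
    return dr;
}

/* Lazily derive the pointer-based view of a data reference. */
static const struct toy_indices *toy_alt(struct toy_data_ref *dr)
{
    if (!dr->alt_indices.computed) {
        dr->alt_indices = dr->indices;
        if (!dr->indices.is_pointer) {
            dr->alt_indices.is_pointer = 1;
            dr->alt_indices.init *= (long)dr->elem_size;
            dr->alt_indices.step *= (long)dr->elem_size;
        }
        dr->alt_indices.computed = 1;
    }
    return &dr->alt_indices;
}

static int toy_comparable(const struct toy_indices *a,
                          const struct toy_indices *b)
{
    return a->base == b->base && a->is_pointer == b->is_pointer;
}

/* Returns 1 if the two references can be compared, 0 if analysis fails.
   *used_alt reports whether the alternate indices were needed.  */
static int toy_init_ddr(struct toy_data_ref *a, struct toy_data_ref *b,
                        int *used_alt)
{
    *used_alt = 0;
    if (toy_comparable(&a->indices, &b->indices))
        return 1;                       /* main indices match up */
    *used_alt = 1;
    return toy_comparable(toy_alt(a), toy_alt(b));
}
```

For an array-style load built with toy_make_ref (a1, 0, 0, 1, sizeof (float))
and a pointer-style store built with toy_make_ref (a1, 1, 0, 4,
sizeof (float)), the main indices do not match, the alternate indices are
computed on demand, and both references end up with the same byte-based
access function.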


* [Bug tree-optimization/65206] vectorized version of loop is removed,  dependence analysis fails for *&a[i] vs a[j]
  2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
                   ` (8 preceding siblings ...)
  2021-09-20  6:51 ` [Bug tree-optimization/65206] vectorized version of loop is removed, dependence analysis fails for *&a[i] vs a[j] cvs-commit at gcc dot gnu.org
@ 2021-09-20  6:51 ` rguenth at gcc dot gnu.org
  2022-03-29 12:57 ` rguenth at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-09-20  6:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |12.0
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED
      Known to fail|                            |11.2.1
   Target Milestone|---                         |12.0

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed for GCC 12.


* [Bug tree-optimization/65206] vectorized version of loop is removed,  dependence analysis fails for *&a[i] vs a[j]
  2015-02-25 14:50 [Bug tree-optimization/65206] New: Vectorized version of loop is removed ysrumyan at gmail dot com
                   ` (9 preceding siblings ...)
  2021-09-20  6:51 ` rguenth at gcc dot gnu.org
@ 2022-03-29 12:57 ` rguenth at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-03-29 12:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65206

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
*** Bug 69732 has been marked as a duplicate of this bug. ***
