public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/60042] New: vectorizer still does too many dependence tests for himeno:jacobi
@ 2014-02-03 11:19 rguenth at gcc dot gnu.org
  2014-02-03 11:23 ` [Bug tree-optimization/60042] " rguenth at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-02-03 11:19 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

            Bug ID: 60042
           Summary: vectorizer still does too many dependence tests for
                    himeno:jacobi
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org

Created attachment 32026
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32026&action=edit
himeno bench

I still get

himenobmtxpa.c:296:9: note: disable versioning for alias - max number of
generated checks exceeded
himenobmtxpa.c:296:9: note: too long list of versioning for alias run-time
tests.

I have a patch to remove some false dependences but still

himenobmtxpa.c:296:9: note: improved number of alias checks from 31 to 21

and 21 is too much - 7 should suffice.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/60042] vectorizer still does too many dependence tests for himeno:jacobi
  2014-02-03 11:19 [Bug tree-optimization/60042] New: vectorizer still does too many dependence tests for himeno:jacobi rguenth at gcc dot gnu.org
@ 2014-02-03 11:23 ` rguenth at gcc dot gnu.org
  2014-02-03 13:51 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-02-03 11:23 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 32027
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32027&action=edit
patch to prune deps to scalar globals

My patch to prune dependences to scalar global vars.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/60042] vectorizer still does too many dependence tests for himeno:jacobi
  2014-02-03 11:19 [Bug tree-optimization/60042] New: vectorizer still does too many dependence tests for himeno:jacobi rguenth at gcc dot gnu.org
  2014-02-03 11:23 ` [Bug tree-optimization/60042] " rguenth at gcc dot gnu.org
@ 2014-02-03 13:51 ` rguenth at gcc dot gnu.org
  2014-02-04 13:24 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-02-03 13:51 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
With some more dumping I seee

himenobmtxpa.c:296:9: note: === vect_prune_runtime_alias_test_list ===
himenobmtxpa.c:296:9: note: merging ranges for *_205, *_324 and *_49, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_205, *_324 and *_192, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_168, *_324 and *_69, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_168, *_324 and *_154, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_265, *_324 and *_296, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_265, *_324 and *_89, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_174, *_324 and *_248, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_174, *_324 and *_161, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_211, *_324 and *_231, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_211, *_324 and *_199, *_324
himenobmtxpa.c:296:9: note: improved number of alias checks from 31 to 21

and

Creating dr for *_205
analyze_innermost: success.
        base_address: pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1009
* 4)
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1009
* 4)
        Access function 0: {0B, +, 4}_7
Creating dr for *_168
analyze_innermost: success.
        base_address: pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1023
* 4)
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1023
* 4)
        Access function 0: {0B, +, 4}_7
Creating dr for *_265
analyze_innermost: success.
        base_address: pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1034
* 4)
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1034
* 4)
        Access function 0: {0B, +, 4}_7
Creating dr for *_174
analyze_innermost: success.
        base_address: pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1063
* 4)
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1063
* 4)
        Access function 0: {0B, +, 4}_7
...

so the remaining DDRs against *_324 all look related.

  pretmp_1062 = pretmp_1020 + pretmp_1047;
  pretmp_1063 = _25 * pretmp_1062;

  pretmp_1033 = j_380 + pretmp_1020;
  pretmp_1034 = _25 * pretmp_1033;

  pretmp_1022 = pretmp_1020 + pretmp_1021;
  pretmp_1023 = _25 * pretmp_1022;

but SCEV doesn't expand stmts before the loop and thus doesn't see this.
It's obviously far from trivial to merge segments with symbolic start
addresses ... these are multi-dimensional accesses:

        for(k=1 ; k<kmax ; k++){
          s0= MR(a,0,i,j,k)*MR(p,0,i+1,j,  k)
            + MR(a,1,i,j,k)*MR(p,0,i,  j+1,k)
            + MR(a,2,i,j,k)*MR(p,0,i,  j,  k+1)
            + MR(b,0,i,j,k)
             *( MR(p,0,i+1,j+1,k) - MR(p,0,i+1,j-1,k)
              - MR(p,0,i-1,j+1,k) + MR(p,0,i-1,j-1,k) )
            + MR(b,1,i,j,k)
             *( MR(p,0,i,j+1,k+1) - MR(p,0,i,j-1,k+1)
              - MR(p,0,i,j+1,k-1) + MR(p,0,i,j-1,k-1) )
            + MR(b,2,i,j,k)
             *( MR(p,0,i+1,j,k+1) - MR(p,0,i-1,j,k+1)
              - MR(p,0,i+1,j,k-1) + MR(p,0,i-1,j,k-1) )
            + MR(c,0,i,j,k) * MR(p,0,i-1,j,  k)
            + MR(c,1,i,j,k) * MR(p,0,i,  j-1,k)
            + MR(c,2,i,j,k) * MR(p,0,i,  j,  k-1)
            + MR(wrk1,0,i,j,k);

          ss= (s0*MR(a,3,i,j,k) - MR(p,0,i,j,k))*MR(bnd,0,i,j,k);

          gosa+= ss*ss;
          MR(wrk2,0,i,j,k)= MR(p,0,i,j,k) + omega*ss;
        }

and we manage to merge the fastest varying dimension +-1 ones AFAIK,
but not for example the ones for MR(p,0,i+1,j+1,k) and MR(p,0,i+1,j-1,k).
Ideally we would be able to derive a single check for each array
(which would require analyzing the DRs in the outer loops as well to
gather info about the other dimensions).


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/60042] vectorizer still does too many dependence tests for himeno:jacobi
  2014-02-03 11:19 [Bug tree-optimization/60042] New: vectorizer still does too many dependence tests for himeno:jacobi rguenth at gcc dot gnu.org
  2014-02-03 11:23 ` [Bug tree-optimization/60042] " rguenth at gcc dot gnu.org
  2014-02-03 13:51 ` rguenth at gcc dot gnu.org
@ 2014-02-04 13:24 ` rguenth at gcc dot gnu.org
  2014-02-04 13:34 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-02-04 13:24 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Bah, and with -fipa-pta -Ofast -fwhole-program we now _do_ see that there
isn't any aliasing but PRE messes up the loop and creates loop carried
dependencies ... :/


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/60042] vectorizer still does too many dependence tests for himeno:jacobi
  2014-02-03 11:19 [Bug tree-optimization/60042] New: vectorizer still does too many dependence tests for himeno:jacobi rguenth at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2014-02-04 13:24 ` rguenth at gcc dot gnu.org
@ 2014-02-04 13:34 ` rguenth at gcc dot gnu.org
  2014-02-04 14:53 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-02-04 13:34 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> Bah, and with -fipa-pta -Ofast -fwhole-program we now _do_ see that there
> isn't any aliasing but PRE messes up the loop and creates loop carried
> dependencies ... :/

Ok, that's because PRE would be the one to make the array loads base
addresses simple induction variables (it performs invariant motion of the
address load from the global matrix struct).  Thus inhibit_phi_insertion
returns false.  Looks like ordering of PRE / LIM isn't too great when
considering such cases.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/60042] vectorizer still does too many dependence tests for himeno:jacobi
  2014-02-03 11:19 [Bug tree-optimization/60042] New: vectorizer still does too many dependence tests for himeno:jacobi rguenth at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2014-02-04 13:34 ` rguenth at gcc dot gnu.org
@ 2014-02-04 14:53 ` rguenth at gcc dot gnu.org
  2014-02-04 15:09 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-02-04 14:53 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2014-02-04
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 32038
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32038&action=edit
patch for the PRE issue

The attached patch delays inhibit_phi_insertion to a point where it can give
a more definitive answer (eliminate () time).


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/60042] vectorizer still does too many dependence tests for himeno:jacobi
  2014-02-03 11:19 [Bug tree-optimization/60042] New: vectorizer still does too many dependence tests for himeno:jacobi rguenth at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2014-02-04 14:53 ` rguenth at gcc dot gnu.org
@ 2014-02-04 15:09 ` rguenth at gcc dot gnu.org
  2014-02-04 16:36 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-02-04 15:09 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #32038|0                           |1
        is obsolete|                            |

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 32039
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32039&action=edit
updated patch


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/60042] vectorizer still does too many dependence tests for himeno:jacobi
  2014-02-03 11:19 [Bug tree-optimization/60042] New: vectorizer still does too many dependence tests for himeno:jacobi rguenth at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2014-02-04 15:09 ` rguenth at gcc dot gnu.org
@ 2014-02-04 16:36 ` jakub at gcc dot gnu.org
  2014-02-05 14:35 ` rguenth at gcc dot gnu.org
  2014-04-14 13:57 ` rguenth at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: jakub at gcc dot gnu.org @ 2014-02-04 16:36 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
As discussed on IRC the #c1 patch would need to be moved after the
if (STMT_VINFO_GATHER_P (stmtinfo_a)...) {}
Getting the version checks below the default limit would be really nice, the
benchmark numbers look much nicer when it is vectorized, even with
--param vect-max-version-for-alias-checks=32.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/60042] vectorizer still does too many dependence tests for himeno:jacobi
  2014-02-03 11:19 [Bug tree-optimization/60042] New: vectorizer still does too many dependence tests for himeno:jacobi rguenth at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2014-02-04 16:36 ` jakub at gcc dot gnu.org
@ 2014-02-05 14:35 ` rguenth at gcc dot gnu.org
  2014-04-14 13:57 ` rguenth at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-02-05 14:35 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
PR23855 fixed we'd get for the analysis of the remaining DRs in the outermost
loop
for example for matrix A:

Creating dr for *_290
analyze_innermost: success.
        base_address: pretmp_1792 + (sizetype) ((long unsigned int) pretmp_1974
* 4)
        offset from base address: 0
        constant offset from base address: 4
        step: 4
        aligned to: 128
        base_object: *pretmp_1792 + (sizetype) ((long unsigned int)
(((pretmp_1831 + 1) * pretmp_1794 + 1) * pretmp_1796) * 4)
        Access function 0: {{{4B, +, (sizetype) ((long unsigned int)
(pretmp_1794 * pretmp_1796) * 4)}_2, +, (sizetype) ((long unsigned int)
pretmp_1796 * 4)}_6, +, 4}_7

Creating dr for *_81
analyze_innermost: success.
        base_address: pretmp_1792 + (sizetype) ((long unsigned int) pretmp_1915
* 4)
        offset from base address: 0
        constant offset from base address: 4
        step: 4
        aligned to: 128
        base_object: *pretmp_1792 + (sizetype) ((long unsigned int)
(((pretmp_1804 + 1) * pretmp_1794 + 1) * pretmp_1796) * 4)
        Access function 0: {{{4B, +, (sizetype) ((long unsigned int)
(pretmp_1794 * pretmp_1796) * 4)}_2, +, (sizetype) ((long unsigned int)
pretmp_1796 * 4)}_6, +, 4}_7

Creating dr for *_60
analyze_innermost: success.
        base_address: pretmp_1792 + (sizetype) ((long unsigned int) pretmp_1902
* 4)
        offset from base address: 0
        constant offset from base address: 4
        step: 4
        aligned to: 128
        base_object: *pretmp_1792 + (sizetype) ((long unsigned int)
(((pretmp_1801 + 1) * pretmp_1794 + 1) * pretmp_1796) * 4)
        Access function 0: {{{4B, +, (sizetype) ((long unsigned int)
(pretmp_1794 * pretmp_1796) * 4)}_2, +, (sizetype) ((long unsigned int)
pretmp_1796 * 4)}_6, +, 4}_7

which still requires hard work to actually combine as the base objects still
differ (but the access function is equal).


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug tree-optimization/60042] vectorizer still does too many dependence tests for himeno:jacobi
  2014-02-03 11:19 [Bug tree-optimization/60042] New: vectorizer still does too many dependence tests for himeno:jacobi rguenth at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2014-02-05 14:35 ` rguenth at gcc dot gnu.org
@ 2014-04-14 13:57 ` rguenth at gcc dot gnu.org
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-04-14 13:57 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Mon Apr 14 13:57:00 2014
New Revision: 209374

URL: http://gcc.gnu.org/viewcvs?rev=209374&root=gcc&view=rev
Log:
2014-04-14  Richard Biener  <rguenther@suse.de>

    PR tree-optimization/60042
    * tree-ssa-pre.c (inhibit_phi_insertion): Remove.
    (insert_into_preds_of_block): Do not prevent PHI insertion
    for REFERENCE exprs here ...
    (eliminate_dom_walker::before_dom_children): ... but prevent
    their use here under similar conditions when applied to the
    IL after PRE optimizations.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-pre.c


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-04-14 13:57 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-03 11:19 [Bug tree-optimization/60042] New: vectorizer still does too many dependence tests for himeno:jacobi rguenth at gcc dot gnu.org
2014-02-03 11:23 ` [Bug tree-optimization/60042] " rguenth at gcc dot gnu.org
2014-02-03 13:51 ` rguenth at gcc dot gnu.org
2014-02-04 13:24 ` rguenth at gcc dot gnu.org
2014-02-04 13:34 ` rguenth at gcc dot gnu.org
2014-02-04 14:53 ` rguenth at gcc dot gnu.org
2014-02-04 15:09 ` rguenth at gcc dot gnu.org
2014-02-04 16:36 ` jakub at gcc dot gnu.org
2014-02-05 14:35 ` rguenth at gcc dot gnu.org
2014-04-14 13:57 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).