public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/60042] New: vectorizer still does too many dependence tests for himeno:jacobi
From: rguenth at gcc dot gnu.org @ 2014-02-03 11:19 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

            Bug ID: 60042
           Summary: vectorizer still does too many dependence tests for himeno:jacobi
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org

Created attachment 32026
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32026&action=edit
himeno bench

I still get

himenobmtxpa.c:296:9: note: disable versioning for alias - max number of generated checks exceeded
himenobmtxpa.c:296:9: note: too long list of versioning for alias run-time tests.

I have a patch to remove some false dependences but still

himenobmtxpa.c:296:9: note: improved number of alias checks from 31 to 21

and 21 is too much - 7 should suffice.
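For readers unfamiliar with the "versioning for alias" notes quoted above, the following is a minimal hand-written sketch of the idea, not GCC's generated code: a run-time overlap test selects between a loop copy that may be vectorized and a conservative scalar copy. The names `ranges_disjoint` and `scale_add` are hypothetical and chosen only for illustration. Each pair of references that cannot be disambiguated at compile time needs one such test, which is why the number of checks matters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when the n-float ranges starting at a and b
   do not overlap. Addresses are compared as integers to avoid relational
   comparison of pointers into unrelated objects. */
int ranges_disjoint(const float *a, const float *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)(n * sizeof(float));
    return pa + bytes <= pb || pb + bytes <= pa;
}

/* Hand-written analogue of a loop versioned for alias (illustration only).
   Returns 1 if the alias-free version ran, 0 if the fallback ran. */
int scale_add(float *dst, const float *src, float omega, size_t n)
{
    if (ranges_disjoint(dst, src, n)) {
        /* No store can feed a later load here, so a compiler would be
           free to process several iterations at once. */
        for (size_t k = 0; k < n; k++)
            dst[k] = src[k] + omega * src[k];
        return 1;
    }
    /* Possible overlap: keep strict iteration order. The source text is
       the same; only the permitted transformations differ. */
    for (size_t k = 0; k < n; k++)
        dst[k] = src[k] + omega * src[k];
    return 0;
}
```

With one written array and several read arrays, one test per read array would be the minimum under this scheme; the thread's complaint is that the vectorizer ends up with far more than that.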
* [Bug tree-optimization/60042] vectorizer still does too many dependence tests for himeno:jacobi
From: rguenth at gcc dot gnu.org @ 2014-02-03 11:23 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 32027
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32027&action=edit
patch to prune deps to scalar globals

My patch to prune dependences to scalar global vars.
* [Bug tree-optimization/60042] vectorizer still does too many dependence tests for himeno:jacobi
From: rguenth at gcc dot gnu.org @ 2014-02-03 13:51 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
With some more dumping I see

himenobmtxpa.c:296:9: note: === vect_prune_runtime_alias_test_list ===
himenobmtxpa.c:296:9: note: merging ranges for *_205, *_324 and *_49, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_205, *_324 and *_192, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_168, *_324 and *_69, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_168, *_324 and *_154, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_265, *_324 and *_296, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_265, *_324 and *_89, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_174, *_324 and *_248, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_174, *_324 and *_161, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_211, *_324 and *_231, *_324
himenobmtxpa.c:296:9: note: merging ranges for *_211, *_324 and *_199, *_324
himenobmtxpa.c:296:9: note: improved number of alias checks from 31 to 21

and

Creating dr for *_205
analyze_innermost: success.
        base_address: pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1009 * 4)
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1009 * 4)
        Access function 0: {0B, +, 4}_7
Creating dr for *_168
analyze_innermost: success.
        base_address: pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1023 * 4)
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1023 * 4)
        Access function 0: {0B, +, 4}_7
Creating dr for *_265
analyze_innermost: success.
        base_address: pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1034 * 4)
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1034 * 4)
        Access function 0: {0B, +, 4}_7
Creating dr for *_174
analyze_innermost: success.
        base_address: pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1063 * 4)
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *pretmp_1004 + (sizetype) ((long unsigned int) pretmp_1063 * 4)
        Access function 0: {0B, +, 4}_7
...

so the remaining DDRs against *_324 all look related.

  pretmp_1062 = pretmp_1020 + pretmp_1047;
  pretmp_1063 = _25 * pretmp_1062;
  pretmp_1033 = j_380 + pretmp_1020;
  pretmp_1034 = _25 * pretmp_1033;
  pretmp_1022 = pretmp_1020 + pretmp_1021;
  pretmp_1023 = _25 * pretmp_1022;

but SCEV doesn't expand stmts before the loop and thus doesn't see this. It's obviously far from trivial to merge segments with symbolic start addresses ... these are multi-dimensional accesses:

      for(k=1 ; k<kmax ; k++){
        s0= MR(a,0,i,j,k)*MR(p,0,i+1,j,  k)
          + MR(a,1,i,j,k)*MR(p,0,i,  j+1,k)
          + MR(a,2,i,j,k)*MR(p,0,i,  j,  k+1)
          + MR(b,0,i,j,k)
           *( MR(p,0,i+1,j+1,k) - MR(p,0,i+1,j-1,k)
            - MR(p,0,i-1,j+1,k) + MR(p,0,i-1,j-1,k) )
          + MR(b,1,i,j,k)
           *( MR(p,0,i,j+1,k+1) - MR(p,0,i,j-1,k+1)
            - MR(p,0,i,j+1,k-1) + MR(p,0,i,j-1,k-1) )
          + MR(b,2,i,j,k)
           *( MR(p,0,i+1,j,k+1) - MR(p,0,i-1,j,k+1)
            - MR(p,0,i+1,j,k-1) + MR(p,0,i-1,j,k-1) )
          + MR(c,0,i,j,k) * MR(p,0,i-1,j,  k)
          + MR(c,1,i,j,k) * MR(p,0,i,  j-1,k)
          + MR(c,2,i,j,k) * MR(p,0,i,  j,  k-1)
          + MR(wrk1,0,i,j,k);

        ss= (s0*MR(a,3,i,j,k) - MR(p,0,i,j,k))*MR(bnd,0,i,j,k);

        gosa+= ss*ss;
        MR(wrk2,0,i,j,k)= MR(p,0,i,j,k) + omega*ss;
      }

and we manage to merge the fastest varying dimension +-1 ones AFAIK, but not for example the ones for MR(p,0,i+1,j+1,k) and MR(p,0,i+1,j-1,k). Ideally we would be able to derive a single check for each array (which would require analyzing the DRs in the outer loops as well to gather info about the other dimensions).
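The "merging ranges" notes above can be pictured with a small sketch. This is an illustration under simplifying assumptions, not the code in vect_prune_runtime_alias_test_list: accesses are modelled as byte segments at constant offsets from one shared base, and `struct segment`, `merge_segments` and `segments_may_overlap` are hypothetical names. Accesses such as k-1, k and k+1 along the fastest varying dimension differ only by a constant, so they can be folded into one covering segment and tested once against the store.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one access: constant byte offset from a shared
   base address, plus the number of bytes touched over the whole loop. */
struct segment {
    long offset;
    long length;
};

/* Smallest single segment covering both a and b. This only works because
   the offsets are compile-time constants relative to the same base. */
struct segment merge_segments(struct segment a, struct segment b)
{
    long lo = a.offset < b.offset ? a.offset : b.offset;
    long a_hi = a.offset + a.length;
    long b_hi = b.offset + b.length;
    long hi = a_hi > b_hi ? a_hi : b_hi;
    struct segment merged = { lo, hi - lo };
    return merged;
}

/* Conservative overlap test on half-open byte ranges. */
int segments_may_overlap(struct segment a, struct segment b)
{
    return a.offset < b.offset + b.length && b.offset < a.offset + a.length;
}
```

For 100 iterations over 4-byte elements, the three accesses at offsets -4, 0 and 4 merge into one segment at offset -4 of length 408, so one check replaces three. When the start addresses are symbolic, as with the different pretmp values in the dump, the relative order of the segments is not known at compile time, which appears to be the difficulty the comment describes.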
* [Bug tree-optimization/60042] vectorizer still does too many dependence tests for himeno:jacobi
From: rguenth at gcc dot gnu.org @ 2014-02-04 13:24 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Bah, and with -fipa-pta -Ofast -fwhole-program we now _do_ see that there isn't any aliasing but PRE messes up the loop and creates loop carried dependencies ... :/
* [Bug tree-optimization/60042] vectorizer still does too many dependence tests for himeno:jacobi
From: rguenth at gcc dot gnu.org @ 2014-02-04 13:34 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #3)
> Bah, and with -fipa-pta -Ofast -fwhole-program we now _do_ see that there
> isn't any aliasing but PRE messes up the loop and creates loop carried
> dependencies ... :/

Ok, that's because PRE would be the one to make the array loads base addresses simple induction variables (it performs invariant motion of the address load from the global matrix struct). Thus inhibit_phi_insertion returns false. Looks like ordering of PRE / LIM isn't too great when considering such cases.
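The "invariant motion of the address load" mentioned in comment #4 can be illustrated by hand. The sketch below is only an analogy in source form: `struct matrix`, `sum_reload` and `sum_hoisted` are hypothetical names, and the real benchmark's descriptor and the compiler's IL differ. The point is that once the data pointer is loaded from the descriptor a single time before the loop, every array access has a loop-invariant base address, which is the shape data-reference analysis handles well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a matrix descriptor holding a data pointer. */
struct matrix {
    float *m;
};

/* Before motion: the data pointer is (conceptually) reloaded from the
   descriptor on every iteration. */
float sum_reload(const struct matrix *mt, int kmax)
{
    float s = 0.0f;
    for (int k = 0; k < kmax; k++)
        s += mt->m[k];
    return s;
}

/* After motion: the load of mt->m happens once, so the accesses become
   base[k] with an invariant base and a simple induction variable k. */
float sum_hoisted(const struct matrix *mt, int kmax)
{
    const float *base = mt->m;
    float s = 0.0f;
    for (int k = 0; k < kmax; k++)
        s += base[k];
    return s;
}
```

Both functions compute the same value. In a loop that also stores through another pointer, the hoisting is only valid if that store cannot modify the descriptor, which is where alias information such as -fipa-pta comes in.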
* [Bug tree-optimization/60042] vectorizer still does too many dependence tests for himeno:jacobi
From: rguenth at gcc dot gnu.org @ 2014-02-04 14:53 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2014-02-04
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
     Ever confirmed|0                             |1

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 32038
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32038&action=edit
patch for the PRE issue

The attached patch delays inhibit_phi_insertion to a point where it can give a more definitive answer (eliminate () time).
* [Bug tree-optimization/60042] vectorizer still does too many dependence tests for himeno:jacobi
From: rguenth at gcc dot gnu.org @ 2014-02-04 15:09 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
  Attachment #32038|0                             |1
        is obsolete|                              |

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 32039
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32039&action=edit
updated patch
* [Bug tree-optimization/60042] vectorizer still does too many dependence tests for himeno:jacobi
From: jakub at gcc dot gnu.org @ 2014-02-04 16:36 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
                 CC|                              |jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
As discussed on IRC the #c1 patch would need to be moved after the
if (STMT_VINFO_GATHER_P (stmtinfo_a)...) {}
Getting the version checks below the default limit would be really nice, the benchmark numbers look much nicer when it is vectorized, even with --param vect-max-version-for-alias-checks=32.
* [Bug tree-optimization/60042] vectorizer still does too many dependence tests for himeno:jacobi
From: rguenth at gcc dot gnu.org @ 2014-02-05 14:35 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
With PR23855 fixed we'd get, for the analysis of the remaining DRs in the outermost loop, for example for matrix A:

Creating dr for *_290
analyze_innermost: success.
        base_address: pretmp_1792 + (sizetype) ((long unsigned int) pretmp_1974 * 4)
        offset from base address: 0
        constant offset from base address: 4
        step: 4
        aligned to: 128
        base_object: *pretmp_1792 + (sizetype) ((long unsigned int) (((pretmp_1831 + 1) * pretmp_1794 + 1) * pretmp_1796) * 4)
        Access function 0: {{{4B, +, (sizetype) ((long unsigned int) (pretmp_1794 * pretmp_1796) * 4)}_2, +, (sizetype) ((long unsigned int) pretmp_1796 * 4)}_6, +, 4}_7
Creating dr for *_81
analyze_innermost: success.
        base_address: pretmp_1792 + (sizetype) ((long unsigned int) pretmp_1915 * 4)
        offset from base address: 0
        constant offset from base address: 4
        step: 4
        aligned to: 128
        base_object: *pretmp_1792 + (sizetype) ((long unsigned int) (((pretmp_1804 + 1) * pretmp_1794 + 1) * pretmp_1796) * 4)
        Access function 0: {{{4B, +, (sizetype) ((long unsigned int) (pretmp_1794 * pretmp_1796) * 4)}_2, +, (sizetype) ((long unsigned int) pretmp_1796 * 4)}_6, +, 4}_7
Creating dr for *_60
analyze_innermost: success.
        base_address: pretmp_1792 + (sizetype) ((long unsigned int) pretmp_1902 * 4)
        offset from base address: 0
        constant offset from base address: 4
        step: 4
        aligned to: 128
        base_object: *pretmp_1792 + (sizetype) ((long unsigned int) (((pretmp_1801 + 1) * pretmp_1794 + 1) * pretmp_1796) * 4)
        Access function 0: {{{4B, +, (sizetype) ((long unsigned int) (pretmp_1794 * pretmp_1796) * 4)}_2, +, (sizetype) ((long unsigned int) pretmp_1796 * 4)}_6, +, 4}_7

which still requires hard work to actually combine as the base objects still differ (but the access function is equal).
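The nested access function in the dump above is a chain of recurrences over three loops. As a reading aid, the sketch below evaluates such a chain and compares it with an ordinary row-major linearization. It assumes, without confirmation from the dump, that pretmp_1794 and pretmp_1796 play the role of the two inner array extents (called `cols` and `deps` here); `chrec_eval` and `linear_offset` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Element size in bytes, matching "step: 4" in the dump. */
#define ELEM_BYTES 4

/* Value of {{{init, +, s_i}_2, +, s_j}_6, +, s_k}_7 after i, j and k
   iterations of loops 2, 6 and 7: an affine function of the counters. */
size_t chrec_eval(size_t init, size_t s_i, size_t s_j, size_t s_k,
                  size_t i, size_t j, size_t k)
{
    return init + i * s_i + j * s_j + k * s_k;
}

/* Byte offset of element (r, c, d) in a row-major array whose two inner
   extents are cols and deps (assumed meaning of the pretmp values). */
size_t linear_offset(size_t cols, size_t deps, size_t r, size_t c, size_t d)
{
    return ((r * cols + c) * deps + d) * ELEM_BYTES;
}
```

Under that assumption, `chrec_eval(4, cols * deps * 4, deps * 4, 4, i, j, k)` equals `linear_offset(cols, deps, i, j, k + 1)`; the initial 4B would correspond to the innermost loop starting at k = 1. Since all three references share this function, they differ only in their base objects, which is the remaining obstacle the comment names.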
* [Bug tree-optimization/60042] vectorizer still does too many dependence tests for himeno:jacobi
From: rguenth at gcc dot gnu.org @ 2014-04-14 13:57 UTC
To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60042

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Author: rguenth
Date: Mon Apr 14 13:57:00 2014
New Revision: 209374

URL: http://gcc.gnu.org/viewcvs?rev=209374&root=gcc&view=rev
Log:
2014-04-14  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/60042
	* tree-ssa-pre.c (inhibit_phi_insertion): Remove.
	(insert_into_preds_of_block): Do not prevent PHI insertion for
	REFERENCE exprs here ...
	(eliminate_dom_walker::before_dom_children): ... but prevent
	their use here under similar conditions when applied to the
	IL after PRE optimizations.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-ssa-pre.c
Thread overview: 10+ messages
2014-02-03 11:19 [Bug tree-optimization/60042] New: vectorizer still does too many dependence tests for himeno:jacobi rguenth at gcc dot gnu.org
2014-02-03 11:23 ` [Bug tree-optimization/60042] " rguenth at gcc dot gnu.org
2014-02-03 13:51 ` rguenth at gcc dot gnu.org
2014-02-04 13:24 ` rguenth at gcc dot gnu.org
2014-02-04 13:34 ` rguenth at gcc dot gnu.org
2014-02-04 14:53 ` rguenth at gcc dot gnu.org
2014-02-04 15:09 ` rguenth at gcc dot gnu.org
2014-02-04 16:36 ` jakub at gcc dot gnu.org
2014-02-05 14:35 ` rguenth at gcc dot gnu.org
2014-04-14 13:57 ` rguenth at gcc dot gnu.org