From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 21664 invoked by alias); 19 Oct 2011 07:51:22 -0000
Received: (qmail 21651 invoked by uid 22791); 19 Oct 2011 07:51:21 -0000
X-SWARE-Spam-Status: No, hits=-2.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Wed, 19 Oct 2011 07:51:07 +0000
From: "jakub at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/50789] New: Gather vectorization
Date: Wed, 19 Oct 2011 07:51:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: jakub at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2011-10/txt/msg01906.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789

             Bug #: 50789
           Summary: Gather vectorization
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: jakub@gcc.gnu.org
                CC: hjl.tools@gmail.com, irar@gcc.gnu.org,
                    kirill.yukhin@intel.com

This is to track progress on vectorization using AVX2 v*gather* instructions.
The instructions allow plain unconditional gather, e.g.:

#define N 1024
float f[N];
int k[N];
float *l[N];
int **m[N];

float
f1 (void)
{
  int i;
  float g = 0.0;
  for (i = 0; i < N; i++)
    g += f[k[i]];
  return g;
}

float
f2 (float *p)
{
  int i;
  float g = 0.0;
  for (i = 0; i < N; i++)
    g += p[k[i]];
  return g;
}

float
f3 (void)
{
  int i;
  float g = 0.0;
  for (i = 0; i < N; i++)
    g += *l[i];
  return g;
}

int
f4 (void)
{
  int i;
  int g = 0;
  for (i = 0; i < N; i++)
    g += **m[i];
  return g;
}

We should be able to vectorize all 4 loops.  In f1/f2 it would use a non-zero
base (the vector would contain just indexes into some array, which vgather
sign extends and adds to the base), in f3/f4 it would use a zero base - the
vectors would be vectors of pointers (resp. uintptr_t).

To vectorize the above I'm afraid we'd need to modify tree-data-ref.c as well
as tree-vect-data-ref.c, because the memory accesses aren't affine and already
dr_analyze_innermost gives up on those and doesn't fill in any of the DR_*
stuff.  Perhaps with some flag and when the base resp. offset has vdef in the
same loop we could mark it somehow and at least fill in the other fields.
It would probably make alias decisions (in tree-vect-data-ref.c?) harder.
Any ideas?

What is additionally possible is to conditionalize loads, either affine or
not.  So something like:

  for (i = 0; i < N; i++)
    {
      c = 6;
      if (a[i] > 24)
        c = b[i];
      d[i] = c + e[i];
    }

for the affine conditional accesses, where the index vector could be just
{ 0, 1, 2, 3, ... } but the mask would come from the comparison.
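
To make the intended semantics concrete, here is a rough portable C model of
what the vectorized f1 and the conditional loop might compute.  It is only a
sketch under assumptions, not vectorizer output: the helpers gather_f32 and
mask_gather_i32, the lane count VL, the int32_t element types and the
declarations of a, b, d and e are all made up for illustration, and the gather
is emulated with scalar loops rather than real v*gather* instructions.  A lane
is treated as active when the sign bit of its mask element is set, which is
what a comparison producing all-ones provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N  1024   /* as in the report */
#define VL 8      /* assumed lane count: 256-bit vector of 32-bit elements */

/* Illustrative arrays: f and k mirror the report; a, b, d, e back the
   conditional loop, whose declarations the report does not show. */
static float   f[N];
static int32_t k[N];
static int32_t a[N], b[N], d[N], e[N];

/* Model of an unconditional gather: lane j loads base[idx[j]], with the
   32-bit index sign-extended before the address is formed. */
static void gather_f32(float *dst, const float *base, const int32_t *idx)
{
    for (int j = 0; j < VL; j++)
        dst[j] = base[(ptrdiff_t)idx[j]];
}

/* Model of a masked gather: a lane loads only if the sign bit of its mask
   element is set; otherwise it keeps src[j] and reads no memory. */
static void mask_gather_i32(int32_t *dst, const int32_t *src,
                            const int32_t *base, const int32_t *idx,
                            const int32_t *mask)
{
    for (int j = 0; j < VL; j++)
        dst[j] = (mask[j] < 0) ? base[(ptrdiff_t)idx[j]] : src[j];
}

/* f1 from the report, restructured as gather plus per-lane partial sums.
   Non-zero base: the index vector is simply k[i .. i+VL-1]. */
static float f1_gathered(void)
{
    float acc[VL] = { 0 }, tmp[VL], g = 0.0f;
    for (int i = 0; i < N; i += VL) {
        gather_f32(tmp, f, &k[i]);
        for (int j = 0; j < VL; j++)
            acc[j] += tmp[j];
    }
    for (int j = 0; j < VL; j++)   /* final horizontal reduction */
        g += acc[j];
    return g;
}

/* The conditional loop: index vector { 0, 1, 2, ... }, base &b[i],
   mask from the a[i] > 24 comparison, pass-through value 6. */
static void cond_loop_gathered(void)
{
    for (int i = 0; i < N; i += VL) {
        int32_t idx[VL], mask[VL], six[VL], c[VL];
        for (int j = 0; j < VL; j++) {
            idx[j]  = j;
            mask[j] = (a[i + j] > 24) ? -1 : 0;  /* all-ones when true */
            six[j]  = 6;
        }
        mask_gather_i32(c, six, &b[i], idx, mask);
        for (int j = 0; j < VL; j++)
            d[i + j] = c[j] + e[i + j];
    }
}
```

Note that f1_gathered adds the elements in a different order than the scalar
loop, so for general float data the result can differ in the last bits; a
check against the scalar loops with assert is only exact for inputs whose
sums are exactly representable, such as small integer values.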