Date: Fri, 20 Sep 2013 10:07:00 -0000
From: Tobias Burnus
To: gcc-help@gcc.gnu.org
Subject: (gcc/)g++ and __restrict
Message-ID: <20130920100723.GA24605@physik.fu-berlin.de>

Hi all,

I was wondering how to convey to GCC that two pointers do not alias.
That works nicely in the argument list,

  void foo(mytype *__restrict__ arg)

However, it does not seem to work for C++ member variables, even if the
declaration in the class carries __restrict. Casting to a restrict
pointer does not work either. (The issue of example 1 also applies to C99.)

In example 1, one gets:

  test.cc:17:3: note: loop versioned for vectorization because of possible aliasing

That's unchanged when one uncomments the casts. However, when one
#defines RESTRICT as __restrict, the loop versioning disappears.

In example 2, I have the same problem: I get loop versioning, and neither
the __restrict__ in the class definition nor the restrict cast has any
effect whatsoever.

Is this to be expected? And, if so, how else can one convey this
information?

Tobias

PS: For the big code, using a static array instead of an allocatable
pointer gives a speed-up of up to 40% - and I strongly suspect that this
is mainly due to alias analysis.

PPS: For loops, using #pragma ivdep should help (and does with Intel; for
GCC it's still on my to-do list). The forced vectorization with
#pragma simd (Cilk+/OpenMPv4) should help as well. Still, __restrict__
should go beyond that.
(Using Fortran's allocatables is also a solution ;-)


Example 1:

#define ASSUME_ALIGNED(lvalueptr, align) \
  lvalueptr = \
    ( __typeof(lvalueptr))( \
      __builtin_assume_aligned(lvalueptr, align))

//#define RESTRICT __restrict__
#define RESTRICT

typedef double mytype;

void test(int size, mytype *RESTRICT a,
          mytype *RESTRICT b, mytype *RESTRICT c)
{
  ASSUME_ALIGNED(a, 64);
  ASSUME_ALIGNED(b, 64);
  ASSUME_ALIGNED(c, 64);
  // a = (mytype* __restrict__) a;
  // b = (mytype* __restrict__) b;
  // c = (mytype* __restrict__) c;
  for (int i = 0; i < size; ++i)
    a[i] = b[i] + c[i];
}


Example 2:

#define ASSUME_ALIGNED(lvalueptr, align) \
  lvalueptr = \
    ( __typeof(lvalueptr))( \
      __builtin_assume_aligned(lvalueptr, align))

//#define RESTRICT __restrict__
#define RESTRICT

typedef double mytype;

void test(int size, mytype *RESTRICT a,
          mytype *RESTRICT b, mytype *RESTRICT c)
{
  ASSUME_ALIGNED(a, 64);
  ASSUME_ALIGNED(b, 64);
  ASSUME_ALIGNED(c, 64);
  a = (mytype* __restrict__) a;
  b = (mytype* __restrict__) b;
  c = (mytype* __restrict__) c;
  for (int i = 0; i < size; ++i)
    a[i] = b[i] + c[i];
}
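

Since __restrict__ reportedly does take effect in an argument list, one possible workaround for the member-variable case is to pass the members to a small helper whose parameters carry the qualifier. The following is only a sketch under that assumption: the class Fields and the helper add_kernel are hypothetical names, not taken from the big code, and whether the loop versioning actually disappears (in particular once the helper is inlined) depends on the GCC version and has not been checked here.

```cpp
#include <cassert>

typedef double mytype;

// Free helper: the no-alias promise sits on the parameters, which is the
// one place where the message above reports that __restrict__ is honoured.
static void add_kernel(int size,
                       mytype *__restrict__ a,
                       const mytype *__restrict__ b,
                       const mytype *__restrict__ c)
{
  for (int i = 0; i < size; ++i)
    a[i] = b[i] + c[i];
}

// Hypothetical class standing in for the real code; all names are
// illustrative only.
class Fields {
public:
  Fields(int size, mytype *a, mytype *b, mytype *c)
    : size_(size), a_(a), b_(b), c_(c) {}

  // Forward the member pointers to the helper instead of looping over
  // them directly in the member function.
  void add() const
  {
    assert(a_ && b_ && c_);  // the caller must supply valid arrays
    add_kernel(size_, a_, b_, c_);
  }

private:
  int size_;
  mytype *a_, *b_, *c_;
};
```

If the qualifier turns out to be lost after inlining, marking the helper with __attribute__((noinline)) is one thing to try, at the price of a call per invocation. Compiling with -O3 and -fopt-info-vec (or -ftree-vectorizer-verbose on older releases) shows whether the "loop versioned" note is still printed.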