From: Abderrazek Zaafrani
To: Richard Biener
Cc: GCC Patches, Sebastian Pop
Date: Thu, 07 May 2015 20:23:00 -0000
Subject: Re: Fix PR48052: loop not vectorized if index is "unsigned int"

Richard,

I agree that the code handles a very special case, but this special case
is common enough and limits the vectorizer in a significant way. The
special case is: loops with an unsigned index, a non-constant start value,
and step 1. We have code for a matrix multiply - loops blocked by hand -
taken from an industry benchmark that is not being vectorized because of
the overflow issue.
Note that if we relax the step 1 assumption, we will probably not be able
to prove non-overflow. Note also that the general case, such as a constant
start value, is already working fine.

I am not too sure about the incorrectness mentioned below for the cases we
are handling. loop->nb_iterations holds a symbolic expression for the
number of iterations (our special case falls into the symbolic
expression). Based on several loops that I experimented with and that fall
under our limited scope, either this symbolic expression holds the exact
number of iterations for the loop, without overflow, or the scev_not_known
flag is set to true. Maybe you can share an example in case you have one
in mind.

The suggestion about improving niter analysis and the iv->no_overflow
flag, and moving what we are trying to do here into that section with the
possibility of using existing information, is good and we may look into it.

Abderrazek

On Wed, May 6, 2015 at 6:02 AM, Richard Biener wrote:
> On Mon, May 4, 2015 at 9:47 PM, Abderrazek Zaafrani wrote:
>> This is an old thread and we are still running into similar issues:
>> Code is not being vectorized on 64-bit target due to scev not being
>> able to optimally analyze overflow condition.
>>
>> While the original test case shown here seems to work now, it does not
>> work if the start value is not a constant and the loop index variable
>> is of unsigned type: Ex
>>
>> void loop2( double const * __restrict__ x_in, double * __restrict__
>> x_out, double const * __restrict__ c, unsigned int N, unsigned int
>> start) {
>>   for(unsigned int i=start; i!=N; ++i)
>>     x_out[i] = c[i]*x_in[i];
>> }
>>
>> Here is our unit test:
>>
>> int foo(int* A, int* B, unsigned start, unsigned B)
>> {
>>   int s;
>>   for (unsigned k = start; k > s += A[k] * B[k];
>>   return s;
>> }
>>
>> Our unit test case is extracted from a matrix multiply of a
>> two-dimensional array and all loops are blocked by hand by a factor of
>> B.
>> Even though a bit modified, above loop corresponds to the innermost
>> loop of the blocked matrix multiply.
>>
>> We worked on patch to solve the problem (see attachment.)
>> The attached patch passed bootstrap and make check on x86_64-linux.
>> Ok for trunk?
>
> Apart from coding style / API issues the case you handle is very special
> (IVs with step 1 only?!) I believe it is also wrong - the assumption that
> if there is a symbolic or constant expression for the number of iterations
> a BIV will not wrap is not true. niter analysis can very well compute
> the number of iterations for a loop with wrapping IVs. For your unit test
> this only works because of the special-casing of step 1 IVs.
>
> Technically it might be more interesting to compute wrapping of IVs
> during niter analysis in some more generic way (we have iv->no_overflow
> computed by simple_iv, but that is rather not useful here).
>
> Richard.
>
>> Thanks,
>> Abderrazek Zaafrani
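
The disagreement in this thread is whether a known iteration count implies
that the index does not wrap. A minimal C sketch of the loop shape from
loop2 (unsigned index, non-constant start, step 1, exit test i != N) may
help make that concrete. It is not GCC code and says nothing about what
the patch or the niter analysis actually does: run_loop, struct trip,
predicted_count and self_check are hypothetical names introduced only for
this illustration. It relies on one fact that is certain, namely that
unsigned arithmetic in C is modular.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper type: observed trip count and whether the
   index passed through UINT_MAX (so that ++i wrapped to 0).  */
struct trip
{
  unsigned count;
  bool wrapped;
};

/* Runs the same loop shape as loop2 in the quoted mail, without the
   array accesses, and records what the index actually did.  */
struct trip
run_loop (unsigned start, unsigned n)
{
  struct trip t = { 0u, false };
  for (unsigned i = start; i != n; ++i)
    {
      if (i == UINT_MAX)
        t.wrapped = true;   /* The increment that follows wraps to 0.  */
      t.count++;
    }
  return t;
}

/* Closed-form trip count for a step-1 loop with an i != n exit test.
   Unsigned subtraction is modular, so this is well defined for any
   pair of arguments, including start > n.  */
unsigned
predicted_count (unsigned start, unsigned n)
{
  return n - start;
}

/* Small self-check covering one non-wrapping and one wrapping case.  */
void
self_check (void)
{
  /* start <= n: the index never wraps.  */
  struct trip a = run_loop (5u, 12u);
  assert (a.count == predicted_count (5u, 12u));
  assert (!a.wrapped);

  /* start > n: the index wraps, yet the closed form still matches.  */
  struct trip b = run_loop (UINT_MAX - 2u, 3u);
  assert (b.count == predicted_count (UINT_MAX - 2u, 3u));
  assert (b.wrapped);
}
```

With start = 5 and N = 12 the loop runs 7 times and the index does not
wrap. With start = UINT_MAX - 2 and N = 3 it runs 6 times and the index
does wrap, although the closed form n - start gives the right count in
both cases. So for this loop shape an exact symbolic trip count alone does
not show that the index stays in range; some extra condition (here,
start <= N) is needed.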