From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-397152-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 69495 invoked by alias); 7 May 2015 20:23:10 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 69485 invoked by uid 89); 7 May 2015 20:23:09 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.7 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2
X-HELO: mail-pd0-f178.google.com
Received: from mail-pd0-f178.google.com (HELO mail-pd0-f178.google.com) (209.85.192.178) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Thu, 07 May 2015 20:23:07 +0000
Received: by pdbqa5 with SMTP id qa5so50687891pdb.1        for <gcc-patches@gcc.gnu.org>; Thu, 07 May 2015 13:23:05 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.66.250.106 with SMTP id zb10mr700279pac.36.1431030185694; Thu, 07 May 2015 13:23:05 -0700 (PDT)
Received: by 10.70.7.226 with HTTP; Thu, 7 May 2015 13:23:05 -0700 (PDT)
In-Reply-To: <CAFiYyc1Z1Xi4xmEkfKm0bTFP7aq1jFiMgROEUtX28P8YT0+MHQ@mail.gmail.com>
References: <CAGrkkCATyT28OgKzXpSbAY=5=NTZKpp1p60wA8BBdYohgjCY-w@mail.gmail.com>	<CAFiYyc1Z1Xi4xmEkfKm0bTFP7aq1jFiMgROEUtX28P8YT0+MHQ@mail.gmail.com>
Date: Thu, 07 May 2015 20:23:00 -0000
Message-ID: <CAGrkkCAfuG4drNsXCTEwiuztvx7KyrcxQCtPbMVmf3v8FTowCA@mail.gmail.com>
Subject: Re: Fix PR48052: loop not vectorized if index is "unsigned int"
From: Abderrazek Zaafrani <az.zaafrani@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>, Sebastian Pop <sebpop@gmail.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-IsSubscribed: yes
X-SW-Source: 2015-05/txt/msg00590.txt.bz2

Richard,

I agree that the code handles a very special case, but this special
case is common enough that it limits the vectorizer in a significant
way. The special case is: loops with an unsigned index, a non-constant
start value, and step 1. We have code for a matrix multiply (loops
blocked by hand), taken from an industry benchmark, that is not being
vectorized because of the overflow issue. Note that if we relax the
step 1 assumption, we will probably not be able to prove non-overflow.
Note also that the general case, such as a constant start value,
already works fine.
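
To illustrate why step 1 matters for a loop of the form
"for (k = start; k < bound; k += step)", here is a small sketch. It is
not part of the patch. It models the index as an 8-bit unsigned value
(an assumption made only so that every case can be checked
exhaustively), and the names counter_wraps and any_wrap are
hypothetical helpers. The step is assumed to be nonzero.

```c
#include <stdint.h>

/* Returns 1 if the counter of
     for (k = start; k < bound; k += step)
   ever wraps around. The index is modelled as an 8-bit unsigned
   value, standing in for unsigned int. Assumes step != 0. */
static int counter_wraps(uint8_t start, uint8_t bound, uint8_t step)
{
    uint8_t k = start;
    while (k < bound) {
        uint8_t next = (uint8_t)(k + step);
        if (next < k)           /* unsigned wrap-around happened */
            return 1;
        k = next;
    }
    return 0;
}

/* Exhaustively checks every (start, bound) pair for the given step.
   Returns 1 if at least one pair makes the counter wrap. */
static int any_wrap(uint8_t step)
{
    for (unsigned start = 0; start < 256; start++)
        for (unsigned bound = 0; bound < 256; bound++)
            if (counter_wraps((uint8_t)start, (uint8_t)bound, step))
                return 1;
    return 0;
}
```

With step 1 the counter can reach at most the bound itself, so
any_wrap(1) finds no wrapping pair. With step 2 the counter can jump
over the bound, for example with start = 254 and bound = 255, so
any_wrap(2) does find one.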

I am not so sure about the incorrectness mentioned below for the
cases we are handling. loop->nb_iterations holds a symbolic expression
for the number of iterations (our special case falls into the symbolic
expression category). In the loops that I experimented with and that
fall under our limited scope, either this symbolic expression holds
the exact number of iterations for the loop, without overflow, or the
scev_not_known flag is set to true. Maybe you can share an example if
you have one in mind.
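
As a rough model of what "exact number of iterations" means for the
shape of our unit test, the sketch below compares a closed-form trip
count with the count obtained by running the loop. This is not GCC's
niter code. It again uses 8-bit arithmetic as a stand-in for unsigned
int, and closed_form_niter, simulated_niter and counts_agree are
hypothetical names. It only covers loops of the form
"for (k = start; k < start + blk; k++)".

```c
#include <stdint.h>

/* Closed-form trip count of
     for (k = start; k < start + blk; k++)
   with all arithmetic done in 8 bits (a stand-in for unsigned int). */
static unsigned closed_form_niter(uint8_t start, uint8_t blk)
{
    uint8_t bound = (uint8_t)(start + blk);    /* the sum may wrap */
    return bound > start ? (unsigned)(bound - start) : 0u;
}

/* Trip count obtained by actually running the loop. */
static unsigned simulated_niter(uint8_t start, uint8_t blk)
{
    uint8_t bound = (uint8_t)(start + blk);
    unsigned n = 0;
    for (uint8_t k = start; k < bound; k++)
        n++;
    return n;
}

/* Returns 1 if both counts agree for every (start, blk) pair. */
static int counts_agree(void)
{
    for (unsigned s = 0; s < 256; s++)
        for (unsigned b = 0; b < 256; b++)
            if (closed_form_niter((uint8_t)s, (uint8_t)b)
                != simulated_niter((uint8_t)s, (uint8_t)b))
                return 0;
    return 1;
}
```

In this model, when start + blk itself wraps, the bound is not greater
than start and the loop body runs zero times, so the index never wraps
in either case.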

Your suggestion to improve niter analysis and the iv->no_overflow
flag, and to move what we are trying to do here into that code so
that it can use existing information, is a good one, and we may look
into it.

Abderrazek

On Wed, May 6, 2015 at 6:02 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, May 4, 2015 at 9:47 PM, Abderrazek Zaafrani
> <az.zaafrani@gmail.com> wrote:
>> This is an old thread and we are still running into similar issues:
>> Code is not being vectorized on 64-bit target due to scev not being
>> able to optimally analyze overflow condition.
>>
>> While the original test case shown here seems to work now, it does not
>> work if the start value is not a constant and the loop index variable
>> is of unsigned type: Ex
>>
>> void loop2( double const * __restrict__ x_in, double * __restrict__
>> x_out, double const * __restrict__ c, unsigned int N, unsigned int
>> start) {
>>  for(unsigned int i=start; i!=N; ++i)
>>    x_out[i] = c[i]*x_in[i];
>> }
>>
>> Here is our unit test:
>>
>> int foo(int* A, int* C, unsigned start, unsigned B)
>> {
>>   int s = 0;
>>   for (unsigned k = start; k < start+B; k++)
>>     s += A[k] * C[k];
>>   return s;
>> }
>>
>> Our unit test case is extracted from a matrix multiply of a
>> two-dimensional array and all loops are blocked by hand by a factor of
>> B. Even though a bit modified, above loop corresponds to the innermost
>> loop of the blocked matrix multiply.
>>
>> We worked on a patch to solve the problem (see attachment).
>> The attached patch passed bootstrap and make check on x86_64-linux.
>> Ok for trunk?
>
> Apart from coding style / API issues the case you handle is very special
> (IVs with step 1 only?!) I believe it is also wrong - the assumption that
> if there is a symbolic or constant expression for the number of iterations
> a BIV will not wrap is not true.  niter analysis can very well compute
> the number of iterations for a loop with wrapping IVs.  For your unit test
> this only works because of the special-casing of step 1 IVs.
>
> Technically it might be more interesting to compute wrapping of IVs
> during niter analysis in some more generic way (we have iv->no_overflow
> computed by simple_iv, but that is rather not useful here).
>
> Richard.
>
>> Thanks,
>> Abderrazek Zaafrani