public inbox for gcc-help@gcc.gnu.org
* "unhandled use" in vectorizing a dot product?
@ 2009-11-30 21:25 Benjamin Redelings I
  2009-11-30 22:37 ` Tim Prince
       [not found] ` <28126_1259620656_nAUMbYpv003485_4B144917.6050304@aol.com>
  0 siblings, 2 replies; 10+ messages in thread
From: Benjamin Redelings I @ 2009-11-30 21:25 UTC
  To: gcc-help, IRAR


Hi,

     I noticed that (in gcc 4.5 as of Oct-18) the following code is not 
vectorized:

float sum=0;
int i;
for(i=0;i<16;i++)
   sum += f1[i]*f2[i];

The error is "unhandled use in statement"

However, the web page at 
http://gcc.gnu.org/projects/tree-ssa/vectorization.html says:

"Detection and vectorization of special idioms, such as dot-product and 
widening-summation: Incorporated into GCC 4.2."

Can you tell me if I am missing something?  Is the web page correct?

-BenRI

[-- Attachment #2: m3s.c --]
[-- Type: text/x-csrc, Size: 296 bytes --]

#define ALIGNED __attribute__((aligned(16)))

float* ALIGNED s1(int);
int s2(float);


int main(int argc, char* argv[])
{
  float* __restrict f1 ALIGNED = s1(0);
  float* __restrict f2 ALIGNED = s1(0);

  float sum=0;
  int i;
  for(i=0;i<16;i++)
    sum += f1[i]*f2[i];
  s2(sum);
  return 0;
}


* Re: "unhandled use" in vectorizing a dot product?
  2009-11-30 21:25 "unhandled use" in vectorizing a dot product? Benjamin Redelings I
@ 2009-11-30 22:37 ` Tim Prince
  2009-11-30 23:06   ` Brian Budge
       [not found] ` <28126_1259620656_nAUMbYpv003485_4B144917.6050304@aol.com>
  1 sibling, 1 reply; 10+ messages in thread
From: Tim Prince @ 2009-11-30 22:37 UTC
  To: Benjamin Redelings I; +Cc: gcc-help, IRAR

Benjamin Redelings I wrote:

>     I noticed that (in gcc 4.5 as of Oct-18) the following code is not 
> vectorized:
> 
> float sum=0;
> int i;
> for(i=0;i<16;i++)
>   sum += f1[i]*f2[i];
> 
> The error is "unhandled use in statement"
> 
> However, the web page at 
> http://gcc.gnu.org/projects/tree-ssa/vectorization.html says:
> 
> "Detection and vectorization of special idioms, such as dot-product and 
> widening-summation: Incorporated into GCC 4.2."
> 
> Can you tell me if I am missing something?  Is the web page correct?

I haven't seen vector sum or dot product reduction except with the use 
of -ffast-math.  At one time, it was said that -fassociative-math also 
should permit it.  It's more effective with -mtune=barcelona 
(particularly for CPUs introduced in the last 2 years).
With those options, gcc/g++/gfortran are fairly good at dot product 
vectorization, both traditional code such as you show, and 
dot_product/inner_product.
In my opinion, it's unfortunate not having an option to enable this 
optimization independent of riskier ones.
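
For reference, here is a minimal sketch of the kind of kernel under discussion, written as a standalone function. The function name dot and the file name dot.c are made up for illustration and are not from the original attachment. The compile lines in the comment use flags named in this thread; the narrower flag combination is what the GCC manual describes for -fassociative-math, and whether it is sufficient on a given GCC version is an assumption to check against the vectorizer report.

```c
#include <stddef.h>

/* Hypothetical example kernel (not from the original attachment).
 *
 * This is a floating-point reduction: vectorizing it reorders the
 * additions, so the compiler must be allowed to reassociate, e.g.
 *
 *   gcc -O3 -ffast-math -ftree-vectorizer-verbose=2 -c dot.c
 *
 * A narrower alternative to -ffast-math (per the GCC manual,
 * -fassociative-math only takes effect together with the other two):
 *
 *   gcc -O3 -fassociative-math -fno-signed-zeros -fno-trapping-math -c dot.c
 */
float dot(const float *restrict a, const float *restrict b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

The narrower combination leaves -ffinite-math-only off, so it does not by itself tell the compiler to assume the absence of NaN and Inf values.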


* Re: "unhandled use" in vectorizing a dot product?
  2009-11-30 22:37 ` Tim Prince
@ 2009-11-30 23:06   ` Brian Budge
  0 siblings, 0 replies; 10+ messages in thread
From: Brian Budge @ 2009-11-30 23:06 UTC
  To: tprince; +Cc: Benjamin Redelings I, gcc-help, IRAR

I very much agree.  Some programs require correctness with respect to
inf and NaN, but would be quite happy with reordered arithmetic.

  Brian

On Mon, Nov 30, 2009 at 2:37 PM, Tim Prince <n8tm@aol.com> wrote:
> Benjamin Redelings I wrote:
>
>>    I noticed that (in gcc 4.5 as of Oct-18) the following code is not
>> vectorized:
>>
>> float sum=0;
>> int i;
>> for(i=0;i<16;i++)
>>  sum += f1[i]*f2[i];
>>
>> The error is "unhandled use in statement"
>>
>> However, the web page at
>> http://gcc.gnu.org/projects/tree-ssa/vectorization.html says:
>>
>> "Detection and vectorization of special idioms, such as dot-product and
>> widening-summation: Incorporated into GCC 4.2."
>>
>> Can you tell me if I am missing something?  Is the web page correct?
>
> I haven't seen vector sum or dot product reduction except with the use of
> -ffast-math.  At one time, it was said that -fassociative-math also should
> permit it.  It's more effective with -mtune=barcelona (particularly for CPUs
> introduced the last 2 years).
> With those options, gcc/g++/gfortran are fairly good at dot product
> vectorization, both traditional code such as you show, and
> dot_product/inner_product.
> In my opinion, it's unfortunate not having an option to enable this
> optimization independent of riskier ones.
>


* Re: "unhandled use" in vectorizing a dot product?
       [not found] ` <28126_1259620656_nAUMbYpv003485_4B144917.6050304@aol.com>
@ 2009-11-30 23:24   ` Benjamin Redelings
  2009-12-01  7:30     ` Ira Rosen
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Redelings @ 2009-11-30 23:24 UTC
  To: tprince; +Cc: Tim Prince, gcc-help, IRAR

On 11/30/2009 05:37 PM, Tim Prince wrote:
> Benjamin Redelings I wrote:
>
>>     I noticed that (in gcc 4.5 as of Oct-18) the following code is 
>> not vectorized:
>>
>> float sum=0;
>> int i;
>> for(i=0;i<16;i++)
>>   sum += f1[i]*f2[i];
>>
>> The error is "unhandled use in statement"
>>
>> However, the web page at 
>> http://gcc.gnu.org/projects/tree-ssa/vectorization.html says:
>>
>> "Detection and vectorization of special idioms, such as dot-product 
>> and widening-summation: Incorporated into GCC 4.2."
>>
>> Can you tell me if I am missing something?  Is the web page correct?
>
> I haven't seen vector sum or dot product reduction except with the use 
> of -ffast-math.  At one time, it was said that -fassociative-math also 
> should permit it.  It's more effective with -mtune=barcelona 
> (particularly for CPUs introduced the last 2 years).
> With those options, gcc/g++/gfortran are fairly good at dot product 
> vectorization, both traditional code such as you show, and 
> dot_product/inner_product.
> In my opinion, it's unfortunate not having an option to enable this 
> optimization independent of riskier ones.
Thank you!  I don't know how long it would have taken me to notice this.

Even better, I see that it can vectorize loops that sum the product of 
three numbers as well:

for(int i=0;i<16;i++)
   sum += f1[i] * f2[i] * f3[i];

Hmm... the point about -fassociative-math is a good point, presuming 
that SSE math handles NaNs and Infs.

Also, perhaps the documentation should explicitly say somewhere that 
vectorization can depend on flags like this.  "unhandled use in 
statement" certainly doesn't point the user to an idea of how to fix 
this :-P

-BenRI



* Re: "unhandled use" in vectorizing a dot product?
  2009-11-30 23:24   ` Benjamin Redelings
@ 2009-12-01  7:30     ` Ira Rosen
  2009-12-01 16:07       ` Benjamin Redelings I
  0 siblings, 1 reply; 10+ messages in thread
From: Ira Rosen @ 2009-12-01  7:30 UTC
  To: Benjamin Redelings; +Cc: gcc-help, Tim Prince, tprince



Benjamin Redelings <benjamin_redelings@ncsu.edu> wrote on 01/12/2009
01:24:15:

> Also, perhaps the documentation should explicitly say somewhere that
> vectorization can depend on flags like this.  "unhandled use in
> statement" certainly doesn't point the user to an idea of how to fix
> this :-P

The vectorizer prints:

 "unsafe fp math optimization: sum_18 = D.2721_17 + sum_26;"

But you are right, the bottom line printing:

 "not vectorized: unsupported use in stmt."

doesn't help much.

Ira

>
> -BenRI
>
>


* Re: "unhandled use" in vectorizing a dot product?
  2009-12-01  7:30     ` Ira Rosen
@ 2009-12-01 16:07       ` Benjamin Redelings I
  2009-12-01 19:24         ` Tim Prince
  2009-12-02 14:35         ` Ira Rosen
  0 siblings, 2 replies; 10+ messages in thread
From: Benjamin Redelings I @ 2009-12-01 16:07 UTC
  To: Ira Rosen; +Cc: gcc-help, Tim Prince, tprince

On 12/01/2009 02:28 AM, Ira Rosen wrote:
> Benjamin Redelings<benjamin_redelings@ncsu.edu>  wrote on 01/12/2009
> 01:24:15:
>    
>> Also, perhaps the documentation should explicitly say somewhere that
>> vectorization can depend on flags like this.  "unhandled use in
>> statement" certainly doesn't point the user to an idea of how to fix
>> this :-P
>>      
> The vectorizer prints:
>
>   "unsafe fp math optimization: sum_18 = D.2721_17 + sum_26;"
>
> But you are right, the bottom line printing:
>
>   "not vectorized: unsupported use in stmt."
>
> doesn't help much.
>
> Ira
>    
Hi Ira,

1.  That's actually quite helpful :-)  And thank you for all your work 
on this!  I can't wait to go make sure my actual code is autovectorized.

Anyway, I didn't see this because I didn't use 
-ftree-vectorizer-verbose=9.  Would it be possible to mention this at 
verbosity levels less than 9?  Ideally, level 2, which tells me which 
loops aren't vectorized without mentioning all the cost model parameters.

( BTW the gcc man pages indicate that 7 is the highest value for 
tree-vectorizer-verbose, although it seems that now 9 is the highest value.)

2. Interestingly, the following is recognized WITHOUT -ffast-math:

   for(i=0;i<argc;i++)
     f4[i] += f1[i]*f2[i]*f3[i];

If I change this to the following, then it needs -ffast-math:

   for(i=0;i<argc;i++)
     sum += f1[i]*f2[i]*f3[i];

This is essentially doing the first thing, plus also summing the f4[i].  
I guess that is the problem?

3. Finally, the following loop does not even receive a mention as being 
not vectorized (that I could find!)

  for(i=0;i<argc;i++)
     sum += d1[i]*d2[i]*d3[i]*d4[i];

Here d1, d2, d3, and d4 are double*.  However, the loop is recognized if 
they are float * OR if they are double* but there are only three of
them.  I presume this is intended... can you explain why?

Thanks for all your help!

-BenRI





* Re: "unhandled use" in vectorizing a dot product?
  2009-12-01 16:07       ` Benjamin Redelings I
@ 2009-12-01 19:24         ` Tim Prince
  2009-12-02 14:35         ` Ira Rosen
  1 sibling, 0 replies; 10+ messages in thread
From: Tim Prince @ 2009-12-01 19:24 UTC
  To: Benjamin Redelings I; +Cc: Ira Rosen, gcc-help, tprince

Benjamin Redelings I wrote:


> 
> 2. Interestingly, the following is recognized WITHOUT -ffast-math:
> 
>   for(i=0;i<argc;i++)
>     f4[i] += f1[i]*f2[i]*f3[i];
> 
That's not a reduction; re-association from strict C standard order 
isn't required to vectorize it.
icc makes a similar distinction; for example, this should vectorize 
whether or not icc -fp-model source is set.
> If I change this to the following, then it needs -ffast-math:
> 
>   for(i=0;i<argc;i++)
>     sum += f1[i]*f2[i]*f3[i];
> 
> This is essentially doing the first thing, plus also summing the f4[i].  
> I guess that is the problem?
Yes, vectorization involves at least 4 parallel sums (for float data 
type), adding the partials at the end, with a numerically different 
result from the non-vector case (often, but not always, slightly more 
accurate).  The result may also vary slightly with alignment, and may 
differ according to whether -msse3 is set.
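
To make the reordering concrete, here is a scalar sketch that models a 4-wide vectorized reduction with four partial sums and compares it with the strict left-to-right sum. The function names are hypothetical, and the four-lane function is only a model of what a vectorizer might emit, not GCC's actual generated code.

```c
#include <assert.h>
#include <stddef.h>

/* Strict C order: one accumulator, additions performed left to right. */
float sum_sequential(const float *x, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum;
}

/* Scalar model of a 4-wide vectorized reduction: four independent partial
 * sums, combined once at the end.  In this sketch n must be a multiple of 4
 * (a real vectorizer would add a scalar loop for the leftover elements). */
float sum_four_lanes(const float *x, size_t n)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4)
        for (size_t k = 0; k < 4; k++)
            lane[k] += x[i + k];
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

With x = {1e8f, 1, 1, 1, -1e8f, 1, 1, 1} the exact sum is 6. On a target that evaluates float arithmetic in single precision, sum_sequential returns 3, because each of the first three 1s is rounded away when added to 1e8f, while sum_four_lanes returns 6. This is a case where the reordered sum happens to be the more accurate one; other inputs can go the other way.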


* Re: "unhandled use" in vectorizing a dot product?
  2009-12-01 16:07       ` Benjamin Redelings I
  2009-12-01 19:24         ` Tim Prince
@ 2009-12-02 14:35         ` Ira Rosen
  2009-12-02 16:59           ` Benjamin Redelings I
  1 sibling, 1 reply; 10+ messages in thread
From: Ira Rosen @ 2009-12-02 14:35 UTC
  To: Benjamin Redelings I; +Cc: gcc-help, Tim Prince, tprince



Benjamin Redelings I <benjamin_redelings@ncsu.edu> wrote on 01/12/2009
18:06:49:

> Hi Ira,
>
> 1.  That's actually quite helpful :-)  And thank you for all your work
> on this!  I can't wait to go make sure my actual code is autovectorized.
>
> Anyway, I didn't see this because I didn't use
> -ftree-vectorizer-verbose=9.  Would it be possible to mention this at
> verbosity levels less than 9?  Ideally, level 2, which tells me which
> loops aren't vectorized without mentioning all the cost model parameters.

The problem with this is that the vectorizer goes through all the
statements in the loop and checks if they are vectorizable in different
ways. (One of the possibilities, for example, is reduction). If one of the
possibilities fails, it doesn't mean that the statement/loop is not
vectorizable. Only if all of them fail, the vectorization fails. Therefore,
printing an error message for every vectorization possibility is not such a
good idea. But I'll try to see how we can improve the level 2 messages.

>
> ( BTW the gcc man pages indicate that 7 is the highest value for
> tree-vectorizer-verbose, although it seems that now 9 is the highest
> value.)

Thanks, I'll fix this.

...
> 3. Finally, the following loop does not even receive a mention as being
> not vectorized (that I could find!)
>
>   for(i=0;i<argc;i++)
>      sum += d1[i]*d2[i]*d3[i]*d4[i];
>
> Here d1, d2, d3, and d4 are double*.  However, the loop is recognized if
> they are float * OR if they are double* but there are only three of
> them.  I presume this is intended... can you explain why?

No :). I tried to compile it and it looks normal. Could you please attach
the whole file and specify the exact command line? (The only reason I can
think of is that, somehow, the loop gets optimized out).

Ira

>
> Thanks for all your help!
>
> -BenRI
>
>
>
>


* Re: "unhandled use" in vectorizing a dot product?
  2009-12-02 14:35         ` Ira Rosen
@ 2009-12-02 16:59           ` Benjamin Redelings I
  2009-12-03 10:32             ` Ira Rosen
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Redelings I @ 2009-12-02 16:59 UTC
  To: Ira Rosen; +Cc: gcc-help


On 12/02/2009 07:41 AM, Ira Rosen wrote:
> Benjamin Redelings I<benjamin_redelings@ncsu.edu>  wrote on 01/12/2009
> 18:06:49:
>    
>> Hi Ira,
>>
>> 1.  That's actually quite helpful :-)  And thank you for all your work
>> on this!  I can't wait to go make sure my actual code is autovectorized.
>>
>> Anyway, I didn't see this because I didn't use
>> -ftree-vectorizer-verbose=9.  Would it be possible to mention this at
>> verbosity levels less than 9?  Ideally, level 2, which tells me which
>> loops aren't vectorized without mentioning all the cost model parameters.
>>      
> The problem with this is that the vectorizer goes through all the
> statements in the loop and checks if they are vectorizable in different
> ways. (One of the possibilities, for example, is reduction). If one of the
> possibilities fails, it doesn't mean that the statement/loop is not
> vectorizable. Only if all of them fail, the vectorization fails. Therefore,
> printing an error message for every vectorization possibility is not such a
> good idea. But I'll try to see how we can improve the level 2 messages.
>    
Great!  Improved level 2 messages will be extremely useful.
>> ( BTW the gcc man pages indicate that 7 is the highest value for
>> tree-vectorizer-verbose, although it seems that now 9 is the highest
>> value.)
>
> Thanks, I'll fix this.
>    
Great!
> ...
>    
>> 3. Finally, the following loop does not even receive a mention as being
>> not vectorized (that I could find!)
>>
>>    for(i=0;i<argc;i++)
>>       sum += d1[i]*d2[i]*d3[i]*d4[i];
>>
>> Here d1, d2, d3, and d4 are double*.  However, the loop is recognized if
>> they are float * OR if they are double* but there are only three of
>> them.  I presume this is intended... can you explain why?
>>      
> No :). I tried to compile it and it looks normal. Could you please attach
> the whole file and specify the exact command line? (The only reason I can
> think of is that, somehow, the loop gets optimized out).
>
> Ira
>    
Umm.... you are right, it was optimized out :-P  Now that I correctly 
added a use on the sum, the loop is reported as vectorized.  I guess 
this raises another question about how to interpret the output of 
tree-vectorizer-verbose.   Is there a good place on the wiki to add the 
following information, to help newcomers get up to speed?

* -ffast-math is required to vectorize floating-point dot products and 
other floating-point sums.
* if the loop is optimized out, then there will be no report of its 
vectorizability.
* (?) this is the ONLY reason that there would be no such report.
* if a function is inlined into another function, then its 
vectorizability will be reported again, in that function.
* if all uses of a function are inlined, then there will be no 
"vectorized n loops in function" message.
* functions are (of course) not analyzed by the vectorizer in the order 
in which they are declared.
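
The second point can be sketched with two small functions. The names are hypothetical and this is only an illustration of the situation, not a statement about exactly which pass removes the loop: when the reduction result has no real use, the optimizer is free to delete the loop, and then there is nothing left for the vectorizer to report on.

```c
#include <stddef.h>

/* Hypothetical helper names, for illustration only. */

/* The reduction result is never used, so the whole loop is dead code and
 * may be removed before the vectorizer runs.  In that case no vectorization
 * report, positive or negative, should be expected for it. */
void dot_unused(const float *restrict a, const float *restrict b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    (void)sum;  /* silences the warning but is not a real use */
}

/* Returning the result is a real use, so the loop has to be kept and the
 * vectorizer has something to analyze and report on. */
float dot_used(const float *restrict a, const float *restrict b, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Passing the sum to an external function, as the corrected file does with s2(sum) and s4(sum2), has the same effect as returning it.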

I guess this doesn't matter now, but here is the corrected file, for 
reference.

Thanks a lot!

-BenRI

[-- Attachment #2: m3s.c --]
[-- Type: text/x-csrc, Size: 1015 bytes --]

#define ALIGNED __attribute__((aligned(16)))

float* ALIGNED s1(int);
double* ALIGNED s3(int);

int s2(float);
int s4(double);

#define RESTRICT __restrict
// #define RESTRICT

int main(int argc, char* argv[])
{
  float* RESTRICT f1 ALIGNED = s1(0);
  float* RESTRICT f2 ALIGNED = s1(1);
  float* RESTRICT f3 ALIGNED = s1(2);
  float* RESTRICT f4 ALIGNED = s1(3);

  double* RESTRICT d1 ALIGNED = s3(0);
  double* RESTRICT d2 ALIGNED = s3(1);
  double* RESTRICT d3 ALIGNED = s3(2);
  double* RESTRICT d4 ALIGNED = s3(3);

  float sum=0;
  int i;
  for(i=0;i<argc;i++)
    sum += f1[i]*f2[i];
  s2(sum);

  sum = 0;
  for(i=0;i<argc;i++)
    f4[i] += f1[i]*f2[i]*f3[i];
  s2(sum);

  sum = 0;
  for(i=0;i<argc;i++)
    sum += f1[i]*f2[i]*f3[i]*f4[i];
  s2(sum);

  double sum2=0;
  for(i=0;i<argc;i++)
    sum2 += d1[i]*d2[i];
  s4(sum2);

  sum2 = 0;
  for(i=0;i<argc;i++)
    d4[i] += d1[i]*d2[i]*d3[i];
  s4(sum2);

  sum2 = 0;
  for(i=0;i<argc;i++)
    sum2 += d1[i]*d2[i]*d3[i]*d4[i];
  s4(sum2);
  return 0;
}


* Re: "unhandled use" in vectorizing a dot product?
  2009-12-02 16:59           ` Benjamin Redelings I
@ 2009-12-03 10:32             ` Ira Rosen
  0 siblings, 0 replies; 10+ messages in thread
From: Ira Rosen @ 2009-12-03 10:32 UTC
  To: Benjamin Redelings I; +Cc: gcc-help



Benjamin Redelings I <benjamin_redelings@ncsu.edu> wrote on 02/12/2009
16:35:45:

> Is there a good place on the wiki to add the
> following information, to help newcomers get up to speed?
>
> * -ffast-math is required to vectorize dot products and other sums.

We can add it here:
http://gcc.gnu.org/projects/tree-ssa/vectorization.html.

> * if the loop is optimized out, then there will be no report of its
> vectorizability.
> * (?) this is the ONLY reason that there would be no such report.
> * if a function is inlined into another function, then its
> vectorizability will be reported again, in that function.
> * if all uses of a function are inlined, then there will be no
> "vectorized n loops in function" message.
> * functions are (of course) not analyzed by the vectorizer in the order
> in which they are declared.

These are true for all optimizations.

Ira

>
> I guess this doesn't matter now, but here is the corrected file, for
> reference.
>
> Thanks a lot!
>
> -BenRI
> [attachment "m3s.c" deleted by Ira Rosen/Haifa/IBM]
