public inbox for gcc-bugs@sourceware.org
* [Bug c++/35117]  New: Vectorization on power PC
@ 2008-02-07  8:17 eyal at geomage dot com
  2008-02-07 10:30 ` [Bug c++/35117] " rguenth at gcc dot gnu dot org
                   ` (33 more replies)
  0 siblings, 34 replies; 35+ messages in thread
From: eyal at geomage dot com @ 2008-02-07  8:17 UTC (permalink / raw)
  To: gcc-bugs

Hello,
  I am unable to see the expected performance gain from vectorization on
PowerPC under SUSE Linux.
  I've prepared a simple test and compiled it once with the vectorization
flags and once without them. I'd appreciate it if someone could point out
what I'm doing wrong here. Below are the results of the test runs:
   time ./TestNoVec 92200 8 89720 1000
   real    0m23.549s

   time ./TestVec 92200 8 89720 1000
   real    0m22.845s

Here is the code:
#include <iostream>
#include <stdio.h>
#include <stdlib.h>

typedef float ARRTYPE;
int main ( int argc, char *argv[] )
{
        int m_nSamples = atoi( argv[1] );
        int itBegin = atoi( argv[2] );
        int itEnd = atoi( argv[3] );
        int iSizeMain = atoi( argv[ 4 ] );
        ARRTYPE *pSum1 = new ARRTYPE[ 100000 ];
        ARRTYPE *pSum = new ARRTYPE[ 100000 ];
        for ( int it = 0; it < m_nSamples; it++ )
        {
                pSum[ it ] = it / itBegin;
                pSum1[ it ] = itBegin / ( it + 1 );
        }
        ARRTYPE *pVec1 = (ARRTYPE*) malloc (sizeof(ARRTYPE) *m_nSamples);
        ARRTYPE *pVec2 = (ARRTYPE*) malloc (sizeof(ARRTYPE) *m_nSamples);
        for ( int i = 0, j = 0; i < m_nSamples - 5; i++ )
        {
            for( int it = itBegin; it < itEnd; it++ )
                pVec1[ it ] += pSum[ it ] + pSum1[ it ];        
        }
        free( pVec1 );
        free( pVec2 );
}

Compilation flags for no vectorization:
gcc  -DTIXML_USE_STL -I /home/build/build -I /home/build/build -I. -I
/usr/local/include -I /usr/include -O3 -fomit-frame-pointer -mtune=powerpc
-falign-functions=16 -fprefetch-loop-arrays -fpeel-loops -funswitch-loops 
-fPIC -mcpu=powerpc  -m64 -fargument-noalias -funroll-loops
-ftree-vectorizer-verbose=7 -fdump-tree-vect-details  -c -o Test.o Test.cpp
gcc -lpthread -lz -lm -lstdc++ -DTIXML_USE_STL -I /home/build/build -I
/home/build/build -I. -I /usr/local/include -I /usr/include -O3
-fomit-frame-pointer -mtune=powerpc -falign-functions=16 -fprefetch-loop-arrays
-fpeel-loops -funswitch-loops  -fPIC -mcpu=powerpc  -m64 -fargument-noalias
-funroll-loops -ftree-vectorizer-verbose=7 -fdump-tree-vect-details
-L/usr/local/lib64 -DTIXML_USE_STL -pthread -L. -L /home/build/build/lib64 -L
/home/build/build/lib64 -L /usr/lib64 -L /lib64 -L /opt/gnome/lib64 -o
TestNoVec Test.o

Compilation of vectorized code:
gcc  -DTIXML_USE_STL -I /home/build/build -I /home/build/build -I. -I
/usr/local/include -I /usr/include -O3 -fomit-frame-pointer -mtune=powerpc
-falign-functions=16 -fprefetch-loop-arrays -fpeel-loops -funswitch-loops
-ftree-vectorize -fPIC -mcpu=powerpc -maltivec -mabi=altivec -m64
-fargument-noalias -funroll-loops -ftree-vectorizer-verbose=7
-fdump-tree-vect-details  -c -o Test.o Test.cpp
gcc -lpthread -lz -lm -lstdc++ -DTIXML_USE_STL -I /home/build/build -I
/home/build/build -I. -I /usr/local/include -I /usr/include -O3
-fomit-frame-pointer -mtune=powerpc -falign-functions=16 -fprefetch-loop-arrays
-fpeel-loops -funswitch-loops -ftree-vectorize -fPIC -mcpu=powerpc -maltivec
-mabi=altivec -m64 -fargument-noalias -funroll-loops
-ftree-vectorizer-verbose=7 -fdump-tree-vect-details -L/usr/local/lib64
-DTIXML_USE_STL -pthread -L. -L /home/build/build/lib64 -L
/home/build/build/lib64 -L /usr/lib64 -L /lib64 -L /opt/gnome/lib64 -o TestVec
Test.o


-- 
           Summary: Vectorization on power PC
           Product: gcc
           Version: 4.3.0
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: eyal at geomage dot com
 GCC build triplet: gcc (GCC) 4.3.0 20071124 (experimental)
  GCC host triplet: PowerPC
GCC target triplet: PowerPC


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
@ 2008-02-07 10:30 ` rguenth at gcc dot gnu dot org
  2008-02-07 10:37 ` eyal at geomage dot com
                   ` (32 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2008-02-07 10:30 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2008-02-07 10:29 -------
The testcase looks completely memory bound.  Does the compiler tell you it
does vectorization at all?  Have you tried without -fprefetch-loop-arrays?
(With today's HW prefetchers and the simple access patterns it's probably
not a win here.)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
  2008-02-07 10:30 ` [Bug c++/35117] " rguenth at gcc dot gnu dot org
@ 2008-02-07 10:37 ` eyal at geomage dot com
  2008-02-07 10:38 ` pinskia at gcc dot gnu dot org
                   ` (31 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: eyal at geomage dot com @ 2008-02-07 10:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from eyal at geomage dot com  2008-02-07 10:36 -------
Yes, the loop is vectorized.  What do you mean by memory bound?  Don't you
think that vectorization can help here?  I see around a 20% performance gain
in the real application.

Below is the compiler output:
Eyal.cpp:34: note: dependence distance  = 0.
Eyal.cpp:34: note: accesses have the same alignment.
Eyal.cpp:34: note: dependence distance modulo vf == 0 between *D.22353_81 and
*D.22353_81
Eyal.cpp:34: note: versioning for alias required: can't determine dependence
between *D.22353_81 and *D.22365_101
Eyal.cpp:34: note: mark for run-time aliasing test between *D.22353_81 and
*D.22365_101
Eyal.cpp:34: note: versioning for alias required: can't determine dependence
between *D.22355_85 and *D.22353_81
Eyal.cpp:34: note: mark for run-time aliasing test between *D.22355_85 and
*D.22353_81
Eyal.cpp:34: note: versioning for alias required: can't determine dependence
between *D.22355_85 and *D.22365_101
Eyal.cpp:34: note: mark for run-time aliasing test between *D.22355_85 and
*D.22365_101
Eyal.cpp:34: note: versioning for alias required: can't determine dependence
between *D.22361_92 and *D.22353_81
Eyal.cpp:34: note: mark for run-time aliasing test between *D.22361_92 and
*D.22353_81
Eyal.cpp:34: note: versioning for alias required: can't determine dependence
between *D.22361_92 and *D.22365_101
Eyal.cpp:34: note: mark for run-time aliasing test between *D.22361_92 and
*D.22365_101
Eyal.cpp:34: note: versioning for alias required: can't determine dependence
between *D.22353_81 and *D.22365_101
Eyal.cpp:34: note: mark for run-time aliasing test between *D.22353_81 and
*D.22365_101
Eyal.cpp:34: note: versioning for alias required: can't determine dependence
between *D.22353_81 and *D.22367_105
Eyal.cpp:34: note: mark for run-time aliasing test between *D.22353_81 and
*D.22367_105
Eyal.cpp:34: note: versioning for alias required: can't determine dependence
between *D.22353_81 and *D.22371_112
Eyal.cpp:34: note: mark for run-time aliasing test between *D.22353_81 and
*D.22371_112
Eyal.cpp:34: note: versioning for alias required: can't determine dependence
between *D.22353_81 and *D.22365_101
Eyal.cpp:34: note: mark for run-time aliasing test between *D.22353_81 and
*D.22365_101
Eyal.cpp:34: note: dependence distance  = 0.
Eyal.cpp:34: note: accesses have the same alignment.
Eyal.cpp:34: note: dependence distance modulo vf == 0 between *D.22365_101 and
*D.22365_101
Eyal.cpp:34: note: versioning for alias required: can't determine dependence
between *D.22367_105 and *D.22365_101
Eyal.cpp:34: note: mark for run-time aliasing test between *D.22367_105 and
*D.22365_101
Eyal.cpp:34: note: versioning for alias required: can't determine dependence
between *D.22371_112 and *D.22365_101
Eyal.cpp:34: note: mark for run-time aliasing test between *D.22371_112 and
*D.22365_101
Eyal.cpp:34: note: found equal ranges *D.22353_81, *D.22365_101 and
*D.22353_81, *D.22365_101
Eyal.cpp:34: note: found equal ranges *D.22353_81, *D.22365_101 and
*D.22353_81, *D.22365_101
Eyal.cpp:34: note: === vect_analyze_slp ===
Eyal.cpp:34: note: === vect_make_slp_decision ===
Eyal.cpp:34: note: === vect_detect_hybrid_slp ===
Eyal.cpp:34: note: Alignment of access forced using versioning.
Eyal.cpp:34: note: Alignment of access forced using versioning.
Eyal.cpp:34: note: Vectorizing an unaligned access.
Eyal.cpp:34: note: Vectorizing an unaligned access.
Eyal.cpp:34: note: Vectorizing an unaligned access.
Eyal.cpp:34: note: Vectorizing an unaligned access.
Eyal.cpp:34: note: Vectorizing an unaligned access.
Eyal.cpp:34: note: Vectorizing an unaligned access.
Eyal.cpp:34: note: === vect_update_slp_costs_according_to_vf
===(analyze_scalar_evolution 
Eyal.cpp:34: note: create runtime check for data references *D.22353_81 and
*D.22365_101
Eyal.cpp:34: note: create runtime check for data references *D.22355_85 and
*D.22353_81
Eyal.cpp:34: note: create runtime check for data references *D.22355_85 and
*D.22365_101
Eyal.cpp:34: note: create runtime check for data references *D.22361_92 and
*D.22353_81
Eyal.cpp:34: note: create runtime check for data references *D.22361_92 and
*D.22365_101
Eyal.cpp:34: note: create runtime check for data references *D.22353_81 and
*D.22367_105
Eyal.cpp:34: note: create runtime check for data references *D.22353_81 and
*D.22371_112
Eyal.cpp:34: note: create runtime check for data references *D.22367_105 and
*D.22365_101
Eyal.cpp:34: note: create runtime check for data references *D.22371_112 and
*D.22365_101
Eyal.cpp:34: note: created 9 versioning for alias checks.
Eyal.cpp:34: note: LOOP VECTORIZED.(get_loop_exit_condition 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
  2008-02-07 10:30 ` [Bug c++/35117] " rguenth at gcc dot gnu dot org
  2008-02-07 10:37 ` eyal at geomage dot com
@ 2008-02-07 10:38 ` pinskia at gcc dot gnu dot org
  2008-02-07 10:41 ` pinskia at gcc dot gnu dot org
                   ` (30 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-02-07 10:38 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from pinskia at gcc dot gnu dot org  2008-02-07 10:37 -------
I think this is a dup of another bug I filed with respect to the builtin
operator new not getting the malloc attribute.


-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|major                       |normal


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (2 preceding siblings ...)
  2008-02-07 10:38 ` pinskia at gcc dot gnu dot org
@ 2008-02-07 10:41 ` pinskia at gcc dot gnu dot org
  2008-02-07 10:44 ` eyal at geomage dot com
                   ` (29 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2008-02-07 10:41 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from pinskia at gcc dot gnu dot org  2008-02-07 10:40 -------
That is PR 23383.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (3 preceding siblings ...)
  2008-02-07 10:41 ` pinskia at gcc dot gnu dot org
@ 2008-02-07 10:44 ` eyal at geomage dot com
  2008-02-07 10:54 ` irar at il dot ibm dot com
                   ` (28 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: eyal at geomage dot com @ 2008-02-07 10:44 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from eyal at geomage dot com  2008-02-07 10:43 -------
(In reply to comment #3)
> I think this is a dup of another bug I filed with respect to the builtin
> operator new not getting the malloc attribute.

Are you referring to using malloc instead of new?
Using malloc didn't make any difference performance-wise.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (4 preceding siblings ...)
  2008-02-07 10:44 ` eyal at geomage dot com
@ 2008-02-07 10:54 ` irar at il dot ibm dot com
  2008-02-07 11:06 ` eyal at geomage dot com
                   ` (27 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: irar at il dot ibm dot com @ 2008-02-07 10:54 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from irar at il dot ibm dot com  2008-02-07 10:53 -------
(In reply to comment #2)
> Yes the loop is vectorized. 
...
> Eyal.cpp:34: note: created 9 versioning for alias checks.
> Eyal.cpp:34: note: LOOP VECTORIZED.(get_loop_exit_condition 

The vectorizer created runtime checks to verify that there is no data
dependence in the loop; i.e., if the data references turn out to alias, the
vector version is skipped and the scalar version of the loop is executed.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (5 preceding siblings ...)
  2008-02-07 10:54 ` irar at il dot ibm dot com
@ 2008-02-07 11:06 ` eyal at geomage dot com
  2008-02-07 12:17 ` eyal at geomage dot com
                   ` (26 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: eyal at geomage dot com @ 2008-02-07 11:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from eyal at geomage dot com  2008-02-07 11:06 -------
(In reply to comment #6)
> (In reply to comment #2)
> > Yes the loop is vectorized. 
> ...
> > Eyal.cpp:34: note: created 9 versioning for alias checks.
> > Eyal.cpp:34: note: LOOP VECTORIZED.(get_loop_exit_condition 
> The vectorizer created runtime checks to verify that there is no data
> dependence in the loop, i.e., if the data references do alias, the vector
> version is skipped and the scalar version of the loop is performed.

Hi,
 That is what I suspected.  Is there any way I can identify from the log
what causes those runtime checks and resolve it in the code, so I can be
100% sure that the code is fully vectorized?

thanks


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (6 preceding siblings ...)
  2008-02-07 11:06 ` eyal at geomage dot com
@ 2008-02-07 12:17 ` eyal at geomage dot com
  2008-02-07 12:55 ` irar at il dot ibm dot com
                   ` (25 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: eyal at geomage dot com @ 2008-02-07 12:17 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from eyal at geomage dot com  2008-02-07 12:16 -------
Hi Ira,
  Here is the compiler output for the real code.
Crs/CEE_CRE_2DSearch.cpp:1285: note: create runtime check for data references
*D.86651_134 and *D.86666_160
Crs/CEE_CRE_2DSearch.cpp:1285: note: create runtime check for data references
*D.86651_134 and *D.86669_168
Crs/CEE_CRE_2DSearch.cpp:1285: note: create runtime check for data references
*D.86655_139 and *D.86666_160
Crs/CEE_CRE_2DSearch.cpp:1285: note: create runtime check for data references
*D.86655_139 and *D.86669_168
Crs/CEE_CRE_2DSearch.cpp:1285: note: create runtime check for data references
*D.86658_145 and *D.86666_160
Crs/CEE_CRE_2DSearch.cpp:1285: note: create runtime check for data references
*D.86658_145 and *D.86669_168
Crs/CEE_CRE_2DSearch.cpp:1285: note: create runtime check for data references
*D.86661_151 and *D.86666_160
Crs/CEE_CRE_2DSearch.cpp:1285: note: create runtime check for data references
*D.86661_151 and *D.86669_168
Crs/CEE_CRE_2DSearch.cpp:1285: note: created 8 versioning for alias checks.

I looked further in the output log and found the following:
D.86666_160 = pTempSumPhase_Temp_cre_angle_27 + D.86665_159;
D.86669_168 = pTempSum2Phase_Temp_cre_angle_32 + D.86665_159;
D.86651_134 = pSum_78 + D.86650_133;
D.86655_139 = pSum_78 + D.86654_138;
D.86658_145 = pSum_G_106 + D.86650_133;
D.86661_151 = pSum_G_106 + D.86654_138;

D.86650_133 = D.86649_132 * 4
D.86649_132 = (long unsigned int) ittt_855;

D.86654_138 = D.86653_137 * 4;
D.86653_137 = (long unsigned int) ittt1_856;


It seems it complains about some relationship between
pTempSum2Phase_Temp_cre_angle_32, pTempSumPhase_Temp_cre_angle_27, pSum_78,
and pSum_G_106.
Those arrays have nothing in common in the code. How do I make the compiler
see there's no relationship? Here's the C++ code:

 void GCEE_CRE_2DSearch::Find( int i_rCee )
{
        float *pTempSumPhase_Temp_cre_angle = (float*) malloc (sizeof(float)
*m_nSamples);
        float *pTempSum2Phase_Temp_cre_angle = (float*) malloc (sizeof(float)
*m_nSamples);

        memset(pTempSumPhase_Temp_cre_angle,0,sizeof(float)* m_nSamples);
        memset(pTempSum2Phase_Temp_cre_angle,0,sizeof(float)* m_nSamples);

        float *  pSum, *pSum_G;
        .....
        .....
        pSum  = m_hiSearchQueue[i_trace];
        pSum_G   = m_hiSearchQueue[i_trace];
        .....
        .....
        for( int it = itBegin, ittt  = itBegin + sample_int, ittt1 = itBegin +
sample_int + 1; it < itEnd; it++, ittt++, ittt1++ )   
        {
                float fSumValue = pSum[ ittt ] * w11;
                fSumValue += pSum[ ittt1 ] * w21;
                fSumValue += pSum_G[ ittt ] * w12;
                fSumValue += pSum_G[ ittt1 ] * w22;
                pTempSumPhase_Temp_cre_angle[ it ] += fSumValue;
                pTempSum2Phase_Temp_cre_angle[ it ] += fSumValue * fSumValue;
        }


Thanks
Eyal         


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (7 preceding siblings ...)
  2008-02-07 12:17 ` eyal at geomage dot com
@ 2008-02-07 12:55 ` irar at il dot ibm dot com
  2008-02-07 12:58 ` eyal at geomage dot com
                   ` (24 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: irar at il dot ibm dot com @ 2008-02-07 12:55 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from irar at il dot ibm dot com  2008-02-07 12:54 -------
(In reply to comment #8)
> {
>         float *pTempSumPhase_Temp_cre_angle = (float*) malloc (sizeof(float)
> *m_nSamples);
>         float *pTempSum2Phase_Temp_cre_angle = (float*) malloc (sizeof(float)
> *m_nSamples);
> 
>         memset(pTempSumPhase_Temp_cre_angle,0,sizeof(float)* m_nSamples);
>         memset(pTempSum2Phase_Temp_cre_angle,0,sizeof(float)* m_nSamples);

Maybe the problem is that they escape (the call to memset)...
The alias analysis fails to distinguish between these two pointers, and the
vectorizer has to create runtime checks.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (8 preceding siblings ...)
  2008-02-07 12:55 ` irar at il dot ibm dot com
@ 2008-02-07 12:58 ` eyal at geomage dot com
  2008-02-07 13:05 ` irar at il dot ibm dot com
                   ` (23 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: eyal at geomage dot com @ 2008-02-07 12:58 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from eyal at geomage dot com  2008-02-07 12:58 -------
(In reply to comment #9)
> (In reply to comment #8)
> > {
> >         float *pTempSumPhase_Temp_cre_angle = (float*) malloc (sizeof(float)
> > *m_nSamples);
> >         float *pTempSum2Phase_Temp_cre_angle = (float*) malloc (sizeof(float)
> > *m_nSamples);
> > 
> >         memset(pTempSumPhase_Temp_cre_angle,0,sizeof(float)* m_nSamples);
> >         memset(pTempSum2Phase_Temp_cre_angle,0,sizeof(float)* m_nSamples);
> Maybe the problem is that they escape (call to memset)...
> The alias analysis fails to distinguish between these two pointers and the
> vectorizer has to create runtime checks.

I've commented out the memset operations and still get the
"created 8 versioning for alias checks." message.

Is there some pragma or coding convention I can use to make the compiler
understand that those pointers have nothing to do with each other?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (9 preceding siblings ...)
  2008-02-07 12:58 ` eyal at geomage dot com
@ 2008-02-07 13:05 ` irar at il dot ibm dot com
  2008-02-07 13:07 ` eyal at geomage dot com
                   ` (22 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: irar at il dot ibm dot com @ 2008-02-07 13:05 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from irar at il dot ibm dot com  2008-02-07 13:04 -------
(In reply to comment #10)
> Is there some pragma or coding convention I can use to make the compiler
> understand that those pointers have nothing to do with each other?

There is __restrict__, but it is useful only for function arguments. 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (10 preceding siblings ...)
  2008-02-07 13:05 ` irar at il dot ibm dot com
@ 2008-02-07 13:07 ` eyal at geomage dot com
  2008-02-07 13:23 ` irar at il dot ibm dot com
                   ` (21 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: eyal at geomage dot com @ 2008-02-07 13:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from eyal at geomage dot com  2008-02-07 13:07 -------
(In reply to comment #11)
> (In reply to comment #10)
> > Is there some pragma or coding convention I can use to make the compiler
> > understand that those pointers have nothing to do with each other?
> There is __restrict__, but it is useful only for function arguments. 

Ira, any suggestions as to how to solve this issue?  I'd really appreciate
any help here, as I'm lost and we're close to giving up on PPC and
vectorization altogether.

thanks
 eyal


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (11 preceding siblings ...)
  2008-02-07 13:07 ` eyal at geomage dot com
@ 2008-02-07 13:23 ` irar at il dot ibm dot com
  2008-02-07 20:45 ` irar at il dot ibm dot com
                   ` (20 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: irar at il dot ibm dot com @ 2008-02-07 13:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from irar at il dot ibm dot com  2008-02-07 13:22 -------
CC'ing Daniel and Diego, maybe they can help with the alias analysis issues.


-- 

irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dnovillo at google dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (12 preceding siblings ...)
  2008-02-07 13:23 ` irar at il dot ibm dot com
@ 2008-02-07 20:45 ` irar at il dot ibm dot com
  2008-02-08  8:50 ` zaks at il dot ibm dot com
                   ` (19 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: irar at il dot ibm dot com @ 2008-02-07 20:45 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from irar at il dot ibm dot com  2008-02-07 20:44 -------
Giving it another thought, this is not necessarily an alias analysis issue,
even though the analysis fails to tell that the pointers do not alias.
Since in this case the pointers do differ, the runtime test should take the
flow to the vectorized loop. Maybe the test is too strict. I'll look into
this on Sunday.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (13 preceding siblings ...)
  2008-02-07 20:45 ` irar at il dot ibm dot com
@ 2008-02-08  8:50 ` zaks at il dot ibm dot com
  2008-02-08  8:55 ` eyal at geomage dot com
                   ` (18 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: zaks at il dot ibm dot com @ 2008-02-08  8:50 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from zaks at il dot ibm dot com  2008-02-08 08:49 -------
(In reply to comment #5)
> (In reply to comment #3)
> > I think this is a dup of another bug I filed with respect to the builtin
> > operator new not getting the malloc attribute.
> Are you referring to using malloc instead of new?
> Using malloc didn't make any difference performance-wise.

Using malloc instead of new does generate better code and improves performance
slightly for me, admittedly not as much as we would like; the kernel becomes:

(using only -O3 -S -m64 -maltivec)

.L29:
        lvx 13,7,9
        lvx 12,3,9
        vperm 1,10,13,7
        vperm 11,9,12,8
        lvx 0,29,9
        vor 10,13,13
        vor 9,12,12
        vaddfp 1,1,11
        vaddfp 0,0,1
        stvx 0,29,9
        addi 9,9,16
        bdnz .L29

which is as good as the vectorizer can get, if I'm not mistaken: peeling the
loop to align the store (and the load from the same address), and treating
the other two loads as potentially unaligned.

To further optimize this loop we would probably want to overlap the store with
subsequent loads using -fmodulo-sched; perhaps the new export-ddg can help with
that.


-- 

zaks at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zaks at il dot ibm dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (14 preceding siblings ...)
  2008-02-08  8:50 ` zaks at il dot ibm dot com
@ 2008-02-08  8:55 ` eyal at geomage dot com
  2008-02-08  8:58 ` eyal at geomage dot com
                   ` (17 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: eyal at geomage dot com @ 2008-02-08  8:55 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from eyal at geomage dot com  2008-02-08 08:55 -------
Thanks a lot Ira, I appreciate it.
If you need the full test code with the .vect file and makefiles, please let
me know.
thanks,
eyal


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (15 preceding siblings ...)
  2008-02-08  8:55 ` eyal at geomage dot com
@ 2008-02-08  8:58 ` eyal at geomage dot com
  2008-02-10  7:31 ` eres at il dot ibm dot com
                   ` (16 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: eyal at geomage dot com @ 2008-02-08  8:58 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from eyal at geomage dot com  2008-02-08 08:58 -------
> Using malloc instead of new does generate better code and improves performance
> slightly for me, admittedly not as much as we would like; the kernel becomes:
> (using only -O3 -S -m64 -maltivec)
> .L29:
>         lvx 13,7,9
>         lvx 12,3,9
>         vperm 1,10,13,7
>         vperm 11,9,12,8
>         lvx 0,29,9
>         vor 10,13,13
>         vor 9,12,12
>         vaddfp 1,1,11
>         vaddfp 0,0,1
>         stvx 0,29,9
>         addi 9,9,16
>         bdnz .L29
> which is as good as the vectorizer can get, iinm: peeling the loop to align the
> store (and the load from the same address), treating the other two loads as
> potentially unaligned.
> To further optimize this loop we would probably want to overlap the store with
> subsequent loads using -fmodulo-sched; perhaps the new export-ddg can help with
> that.

I was able to get about 20% more in one case with malloc.
I was expecting something like a 2-4x speedup when vectorization is enabled.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (16 preceding siblings ...)
  2008-02-08  8:58 ` eyal at geomage dot com
@ 2008-02-10  7:31 ` eres at il dot ibm dot com
  2008-02-10  7:42 ` eyal at geomage dot com
                   ` (15 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: eres at il dot ibm dot com @ 2008-02-10  7:31 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #18 from eres at il dot ibm dot com  2008-02-10 07:30 -------
> To further optimize this loop we would probably want to overlap the store with
> subsequent loads using -fmodulo-sched; perhaps the new export-ddg can help with
> that.

I intend to test the impact of -fmodulo-sched with the export-ddg patch on
this test.
Eyal - I'd appreciate it if you could post the full test code so I can test
it.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (17 preceding siblings ...)
  2008-02-10  7:31 ` eres at il dot ibm dot com
@ 2008-02-10  7:42 ` eyal at geomage dot com
  2008-02-10  7:57 ` eyal at geomage dot com
                   ` (14 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: eyal at geomage dot com @ 2008-02-10  7:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #19 from eyal at geomage dot com  2008-02-10 07:42 -------
Hi,  
  This is the simplest test I have.

#include <iostream>
#include <stdio.h>
#include <stdlib.h>

typedef float ARRTYPE;

int main ( int argc, char *argv[] )
{
        int m_nSamples = atoi( argv[1] );
        int itBegin = atoi( argv[2] );
        int itEnd = atoi( argv[3] );
        int iSizeMain = atoi( argv[ 4 ] );
        ARRTYPE *pSum1 = new ARRTYPE[ 100000 ];
        ARRTYPE *pSum = new ARRTYPE[ 100000 ];
        for ( int it = 0; it < m_nSamples; it++ )
        {
                pSum[ it ] = it / itBegin;
                pSum1[ it ] = itBegin / ( it + 1 );
        }
        ARRTYPE *pVec1 = (ARRTYPE*) malloc (sizeof(ARRTYPE) *m_nSamples);
        ARRTYPE *pVec2 = (ARRTYPE*) malloc (sizeof(ARRTYPE) *m_nSamples);
        for ( int i = 0; i < m_nSamples - 5; i++ )
        {
            for( int it = itBegin; it < itEnd; it++ )
                pVec1[ it ] += pSum[ it ] + pSum1[ it ];        
        }
        free( pVec1 );
        free( pVec2 );
}

// Test - Vectorized binary, TestNoVec - Non vectorized binary
time ./Test 90000 1 89900 1
real    0m23.273s

time ./TestNoVec 90000 1 89900 1
real    0m24.344s


This is the compiler output I found relevant; please let me know if you need
more information.

Test.cpp:24: note: dependence distance modulo vf == 0 between *D.22310_50 and
*D.22310_50
Test.cpp:24: note: versioning for alias required: can't determine dependence
between *D.22312_54 and *D.22310_50
Test.cpp:24: note: mark for run-time aliasing test between *D.22312_54 and
*D.22310_50
Test.cpp:24: note: versioning for alias required: can't determine dependence
between *D.22314_58 and *D.22310_50
Test.cpp:24: note: mark for run-time aliasing test between *D.22314_58 and
*D.22310_50
Test.cpp:24: note: create runtime check for data references *D.22312_54 and
*D.22310_50
Test.cpp:24: note: create runtime check for data references *D.22314_58 and
*D.22310_50
Test.cpp:24: note: created 2 versioning for alias checks.
Test.cpp:24: note: LOOP VECTORIZED.(get_loop_exit_condition


D.22310_50 = pVec1_37 + D.22309_49;
D.22312_54 = pSum_20 + D.22309_49;
D.22314_58 = pSum1_18 + D.22309_49;


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (18 preceding siblings ...)
  2008-02-10  7:42 ` eyal at geomage dot com
@ 2008-02-10  7:57 ` eyal at geomage dot com
  2008-02-10 13:49 ` eyal at geomage dot com
                   ` (13 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: eyal at geomage dot com @ 2008-02-10  7:57 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #20 from eyal at geomage dot com  2008-02-10 07:56 -------
Hi,
  I've tried moving the loop to be vectorized into a separate function; the
compiler output looks better, but the performance is still the same as the
non-vectorized code.

#include <iostream>
#include <stdio.h>
#include <stdlib.h>

typedef float ARRTYPE;

void Calc( ARRTYPE *pSum, ARRTYPE *pSum1, ARRTYPE *pVec1, ARRTYPE *pVec2, int
m_nSamples, int itBegin, int itEnd );

int main ( int argc, char *argv[] )
{
        int m_nSamples = atoi( argv[1] );
        int itBegin = atoi( argv[2] );
        int itEnd = atoi( argv[3] );
        int iSizeMain = atoi( argv[ 4 ] );
        ARRTYPE *pSum1 = new ARRTYPE[ 100000 ];
        ARRTYPE *pSum = new ARRTYPE[ 100000 ];
        for ( int it = 0; it < m_nSamples; it++ )
        {
                pSum[ it ] = it / itBegin;
                pSum1[ it ] = itBegin / ( it + 1 );
        }
        ARRTYPE *pVec1 = NULL, *pVec2 = NULL;
        Calc( pSum, pSum1, pVec1, pVec2, m_nSamples, itBegin, itEnd );
        std::cout << "pVec1[10]  = " << pVec1[ 10 ] << std::endl;
        std::cout << "pVec1[102]  = " << pVec1[ 102 ] << std::endl;
        free( pVec1 );
        free( pVec2 );
}

void Calc( ARRTYPE *pSum, ARRTYPE *pSum1, ARRTYPE *pVec1, ARRTYPE *pVec2, int
m_nSamples, int itBegin, int itEnd )
{
        pVec1 = (ARRTYPE*) malloc (sizeof(ARRTYPE) *m_nSamples);
        pVec2 = (ARRTYPE*) malloc (sizeof(ARRTYPE) *m_nSamples);
        for ( int i = 0; i < m_nSamples - 5; i++ )
        {
            for( int it = itBegin; it < itEnd; it++ )
                pVec1[ it ] += pSum[ it ] + pSum1[ it ];        
        }
}




Eyal.cpp:36: note: dependence distance  = 0.
Eyal.cpp:36: note: accesses have the same alignment.
Eyal.cpp:36: note: dependence distance modulo vf == 0 between *D.22348_22 and
*D.22348_22
Eyal.cpp:36: note: === vect_analyze_slp ===
Eyal.cpp:36: note: === vect_make_slp_decision ===
Eyal.cpp:36: note: === vect_detect_hybrid_slp ===(analyze_scalar_evolution 
  (loop_nb = 2)
  (scalar = it_60)
(get_scalar_evolution 
  (scalar = it_60)
  (scalar_evolution = {itBegin_14(D), +, 1}_2))
(set_scalar_evolution 
  (scalar = it_60)
  (scalar_evolution = {itBegin_14(D), +, 1}_2))
)
(instantiate_parameters 
  (loop_nb = 2)
  (chrec = {itBegin_14(D), +, 1}_2)
  (res = {itBegin_14(D), +, 1}_2))
(get_loop_exit_condition 
  if (itEnd_16(D) > it_36))

Eyal.cpp:36: note: Alignment of access forced using peeling.
Eyal.cpp:36: note: Vectorizing an unaligned access.
Eyal.cpp:36: note: Vectorizing an unaligned access.
Eyal.cpp:36: note: === vect_update_slp_costs_according_to_vf
===(analyze_scalar_evolution 
  (loop_nb = 2)
  (scalar = it_60)
(get_scalar_evolution 
  (scalar = it_60)
  (scalar_evolution = {itBegin_14(D), +, 1}_2))
(set_scalar_evolution 
  (scalar = it_60)
  (scalar_evolution = {itBegin_14(D), +, 1}_2))
)
(instantiate_parameters 
  (loop_nb = 2)
  (chrec = {itBegin_14(D), +, 1}_2)
  (res = {itBegin_14(D), +, 1}_2))
(get_loop_exit_condition 
  if (itEnd_16(D) > it_36))
(get_loop_exit_condition 
  if (itEnd_16(D) > it_36))
(get_loop_exit_condition 
  if (itEnd_16(D) > it_84))
(get_loop_exit_condition 
  if (ivtmp.267_92 < prolog_loop_niters.266_70))

loop at Eyal.cpp:37: if (ivtmp.267_92 <
prolog_loop_niters.266_70)(get_loop_exit_condition 
  if (itEnd_16(D) > it_36))
(analyze_scalar_evolution 
  (loop_nb = 2)
  (scalar = it_60)
(get_scalar_evolution 
  (scalar = it_60)
  (scalar_evolution = ))
(analyze_initial_condition 
  (loop_phi_node = 
it_60 = PHI <it_36(4), it_86(21)>)
  (init_cond = it_86))
(analyze_evolution_in_loop 
  (loop_phi_node = it_60 = PHI <it_36(4), it_86(21)>)
(add_to_evolution 
  (loop_nb = 2)
  (chrec_before = it_86)
  (to_add = 1)
  (res = {it_86, +, 1}_2))
  (evolution_function = {it_86, +, 1}_2))
(set_scalar_evolution 
  (scalar = it_60)
  (scalar_evolution = {it_86, +, 1}_2))
)
(get_loop_exit_condition 
  if (itEnd_16(D) > it_36))
(get_loop_exit_condition 
  if (ivtmp.329_211 < bnd.269_99))

loop at Eyal.cpp:37: if (ivtmp.329_211 < bnd.269_99)

Registering new PHI nodes in block #0



Registering new PHI nodes in block #2

Updating SSA information for statement D.22335_6 = malloc (D.22334_5);

Updating SSA information for statement malloc (D.22334_5);



Registering new PHI nodes in block #3



Registering new PHI nodes in block #9



Registering new PHI nodes in block #7



Registering new PHI nodes in block #8



Registering new PHI nodes in block #10



Registering new PHI nodes in block #14



Registering new PHI nodes in block #12

Updating SSA information for statement D.22349_76 = *D.22348_75;

Updating SSA information for statement *D.22348_75 = D.22355_82;



Registering new PHI nodes in block #13



Registering new PHI nodes in block #16



Registering new PHI nodes in block #15



Registering new PHI nodes in block #21



Registering new PHI nodes in block #22



Registering new PHI nodes in block #19

Updating SSA information for statement D.22349_106 = *D.22348_105;

Updating SSA information for statement *D.22348_105 = D.22355_112;



Registering new PHI nodes in block #20



Registering new PHI nodes in block #25



Registering new PHI nodes in block #24



Registering new PHI nodes in block #18



Registering new PHI nodes in block #26

Updating SSA information for statement vect_var_.279_143 = A*vect_p.280_142;

Updating SSA information for statement vect_var_.300_174 = A*vect_p.301_173;



Registering new PHI nodes in block #5

Updating SSA information for statement vect_var_.278_134 = *ivtmp.277_132;

Updating SSA information for statement D.22349_23 = *D.22348_22;

Updating SSA information for statement vect_var_.298_164 = A*ivtmp.297_162;

Updating SSA information for statement vect_var_.319_195 = A*ivtmp.318_193;

Updating SSA information for statement *ivtmp.328_208 = vect_var_.322_198;



Registering new PHI nodes in block #4



Registering new PHI nodes in block #23



Registering new PHI nodes in block #17



Registering new PHI nodes in block #6



Registering new PHI nodes in block #11



Symbols to be put in SSA form

{ HEAP.249 NMT.252 NMT.253 }


Incremental SSA update started at block: 0

Number of blocks in CFG: 27
Number of blocks to update: 26 ( 96%)

Affected blocks: 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
25 26 



Eyal.cpp:36: note: LOOP VECTORIZED.
Eyal.cpp:30: note: vectorized 1 loops in function.
Merging blocks 10 and 14
Merging blocks 15 and 21
Merging blocks 17 and 6
Merging blocks 24 and 18
Created preheader block for loop 3
void Calc(ARRTYPE*, ARRTYPE*, ARRTYPE*, ARRTYPE*, int, int, int) (pSum, pSum1,
pVec1, pVec2, m_nSamples, itBegin, itEnd)
{
  unsigned int ivtmp.329;
  __vector float * ivtmp.328;
  float * D.25089;
  __vector float * vect_p.327;
  unsigned int D.25086;
  long unsigned int base_off.325;
  float * D.25082;
  long unsigned int D.25083;
  long unsigned int D.25084;
  float * batmp.324;
  __vector float * vect_p.323;
  __vector float vect_var_.322;
  __vector float vect_var_.321;
  __vector float vect_var_.320;
  __vector float vect_var_.319;
  __vector float * ivtmp.318;
  float * D.25072;
  __vector float * vect_p.317;
  long unsigned int offset.315;
  unsigned int D.25068;
  long unsigned int base_off.314;
  long unsigned int D.25065;
  long unsigned int D.25066;
  float * batmp.313;
  __vector float * vect_p.312;
  __vector float vect_var_.311;
  __vector signed char vect_var_.310;
  float * D.25060;
  __vector float * vect_p.309;
  unsigned int D.25057;
  long unsigned int base_off.307;
  long unsigned int D.25054;
  long unsigned int D.25055;
  float * batmp.306;
  float * D.25052;
  __vector float * vect_p.305;
  unsigned int D.25049;
  long unsigned int base_off.303;
  long unsigned int D.25046;
  long unsigned int D.25047;
  float * batmp.302;
  __vector float * vect_p.301;
  __vector float vect_var_.300;
  __vector float vect_var_.299;
  __vector float vect_var_.298;
  __vector float * ivtmp.297;
  float * D.25039;
  __vector float * vect_p.296;
  long unsigned int offset.294;
  unsigned int D.25035;
  long unsigned int base_off.293;
  long unsigned int D.25032;
  long unsigned int D.25033;
  float * batmp.292;
  __vector float * vect_p.291;
  __vector float vect_var_.290;
  __vector signed char vect_var_.289;
  float * D.25027;
  __vector float * vect_p.288;
  unsigned int D.25024;
  long unsigned int base_off.286;
  long unsigned int D.25021;
  long unsigned int D.25022;
  float * batmp.285;
  float * D.25019;
  __vector float * vect_p.284;
  unsigned int D.25016;
  long unsigned int base_off.282;
  long unsigned int D.25013;
  long unsigned int D.25014;
  float * batmp.281;
  __vector float * vect_p.280;
  __vector float vect_var_.279;
  __vector float vect_var_.278;
  __vector float * ivtmp.277;
  float * D.25007;
  __vector float * vect_p.276;
  unsigned int D.25004;
  long unsigned int base_off.274;
  float * D.25000;
  long unsigned int D.25001;
  long unsigned int D.25002;
  float * batmp.273;
  __vector float * vect_p.272;
  int D.24997;
  int tmp.271;
  unsigned int ratio_mult_vf.270;
  unsigned int bnd.269;
  int D.24989;
  unsigned int D.24990;
  unsigned int D.24991;
  unsigned int D.24992;
  unsigned int D.24993;
  unsigned int niters.268;
  unsigned int ivtmp.267;
  long unsigned int D.24981;
  long unsigned int D.24982;
  long unsigned int D.24983;
  long unsigned int D.24984;
  unsigned int D.24985;
  unsigned int D.24986;
  unsigned int prolog_loop_niters.266;
  __vector float * vect_p.265;
  float * D.24974;
  long unsigned int D.24975;
  long unsigned int D.24976;
  float * batmp.262;
  int D.24969;
  unsigned int D.24970;
  unsigned int D.24971;
  unsigned int D.24972;
  unsigned int niters.261;
  int it;
  int i;
  float D.22355;
  float D.22354;
  float D.22353;
  ARRTYPE * D.22352;
  float D.22351;
  ARRTYPE * D.22350;
  float D.22349;
  ARRTYPE * D.22348;
  long unsigned int D.22347;
  long unsigned int D.22346;
  int D.22340;
  void * D.22335;
  long unsigned int D.22334;
  long unsigned int D.22333;

<bb 2>:
  D.22333_4 = (long unsigned int) m_nSamples_3(D);
  D.22334_5 = D.22333_4 * 4;
  D.22335_6 = malloc (D.22334_5);
  pVec1_7 = (ARRTYPE *) D.22335_6;
  malloc (D.22334_5);
  D.22340_9 = m_nSamples_3(D) + -5;
  if (D.22340_9 > 0)
    goto <bb 3>;
  else
    goto <bb 22>;

<bb 3>:
  goto <bb 15>;

<bb 4>:

<bb 5>:
  # ivtmp.329_210 = PHI <ivtmp.329_211(4), 0(20)>
  # ivtmp.328_208 = PHI <ivtmp.328_209(4), vect_p.323_207(20)>
  # ivtmp.318_193 = PHI <ivtmp.318_194(4), vect_p.312_192(20)>
  # vect_var_.311_183 = PHI <vect_var_.319_195(4), vect_var_.300_174(20)>
  # ivtmp.297_162 = PHI <ivtmp.297_163(4), vect_p.291_161(20)>
  # vect_var_.290_152 = PHI <vect_var_.298_164(4), vect_var_.279_143(20)>
  # ivtmp.277_132 = PHI <ivtmp.277_133(4), vect_p.272_131(20)>
  # it_60 = PHI <it_36(4), it_86(20)>
  D.22346_20 = (long unsigned int) it_60;
  D.22347_21 = D.22346_20 * 4;
  D.22348_22 = pVec1_7 + D.22347_21;
  vect_var_.278_134 = *ivtmp.277_132;
  D.22349_23 = *D.22348_22;
  D.22350_27 = pSum_26(D) + D.22347_21;
  vect_var_.298_164 = A*ivtmp.297_162;
  vect_var_.299_165 = REALIGN_LOAD <vect_var_.290_152, vect_var_.298_164,
vect_var_.289_151>;
  D.22351_28 = *D.22350_27;
  D.22352_32 = pSum1_31(D) + D.22347_21;
  vect_var_.319_195 = A*ivtmp.318_193;
  vect_var_.320_196 = REALIGN_LOAD <vect_var_.311_183, vect_var_.319_195,
vect_var_.310_182>;
  D.22353_33 = *D.22352_32;
  vect_var_.321_197 = vect_var_.299_165 + vect_var_.278_134;
  D.22354_34 = D.22351_28 + D.22349_23;
  vect_var_.322_198 = vect_var_.321_197 + vect_var_.320_196;
  D.22355_35 = D.22354_34 + D.22353_33;
  *ivtmp.328_208 = vect_var_.322_198;
  it_36 = it_60 + 1;
  ivtmp.277_133 = ivtmp.277_132 + 16;
  ivtmp.297_163 = ivtmp.297_162 + 16;
  ivtmp.318_194 = ivtmp.318_193 + 16;
  ivtmp.328_209 = ivtmp.328_208 + 16;
  ivtmp.329_211 = ivtmp.329_210 + 1;
  if (ivtmp.329_211 < bnd.269_99)
    goto <bb 4>;
  else
    goto <bb 6>;

<bb 6>:
  # it_117 = PHI <it_36(5)>
  D.24997_121 = (int) ratio_mult_vf.270_100;
  tmp.271_122 = it_86 + D.24997_121;
  if (niters.268_98 == ratio_mult_vf.270_100)
    goto <bb 11>;
  else
    goto <bb 7>;

<bb 7>:
  # it_116 = PHI <tmp.271_122(6), it_86(19)>

<bb 8>:
  # it_102 = PHI <it_114(9), it_116(7)>
  D.22346_103 = (long unsigned int) it_102;
  D.22347_104 = D.22346_103 * 4;
  D.22348_105 = pVec1_7 + D.22347_104;
  D.22349_106 = *D.22348_105;
  D.22350_107 = pSum_26(D) + D.22347_104;
  D.22351_108 = *D.22350_107;
  D.22352_109 = pSum1_31(D) + D.22347_104;
  D.22353_110 = *D.22352_109;
  D.22354_111 = D.22351_108 + D.22349_106;
  D.22355_112 = D.22354_111 + D.22353_110;
  *D.22348_105 = D.22355_112;
  it_114 = it_102 + 1;
  if (itEnd_16(D) > it_114)
    goto <bb 9>;
  else
    goto <bb 10>;

<bb 9>:
  goto <bb 8>;

<bb 10>:

<bb 11>:

<bb 12>:

<bb 13>:
  i_37 = i_24 + 1;
  if (D.22340_9 > i_37)
    goto <bb 14>;
  else
    goto <bb 22>;

<bb 14>:

<bb 15>:
  # i_24 = PHI <i_37(14), 0(3)>
  if (itBegin_14(D) < itEnd_16(D))
    goto <bb 16>;
  else
    goto <bb 13>;

<bb 16>:
  D.24969_19 = ~itBegin_14(D);
  D.24970_1 = (unsigned int) D.24969_19;
  D.24971_38 = (unsigned int) itEnd_16(D);
  D.24972_8 = D.24970_1 + D.24971_38;
  niters.261_59 = D.24972_8 + 1;
  D.24974_2 = (float *) D.22335_6;
  D.24975_39 = (long unsigned int) itBegin_14(D);
  D.24976_25 = D.24975_39 * 4;
  batmp.262_30 = D.24974_2 + D.24976_25;
  vect_p.265_63 = (__vector float *) batmp.262_30;
  D.24981_64 = (long unsigned int) vect_p.265_63;
  D.24982_65 = D.24981_64 & 15;
  D.24983_66 = D.24982_65 >> 2;
  D.24984_67 = 4 - D.24983_66;
  D.24985_68 = (unsigned int) D.24984_67;
  D.24986_69 = D.24985_68 & 3;
  prolog_loop_niters.266_70 = MIN_EXPR <D.24986_69, niters.261_59>;
  if (prolog_loop_niters.266_70 == 0)
    goto <bb 19>;
  else
    goto <bb 17>;

<bb 17>:
  # ivtmp.267_89 = PHI <0(16)>
  # it_215 = PHI <itBegin_14(D)(16)>

<bb 23>:
  # ivtmp.267_91 = PHI <ivtmp.267_89(17), ivtmp.267_92(21)>
  # it_72 = PHI <it_215(17), it_84(21)>
  D.22346_73 = (long unsigned int) it_72;
  D.22347_74 = D.22346_73 * 4;
  D.22348_75 = pVec1_7 + D.22347_74;
  D.22349_76 = *D.22348_75;
  D.22350_77 = pSum_26(D) + D.22347_74;
  D.22351_78 = *D.22350_77;
  D.22352_79 = pSum1_31(D) + D.22347_74;
  D.22353_80 = *D.22352_79;
  D.22354_81 = D.22351_78 + D.22349_76;
  D.22355_82 = D.22354_81 + D.22353_80;
  *D.22348_75 = D.22355_82;
  it_84 = it_72 + 1;
  ivtmp.267_92 = ivtmp.267_91 + 1;
  if (ivtmp.267_92 < prolog_loop_niters.266_70)
    goto <bb 21>;
  else
    goto <bb 18>;

<bb 18>:
  # it_87 = PHI <it_84(23)>
  if (niters.261_59 == prolog_loop_niters.266_70)
    goto <bb 12>;
  else
    goto <bb 19>;

<bb 19>:
  # it_86 = PHI <it_87(18), itBegin_14(D)(16)>
  D.24989_93 = ~itBegin_14(D);
  D.24990_94 = (unsigned int) D.24989_93;
  D.24991_95 = (unsigned int) itEnd_16(D);
  D.24992_96 = D.24990_94 + D.24991_95;
  D.24993_97 = D.24992_96 - prolog_loop_niters.266_70;
  niters.268_98 = D.24993_97 + 1;
  bnd.269_99 = niters.268_98 >> 2;
  ratio_mult_vf.270_100 = bnd.269_99 << 2;
  if (ratio_mult_vf.270_100 <= 3)
    goto <bb 7>;
  else
    goto <bb 20>;

<bb 20>:
  D.25000_123 = (float *) D.22335_6;
  D.25001_124 = (long unsigned int) itBegin_14(D);
  D.25002_125 = D.25001_124 * 4;
  batmp.273_126 = D.25000_123 + D.25002_125;
  D.25004_127 = prolog_loop_niters.266_70 * 4;
  base_off.274_128 = (long unsigned int) D.25004_127;
  D.25007_129 = batmp.273_126 + base_off.274_128;
  vect_p.276_130 = (__vector float *) D.25007_129;
  vect_p.272_131 = vect_p.276_130;
  D.25013_135 = (long unsigned int) itBegin_14(D);
  D.25014_136 = D.25013_135 * 4;
  batmp.281_137 = pSum_26(D) + D.25014_136;
  D.25016_138 = prolog_loop_niters.266_70 * 4;
  base_off.282_139 = (long unsigned int) D.25016_138;
  D.25019_140 = batmp.281_137 + base_off.282_139;
  vect_p.284_141 = (__vector float *) D.25019_140;
  vect_p.280_142 = vect_p.284_141;
  vect_var_.279_143 = A*vect_p.280_142;
  D.25021_144 = (long unsigned int) itBegin_14(D);
  D.25022_145 = D.25021_144 * 4;
  batmp.285_146 = pSum_26(D) + D.25022_145;
  D.25024_147 = prolog_loop_niters.266_70 * 4;
  base_off.286_148 = (long unsigned int) D.25024_147;
  D.25027_149 = batmp.285_146 + base_off.286_148;
  vect_p.288_150 = (__vector float *) D.25027_149;
  vect_var_.289_151 = __builtin_altivec_mask_for_load (vect_p.288_150);
  D.25032_153 = (long unsigned int) itBegin_14(D);
  D.25033_154 = D.25032_153 * 4;
  batmp.292_155 = pSum_26(D) + D.25033_154;
  D.25035_156 = prolog_loop_niters.266_70 * 4;
  base_off.293_157 = (long unsigned int) D.25035_156;
  offset.294_158 = base_off.293_157 + 12;
  D.25039_159 = batmp.292_155 + offset.294_158;
  vect_p.296_160 = (__vector float *) D.25039_159;
  vect_p.291_161 = vect_p.296_160;
  D.25046_166 = (long unsigned int) itBegin_14(D);
  D.25047_167 = D.25046_166 * 4;
  batmp.302_168 = pSum1_31(D) + D.25047_167;
  D.25049_169 = prolog_loop_niters.266_70 * 4;
  base_off.303_170 = (long unsigned int) D.25049_169;
  D.25052_171 = batmp.302_168 + base_off.303_170;
  vect_p.305_172 = (__vector float *) D.25052_171;
  vect_p.301_173 = vect_p.305_172;
  vect_var_.300_174 = A*vect_p.301_173;
  D.25054_175 = (long unsigned int) itBegin_14(D);
  D.25055_176 = D.25054_175 * 4;
  batmp.306_177 = pSum1_31(D) + D.25055_176;
  D.25057_178 = prolog_loop_niters.266_70 * 4;
  base_off.307_179 = (long unsigned int) D.25057_178;
  D.25060_180 = batmp.306_177 + base_off.307_179;
  vect_p.309_181 = (__vector float *) D.25060_180;
  vect_var_.310_182 = __builtin_altivec_mask_for_load (vect_p.309_181);
  D.25065_184 = (long unsigned int) itBegin_14(D);
  D.25066_185 = D.25065_184 * 4;
  batmp.313_186 = pSum1_31(D) + D.25066_185;
  D.25068_187 = prolog_loop_niters.266_70 * 4;
  base_off.314_188 = (long unsigned int) D.25068_187;
  offset.315_189 = base_off.314_188 + 12;
  D.25072_190 = batmp.313_186 + offset.315_189;
  vect_p.317_191 = (__vector float *) D.25072_190;
  vect_p.312_192 = vect_p.317_191;
  D.25082_199 = (float *) D.22335_6;
  D.25083_200 = (long unsigned int) itBegin_14(D);
  D.25084_201 = D.25083_200 * 4;
  batmp.324_202 = D.25082_199 + D.25084_201;
  D.25086_203 = prolog_loop_niters.266_70 * 4;
  base_off.325_204 = (long unsigned int) D.25086_203;
  D.25089_205 = batmp.324_202 + base_off.325_204;
  vect_p.327_206 = (__vector float *) D.25089_205;
  vect_p.323_207 = vect_p.327_206;
  goto <bb 5>;

<bb 21>:
  goto <bb 23>;

<bb 22>:
  return;

}



;; Function int main(int, char**) (main)

(get_loop_exit_condition 
  if (ivtmp.553_471 != 0))
(number_of_iterations_in_loop
(analyze_scalar_evolution 
  (loop_nb = 3)
  (scalar = ivtmp.553_471)
(get_scalar_evolution 
  (scalar = ivtmp.553_471)
  (scalar_evolution = ))
(analyze_scalar_evolution 
  (loop_nb = 3)
  (scalar = ivtmp.553_1)
(get_scalar_evolution 
  (scalar = ivtmp.553_1)
  (scalar_evolution = ))
(analyze_initial_condition 
  (loop_phi_node = 
ivtmp.553_1 = PHI <256(21), ivtmp.553_471(23)>)
  (init_cond = 256))
(analyze_evolution_in_loop 
  (loop_phi_node = ivtmp.553_1 = PHI <256(21), ivtmp.553_471(23)>)
(add_to_evolution 
  (loop_nb = 3)
  (chrec_before = 256)
  (to_add = 1)
  (res = {256, +, 0x0ffffffffffffffff}_3))
  (evolution_function = {256, +, 0x0ffffffffffffffff}_3))
(set_scalar_evolution 
  (scalar = ivtmp.553_1)
  (scalar_evolution = {256, +, 0x0ffffffffffffffff}_3))
)
(analyze_scalar_evolution 
  (loop_nb = 3)
  (scalar = 1)
(get_scalar_evolution 
  (scalar = 1)
  (scalar_evolution = 1))
)
(set_scalar_evolution 
  (scalar = ivtmp.553_471)
  (scalar_evolution = {255, +, 0x0ffffffffffffffff}_3))
)
(analyze_scalar_evolution 
  (loop_nb = 3)
  (scalar = 0)
(get_scalar_evolution 
  (scalar = 0)
  (scalar_evolution = 0))
)
Analyzing # of iterations of loop 3
  exit condition [255, + , 0x0ffffffffffffffff] != 0
  bounds on difference of bases: -255 ... -255
  result:
    # of iterations 255, bounded by 255
  (set_nb_iterations_in_loop = 255))
(get_loop_exit_condition 
  if (ivtmp.553_471 != 0))
Creating dr for __tmp[__i_473]
analyze_innermost: (analyze_scalar_evolution 
  (loop_nb = 3)
  (scalar = &__tmp)
(get_scalar_evolution 
  (scalar = &__tmp)
  (scalar_evolution = ))
)
(analyze_scalar_evolution 
  (loop_nb = 3)
  (scalar = (long unsigned int) __i_473)
(get_scalar_evolution 
  (scalar = (long unsigned int) __i_473)
  (scalar_evolution = ))
(analyze_scalar_evolution 
  (loop_nb = 3)
  (scalar = __i_473)
(get_scalar_evolution 
  (scalar = __i_473)
  (scalar_evolution = ))
(analyze_initial_condition 
  (loop_phi_node = 
__i_473 = PHI <0(21), __i_139(23)>)
  (init_cond = 0))
(analyze_evolution_in_loop 
  (loop_phi_node = __i_473 = PHI <0(21), __i_139(23)>)
(add_to_evolution 
  (loop_nb = 3)
  (chrec_before = 0)
  (to_add = 1)
  (res = {0, +, 1}_3))
  (evolution_function = {0, +, 1}_3))
(set_scalar_evolution 
  (scalar = __i_473)
  (scalar_evolution = {0, +, 1}_3))
)
)
success.
(analyze_scalar_evolution 
  (loop_nb = 3)
  (scalar = __i_473)
(get_scalar_evolution 
  (scalar = __i_473)
  (scalar_evolution = {0, +, 1}_3))
(set_scalar_evolution 
  (scalar = __i_473)
  (scalar_evolution = {0, +, 1}_3))
)
        base_address: &__tmp
        offset from base address: 0
        constant offset from base address: 0
        step: 1
        aligned to: 128
        base_object: __tmp[0]
        symbol tag: __tmp
(analyze_scalar_evolution 
  (loop_nb = 3)
  (scalar = ivtmp.553_1)
(get_scalar_evolution 
  (scalar = ivtmp.553_1)
  (scalar_evolution = {256, +, 0x0ffffffffffffffff}_3))
(set_scalar_evolution 
  (scalar = ivtmp.553_1)
  (scalar_evolution = {256, +, 0x0ffffffffffffffff}_3))
)
(analyze_scalar_evolution 
  (loop_nb = 3)
  (scalar = __i_473)
(get_scalar_evolution 
  (scalar = __i_473)
  (scalar_evolution = {0, +, 1}_3))
(set_scalar_evolution 
  (scalar = __i_473)
  (scalar_evolution = {0, +, 1}_3))
)

/usr/local/gcc43/lib/gcc/powerpc64-unknown-linux-gnu/4.3.0/../../../../include/c++/4.3.0/bits/locale_facets.h:1168:
note: === vect_analyze_slp ===
/usr/local/gcc43/lib/gcc/powerpc64-unknown-linux-gnu/4.3.0/../../../../include/c++/4.3.0/bits/locale_facets.h:1168:
note: === vect_make_slp_decision ===
/usr/local/gcc43/lib/gcc/powerpc64-unknown-linux-gnu/4.3.0/../../../../include/c++/4.3.0/bits/locale_facets.h:1168:
note: === vect_detect_hybrid_slp ===(analyze_scalar_evolution 
  (loop_nb = 3)
  (scalar = ivtmp.553_1)
(get_scalar_evolution 
  (scalar = ivtmp.553_1)
  (scalar_evolution = {256, +, 0x0ffffffffffffffff}_3))
(set_scalar_evolution 
  (scalar = ivtmp.553_1)
  (scalar_evolution = {256, +, 0x0ffffffffffffffff}_3))
)
(instantiate_parameters 
  (loop_nb = 3)
  (chrec = {256, +, 0x0ffffffffffffffff}_3)
  (res = {256, +, 0x0ffffffffffffffff}_3))
(analyze_scalar_evolution 
  (loop_nb = 3)
  (scalar = __i_473)
(get_scalar_evolution 
  (scalar = __i_473)
  (scalar_evolution = {0, +, 1}_3))
(set_scalar_evolution 
  (scalar = __i_473)
  (scalar_evolution = {0, +, 1}_3))
)
(instantiate_parameters 
  (loop_nb = 3)
  (chrec = {0, +, 1}_3)
  (res = {0, +, 1}_3))
(get_loop_exit_condition 
  if (ivtmp.553_471 != 0))

/usr/local/gcc43/lib/gcc/powerpc64-unknown-linux-gnu/4.3.0/../../../../include/c++/4.3.0/bits/locale_facets.h:1168:
note: not vectorized: relevant stmt not supported: D.25590_137 = (char)
__i_473(get_loop_exit_condition 
  if (ivtmp.554_138 != 0))
(number_of_iterations_in_loop
(analyze_scalar_evolution 
  (loop_nb = 2)
  (scalar = ivtmp.554_138)
(get_scalar_evolution 
  (scalar = ivtmp.554_138)
  (scalar_evolution = ))
(analyze_scalar_evolution 
  (loop_nb = 2)
  (scalar = ivtmp.554_469)
(get_scalar_evolution 
  (scalar = ivtmp.554_469)
  (scalar_evolution = ))
(analyze_initial_condition 
  (loop_phi_node = 
ivtmp.554_469 = PHI <256(11), ivtmp.554_138(13)>)
  (init_cond = 256))
(analyze_evolution_in_loop 
  (loop_phi_node = ivtmp.554_469 = PHI <256(11), ivtmp.554_138(13)>)
(add_to_evolution 
  (loop_nb = 2)
  (chrec_before = 256)
  (to_add = 1)
  (res = {256, +, 0x0ffffffffffffffff}_2))
  (evolution_function = {256, +, 0x0ffffffffffffffff}_2))
(set_scalar_evolution 
  (scalar = ivtmp.554_469)
  (scalar_evolution = {256, +, 0x0ffffffffffffffff}_2))
)
(analyze_scalar_evolution 
  (loop_nb = 2)
  (scalar = 1)
(get_scalar_evolution 
  (scalar = 1)
  (scalar_evolution = 1))
)
(set_scalar_evolution 
  (scalar = ivtmp.554_138)
  (scalar_evolution = {255, +, 0x0ffffffffffffffff}_2))
)
(analyze_scalar_evolution 
  (loop_nb = 2)
  (scalar = 0)
(get_scalar_evolution 
  (scalar = 0)
  (scalar_evolution = 0))
)
Analyzing # of iterations of loop 2
  exit condition [255, + , 0x0ffffffffffffffff] != 0
  bounds on difference of bases: -255 ... -255
  result:
    # of iterations 255, bounded by 255
  (set_nb_iterations_in_loop = 255))
(get_loop_exit_condition 
  if (ivtmp.554_138 != 0))
Creating dr for __tmp[__i_477]
analyze_innermost: (analyze_scalar_evolution 
  (loop_nb = 2)
  (scalar = &__tmp)
(get_scalar_evolution 
  (scalar = &__tmp)
  (scalar_evolution = ))
)
(analyze_scalar_evolution 
  (loop_nb = 2)
  (scalar = (long unsigned int) __i_477)
(get_scalar_evolution 
  (scalar = (long unsigned int) __i_477)
  (scalar_evolution = ))
(analyze_scalar_evolution 
  (loop_nb = 2)
  (scalar = __i_477)
(get_scalar_evolution 
  (scalar = __i_477)
  (scalar_evolution = ))
(analyze_initial_condition 
  (loop_phi_node = 
__i_477 = PHI <0(11), __i_96(13)>)
  (init_cond = 0))
(analyze_evolution_in_loop 
  (loop_phi_node = __i_477 = PHI <0(11), __i_96(13)>)
(add_to_evolution 
  (loop_nb = 2)
  (chrec_before = 0)
  (to_add = 1)
  (res = {0, +, 1}_2))
  (evolution_function = {0, +, 1}_2))
(set_scalar_evolution 
  (scalar = __i_477)
  (scalar_evolution = {0, +, 1}_2))
)
)
success.
(analyze_scalar_evolution 
  (loop_nb = 2)
  (scalar = __i_477)
(get_scalar_evolution 
  (scalar = __i_477)
  (scalar_evolution = {0, +, 1}_2))
(set_scalar_evolution 
  (scalar = __i_477)
  (scalar_evolution = {0, +, 1}_2))
)
        base_address: &__tmp
        offset from base address: 0
        constant offset from base address: 0
        step: 1
        aligned to: 128
        base_object: __tmp[0]
        symbol tag: __tmp
(analyze_scalar_evolution 
  (loop_nb = 2)
  (scalar = ivtmp.554_469)
(get_scalar_evolution 
  (scalar = ivtmp.554_469)
  (scalar_evolution = {256, +, 0x0ffffffffffffffff}_2))
(set_scalar_evolution 
  (scalar = ivtmp.554_469)
  (scalar_evolution = {256, +, 0x0ffffffffffffffff}_2))
)
(analyze_scalar_evolution 
  (loop_nb = 2)
  (scalar = __i_477)
(get_scalar_evolution 
  (scalar = __i_477)
  (scalar_evolution = {0, +, 1}_2))
(set_scalar_evolution 
  (scalar = __i_477)
  (scalar_evolution = {0, +, 1}_2))
)

/usr/local/gcc43/lib/gcc/powerpc64-unknown-linux-gnu/4.3.0/../../../../include/c++/4.3.0/bits/locale_facets.h:1168:
note: === vect_analyze_slp ===
/usr/local/gcc43/lib/gcc/powerpc64-unknown-linux-gnu/4.3.0/../../../../include/c++/4.3.0/bits/locale_facets.h:1168:
note: === vect_make_slp_decision ===
/usr/local/gcc43/lib/gcc/powerpc64-unknown-linux-gnu/4.3.0/../../../../include/c++/4.3.0/bits/locale_facets.h:1168:
note: === vect_detect_hybrid_slp ===(analyze_scalar_evolution 
  (loop_nb = 2)
  (scalar = ivtmp.554_469)
(get_scalar_evolution 
  (scalar = ivtmp.554_469)
  (scalar_evolution = {256, +, 0x0ffffffffffffffff}_2))
(set_scalar_evolution 
  (scalar = ivtmp.554_469)
  (scalar_evolution = {256, +, 0x0ffffffffffffffff}_2))
)
(instantiate_parameters 
  (loop_nb = 2)
  (chrec = {256, +, 0x0ffffffffffffffff}_2)
  (res = {256, +, 0x0ffffffffffffffff}_2))
(analyze_scalar_evolution 
  (loop_nb = 2)
  (scalar = __i_477)
(get_scalar_evolution 
  (scalar = __i_477)
  (scalar_evolution = {0, +, 1}_2))
(set_scalar_evolution 
  (scalar = __i_477)
  (scalar_evolution = {0, +, 1}_2))
)
(instantiate_parameters 
  (loop_nb = 2)
  (chrec = {0, +, 1}_2)
  (res = {0, +, 1}_2))
(get_loop_exit_condition 
  if (ivtmp.554_138 != 0))

/usr/local/gcc43/lib/gcc/powerpc64-unknown-linux-gnu/4.3.0/../../../../include/c++/4.3.0/bits/locale_facets.h:1168:
note: not vectorized: relevant stmt not supported: D.25541_94 = (char)
__i_477(get_loop_exit_condition 
  if (it_28 < m_nSamples_45))
(number_of_iterations_in_loop
(analyze_scalar_evolution 
  (loop_nb = 1)
  (scalar = it_28)
(get_scalar_evolution 
  (scalar = it_28)
  (scalar_evolution = ))
(analyze_scalar_evolution 
  (loop_nb = 1)
  (scalar = it_476)
(get_scalar_evolution 
  (scalar = it_476)
  (scalar_evolution = ))
(analyze_initial_condition 
  (loop_phi_node = 
it_476 = PHI <it_28(5), 0(3)>)
  (init_cond = 0))
(analyze_evolution_in_loop 
  (loop_phi_node = it_476 = PHI <it_28(5), 0(3)>)
(add_to_evolution 
  (loop_nb = 1)
  (chrec_before = 0)
  (to_add = 1)
  (res = {0, +, 1}_1))
  (evolution_function = {0, +, 1}_1))
(set_scalar_evolution 
  (scalar = it_476)
  (scalar_evolution = {0, +, 1}_1))
)
(analyze_scalar_evolution 
  (loop_nb = 1)
  (scalar = 1)
(get_scalar_evolution 
  (scalar = 1)
  (scalar_evolution = 1))
)
(set_scalar_evolution 
  (scalar = it_28)
  (scalar_evolution = {1, +, 1}_1))
)
(analyze_scalar_evolution 
  (loop_nb = 1)
  (scalar = m_nSamples_45)
(get_scalar_evolution 
  (scalar = m_nSamples_45)
  (scalar_evolution = ))
)
Analyzing # of iterations of loop 1
  exit condition [1, + , 1](no_overflow) < (int) D.24890_44
  bounds on difference of bases: 0 ... 2147483646
  result:
    # of iterations (unsigned int) D.24890_44 + 4294967295, bounded by
2147483646
(instantiate_parameters 
  (loop_nb = 1)
  (chrec = (unsigned int) D.24890_44 + 4294967295)
(analyze_scalar_evolution 
  (loop_nb = 0)
  (scalar = D.24890_44)
(get_scalar_evolution 
  (scalar = D.24890_44)
  (scalar_evolution = ))
(set_scalar_evolution 
  (scalar = D.24890_44)
  (scalar_evolution = D.24890_44))
)
  (res = (unsigned int) D.24890_44 + 4294967295))
  (set_nb_iterations_in_loop = (unsigned int) D.24890_44 + 4294967295))
(get_loop_exit_condition 
  if (it_28 < m_nSamples_45))
Creating dr for *D.22306_22
analyze_innermost: (analyze_scalar_evolution 
  (loop_nb = 1)
  (scalar = (float *) D.22306_22)
(get_scalar_evolution 
  (scalar = (float *) D.22306_22)
  (scalar_evolution = ))
(analyze_scalar_evolution 
  (loop_nb = 1)
  (scalar = D.22306_22)
(get_scalar_evolution 
  (scalar = D.22306_22)
  (scalar_evolution = ))
(analyze_scalar_evolution 
  (loop_nb = 1)
  (scalar = pSum_18)
(get_scalar_evolution 
  (scalar = pSum_18)
  (scalar_evolution = ))
)
(analyze_scalar_evolution 
  (loop_nb = 1)
  (scalar = D.22305_21)
(get_scalar_evolution 
  (scalar = D.22305_21)
  (scalar_evolution = ))
(analyze_scalar_evolution 
  (loop_nb = 1)
  (scalar = D.22304_20)
(get_scalar_evolution 
  (scalar = D.22304_20)
  (scalar_evolution = ))
(analyze_scalar_evolution 
  (loop_nb = 1)
  (scalar = it_476)
(get_scalar_evolution 
  (scalar = it_476)
  (scalar_evolution = {0, +, 1}_1))
(set_scalar_evolution 
  (scalar = it_476)
  (scalar_evolution = {0, +, 1}_1))
)
(set_scalar_evolution 
  (scalar = D.22304_20)
  (scalar_evolution = {0, +, 1}_1))
)
(analyze_scalar_evolution 
  (loop_nb = 1)
  (scalar = 4)
(get_scalar_evolution 
  (scalar = 4)
  (scalar_evolution = 4))
)
(set_scalar_evolution 
  (scalar = D.22305_21)
  (scalar_evolution = {0, +, 4}_1))
)
(set_scalar_evolution 
  (scalar = D.22306_22)
  (scalar_evolution = {pSum_18, +, 4}_1))
)
)
success.
(analyze_scalar_evolution 
  (loop_nb = 1)
  (scalar = D.22306_22)
(get_scalar_evolution 
  (scalar = D.22306_22)
  (scalar_evolution = {pSum_18, +, 4}_1))
(set_scalar_evolution 
  (scalar = D.22306_22)
  (scalar_evolution = {pSum_18, +, 4}_1))
)
        base_address: D.22299_17
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *(ARRTYPE *) D.22299_17
        symbol tag: SMT.506
Creating dr for *D.22309_27
analyze_innermost: (analyze_scalar_evolution 
  (loop_nb = 1)
  (scalar = (float *) D.22309_27)
(get_scalar_evolution 
  (scalar = (float *) D.22309_27)
  (scalar_evolution = ))
(analyze_scalar_evolution 
  (loop_nb = 1)
  (scalar = D.22309_27)
(get_scalar_evolution 
  (scalar = D.22309_27)
  (scalar_evolution = ))
(analyze_scalar_evolution 
  (loop_nb = 1)
  (scalar = pSum1_16)
(get_scalar_evolution 
  (scalar = pSum1_16)
  (scalar_evolution = ))
)
(analyze_scalar_evolution 
  (loop_nb = 1)
  (scalar = D.22305_21)
(get_scalar_evolution 
  (scalar = D.22305_21)
  (scalar_evolution = {0, +, 4}_1))
(set_scalar_evolution 
  (scalar = D.22305_21)
  (scalar_evolution = {0, +, 4}_1))
)
(set_scalar_evolution 
  (scalar = D.22309_27)
  (scalar_evolution = {pSum1_16, +, 4}_1))
)
)
success.
(analyze_scalar_evolution 
  (loop_nb = 1)
  (scalar = D.22309_27)
(get_scalar_evolution 
  (scalar = D.22309_27)
  (scalar_evolution = {pSum1_16, +, 4}_1))
(set_scalar_evolution 
  (scalar = D.22309_27)
  (scalar_evolution = {pSum1_16, +, 4}_1))
)
        base_address: D.22298_15
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *(ARRTYPE *) D.22298_15
        symbol tag: SMT.506
(compute_affine_dependence
  (stmt_a = 
*D.22306_22 = D.22308_24)
  (stmt_b = 
*D.22309_27 = D.22312_30)
)
(analyze_scalar_evolution 
  (loop_nb = 1)
  (scalar = it_476)
(get_scalar_evolution 
  (scalar = it_476)
  (scalar_evolution = {0, +, 1}_1))
(set_scalar_evolution 
  (scalar = it_476)
  (scalar_evolution = {0, +, 1}_1))
)

Eyal.cpp:17: note: versioning for alias required: can't determine dependence
between *D.22306_22 and *D.22309_27
Eyal.cpp:17: note: mark for run-time aliasing test between *D.22306_22 and
*D.22309_27
Eyal.cpp:17: note: === vect_analyze_slp ===
Eyal.cpp:17: note: === vect_make_slp_decision ===
Eyal.cpp:17: note: === vect_detect_hybrid_slp ===
Eyal.cpp:17: note: Alignment of access forced using versioning.
Eyal.cpp:17: note: Alignment of access forced using versioning.
Eyal.cpp:17: note: not vectorized: relevant stmt not supported: D.22307_23 =
it_476 / itBegin_47
Eyal.cpp:9: note: vectorized 0 loops in function.
int main(int, char**) (argc, argv)
{
  size_t ivtmp.554;
  size_t ivtmp.553;
  const struct ctype & D.25583;
  const struct ctype & D.25583;
  char D.25590;
  int (*__vtbl_ptr_type) (void) * D.25591;
  int (*__vtbl_ptr_type) (void) * D.25592;
  int (*__vtbl_ptr_type) (void) D.25593;
  char * D.25594;
  int D.25595;
  char __tmp[256];
  size_t __i;
  int (*__vtbl_ptr_type) (void) D.25598;
  int (*__vtbl_ptr_type) (void) * D.25597;
  int (*__vtbl_ptr_type) (void) * D.25596;
  char D.25588;
  char D.25587;
  char D.25587;
  struct basic_ios * __os.63;
  int (*__vtbl_ptr_type) (void) * D.25573;
  int (*__vtbl_ptr_type) (void) * D.25574;
  long int * D.25575;
  long int D.25576;
  long unsigned int D.25577;
  struct basic_ios * D.25578;
  struct basic_ostream & D.25580;
  const struct ctype & D.25534;
  const struct ctype & D.25534;
  char D.25541;
  int (*__vtbl_ptr_type) (void) * D.25542;
  int (*__vtbl_ptr_type) (void) * D.25543;
  int (*__vtbl_ptr_type) (void) D.25544;
  char * D.25545;
  int D.25546;
  char __tmp[256];
  size_t __i;
  int (*__vtbl_ptr_type) (void) D.25549;
  int (*__vtbl_ptr_type) (void) * D.25548;
  int (*__vtbl_ptr_type) (void) * D.25547;
  char D.25539;
  char D.25538;
  char D.25538;
  struct basic_ios * __os.63;
  int (*__vtbl_ptr_type) (void) * D.25524;
  int (*__vtbl_ptr_type) (void) * D.25525;
  long int * D.25526;
  long int D.25527;
  long unsigned int D.25528;
  struct basic_ios * D.25529;
  struct basic_ostream & D.25531;
  double D.24917;
  struct basic_ostream & D.24916;
  struct basic_ostream & D.24916;
  double D.24908;
  struct basic_ostream & D.24907;
  struct basic_ostream & D.24907;
  long int D.24898;
  long int D.24898;
  long int D.24894;
  long int D.24894;
  long int D.24890;
  long int D.24890;
  int it;
  ARRTYPE * pSum;
  ARRTYPE * pSum1;
  int itEnd;
  int itBegin;
  int m_nSamples;
  float D.22319;
  float D.22315;
  float D.22312;
  int D.22311;
  ARRTYPE * D.22309;
  float D.22308;
  int D.22307;
  ARRTYPE * D.22306;
  long unsigned int D.22305;
  long unsigned int D.22304;
  void * D.22299;
  void * D.22298;
  char * D.22297;
  char * * D.22296;
  char * D.22295;
  char * * D.22294;
  char * D.22293;
  char * * D.22292;
  char * D.22291;
  char * * D.22290;

<bb 2>:
  D.22290_3 = argv_2(D) + 8;
  D.22291_4 = *D.22290_3;
  D.24890_44 = __strtol_internal (D.22291_4, 0B, 10, 0);
  m_nSamples_45 = (int) D.24890_44;
  D.22292_6 = argv_2(D) + 16;
  D.22293_7 = *D.22292_6;
  D.24894_46 = __strtol_internal (D.22293_7, 0B, 10, 0);
  itBegin_47 = (int) D.24894_46;
  D.22294_9 = argv_2(D) + 24;
  D.22295_10 = *D.22294_9;
  D.24898_48 = __strtol_internal (D.22295_10, 0B, 10, 0);
  D.22296_12 = argv_2(D) + 32;
  D.22297_13 = *D.22296_12;
  __strtol_internal (D.22297_13, 0B, 10, 0);
  D.22298_15 = operator new [] (400000);
  pSum1_16 = (ARRTYPE *) D.22298_15;
  D.22299_17 = operator new [] (400000);
  pSum_18 = (ARRTYPE *) D.22299_17;
  if (m_nSamples_45 > 0)
    goto <bb 3>;
  else
    goto <bb 7>;

<bb 3>:

<bb 4>:
  # it_476 = PHI <it_28(5), 0(3)>
  D.22304_20 = (long unsigned int) it_476;
  D.22305_21 = D.22304_20 * 4;
  D.22306_22 = pSum_18 + D.22305_21;
  D.22307_23 = it_476 / itBegin_47;
  D.22308_24 = (float) D.22307_23;
  *D.22306_22 = D.22308_24;
  D.22309_27 = pSum1_16 + D.22305_21;
  it_28 = it_476 + 1;
  D.22311_29 = itBegin_47 / it_28;
  D.22312_30 = (float) D.22311_29;
  *D.22309_27 = D.22312_30;
  if (it_28 < m_nSamples_45)
    goto <bb 5>;
  else
    goto <bb 6>;

<bb 5>:
  goto <bb 4>;

<bb 6>:

<bb 7>:
  itEnd_49 = (int) D.24898_48;
  Calc (pSum_18, pSum1_16, 0B, 0B, m_nSamples_45, itBegin_47, itEnd_49);
  __ostream_insert (&cout, &"pVec1[10]  = "[0], 13);
  D.22315_36 ={v} *40B;
  D.24908_52 = (double) D.22315_36;
  D.24907_53 = _M_insert (&cout, D.24908_52);
  __os.63_80 = (struct basic_ios *) D.24907_53;
  D.25524_81 = D.24907_53->_vptr.basic_ostream;
  D.25525_82 = D.25524_81 + -24;
  D.25526_83 = (long int *) D.25525_82;
  D.25527_84 = *D.25526_83;
  D.25528_85 = (long unsigned int) D.25527_84;
  D.25529_86 = __os.63_80 + D.25528_85;
  D.25534_90 = D.25529_86->_M_ctype;
  if (D.25534_90 == 0B)
    goto <bb 8>;
  else
    goto <bb 9>;

<bb 8>:
  __throw_bad_cast ();

<bb 9>:
  D.25539_91 = D.25534_90->_M_widen_ok;
  if (D.25539_91 != 0)
    goto <bb 10>;
  else
    goto <bb 11>;

<bb 10>:
  D.25538_93 = D.25534_90->_M_widen[10];
  goto <bb 17>;

<bb 11>:

<bb 12>:
  # ivtmp.554_469 = PHI <256(11), ivtmp.554_138(13)>
  # __i_477 = PHI <0(11), __i_96(13)>
  D.25541_94 = (char) __i_477;
  __tmp[__i_477] = D.25541_94;
  __i_96 = __i_477 + 1;
  ivtmp.554_138 = ivtmp.554_469 - 1;
  if (ivtmp.554_138 != 0)
    goto <bb 13>;
  else
    goto <bb 14>;

<bb 13>:
  goto <bb 12>;

<bb 14>:
  D.25542_97 = D.25534_90->D.15856._vptr.facet;
  D.25543_98 = D.25542_97 + 56;
  D.25544_99 = *D.25543_98;
  D.25545_100 = &D.25534_90->_M_widen[0];
  OBJ_TYPE_REF(D.25544_99;D.25534_90->7) (D.25534_90, &__tmp[0], &__tmp[256],
D.25545_100);
  D.25534_90->_M_widen_ok = 1;
  D.25546_102 = __builtin_memcmp (&__tmp[0], D.25545_100, 256);
  if (D.25546_102 != 0)
    goto <bb 15>;
  else
    goto <bb 16>;

<bb 15>:
  D.25534_90->_M_widen_ok = 2;

<bb 16>:
  D.25547_103 = D.25534_90->D.15856._vptr.facet;
  D.25548_104 = D.25547_103 + 48;
  D.25549_105 = *D.25548_104;
  D.25538_106 = OBJ_TYPE_REF(D.25549_105;D.25534_90->6) (D.25534_90, 10);

<bb 17>:
  # D.25538_107 = PHI <D.25538_93(10), D.25538_106(16)>
  D.25531_88 = put (D.24907_53, D.25538_107);
  flush (D.25531_88);
  __ostream_insert (&cout, &"pVec1[102]  = "[0], 14);
  D.22319_40 ={v} *408B;
  D.24917_55 = (double) D.22319_40;
  D.24916_56 = _M_insert (&cout, D.24917_55);
  __os.63_123 = (struct basic_ios *) D.24916_56;
  D.25573_124 = D.24916_56->_vptr.basic_ostream;
  D.25574_125 = D.25573_124 + -24;
  D.25575_126 = (long int *) D.25574_125;
  D.25576_127 = *D.25575_126;
  D.25577_128 = (long unsigned int) D.25576_127;
  D.25578_129 = __os.63_123 + D.25577_128;
  D.25583_133 = D.25578_129->_M_ctype;
  if (D.25583_133 == 0B)
    goto <bb 18>;
  else
    goto <bb 19>;

<bb 18>:
  __throw_bad_cast ();

<bb 19>:
  D.25588_134 = D.25583_133->_M_widen_ok;
  if (D.25588_134 != 0)
    goto <bb 20>;
  else
    goto <bb 21>;

<bb 20>:
  D.25587_136 = D.25583_133->_M_widen[10];
  goto <bb 27>;

<bb 21>:

<bb 22>:
  # ivtmp.553_1 = PHI <256(21), ivtmp.553_471(23)>
  # __i_473 = PHI <0(21), __i_139(23)>
  D.25590_137 = (char) __i_473;
  __tmp[__i_473] = D.25590_137;
  __i_139 = __i_473 + 1;
  ivtmp.553_471 = ivtmp.553_1 - 1;
  if (ivtmp.553_471 != 0)
    goto <bb 23>;
  else
    goto <bb 24>;

<bb 23>:
  goto <bb 22>;

<bb 24>:
  D.25591_140 = D.25583_133->D.15856._vptr.facet;
  D.25592_141 = D.25591_140 + 56;
  D.25593_142 = *D.25592_141;
  D.25594_143 = &D.25583_133->_M_widen[0];
  OBJ_TYPE_REF(D.25593_142;D.25583_133->7) (D.25583_133, &__tmp[0],
&__tmp[256], D.25594_143);
  D.25583_133->_M_widen_ok = 1;
  D.25595_145 = __builtin_memcmp (&__tmp[0], D.25594_143, 256);
  if (D.25595_145 != 0)
    goto <bb 25>;
  else
    goto <bb 26>;

<bb 25>:
  D.25583_133->_M_widen_ok = 2;

<bb 26>:
  D.25596_146 = D.25583_133->D.15856._vptr.facet;
  D.25597_147 = D.25596_146 + 48;
  D.25598_148 = *D.25597_147;
  D.25587_149 = OBJ_TYPE_REF(D.25598_148;D.25583_133->6) (D.25583_133, 10);

<bb 27>:
  # D.25587_150 = PHI <D.25587_136(20), D.25587_149(26)>
  D.25580_131 = put (D.24916_56, D.25587_150);
  flush (D.25580_131);
  free (0B);
  free (0B);
  return 0;

}


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (19 preceding siblings ...)
  2008-02-10  7:57 ` eyal at geomage dot com
@ 2008-02-10 13:49 ` eyal at geomage dot com
  2008-02-10 15:07 ` victork at gcc dot gnu dot org
                   ` (12 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: eyal at geomage dot com @ 2008-02-10 13:49 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #21 from eyal at geomage dot com  2008-02-10 13:48 -------
(In reply to comment #14)
> Giving it another thought, this is not necessarily an alias-analysis issue,
> even though it fails to tell that the pointers do not alias. Since in this
> case the pointers do differ, the runtime test should take the flow to the
> vectorized loop. Maybe the test is too strict. I'll look into this on Sunday.

Hi,
 Any update on this matter?

thanks
eyal


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (20 preceding siblings ...)
  2008-02-10 13:49 ` eyal at geomage dot com
@ 2008-02-10 15:07 ` victork at gcc dot gnu dot org
  2008-02-10 15:48 ` eyal at geomage dot com
                   ` (11 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: victork at gcc dot gnu dot org @ 2008-02-10 15:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #22 from victork at gcc dot gnu dot org  2008-02-10 15:06 -------
1. It looks like the vectorizer was enabled in both cases, since -O3 enables
the vectorizer by default. You need to add -fno-tree-vectorize to disable it
explicitly.

2. To get better results from the vectorized version, I would recommend
allocating the arrays on 16-byte-aligned boundaries and letting the compiler
know this. You can do it by allocating the arrays statically:

  float pSum1[64000] __attribute__ ((__aligned__(16)));
  float pSum[64000] __attribute__ ((__aligned__(16)));
  float pVec1[64000] __attribute__ ((__aligned__(16)));

3. It would be better if "itBegin" started from 0 and were known at compile
time. This and [2] would allow the vectorizer to avoid realigning loads.

4. For some strange reason the run time of this test can vary significantly (up
to 50%) from run to run. So be sure to run it several times.

-- Victor.


-- 

victork at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |victork at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (21 preceding siblings ...)
  2008-02-10 15:07 ` victork at gcc dot gnu dot org
@ 2008-02-10 15:48 ` eyal at geomage dot com
  2008-02-11 12:24 ` victork at gcc dot gnu dot org
                   ` (10 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: eyal at geomage dot com @ 2008-02-10 15:48 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #23 from eyal at geomage dot com  2008-02-10 15:47 -------
(In reply to comment #22)
> 1. It looks like vectorizer was enabled in both cases, since -O3 enables the
> vectorizer by the default. You need to add -fno-tree-vectorize to disable it
> explicitly.
> 2. To get better results from vectorized version I would recommend to allocate
> arrays at boundaries aligned to 16 byte and let to the compiler to know this.
> You can do it by static allocation of arrays:
>   float pSum1[64000] __attribute__ ((__aligned__(16)));
>   float pSum[64000] __attribute__ ((__aligned__(16)));
>   float pVec1[64000] __attribute__ ((__aligned__(16)));
> 3. It would be better if "itBegin" will start from 0 and be known at compile
> time. This and [2] will allow to vectorizer to save realigning loads.
> 4. For some strange reason the run time of this test can vary significantly (up
> to 50%) from run to run. So be sure to run it several times.
> -- Victor.

Hi,
  Item 2 is problematic, as the data can vary a lot and I can't use static
arrays. I'm also willing to pay a "reasonable" price for the extra alignment
work.
  Item 3: I can't make itBegin start from zero, since this is how the formula
we're using works. It's calculated every time and can vary in value.
  Item 4: I saw consistent results every time I ran it.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (22 preceding siblings ...)
  2008-02-10 15:48 ` eyal at geomage dot com
@ 2008-02-11 12:24 ` victork at gcc dot gnu dot org
  2008-02-11 13:36 ` irar at il dot ibm dot com
                   ` (9 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: victork at gcc dot gnu dot org @ 2008-02-11 12:24 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #24 from victork at gcc dot gnu dot org  2008-02-11 12:23 -------
Hi,

Here are some more of my observations.
1. For some unclear reason there is indeed not much difference between the
vectorized and non-vectorized versions for long runs like "time ./TestNoVec
92200 8 89720 1000", but the difference is much more apparent for shorter
runs:

victork@white:~> time ./mnovec 30000 8 29720 1000
real    0m1.738s
user    0m1.723s
sys     0m0.004s

victork@white:~> time ./mvec 30000 8 29720 1000
real    0m0.781s
user    0m0.778s
sys     0m0.003s

2. If you replace new() with malloc(), static dependence analysis can prove
independence between pSum, pSum1 and pVec1 at compile time, so run-time
versioning is not required.

3. If we leave the buffers allocated by new(), then the compiler uses
"versioning for alias", and this forces the use of versioning for alignment to
prove the correct alignment of the store to pVec1. This is less optimal than
loop peeling, since the vectorized version of the loop is executed only for
values of itBegin that are a multiple of 4.

Here is the version of your program I used to get the above results:

#include <iostream>
#include <stdio.h>
#include <stdlib.h>

typedef float ARRTYPE;
int main ( int argc, char *argv[] )
{
        int m_nSamples = atoi( argv[1] );
        int itBegin = atoi( argv[2] );
        int itEnd = atoi( argv[3] );
        int iSizeMain = atoi( argv[ 4 ] );
        ARRTYPE *pSum1 = (ARRTYPE*) malloc (sizeof(ARRTYPE) *100000);
        ARRTYPE *pSum = (ARRTYPE*) malloc (sizeof(ARRTYPE) *100000);
        for ( int it = 0; it < m_nSamples; it++ )
        {
                pSum[ it ] = it / itBegin;
                pSum1[ it ] = itBegin / ( it + 1 );
        }
        ARRTYPE *pVec1 = (ARRTYPE*) malloc (sizeof(ARRTYPE) *m_nSamples);

        for ( int i = 0, j = 0; i < m_nSamples - 5; i++ )
        {
            for( int it = itBegin; it < itEnd; it++ )
                pVec1[ it ] += pSum[ it ] + pSum1[ it ];
        }
        free( pVec1 );
}

victork@white:~> $g -O3 -fno-tree-vectorize -m64 -o mnovec m.c
victork@white:~> $g -O3 -fdump-tree-vect-details -ftree-vectorize -maltivec
-m64 -o mvec m.c
victork@white:~> time ./mnovec 30000 1 29720 1000

real    0m1.754s
user    0m1.750s
sys     0m0.003s
victork@white:~> time ./mvec 30000 1 29720 1000

real    0m0.781s
user    0m0.778s
sys     0m0.003s


-- Victor


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (23 preceding siblings ...)
  2008-02-11 12:24 ` victork at gcc dot gnu dot org
@ 2008-02-11 13:36 ` irar at il dot ibm dot com
  2008-02-11 13:42 ` victork at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: irar at il dot ibm dot com @ 2008-02-11 13:36 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #25 from irar at il dot ibm dot com  2008-02-11 13:35 -------
(In reply to comment #21)
> (In reply to comment #14)
> > Giving it another thought, this is not necessarily an alias-analysis issue,
> > even though it fails to tell that the pointers do not alias. Since in this
> > case the pointers do differ, the runtime test should take the flow to the
> > vectorized loop. Maybe the test is too strict. I'll look into this on Sunday.
> 
> Hi,
>  Any update on this matter?
> 
> thanks
> eyal
> 

Yes, I asked Victor to look into this, since he implemented versioning for
aliasing in the vectorizer.

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (24 preceding siblings ...)
  2008-02-11 13:36 ` irar at il dot ibm dot com
@ 2008-02-11 13:42 ` victork at gcc dot gnu dot org
  2008-02-11 14:01 ` eyal at geomage dot com
                   ` (7 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: victork at gcc dot gnu dot org @ 2008-02-11 13:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #26 from victork at gcc dot gnu dot org  2008-02-11 13:41 -------
Probably the small difference between the vectorized and non-vectorized
versions can be explained by the fact that the big arrays do not fit in the
data cache. Here is a version of the original program which shows that a
more-than-twofold difference remains for long runs as well, if the arrays are
shrunk to fit the cache:

#include <iostream>
#include <stdio.h>
#include <stdlib.h>

typedef float ARRTYPE;
int main ( int argc, char *argv[] )
{
        int m_nSamples = atoi( argv[1] );
        int itBegin = atoi( argv[2] );
        int itEnd = atoi( argv[3] );
        int iSizeMain = atoi( argv[ 4 ] );
        ARRTYPE *pSum1 = (ARRTYPE*) malloc (sizeof(ARRTYPE) *100000);
        ARRTYPE *pSum = (ARRTYPE*) malloc (sizeof(ARRTYPE) *100000);
        for ( int it = 0; it < 100000; it++ )
        {
                pSum[ it ] = it / itBegin;
                pSum1[ it ] = itBegin / ( it + 1 );
        }
        ARRTYPE *pVec1 = (ARRTYPE*) malloc (sizeof(ARRTYPE) *100000);

        for ( int i = 0, j = 0; i < m_nSamples - 5; i++ )
        {
            for( int it = itBegin; it < itEnd; it++ )
                pVec1[ it ] += pSum[ it ] + pSum1[ it ];
        }
        free( pVec1 );
}

victork@white:~> $g -O3 -fdump-tree-vect-details -fno-tree-vectorize -m64 -o
mnovec m.c
victork@white:~> $g -O3 -fdump-tree-vect-details -ftree-vectorize -maltivec
-m64 -o mvec m.c

victork@white:~> time ./mnovec 400000 1 29720 1000

real    0m24.493s
user    0m24.483s
sys     0m0.007s
victork@white:~> time ./mvec 400000 1 29720 1000

real    0m10.777s
user    0m10.771s
sys     0m0.005s

-- Victor


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (25 preceding siblings ...)
  2008-02-11 13:42 ` victork at gcc dot gnu dot org
@ 2008-02-11 14:01 ` eyal at geomage dot com
  2008-02-11 14:22 ` victork at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: eyal at geomage dot com @ 2008-02-11 14:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #27 from eyal at geomage dot com  2008-02-11 14:00 -------
Hi,
  I am a bit lost and would appreciate your guidance. Up till now, after all
these emails, I still have no clue as to why such a simple test case doesn't
work. As far as I understood, vectorization should have been between 2 and 4
times faster. With all the suggestions here I still didn't get more than a
20-30% performance gain.
  I would appreciate it if someone from the vectorization team could come up
with a detailed explanation of how to make vectorization deliver what is
promised.

  As for the last email, Victor:
  1. Using a smaller number of iterations doesn't help me. This is not what
the real-world code runs.
  2. Switching from new to malloc did almost nothing, maybe a gain of 20%.
  3. The difference between 1.738s and 0.781s can be either a 2x performance
gain or simply a 1-second gain that would remain 1 second for more intensive
calculations. Therefore I can't rely on the test you did.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (26 preceding siblings ...)
  2008-02-11 14:01 ` eyal at geomage dot com
@ 2008-02-11 14:22 ` victork at gcc dot gnu dot org
  2008-02-11 16:30 ` dberlin at gcc dot gnu dot org
                   ` (5 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: victork at gcc dot gnu dot org @ 2008-02-11 14:22 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #28 from victork at gcc dot gnu dot org  2008-02-11 14:21 -------
>   As for the last email, Victor:
>   1. Using a smaller number of iterations, doesnt help me. This is not what the
> real world code runs.

It looks like the memory subsystem is the performance bottleneck in your
example; vectorization alone does not help. You probably need to think about
how to partition your arrays to fit the data cache.

>   2. new/malloc almost didnt do anything maybe a gain of 20%

With the data allocated by malloc, the compiler is able to prove independence
statically. So it would be better to allocate the memory with malloc.

>   3. The difference between 1.738sec and 0.781sec can either be a 2 times
> performance gain or simply a 1 second gain that would remain 1 second for more
> intensive calculations. Therefore I cant use/rely on the test you did.

See the example in my previous comment; it shows about a 2.4x performance gain.
-- Victor


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (27 preceding siblings ...)
  2008-02-11 14:22 ` victork at gcc dot gnu dot org
@ 2008-02-11 16:30 ` dberlin at gcc dot gnu dot org
  2008-02-12  8:43 ` eyal at geomage dot com
                   ` (4 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: dberlin at gcc dot gnu dot org @ 2008-02-11 16:30 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #29 from dberlin at gcc dot gnu dot org  2008-02-11 16:29 -------
Vectorization is not magic.
I'm also not sure where you got the idea that vectorization = magic speedup.
There is no real "expected performance gain" for memory-bound applications,
because the processor spends all of its time waiting on the memory subsystem,
not calculating things.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (28 preceding siblings ...)
  2008-02-11 16:30 ` dberlin at gcc dot gnu dot org
@ 2008-02-12  8:43 ` eyal at geomage dot com
  2008-02-12 10:52 ` victork at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: eyal at geomage dot com @ 2008-02-12  8:43 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #30 from eyal at geomage dot com  2008-02-12 08:43 -------
Hi,
  Thanks a lot for the input about a potential memory bottleneck. I was indeed
under the impression that once I got the loop vectorized, I'd immediately see
a performance boost.
  I would appreciate, however, a further explanation of this issue.
  After all, this is a very simple test case. I still don't understand why
there is such a huge difference when I run:
  time ./TestNoVec 92200 8 89720 1000
   real    0m23.549s

   time ./TestVec 92200 8 89720 1000
   real    0m22.845s

and when I run:
victork@white:~> time ./mnovec 400000 1 29720 1000

real    0m24.493s
user    0m24.483s
sys     0m0.007s
victork@white:~> time ./mvec 400000 1 29720 1000

real    0m10.777s
user    0m10.771s
sys     0m0.005s


I can't see from the code how those parameter differences affect the
performance so much. I'd appreciate your assistance again.

thanks
eyal


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (29 preceding siblings ...)
  2008-02-12  8:43 ` eyal at geomage dot com
@ 2008-02-12 10:52 ` victork at gcc dot gnu dot org
  2008-02-12 11:29 ` eyal at geomage dot com
                   ` (2 subsequent siblings)
  33 siblings, 0 replies; 35+ messages in thread
From: victork at gcc dot gnu dot org @ 2008-02-12 10:52 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #31 from victork at gcc dot gnu dot org  2008-02-12 10:51 -------
> I would appriciate, however, a further explaination about this issue.

The explanation has to do with CPU architecture and is not related to
compilers. On a cache miss, a memory load or store takes tens of CPU cycles,
instead of a few cycles on a cache hit.
When we run:
time ./mvec 400000 1 29720 1000
the program performs 400000 iterations of the outer loop and 29720 iterations
of the inner loop. The inner loop performs three loads and one store per
iteration. Starting from the second iteration of the outer loop, all 29720
elements of the arrays pSum, pSum1 and pVec1 will be in the cache, and from
that point on all accesses will be cache hits. (I assume the data cache is
big enough to contain all 29720*3 elements.)

Let's look at the slow run:
% time ./TestVec 92200 8 89720 1000
Here the program performs (89720-8) iterations of the inner loop, so in order
to get cache hits most of the time we need the cache to hold at least 89712*3
elements. Let's consider what happens if the cache is only half the required
size. After the first iteration of the outer loop completes, the cache will
hold the second half of the arrays' data. At the start of the second iteration
of the outer loop, all elements from the first half will already have been
evicted, since most caches use an LRU policy to choose victims. Considering
that the PPC970 is an out-of-order, multiple-issue architecture, we can guess
why the CPU has enough time to perform the arithmetic even in scalar form
without adding any overhead relative to the vectorized version of the inner
loop.



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117



* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (30 preceding siblings ...)
  2008-02-12 10:52 ` victork at gcc dot gnu dot org
@ 2008-02-12 11:29 ` eyal at geomage dot com
  2008-02-13 16:07 ` eyal at geomage dot com
  2008-02-14 13:42 ` victork at gcc dot gnu dot org
  33 siblings, 0 replies; 35+ messages in thread
From: eyal at geomage dot com @ 2008-02-12 11:29 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #32 from eyal at geomage dot com  2008-02-12 11:28 -------
(In reply to comment #31)
> > I would appreciate, however, a further explanation about this issue.
> The explanation has to do with the CPU architecture and is not related to the
> compiler.  On a cache miss, a memory load or store takes tens of CPU cycles
> instead of the few cycles it takes on a cache hit.
> When we run:
> time ./mvec 400000 1 29720 1000
> the program performs 400000 iterations of the outer loop and 29720 iterations
> of the inner loop. The inner loop performs 3 loads and one store per
> iteration. Starting from the second iteration of the outer loop, all 29720
> elements of the arrays pSum, pSum1 and pVec1 are in the cache, and from that
> point on all accesses are cache hits. (I assume the data cache is big enough
> to hold all 29720*3 elements.)
> Let's look at the slow run:
> % time ./TestVec 92200 8 89720 1000
> Here the program performs (89720-8) iterations of the inner loop, so in order
> to get cache hits most of the time we need a cache of at least 89712*3
> elements.  Consider what happens if the cache is only half that size.  After
> the first iteration of the outer loop completes, the cache holds the second
> half of the arrays' data.  At the start of the second iteration of the outer
> loop, every element from the first half has already been evicted, since most
> caches use an LRU policy to choose victims.  Considering that the PPC970 is an
> out-of-order, multiple-issue architecture, we can see why the CPU has enough
> time to perform the arithmetic even in scalar form without adding any overhead
> relative to the vectorized version of the inner loop.


Thanks a lot for the detailed explanation, Victor. I'll try to see if I can
break up the real code to be more memory-friendly.
Again, thanks a lot, guys.

eyal


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (31 preceding siblings ...)
  2008-02-12 11:29 ` eyal at geomage dot com
@ 2008-02-13 16:07 ` eyal at geomage dot com
  2008-02-14 13:42 ` victork at gcc dot gnu dot org
  33 siblings, 0 replies; 35+ messages in thread
From: eyal at geomage dot com @ 2008-02-13 16:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #33 from eyal at geomage dot com  2008-02-13 16:06 -------
Hi All,
  I've made some changes that should keep memory from being a performance
bottleneck, and I see a perf gain of ~10%. However, the compiler still gives
me the warnings from comment #19 -
Test.cpp:24: note: versioning for alias required: can't determine dependence
between *D.22312_54 and *D.22310_50
Test.cpp:24: note: mark for run-time aliasing test between *D.22312_54 and
*D.22310_50
Test.cpp:24: note: versioning for alias required: can't determine dependence
between *D.22314_58 and *D.22310_50
Test.cpp:24: note: mark for run-time aliasing test between *D.22314_58 and
*D.22310_50
Test.cpp:24: note: create runtime check for data references *D.22312_54 and
*D.22310_50
Test.cpp:24: note: create runtime check for data references *D.22314_58 and
*D.22310_50
Test.cpp:24: note: created 2 versioning for alias checks.
Test.cpp:24: note: LOOP VECTORIZED.(get_loop_exit_condition


How do I resolve these issues? They might be preventing the vectorized code
from running, which would explain why I don't see a bigger performance
improvement.
I'd appreciate any assistance...

thanks
eyal


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Bug c++/35117] Vectorization on power PC
  2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
                   ` (32 preceding siblings ...)
  2008-02-13 16:07 ` eyal at geomage dot com
@ 2008-02-14 13:42 ` victork at gcc dot gnu dot org
  33 siblings, 0 replies; 35+ messages in thread
From: victork at gcc dot gnu dot org @ 2008-02-14 13:42 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #34 from victork at gcc dot gnu dot org  2008-02-14 13:41 -------
> How do I resolve those issues? which might prevent from the vectorized code to
> run and therefore I dont see a bigger performance improvement?
> I'd appriciate any assistance...

This note is just informational, not a warning. It means the compiler cannot
prove that the data arrays which are candidates for vectorization are
independent. It's hard to say how to help the compiler prove independence
statically without seeing your source code. You can try the "restrict"
qualifier. Still, sometimes independence cannot be proven at compile time, and
a run-time check is necessary. This run-time check adds some overhead, but for
loops with a large enough iteration count the overhead becomes negligible.


-- 

victork at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2008-02-14 13:42 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-02-07  8:17 [Bug c++/35117] New: Vectorization on power PC eyal at geomage dot com
2008-02-07 10:30 ` [Bug c++/35117] " rguenth at gcc dot gnu dot org
2008-02-07 10:37 ` eyal at geomage dot com
2008-02-07 10:38 ` pinskia at gcc dot gnu dot org
2008-02-07 10:41 ` pinskia at gcc dot gnu dot org
2008-02-07 10:44 ` eyal at geomage dot com
2008-02-07 10:54 ` irar at il dot ibm dot com
2008-02-07 11:06 ` eyal at geomage dot com
2008-02-07 12:17 ` eyal at geomage dot com
2008-02-07 12:55 ` irar at il dot ibm dot com
2008-02-07 12:58 ` eyal at geomage dot com
2008-02-07 13:05 ` irar at il dot ibm dot com
2008-02-07 13:07 ` eyal at geomage dot com
2008-02-07 13:23 ` irar at il dot ibm dot com
2008-02-07 20:45 ` irar at il dot ibm dot com
2008-02-08  8:50 ` zaks at il dot ibm dot com
2008-02-08  8:55 ` eyal at geomage dot com
2008-02-08  8:58 ` eyal at geomage dot com
2008-02-10  7:31 ` eres at il dot ibm dot com
2008-02-10  7:42 ` eyal at geomage dot com
2008-02-10  7:57 ` eyal at geomage dot com
2008-02-10 13:49 ` eyal at geomage dot com
2008-02-10 15:07 ` victork at gcc dot gnu dot org
2008-02-10 15:48 ` eyal at geomage dot com
2008-02-11 12:24 ` victork at gcc dot gnu dot org
2008-02-11 13:36 ` irar at il dot ibm dot com
2008-02-11 13:42 ` victork at gcc dot gnu dot org
2008-02-11 14:01 ` eyal at geomage dot com
2008-02-11 14:22 ` victork at gcc dot gnu dot org
2008-02-11 16:30 ` dberlin at gcc dot gnu dot org
2008-02-12  8:43 ` eyal at geomage dot com
2008-02-12 10:52 ` victork at gcc dot gnu dot org
2008-02-12 11:29 ` eyal at geomage dot com
2008-02-13 16:07 ` eyal at geomage dot com
2008-02-14 13:42 ` victork at gcc dot gnu dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).