[Bug tree-optimization/50789] New: Gather vectorization

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/50789] New: Gather vectorization
@ 2011-10-19  7:51 jakub at gcc dot gnu.org
  2011-10-19  8:10 ` [Bug tree-optimization/50789] " rguenth at gcc dot gnu.org
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-10-19  7:51 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789

             Bug #: 50789
           Summary: Gather vectorization
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: jakub@gcc.gnu.org
                CC: hjl.tools@gmail.com, irar@gcc.gnu.org,
                    kirill.yukhin@intel.com

This is to track progress on vectorization using AVX2 v*gather* instructions.

The instructions allow plain unconditional gather, e.g.:
#define N 1024
float f[N];
int k[N];
float *l[N];
int **m[N];

float
f1 (void)
{
  int i;
  float g = 0.0;
  for (i = 0; i < N; i++)
    g += f[k[i]];
  return g;
}

float
f2 (float *p)
{
  int i;
  float g = 0.0;
  for (i = 0; i < N; i++)
    g += p[k[i]];
  return g;
}

float
f3 (void)
{
  int i;
  float g = 0.0;
  for (i = 0; i < N; i++)
    g += *l[i];
  return g;
}

int
f4 (void)
{
  int i;
  int g = 0;
  for (i = 0; i < N; i++)
    g += **m[i];
  return g;
}

We should be able to vectorize all 4 loops.  In f1/f2 it would use a non-zero
base (the vector would contain just indexes into some array, which vgather
sign-extends and adds to the base); in f3/f4 it would use a zero base, and the
vectors would be vectors of pointers (resp. uintptr_t).
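
As a rough illustration of the two addressing forms described above, here is a
plain C sketch that models a single 8-lane gather in scalar code.  The names
gather_base_index, gather_pointers and VL are made up for this illustration;
they are not GCC internals or intrinsics, and the lane count of 8 is an
assumption (eight floats in a 256-bit vector).

```c
#include <stddef.h>
#include <stdint.h>

#define VL 8   /* assumed lane count: eight floats per 256-bit vector */

/* f1/f2 style: non-zero base.  The vector holds 32-bit indexes that are
   sign-extended, scaled by the element size and added to the base.  */
void
gather_base_index (float *dst, const float *base, const int32_t *idx)
{
  for (int lane = 0; lane < VL; lane++)
    {
      ptrdiff_t byte_off = (ptrdiff_t) idx[lane] * (ptrdiff_t) sizeof (float);
      dst[lane] = *(const float *) ((const char *) base + byte_off);
    }
}

/* f3/f4 style: zero base.  The vector holds the addresses themselves.  */
void
gather_pointers (float *dst, float *const *ptrs)
{
  for (int lane = 0; lane < VL; lane++)
    dst[lane] = *ptrs[lane];
}
```

In this model one vector iteration of f1 would be gather_base_index (tmp, f,
&k[i]) followed by a reduction of tmp into g, and one iteration of f3 would be
gather_pointers (tmp, &l[i]).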

To vectorize the above I'm afraid we'd need to modify tree-data-ref.c as well
as tree-vect-data-refs.c, because the memory accesses aren't affine and
dr_analyze_innermost already gives up on those and doesn't fill in any of the
DR_* stuff.  Perhaps with some flag, and when the base resp. offset has a vdef
in the same loop, we could mark it somehow and at least fill in the other
fields.  It would probably make alias decisions (in tree-vect-data-refs.c?)
harder.  Any ideas?

What is additionally possible is to conditionalize loads, either affine or not.
So something like:
for (i = 0; i < N; i++)
  {
    c = 6;
    if (a[i] > 24)
      c = b[i];
    d[i] = c + e[i];
  }
This covers the affine conditional accesses, where the index vector could be
just { 0, 1, 2, 3, ... } and only the mask comes from the comparison.
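
A scalar C sketch of what one masked vector iteration of that loop might look
like.  This is only a model: the element type int, the lane count VL and the
function name cond_load_chunk are assumptions made for the illustration, since
the snippet above does not declare its arrays.

```c
#define VL 8   /* assumed lane count */

/* One vector iteration of
     c = 6; if (a[i] > 24) c = b[i]; d[i] = c + e[i];
   starting at element i.  The indexes are simply i + { 0, 1, 2, ... };
   only the mask depends on the data.  */
void
cond_load_chunk (int *d, const int *a, const int *b, const int *e, int i)
{
  int mask[VL], c[VL];

  for (int lane = 0; lane < VL; lane++)
    mask[lane] = a[i + lane] > 24;        /* vector compare */

  for (int lane = 0; lane < VL; lane++)
    {
      c[lane] = 6;                        /* value kept by inactive lanes */
      if (mask[lane])
        c[lane] = b[i + lane];            /* b is read only where the mask is set */
    }

  for (int lane = 0; lane < VL; lane++)
    d[i + lane] = c[lane] + e[i + lane];  /* unconditional add and store */
}
```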


* [Bug tree-optimization/50789] Gather vectorization
  2011-10-19  7:51 [Bug tree-optimization/50789] New: Gather vectorization jakub at gcc dot gnu.org
@ 2011-10-19  8:10 ` rguenth at gcc dot gnu.org
  2011-10-19  8:49 ` irar at il dot ibm.com
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-10-19  8:10 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2011-10-19
                 CC|                            |rguenth at gcc dot gnu.org
     Ever Confirmed|0                           |1
           Severity|normal                      |enhancement

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-10-19 08:09:48 UTC ---
Yeah, it would be a data-reference that doesn't quite fit into existing sets.
The aliasing could be dealt with (just don't assume anything about the index).

This sort of pattern happens in SPEC FP 2006 I think in a Fortran benchmark.



* [Bug tree-optimization/50789] Gather vectorization
  2011-10-19  7:51 [Bug tree-optimization/50789] New: Gather vectorization jakub at gcc dot gnu.org
  2011-10-19  8:10 ` [Bug tree-optimization/50789] " rguenth at gcc dot gnu.org
@ 2011-10-19  8:49 ` irar at il dot ibm.com
  2011-10-19  9:03 ` jakub at gcc dot gnu.org
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: irar at il dot ibm.com @ 2011-10-19  8:49 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789

--- Comment #2 from Ira Rosen <irar at il dot ibm.com> 2011-10-19 08:47:03 UTC ---
(In reply to comment #0)

> To vectorize the above I'm afraid we'd need to modify tree-data-ref.c as well
> as tree-vect-data-refs.c, because the memory accesses aren't affine and already
> dr_analyze_innermost gives up on those, doesn't fill in any of the DR_* stuff.
> Perhaps with some flag and when the base resp. offset has vdef in the same loop
> we could mark it somehow and at least fill in the other fields.  It would
> probably make alias decisions (in tree-vect-data-refs.c?) harder.  Any ideas?

We have something similar for SLP: if an access is not affine we just fill in
what we can. But I don't really understand what can be filled in for f3/f4.

I don't think any data dependence decision is possible for f3 and f4, since we
can't prove anything. But in all the examples there are no stores, and we don't
care about read-read.

In f1 and f2 we know the base so, assuming no overflow, we can handle stores to
a different array.

> 
> What is additionally possible is to conditionalize loads, either affine or not.

Can't we treat it as an unconditional load for the dr analysis purposes?



* [Bug tree-optimization/50789] Gather vectorization
  2011-10-19  7:51 [Bug tree-optimization/50789] New: Gather vectorization jakub at gcc dot gnu.org
  2011-10-19  8:10 ` [Bug tree-optimization/50789] " rguenth at gcc dot gnu.org
  2011-10-19  8:49 ` irar at il dot ibm.com
@ 2011-10-19  9:03 ` jakub at gcc dot gnu.org
  2011-10-19  9:40 ` irar at il dot ibm.com
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-10-19  9:03 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-19 09:02:26 UTC ---
(In reply to comment #2)
> We have something similar for SLP: if an access is not affine we just fill in
> what we can. But I don't really understand what can be filled in for f3/f4.

Well, we should be able to at least use TBAA in that case:
void
f5 (void)
{
  int i;
  for (i = 0; i < N; i++)
    k[i] += *l[i];
}
should be vectorizable too, as l[i] can't overlap k[i] (normal data dep) and
*l[i], being float read, can't alias with k[i] either (int).
Similarly perhaps:
void
f6 (float *__restrict f, float *__restrict *__restrict l)
{
  int i;
  for (i = 0; i < N; i++)
    f[i] += *l[i];
}
(though that one is unlikely to happen very soon).
With these gather accesses we can't do any runtime alias checking before the
loop: either we can prove there is no alias, or we can't vectorize.
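
To make the aliasing concern concrete, here is a small C sketch that compares
the scalar loop f[i] += *l[i] with a chunked version that performs all loads
of a vector before any store, which is roughly what a vectorized body would
do.  The function names scalar_loop and chunked_loop and the lane count VL are
assumptions made for this illustration.

```c
#define VL 8   /* assumed lane count */

/* Scalar semantics of:  for (i = 0; i < n; i++) f[i] += *l[i];  */
void
scalar_loop (float *f, float **l, int n)
{
  for (int i = 0; i < n; i++)
    f[i] += *l[i];
}

/* Model of a VL-wide vectorized body: gather all VL loads first, then add
   and store.  n is assumed to be a multiple of VL.  */
void
chunked_loop (float *f, float **l, int n)
{
  for (int i = 0; i < n; i += VL)
    {
      float tmp[VL];
      for (int lane = 0; lane < VL; lane++)
        tmp[lane] = *l[i + lane];         /* gather */
      for (int lane = 0; lane < VL; lane++)
        f[i + lane] += tmp[lane];         /* add and store */
    }
}
```

If some l[i] points at an element of f that an earlier iteration of the same
chunk has stored to (for example l[i] == &f[i - 1]), the two functions produce
different results.  Because the addresses only become known inside the loop,
there is no address range to compare up front, which is why the absence of
aliasing has to be proven at compile time.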

> Can't we treat it as an unconditional load for the dr analysis purposes?

For that surely, but the conditional loads have other problems for the
vectorizer: currently they mean control flow within the loop.  We'd need to
transform it (perhaps temporarily, or using the pattern recognizer) into
something without control flow that still makes it clear that the load is only
conditional.



* [Bug tree-optimization/50789] Gather vectorization
  2011-10-19  7:51 [Bug tree-optimization/50789] New: Gather vectorization jakub at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2011-10-19  9:03 ` jakub at gcc dot gnu.org
@ 2011-10-19  9:40 ` irar at il dot ibm.com
  2011-10-24  8:40 ` jakub at gcc dot gnu.org
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: irar at il dot ibm.com @ 2011-10-19  9:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789

Ira Rosen <irar at il dot ibm.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |irar at il dot ibm.com

--- Comment #4 from Ira Rosen <irar at il dot ibm.com> 2011-10-19 09:38:06 UTC ---
(In reply to comment #3)
> (In reply to comment #2)
> > We have something similar for SLP: if an access is not affine we just fill in
> > what we can. But I don't really understand what can be filled in for f3/f4.
> 
> Well, we should be able to at least use TBAA in that case:
> void
> f5 (void)
> {
>   int i;
>   for (i = 0; i < N; i++)
>     k[i] += *l[i];
> }
> should be vectorizable too, as l[i] can't overlap k[i] (normal data dep) and
> *l[i], being float read, can't alias with k[i] either (int).

I meant that we can't decompose *l[i] any further.

> With these gather accesses we can't do any runtime alias checking before the
> loop, either we can prove there is no alias, or we can't vectorize.

Agreed.

> 
> > Can't we treat it as an unconditional load for the dr analysis purposes?
> 
> For that surely, but the conditional loads have other problems for the
> vectorizer, currently that means control flow within the loop.  We'd need to
> transform it (perhaps temporarily or using pattern recognizer) to something
> without control flow that would still be clear on that the load is only
> conditional.

Something like this
http://gcc.gnu.org/ml/gcc-patches/2010-11/msg00304.html ?



* [Bug tree-optimization/50789] Gather vectorization
  2011-10-19  7:51 [Bug tree-optimization/50789] New: Gather vectorization jakub at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2011-10-19  9:40 ` irar at il dot ibm.com
@ 2011-10-24  8:40 ` jakub at gcc dot gnu.org
  2011-10-25 21:17 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-10-24  8:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-24 08:39:16 UTC ---
Not exactly: -fif-convert-loop-stores is apparently a language-changing option;
only a subset of valid C/C++ programs remains valid with it.  With V*GATHER*
insns and, as I found out during the weekend, with VMASKMOVP[SD] and
VPMASKMOV[DQ] instructions too, we can handle both conditional loads and
conditional stores.
So testcases like:

float a[N], b[N], c[N], d[N], e[N], g[N];

void
f6 (void)
{
  int i;
  for (i = 0; i < N; i++)
    e[i] = a[i] < b[i] ? c[i] : d[i];
}

void
f7 (float *p, float *q)
{
  int i;
  for (i = 0; i < N; i++)
    e[i] = a[i] < b[i] ? p[i] : q[i];
}

void
f8 (void)
{
  int i;
  for (i = 0; i < N; i++)
    {
      float f = c[i] + d[i];
      if (a[i] < b[i])
        e[i] = f;
    }
}

void
f9 (void)
{
  int i;
  for (i = 0; i < N; i++)
    {
      float f = c[i] * d[i];
      if (a[i] < b[i])
        e[i] = f;
      else
        g[i] = f;
    }
}

should be vectorizable (and even with -mavx).  I haven't checked whether any
other CPU (PPC, ARM, ...) has anything similar.
In fact, f6 ought to be vectorizable always: we could easily find out that for
any i that can appear in the loop (0 through N - 1) neither c[i] nor d[i] will
trap or fault.  The question is whether the same is true for
extern float c[N];
instead (e.g. if the actual definition were then float c[N / 2];, though I'd
hope that is invalid C), but for f7 you already can't know whether p resp. q
are valid pointers at all, are correctly aligned, and whether e.g. p[i] or q[i]
points beyond the end of an mmapped region.  So f7/f8/f9 are only vectorizable
using these v*maskmov* instructions (or f7 using v*gather*, but that would be
unnecessary additional overhead).  I've verified that SNB CPUs don't require
any alignment and don't fault on completely invalid addresses with zero mask.
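
For illustration, a scalar C model of how one vector iteration of f8 could use
a masked store.  It only sketches the semantics; the function name f8_chunk
and the lane count VL are assumptions, and real code would use the v*maskmov*
instructions rather than a per-lane branch.

```c
#define VL 8   /* assumed lane count */

/* One vector iteration of f8 starting at element i:
     float f = c[i] + d[i]; if (a[i] < b[i]) e[i] = f;  */
void
f8_chunk (float *e, const float *a, const float *b,
          const float *c, const float *d, int i)
{
  int mask[VL];
  float f[VL];

  for (int lane = 0; lane < VL; lane++)
    {
      mask[lane] = a[i + lane] < b[i + lane];   /* vector compare */
      f[lane] = c[i + lane] + d[i + lane];      /* unconditional arithmetic */
    }

  /* Masked store: lanes with a clear mask are not written at all, so e[]
     keeps its old value there and those lanes cannot fault.  */
  for (int lane = 0; lane < VL; lane++)
    if (mask[lane])
      e[i + lane] = f[lane];
}
```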

The question is how to represent this in the IL, and IMHO it should be
something that is either present solely during the vectorization (i.e. pattern
recognizer like thing), or that we convert the IL into right before the
vectorizer (e.g. during ifcvt), but convert it back to the original multiple
BBs IL either at the end of the vectorizer or in a pass right after the
vectorizer.  For the conditional loads we could perhaps represent them by
COND_EXPRs with some flag on the gimple which would allow memory instead of
SSA_NAMEs in one or both of the then/else operands or a new tree code, for
conditional stores we'd need a new tree code.


* [Bug tree-optimization/50789] Gather vectorization
  2011-10-19  7:51 [Bug tree-optimization/50789] New: Gather vectorization jakub at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2011-10-24  8:40 ` jakub at gcc dot gnu.org
@ 2011-10-25 21:17 ` jakub at gcc dot gnu.org
  2011-10-26 16:57 ` jakub at gcc dot gnu.org
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-10-25 21:17 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |jakub at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #6 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-25 21:16:21 UTC ---
Created attachment 25614
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25614
gcc47-gather.patch

My current WIP patch, vectorizes at least:
#define N 1024
float f[N], g[N];
int k[N];

void
f20 (void)
{
  int i;
  for (i = 0; i < N; i++)
    g[i] = f[k[i]];
}

The patch still needs more work.



* [Bug tree-optimization/50789] Gather vectorization
  2011-10-19  7:51 [Bug tree-optimization/50789] New: Gather vectorization jakub at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2011-10-25 21:17 ` jakub at gcc dot gnu.org
@ 2011-10-26 16:57 ` jakub at gcc dot gnu.org
  2011-11-07 16:02 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-10-26 16:57 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #25614|0                           |1
        is obsolete|                            |

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-10-26 16:56:36 UTC ---
Created attachment 25619
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25619
gcc47-pr50789.patch

Updated patch that I'm bootstrapping/regtesting right now.



* [Bug tree-optimization/50789] Gather vectorization
  2011-10-19  7:51 [Bug tree-optimization/50789] New: Gather vectorization jakub at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2011-10-26 16:57 ` jakub at gcc dot gnu.org
@ 2011-11-07 16:02 ` jakub at gcc dot gnu.org
  2011-11-08 13:26 ` jakub at gcc dot gnu.org
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-11-07 16:02 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-11-07 15:59:10 UTC ---
Author: jakub
Date: Mon Nov  7 15:59:07 2011
New Revision: 181089

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=181089
Log:
    PR tree-optimization/50789
    * tree-vect-stmts.c (process_use): Add force argument, avoid
    exist_non_indexing_operands_for_use_p check if true.
    (vect_mark_stmts_to_be_vectorized): Adjust callers.  Handle
    STMT_VINFO_GATHER_P.
    (gen_perm_mask): New function.
    (perm_mask_for_reverse): Use it.
    (reverse_vec_element): Rename to...
    (permute_vec_elements): ... this.  Add Y and MASK_VEC arguments,
    generalize for any permutations.
    (vectorizable_load): Adjust caller.  Handle STMT_VINFO_GATHER_P.
    * target.def (TARGET_VECTORIZE_BUILTIN_GATHER): New hook.
    * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_GATHER): Document it.
    * doc/tm.texi: Regenerate.
    * tree-data-ref.c (initialize_data_dependence_relation,
    compute_self_dependence): No longer static.
    * tree-data-ref.h (initialize_data_dependence_relation,
    compute_self_dependence): New prototypes.
    * tree-vect-data-refs.c (vect_check_gather): New function.
    (vect_analyze_data_refs): Detect possible gather load data
    refs.
    * tree-vectorizer.h (struct _stmt_vec_info): Add gather_p field.
    (STMT_VINFO_GATHER_P): Define.
    (vect_check_gather): New prototype.
    * config/i386/i386-builtin-types.def: Add types for alternate
    gather builtins.
    * config/i386/sse.md (AVXMODE48P_DI): Remove.
    (VEC_GATHER_MODE): Rename mode_attr to...
    (VEC_GATHER_IDXSI): ... this.
    (VEC_GATHER_IDXDI, VEC_GATHER_SRCDI): New mode_attrs.
    (avx2_gathersi<mode>, *avx2_gathersi<mode>): Use <VEC_GATHER_IDXSI>
    instead of <VEC_GATHER_MODE>.
    (avx2_gatherdi<mode>): Use <VEC_GATHER_IDXDI> instead of
    <AVXMODE48P_DI> and <VEC_GATHER_SRCDI> instead of VEC_GATHER_MODE
    on src and mask operands.
    (*avx2_gatherdi<mode>): Likewise.  Use VEC_GATHER_MODE iterator
    instead of AVXMODE48P_DI.
    (avx2_gatherdi<mode>256, *avx2_gatherdi<mode>256): Removed.
    * config/i386/i386.c (enum ix86_builtins): Add
    IX86_BUILTIN_GATHERALTSIV4DF, IX86_BUILTIN_GATHERALTDIV8SF,
    IX86_BUILTIN_GATHERALTSIV4DI and IX86_BUILTIN_GATHERALTDIV8SI.
    (ix86_init_mmx_sse_builtins): Create those builtins.
    (ix86_expand_builtin): Handle those builtins and adjust expansions
    of other gather builtins.
    (ix86_vectorize_builtin_gather): New function.
    (TARGET_VECTORIZE_BUILTIN_GATHER): Define.

    * gcc.target/i386/avx2-gather-1.c: New test.
    * gcc.target/i386/avx2-gather-2.c: New test.
    * gcc.target/i386/avx2-gather-3.c: New test.
    * gcc.target/i386/avx2-gather-4.c: New test.

Added:
    trunk/gcc/testsuite/gcc.target/i386/avx2-gather-1.c
    trunk/gcc/testsuite/gcc.target/i386/avx2-gather-2.c
    trunk/gcc/testsuite/gcc.target/i386/avx2-gather-3.c
    trunk/gcc/testsuite/gcc.target/i386/avx2-gather-4.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386-builtin-types.def
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/sse.md
    trunk/gcc/doc/tm.texi
    trunk/gcc/doc/tm.texi.in
    trunk/gcc/target.def
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-data-ref.c
    trunk/gcc/tree-data-ref.h
    trunk/gcc/tree-vect-data-refs.c
    trunk/gcc/tree-vect-stmts.c
    trunk/gcc/tree-vectorizer.h



* [Bug tree-optimization/50789] Gather vectorization
  2011-10-19  7:51 [Bug tree-optimization/50789] New: Gather vectorization jakub at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2011-11-07 16:02 ` jakub at gcc dot gnu.org
@ 2011-11-08 13:26 ` jakub at gcc dot gnu.org
  2013-04-02 16:50 ` vincenzo.innocente at cern dot ch
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: jakub at gcc dot gnu.org @ 2011-11-08 13:26 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789

--- Comment #9 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-11-08 13:24:03 UTC ---
Unconditional gather is now vectorized, conditional load/store including gather
has to wait for GCC 4.8.



* [Bug tree-optimization/50789] Gather vectorization
  2011-10-19  7:51 [Bug tree-optimization/50789] New: Gather vectorization jakub at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2011-11-08 13:26 ` jakub at gcc dot gnu.org
@ 2013-04-02 16:50 ` vincenzo.innocente at cern dot ch
  2013-04-17  8:31 ` andrey.turetskiy at gmail dot com
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2013-04-02 16:50 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789

vincenzo Innocente <vincenzo.innocente at cern dot ch> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vincenzo.innocente at cern
                   |                            |dot ch

--- Comment #10 from vincenzo Innocente <vincenzo.innocente at cern dot ch> 2013-04-02 16:49:53 UTC ---
I was trying to see how gcc behaves w.r.t. this example
http://software.intel.com/en-us/articles/bkm-coaxing-the-compiler-to-vectorize-structured-data-via-gathers

So I started from the example in comment 6 and "evolved" it as follows.
f21() and f22() are equivalent to my eyes:
f21 vectorizes, f22 does not;
the variant f21b does not vectorize either…

c++ -v
Using built-in specs.
COLLECT_GCC=c++
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/x86_64-apple-darwin12.2.0/4.8.0/lto-wrapper
Target: x86_64-apple-darwin12.2.0
Configured with: ./configure --enable-languages=c,c++,fortran
--disable-multilib --disable-bootstrap --enable-lto -disable-libitm :
(reconfigured) ./configure --enable-languages=c,c++,fortran --disable-multilib
--disable-bootstrap --enable-lto -disable-libitm : (reconfigured) ./configure
--enable-languages=c,c++,fortran --disable-multilib --disable-bootstrap
--enable-lto -disable-libitm
Thread model: posix
gcc version 4.8.0 20130313 (experimental) [trunk revision 196633] (GCC) 

c++ -std=c++11 -Ofast -mavx2 -S gather.cc -ftree-vectorizer-verbose=2  

struct float3 {
  float x;
  float y;
  float z;
};

#define N 1024
float fx[N], g[N];
float fy[N];
float fz[N]; 
int k[N];

float ff[3*N];
float3 f3[N];
void
f20 (void)
{
  int i;
  for (i = 0; i < N; i++)
    g[i] = fx[k[i]]+fy[k[i]]+fz[k[i]];
}

void
f21 (void)
{
  int i;
  for (i = 0; i < N; i++)
    g[i] = ff[3*k[i]]+ff[3*k[i]+1]+ff[3*k[i]+2];
}
void
f22 (void)
{
  int i;
  for (i = 0; i < N; i++)
    g[i] = f3[k[i]].x+f3[k[i]].y+f3[k[i]].z;
}


void
f21b (void)
{
  int i;
  for (i = 0; i < N; i++) {
    auto j = ff+3*k[i];
    g[i] = j[0]+j[1]+j[2];
  }
}


* [Bug tree-optimization/50789] Gather vectorization
  2011-10-19  7:51 [Bug tree-optimization/50789] New: Gather vectorization jakub at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2013-04-02 16:50 ` vincenzo.innocente at cern dot ch
@ 2013-04-17  8:31 ` andrey.turetskiy at gmail dot com
  2013-04-17  8:53 ` rguenther at suse dot de
  2013-07-03  9:22 ` vincenzo.innocente at cern dot ch
  12 siblings, 0 replies; 14+ messages in thread
From: andrey.turetskiy at gmail dot com @ 2013-04-17  8:31 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789

Andrey Turetskiy <andrey.turetskiy at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrey.turetskiy at gmail
                   |                            |dot com

--- Comment #11 from Andrey Turetskiy <andrey.turetskiy at gmail dot com> 2013-04-17 08:31:29 UTC ---
It looks like gathers can be used for vectorization in cases like:

#define N 1024

float x[4*N], y[N];

void foo ()
{
  int i;
  for (i = 0; i < N; i++)
    y[i] = x[179 + 3*i];
}

Currently this code isn't vectorized.
In addition, there are a lot of such examples in SPEC 2006. Vectorization with
gathers can give a noticeable gain.



* [Bug tree-optimization/50789] Gather vectorization
  2011-10-19  7:51 [Bug tree-optimization/50789] New: Gather vectorization jakub at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2013-04-17  8:31 ` andrey.turetskiy at gmail dot com
@ 2013-04-17  8:53 ` rguenther at suse dot de
  2013-07-03  9:22 ` vincenzo.innocente at cern dot ch
  12 siblings, 0 replies; 14+ messages in thread
From: rguenther at suse dot de @ 2013-04-17  8:53 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789

--- Comment #12 from rguenther at suse dot de <rguenther at suse dot de> 2013-04-17 08:53:21 UTC ---
On Wed, 17 Apr 2013, andrey.turetskiy at gmail dot com wrote:

> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789
> 
> Andrey Turetskiy <andrey.turetskiy at gmail dot com> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |andrey.turetskiy at gmail
>                    |                            |dot com
> 
> --- Comment #11 from Andrey Turetskiy <andrey.turetskiy at gmail dot com> 2013-04-17 08:31:29 UTC ---
> It looks like gathers can be used for vectorization in cases like:
> 
> #define N 1024
> 
> float x[4*N], y[N];
> 
> void foo ()
> {
>   int i;
>   for (i = 0; i < N; i++)
>     y[i] = x[179 + 3*i];
> }
> 
> Now this code isn't vectorized.
> In addition there are a lot of such examples in SPEC 2006. Vectorization with
> gathers can give noticeable gain.

The above can be vectorized with the strided-load vectorization support
(it just doesn't trigger here).  And strided-load vectorization
code generation can be improved by using gather vectorization:
first build a vector of addresses / indices and then perform
a gather load.  That is, if building a vector of addresses / indices is
cheaper than performing scalar loads and building a vector from
the results.
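
A scalar C sketch of the "build an index vector, then gather" idea for the
loop from comment #11.  This is only a model of the approach described above,
not what GCC emits; the function name strided_chunk and the lane count VL are
assumptions for the illustration.

```c
#define VL 8   /* assumed lane count */

/* One vector iteration of  y[i] = x[179 + 3*i]  starting at element i,
   written as "build an index vector, then gather".  */
void
strided_chunk (float *y, const float *x, int i)
{
  int idx[VL];

  for (int lane = 0; lane < VL; lane++)
    idx[lane] = 179 + 3 * (i + lane);   /* start index + { 0, 3, 6, ... } */

  for (int lane = 0; lane < VL; lane++)
    y[i + lane] = x[idx[lane]];         /* gather load */
}
```

The index vector advances by the constant 3 * VL from one vector iteration to
the next, so in a real loop it could be kept in a register and bumped with one
vector add instead of being rebuilt each time.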

So the above is more related to strided load support (and the
not yet implemented strided store support as well, if there
are also gather stores ...)

Richard.



* [Bug tree-optimization/50789] Gather vectorization
  2011-10-19  7:51 [Bug tree-optimization/50789] New: Gather vectorization jakub at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2013-04-17  8:53 ` rguenther at suse dot de
@ 2013-07-03  9:22 ` vincenzo.innocente at cern dot ch
  12 siblings, 0 replies; 14+ messages in thread
From: vincenzo.innocente at cern dot ch @ 2013-07-03  9:22 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50789

--- Comment #13 from vincenzo Innocente <vincenzo.innocente at cern dot ch> ---
I just submitted a specific bug-report as PR57796



