public inbox for gcc@gcc.gnu.org
* GSoC 2010 Project Idea
       [not found] <979659581003280414o8a88c0fy29016c15674834df@mail.gmail.com>
@ 2010-03-28 22:47 ` Артем Шинкаров
  2010-03-30  8:33   ` Andi Kleen
                     ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Артем Шинкаров @ 2010-03-28 22:47 UTC (permalink / raw)
  To: gcc

Hi,

I have a project in mind that I am going to propose to GCC as part of
Google Summer of Code. My project is not on the list of project ideas
(http://gcc.gnu.org/wiki/SummerOfCode), so I would be very interested
to hear any opinions and maybe even to find a mentor.


1. Project idea

In brief, the idea is to create an abstract layer for vectorized
computations. This would make it possible to write portable vectorized code.


2. State of the art

Nowadays most processors support SIMD computations. The problem is that each
architecture has a different set of SIMD instructions: Intel MMX+SSE+AVX,
PowerPC Altivec, ARM iWMMXt, and so on. GCC exposes most architecture-specific
instructions through built-in functions, and these functions are convenient
when you want to optimize a piece of code. The problem starts when you want to
make that code portable. This is not a very common task, and of course GCC has
a vectorizer. Unfortunately, there are many examples showing that it is
relatively simple for a human to find the right place in the code and
vectorize it, but extremely hard for the compiler to do the same. As a result
we end up with code that does not use the capabilities of the architecture.

It would be much easier for the programmer to use an abstract layer to
write vectorized code. The compiler should deal with the portability issues by
dispatching the code from the abstract layer to the particular architecture.
As far as I know, there is no library for C/C++ that solves this problem.
There are some attempts, such as http://libsimd.sourceforge.net/, but it covers
only a small part of the idea, and unfortunately its development is suspended.
Or maybe I am wrong and everything is already written?


3. Implementation

First we need to define an abstract SIMD model whose functionality can be
mapped to the set of architectures we want to support. The difficulty is that
the SIMD instruction sets of different architectures are not fully compatible.
Then we want to write a set of "fake-SIMD" functions so that our code remains
usable on architectures without SIMD support.

After that comes the question of how to dispatch functions from the abstract
layer to the architecture layer. The trivial approach is to map the abstract
layer functions directly to the built-in functions. Obviously this would not
give the best performance. For example, loading data from unaligned memory
into a SIMD register is much slower than loading it from aligned memory.
Likewise, Altivec has an instruction vec_madd(a,b,c), which in the SSE case
has to be represented by two instructions: _mm_add_ps(_mm_mul_ps(a,b), c).
This means that some code optimizations are required.
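
To make the trivial mapping concrete, here is a minimal sketch of one
abstract operation with an SSE path and a "fake-SIMD" fallback. The names
simd_f32x4 and simd_madd are hypothetical and only illustrate the idea; they
are not part of GCC or of any existing library, and a real layer would
probably keep values in registers rather than in a struct of floats.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical abstract-layer type: four packed floats. */
typedef struct { float lane[4]; } simd_f32x4;

/* Hypothetical abstract operation: r = a * b + c, lane by lane. */
static inline simd_f32x4 simd_madd(simd_f32x4 a, simd_f32x4 b, simd_f32x4 c)
{
    simd_f32x4 r;
#if defined(__SSE__)
    /* Plain SSE has no multiply-add instruction, so two are used.
       Unaligned loads and stores avoid any alignment assumption. */
    __m128 va = _mm_loadu_ps(a.lane);
    __m128 vb = _mm_loadu_ps(b.lane);
    __m128 vc = _mm_loadu_ps(c.lane);
    _mm_storeu_ps(r.lane, _mm_add_ps(_mm_mul_ps(va, vb), vc));
#else
    /* "Fake SIMD" fallback for targets without a mapped instruction set. */
    for (size_t i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] * b.lane[i] + c.lane[i];
#endif
    return r;
}
```

An Altivec branch could map the same operation to vec_madd; choosing between
aligned and unaligned loads is exactly the kind of optimization the
dispatcher would have to make.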


4. Time constraints

GSoC gives 4 months to finish the project, so the timeline could be the
following:
2 weeks -- discussions and design
1 week  -- fake SIMD
3 weeks -- implementation of the main dispatcher
2 weeks -- benchmarks and testing
* the first submission
1.5 months -- architecture-specific dispatcher optimizations
0.5 months -- testing
* the second submission

This project can be continued in various ways:
1) Cost model for the dispatcher
2) Auto vectorizer + dispatcher
3) Integration with other languages
And so on


5. Questions

Should it be a library or a part of the language? What about extending this
abstract layer for Larrabee (or similar hardware), which provides 512-bit
registers for vectorized operations? And so on.
These questions should be discussed in light of the project time constraints
and the interests of GCC. If anybody is interested in mentoring such a
project, please let me know and I will be happy to discuss all the issues. If
anybody thinks that the project is hopeless, please let me know as well.

--
Best regards,
Artem Shinkarov
Compiler Technology and Computer Architecture Group
University of Hertfordshire


* Re: GSoC 2010 Project Idea
  2010-03-28 22:47 ` GSoC 2010 Project Idea Артем Шинкаров
@ 2010-03-30  8:33   ` Andi Kleen
  2010-03-30 13:16   ` Joseph S. Myers
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Andi Kleen @ 2010-03-30  8:33 UTC (permalink / raw)
  To: Артем Шинкаров
  Cc: gcc

Артем Шинкаров <artyom.shinkaroff@gmail.com> writes:

> Hi,
>
> I have a project in mind that I am going to propose to GCC as part of
> Google Summer of Code. My project is not on the list of project ideas
> (http://gcc.gnu.org/wiki/SummerOfCode), so I would be very interested
> to hear any opinions and maybe even to find a mentor.

My guess is that the project is a bit too ambitious for a single 
summer. Perhaps try to scale it down to make it more manageable?

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: GSoC 2010 Project Idea
  2010-03-28 22:47 ` GSoC 2010 Project Idea Артем Шинкаров
  2010-03-30  8:33   ` Andi Kleen
@ 2010-03-30 13:16   ` Joseph S. Myers
  2010-03-30 17:03     ` Артем Шинкаров
  2010-03-30 18:13   ` Michael Meissner
  2010-04-11 20:46   ` Dorit Nuzman
  3 siblings, 1 reply; 6+ messages in thread
From: Joseph S. Myers @ 2010-03-30 13:16 UTC (permalink / raw)
  To: Артем
	Шинкаров
  Cc: gcc

On Sun, 28 Mar 2010, Артем Шинкаров wrote:

> It would be much easier for the programmer to use an abstract layer to
> implement a vectorized code. A compiler should deal with the portability issues
> dispatching the code from the abstract layer to the particular architecture. My

The generic vector types (used with the vector_size attribute) could be 
seen as the beginnings of such an abstract layer.

So you might look at what isn't supported readily with such types at 
present.  Note that patches have been submitted but not committed for some 
features (e.g. subscripting vectors like arrays) that would need pushing 
through into GCC.  Then you could look at related extensions in other 
implementations or related languages (e.g. OpenCL) that might be useful 
for GNU C and C++ generic vectors.
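
As an illustration of what the generic vector types already allow, here is a
minimal sketch using the vector_size attribute. The type name v4sf and the
helper v4sf_madd are chosen for this example only. Lanes are moved in and out
with memcpy, since subscripting is not assumed to be available.

```c
#include <string.h>

/* Four floats in one 16-byte generic vector (GCC vector extension). */
typedef float v4sf __attribute__((vector_size(16)));

/* Element-wise a * b + c, written once with ordinary operators. The
   compiler selects target instructions or lowers to scalar code. */
static inline void v4sf_madd(const float *a, const float *b,
                             const float *c, float *out)
{
    v4sf va, vb, vc, vr;

    /* memcpy avoids any alignment assumption about the float arrays. */
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    memcpy(&vc, c, sizeof vc);

    vr = va * vb + vc;

    memcpy(out, &vr, sizeof vr);
}
```

The same source compiles unchanged for targets with and without SIMD
hardware, which is the portability property the proposal asks for.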

What vector instructions are there that cannot be effectively represented 
using generic vectors (supposing that you add the ability to subscript 
them, which helps describe a lot of operations using compound literals)?  
They might need further extensions.  Are there instructions that can be 
effectively represented but where the generic representation in C source 
does not end up generating the right instructions?  If so, you could 
improve the compiler so that given generic C+vectors source, and generic 
RTL patterns for the instruction, it ends up generating that instruction.

One subset of the problem you could look at would be saturating operations 
- both for scalars and for vectors.  For some such operations, built-in 
functions might help.  Some can be written in generic C, but the compiler 
won't detect them and generate saturating instructions.  See the paper by 
Bik, Girkar, Grey and Tian 
<http://saluc.engr.uconn.edu/refs/compiler/bik02idioms.pdf> regarding how 
to detect saturating operations.  There are generic RTL codes for some 
saturating operations - but they will only ever be generated from 
target-specific built-in functions at present.
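
For reference, a saturating operation "written in generic C" might look like
the following minimal sketch. The function names are hypothetical, and the
code assumes that int is wider than short, as it is on common GCC targets.
Whether a compiler recognizes this idiom and emits a saturating instruction
is exactly the open question described above.

```c
#include <limits.h>
#include <stddef.h>

/* Saturating signed addition on short, written in generic C.
   The sum is formed in int, so it cannot overflow, and is then clamped. */
static inline short sat_add_s16(short a, short b)
{
    int sum = (int)a + (int)b;

    if (sum > SHRT_MAX)
        return SHRT_MAX;
    if (sum < SHRT_MIN)
        return SHRT_MIN;
    return (short)sum;
}

/* The same operation over arrays: the loop shape a vectorizer would
   have to recognize in order to use a vector saturating add. */
static inline void sat_add_s16_array(const short *a, const short *b,
                                     short *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = sat_add_s16(a[i], b[i]);
}
```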

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: GSoC 2010 Project Idea
  2010-03-30 13:16   ` Joseph S. Myers
@ 2010-03-30 17:03     ` Артем Шинкаров
  0 siblings, 0 replies; 6+ messages in thread
From: Артем Шинкаров @ 2010-03-30 17:03 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: gcc

> The generic vector types (used with the vector_size attribute) could be
> seen as the beginnings of such an abstract layer.

Yes, this is very likely going to be the starting point. I'm sorry
that I did not mention it in my first email. There may be alternative
ideas about what the layer should look like, but this is the most
obvious one.

Basically, generic vectors support only a restricted set of
operations: +, -, *, /, &, |, ^, ~

Indexing, as you already said, is not supported. I know about this
patch, but the question is what the most efficient way to implement it
would be. Do we always want to return the value or the memory address
of a particular vector element, or can we sometimes optimize the set of
operations using vector shifts and vector masks so that the element
stays inside the vector? Sometimes that could be faster.

You cannot compare two vectors, although there are built-in
instructions for that.

You cannot do shifts within a vector, which could be very useful.

Sometimes the generic vector extension just fails, producing code that
causes a segmentation fault. For example:

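#include <stdio.h>
#define N 1024

typedef short __attribute__((vector_size(16))) v8hi;

short a[N];
v8hi *pvt;
v8hi va;

int main(int argc, char *argv[]) {
    FILE *f;
    int i, var;

    /* The input file is expected to hold N whitespace-separated integers. */
    if (argc < 2 || (f = fopen(argv[1], "r")) == NULL) {
        fprintf(stderr, "cannot open the input file\n");
        return 1;
    }
    for (i = 0; i < N; i++) {
        if (fscanf(f, "%i", &var) != 1)
            break;              /* short input: the rest of a[] stays zero */
        a[i] = (short) var;
    }
    fclose(f);

    printf("Before the assignment\n");

    va  = *((v8hi *)&(a[0]));   /* vector load from the start of a[] */
    pvt =  ((v8hi *)&(a[3]));   /* &a[3] is 6 bytes in: not 16-byte aligned */
    *pvt = va;                  /* vector store through the misaligned pointer */

    printf("After the assignment\n");

    for (i = 0; i < 20; i++) {
        printf("%i ", a[i]);
    }
    printf("\n");

    return 0;
}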

On Intel architectures all the vector assignments are converted into
the "movdqa" instruction, which works only if the memory is aligned,
and that is not the case in this example. The compiler cannot figure
this out and produces code that causes a segmentation fault, although
if you compile the same code on an architecture without SIMD support,
it works fine.

It is surely not a very serious bug, but it makes generic vector
support hard to use.
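
One way to sidestep the alignment problem today is to move data between
memory and the vector with memcpy, which makes no alignment assumption. The
following is only a sketch of that workaround; the helper names are invented
for this example and are not part of GCC.

```c
#include <string.h>

typedef short v8hi __attribute__((vector_size(16)));

/* Load eight shorts from any address, aligned or not. */
static inline v8hi v8hi_load_unaligned(const short *p)
{
    v8hi v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Store eight shorts to any address, aligned or not. */
static inline void v8hi_store_unaligned(short *p, v8hi v)
{
    memcpy(p, &v, sizeof v);
}
```

With these helpers the assignment from the example above becomes
v8hi_store_unaligned(&a[3], v8hi_load_unaligned(&a[0])), and the compiler is
free to choose an unaligned move instruction for it.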

Reduction operations are not supported: you cannot sum over the
elements of a vector. Some architectures support this feature
as well.
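
For illustration, a reduction currently has to be spelled out by hand,
roughly as in the sketch below. The name v4si_reduce_add is hypothetical, and
the code assumes a 32-bit int so that four lanes fill the 16-byte vector.

```c
#include <string.h>

typedef int v4si __attribute__((vector_size(16)));

/* Sum the four lanes of a vector by copying them out to scalars.
   A target with a horizontal-add instruction could do better. */
static inline int v4si_reduce_add(const v4si *v)
{
    int lanes[4];
    int sum = 0;

    memcpy(lanes, v, sizeof lanes);
    for (int i = 0; i < 4; i++)
        sum += lanes[i];
    return sum;
}
```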

Permutation of elements within a vector is not supported either.

Saturated arithmetic is another gap, but I'm not an expert in that
field: I don't know what kind of instructions each architecture provides
for saturated arithmetic. I think it would not be hard to find out.

And there is more.

The question is what should be done in the first stage. It is surely a
very big project, not one for a single summer. Depending on the taste of
the mentor, different things could be done at the beginning. Are you
interested in mentoring this project?



--
Artem Shinkarov


* Re: GSoC 2010 Project Idea
  2010-03-28 22:47 ` GSoC 2010 Project Idea Артем Шинкаров
  2010-03-30  8:33   ` Andi Kleen
  2010-03-30 13:16   ` Joseph S. Myers
@ 2010-03-30 18:13   ` Michael Meissner
  2010-04-11 20:46   ` Dorit Nuzman
  3 siblings, 0 replies; 6+ messages in thread
From: Michael Meissner @ 2010-03-30 18:13 UTC (permalink / raw)
  To: Артем
	Шинкаров
  Cc: gcc

On Sun, Mar 28, 2010 at 10:37:07PM +0100, Артем Шинкаров wrote:
> Hi,
> 
> I have a project in mind that I am going to propose to GCC as part of
> Google Summer of Code. My project is not on the list of project ideas
> (http://gcc.gnu.org/wiki/SummerOfCode), so I would be very interested
> to hear any opinions and maybe even to find a mentor.
>
>
> 1. Project idea
>
> In brief, the idea is to create an abstract layer for vectorized
> computations. This would make it possible to write portable vectorized code.
>
>
> 2. State of the art
>
> Nowadays most processors support SIMD computations. The problem is that each
> architecture has a different set of SIMD instructions: Intel MMX+SSE+AVX,
> PowerPC Altivec, ARM iWMMXt, and so on. GCC exposes most architecture-specific
> instructions through built-in functions, and these functions are convenient
> when you want to optimize a piece of code. The problem starts when you want to
> make that code portable. This is not a very common task, and of course GCC has
> a vectorizer. Unfortunately, there are many examples showing that it is
> relatively simple for a human to find the right place in the code and
> vectorize it, but extremely hard for the compiler to do the same. As a result
> we end up with code that does not use the capabilities of the architecture.
>
> It would be much easier for the programmer to use an abstract layer to
> write vectorized code. The compiler should deal with the portability issues by
> dispatching the code from the abstract layer to the particular architecture.
> As far as I know, there is no library for C/C++ that solves this problem.
> There are some attempts, such as http://libsimd.sourceforge.net/, but it covers
> only a small part of the idea, and unfortunately its development is suspended.
> Or maybe I am wrong and everything is already written?

Note, the powerpc and cell compilers have the notion of a vector keyword that
is followed by a type (powerpc needs -maltivec and/or -mvsx to enable it).  So
you can write:

    vector float sum (vector float a, vector float b) { return a+b; }

Now, ideally, it would be useful to have syntax so you could change the vector
size, and the compiler would do the conversion to/from hw types to abstract
types.
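
The generic vector_size attribute already gets part of the way there. The
sketch below is an illustration only: the macro DEFINE_VEC_ADD and the type
names vf4 and vf8 are invented for this example. It defines the same
operation for two vector sizes and relies on the compiler to split a vector
that is wider than the hardware registers into smaller pieces.

```c
#include <string.h>

/* Define a vector type of the given byte size, an element-wise add,
   and a helper that copies the lanes out to a plain array. */
#define DEFINE_VEC_ADD(name, elem_type, bytes)                            \
    typedef elem_type name __attribute__((vector_size(bytes)));           \
    static inline void name##_add(const name *a, const name *b,           \
                                  name *out)                              \
    {                                                                     \
        *out = *a + *b;                                                   \
    }                                                                     \
    static inline void name##_store(elem_type *dst, const name *v)        \
    {                                                                     \
        memcpy(dst, v, sizeof *v);                                        \
    }

DEFINE_VEC_ADD(vf4, float, 16)   /* four floats, one SSE/Altivec register */
DEFINE_VEC_ADD(vf8, float, 32)   /* eight floats, may span two registers */
```

Arguments are passed by pointer so that the sketch does not depend on how a
particular ABI passes wide vectors by value.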

-- 
Michael Meissner, IBM
4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA
meissner@linux.vnet.ibm.com


* Re: GSoC 2010 Project Idea
  2010-03-28 22:47 ` GSoC 2010 Project Idea Артем Шинкаров
                     ` (2 preceding siblings ...)
  2010-03-30 18:13   ` Michael Meissner
@ 2010-04-11 20:46   ` Dorit Nuzman
  3 siblings, 0 replies; 6+ messages in thread
From: Dorit Nuzman @ 2010-04-11 20:46 UTC (permalink / raw)
  To: Артем
	Шинкаров
  Cc: gcc

> Hi,
>
> I have a project in mind that I am going to propose to GCC as part of
> Google Summer of Code. My project is not on the list of project ideas
> (http://gcc.gnu.org/wiki/SummerOfCode), so I would be very interested
> to hear any opinions and maybe even to find a mentor.
>
>
> 1. Project idea
>
> In brief, the idea is to create an abstract layer for vectorized
> computations. This would make it possible to write portable vectorized code.
>
>
> 2. State of the art
>
> Nowadays most processors support SIMD computations. The problem is that each
> architecture has a different set of SIMD instructions: Intel MMX+SSE+AVX,
> PowerPC Altivec, ARM iWMMXt, and so on. GCC exposes most architecture-specific
> instructions through built-in functions, and these functions are convenient
> when you want to optimize a piece of code. The problem starts when you want to
> make that code portable. This is not a very common task, and of course GCC has
> a vectorizer. Unfortunately, there are many examples showing that it is
> relatively simple for a human to find the right place in the code and
> vectorize it, but extremely hard for the compiler to do the same. As a result
> we end up with code that does not use the capabilities of the architecture.
>
> It would be much easier for the programmer to use an abstract layer to
> write vectorized code. The compiler should deal with the portability issues by
> dispatching the code from the abstract layer to the particular architecture.
> As far as I know, there is no library for C/C++ that solves this problem.
> There are some attempts, such as http://libsimd.sourceforge.net/, but it covers
> only a small part of the idea, and unfortunately its development is suspended.
> Or maybe I am wrong and everything is already written?
>

Just some related prior art you may be interested in. One is the
LLVA virtual vector IR:
http://www.cs.rice.edu/~taha/teaching/04H/RAP/cache/adve-LowLevelVirtual.pdf
There is also ongoing work on generic vector support in CLI on top of
the cli-branch of GCC. A preliminary report on the early stages of that work
was presented at GROW'10 (http://ctuning.org/dissemination/grow10-04.pdf),
with hopefully some follow-ups later this year...

good luck with whatever GSoC project you ended up proposing!

dorit

