Along the way I made the labeling of the examples considerably more consistent.
Applied.
Gerald
Index: projects/tree-ssa/vectorization.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/projects/tree-ssa/vectorization.html,v
retrieving revision 1.36
diff -u -r1.36 vectorization.html
--- projects/tree-ssa/vectorization.html 29 Jul 2018 20:43:43 -0000 1.36
+++ projects/tree-ssa/vectorization.html 26 Aug 2018 10:45:12 -0000
@@ -159,12 +159,13 @@
as loop vectorization. Basic block SLP is enabled by default at -O3
and when -ftree-vectorize
is enabled.
"feature" indicates the vectorization capabilities demonstrated by the
- example.
-example1:
+ example.
+
+Example 1:
int a[256], b[256], c[256];
@@ -175,7 +176,9 @@
a[i] = b[i] + c[i]; } }
-example2:
+
+
+Example 2:
int a[256], b[256], c[256];
@@ -194,7 +197,9 @@
a[i] = b[i]&c[i]; i++; } }
-example3:
+
+
+Example 3:
typedef int aint __attribute__ ((__aligned__(16)));
@@ -205,7 +210,9 @@
*p++ = *q++; } }
-example4:
+
+
+Example 4:
typedef int aint __attribute__ ((__aligned__(16)));
@@ -230,7 +237,9 @@
b[i] = (j > MAX ? MAX : 0); } }
-example5:
+
+
+Example 5:
struct a {
@@ -241,8 +250,9 @@
/* feature: support for alignable struct access */ s.ca[i] = 5; }
-example6
-(gfortran):
+
+
+Example 6: gfortran:
DIMENSION A(1000000), B(1000000), C(1000000)
@@ -250,7 +260,9 @@
A = LOG(X); B = LOG(Y); C = A + B PRINT*, C(500000) END
-example7:
+
+
+Example 7:
int a[256], b[256];
@@ -262,7 +274,9 @@
a[i] = b[i+x]; } }
-example8:
+
+
+Example 8:
int a[M][N];
@@ -276,7 +290,9 @@
} } }
-example9:
+
+
+Example 9:
unsigned int ub[N], uc[N];
@@ -289,7 +305,9 @@
for (i = 0; i < N; i++) { udiff += (ub[i] - uc[i]); }
-example10:
+
+
+Example 10:
/* feature: support data-types of different sizes.
@@ -311,7 +329,8 @@
ia[i] = (int) sb[i]; }
-example11:
+
+Example 11:
/* feature: support strided accesses - the data elements
@@ -324,7 +343,7 @@
}
-example12: induction:
+Example 12: Induction:
for (i = 0; i < N; i++) {
@@ -332,7 +351,7 @@
}
-example13: outer-loop:
+Example 13: Outer-loop:
for (i = 0; i < M; i++) {
@@ -345,7 +364,8 @@
}
-example14: double reduction:
+Example 14: Double reduction:
+
for (k = 0; k < K; k++) { sum = 0;
@@ -357,7 +377,8 @@
}
-example15: condition in nested loop:
+Example 15: Condition in nested loop:
+
for (j = 0; j < M; j++) {
@@ -374,7 +395,8 @@
}
-example16: load permutation in loop-aware SLP:
+Example 16: Load permutation in loop-aware SLP:
+
for (i = 0; i < N; i++) {
@@ -388,7 +410,8 @@
}
-example17: basic block SLP:
+Example 17: Basic block SLP:
+
void foo () {
@@ -402,7 +425,8 @@
}
-example18: Simple reduction in SLP:
+Example 18: Simple reduction in SLP:
+
int sum1; int sum2;
@@ -419,7 +443,8 @@
}
-example19: Reduction chain in SLP:
+Example 19: Reduction chain in SLP:
+
int sum; int a[128];
@@ -435,9 +460,10 @@
}
-example20: Basic block SLP with
+Example 20: Basic block SLP with
multiple types, loads with different offsets, misaligned load,
-and not-affine accesses:
+and not-affine accesses:
+
void foo (int * __restrict__ dst, short * __restrict__ src, int h, int stride, short A, short B)
@@ -459,7 +485,8 @@
}
-example21: Backward access:
+Example 21: Backward access:
+
int foo (int *b, int n) {
@@ -472,7 +499,8 @@
}
-example22: Alignment hints:
+Example 22: Alignment hints:
+
void foo (int *out1, int *in1, int *in2, int n) {
@@ -487,7 +515,8 @@
}
-example23: Widening shift:
+Example 23: Widening shift:
+
void foo (unsigned short *src, unsigned int *dst) {
@@ -498,7 +527,8 @@
}
-example24: Condition with mixed types:
+Example 24: Condition with mixed types:
+
#define N 1024 float a[N], b[N];
@@ -512,7 +542,8 @@
}
-example25: Loop with bool:
+Example 25: Loop with bool:
+
#define N 1024 float a[N], b[N], c[N], d[N];
@@ -531,11 +562,12 @@
}
-
Examples of loops that currently cannot be
- vectorized:
-example1: uncountable loop:
+ vectorized:
+
+Example 1: Uncountable loop:
while (*p != NULL) {
@@ -1564,8 +1596,7 @@
PLDI 2000.
-High-Level Plan of
- Implementation (2003-2005)
+High-Level Plan of Implementation (2003-2005)
The table below outlines the high level vectorization scheme along with a proposal for an implementation scheme, as
@@ -1926,9 +1957,7 @@
-
-
-
-Loop detection and loop CFG analysis
+Loop detection and loop CFG analysis
Detect loops, and record some basic control flow information about them (contained basic blocks, loop
@@ -1940,9 +1969,7 @@
-
-
-
-Modeling the target machine vector capabilities to
+Modeling the target machine vector capabilities to
the tree-level.
Expose the required target specific information to
@@ -1998,9 +2025,7 @@
-
-
-
-Enhance the Builtins Support
+Enhance the Builtins Support
Currently the tree optimizers do not know the semantics of target specific builtin functions, so they
@@ -2016,9 +2041,7 @@
-
-
-
-Cost Model
+Cost Model
There is an overhead associated with vectorization -- moving data in to/out of vector registers
@@ -2037,9 +2060,7 @@
-
-
-
-Induction Variable Analysis
+Induction Variable Analysis
Used by the vectorizer to detect loop bound, analyze access patterns and analyze data dependencies between
@@ -2066,9 +2087,7 @@
-
-
-
-Dependence Testing
+Dependence Testing
Following the classic dependence-based approach for vectorization as described in
-
-
-
-Access Pattern Analysis
+Access Pattern Analysis
The memory architecture usually allows only restricted accesses to data in memory; one of the
@@ -2151,9 +2168,7 @@
-
-
-
-Extend the range of supportable operations
+Extend the range of supportable operations
At first, the only computations that will be vectorized are those for which the vectorization
@@ -2185,9 +2200,7 @@
-
-
-
-Alignment
+Alignment
The memory architecture usually allows only restricted accesses to data in memory. One of the
@@ -2237,9 +2250,7 @@
-
-
-
-Idiom Recognition
+Idiom Recognition
It is often the case that complicated computations can be reduced into a simpler, straight-line sequence
@@ -2301,9 +2312,7 @@
-
-
-
-Conditional Execution
+Conditional Execution
The general principle we are trying to follow is to keep the actual code transformation part of the
@@ -2333,9 +2342,7 @@
-
-
-
-Handle Advanced Loop Forms
+Handle Advanced Loop Forms
- Support general loop bound (unknown, or doesn't
@@ -2355,9 +2362,7 @@
-
-
-
-Handle Pointer Aliasing
+Handle Pointer Aliasing
- Improve aliasing analysis. [various gcc projects
@@ -2406,9 +2411,7 @@
-
-
-
-Loop versioning
+Loop versioning
Provide utilities that allow performing the following transformation: Given a condition and a loop,
@@ -2424,9 +2427,7 @@
-
-
-
-Loop Transformations to Increase Vectorizability of
+Loop Transformations to Increase Vectorizability of
Loops
These include:
@@ -2448,9 +2449,7 @@
-
-
-
-Other Optimizations
+Other Optimizations