* [Bug tree-optimization/18437] vectorizer failed for matrix multiplication
[not found] <bug-18437-4@http.gcc.gnu.org/bugzilla/>
@ 2011-05-22 16:02 ` steven at gcc dot gnu.org
2011-07-27 12:39 ` rguenth at gcc dot gnu.org
` (5 subsequent siblings)
6 siblings, 0 replies; 10+ messages in thread
From: steven at gcc dot gnu.org @ 2011-05-22 16:02 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18437
Steven Bosscher <steven at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed|2005-12-21 03:40:32 |2011-05-22 17:36:32
--- Comment #4 from Steven Bosscher <steven at gcc dot gnu.org> 2011-05-22 15:36:52 UTC ---
The test case from comment #0 is still not vectorized by recent GCC:
 1 #define align(x) __attribute__((align(x)))
 2 typedef float align(16) MATRIX[3][3];
 3
 4 void RotateMatrix(MATRIX ret, MATRIX a, MATRIX b)
 5 {
 6   int i, j;
 7
 8   for (j = 0; j < 3; j++)
 9     for (i = 0; i < 3; i++)
10       ret[j][i] = a[j][0] * b[0][i]
11                 + a[j][1] * b[1][i]
12                 + a[j][2] * b[2][i];
13 }
t.c:8: note: not vectorized: loop contains function calls or data references that cannot be analyzed
t.c:8: note: bad data references.
t.c:4: note: vectorized 0 loops in function.
"GCC: (GNU) 4.6.0 20110312 (experimental) [trunk revision 170907]"
^ permalink raw reply [flat|nested] 10+ messages in thread
* [Bug tree-optimization/18437] vectorizer failed for matrix multiplication
From: rguenth at gcc dot gnu.org @ 2011-07-27 12:39 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18437
--- Comment #5 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-07-27 12:38:20 UTC ---
The initial testcase is probably a bad example (3x3 matrix). The following
testcase is borrowed from Polyhedron rnflow and is vectorized by ICC but
not by GCC (the ICC variant is 15% faster):
function trs2a2 (j, k, u, d, m)
  real, dimension (1:m,1:m) :: trs2a2
  real, dimension (1:m,1:m) :: u, d
  integer, intent (in) :: j, k, m
  real (kind = selected_real_kind (10,50)) :: dtmp
  trs2a2 = 0.0
  do iclw1 = j, k - 1
    do iclw2 = j, k - 1
      dtmp = 0.0d0
      do iclww = j, k - 1
        dtmp = dtmp + u (iclw1, iclww) * d (iclww, iclw2)
      enddo
      trs2a2 (iclw1, iclw2) = dtmp
    enddo
  enddo
  return
end function trs2a2
The reason GCC cannot vectorize this is that the load from U has
a non-constant stride, so vectorization would need to load two scalars
and build up a vector from them (ICC does that). If the stride were
constant but not a power of two, GCC would reject that as well, probably
so as not to confuse the interleaving code. Data dependence analysis also
rejects non-constant strides.
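To make the "load scalars and build up a vector" idea concrete, here is a
minimal C sketch of the inner loop. It is not GCC's or ICC's actual code
generation; the function names, the LANES constant and the plain arrays
standing in for vector registers are all assumptions made for illustration.
The access to u uses a stride known only at run time (as U does in the
Fortran above), while d is read contiguously.

```c
#include <stddef.h>

enum { LANES = 4 };  /* hypothetical vector width, counted in floats */

/* Scalar reference: u is read with a run-time stride, d contiguously. */
float dot_scalar(const float *u, size_t stride, const float *d, size_t n)
{
    float acc = 0.0f;
    for (size_t k = 0; k < n; k++)
        acc += u[k * stride] * d[k];
    return acc;
}

/* Same computation, LANES iterations per step. The arrays uvec and acc
   model vector registers (an assumption of this sketch). */
float dot_by_lanes(const float *u, size_t stride, const float *d, size_t n)
{
    float acc[LANES] = {0};
    size_t k = 0;

    for (; k + LANES <= n; k += LANES) {
        float uvec[LANES];

        /* Strided operand: one scalar load per lane, then treat the
           collected values as a single vector. */
        for (int l = 0; l < LANES; l++)
            uvec[l] = u[(k + l) * stride];

        /* Contiguous operand: d[k .. k+LANES-1] could be one vector load. */
        for (int l = 0; l < LANES; l++)
            acc[l] += uvec[l] * d[k + l];
    }

    /* Reduce the per-lane partial sums, then finish the leftover iterations. */
    float sum = 0.0f;
    for (int l = 0; l < LANES; l++)
        sum += acc[l];
    for (; k < n; k++)
        sum += u[k * stride] * d[k];
    return sum;
}
```

Note that the lane-wise version adds the products in a different order than
the scalar loop, so for general floating-point inputs the two results may
differ in the last bits. A compiler can only make this transformation when
it is allowed to reassociate floating-point additions.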
A further complication (for the cost model) is that the accumulator has
type double while the data are of type float. ICC uses only
half of each float vector here to handle mixed float/double
loops (but it still unrolls the loop).
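The mixed-type issue can be sketched in C as well. With 128-bit vectors a
register holds four floats but only two doubles, so one way to keep the
lanes matched is to consume just two floats per step and widen them. The
code below is an illustration under that assumption, not a description of
what ICC emits; DLANES and both function names are hypothetical.

```c
#include <stddef.h>

enum { DLANES = 2 };  /* assumed number of doubles per 128-bit vector */

/* Scalar reference: the product is formed in float, then widened and
   added to a double accumulator, as in the Fortran loop. */
double dot_mixed_scalar(const float *u, const float *d, size_t n)
{
    double acc = 0.0;
    for (size_t k = 0; k < n; k++)
        acc += (double)(u[k] * d[k]);
    return acc;
}

/* Two iterations per step: only half of a four-float vector is used, so
   that the widened products fit exactly into one two-double vector. */
double dot_mixed_by_halves(const float *u, const float *d, size_t n)
{
    double acc[DLANES] = {0.0};
    size_t k = 0;

    for (; k + DLANES <= n; k += DLANES) {
        float prod[DLANES];

        for (int l = 0; l < DLANES; l++)
            prod[l] = u[k + l] * d[k + l];   /* float multiply, 2 lanes */
        for (int l = 0; l < DLANES; l++)
            acc[l] += (double)prod[l];       /* widen, then accumulate */
    }

    /* Combine the partial sums and handle an odd trailing element. */
    double sum = acc[0] + acc[1];
    for (; k < n; k++)
        sum += (double)(u[k] * d[k]);
    return sum;
}
```

As with any lane-wise reduction, the additions are reordered relative to
the scalar loop, so bit-identical results are only guaranteed for inputs
whose partial sums are exactly representable.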
* [Bug tree-optimization/18437] vectorizer failed for matrix multiplication
From: matz at gcc dot gnu.org @ 2012-04-17 13:55 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18437
--- Comment #6 from Michael Matz <matz at gcc dot gnu.org> 2012-04-17 13:54:36 UTC ---
Author: matz
Date: Tue Apr 17 13:54:26 2012
New Revision: 186530
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=186530
Log:
PR tree-optimization/18437
* tree-vectorizer.h (_stmt_vec_info.stride_load_p): New member.
(STMT_VINFO_STRIDE_LOAD_P): New accessor.
(vect_check_strided_load): Declare.
* tree-vect-data-refs.c (vect_check_strided_load): New function.
(vect_analyze_data_refs): Use it to accept strided loads.
* tree-vect-stmts.c (vectorizable_load): Ditto and handle them.
testsuite/
* gfortran.dg/vect/rnflow-trs2a2.f90: New test.
Added:
trunk/gcc/testsuite/gfortran.dg/vect/rnflow-trs2a2.f90
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vect-data-refs.c
trunk/gcc/tree-vect-stmts.c
trunk/gcc/tree-vectorizer.h
* [Bug tree-optimization/18437] vectorizer failed for matrix multiplication
From: rguenth at gcc dot gnu.org @ 2012-05-09 13:07 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18437
--- Comment #7 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-05-09 12:59:49 UTC ---
Author: rguenth
Date: Wed May 9 12:59:46 2012
New Revision: 187330
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=187330
Log:
2012-05-09 Richard Guenther <rguenther@suse.de>
PR tree-optimization/18437
* gfortran.dg/vect/rnflow-trs2a2.f90: Move ...
* gfortran.dg/vect/fast-math-rnflow-trs2a2.f90: ... here.
Added:
trunk/gcc/testsuite/gfortran.dg/vect/fast-math-rnflow-trs2a2.f90
- copied unchanged from r187329,
trunk/gcc/testsuite/gfortran.dg/vect/rnflow-trs2a2.f90
Removed:
trunk/gcc/testsuite/gfortran.dg/vect/rnflow-trs2a2.f90
Modified:
trunk/gcc/testsuite/ChangeLog
* [Bug tree-optimization/18437] vectorizer failed for matrix multiplication
From: rguenth at gcc dot gnu.org @ 2012-07-13 8:50 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18437
Richard Guenther <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Blocks| |53947
--- Comment #8 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-07-13 08:49:47 UTC ---
Link to vectorizer missed-optimization meta-bug.
* [Bug tree-optimization/18437] vectorizer failed for matrix multiplication
From: pinskia at gcc dot gnu.org @ 2023-08-04 20:19 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18437
--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
For the original testcase in comment #0, with `-O3 -fno-vect-cost-model` GCC
can vectorize it on aarch64 but not on x86_64.
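For anyone wanting to retry this, a self-contained variant of the comment #0
kernel might look as follows. This is a sketch, not the reporter's file: the
alignment attribute is left out, the file name t.c is assumed, and the
result depends on GCC version and target. One way to see what the
vectorizer did is to compile with
`gcc -O3 -fno-vect-cost-model -fopt-info-vec -c t.c`.

```c
/* Variant of the comment #0 test case without the alignment attribute
   (an assumption of this sketch; the original typedef used one). */
typedef float MATRIX[3][3];

/* ret = a * b for 3x3 matrices; ret must not alias a or b. */
void RotateMatrix(MATRIX ret, MATRIX a, MATRIX b)
{
    int i, j;

    for (j = 0; j < 3; j++)
        for (i = 0; i < 3; i++)
            ret[j][i] = a[j][0] * b[0][i]
                      + a[j][1] * b[1][i]
                      + a[j][2] * b[2][i];
}
```

Multiplying any matrix by the identity and comparing the result with the
input is a quick sanity check that the kernel was copied correctly before
looking at the vectorizer output.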
* [Bug tree-optimization/18437] vectorizer failed for matrix multiplication
From: pinskia at gcc dot gnu.org @ 2023-08-04 20:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=18437
--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Andrew Pinski from comment #9)
> For the original testcase in comment #0, with `-O3 -fno-vect-cost-model` GCC
> can vectorize it on aarch64 but not on x86_64.
I should say: starting with GCC 6.