* [Bug tree-optimization/66285] failure to vectorize parallelized loop
2015-05-26 7:55 [Bug tree-optimization/66285] New: failure to vectorize parallelized loop vries at gcc dot gnu.org
@ 2015-05-26 7:59 ` vries at gcc dot gnu.org
2015-05-26 8:11 ` vries at gcc dot gnu.org
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: vries at gcc dot gnu.org @ 2015-05-26 7:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285
vries at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
--- Comment #1 from vries at gcc dot gnu.org ---
FWIW, this patch puts pass_parallelize_loops before pass_vectorize:
...
diff --git a/gcc/passes.def b/gcc/passes.def
index 4690e23..f0629ff 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -243,14 +243,14 @@ along with GCC; see the file COPYING3. If not see
       NEXT_PASS (pass_dce);
   POP_INSERT_PASSES ()
   NEXT_PASS (pass_iv_canon);
-  NEXT_PASS (pass_parallelize_loops);
-  PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops)
-      NEXT_PASS (pass_expand_omp_ssa);
-  POP_INSERT_PASSES ()
   NEXT_PASS (pass_if_conversion);
   /* pass_vectorize must immediately follow pass_if_conversion.
      Please do not add any other passes in between. */
   NEXT_PASS (pass_vectorize);
+  NEXT_PASS (pass_parallelize_loops);
+  PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops)
+      NEXT_PASS (pass_expand_omp_ssa);
+  POP_INSERT_PASSES ()
   PUSH_INSERT_PASSES_WITHIN (pass_vectorize)
       NEXT_PASS (pass_dce);
   POP_INSERT_PASSES ()
...
And that makes the problem go away (btw, the dump file names in investigate.sh
need adapting, since reordering the passes changes the dump numbering):
...
$ ./investigate.sh
parloops_factor: 0, index_type: int:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned int:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: long:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned long:
vectorized: 1, parallelized: 0
parloops_factor: 2, index_type: int:
vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: unsigned int:
vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: long:
vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: unsigned long:
vectorized: 1, parallelized: 1
...
Of course, the patch means we're no longer vectorizing parallelized loops, but
parallelizing vectorized loops.
* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26 8:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285
--- Comment #2 from vries at gcc dot gnu.org ---
Created attachment 35623
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35623&action=edit
par-2.c.129t.parloops
For -DINDEX_TYPE=int, par-2.c.129t.parloops
* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26 8:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285
--- Comment #3 from vries at gcc dot gnu.org ---
Created attachment 35624
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35624&action=edit
par-2.c.130t.ompexpssa
par-2.c.130t.ompexpssa
* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26 8:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285
--- Comment #4 from vries at gcc dot gnu.org ---
Created attachment 35625
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35625&action=edit
par-2.c.131t.ifcvt
par-2.c.131t.ifcvt
* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26 8:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285
--- Comment #5 from vries at gcc dot gnu.org ---
Created attachment 35626
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35626&action=edit
par-2.c.132t.vect
par-2.c.132t.vect
* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: rguenth at gcc dot gnu.org @ 2015-05-26 10:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
I thought that parallelizing vectorized loops is harder (you eventually get
extra prologue and epilogue loops, etc).
* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26 12:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285
--- Comment #7 from vries at gcc dot gnu.org ---
(In reply to Richard Biener from comment #6)
> I thought that parallelizing vectorized loops is harder (you eventually get
> extra prologue and epilogue loops, etc).
Another example, par-4.c:
...
int __attribute__((noinline,noclone))
f (int argc, double *__restrict results, double *__restrict data, INDEX_TYPE n)
{
  double coeff = 12.2;

  for (INDEX_TYPE idx = 0; idx < n; idx++)
    results[idx] = coeff * data[idx];

  return !(results[argc] == 0.0);
}

#define nEvents 1000

#if defined (MAIN)
int
main (int argc)
{
  double results[nEvents] = {0};
  double data[nEvents] = {0};

  return f (argc, results, data, nEvents);
}
#endif
...
When not parallelizing, we vectorize without problems:
...
parloops_factor: 0, index_type: int:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned int:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: long:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned long:
vectorized: 1, parallelized: 0
...
When parallelizing, we generate both a low iteration count loop, and a
split-off parallelized loop. The vectorizer vectorizes both loops (each of
which contains an epilogue):
...
parloops_factor: 2, index_type: int:
vectorized: 2, parallelized: 1
parloops_factor: 2, index_type: long:
vectorized: 2, parallelized: 1
parloops_factor: 2, index_type: unsigned long:
vectorized: 2, parallelized: 1
...
Except in the case of unsigned int, in which case it only vectorizes the low
iteration count loop:
...
parloops_factor: 2, index_type: unsigned int:
vectorized: 1, parallelized: 1
...
The other loop fails to vectorize in a fashion similar to that described for
par-2.c with INDEX_TYPE (unsigned) int.
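
To make the "two loops" concrete, here is a rough hand-written model of the shape the function takes after parallelization. It is only a sketch, not what parloops actually emits: the names f_model and worker_slice, the MIN_ITERS_FOR_THREADS cut-off and the chunking scheme are all assumptions made for illustration, and the "threads" simply run one after the other.

```c
#include <stddef.h>

/* Hypothetical cut-off: below this trip count the serial copy is used.  */
#define MIN_ITERS_FOR_THREADS 200

/* Work done by one thread: the half-open slice [lo, hi).  This stands in
   for the split-off parallelized loop.  */
static void
worker_slice (double *results, const double *data, size_t lo, size_t hi)
{
  for (size_t idx = lo; idx < hi; idx++)
    results[idx] = 12.2 * data[idx];
}

/* Model of the parallelized function.  The slices are processed
   sequentially here; a real runtime would hand each one to a thread.  */
void
f_model (double *results, const double *data, size_t n, unsigned nthreads)
{
  if (nthreads == 0 || n < MIN_ITERS_FOR_THREADS)
    {
      /* Low iteration count loop: stays in the original function.  */
      for (size_t idx = 0; idx < n; idx++)
        results[idx] = 12.2 * data[idx];
      return;
    }

  /* Assumed static schedule: equally sized contiguous chunks.  */
  size_t chunk = (n + nthreads - 1) / nthreads;
  for (unsigned t = 0; t < nthreads; t++)
    {
      size_t lo = (size_t) t * chunk;
      size_t hi = lo + chunk < n ? lo + chunk : n;
      if (lo < hi)
        worker_slice (results, data, lo, hi);
    }
}
```

In this model the vectorizer sees two independent loops, the serial one in f_model and the one in worker_slice, which matches the "vectorized: 2" counts above when both succeed.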
* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26 13:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285
--- Comment #8 from vries at gcc dot gnu.org ---
For example par-4.c, if we use the same patch to interchange the passes, we
get:
When not parallelizing, all loops get vectorized:
...
parloops_factor: 0, index_type: int:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned int:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: long:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned long:
vectorized: 1, parallelized: 0
...
When parallelizing, we parallelize one loop.
...
parloops_factor: 2, index_type: int:
vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: unsigned int:
vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: long:
vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: unsigned long:
vectorized: 1, parallelized: 1
...
The loop that is parallelized is the vectorized loop, not the epilogue.
So AFAIU:
- with this patch the epilogue is only performed by the main thread, after all
the threads are done. Each thread handles one slice of the vectorized loop.
- without the patch, the epilogue is potentially executed by each thread.
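
The difference between the two orderings can be illustrated with a small counting model. This is a sketch under stated assumptions, not a description of what GCC generates: the vectorization factor vf, the equal-chunk static schedule and the function names are hypothetical, and prologue or alignment peeling is ignored.

```c
/* Scalar iterations left over once LEN iterations are grouped VF at a
   time.  */
static long
epilogue_iters (long len, long vf)
{
  return len % vf;
}

/* With the patch (vectorize, then parallelize): the vector loop is sliced
   across threads and a single epilogue runs afterwards in the main
   thread.  */
long
epilogue_vect_then_par (long n, long vf)
{
  return epilogue_iters (n, vf);
}

/* Without the patch (parallelize, then vectorize): each thread's slice is
   vectorized on its own, so each slice may need its own epilogue.
   Assumes equally sized contiguous chunks.  */
long
epilogue_par_then_vect (long n, long vf, long nthreads)
{
  long chunk = (n + nthreads - 1) / nthreads;
  long total = 0;

  for (long t = 0; t < nthreads; t++)
    {
      long lo = t * chunk;
      long hi = lo + chunk < n ? lo + chunk : n;
      if (lo < hi)
        total += epilogue_iters (hi - lo, vf);
    }
  return total;
}
```

For instance, under these assumptions n = 1000 with vf = 4 and 3 threads gives slices of 334, 334 and 332 iterations, so the second model counts 4 scalar epilogue iterations spread over two threads, while the first counts none.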