public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/66285] New: failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26 7:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285
Bug ID: 66285
Summary: failure to vectorize parallelized loop
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---
Another pr46032-inspired example.
Consider par-2.c:
...
#define nEvents 1000
int __attribute__((noinline,noclone))
f (int argc, double *__restrict results, double *__restrict data)
{
double coeff = 12.2;
for (INDEX_TYPE idx = 0; idx < nEvents; idx++)
results[idx] = coeff * data[idx];
return !(results[argc] == 0.0);
}
#if defined (MAIN)
int
main (int argc)
{
double results[nEvents] = {0};
double data[nEvents] = {0};
return f (argc, results, data);
}
#endif
...
And investigate.sh:
...
#!/bin/bash
src=par-2.c
for parloops_factor in 0 2; do
for index_type in "int" "unsigned int" "long" "unsigned long"; do
rm -f *.c.*;
./lean-c/install/bin/gcc -O2 $src -S \
-ftree-parallelize-loops=$parloops_factor \
-ftree-vectorize \
-fdump-tree-all-all \
"-DINDEX_TYPE=$index_type"
vectdump=$src.132t.vect
pardump=$src.129t.parloops
vectorized=$(grep -c "LOOP VECTORIZED" $vectdump)
if [ ! -f $pardump ]; then
parallelized=0
else
parallelized=$(grep -c "parallelizing inner loop" $pardump)
fi
echo "parloops_factor: $parloops_factor, index_type: $index_type:"
echo " vectorized: $vectorized, parallelized: $parallelized"
done
done
...
If we're not parallelizing, vectorization succeeds:
...
parloops_factor: 0, index_type: int:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned int:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: long:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned long:
vectorized: 1, parallelized: 0
...
If we're parallelizing, vectorization succeeds for (unsigned) long:
...
parloops_factor: 2, index_type: long:
vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: unsigned long:
vectorized: 1, parallelized: 1
...
but not for (unsigned) int:
...
parloops_factor: 2, index_type: int:
vectorized: 0, parallelized: 1
parloops_factor: 2, index_type: unsigned int:
vectorized: 0, parallelized: 1
...
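The following is a rough, hand-written C approximation of what parallelizing the par-2.c loop means for -ftree-parallelize-loops=2. It is meant only to show the kind of loop the vectorizer is handed afterwards. It assumes a static schedule that divides the iterations into contiguous chunks. The names thread_slice and loop_body_for_thread are hypothetical, not GCC internals. The real pass goes through OMP expansion (cf. the ompexpssa dump) and runs the slices concurrently. The sketch uses a long index, so it does not reproduce the int/long difference reported above.

```c
#include <assert.h>

/* Hypothetical helper: the half-open range [lo, hi) of iterations that
   thread tid (0-based) executes when n iterations are divided among
   nthreads threads in contiguous chunks of ceil (n / nthreads).  */
static void
thread_slice (long n, int nthreads, int tid, long *lo, long *hi)
{
  assert (nthreads > 0 && tid >= 0 && tid < nthreads);
  long chunk = (n + nthreads - 1) / nthreads;
  long start = (long) tid * chunk;
  *lo = start < n ? start : n;
  *hi = *lo + chunk < n ? *lo + chunk : n;
}

/* Hypothetical per-thread body: the par-2.c loop restricted to one slice.
   lo and hi are only known at run time, so the vectorizer can no longer
   rely on the constant trip count nEvents.  */
static void
loop_body_for_thread (double *restrict results,
                      const double *restrict data,
                      long n, int nthreads, int tid)
{
  const double coeff = 12.2;
  long lo, hi;
  thread_slice (n, nthreads, tid, &lo, &hi);
  for (long idx = lo; idx < hi; idx++)
    results[idx] = coeff * data[idx];
}
```

With n = 1000 and 2 threads, thread 0 covers [0, 500) and thread 1 covers [500, 1000).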
* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26 7:59 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285
vries at gcc dot gnu.org changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
--- Comment #1 from vries at gcc dot gnu.org ---
FWIW, this patch puts pass_parallelize_loops after pass_vectorize:
...
diff --git a/gcc/passes.def b/gcc/passes.def
index 4690e23..f0629ff 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -243,14 +243,14 @@ along with GCC; see the file COPYING3. If not see
NEXT_PASS (pass_dce);
POP_INSERT_PASSES ()
NEXT_PASS (pass_iv_canon);
- NEXT_PASS (pass_parallelize_loops);
- PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops)
- NEXT_PASS (pass_expand_omp_ssa);
- POP_INSERT_PASSES ()
NEXT_PASS (pass_if_conversion);
/* pass_vectorize must immediately follow pass_if_conversion.
Please do not add any other passes in between. */
NEXT_PASS (pass_vectorize);
+ NEXT_PASS (pass_parallelize_loops);
+ PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops)
+ NEXT_PASS (pass_expand_omp_ssa);
+ POP_INSERT_PASSES ()
PUSH_INSERT_PASSES_WITHIN (pass_vectorize)
NEXT_PASS (pass_dce);
POP_INSERT_PASSES ()
...
And that makes the problem go away (btw, dump file names need adapting in
investigate.sh):
...
$ ./investigate.sh
parloops_factor: 0, index_type: int:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned int:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: long:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned long:
vectorized: 1, parallelized: 0
parloops_factor: 2, index_type: int:
vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: unsigned int:
vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: long:
vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: unsigned long:
vectorized: 1, parallelized: 1
...
Of course, the patch means we're no longer vectorizing parallelized loops, but
parallelizing vectorized loops.
* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26 8:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285
--- Comment #2 from vries at gcc dot gnu.org ---
Created attachment 35623
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35623&action=edit
par-2.c.129t.parloops
For -DINDEX_TYPE=int, par-2.c.129t.parloops
* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26 8:12 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285
--- Comment #3 from vries at gcc dot gnu.org ---
Created attachment 35624
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35624&action=edit
par-2.c.130t.ompexpssa
* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26 8:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285
--- Comment #5 from vries at gcc dot gnu.org ---
Created attachment 35626
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35626&action=edit
par-2.c.132t.vect
* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26 8:13 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285
--- Comment #4 from vries at gcc dot gnu.org ---
Created attachment 35625
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35625&action=edit
par-2.c.131t.ifcvt
* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: rguenth at gcc dot gnu.org @ 2015-05-26 10:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
I thought that parallelizing vectorized loops is harder (you eventually get
extra prologue and epilogue loops, etc).
* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26 12:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285
--- Comment #7 from vries at gcc dot gnu.org ---
(In reply to Richard Biener from comment #6)
> I thought that parallelizing vectorized loops is harder (you eventually get
> extra prologue and epilogue loops, etc).
Another example, par-4.c:
...
int __attribute__((noinline,noclone))
f (int argc, double *__restrict results, double *__restrict data, INDEX_TYPE n)
{
double coeff = 12.2;
for (INDEX_TYPE idx = 0; idx < n; idx++)
results[idx] = coeff * data[idx];
return !(results[argc] == 0.0);
}
#define nEvents 1000
#if defined (MAIN)
int
main (int argc)
{
double results[nEvents] = {0};
double data[nEvents] = {0};
return f (argc, results, data, nEvents);
}
#endif
...
When not parallelizing, we vectorize without problems:
...
parloops_factor: 0, index_type: int:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned int:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: long:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned long:
vectorized: 1, parallelized: 0
...
When parallelizing, we generate both a low iteration count loop and a
split-off parallelized loop. The vectorizer vectorizes both loops (each of
which contains an epilogue):
...
parloops_factor: 2, index_type: int:
vectorized: 2, parallelized: 1
parloops_factor: 2, index_type: long:
vectorized: 2, parallelized: 1
parloops_factor: 2, index_type: unsigned long:
vectorized: 2, parallelized: 1
...
The exception is unsigned int, for which it only vectorizes the low
iteration count loop:
...
parloops_factor: 2, index_type: unsigned int:
vectorized: 1, parallelized: 1
...
The other loop fails to vectorize in a fashion similar to that described for
par-2.c with INDEX_TYPE (unsigned) int.
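A rough sketch of the two loop versions described for par-4.c follows: a low iteration count loop that runs sequentially, and a split-off loop whose iterations are divided among threads. A run-time test on n selects between them. The threshold HYPOTHETICAL_MIN_ITERS_PER_THREAD and the function run_par4_loop are made up for illustration, because the report does not state the cut-off parloops actually uses. The threads are simulated one after another so the example stays single-threaded.

```c
#include <assert.h>

/* HYPOTHETICAL cut-off: the real threshold used by parloops is not given
   in this report and is not reproduced here.  */
#define HYPOTHETICAL_MIN_ITERS_PER_THREAD 100

/* Which of the two loop versions a call ended up executing.  */
enum loop_version { LOW_ITER_COUNT_LOOP, PARALLELIZED_LOOP };

/* Hypothetical model of the par-4.c loop after parloops: a run-time test
   on the trip count n picks either the sequential low iteration count
   loop or the split-off parallelized loop.  */
static enum loop_version
run_par4_loop (double *restrict results, const double *restrict data,
               long n, int nthreads)
{
  const double coeff = 12.2;
  assert (nthreads > 0);

  if (n < (long) HYPOTHETICAL_MIN_ITERS_PER_THREAD * nthreads)
    {
      /* Low iteration count loop: executed by the calling thread only.  */
      for (long idx = 0; idx < n; idx++)
        results[idx] = coeff * data[idx];
      return LOW_ITER_COUNT_LOOP;
    }

  /* Split-off parallelized loop: each thread gets one contiguous slice.
     The slices run one after another here; in the generated code they
     run concurrently.  */
  long chunk = (n + nthreads - 1) / nthreads;
  for (int tid = 0; tid < nthreads; tid++)
    {
      long start = (long) tid * chunk;
      long lo = start < n ? start : n;
      long hi = lo + chunk < n ? lo + chunk : n;
      for (long idx = lo; idx < hi; idx++)
        results[idx] = coeff * data[idx];
    }
  return PARALLELIZED_LOOP;
}
```

In this model both versions contain a loop of the original shape, which matches the report's observation that the vectorizer sees two candidate loops when parallelizing par-4.c.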
* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26 13:24 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285
--- Comment #8 from vries at gcc dot gnu.org ---
For example par-4.c, if we use the same patch to interchange the passes, we
get the following.
When not parallelizing, all loops get vectorized:
...
parloops_factor: 0, index_type: int:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned int:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: long:
vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned long:
vectorized: 1, parallelized: 0
...
When parallelizing, we parallelize one loop:
...
parloops_factor: 2, index_type: int:
vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: unsigned int:
vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: long:
vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: unsigned long:
vectorized: 1, parallelized: 1
...
The loop that is parallelized is the vectorized loop, not the epilogue.
So AFAIU:
- with this patch, the epilogue is only executed by the main thread, after all
  the threads are done; each thread handles one slice of the vectorized loop.
- without the patch, the epilogue is potentially executed by each thread.
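The two points above can be illustrated by counting how often a non-empty scalar epilogue runs under each pass order. This is a simplified model, not GCC's generated code. The vectorization factor vf, the contiguous chunking, and the absence of any prologue peeled for alignment are all assumptions, and both function names are hypothetical.

```c
#include <assert.h>

/* Simplified model: vf is the vectorization factor, iterations are split
   into contiguous chunks, and prologue peeling is ignored.  */

/* Without the patch (parloops, then vectorizer): each thread's slice of
   the n scalar iterations is vectorized separately, so every thread whose
   slice length is not a multiple of vf runs its own scalar epilogue.
   Returns how many non-empty epilogues run.  */
static int
epilogue_runs_parallelize_first (long n, int nthreads, long vf)
{
  assert (nthreads > 0 && vf > 0);
  long chunk = (n + nthreads - 1) / nthreads;
  int runs = 0;
  for (int tid = 0; tid < nthreads; tid++)
    {
      long start = (long) tid * chunk;
      long lo = start < n ? start : n;
      long hi = lo + chunk < n ? lo + chunk : n;
      if ((hi - lo) % vf != 0)
        runs++;
    }
  return runs;
}

/* With the patch (vectorizer, then parloops): only the n / vf vector
   iterations are split among the threads; the n % vf leftover scalar
   iterations run once, in the main thread, after the threads are done.  */
static int
epilogue_runs_vectorize_first (long n, long vf)
{
  assert (vf > 0);
  return n % vf != 0 ? 1 : 0;
}
```

For example, with n = 1002, two threads and vf = 4, each thread's 501-iteration slice leaves one scalar iteration, so epilogue_runs_parallelize_first returns 2. By contrast, epilogue_runs_vectorize_first returns 1: the two leftover iterations run once, in the main thread.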