public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/66285] New: failure to vectorize parallelized loop
@ 2015-05-26  7:55 vries at gcc dot gnu.org
  2015-05-26  7:59 ` [Bug tree-optimization/66285] " vries at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: vries at gcc dot gnu.org @ 2015-05-26  7:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285

            Bug ID: 66285
           Summary: failure to vectorize parallelized loop
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Another pr46032-inspired example.

Consider par-2.c:
...
#define nEvents 1000

int __attribute__((noinline,noclone))
f (int argc, double *__restrict results, double *__restrict data)
{
  double coeff = 12.2;

  for (INDEX_TYPE idx = 0; idx < nEvents; idx++)
    results[idx] = coeff * data[idx];

  return !(results[argc] == 0.0);
}

#if defined (MAIN)
int
main (int argc)
{
  double results[nEvents] = {0};
  double data[nEvents] = {0};

  return f (argc, results, data);
}
#endif
...

And investigate.sh:
...
#!/bin/bash

src=par-2.c

for parloops_factor in 0 2; do
    for index_type in "int" "unsigned int" "long" "unsigned long"; do
        rm -f *.c.*;

        ./lean-c/install/bin/gcc -O2 $src -S \
            -ftree-parallelize-loops=$parloops_factor \
            -ftree-vectorize \
            -fdump-tree-all-all \
            "-DINDEX_TYPE=$index_type"

        vectdump=$src.132t.vect
        pardump=$src.129t.parloops

        vectorized=$(grep -c "LOOP VECTORIZED" $vectdump)

        if [ ! -f $pardump ]; then 
            parallelized=0
        else
            parallelized=$(grep -c "parallelizing inner loop" $pardump)
        fi

        echo "parloops_factor: $parloops_factor, index_type: $index_type:"
        echo "  vectorized: $vectorized, parallelized: $parallelized"
    done
done
...

If we're not parallelizing, vectorization succeeds:
...
parloops_factor: 0, index_type: int:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned int:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: long:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned long:
  vectorized: 1, parallelized: 0
...

If we're parallelizing, vectorization succeeds for (unsigned) long:
...
parloops_factor: 2, index_type: long:
  vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: unsigned long:
  vectorized: 1, parallelized: 1
...

but not for (unsigned) int:
...
parloops_factor: 2, index_type: int:
  vectorized: 0, parallelized: 1
parloops_factor: 2, index_type: unsigned int:
  vectorized: 0, parallelized: 1
...



* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26  7:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285

vries at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #1 from vries at gcc dot gnu.org ---
FWIW, this patch moves pass_parallelize_loops after pass_vectorize:
...
diff --git a/gcc/passes.def b/gcc/passes.def
index 4690e23..f0629ff 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -243,14 +243,14 @@ along with GCC; see the file COPYING3.  If not see
              NEXT_PASS (pass_dce);
          POP_INSERT_PASSES ()
          NEXT_PASS (pass_iv_canon);
-         NEXT_PASS (pass_parallelize_loops);
-         PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops)
-             NEXT_PASS (pass_expand_omp_ssa);
-         POP_INSERT_PASSES ()
          NEXT_PASS (pass_if_conversion);
          /* pass_vectorize must immediately follow pass_if_conversion.
             Please do not add any other passes in between.  */
          NEXT_PASS (pass_vectorize);
+         NEXT_PASS (pass_parallelize_loops);
+         PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops)
+             NEXT_PASS (pass_expand_omp_ssa);
+         POP_INSERT_PASSES ()
           PUSH_INSERT_PASSES_WITHIN (pass_vectorize)
              NEXT_PASS (pass_dce);
           POP_INSERT_PASSES ()
...

And that makes the problem go away (btw, dump file names need adapting in
investigate.sh):
...
$ ./investigate.sh 
parloops_factor: 0, index_type: int:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned int:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: long:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned long:
  vectorized: 1, parallelized: 0
parloops_factor: 2, index_type: int:
  vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: unsigned int:
  vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: long:
  vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: unsigned long:
  vectorized: 1, parallelized: 1
...

Of course, the patch means we're no longer vectorizing parallelized loops, but
parallelizing vectorized loops.



* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26  8:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285

--- Comment #2 from vries at gcc dot gnu.org ---
Created attachment 35623
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35623&action=edit
par-2.c.129t.parloops

For -DINDEX_TYPE=int, par-2.c.129t.parloops



* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26  8:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285

--- Comment #3 from vries at gcc dot gnu.org ---
Created attachment 35624
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35624&action=edit
par-2.c.130t.ompexpssa

par-2.c.130t.ompexpssa



* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26  8:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285

--- Comment #5 from vries at gcc dot gnu.org ---
Created attachment 35626
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35626&action=edit
par-2.c.132t.vect

par-2.c.132t.vect



* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26  8:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285

--- Comment #4 from vries at gcc dot gnu.org ---
Created attachment 35625
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35625&action=edit
par-2.c.131t.ifcvt

par-2.c.131t.ifcvt



* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: rguenth at gcc dot gnu.org @ 2015-05-26 10:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
I thought that parallelizing vectorized loops is harder (you eventually get
extra prologue and epilogue loops, etc).



* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26 12:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285

--- Comment #7 from vries at gcc dot gnu.org ---
(In reply to Richard Biener from comment #6)
> I thought that parallelizing vectorized loops is harder (you eventually get
> extra prologue and epilogue loops, etc).

Another example, par-4.c:
...
int __attribute__((noinline,noclone))
f (int argc, double *__restrict results, double *__restrict data, INDEX_TYPE n)
{
  double coeff = 12.2;

  for (INDEX_TYPE idx = 0; idx < n; idx++)
    results[idx] = coeff * data[idx];

  return !(results[argc] == 0.0);
}

#define nEvents 1000

#if defined (MAIN)
int
main (int argc)
{
  double results[nEvents] = {0};
  double data[nEvents] = {0};

  return f (argc, results, data, nEvents);
}
#endif
...

When not parallelizing, we vectorize without problems:
...
parloops_factor: 0, index_type: int:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned int:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: long:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned long:
  vectorized: 1, parallelized: 0
...


When parallelizing, we generate both a low iteration count loop, and a
split-off parallelized loop. The vectorizer vectorizes both loops (each of
which contains an epilogue):
...
parloops_factor: 2, index_type: int:
  vectorized: 2, parallelized: 1
parloops_factor: 2, index_type: long:
  vectorized: 2, parallelized: 1
parloops_factor: 2, index_type: unsigned long:
  vectorized: 2, parallelized: 1
...

Except in the case of unsigned int, in which case it only vectorizes the low
iteration count loop:
...
parloops_factor: 2, index_type: unsigned int:
  vectorized: 1, parallelized: 1
...
The other loop fails to vectorize, in a fashion similar to that described for
par-2.c with INDEX_TYPE (unsigned) int.
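
To make the "vectorized: 2" count above easier to picture, here is a rough
hand-written sketch of the shape described for par-4.c after parallelization:
one low iteration count loop and one split-off loop that is sliced per thread.
This is not what the parloops pass actually emits. The cutoff
SKETCH_MIN_PER_THREAD, the names f_versioned and enum path, and the slicing
arithmetic are all assumptions made for illustration, and the "threads" are
modelled by an ordinary sequential loop.

```c
#include <assert.h>
#include <stdio.h>

#ifndef INDEX_TYPE
#define INDEX_TYPE unsigned int   /* same macro as in par-4.c */
#endif

#define NTHREADS 2                /* mirrors -ftree-parallelize-loops=2 */
#define SKETCH_MIN_PER_THREAD 100 /* hypothetical cutoff, not taken from GCC */

enum path { PATH_LOW_COUNT, PATH_PARALLEL };

/* Hypothetical stand-in for f from par-4.c after loop versioning.  */
static enum path
f_versioned (double *results, const double *data, INDEX_TYPE n)
{
  const double coeff = 12.2;

  if (n < (INDEX_TYPE) (NTHREADS * SKETCH_MIN_PER_THREAD))
    {
      /* Loop 1: low iteration count version, run by the caller.  */
      for (INDEX_TYPE idx = 0; idx < n; idx++)
        results[idx] = coeff * data[idx];
      return PATH_LOW_COUNT;
    }

  /* Loop 2: split-off version.  Each value of t models one thread,
     which runs the loop over its own [lo, hi) slice.  */
  INDEX_TYPE chunk = (n + NTHREADS - 1) / NTHREADS;
  for (int t = 0; t < NTHREADS; t++)
    {
      INDEX_TYPE lo = (INDEX_TYPE) t * chunk;
      INDEX_TYPE hi = lo + chunk < n ? lo + chunk : n;

      for (INDEX_TYPE idx = lo; idx < hi; idx++)
        results[idx] = coeff * data[idx];
    }
  return PATH_PARALLEL;
}

#if defined (MAIN)
int
main (void)
{
  static double data[1000], results[1000];

  assert (f_versioned (results, data, 50) == PATH_LOW_COUNT);
  assert (f_versioned (results, data, 1000) == PATH_PARALLEL);
  puts ("ok");
  return 0;
}
#endif
```

The vectorizer then sees two separate inner loops with the same body, which is
why it can report two vectorized loops for a single source loop.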



* [Bug tree-optimization/66285] failure to vectorize parallelized loop
From: vries at gcc dot gnu.org @ 2015-05-26 13:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66285

--- Comment #8 from vries at gcc dot gnu.org ---
For example par-4.c, if we use the same patch to interchange the passes, we
get:

When not parallelizing, all loops get vectorized:
...
parloops_factor: 0, index_type: int:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned int:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: long:
  vectorized: 1, parallelized: 0
parloops_factor: 0, index_type: unsigned long:
  vectorized: 1, parallelized: 0
...

When parallelizing, we parallelize one loop.
...
parloops_factor: 2, index_type: int:
  vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: unsigned int:
  vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: long:
  vectorized: 1, parallelized: 1
parloops_factor: 2, index_type: unsigned long:
  vectorized: 1, parallelized: 1
...
The loop that is parallelized is the vectorized loop, not the epilogue.


So AFAIU:
- with this patch the epilogue is only performed by the main thread, after all
  the threads are done. Each thread handles one slice of the vectorized loop.
- without the patch, the epilogue is potentially executed by each thread.
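
The two orderings can be modelled by hand as a sanity check of that reading.
The sketch below is only an illustration, not compiler output: the "threads"
are a sequential loop over slices, the vectorization factor VF of 4 is an
assumption, and the helper names (vec_step, parallelize_then_vectorize,
vectorize_then_parallelize) are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

#define N 1003      /* not a multiple of VF, so a scalar remainder exists */
#define VF 4        /* assumed vectorization factor (doubles per vector) */
#define NTHREADS 2  /* mirrors -ftree-parallelize-loops=2 */

static const double coeff = 12.2;

/* Stand-in for one vector iteration: VF consecutive elements.  */
static void
vec_step (double *results, const double *data, long idx)
{
  for (int k = 0; k < VF; k++)
    results[idx + k] = coeff * data[idx + k];
}

/* Without the patch: slice the scalar loop per thread, then vectorize
   each slice.  Returns how many slices needed a scalar epilogue.  */
static int
parallelize_then_vectorize (double *results, const double *data, long n)
{
  int epilogues = 0;
  long chunk = (n + NTHREADS - 1) / NTHREADS;

  for (int t = 0; t < NTHREADS; t++)   /* models thread t */
    {
      long idx = t * chunk;
      long end = idx + chunk < n ? idx + chunk : n;

      for (; idx + VF <= end; idx += VF)
        vec_step (results, data, idx);
      if (idx < end)
        epilogues++;
      for (; idx < end; idx++)         /* per-slice epilogue */
        results[idx] = coeff * data[idx];
    }
  return epilogues;
}

/* With the patch: vectorize first, then slice the vector loop per
   thread.  A single epilogue runs after all slices are done.  */
static int
vectorize_then_parallelize (double *results, const double *data, long n)
{
  long nvec = n / VF;                  /* number of full vector iterations */
  long chunk = (nvec + NTHREADS - 1) / NTHREADS;

  for (int t = 0; t < NTHREADS; t++)   /* models thread t */
    {
      long v = t * chunk;
      long end = v + chunk < nvec ? v + chunk : nvec;

      for (; v < end; v++)
        vec_step (results, data, v * VF);
    }

  long idx = nvec * VF;
  int epilogues = idx < n;
  for (; idx < n; idx++)               /* main thread only */
    results[idx] = coeff * data[idx];
  return epilogues;
}

#if defined (MAIN)
int
main (void)
{
  static double data[N], a[N], b[N];

  for (long i = 0; i < N; i++)
    data[i] = (double) i;

  int ea = parallelize_then_vectorize (a, data, N);
  int eb = vectorize_then_parallelize (b, data, N);

  for (long i = 0; i < N; i++)
    assert (a[i] == b[i]);

  printf ("epilogues: parallelize-first=%d, vectorize-first=%d\n", ea, eb);
  return 0;
}
#endif
```

Compiled with -DMAIN, this prints "epilogues: parallelize-first=2,
vectorize-first=1" for the chosen N, VF and NTHREADS: both slices of 502 and
501 elements leave a remainder in the first ordering, while the second ordering
splits the 250 vector iterations evenly and runs one 3-element epilogue at the
end.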


