public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/66873] New: fortran variant of outer-1.c not parallelized by autopar
@ 2015-07-14 22:39 vries at gcc dot gnu.org
  2015-07-14 22:53 ` [Bug tree-optimization/66873] " vries at gcc dot gnu.org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2015-07-14 22:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66873

            Bug ID: 66873
           Summary: fortran variant of outer-1.c not parallelized by
                    autopar
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vries at gcc dot gnu.org
  Target Milestone: ---

Consider this example, a fortran version of autopar/outer-1.c:
...
program main
  implicit none
  integer, parameter         :: n = 500
  integer, dimension (0:n-1, 0:n-1) :: x
  integer                    :: i, j, ii, jj

  do ii = 0, n - 1
     do jj = 0, n - 1
        x(ii, jj) = ii + jj + 3
     end do
  end do

  do i = 0, n - 1
     do j = 0, n - 1
        if (x(i, j) .ne. i + j + 3) call abort
     end do
  end do

end program main
...

When trying to parallelize this using -O2 -ftree-parallelize-loops=2, it fails
on the dependencies:
...
(Data Dep:
#(Data Ref:
#  bb: 4
#  stmt: x[_12] = _14;
#  ref: x[_12];
#  base_object: x;
#  Access function 0: {{0, +, 1}_3, +, 500}_4
#)
#(Data Ref:
#  bb: 4
#  stmt: x[_12] = _14;
#  ref: x[_12];
#  base_object: x;
#  Access function 0: {{0, +, 1}_3, +, 500}_4
#)
  access_fn_A: {{0, +, 1}_3, +, 500}_4
  access_fn_B: {{0, +, 1}_3, +, 500}_4

 (subscript
  iterations_that_access_an_element_twice_in_A: [0]
  last_conflict: scev_not_known
  iterations_that_access_an_element_twice_in_B: [0]
  last_conflict: scev_not_known
  (Subscript distance: 0 ))
  inner loop index: 0
  loop nest: (3 4 )
  distance_vector:   0   0
  distance_vector: 500  -1
  direction_vector:     =    =
  direction_vector:     +    -
)
  FAILED: data dependencies exist across iterations
...
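
One way to read the second distance vector (500, -1): the access function
{{0, +, 1}_3, +, 500}_4 steps by 1 in loop 3 and by 500 in loop 4, so the
subscript has the form a + 500 * b, and moving by (+500, -1) leaves it
unchanged. That move needs a to change by 500, which the 0..499 bounds rule
out. The C sketch below checks this by brute force. The function name and its
parameters are illustrative only; it models the arithmetic, not GCC's
dependence analysis, and it assumes na, nb >= 1 and stride >= 0.

```c
#include <stdlib.h>

/* Count pairs of distinct iterations that touch the same element when the
   subscript is a + stride * b, with 0 <= a < na and 0 <= b < nb.
   Returns -1 if the scratch buffer cannot be allocated. */
long count_collisions(int na, int nb, int stride)
{
    long size = (long)(na - 1) + (long)stride * (nb - 1) + 1;
    long *hits = calloc((size_t)size, sizeof *hits);
    long collisions = 0;

    if (hits == NULL)
        return -1;

    for (int b = 0; b < nb; b++) {
        for (int a = 0; a < na; a++) {
            long idx = a + (long)stride * b;
            collisions += hits[idx]; /* earlier iterations on this element */
            hits[idx]++;
        }
    }

    free(hits);
    return collisions;
}
```

With the bounds from the example, count_collisions(500, 500, 500) returns 0,
so no two distinct iterations write the same element. Letting a run to 999
instead (count_collisions(1000, 500, 500)) gives a non-zero count, which is
the situation the (500, -1) vector describes.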



* [Bug tree-optimization/66873] fortran variant of outer-1.c not parallelized by autopar
  2015-07-14 22:39 [Bug tree-optimization/66873] New: fortran variant of outer-1.c not parallelized by autopar vries at gcc dot gnu.org
@ 2015-07-14 22:53 ` vries at gcc dot gnu.org
  2015-07-14 23:02 ` vries at gcc dot gnu.org
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2015-07-14 22:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66873

vries at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #1 from vries at gcc dot gnu.org ---
One noticeable difference with outer-1.c is that pass_iv_canon makes the inner
and outer loop ivs run downwards (from 500 to 0).

Removing pass_iv_canon from the pass list fixes that, but doesn't change
anything about the dependency analysis in parloops:
...
(Data Dep:
#(Data Ref:
#  bb: 4
#  stmt: x[_12] = _14;
#  ref: x[_12];
#  base_object: x;
#  Access function 0: {{0, +, 1}_3, +, 500}_4
#)
#(Data Ref:
#  bb: 4
#  stmt: x[_12] = _14;
#  ref: x[_12];
#  base_object: x;
#  Access function 0: {{0, +, 1}_3, +, 500}_4
#)
  access_fn_A: {{0, +, 1}_3, +, 500}_4
  access_fn_B: {{0, +, 1}_3, +, 500}_4

 (subscript
  iterations_that_access_an_element_twice_in_A: [0]
  last_conflict: scev_not_known
  iterations_that_access_an_element_twice_in_B: [0]
  last_conflict: scev_not_known
  (Subscript distance: 0 ))
  inner loop index: 0
  loop nest: (3 4 )
  distance_vector:   0   0
  distance_vector: 500  -1
  direction_vector:     =    =
  direction_vector:     +    -
)
  FAILED: data dependencies exist across iterations
...



* [Bug tree-optimization/66873] fortran variant of outer-1.c not parallelized by autopar
  2015-07-14 22:39 [Bug tree-optimization/66873] New: fortran variant of outer-1.c not parallelized by autopar vries at gcc dot gnu.org
  2015-07-14 22:53 ` [Bug tree-optimization/66873] " vries at gcc dot gnu.org
@ 2015-07-14 23:02 ` vries at gcc dot gnu.org
  2015-07-14 23:18 ` vries at gcc dot gnu.org
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2015-07-14 23:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66873

--- Comment #2 from vries at gcc dot gnu.org ---
Another obvious difference is that the Fortran 2-dimensional array access is
rewritten into a single-dimension array access:
...
  <bb 3>:
  # ii_7 = PHI <0(2), ii_16(7)>
  pretmp_52 = (integer(kind=8)) ii_7;
  pretmp_53 = pretmp_52 * 500;

  <bb 4>:
  # jj_10 = PHI <0(3), jj_15(5)>
  _11 = (integer(kind=8)) jj_10;
  _12 = _11 + pretmp_53;
  _13 = ii_7 + jj_10;
  _14 = _13 + 3;
  x[_12] = _14;
  jj_15 = jj_10 + 1;
  if (jj_10 == 499)
    goto <bb 6>;
  else
    goto <bb 5>;
...

In outer-1.c, by contrast, the 2-dimensional array access is still 2-dimensional:
...
  <bb 9>:
  # i_34 = PHI <0(3), i_15(8)>
  goto <bb 5>;

  <bb 5>:
  # j_36 = PHI <0(9), j_14(4)>
  _11 = i_34 + j_36;
  _12 = _11 + 3;
  x[i_34][j_36] = _12;
  j_14 = j_36 + 1;
  if (N_9(D) > j_14)
    goto <bb 4>;
  else
    goto <bb 6>;
...

This results in different access functions, and the dependence analysis
succeeds:
...
(Data Dep:
#(Data Ref:
#  bb: 5
#  stmt: x[i_34][j_36] = _12;
#  ref: x[i_34][j_36];
#  base_object: x;
#  Access function 0: {0, +, 1}_4
#  Access function 1: {0, +, 1}_1
#)
#(Data Ref:
#  bb: 5
#  stmt: x[i_34][j_36] = _12;
#  ref: x[i_34][j_36];
#  base_object: x;
#  Access function 0: {0, +, 1}_4
#  Access function 1: {0, +, 1}_1
#)
  access_fn_A: {0, +, 1}_4
  access_fn_B: {0, +, 1}_4

 (subscript
  iterations_that_access_an_element_twice_in_A: [0]
  last_conflict: scev_not_known
  iterations_that_access_an_element_twice_in_B: [0]
  last_conflict: scev_not_known
  (Subscript distance: 0 ))
  access_fn_A: {0, +, 1}_1
  access_fn_B: {0, +, 1}_1

 (subscript
  iterations_that_access_an_element_twice_in_A: [0]
  last_conflict: scev_not_known
  iterations_that_access_an_element_twice_in_B: [0]
  last_conflict: scev_not_known
  (Subscript distance: 0 ))
  inner loop index: 0
  loop nest: (1 4 )
  distance_vector:   0   0
  direction_vector:     =    =
)
  SUCCESS: may be parallelized
...
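
The two GIMPLE shapes above correspond to the following two ways of writing
the same store in C. This is an illustrative sketch only: N, fill_2d,
fill_flat and fills_agree are made-up names, and the flattening i * N + j
mirrors _12 = jj + ii * 500 from the Fortran dump rather than anything in
outer-1.c itself.

```c
#define N 500

/* Two subscripts, one per loop counter: the shape seen for outer-1.c,
   where each subscript gets its own access function. */
void fill_2d(int x[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            x[i][j] = i + j + 3;
}

/* One linearized subscript mixing both loop counters: the shape seen in
   the Fortran dump (x[_12]), which yields a single access function. */
void fill_flat(int *x)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            x[(long)i * N + j] = i + j + 3;
}

/* Returns 1 when both fills produce identical contents, 0 otherwise. */
int fills_agree(void)
{
    static int a[N][N];
    static int b[N * N];

    fill_2d(a);
    fill_flat(b);

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (a[i][j] != b[i * N + j])
                return 0;
    return 1;
}
```

Both functions store the same values to the same addresses; only the form of
the subscript handed to the dependence analysis differs.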



* [Bug tree-optimization/66873] fortran variant of outer-1.c not parallelized by autopar
  2015-07-14 22:39 [Bug tree-optimization/66873] New: fortran variant of outer-1.c not parallelized by autopar vries at gcc dot gnu.org
  2015-07-14 22:53 ` [Bug tree-optimization/66873] " vries at gcc dot gnu.org
  2015-07-14 23:02 ` vries at gcc dot gnu.org
@ 2015-07-14 23:18 ` vries at gcc dot gnu.org
  2015-07-14 23:23 ` vries at gcc dot gnu.org
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2015-07-14 23:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66873

--- Comment #3 from vries at gcc dot gnu.org ---
The Fortran example is parallelized when -floop-parallelize-all is used, even
though the access function reported by graphite appears to be the same:
...
        Access function 0: {{0, +, 1}_3, +, 500}_4
...



* [Bug tree-optimization/66873] fortran variant of outer-1.c not parallelized by autopar
  2015-07-14 22:39 [Bug tree-optimization/66873] New: fortran variant of outer-1.c not parallelized by autopar vries at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2015-07-14 23:18 ` vries at gcc dot gnu.org
@ 2015-07-14 23:23 ` vries at gcc dot gnu.org
  2015-07-14 23:39 ` kargl at gcc dot gnu.org
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2015-07-14 23:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66873

--- Comment #4 from vries at gcc dot gnu.org ---
Created attachment 35983
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35983&action=edit
Tentative patch

Using this tentative patch, we use graphite analysis (if available) by default
for parloops. That way, we manage to parallelize the fortran example using just
-ftree-parallelize-loops=2.



* [Bug tree-optimization/66873] fortran variant of outer-1.c not parallelized by autopar
  2015-07-14 22:39 [Bug tree-optimization/66873] New: fortran variant of outer-1.c not parallelized by autopar vries at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2015-07-14 23:23 ` vries at gcc dot gnu.org
@ 2015-07-14 23:39 ` kargl at gcc dot gnu.org
  2015-07-15  6:12 ` vries at gcc dot gnu.org
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: kargl at gcc dot gnu.org @ 2015-07-14 23:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66873

kargl at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kargl at gcc dot gnu.org

--- Comment #5 from kargl at gcc dot gnu.org ---
(In reply to vries from comment #0)
> Consider this example, a fortran version of autopar/outer-1.c:
> ...
> program main
>   implicit none
>   integer, parameter         :: n = 500
>   integer, dimension (0:n-1, 0:n-1) :: x
>   integer                    :: i, j, ii, jj
> 
>   do ii = 0, n - 1
>      do jj = 0, n - 1
>         x(ii, jj) = ii + jj + 3
>      end do
>   end do
> 
>   do i = 0, n - 1
>      do j = 0, n - 1
>         if (x(i, j) .ne. i + j + 3) call abort
>      end do
>   end do
> 
> end program main
> ...
> 
> When trying to parallelize this using -O2 -ftree-parallelize-loops=2, it
> fails on the dependencies:

Does the loop ordering matter?  Fortran is a column major language,
so your nested loops are backwards.  One would normally write:

  do jj = 0, n - 1
      do ii = 0, n - 1
         x(ii, jj) = ii + jj + 3
      end do
   end do

where the first loop index varies most rapidly.
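
As a small illustration of the column-major point, the following C sketch
computes the linear offset of element (i, j) in a zero-based n-by-n array
under both layouts. The helper names are hypothetical.

```c
#include <stddef.h>

/* Column-major (Fortran): the first index is contiguous in memory. */
size_t offset_col_major(size_t i, size_t j, size_t n)
{
    return i + n * j;
}

/* Row-major (C): the last index is contiguous in memory. */
size_t offset_row_major(size_t i, size_t j, size_t n)
{
    return i * n + j;
}
```

With n = 500, stepping i by one moves one element in the column-major layout,
while stepping j by one moves 500 elements, which is why the loop over the
first index is normally the innermost one in Fortran.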



* [Bug tree-optimization/66873] fortran variant of outer-1.c not parallelized by autopar
  2015-07-14 22:39 [Bug tree-optimization/66873] New: fortran variant of outer-1.c not parallelized by autopar vries at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2015-07-14 23:39 ` kargl at gcc dot gnu.org
@ 2015-07-15  6:12 ` vries at gcc dot gnu.org
  2015-07-15 10:01 ` vries at gcc dot gnu.org
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2015-07-15  6:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66873

--- Comment #6 from vries at gcc dot gnu.org ---
(In reply to kargl from comment #5)
> Does the loop ordering matter?  Fortran is a column major language,
> so your nested loops are backwards.  One would normally write.
> 
>   do jj = 0, n - 1
>       do ii = 0, n - 1
>          x(ii, jj) = ii + jj + 3
>       end do
>    end do
> 
> where the first loop index varies most rapidly.

Thanks for letting me know. I'm obviously not fluent in Fortran.

Interchanging ii and jj in the array access of the example, and again disabling
pass_iv_canon, gives:
...
(Data Dep:
#(Data Ref:
#  bb: 4
#  stmt: x[_12] = _14;
#  ref: x[_12];
#  base_object: x;
#  Access function 0: {{0, +, 500}_3, +, 1}_4
#)
#(Data Ref:
#  bb: 4
#  stmt: x[_12] = _14;
#  ref: x[_12];
#  base_object: x;
#  Access function 0: {{0, +, 500}_3, +, 1}_4
#)
  access_fn_A: {{0, +, 500}_3, +, 1}_4
  access_fn_B: {{0, +, 500}_3, +, 1}_4

 (subscript
  iterations_that_access_an_element_twice_in_A: [0]
  last_conflict: scev_not_known
  iterations_that_access_an_element_twice_in_B: [0]
  last_conflict: scev_not_known
  (Subscript distance: 0 ))
  inner loop index: 0
  loop nest: (3 4 )
  distance_vector:   0   0
  distance_vector:   1 -500
  direction_vector:     =    =
  direction_vector:     +    -
)
  FAILED: data dependencies exist across iterations
...

Again, using -floop-parallelize-all allows the outer loop to be parallelized.
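
Both non-zero distance vectors in this report, (500, -1) for the original
example and (1, -500) here, satisfy the same linear condition: the per-loop
steps of the access function multiplied by the distances sum to zero. Whether
such a vector describes a real dependence also depends on the iteration
bounds. A minimal C sketch of that check follows; the names are hypothetical
and it only restates the arithmetic, not what parloops or graphite implement.

```c
#include <stdlib.h>

/* A distance vector (d3, d4) maps an iteration to another one touching the
   same element when step3 * d3 + step4 * d4 == 0, where step3 and step4 are
   the steps of the access function in loops 3 and 4. */
int solves_subscript(long step3, long step4, long d3, long d4)
{
    return step3 * d3 + step4 * d4 == 0;
}

/* With n iterations per loop, two iterations of the same loop are at most
   n - 1 apart, so any larger distance cannot occur. */
int feasible(long d3, long d4, long n)
{
    return labs(d3) <= n - 1 && labs(d4) <= n - 1;
}
```

For example, solves_subscript(500, 1, 1, -500) holds for the access function
{{0, +, 500}_3, +, 1}_4, but feasible(1, -500, 500) does not, since a loop of
500 iterations has no two iterations that are 500 apart.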



* [Bug tree-optimization/66873] fortran variant of outer-1.c not parallelized by autopar
  2015-07-14 22:39 [Bug tree-optimization/66873] New: fortran variant of outer-1.c not parallelized by autopar vries at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2015-07-15  6:12 ` vries at gcc dot gnu.org
@ 2015-07-15 10:01 ` vries at gcc dot gnu.org
  2015-07-15 20:29 ` vries at gcc dot gnu.org
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2015-07-15 10:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66873

vries at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #35983|0                           |1
        is obsolete|                            |

--- Comment #7 from vries at gcc dot gnu.org ---
Created attachment 35986
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35986&action=edit
Updated tentative patch

I found that always doing graphite before parloops resulted in failures to
parallelize reduction testcases.

I've split things up now:
- first we do parloopsred, a parloops variant in which we only handle
reductions
- then we do graphite
- then we do the normal parloops

This seems to combine the best of graphite and parloops.

The only gotcha is that I had to disable pass_iv_canon when
tree_parallelize_loops > 1. It seems to interfere with graphite. I did not
observe any failures to parallelize due to not running pass_iv_canon.



* [Bug tree-optimization/66873] fortran variant of outer-1.c not parallelized by autopar
  2015-07-14 22:39 [Bug tree-optimization/66873] New: fortran variant of outer-1.c not parallelized by autopar vries at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2015-07-15 10:01 ` vries at gcc dot gnu.org
@ 2015-07-15 20:29 ` vries at gcc dot gnu.org
  2015-07-24 16:56 ` vries at gcc dot gnu.org
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2015-07-15 20:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66873

vries at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch

--- Comment #8 from vries at gcc dot gnu.org ---
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01332.html



* [Bug tree-optimization/66873] fortran variant of outer-1.c not parallelized by autopar
  2015-07-14 22:39 [Bug tree-optimization/66873] New: fortran variant of outer-1.c not parallelized by autopar vries at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2015-07-15 20:29 ` vries at gcc dot gnu.org
@ 2015-07-24 16:56 ` vries at gcc dot gnu.org
  2015-07-26 17:12 ` vries at gcc dot gnu.org
  2015-07-26 21:39 ` vries at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2015-07-24 16:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66873

vries at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|patch                       |

--- Comment #10 from vries at gcc dot gnu.org ---
Retracting the patch. It doesn't make sense to use graphite by default before
addressing the compile-time-hog problem.



* [Bug tree-optimization/66873] fortran variant of outer-1.c not parallelized by autopar
  2015-07-14 22:39 [Bug tree-optimization/66873] New: fortran variant of outer-1.c not parallelized by autopar vries at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2015-07-24 16:56 ` vries at gcc dot gnu.org
@ 2015-07-26 17:12 ` vries at gcc dot gnu.org
  2015-07-26 21:39 ` vries at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2015-07-26 17:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66873

--- Comment #11 from vries at gcc dot gnu.org ---
Created attachment 36057
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36057&action=edit
Updated tentative patch



* [Bug tree-optimization/66873] fortran variant of outer-1.c not parallelized by autopar
  2015-07-14 22:39 [Bug tree-optimization/66873] New: fortran variant of outer-1.c not parallelized by autopar vries at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2015-07-26 17:12 ` vries at gcc dot gnu.org
@ 2015-07-26 21:39 ` vries at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: vries at gcc dot gnu.org @ 2015-07-26 21:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66873

--- Comment #12 from vries at gcc dot gnu.org ---
Created attachment 36063
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36063&action=edit
autopar/outer-7.c

C example to reproduce the same problem

