public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug fortran/36842] Fortran: Minimize heap allocation of temporary arrays.
       [not found] <bug-36842-4@http.gcc.gnu.org/bugzilla/>
@ 2011-09-07  9:31 ` burnus at gcc dot gnu.org
  2011-09-07 13:57 ` [Bug fortran/36842] Fortran: Minimize heap allocation of temporary arrays by loop versioning in the frontend rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 4+ messages in thread
From: burnus at gcc dot gnu.org @ 2011-09-07  9:31 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36842

Tobias Burnus <burnus at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |burnus at gcc dot gnu.org

--- Comment #3 from Tobias Burnus <burnus at gcc dot gnu.org> 2011-09-07 09:13:07 UTC ---
At least I fail how the compiler can know whether the arguments alias or not.
For an illustration, use:
   real(kind=kind(1.0d0)), target :: data(5,5)
   data = Reshape([((i*10+j, i = 1, 5), j=1,5)], [5,5])
   Rx => data(1,:)
   Ry => data
for the actual argument. You will get different results with the current code,
   Ry(:,n) = Ry(:,n) * Rx(:)
and with a loop which simply assigns it without a temporary,
   do m = 1, size(Rx)
     Ry(m,n) = Ry(m,n) * Rx(m)
   end do

Thus, I see no other possibility than having a temporary which stores first the
evaluated right-hand side before it assigns it to the left-hand side.

As noted by Mikael, if the compiler knows that the two variables cannot alias,
no temporary is generated. That's the case if one of the variables is neither a
target nor a pointer - or if both variables are nonpointers (also with target)
but not dummy arguments - or both are nonpointers (also with target and dummy
argument) but allocatable. (Cf. gcc/fortran/dependency.c's
gfc_check_dependency).

 * * *

It would help a lot if the temporary generation (malloc/free) could be moved
out of the loop; however, that's nontrivial to do for both the front end and
for the middle end. Cf. PR 19831 and PR 21046. 

 * * *

Regarding the heap usage: Since GCC 4.7 the flag -fstack-arrays exists (implied
by -Ofast), which uses the stack instead of the heap for the temporary array.


Thus, unless someone has a better idea or comes up with an optimizable test
case, I vote for closing this PR.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug fortran/36842] Fortran: Minimize heap allocation of temporary arrays by loop versioning in the frontend
       [not found] <bug-36842-4@http.gcc.gnu.org/bugzilla/>
  2011-09-07  9:31 ` [Bug fortran/36842] Fortran: Minimize heap allocation of temporary arrays burnus at gcc dot gnu.org
@ 2011-09-07 13:57 ` rguenth at gcc dot gnu.org
  2011-09-07 14:32 ` burnus at gcc dot gnu.org
  2011-09-07 14:34 ` rguenther at suse dot de
  3 siblings, 0 replies; 4+ messages in thread
From: rguenth at gcc dot gnu.org @ 2011-09-07 13:57 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36842

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Fortran: Minimize heap      |Fortran: Minimize heap
                   |allocation of temporary     |allocation of temporary
                   |arrays.                     |arrays by loop versioning
                   |                            |in the frontend

--- Comment #4 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-09-07 13:54:47 UTC ---
I think the report asks for performing loop versioning in the frontend, thus
emit

  if (Rx and Ry may overlap)
    {
      ... current code ...
    }
  else
    {
      ... code without temporary
    }

where of course the tricky part is creating the appropriate condition
and deciding if it is worthwhile doing the versioning.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug fortran/36842] Fortran: Minimize heap allocation of temporary arrays by loop versioning in the frontend
       [not found] <bug-36842-4@http.gcc.gnu.org/bugzilla/>
  2011-09-07  9:31 ` [Bug fortran/36842] Fortran: Minimize heap allocation of temporary arrays burnus at gcc dot gnu.org
  2011-09-07 13:57 ` [Bug fortran/36842] Fortran: Minimize heap allocation of temporary arrays by loop versioning in the frontend rguenth at gcc dot gnu.org
@ 2011-09-07 14:32 ` burnus at gcc dot gnu.org
  2011-09-07 14:34 ` rguenther at suse dot de
  3 siblings, 0 replies; 4+ messages in thread
From: burnus at gcc dot gnu.org @ 2011-09-07 14:32 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36842

--- Comment #5 from Tobias Burnus <burnus at gcc dot gnu.org> 2011-09-07 14:30:11 UTC ---
(In reply to comment #4)
> I think the report asks for performing loop versioning in the frontend [...]
> where of course the tricky part is creating the appropriate condition
> and deciding if it is worthwhile doing the versioning.


(In reply to comment #4)
>   if (Rx and Ry may overlap)

That would be something like:

  Rx_min = Rx
  Rx_max = &Rx[((rx.dim[0].stride >= 0 && rx.dim[0].ubound >= rx.dim[0].lbound
|| rx.dim[0].stride < 0) || rx.dim[0].stride < 0 && rx.dim[0].lbound == 1 ?
(integer(kind=8)) (integer(kind=4)) rx.dim[0].ubound : 0) * stride.1 +
offset.2]

Something similar for Ry - but a bit more complicated as it has two dimensions
- and then:
  if (Rx_min <= Ry_max && Rx_max >= Ry_min)
    /* They are overlapping.  */

For the LHS one already needs to calculate the extend for the memory allocation
- thus, that extend one gets for free. The effort for the RHS depends on the
number of variables which possibly alias with the LHS. The costs would be the
additional extend calculations, the additional code and in particular the
branching.

In terms of the code, it would mean a larger restructuring as currently
gfc_check_dependency only checks for the dependency; for loop versioning one
needs a function which extracts all the possibly aliasing bounds of the RHS
expression.

The more difficult question is: When should it be applied? Not for -Os, but
otherwise?

(Work around: Writing the code such that the compiler actually knows more by
avoiding POINTER, TARGET and by using CONTIGUOUS - which is difficult for
legacy code [including benchmarks].)


^ permalink raw reply	[flat|nested] 4+ messages in thread

* [Bug fortran/36842] Fortran: Minimize heap allocation of temporary arrays by loop versioning in the frontend
       [not found] <bug-36842-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2011-09-07 14:32 ` burnus at gcc dot gnu.org
@ 2011-09-07 14:34 ` rguenther at suse dot de
  3 siblings, 0 replies; 4+ messages in thread
From: rguenther at suse dot de @ 2011-09-07 14:34 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36842

--- Comment #6 from rguenther at suse dot de <rguenther at suse dot de> 2011-09-07 14:32:55 UTC ---
On Wed, 7 Sep 2011, burnus at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36842
> 
> --- Comment #5 from Tobias Burnus <burnus at gcc dot gnu.org> 2011-09-07 14:30:11 UTC ---
> (In reply to comment #4)
> > I think the report asks for performing loop versioning in the frontend [...]
> > where of course the tricky part is creating the appropriate condition
> > and deciding if it is worthwhile doing the versioning.
> 
> 
> (In reply to comment #4)
> >   if (Rx and Ry may overlap)
> 
> That would be something like:
> 
>   Rx_min = Rx
>   Rx_max = &Rx[((rx.dim[0].stride >= 0 && rx.dim[0].ubound >= rx.dim[0].lbound
> || rx.dim[0].stride < 0) || rx.dim[0].stride < 0 && rx.dim[0].lbound == 1 ?
> (integer(kind=8)) (integer(kind=4)) rx.dim[0].ubound : 0) * stride.1 +
> offset.2]
> 
> Something similar for Ry - but a bit more complicated as it has two dimensions
> - and then:
>   if (Rx_min <= Ry_max && Rx_max >= Ry_min)
>     /* They are overlapping.  */
> 
> For the LHS one already needs to calculate the extend for the memory allocation
> - thus, that extend one gets for free. The effort for the RHS depends on the
> number of variables which possibly alias with the LHS. The costs would be the
> additional extend calculations, the additional code and in particular the
> branching.
> 
> In terms of the code, it would mean a larger restructuring as currently
> gfc_check_dependency only checks for the dependency; for loop versioning one
> needs a function which extracts all the possibly aliasing bounds of the RHS
> expression.
> 
> The more difficult question is: When should it be applied? Not for -Os, but
> otherwise?

Conditional on a new frontend flag I suppose, eventually enabled at
-O3 by default.


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-09-07 14:34 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-36842-4@http.gcc.gnu.org/bugzilla/>
2011-09-07  9:31 ` [Bug fortran/36842] Fortran: Minimize heap allocation of temporary arrays burnus at gcc dot gnu.org
2011-09-07 13:57 ` [Bug fortran/36842] Fortran: Minimize heap allocation of temporary arrays by loop versioning in the frontend rguenth at gcc dot gnu.org
2011-09-07 14:32 ` burnus at gcc dot gnu.org
2011-09-07 14:34 ` rguenther at suse dot de

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).