public inbox for fortran@gcc.gnu.org
* Fortran array slices and -frepack-arrays
@ 2018-04-16 11:10 Wilco Dijkstra
  2018-04-16 17:15 ` Steve Kargl
  0 siblings, 1 reply; 5+ messages in thread
From: Wilco Dijkstra @ 2018-04-16 11:10 UTC
  To: fortran; +Cc: nd

Hi,

I posted this on gcc-patches (https://gcc.gnu.org/ml/gcc/2018-04/msg00090.html) not realizing
there is a dedicated Fortran list:

I looked at a few performance anomalies between gfortran and Flang - it appears array slices
are treated differently. Using -frepack-arrays fixed a performance issue in gfortran and didn't
cause any regressions. Making input array slices contiguous both helps locality and
enables more vectorization.

So I wonder whether it should be made the default (at -O3, or just at -Ofast)? Alternatively,
would it be feasible in Fortran to version functions or loops when all arguments are
contiguous slices?
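
The versioning idea above can be sketched by hand in Fortran. This is only an
illustration and not what gfortran generates: the names `versioning_sketch`,
`axpy_versioned` and `axpy_unit_stride` are made up, and the code assumes a
compiler that provides the Fortran 2008 `is_contiguous` intrinsic.

```fortran
module versioning_sketch
  implicit none
contains
  ! Hypothetical example: y = y + a*x, dispatched on contiguity at run time.
  subroutine axpy_versioned(a, x, y)
    real, intent(in)    :: a
    real, intent(in)    :: x(:)
    real, intent(inout) :: y(:)
    integer :: i

    if (is_contiguous(x) .and. is_contiguous(y)) then
      ! Fast version: explicit-shape dummies always have unit stride.
      call axpy_unit_stride(size(x), a, x, y)
    else
      ! Generic version: indexes through the descriptors, any stride.
      do i = 1, size(x)
        y(i) = y(i) + a * x(i)
      end do
    end if
  end subroutine axpy_versioned

  subroutine axpy_unit_stride(n, a, x, y)
    integer, intent(in) :: n
    real, intent(in)    :: a
    real, intent(in)    :: x(n)
    real, intent(inout) :: y(n)
    integer :: i

    do i = 1, n
      y(i) = y(i) + a * x(i)
    end do
  end subroutine axpy_unit_stride
end module versioning_sketch
```

Nothing is copied here; the cost is one run-time check plus a second copy of
the loop, which is the code-size trade-off of versioning.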

Wilco

* Re: Fortran array slices and -frepack-arrays
  2018-04-16 11:10 Fortran array slices and -frepack-arrays Wilco Dijkstra
@ 2018-04-16 17:15 ` Steve Kargl
  2018-04-18 12:30   ` Wilco Dijkstra
  0 siblings, 1 reply; 5+ messages in thread
From: Steve Kargl @ 2018-04-16 17:15 UTC
  To: Wilco Dijkstra; +Cc: fortran, nd

On Mon, Apr 16, 2018 at 11:10:30AM +0000, Wilco Dijkstra wrote:
> Hi,
> 
> I posted this on gcc-patches (https://gcc.gnu.org/ml/gcc/2018-04/msg00090.html) not realizing
> there is a dedicated Fortran list:
> 
> I looked at a few performance anomalies between gfortran and Flang
> - it appears array slices are treated differently. Using -frepack-arrays
> fixed a performance issue in gfortran and didn't cause any regressions.
> Making input array slices contiguous both helps locality and enables
> more vectorization.
> 
> So I wonder whether it should be made the default (at -O3, or just at
> -Ofast)? Alternatively, would it be feasible in Fortran to version
> functions or loops when all arguments are contiguous slices?

The description of -frepack-arrays suggests that a temporary
array is created.  What impact does this have on stack usage?
What impact does it have on code size?  Other than some unnamed
code that you have, have you tried -frepack-arrays on, say, the
Polyhedron Benchmarks to investigate the trade-offs of this
option?

-- 
Steve

* Re: Fortran array slices and -frepack-arrays
  2018-04-16 17:15 ` Steve Kargl
@ 2018-04-18 12:30   ` Wilco Dijkstra
  2018-04-18 13:16     ` Richard Biener
  2018-04-18 20:28     ` Thomas Koenig
  0 siblings, 2 replies; 5+ messages in thread
From: Wilco Dijkstra @ 2018-04-18 12:30 UTC
  To: sgk; +Cc: fortran, nd

Steve Kargl wrote:

> The description of -frepack-arrays suggests that a temporary
> array is created.  What impact does this have on stack usage?
> What impact does it have on code size?  Other than some unnamed
> code that you have, have you tried -frepack-arrays on, say, the
> Polyhedron Benchmarks to investigate the trade-offs of this
> option?

The step is checked at function entry, and the array is only copied to
a malloc'd temporary if the step is not 1. gfortran generally uses malloc
even for small temporaries, while with -fstack-arrays it uses alloca even
for huge temporaries. This is far from ideal; it should choose between
malloc and alloca depending on a configurable size.
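
A hand-written Fortran sketch of the entry check described above may make the
behaviour clearer. It is an approximation under stated assumptions, not the
code gfortran emits: the names `repack_sketch`, `sum_squares` and `kernel` are
hypothetical, `is_contiguous` (Fortran 2008) stands in for the step check, and
an allocatable array stands in for the malloc'd temporary.

```fortran
module repack_sketch
  implicit none
contains
  ! Hypothetical example: sum of squares over an input array or slice.
  function sum_squares(x) result(s)
    real, intent(in)  :: x(:)
    real              :: s
    real, allocatable :: tmp(:)

    if (is_contiguous(x)) then
      ! Step is 1: work on the caller's data in place, no copy.
      s = kernel(size(x), x)
    else
      ! Step is not 1: pack the slice into a temporary first.
      allocate(tmp(size(x)), source=x)
      s = kernel(size(tmp), tmp)
    end if
    ! tmp, if allocated, is released automatically on return.
  end function sum_squares

  ! The loop body only ever sees unit-stride data.
  pure function kernel(n, v) result(s)
    integer, intent(in) :: n
    real, intent(in)    :: v(n)
    real                :: s
    integer :: i

    s = 0.0
    do i = 1, n
      s = s + v(i) * v(i)
    end do
  end function kernel
end module repack_sketch
```

For the common unit-step case the only overhead is the check and the extra
code for the copy path.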

The vast majority of steps in array slices are 1, so the option mostly just
adds extra code. The code-size cost on SPEC2017 is 2% on exchange2_r and 6.6%
on roms_r (which has a large number of array slices, so function versioning
would be preferable there if it is feasible); the rest are either identical
or ~0.2% larger.

I'll see whether I can run some more benchmarks.

Wilco

* Re: Fortran array slices and -frepack-arrays
  2018-04-18 12:30   ` Wilco Dijkstra
@ 2018-04-18 13:16     ` Richard Biener
  2018-04-18 20:28     ` Thomas Koenig
  1 sibling, 0 replies; 5+ messages in thread
From: Richard Biener @ 2018-04-18 13:16 UTC
  To: Wilco Dijkstra; +Cc: sgk, fortran, nd

On Wed, Apr 18, 2018 at 2:29 PM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Steve Kargl wrote:
>
>> The description of -frepack-arrays suggests that a temporary
>> array is created.  What impact does this have on stack usage?
>> What impact does it have on code size?  Other than some unnamed
>> code that you have, have you tried -frepack-arrays on, say, the
>> Polyhedron Benchmarks to investigate the trade-offs of this
>> option?
>
> The step is checked at function entry, and the array is only copied to
> a malloc'd temporary if the step is not 1. gfortran generally uses malloc
> even for small temporaries, while with -fstack-arrays it uses alloca even
> for huge temporaries. This is far from ideal; it should choose between
> malloc and alloca depending on a configurable size.
>
> The vast majority of steps in array slices are 1, so the option mostly just
> adds extra code. The code-size cost on SPEC2017 is 2% on exchange2_r and 6.6%
> on roms_r (which has a large number of array slices, so function versioning
> would be preferable there if it is feasible); the rest are either identical
> or ~0.2% larger.
>
> I'll see whether I can run some more benchmarks.

Well, an immediate benefit would be improved vectorization: the
vectorizer no longer needs to handle arbitrary strided loads/stores
but can rely on stride 1, provided that is properly communicated by
the frontend with -frepack-arrays.

Of course, inputs are not everything here; outputs have to be
considered as well...
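
To illustrate the point about outputs, here is a small Fortran sketch under
the same kind of assumptions as a manual repack: the names `copy_out_sketch`
and `scale_in_place` are hypothetical, and `is_contiguous` (Fortran 2008) is
used for the stride check. When the dummy is written to, a packed temporary
has to be copied back, so a strided store pass remains.

```fortran
module copy_out_sketch
  implicit none
contains
  ! Hypothetical example: scale an array or slice in place.
  subroutine scale_in_place(x, factor)
    real, intent(inout) :: x(:)
    real, intent(in)    :: factor
    real, allocatable   :: tmp(:)

    if (is_contiguous(x)) then
      x = factor * x                     ! unit stride, no temporary
    else
      allocate(tmp(size(x)), source=x)   ! copy in (strided loads)
      tmp = factor * tmp                 ! unit-stride loads and stores
      x = tmp                            ! copy out (strided stores)
    end if
  end subroutine scale_in_place
end module copy_out_sketch
```

Whether the copy-in and copy-out passes pay off depends on how much work is
done on the temporary in between.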

Richard.

> Wilco

* Re: Fortran array slices and -frepack-arrays
  2018-04-18 12:30   ` Wilco Dijkstra
  2018-04-18 13:16     ` Richard Biener
@ 2018-04-18 20:28     ` Thomas Koenig
  1 sibling, 0 replies; 5+ messages in thread
From: Thomas Koenig @ 2018-04-18 20:28 UTC
  To: Wilco Dijkstra, sgk; +Cc: fortran, nd

Hi Wilco,

> The vast majority of steps in array slices are 1, so the option mostly just
> adds extra code. The code-size cost on SPEC2017 is 2% on exchange2_r and 6.6%
> on roms_r (which has a large number of array slices, so function versioning
> would be preferable there if it is feasible); the rest are either identical
> or ~0.2% larger.
> 
> I'll see whether I can run some more benchmarks.

There is a standard way to specify what -frepack-arrays does: give
the dummy argument the CONTIGUOUS attribute.
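
A minimal sketch of what that looks like in source code follows. The module
and function names (`contiguous_sketch`, `dot_self`) are made up for
illustration; only the CONTIGUOUS attribute itself comes from the Fortran 2008
standard.

```fortran
module contiguous_sketch
  implicit none
contains
  ! Hypothetical example: the CONTIGUOUS attribute states that the
  ! assumed-shape dummy x has unit stride inside this procedure.
  function dot_self(x) result(s)
    real, intent(in), contiguous :: x(:)
    real :: s
    integer :: i

    s = 0.0
    do i = 1, size(x)
      s = s + x(i) * x(i)
    end do
  end function dot_self
end module contiguous_sketch
```

A whole array or a section such as a(2:5) can be passed directly. For a
strided section such as a(1:n:2), Fortran 2008 appears to leave it to the
compiler to supply a contiguous copy on the caller's side; how a particular
gfortran release handles that case is worth checking before relying on it.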

You could also try using -flto (which should be able to deduce that
strides are equal to one), but LTO has some known issues with Fortran.

Regards

	Thomas
