public inbox for fortran@gcc.gnu.org
* Optimization of spread
@ 2022-11-03 10:48 Théo Cavignac
  2022-11-03 21:54 ` Mikael Morin
  0 siblings, 1 reply; 4+ messages in thread
From: Théo Cavignac @ 2022-11-03 10:48 UTC (permalink / raw)
  To: fortran

Hello,
I am currently writing some numerical code in Fortran 2003 and I want
to use the spread intrinsic because having used NumPy heavily for the
past few years, it feels natural to use such an array primitive.
I naturally wondered what would be the effect on performance and found
this on Stack Overflow: https://stackoverflow.com/a/55732905/6324751

TL;DR: spread is as fast as, if not faster than, a do loop when using
ifort. However, it is significantly slower (up to 100% slower in my
microbenchmarks) with gfortran 12.2.0.

Investigating the matter a bit more, I noticed that ifort recognizes
the pattern and produces essentially the same code for both the do loop
and the spread call, while gfortran “naively” calls spread, even with
-O3.

Here is a demonstration on godbolt.org: https://godbolt.org/z/dcYEPj8bP
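
For readers who cannot follow the link, the comparison can be sketched roughly as below. This is a minimal illustration and not the code behind the godbolt link: the program name, the sizes n and m, and the variable names are all assumptions made for the example. Both forms compute b(i, j) = a(i), once with spread and once with an explicit loop nest.

```fortran
program spread_vs_loop
  implicit none
  integer, parameter :: n = 4, m = 3   ! illustrative sizes (assumed)
  real :: a(n)
  real :: b_spread(n, m), b_loop(n, m)
  integer :: i, j

  a = [(real(i), i = 1, n)]

  ! Array-intrinsic form: replicate a along a new second dimension, m copies.
  b_spread = spread(a, dim=2, ncopies=m)

  ! Equivalent explicit loops; the inner loop runs over the first index
  ! so that memory is traversed in column-major order.
  do j = 1, m
    do i = 1, n
      b_loop(i, j) = a(i)
    end do
  end do

  ! Self-check: both forms must produce identical arrays.
  if (any(b_spread /= b_loop)) error stop "spread and loop results differ"
  print '(a)', 'spread and loop results agree'
end program spread_vs_loop
```

Compiling such a program with -O3 and inspecting the assembly is one way to see whether the compiler emits a library call for spread or a loop.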

So, my question is: is this something that could be better optimized?
I wonder if simply allowing the compiler to inline spread wouldn't
already enable further optimizations that would lead to the same kind
of performance as found in ifort.
I also think other array intrinsics may benefit from this effort if
similar strategies can be applied.
Although I have never contributed to GCC, I would be willing to
do this implementation if it is within reach of my C++ skills, and if
someone can point me in the right direction.

Regards,
Théo


* Re: Optimization of spread
  2022-11-03 10:48 Optimization of spread Théo Cavignac
@ 2022-11-03 21:54 ` Mikael Morin
  2022-11-03 22:04   ` Thomas Koenig
  0 siblings, 1 reply; 4+ messages in thread
From: Mikael Morin @ 2022-11-03 21:54 UTC (permalink / raw)
  To: Théo Cavignac; +Cc: gfortran, Thomas Koenig

Hello,

welcome, and thanks for your interest.

On 03/11/2022 at 11:48, Théo Cavignac via Fortran wrote:
> Hello,
> I am currently writing some numerical code in Fortran 2003 and I want
> to use the spread intrinsic because having used NumPy heavily for the
> past few years, it feels natural to use such an array primitive.
> I naturally wondered what would be the effect on performance and found
> this on Stack Overflow: https://stackoverflow.com/a/55732905/6324751
> 
> TL;DR: spread is as fast as, if not faster than, a do loop when using
> ifort. However, it is significantly slower (up to 100% slower in my
> microbenchmarks) with gfortran 12.2.0.
> 
> Investigating the matter a bit more, I noticed that ifort recognizes
> the pattern and produces essentially the same code for both the do loop
> and the spread call, while gfortran “naively” calls spread, even with
> -O3.
> 
> Here is a demonstration on godbolt.org: https://godbolt.org/z/dcYEPj8bP
> 
> So, my question is: is this something that could be better optimized?
> I wonder if simply allowing the compiler to inline spread wouldn't
> already enable further optimizations that would lead to the same kind
> of performance as found in ifort.
Well, obviously you can get the same performance gfortran gets with do
loops if you make gfortran generate do loops in place of the spread call.

> I also think other array intrinsics may benefit from this effort if
> similar strategies can be applied.
> Although I have never contributed to GCC, I would be willing to
> do this implementation if it is within reach of my C++ skills, and if
> someone can point me in the right direction.
> 
The first step is to set up a work environment and build the latest gcc
git master from source.
The source is actually more C than C++ (the fortran front-end at least).
It requires little C++ skill, but time and willingness to decipher
its complexity.

There are two places where inlining can be done:
  * In front-end passes where the parsed fortran code is rewritten 
before generating the intermediary code for the optimizers.  Thomas 
König can help you there.
  * Directly in the code generation for the optimizers.  It is (much) 
more complex but can avoid the need for temporaries.  I can help you there.
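
To make the difference between the two approaches concrete, the sketch below writes out by hand, at source level, what each one could conceptually produce for an expression that contains spread. It is not compiler output and does not reflect actual gfortran internals; the program name, the sizes, and in particular the temporary tmp are hypothetical and only serve the illustration.

```fortran
program inline_spread_sketch
  implicit none
  integer, parameter :: n = 4, m = 3   ! illustrative sizes (assumed)
  real :: a(n), b(n, m)
  real :: c_ref(n, m), c_tmp(n, m), c_direct(n, m)
  real :: tmp(n, m)                    ! hypothetical temporary
  integer :: i, j

  a = [(real(i), i = 1, n)]
  b = 1.0

  ! Original statement, using the intrinsic.
  c_ref = b + spread(a, dim=2, ncopies=m)

  ! Source-level rewrite: the spread call becomes loops filling a
  ! temporary, which the original expression then consumes.
  do j = 1, m
    do i = 1, n
      tmp(i, j) = a(i)
    end do
  end do
  c_tmp = b + tmp

  ! Inlining during code generation: the element a(i) is used directly
  ! inside the loop for the whole expression, so no temporary is needed.
  do j = 1, m
    do i = 1, n
      c_direct(i, j) = b(i, j) + a(i)
    end do
  end do

  ! Self-check: all three forms must give the same result.
  if (any(c_tmp /= c_ref) .or. any(c_direct /= c_ref)) error stop "results differ"
  print '(a)', 'all three forms agree'
end program inline_spread_sketch
```

The second form may still be optimized well by later passes, but the third avoids allocating and filling tmp in the first place.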

Some links about our development process and conventions:
https://gcc.gnu.org/contribute.html
https://gcc.gnu.org/git.html

How to build GCC:
https://gcc.gnu.org/wiki/InstallingGCC


Mikael




* Re: Optimization of spread
  2022-11-03 21:54 ` Mikael Morin
@ 2022-11-03 22:04   ` Thomas Koenig
  2022-11-16 10:12     ` Théo Cavignac
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Koenig @ 2022-11-03 22:04 UTC (permalink / raw)
  To: Mikael Morin, Théo Cavignac; +Cc: gfortran

Hi,

Mikael beat me to a mail saying essentially the same things by
a few minutes, so I'm just adding a few details.

> There are two places where inlining can be done:
>   * In front-end passes where the parsed fortran code is rewritten 
> before generating the intermediary code for the optimizers.  Thomas 
> König can help you there.

I most certainly can.  frontend-passes.cc contains, among other
functionality, a function to inline MATMUL for small sizes, so
much of the infrastructure is already there.
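
As a rough idea of what inlining MATMUL means at source level, here is a small matrix product written both with the intrinsic and as explicit loops. The loop nest is a hand-written illustration, not what frontend-passes.cc actually generates, and the names and sizes are assumptions. The gfortran manual documents -finline-matmul-limit=n to control the size below which MATMUL is inlined, and -fdump-fortran-optimized to dump the parse tree after front-end optimization, which may help when studying the existing pass.

```fortran
program matmul_inline_sketch
  implicit none
  integer, parameter :: n = 3          ! small illustrative size (assumed)
  real :: a(n, n), b(n, n), c_ref(n, n), c_loop(n, n)
  integer :: i, j, k

  ! Small integer-valued entries keep the arithmetic exact, so the two
  ! results can be compared with a tight tolerance.
  a = reshape([(real(i), i = 1, n*n)], [n, n])
  b = reshape([(real(n*n - i), i = 1, n*n)], [n, n])

  ! Intrinsic form.
  c_ref = matmul(a, b)

  ! Explicit loop nest; the innermost loop runs over the first index
  ! of c_loop and a to follow column-major storage.
  c_loop = 0.0
  do j = 1, n
    do k = 1, n
      do i = 1, n
        c_loop(i, j) = c_loop(i, j) + a(i, k) * b(k, j)
      end do
    end do
  end do

  ! Self-check: the loop nest must reproduce the intrinsic's result.
  if (maxval(abs(c_loop - c_ref)) > 1.0e-5) error stop "results differ"
  print '(a)', 'matmul and loop results agree'
end program matmul_inline_sketch
```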

>   * Directly in the code generation for the optimizers.  It is (much) 
> more complex but can avoid the need for temporaries.  I can help you there.
> 
> Some links about our development process and conventions:
> https://gcc.gnu.org/contribute.html
> https://gcc.gnu.org/git.html

And, if you're into hacking gfortran, some starting pointers are at
https://gcc.gnu.org/wiki/GFortranHacking . But always feel free to ask!

Best regards

	Thomas


* Re: Optimization of spread
  2022-11-03 22:04   ` Thomas Koenig
@ 2022-11-16 10:12     ` Théo Cavignac
  0 siblings, 0 replies; 4+ messages in thread
From: Théo Cavignac @ 2022-11-16 10:12 UTC (permalink / raw)
  To: gfortran

Mikael, Thomas,
Thank you very much for being so welcoming.

> The source is actually more C than C++ (the fortran front-end at least).
That's good to know, I am much more comfortable with C.

> It requires little C++ skill, but time and willingness to decipher its complexity.
Yes, I don't expect that to be easy.

> There are two places where inlining can be done:
>  * In front-end passes where the parsed fortran code is rewritten
> before generating the intermediary code for the optimizers.  Thomas
> König can help you there.
>  * Directly in the code generation for the optimizers.  It is (much)
> more complex but can avoid the need for temporaries.  I can help you there.

Given my limited understanding of the compiler's inner workings, I
will try to have a look at the higher-level side first.

> I most certainly can.  frontend-passes.cc contains, among other
> functionality, a function to inline MATMUL for small sizes, so
> much of the infrastructure is already there.
I will start my investigation there.

> > Some links about our development process and conventions:
> > https://gcc.gnu.org/contribute.html
> > https://gcc.gnu.org/git.html
>
> And, if you're into hacking gfortran, some starting pointers are at
> https://gcc.gnu.org/wiki/GFortranHacking . But always feel free to ask!
I am familiar with git, but I'll have to read the two other documents soon.

Thanks again, hopefully you'll hear from me a little later.

Best regards,
Théo

