From: Mikael Morin
Date: Thu, 3 Nov 2022 22:54:47 +0100
Subject: Re: Optimization of spread
To: Théo Cavignac
Cc: gfortran, Thomas Koenig
Message-ID: <8f4bdd1c-44f1-fb2f-fdc2-f7fcbef17c43@orange.fr>

Hello, welcome, and thanks for your interest.

On 3 November 2022 at 11:48, Théo Cavignac via Fortran wrote:
> Hello,
> I am currently writing some numerical code in Fortran 2003 and I want
> to use the spread intrinsic because having used NumPy heavily for the
> past few years, it feels natural to use such an array primitive.
> I naturally wondered what would be the effect on performance and found
> this on Stack Overflow: https://stackoverflow.com/a/55732905/6324751
>
> TLDR: spread is as fast as, if not faster than, a do loop when using
> ifort. However, it is significantly slower (up to 100% in my
> microbenchmarks) with gfortran 12.2.0.
>
> Investigating the matter a bit more, I noticed that ifort recognizes
> the pattern and essentially produces the same code for both the do loop
> and the spread call, while gfortran “naively” calls spread, even with
> -O3.
>
> Here is a demonstration on godbolt.org: https://godbolt.org/z/dcYEPj8bP
>
> So, my question is: is this something that could be better optimized?
> I wonder if simply allowing the compiler to inline spread wouldn't
> already enable further optimizations that would lead to the same kind
> of performance as found in ifort.

Well, obviously you can get the same performance gfortran gets with do
loops if you make gfortran generate do loops in place of spread.

> I also think other array intrinsics may benefit from this effort if
> similar strategies can be applied.
> While I have never contributed to GCC, I would be willing to
> do this implementation if it is in the reach of my C++ skills, and if
> someone can point me in the right direction.
>

The first step is to set up a work environment and build the latest GCC
git master from source.
The source is actually more C than C++ (the Fortran front end at least).
It requires little C++ skill, but it does take time and a willingness to
decipher its complexity.

There are two places where inlining can be done:
  * In front-end passes, where the parsed Fortran code is rewritten
    before generating the intermediate code for the optimizers.
    Thomas König can help you there.
  * Directly in the code generation for the optimizers. It is (much)
    more complex but can avoid the need for temporaries.
    I can help you there.
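
To make "generating do loops in place of spread" concrete at the source level, here is a minimal sketch. The program name, array names and sizes are made up for illustration, and this is not the code gfortran actually emits; it only shows that a spread call along a new second dimension and a hand-written do loop fill the array with the same values.

```fortran
program spread_vs_loop
  implicit none
  ! Illustrative sizes and names only; none of them come from the thread.
  integer, parameter :: n = 3, m = 4
  real :: v(n), a(n, m), b(n, m)
  integer :: j

  v = [1.0, 2.0, 3.0]

  ! Array-intrinsic form: replicate v m times along a new dimension 2,
  ! so that a(i, j) = v(i) for every j.
  a = spread(v, dim=2, ncopies=m)

  ! Hand-written equivalent: copy v into each column of b.
  do j = 1, m
     b(:, j) = v
  end do

  ! Both forms must produce identical arrays.
  if (any(a /= b)) error stop "spread and do loop disagree"
  print *, "spread and do loop agree"
end program spread_vs_loop
```

Assuming the file is saved as spread_vs_loop.f90, it should build with something like "gfortran -O3 spread_vs_loop.f90". Inlining would amount to the compiler producing the loop form itself whenever it sees the spread call.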
Some links about our development process and conventions:
https://gcc.gnu.org/contribute.html
https://gcc.gnu.org/git.html

How to build GCC:
https://gcc.gnu.org/wiki/InstallingGCC

Mikael