From: Théo Cavignac
Date: Thu, 3 Nov 2022 11:48:07 +0100
Subject: Optimization of spread
To: fortran@gcc.gnu.org

Hello,

I am currently writing some numerical code in Fortran 2003 and would like to use the spread intrinsic because, having used NumPy heavily for the past few years, I find such array primitives natural.

I wondered what the effect on performance would be and found this answer on Stack Overflow: https://stackoverflow.com/a/55732905/6324751

TL;DR: with ifort, spread is as fast as a do loop, if not faster. With gfortran 12.2.0, however, it is significantly slower (up to 100% slower in my microbenchmarks).

Investigating the matter a bit further, I noticed that ifort recognizes the pattern and produces essentially the same code for the do loop and the spread call, while gfortran “naively” calls spread, even with -O3. Here is a demonstration on godbolt.org: https://godbolt.org/z/dcYEPj8bP

So my question is: could this be better optimized? I wonder whether simply allowing the compiler to inline spread would already enable further optimizations, leading to performance similar to what ifort achieves. I also think other array intrinsics may benefit from this effort if similar strategies can be applied.
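For reference, here is a minimal sketch of the two forms being compared. It is not the benchmark code from the godbolt link: the program name, the sizes n = 4 and m = 3, and the values of v are arbitrary choices made purely for illustration. Both forms build an n-by-m array whose every column is a copy of v. The observation above is that ifort emits essentially the same code for both, while gfortran calls spread.

```fortran
! Illustrative sketch only: sizes and values are arbitrary assumptions,
! not taken from the original benchmark.
program spread_vs_loop
  implicit none
  integer, parameter :: n = 4, m = 3
  real :: v(n), a(n, m), b(n, m)
  integer :: j

  v = [1.0, 2.0, 3.0, 4.0]

  ! Array-primitive form: m copies of v along a new dimension 2,
  ! so that a(i, j) = v(i) for every j.
  a = spread(v, dim=2, ncopies=m)

  ! Equivalent hand-written loop: fill each column of b with v.
  do j = 1, m
     b(:, j) = v
  end do

  ! Exit with status 1 if the two forms ever disagree.
  if (any(a /= b)) stop 1
  print *, "spread and do loop agree"
end program spread_vs_loop
```

Because the two forms are semantically identical, inlining spread would in principle give the optimizer the same loop nest in both cases.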
While I have never contributed to GCC before, I would be willing to implement this if it is within reach of my C++ skills and if someone can point me in the right direction.

Regards,
Théo