public inbox for gcc-bugs@sourceware.org
* [Bug fortran/114324] New: AVX2 vectorisation performance regression with gfortran 13/14
@ 2024-03-13 12:52 mjr19 at cam dot ac.uk
  2024-03-13 19:45 ` [Bug tree-optimization/114324] [13/14 Regression] " pinskia at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: mjr19 at cam dot ac.uk @ 2024-03-13 12:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324

            Bug ID: 114324
           Summary: AVX2 vectorisation performance regression with
                    gfortran 13/14
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mjr19 at cam dot ac.uk
  Target Milestone: ---

Created attachment 57685
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57685&action=edit
Test case of loop showing performance regression

The attached loop, when compiled with "-Ofast -mavx2", runs over 20% slower
with gfortran 13 or (pre-release) 14 than it does with 12.x. The precise
versions tested were 12.3.0, 13.1.0, and GCC 14 as downloaded on 11th March.

The precise slowdown depends on the CPU. I tested on Haswell and Kaby Lake
desktops.

Adding "-fopenmp" changes the code produced, but 12.3 still beats later
compilers. The analysis below is without -fopenmp.

It appears (to me) that 12.x uses the full width of the ymm registers and
produces a loop of 17 vector instructions, plus some scalar loop control, which
performs two iterations of the original Fortran loop per pass.

13.x manages more aggressive unrolling, performing four iterations per pass,
but uses about 54 vector instructions, rather than the 34 (twice 17) one might
naively expect. More instructions do not necessarily mean slower code, but here
they do.
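
For anyone wanting a rough cross-check of instruction counts like these, the
following is a minimal sketch, not the procedure used for the figures above.
It assumes the test case is saved as test.f90 and that the two compilers are
installed as gfortran-12 and gfortran-13; all three names are hypothetical.
It counts every line of the whole assembly listing that mentions a ymm
register, so it is only a crude proxy for the number of vector instructions
in the loop body itself.

```shell
#!/bin/sh
# Count the lines of an assembly listing that mention a ymm register.
# This is a rough proxy for "256-bit vector instructions", nothing more.
count_ymm() {
    grep -c 'ymm' "$1"
}

# Hypothetical names: test.f90 stands in for the attached test case, and
# gfortran-12 / gfortran-13 for however the two compilers are installed.
src=test.f90
for fc in gfortran-12 gfortran-13; do
    # Skip quietly if the compiler or the source file is not present.
    command -v "$fc" >/dev/null 2>&1 || continue
    [ -f "$src" ] || continue
    # -S stops after generating assembly; the flags match the report.
    "$fc" -Ofast -mavx2 -S "$src" -o "test-$fc.s" || continue
    echo "$fc: $(count_ymm "test-$fc.s") lines using ymm registers"
done
```

To compare just the hot loop, one would still need to locate it by hand in
each test-*.s file, since the count above also includes any code outside it.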

I attach the test case to which I refer. I would be happy to add the trivial
timing program to show how I have been timing it. The full code is an FFT, but
the test case has been reduced to functional nonsense.

(I note that in other areas there are pleasing performance gains in gfortran
13.x. It is a pity that this partially cancels them.)



Thread overview: 8+ messages
2024-03-13 12:52 [Bug fortran/114324] New: AVX2 vectorisation performance regression with gfortran 13/14 mjr19 at cam dot ac.uk
2024-03-13 19:45 ` [Bug tree-optimization/114324] [13/14 Regression] " pinskia at gcc dot gnu.org
2024-03-14  8:32 ` rguenth at gcc dot gnu.org
2024-03-14  8:32 ` rguenth at gcc dot gnu.org
2024-03-14 10:37 ` rguenth at gcc dot gnu.org
2024-03-15 20:06 ` mjr19 at cam dot ac.uk
2024-05-01 13:21 ` [Bug tree-optimization/114324] [13/14/15 " mjr19 at cam dot ac.uk
2024-05-21  9:19 ` jakub at gcc dot gnu.org
