From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id A8B543858C3A; Wed, 13 Mar 2024 12:52:52 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org A8B543858C3A
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1710334372;
	bh=RZMczQUdLRq9Gl47Jvjfu4MVdEKAk2rZeahmWBFhp2g=;
	h=From:To:Subject:Date:From;
	b=S470jyD43ZHj6FtXGqxQ8Vw1Q6J6UhT18tOzrMGMV3ObCn0HciQCrl0fziElvW3FS
	 KCZhVJzFVwaxg9A8x1gQPcOxrk69tjVv9hcQoCZEpZq1OaVdj7JeE4B9eLhurQuFjb
	 PjZYrKQc3nIl0UvSmB8GvgcnpXre/JmmUx68mYiY=
From: "mjr19 at cam dot ac.uk" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug fortran/114324] New: AVX2 vectorisation performance regression
 with gfortran 13/14
Date: Wed, 13 Mar 2024 12:52:51 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: fortran
X-Bugzilla-Version: 13.1.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: mjr19 at cam dot ac.uk
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter target_milestone
 attachments.created
Message-ID: <bug-114324-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D114324

            Bug ID: 114324
           Summary: AVX2 vectorisation performance regression with
                    gfortran 13/14
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mjr19 at cam dot ac.uk
  Target Milestone: ---

Created attachment 57685
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=3D57685&action=3Dedit
Test case of loop showing performance regression

The attached loop, when compiled with "-Ofast -mavx2" runs over 20% slower =
on
gfortran 13 or (pre-release) 14 than it does on 12.x. Precise versions test=
ed
12.3.0, 13.1.0 and GCC 14 downloaded on 11th March.

Precise slowdown depends on CPU. Tested on Haswell and Kaby Lake desktops.

Adding "-fopenmp" changes the code produced, but 12.3 still beats later
compilers. The analysis below is without -fopenmp.

It appears (to me) that 12.x is using the full width of the ymm registers, =
and
has a loop of 17 vector instructions, and some scalar loop control, which
performs two iterations of the original Fortran loop.

13.x manages more aggressive unrolling, performing four iterations per pass,
but uses about 54 vector instructions, rather than the 34 one might naively
expect. More instructions does not necessarily mean slower, but here it doe=
s.

I attach the test case to which I refer. I would be happy to add the trivial
timing program to show how I have been timing it. The full code is an FFT, =
but
the test case has been reduced to functional nonsense.

(I note that in other areas there are pleasing performance gains in gfortran
13.x. It is a pity that this partially cancels them.)=