From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id C4DE4385DDD5; Tue, 25 Jun 2024 13:47:18 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org C4DE4385DDD5
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1719323238;
	bh=/k1tLlC0e1V240zFaAJgoUfGpg1I21cfmYfdRbhfayI=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=Jhj5WFWyEyvbcgK1CigU0o/U9iB9EvFzCf11Q7TrLFIfXKdNF0+Hyo1C8WAnGEWt3
	 JihpRD4/FrXGyr4eBzBA2yEhSOMAJVHqJc4jytDMliYZK6ZlMZKCduzIpsiz/UL28F
	 Ivk7bQdzvwAsjKsNDzJruQjiaj9argZbLDbStuSo=
From: "mjr19 at cam dot ac.uk" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114324] [13/14/15 Regression] AVX2
 vectorisation performance regression with gfortran 13/14
Date: Tue, 25 Jun 2024 13:47:16 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 13.1.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: mjr19 at cam dot ac.uk
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 13.4
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-114324-4-vnI0YpE4E8@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-114324-4@http.gcc.gnu.org/bugzilla/>
References: <bug-114324-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324

--- Comment #7 from mjr19 at cam dot ac.uk ---
Commit r15-1508-g59221dc587f369695d9b0c2f73aedf8458931f0f in GCC 15, from
PR 68855, has significantly improved the optimisation of these examples at
-O3, so that for both of the attachments the -Ofast version is now slower
than the -O3 version. For the two examples given, rough timings in
ns/iteration on a 3GHz Kaby Lake are

m3spf (ns/iteration)

gf-12  -Ofast   26.5
gf-15  -O3      27.6
gf-14  -Ofast   34.8
gf-15  -Ofast   35.1
gf-14  -O3      43.8
gf-12  -O3      44.8

m4spf (ns/iteration)

gf-12  -Ofast   23.3
gf-15  -O3      23.8
gf-14  -Ofast   29.6
gf-15  -Ofast   29.7
gf-14  -O3      37.3
gf-12  -O3      37.6

All were compiled with the flag -mavx2, and in both cases the fastest time
is very similar to that of ifort -O3. gf-15 is gfortran 15.0-20240623.

(I believe there is interest in the optimisation of these expressions. I am an
electronic structure physicist, and the major simulation codes in my area,
Abinit, CASTEP, QE, Siesta and VASP, are all written in Fortran, all use the
complex datatype, are likely to make use of conjugation and also of
multiplication by +/-i, and use large amounts of time on academic
supercomputers. The ability to alternate neg and nop efficiently along a
vector would be very useful if it dealt with conjg and *(+/-i), and the
obvious xor seems quite safe.)
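
To make the xor remark concrete, here is a minimal sketch in C (used only for
illustration; the codes in question are Fortran). It assumes single-precision
complex values stored as interleaved (re, im) pairs of IEEE-754 floats, and the
names flip_sign, conjg_xor and mul_i_xor are hypothetical. It is not what GCC
emits, only the idea: a constant mask that alternates "no-op" and "sign bit"
along the vector, applied with xor.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: xor a mask into the bit pattern of a float.
   With mask 0x80000000 this flips the IEEE-754 sign bit (a negation);
   with mask 0 it leaves the value untouched (a nop). */
static float flip_sign(float x, uint32_t mask)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* well-defined type punning */
    bits ^= mask;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Conjugate n complex numbers stored as interleaved (re, im) floats:
   nop on the real lanes, neg on the imaginary lanes. */
void conjg_xor(float *z, size_t n)
{
    static const uint32_t mask[2] = { 0u, 0x80000000u };
    for (size_t i = 0; i < 2 * n; i++)
        z[i] = flip_sign(z[i], mask[i & 1]);
}

/* Multiply n complex numbers by +i: (a + bi) * i = -b + ai,
   i.e. swap the two lanes of each pair, then negate the new real lane. */
void mul_i_xor(float *z, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        float re = z[2 * k];
        float im = z[2 * k + 1];
        z[2 * k]     = flip_sign(im, 0x80000000u);
        z[2 * k + 1] = re;
    }
}
```

Flipping the sign bit is what IEEE-754 negation does, signed zeros and NaNs
included, so the xor changes no other bits. In a vectorised loop the mask
would be a single constant register applied to a whole vector at once.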