From: "mjr19 at cam dot ac.uk"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114324] [13/14/15 Regression] AVX2 vectorisation performance regression with gfortran 13/14
Date: Tue, 25 Jun 2024 13:47:16 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 13.1.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 13.4

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324

--- Comment #7 from mjr19 at cam dot ac.uk ---
The patch to GCC 15 in commit r15-1508-g59221dc587f369695d9b0c2f73aedf8458931f0f
from PR 68855 has made a significant improvement to the optimisation of these
examples at -O3, causing the -Ofast version now to be slower than the -O3
version for both of the attachments.

For the two examples given, rough timings in ns/iteration on a 3GHz Kaby Lake
are

                 m3spf   m4spf
gf-12 -Ofast     26.5    23.3
gf-15 -O3        27.6    23.8
gf-14 -Ofast     34.8    29.6
gf-15 -Ofast     35.1    29.7
gf-14 -O3        43.8    37.3
gf-12 -O3        44.8    37.6

All with the flag -mavx2, and in both cases the fastest time is very similar to
ifort -O3. gf-15 is gfortran 15.0-20240623.

(I believe there is interest in the optimisation of these expressions. I am an
electronic structure physicist, and the major simulation codes in my area,
Abinit, CASTEP, QE, Siesta, VASP, are all written in Fortran, all use the
complex datatype, are likely to make use of conjugation and also multiplication
by +/-i, and use large amounts of time on academic supercomputers. The ability
to alternate neg and nop efficiently along a vector would be very useful if it
dealt with conjg and *(+/-i), and the obvious xor seems quite safe.)
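
To make the "alternate neg and nop" idea concrete, here is a minimal scalar sketch in C (chosen only so that the bit manipulation is explicit; it is not taken from the attachments, and it does not describe what GCC actually emits). It assumes single-precision complex data stored interleaved as (re, im, re, im, ...), which is how Fortran lays out complex arrays. The function names conjg_xor and mul_i_xor are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Conjugate n interleaved single-precision complex numbers in place.
   Even elements (real parts) are xor-ed with 0, i.e. a nop; odd elements
   (imaginary parts) have their IEEE sign bit flipped, i.e. a neg. */
static void conjg_xor(float *z, size_t n)
{
    static const uint32_t mask[2] = { 0u, 0x80000000u };  /* nop, neg */
    for (size_t i = 0; i < 2 * n; i++) {
        uint32_t bits;
        memcpy(&bits, &z[i], sizeof bits);   /* type-pun without UB */
        bits ^= mask[i & 1];
        memcpy(&z[i], &bits, sizeof bits);
    }
}

/* Multiply n interleaved complex numbers by +i in place:
   (re, im) -> (-im, re), i.e. a swap within each pair followed by
   the same sign-bit xor applied to the new real part. */
static void mul_i_xor(float *z, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        uint32_t re, im;
        memcpy(&re, &z[2 * k], sizeof re);
        memcpy(&im, &z[2 * k + 1], sizeof im);
        im ^= 0x80000000u;                   /* -im */
        memcpy(&z[2 * k], &im, sizeof im);
        memcpy(&z[2 * k + 1], &re, sizeof re);
    }
}
```

Because the xor only touches the sign bit, it behaves like an IEEE negate for zeros, infinities and NaNs as well, which is the sense in which it "seems quite safe". Multiplication by -i is the same pattern with the sign flip applied to the other element of each pair.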