From: "mjr19 at cam dot ac.uk"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114324] [13/14/15 Regression] AVX2 vectorisation performance regression with gfortran 13/14
Date: Wed, 01 May 2024 13:21:38 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 13.1.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 13.3
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114324

--- Comment #5 from mjr19 at cam dot ac.uk ---
Note that bug 114767 also turns out to be a case in which the inability to alternate neg and nop along a vector leads to poor performance with some operations on the complex type. That enhancement request also discusses how the ability to alternate add and nop could be beneficial.

Ifort can alternate neg and nop, at least in the simple case of

  complex(kind(1d0)) :: c(*)

  do i=1,n
     c(i)=conjg(c(i))
  enddo

Helped by aggressive default unrolling, it ends up being almost four times faster than gfortran-14 on the machine I tested it on. On asking gfortran-14 to unroll, the difference is reduced to about a factor of two.
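
For readers less familiar with the complex layout, here is a minimal C sketch of what "alternating neg and nop along a vector" amounts to for the conjg loop. It assumes the usual interleaved (re, im) storage of a double-precision complex array. The function names conj_scalar and conj_masked are hypothetical, and the sign-bit XOR is only one plausible formulation, not a claim about the code ifort or gfortran actually emits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference: conjugate n complex values stored as interleaved
   (re, im) pairs of doubles. Only the imaginary parts are negated. */
static void conj_scalar(double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[2 * i + 1] = -c[2 * i + 1];
}

/* Lane-mask formulation (illustrative assumption): treat the array as a
   flat run of 2*n doubles and XOR each one with a repeating pattern
   {0, sign bit, 0, sign bit}. Even lanes are left alone (nop), odd lanes
   have their sign flipped (neg). Four doubles correspond to one 256-bit
   AVX2 vector, so a vectoriser could apply the mask in a single XOR. */
static void conj_masked(double *c, size_t n)
{
    static const uint64_t mask[4] = {
        0, UINT64_C(1) << 63, 0, UINT64_C(1) << 63
    };

    for (size_t i = 0; i < 2 * n; i++) {
        uint64_t bits;
        memcpy(&bits, &c[i], sizeof bits);   /* read the bit pattern */
        bits ^= mask[i % 4];                 /* nop or sign flip */
        memcpy(&c[i], &bits, sizeof bits);   /* write it back */
    }
}
```

Both functions produce the same result. The second simply makes the per-lane nop/neg pattern explicit, which is the shape the vectoriser would need to recognise to avoid splitting real and imaginary parts apart.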