From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48)
 id CF1193858415; Thu, 18 Apr 2024 17:58:38 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org CF1193858415
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
 s=default; t=1713463118;
 bh=+e3STIlB0RLgiuQsj3PEqemTHmEfd4w/YklHTWVVJzU=;
 h=From:To:Subject:Date:In-Reply-To:References:From;
 b=wP2aYcpCFl3suMCMk4Brs/t1el17l0aoQlQSBPkOcaa4NndopdMTjexMUd7R251Yp
 /ZD+M6AI4f7mcd02rTHsoStKGExFeW3Xa3Cc8QB57CoW1/TqIYcfmkrsI461/qhVbS
 INzj0uv+Z2dFfDvvg3xzGLq3EaGkVaFzSo/xHiEw=
From: "mjr19 at cam dot ac.uk"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114767] gfortran AVX2 complex multiplication by (0d0,1d0) suboptimal
Date: Thu, 18 Apr 2024 17:58:38 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: mjr19 at cam dot ac.uk
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767

--- Comment #4 from mjr19 at cam dot ac.uk ---
An issue which I suspect is related is shown by

subroutine zradd(c,n)
  integer :: i,n
  complex(kind(1d0)) :: c(*)

  do i=1,n
     c(i)=c(i)+1d0
  enddo
end subroutine

If compiled with gfortran-14 and -O3 -mavx2 it all looks very sensible.

If one adds -ffast-math, it looks a lot less sensible, and takes over 70%
longer to run.
I think it has changed from promoting 1d0 to (1d0,0d0) and then adding
that (which one might argue that a strict interpretation of the Fortran
standard requires, but I am not certain that it does), to collecting all
the real parts in a vector, adding 1d0 to them, and avoiding adding 0d0 to
the imaginary parts. Unsurprisingly the gain in halving the number of
additions is more than offset by the required vperms and vshufs.

Ideally -ffast-math would have noticed that adding 0d0 to the imaginary
part is not necessary, but then concluded that doing so was faster than
any alternative method, and so done so anyway.
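
For illustration only, the two strategies described above can be sketched as scalar C. This is a model of the memory access pattern, not GCC's generated code. The function names zradd_promote and zradd_real_only are hypothetical, and the sketch assumes the complex(kind(1d0)) array is laid out as interleaved doubles (re, im, re, im, ...).

```c
#include <stddef.h>

/* Illustrative scalar model only; not compiler output.
   Assumption: n complex values are stored as 2*n doubles,
   interleaved as re, im, re, im, ... */

/* Strategy A: promote 1d0 to (1d0,0d0) and add it to every element.
   This performs 2*n additions, but all doubles are processed in
   order, so a vector unit could add a repeating {1,0} constant
   without rearranging lanes. */
void zradd_promote(double *c, size_t n)
{
    static const double addend[2] = { 1.0, 0.0 };
    for (size_t k = 0; k < 2 * n; ++k)
        c[k] += addend[k & 1];   /* even index: real, odd index: imaginary */
}

/* Strategy B: add 1d0 to the real parts only.
   This performs n additions, but touches every other double; a
   vectorized version has to gather the real parts and scatter them
   back, which is where extra permutes and shuffles would come from. */
void zradd_real_only(double *c, size_t n)
{
    for (size_t k = 0; k < n; ++k)
        c[2 * k] += 1.0;
}
```

Both functions give the same result for ordinary values. Under strict IEEE arithmetic they can differ when an imaginary part is -0.0, because -0.0 + 0.0 yields +0.0 in the default rounding mode; that is presumably the kind of difference -ffast-math allows the compiler to ignore.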