From: "mjr19 at cam dot ac.uk" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114767] gfortran AVX2 complex multiplication
 by (0d0,1d0) suboptimal
Date: Thu, 18 Apr 2024 17:58:38 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: mjr19 at cam dot ac.uk
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-114767-4-04SNwtE9dk@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-114767-4@http.gcc.gnu.org/bugzilla/>
References: <bug-114767-4@http.gcc.gnu.org/bugzilla/>
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767

--- Comment #4 from mjr19 at cam dot ac.uk ---
An issue which I suspect is related is shown by

subroutine zradd(c,n)
  integer :: i,n
  complex(kind(1d0)) :: c(*)

  do i=1,n
     c(i)=c(i)+1d0
  enddo
end subroutine

When compiled with gfortran-14 at -O3 -mavx2, the generated code looks very sensible.

If one adds -ffast-math, the code looks a lot less sensible and takes over 70%
longer to run. I think the strategy has changed. Previously, 1d0 was promoted
to (1d0,0d0) and that complex value was added to each element (which one might
argue a strict interpretation of the Fortran standard requires, though I am
not certain that it does). Now, all the real parts are collected in a vector,
1d0 is added to them, and the addition of 0d0 to the imaginary parts is
avoided. Unsurprisingly, the gain from halving the number of additions is more
than offset by the vperms and vshufs needed to move the real parts into and
out of that vector.
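One plausible reason the two options produce different code (my assumption, not something established in this report) is signed zero. Under IEEE 754 rules, x + 0.0 is not an identity when x is -0.0, because -0.0 + 0.0 gives +0.0 in the default rounding mode. Without -ffast-math (which implies -fno-signed-zeros), the compiler therefore may not drop the addition of 0d0 to the imaginary parts. The C sketch below mimics the two strategies on the interleaved (re, im, re, im, ...) storage that complex(kind(1d0)) shares with pairs of C doubles. The function names add_promoted, add_real_only and imag_is_negative_zero are hypothetical.

```c
#include <math.h>

/* Hypothetical helper: add the promoted constant (1.0, 0.0) to each of the
   n complex values stored as interleaved doubles. Every double receives the
   same kind of add, so the loop maps onto full-width vector adds. */
void add_promoted(double *z, int n)
{
    for (int i = 0; i < n; i++) {
        z[2 * i]     += 1.0; /* real part */
        z[2 * i + 1] += 0.0; /* imaginary part: turns -0.0 into +0.0
                                 unless signed zeros are ignored */
    }
}

/* Hypothetical helper: add 1.0 to the real parts only and leave the
   imaginary parts untouched, which is what -ffast-math permits. The report
   describes GCC vectorizing this form with vperm/vshuf shuffles. */
void add_real_only(double *z, int n)
{
    for (int i = 0; i < n; i++)
        z[2 * i] += 1.0;
}

/* Returns nonzero if the imaginary part of element i is -0.0. */
int imag_is_negative_zero(const double *z, int i)
{
    return z[2 * i + 1] == 0.0 && signbit(z[2 * i + 1]) != 0;
}
```

Applied to the array {2.0, -0.0, 3.0, 5.0} with n = 2, both helpers produce real parts 3.0 and 4.0. Only add_real_only keeps the first imaginary part as -0.0. Compile the sketch without -ffast-math, because that option lets the compiler fold z + 0.0 to z and hide the difference.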

Ideally, with -ffast-math the compiler would have noticed that adding 0d0 to
the imaginary parts is not necessary, then concluded that doing so anyway was
faster than any alternative method, and so done it anyway.