public inbox for gcc-bugs@sourceware.org
* [Bug fortran/114767] New: gfortran AVX2 complex multiplication by (0d0,1d0) suboptimal
From: mjr19 at cam dot ac.uk @ 2024-04-18 12:13 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114767

            Bug ID: 114767
           Summary: gfortran AVX2 complex multiplication by (0d0,1d0)
                    suboptimal
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mjr19 at cam dot ac.uk
  Target Milestone: ---

Gfortran 14 shows considerable improvement over 13.1 on x86_64 AVX2 on the
following test case:

subroutine scale_i(c,n)
  integer :: i,n
  complex(kind(1d0)) :: c(*)

  do i=1,n
     c(i)=c(i)*(0d0,1d0)
  enddo
end subroutine scale_i

Both vectorise well, and use an xor for the multiplication by -1 -- good.

But both proceed by forming one vector containing the real parts of four
consecutive complex elements, and another containing the imaginary parts. The
imaginary parts are then all xor'ed to flip their signs, and further permuting
and shuffling is needed to reassemble things into the correct interleaved order.
Gfortran 14 has reduced the amount of permuting and shuffling needed to achieve
the same result.

I think that it should be possible to do this with the vector registers holding
the complex data in their natural order. A single xor could switch the signs of
alternate elements, leaving the real parts untouched, and a single vpermilpd
(or the more general vpermpd) could then swap pairs of elements. This should
not only be faster, but use fewer registers too.
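
As a rough illustration of the idea, here is a minimal sketch in C using AVX
intrinsics. It is not gfortran output and not part of the test case; the
function name scale_i_sketch, the flat array of doubles, and the scalar
fallback are assumptions made purely for illustration. Each 256-bit register
holds two complex values in their natural (re, im, re, im) order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Multiply n interleaved complex doubles (re, im, re, im, ...) by i, in place.
   (a + ib) * i = -b + ia, so each pair (a, b) must become (-b, a).
   Hypothetical helper, for illustration only. */
void scale_i_sketch(double *c, size_t n)
{
    size_t k = 0;
#ifdef __AVX__
    /* -0.0 has only the sign bit set; put it in the imaginary lanes (1 and 3).
       _mm256_set_pd lists lanes from highest to lowest. */
    const __m256d sign_im = _mm256_set_pd(-0.0, 0.0, -0.0, 0.0);
    for (; k + 2 <= n; k += 2) {               /* two complex values per register */
        __m256d v = _mm256_loadu_pd(c + 2 * k);
        v = _mm256_xor_pd(v, sign_im);         /* (a, b) -> (a, -b) */
        v = _mm256_permute_pd(v, 0x5);         /* vpermilpd: (a, -b) -> (-b, a) */
        _mm256_storeu_pd(c + 2 * k, v);
    }
#endif
    /* Scalar tail, and fallback when AVX is not enabled: the same two steps. */
    for (; k < n; ++k) {
        double re = c[2 * k], im = c[2 * k + 1];
        uint64_t bits;
        memcpy(&bits, &im, sizeof bits);
        bits ^= UINT64_C(1) << 63;             /* flip the sign bit of the imaginary part */
        memcpy(&im, &bits, sizeof im);
        c[2 * k] = im;                         /* swap the pair */
        c[2 * k + 1] = re;
    }
}
```

In the vector loop, only one constant register (the sign mask) is needed
besides the data register itself.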

(This could be generalised to the case where the constant is a multiple of
(0d0,1d0), say (0d0,q), either by a final multiplication with a vector
containing q repeated, or by replacing the xor by a multiplication by a vector
containing q repeated but with alternating signs.)
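
The second variant (multiplying by a vector of q with alternating signs, then
swapping pairs) might look like the sketch below, again in C with AVX
intrinsics. The name scale_iq_sketch and the scalar fallback are illustrative
assumptions, not anything the compiler currently emits.

```c
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Multiply n interleaved complex doubles by (0, q), in place.
   (a + ib) * iq = -qb + i(qa), so each pair (a, b) must become (-qb, qa).
   Hypothetical helper, for illustration only. */
void scale_iq_sketch(double *c, size_t n, double q)
{
    size_t k = 0;
#ifdef __AVX__
    /* Lanes from lowest to highest: (q, -q, q, -q). */
    const __m256d qv = _mm256_set_pd(-q, q, -q, q);
    for (; k + 2 <= n; k += 2) {               /* two complex values per register */
        __m256d v = _mm256_loadu_pd(c + 2 * k);
        v = _mm256_mul_pd(v, qv);              /* (a, b) -> (qa, -qb) */
        v = _mm256_permute_pd(v, 0x5);         /* vpermilpd: (qa, -qb) -> (-qb, qa) */
        _mm256_storeu_pd(c + 2 * k, v);
    }
#endif
    /* Scalar tail, and fallback when AVX is not enabled. */
    for (; k < n; ++k) {
        double re = c[2 * k] * q;
        double im = c[2 * k + 1] * -q;
        c[2 * k] = im;                         /* swap the scaled pair */
        c[2 * k + 1] = re;
    }
}
```

Here the multiplication replaces the xor, so the instruction count per register
is the same as in the q = 1 case.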

The compilation tests used -mavx2 -O3 -ffast-math.

