Hi all,

I've been looking at implementing the complex multiply patterns for the 
amdgcn port, but I'm not getting the code I was hoping for. When I try 
to use the patterns on x86_64 or AArch64 they don't seem to work there 
either, so is there something wrong with the middle-end? I've tried both 
current HEAD and GCC 11.

The example shown in the internals manual is a simple loop multiplying 
two arrays of complex numbers, and writing the results to a third. I had 
expected that it would use the largest vectorization factor available, 
with the real/imaginary parts in even/odd lanes as described, but the 
vectorization factor is only 2 (that is, a single complex number), and I 
have to set -fvect-cost-model=unlimited to get even that.
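
For reference, a minimal sketch of the kind of loop meant here. It follows 
the description above rather than reproducing the attached testcase; the 
function name cmul_loop, the float element type, and the value of N are 
illustrative assumptions. Something like -O3 -ffast-math is assumed too, 
since without -ffast-math (or -fcx-limited-range) the multiply is likely 
to become a library call rather than open-coded arithmetic.

```c
#include <complex.h>

#define N 256  /* hypothetical array length, chosen for illustration */

/* Element-wise complex multiply: c[i] = a[i] * b[i].
   The restrict qualifiers tell the compiler the arrays do not overlap,
   so the vectorizer does not need runtime alias checks.  */
void
cmul_loop (float complex *restrict c,
           const float complex *restrict a,
           const float complex *restrict b)
{
  for (int i = 0; i < N; i++)
    c[i] = a[i] * b[i];
}
```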

I tried another example with SLP and that too uses the cmul patterns 
only for a single real/imaginary pair.

Did proper vectorization of cmul ever really work? There is a case in 
the testsuite for the pattern match, but it isn't in a loop.

Thanks

Andrew

P.S. I attached my testcase, in case I'm doing something stupid.

P.P.S. The manual says the pattern is "cmulm4", etc., but it's actually 
"cmulm3" in the implementation.