Hi All,

here is a respin with the requested changes.

I just realized I haven't updated the documentation yet but will do
so soon since I'm sure there will be feedback :)

Thanks,
Tamar

gcc/ChangeLog:

	* doc/md.texi: Document optabs.
	* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New.
	* optabs.def (cadd90_optab, cadd270_optab): New.
	* tree-vect-slp-patterns.c (linear_loads_p, vect_slp_make_linear,
	class complex_add_pattern,complex_add_pattern::matches): New.
	(complex_operations_pattern::matches): Add complex_add_pattern.

> -----Original Message-----
> From: rguenther@c653.arch.suse.de <rguenther@c653.arch.suse.de> On
> Behalf Of Richard Biener
> Sent: Tuesday, September 29, 2020 11:44 AM
> To: Richard Sandiford <Richard.Sandiford@arm.com>
> Cc: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org;
> nd <nd@arm.com>; ook@ucw.cz
> Subject: Re: [PATCH v2 6/16]middle-end Add Complex Addition with rotation
> detection
> 
> On Tue, 29 Sep 2020, Richard Sandiford wrote:
> 
> > Tamar Christina <tamar.christina@arm.com> writes:
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > >
> 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a
> 4e
> > > bbed73a70f29be 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -6132,6 +6132,17 @@ floating-point mode.
> > >
> > >  This pattern is not allowed to @code{FAIL}.
> > >
> > > +@cindex @code{cadd@var{m}@var{n}3} instruction pattern @item
> > > +@samp{cadd@var{m}@var{n}3} Perform a vector addition of complex
> > > +numbers in operand 1 with operand 2 rotated by @var{m} degrees
> > > +around the argand plane and storing the result in operand 0.  The
> > > +instruction must perform the operation on data loaded contiguously
> > > +into the vectors.
> >
> > Nitpicking, sorry, but I think it would be better to describe the
> > layout directly rather than in terms of loads, since the preceding
> > operation might not be a load.
> 
> So if we're at that and since GCC vectors do not have complex components
> can we formulate this in terms avoiding 'complex'?
> Isn't this an add of one vector to a vector with adjacant lanes swapped and
> possibly negated?  Mentioning that this would match a complex add in case
> lanes happen to match up with complex real/imag parts is OK but the pattern
> should work equally well if there's no complex numbers involved?
> 
> > I guess the main question is: what representation do we expect for
> > big-endian?  A normal Advanced SIMD LDR would give this (for floats):
> >
> >              MEMORY
> >    +-----+-----+-----+-----+
> >    | r0  | i0  | r1  | i1  |
> >    +-----+-----+-----+-----+
> >    |  0  |  1  |  2  |  3  |   array numbering
> >    +-----+-----+-----+-----+
> >       V     V     V     V      Advanced SIMD LDR
> >    +-----+-----+-----+-----+
> >    | r0  | i0  | r1  | i1  |
> >    +-----+-----+-----+-----+
> >    |  0  |  1  |  2  |  3  |   GCC lane numbering
> >    +-----+-----+-----+-----+
> >    |  3  |  2  |  1  |  0  |   Arm lane numbering
> >    +-----+-----+-----+-----+
> >   MSB       REGISTER      LSB
> >
> > but the FC* instructions put the imaginary parts in the more
> > significant lane, so the pairs of elements above would need to be
> > reversed:
> >
> >              MEMORY
> >    +-----+-----+-----+-----+
> >    | r0  | i0  | r1  | i1  |
> >    +-----+-----+-----+-----+
> >    |  0  |  1  |  2  |  3  |   array numbering
> >    +-----+-----+-----+-----+
> >        \   /       \   /
> >         \ /         \ /
> >          X           X         Load and permute
> >         / \         / \
> >        /   \       /   \
> >    +-----+-----+-----+-----+
> >    | i0  | r0  | i1  | r1  |
> >    +-----+-----+-----+-----+
> >    |  0  |  1  |  2  |  3  |   GCC lane numbering
> >    +-----+-----+-----+-----+
> >    |  3  |  2  |  1  |  0  |   Arm lane numbering
> >    +-----+-----+-----+-----+
> >   MSB       REGISTER      LSB
> >
> > (Or the whole vector could be reversed.)
> >
> > We might decide that it just isn't worth doing this for Advanced SIMD.
> > But should the semantics of the optab be that:
> >
> > (1) GCC lane number 0 holds a real part, or
> > (2) the least significant lane holds a real part?
> >
> > With (1), it would be up to the target to hide the permute above.
> > With (2), the vectoriser would need to introduce the permute itself.
> >
> > I'm not sure there's a perfect answer even for Arm targets.  (2)
> > matches the Advanced SIMD semantics.  But for SVE, the register layout
> > follows
> > LD1 rather than LDR, and the GCC and architectural lane numbering match
> up.
> > (1) would therefore be better than (2) for SVE (and so no permute
> > would be needed for either endianness on SVE).
> >
> > > +The operation is only supported for vector modes @var{n} and with
> > > +rotations @var{m} of 90 or 270.
> > > +
> > > +This pattern is not allowed to @code{FAIL}.
> > > +
> > >  @cindex @code{ffs@var{m}2} instruction pattern  @item
> > > @samp{ffs@var{m}2}  Store into operand 0 one plus the index of the
> > > least significant 1-bit diff --git a/gcc/internal-fn.def
> > > b/gcc/internal-fn.def index
> > >
> 13e60828fcf5db6c5f15aae2bacd4cf04029e430..956a65a338c157b51de7e78a3f
> > > b005b5af78ef31 100644
> > > --- a/gcc/internal-fn.def
> > > +++ b/gcc/internal-fn.def
> > > @@ -275,6 +275,8 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb,
> > > binary)  DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin,
> binary)
> > > DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
> > > DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
> > > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST,
> cadd90,
> > > +binary) DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270,
> ECF_CONST,
> > > +cadd270, binary)
> > >
> > >  /* FP scales.  */
> > >  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) diff --git
> > > a/gcc/optabs.def b/gcc/optabs.def index
> > >
> 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f
> > > 6848e80edf936d 100644
> > > --- a/gcc/optabs.def
> > > +++ b/gcc/optabs.def
> > > @@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2")  OPTAB_D
> > > (atanh_optab, "atanh$a2")  OPTAB_D (copysign_optab, "copysign$F$a3")
> > > OPTAB_D (xorsign_optab, "xorsign$F$a3")
> > > +OPTAB_D (cadd90_optab, "cadd90$a3") OPTAB_D (cadd270_optab,
> > > +"cadd270$a3")
> > >  OPTAB_D (cos_optab, "cos$a2")
> > >  OPTAB_D (cosh_optab, "cosh$a2")
> > >  OPTAB_D (exp10_optab, "exp10$a2")
> > > diff --git a/gcc/tree-vect-slp-patterns.c
> > > b/gcc/tree-vect-slp-patterns.c index
> > >
> 6453a5b1b6464dba833adc2c2a194db5e712bb79..b2b0ac62e9a69145470f41d2
> ba
> > > c736dd970be735 100644
> > > --- a/gcc/tree-vect-slp-patterns.c
> > > +++ b/gcc/tree-vect-slp-patterns.c
> > > @@ -663,12 +663,94 @@ graceful_exit:
> > >      }
> > >  };
> > >
> > > +class ComplexAddPattern : public ComplexPattern
> >
> > Another nitpick, sorry, but type names should be lower case rather
> > than CamelCase.
> >
> > Thanks,
> > Richard
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg, Germany; GF: Felix Imend