public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2 6/16]middle-end Add Complex Addition with rotation detection
@ 2020-09-25 14:28 Tamar Christina
  2020-09-29 10:02 ` Richard Sandiford
  0 siblings, 1 reply; 4+ messages in thread
From: Tamar Christina @ 2020-09-25 14:28 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, ook

[-- Attachment #1: Type: text/plain, Size: 639 bytes --]

Hi All,

This patch adds pattern detections for the following operation:

  Addition with rotation of the second argument around the Argand plane.
    Supported rotations are 90 and 270.

    c = a + (b * I) and c = a + (b * I * I * I)

  where a, b and c are complex numbers.
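
For illustration (this example is not taken from the patch or its
testsuite), the scalar form the detection is aimed at looks like:

  #include <complex.h>

  void
  f (float _Complex *restrict c, float _Complex *restrict a,
     float _Complex *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      c[i] = a[i] + b[i] * I;   /* Rotation 90, maps to cadd90.  */
  }

which, viewed on the underlying float lanes, is

  c[2*i]   = a[2*i]   - b[2*i+1];
  c[2*i+1] = a[2*i+1] + b[2*i];

For rotation 270 the signs flip: + on the even lane, - on the odd lane.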

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* doc/md.texi: Document optabs.
	* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New.
	* optabs.def (cadd90_optab, cadd270_optab): New.
	* tree-vect-slp-patterns.c (class ComplexAddPattern): New.
	(slp_patterns): Add ComplexAddPattern.

-- 

[-- Attachment #2: rb13510.patch --]
[-- Type: text/x-diff, Size: 5039 bytes --]

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a4ebbed73a70f29be 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6132,6 +6132,17 @@ floating-point mode.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{cadd@var{m}@var{n}3} instruction pattern
+@item @samp{cadd@var{m}@var{n}3}
+Perform a vector addition of complex numbers in operand 1 with operand 2
+rotated by @var{m} degrees around the argand plane and storing the result in
+operand 0.  The instruction must perform the operation on data loaded
+contiguously into the vectors.
+The operation is only supported for vector modes @var{n} and with
+rotations @var{m} of 90 or 270.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{ffs@var{m}2} instruction pattern
 @item @samp{ffs@var{m}2}
 Store into operand 0 one plus the index of the least significant 1-bit
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 13e60828fcf5db6c5f15aae2bacd4cf04029e430..956a65a338c157b51de7e78a3fb005b5af78ef31 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -275,6 +275,8 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
 DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
 DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
 DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
 
 /* FP scales.  */
 DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f6848e80edf936d 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2")
 OPTAB_D (atanh_optab, "atanh$a2")
 OPTAB_D (copysign_optab, "copysign$F$a3")
 OPTAB_D (xorsign_optab, "xorsign$F$a3")
+OPTAB_D (cadd90_optab, "cadd90$a3")
+OPTAB_D (cadd270_optab, "cadd270$a3")
 OPTAB_D (cos_optab, "cos$a2")
 OPTAB_D (cosh_optab, "cosh$a2")
 OPTAB_D (exp10_optab, "exp10$a2")
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 6453a5b1b6464dba833adc2c2a194db5e712bb79..b2b0ac62e9a69145470f41d2bac736dd970be735 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -663,12 +663,94 @@ graceful_exit:
     }
 };
 
+class ComplexAddPattern : public ComplexPattern
+{
+  protected:
+    ComplexAddPattern (slp_tree node, vec_info *vinfo)
+      : ComplexPattern (node, vinfo)
+    {
+      this->m_arity = 2;
+      this->m_num_args = 2;
+      this->m_vects.create (0);
+      this->m_defs.create (0);
+    }
+
+  public:
+    ~ComplexAddPattern ()
+    {
+      this->m_vects.release ();
+      this->m_defs.release ();
+    }
+
+    static VectPattern* create (slp_tree node, vec_info *vinfo)
+    {
+       return new ComplexAddPattern (node, vinfo);
+    }
+
+    const char* get_name ()
+    {
+      return "Complex Addition";
+    }
+
+    /* Pattern matcher for trying to match complex addition pattern in SLP tree
+       using the N statements found in node starting at position IDX.
+       If the operation matches then IFN is set to the operation it matched and
+       the arguments to the two replacement statements are put in VECTS.
+
+       If no match is found then IFN is set to IFN_LAST.
+
+       This function matches the patterns shaped as:
+
+         c[i] = a[i] - b[i+1];
+         c[i+1] = a[i+1] + b[i];
+
+       If a match occurred then TRUE is returned, else FALSE.  */
+
+    bool matches (stmt_vec_info *stmts, int idx)
+    {
+      this->m_last_ifn = IFN_LAST;
+      int base = idx - (this->m_arity - 1);
+      this->m_last_idx = idx;
+      this->m_stmt_info = stmts[0];
+
+      complex_operation_t op
+	= vect_detect_pair_op (base, this->m_node, &this->m_vects);
+
+      /* Find the two components.  Rotation in the complex plane will modify
+	 the operations:
+
+	 * Rotation  0: + +
+	 * Rotation 90: - +
+	 * Rotation 180: - -
+	 * Rotation 270: + -
+
+	Rotation 0 and 180 can be handled by normal SIMD code, so we don't need
+	to care about them here.  */
+      if (op == MINUS_PLUS)
+	this->m_last_ifn = IFN_COMPLEX_ADD_ROT90;
+      else if (op == PLUS_MINUS)
+	this->m_last_ifn = IFN_COMPLEX_ADD_ROT270;
+
+      if (this->m_last_ifn == IFN_LAST)
+	return false;
+
+      /* Correct the arguments after matching.  */
+      std::swap (this->m_vects[1], this->m_vects[3]);
+
+      /* If the two operands are the same, we don't have a permute. In such a case
+	 there is no advantage in doing the replacement.  */
+      return store_results ();
+    }
+};
+
 #define SLP_PATTERN(x) &x::create
 VectPatternDecl slp_patterns[]
 {
   /* For least amount of back-tracking and more efficient matching
      order patterns from the largest to the smallest.  Especially if they
      overlap in what they can detect.  */
+
+  SLP_PATTERN (ComplexAddPattern),
 };
 #undef SLP_PATTERN
 



* Re: [PATCH v2 6/16]middle-end Add Complex Addition with rotation detection
  2020-09-25 14:28 [PATCH v2 6/16]middle-end Add Complex Addition with rotation detection Tamar Christina
@ 2020-09-29 10:02 ` Richard Sandiford
  2020-09-29 10:44   ` Richard Biener
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Sandiford @ 2020-09-29 10:02 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, rguenther, ook

Tamar Christina <tamar.christina@arm.com> writes:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a4ebbed73a70f29be 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6132,6 +6132,17 @@ floating-point mode.
>  
>  This pattern is not allowed to @code{FAIL}.
>  
> +@cindex @code{cadd@var{m}@var{n}3} instruction pattern
> +@item @samp{cadd@var{m}@var{n}3}
> +Perform a vector addition of complex numbers in operand 1 with operand 2
> +rotated by @var{m} degrees around the argand plane and storing the result in
> +operand 0.  The instruction must perform the operation on data loaded
> +contiguously into the vectors.

Nitpicking, sorry, but I think it would be better to describe the
layout directly rather than in terms of loads, since the preceding
operation might not be a load.

I guess the main question is: what representation do we expect for
big-endian?  A normal Advanced SIMD LDR would give this (for floats):

             MEMORY
   +-----+-----+-----+-----+
   | r0  | i0  | r1  | i1  |
   +-----+-----+-----+-----+
   |  0  |  1  |  2  |  3  |   array numbering
   +-----+-----+-----+-----+
      V     V     V     V      Advanced SIMD LDR
   +-----+-----+-----+-----+
   | r0  | i0  | r1  | i1  |
   +-----+-----+-----+-----+
   |  0  |  1  |  2  |  3  |   GCC lane numbering
   +-----+-----+-----+-----+
   |  3  |  2  |  1  |  0  |   Arm lane numbering
   +-----+-----+-----+-----+
  MSB       REGISTER      LSB

but the FC* instructions put the imaginary parts in the more
significant lane, so the pairs of elements above would need
to be reversed:

             MEMORY
   +-----+-----+-----+-----+
   | r0  | i0  | r1  | i1  |
   +-----+-----+-----+-----+
   |  0  |  1  |  2  |  3  |   array numbering
   +-----+-----+-----+-----+
       \   /       \   /
        \ /         \ /
         X           X         Load and permute
        / \         / \
       /   \       /   \
   +-----+-----+-----+-----+
   | i0  | r0  | i1  | r1  |
   +-----+-----+-----+-----+
   |  0  |  1  |  2  |  3  |   GCC lane numbering
   +-----+-----+-----+-----+
   |  3  |  2  |  1  |  0  |   Arm lane numbering
   +-----+-----+-----+-----+
  MSB       REGISTER      LSB

(Or the whole vector could be reversed.)

We might decide that it just isn't worth doing this for Advanced SIMD.
But should the semantics of the optab be that:

(1) GCC lane number 0 holds a real part, or
(2) the least significant lane holds a real part?

With (1), it would be up to the target to hide the permute above.
With (2), the vectoriser would need to introduce the permute itself.
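
(To make that concrete with an illustrative sketch, not something taken
from the patch: under (2) a big-endian Advanced SIMD target would end up
with roughly the following GIMPLE-like statements for a 4-lane vector,

   ap = VEC_PERM_EXPR <a, a, { 1, 0, 3, 2 }>;
   bp = VEC_PERM_EXPR <b, b, { 1, 0, 3, 2 }>;
   cp = .COMPLEX_ADD_ROT90 (ap, bp);
   c  = VEC_PERM_EXPR <cp, cp, { 1, 0, 3, 2 }>;

whereas under (1) the pair-swap would be hidden inside the target's
expander.)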

I'm not sure there's a perfect answer even for Arm targets.  (2) matches
the Advanced SIMD semantics.  But for SVE, the register layout follows
LD1 rather than LDR, and the GCC and architectural lane numbering match up.
(1) would therefore be better than (2) for SVE (and so no permute would be
needed for either endianness on SVE).

> +The operation is only supported for vector modes @var{n} and with
> +rotations @var{m} of 90 or 270.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{ffs@var{m}2} instruction pattern
>  @item @samp{ffs@var{m}2}
>  Store into operand 0 one plus the index of the least significant 1-bit
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 13e60828fcf5db6c5f15aae2bacd4cf04029e430..956a65a338c157b51de7e78a3fb005b5af78ef31 100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -275,6 +275,8 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
>  DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
>  DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
>  DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
> +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
>  
>  /* FP scales.  */
>  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f6848e80edf936d 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2")
>  OPTAB_D (atanh_optab, "atanh$a2")
>  OPTAB_D (copysign_optab, "copysign$F$a3")
>  OPTAB_D (xorsign_optab, "xorsign$F$a3")
> +OPTAB_D (cadd90_optab, "cadd90$a3")
> +OPTAB_D (cadd270_optab, "cadd270$a3")
>  OPTAB_D (cos_optab, "cos$a2")
>  OPTAB_D (cosh_optab, "cosh$a2")
>  OPTAB_D (exp10_optab, "exp10$a2")
> diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> index 6453a5b1b6464dba833adc2c2a194db5e712bb79..b2b0ac62e9a69145470f41d2bac736dd970be735 100644
> --- a/gcc/tree-vect-slp-patterns.c
> +++ b/gcc/tree-vect-slp-patterns.c
> @@ -663,12 +663,94 @@ graceful_exit:
>      }
>  };
>  
> +class ComplexAddPattern : public ComplexPattern

Another nitpick, sorry, but type names should be lower case rather than
CamelCase.

Thanks,
Richard


* Re: [PATCH v2 6/16]middle-end Add Complex Addition with rotation detection
  2020-09-29 10:02 ` Richard Sandiford
@ 2020-09-29 10:44   ` Richard Biener
  2020-11-03 15:06     ` Tamar Christina
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Biener @ 2020-09-29 10:44 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Tamar Christina, gcc-patches, nd, ook

On Tue, 29 Sep 2020, Richard Sandiford wrote:

> Tamar Christina <tamar.christina@arm.com> writes:
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a4ebbed73a70f29be 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -6132,6 +6132,17 @@ floating-point mode.
> >  
> >  This pattern is not allowed to @code{FAIL}.
> >  
> > +@cindex @code{cadd@var{m}@var{n}3} instruction pattern
> > +@item @samp{cadd@var{m}@var{n}3}
> > +Perform a vector addition of complex numbers in operand 1 with operand 2
> > +rotated by @var{m} degrees around the argand plane and storing the result in
> > +operand 0.  The instruction must perform the operation on data loaded
> > +contiguously into the vectors.
> 
> Nitpicking, sorry, but I think it would be better to describe the
> layout directly rather than in terms of loads, since the preceding
> operation might not be a load.

So if we're at that, and since GCC vectors do not have complex
components, can we formulate this in terms that avoid 'complex'?
Isn't this an add of one vector to a vector with adjacent
lanes swapped and possibly negated?  Mentioning that this would
match a complex add in case lanes happen to match up with
complex real/imag parts is OK, but the pattern should work
equally well if there are no complex numbers involved?
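
(Concretely, as an illustrative sketch rather than proposed wording: for
the rot90 case the lane-wise operation is

  for (int i = 0; i < n; i += 2)
    {
      c[i]     = a[i]     - b[i + 1];
      c[i + 1] = a[i + 1] + b[i];
    }

i.e. b with each even/odd lane pair swapped and the element that ends up
in the even lane negated, followed by a plain vector add.)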

> I guess the main question is: what representation do we expect for
> big-endian?  A normal Advanced SIMD LDR would give this (for floats):
> 
>              MEMORY
>    +-----+-----+-----+-----+
>    | r0  | i0  | r1  | i1  |
>    +-----+-----+-----+-----+
>    |  0  |  1  |  2  |  3  |   array numbering
>    +-----+-----+-----+-----+
>       V     V     V     V      Advanced SIMD LDR
>    +-----+-----+-----+-----+
>    | r0  | i0  | r1  | i1  |
>    +-----+-----+-----+-----+
>    |  0  |  1  |  2  |  3  |   GCC lane numbering
>    +-----+-----+-----+-----+
>    |  3  |  2  |  1  |  0  |   Arm lane numbering
>    +-----+-----+-----+-----+
>   MSB       REGISTER      LSB
> 
> but the FC* instructions put the imaginary parts in the more
> significant lane, so the pairs of elements above would need
> to be reversed:
> 
>              MEMORY
>    +-----+-----+-----+-----+
>    | r0  | i0  | r1  | i1  |
>    +-----+-----+-----+-----+
>    |  0  |  1  |  2  |  3  |   array numbering
>    +-----+-----+-----+-----+
>        \   /       \   /
>         \ /         \ /
>          X           X         Load and permute
>         / \         / \
>        /   \       /   \
>    +-----+-----+-----+-----+
>    | i0  | r0  | i1  | r1  |
>    +-----+-----+-----+-----+
>    |  0  |  1  |  2  |  3  |   GCC lane numbering
>    +-----+-----+-----+-----+
>    |  3  |  2  |  1  |  0  |   Arm lane numbering
>    +-----+-----+-----+-----+
>   MSB       REGISTER      LSB
> 
> (Or the whole vector could be reversed.)
> 
> We might decide that it just isn't worth doing this for Advanced SIMD.
> But should the semantics of the optab be that:
> 
> (1) GCC lane number 0 holds a real part, or
> (2) the least significant lane holds a real part?
> 
> With (1), it would be up to the target to hide the permute above.
> With (2), the vectoriser would need to introduce the permute itself.
> 
> I'm not sure there's a perfect answer even for Arm targets.  (2) matches
> the Advanced SIMD semantics.  But for SVE, the register layout follows
> LD1 rather than LDR, and the GCC and architectural lane numbering match up.
> (1) would therefore be better than (2) for SVE (and so no permute would be
> needed for either endianness on SVE).
> 
> > +The operation is only supported for vector modes @var{n} and with
> > +rotations @var{m} of 90 or 270.
> > +
> > +This pattern is not allowed to @code{FAIL}.
> > +
> >  @cindex @code{ffs@var{m}2} instruction pattern
> >  @item @samp{ffs@var{m}2}
> >  Store into operand 0 one plus the index of the least significant 1-bit
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 13e60828fcf5db6c5f15aae2bacd4cf04029e430..956a65a338c157b51de7e78a3fb005b5af78ef31 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -275,6 +275,8 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
> >  DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
> >  DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
> >  DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
> > +DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
> >  
> >  /* FP scales.  */
> >  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
> > diff --git a/gcc/optabs.def b/gcc/optabs.def
> > index 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f6848e80edf936d 100644
> > --- a/gcc/optabs.def
> > +++ b/gcc/optabs.def
> > @@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2")
> >  OPTAB_D (atanh_optab, "atanh$a2")
> >  OPTAB_D (copysign_optab, "copysign$F$a3")
> >  OPTAB_D (xorsign_optab, "xorsign$F$a3")
> > +OPTAB_D (cadd90_optab, "cadd90$a3")
> > +OPTAB_D (cadd270_optab, "cadd270$a3")
> >  OPTAB_D (cos_optab, "cos$a2")
> >  OPTAB_D (cosh_optab, "cosh$a2")
> >  OPTAB_D (exp10_optab, "exp10$a2")
> > diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
> > index 6453a5b1b6464dba833adc2c2a194db5e712bb79..b2b0ac62e9a69145470f41d2bac736dd970be735 100644
> > --- a/gcc/tree-vect-slp-patterns.c
> > +++ b/gcc/tree-vect-slp-patterns.c
> > @@ -663,12 +663,94 @@ graceful_exit:
> >      }
> >  };
> >  
> > +class ComplexAddPattern : public ComplexPattern
> 
> Another nitpick, sorry, but type names should be lower case rather than
> CamelCase.
> 
> Thanks,
> Richard
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer


* RE: [PATCH v2 6/16]middle-end Add Complex Addition with rotation detection
  2020-09-29 10:44   ` Richard Biener
@ 2020-11-03 15:06     ` Tamar Christina
  0 siblings, 0 replies; 4+ messages in thread
From: Tamar Christina @ 2020-11-03 15:06 UTC (permalink / raw)
  To: Richard Biener, Richard Sandiford; +Cc: gcc-patches, nd, ook

[-- Attachment #1: Type: text/plain, Size: 7347 bytes --]

Hi All,

Here is a respin with the requested changes.

I just realized I haven't updated the documentation yet but will do
so soon since I'm sure there will be feedback :)

Thanks,
Tamar

gcc/ChangeLog:

	* doc/md.texi: Document optabs.
	* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New.
	* optabs.def (cadd90_optab, cadd270_optab): New.
	* tree-vect-slp-patterns.c (linear_loads_p, vect_slp_make_linear,
	class complex_add_pattern, complex_add_pattern::matches): New.
	(complex_operations_pattern::matches): Add complex_add_pattern.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: pr13510.patch --]
[-- Type: text/x-diff; name="pr13510.patch", Size: 9225 bytes --]

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 2b46286943778e16d95b15def4299bcbf8db7eb8..71e226505b2619d10982b59a4ebbed73a70f29be 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -6132,6 +6132,17 @@ floating-point mode.
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{cadd@var{m}@var{n}3} instruction pattern
+@item @samp{cadd@var{m}@var{n}3}
+Perform a vector addition of complex numbers in operand 1 with operand 2
+rotated by @var{m} degrees around the argand plane and storing the result in
+operand 0.  The instruction must perform the operation on data loaded
+contiguously into the vectors.
+The operation is only supported for vector modes @var{n} and with
+rotations @var{m} of 90 or 270.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{ffs@var{m}2} instruction pattern
 @item @samp{ffs@var{m}2}
 Store into operand 0 one plus the index of the least significant 1-bit
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 310d37aa53819791b5df1683afca831f08e5892a..33c54be1e158ddea25c4cd6b1148df8cf4a509b5 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -277,6 +277,9 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
 DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
 DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
 DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
+
 
 /* FP scales.  */
 DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 78409aa14537d259bf90277751aac00d452a0d3f..2bb0bf857977035bf562a77f5f6848e80edf936d 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -290,6 +290,8 @@ OPTAB_D (atan_optab, "atan$a2")
 OPTAB_D (atanh_optab, "atanh$a2")
 OPTAB_D (copysign_optab, "copysign$F$a3")
 OPTAB_D (xorsign_optab, "xorsign$F$a3")
+OPTAB_D (cadd90_optab, "cadd90$a3")
+OPTAB_D (cadd270_optab, "cadd270$a3")
 OPTAB_D (cos_optab, "cos$a2")
 OPTAB_D (cosh_optab, "cosh$a2")
 OPTAB_D (exp10_optab, "exp10$a2")
diff --git a/gcc/tree-vect-slp-patterns.c b/gcc/tree-vect-slp-patterns.c
index 65044a77d55e24cde6c663e81c11b66ad9650056..0732cf0a6d93be8590b84c39dff82940b280e46b 100644
--- a/gcc/tree-vect-slp-patterns.c
+++ b/gcc/tree-vect-slp-patterns.c
@@ -157,6 +157,114 @@ typedef enum _complex_operation : unsigned {
 
 typedef struct map_ { int a; int b; } map_t;
 
+/*******************************************************************************
+ * General helper functions
+ ******************************************************************************/
+
+/* Check to see if all loads rooted in ROOT are linear.  Linearity is
+   defined as having no gaps between values loaded.  */
+
+static load_permutation_t
+linear_loads_p (slp_tree root, bool *linear)
+{
+  *linear = false;
+  if (!root)
+    return vNULL;
+
+  unsigned i;
+  load_permutation_t loads = vNULL;
+
+  if (SLP_TREE_LOAD_PERMUTATION (root).exists ())
+    {
+      loads = SLP_TREE_LOAD_PERMUTATION (root);
+      unsigned leader = loads[0];
+      unsigned load;
+      FOR_EACH_VEC_ELT_FROM (loads, i, load, 1)
+	if (load != ++leader)
+      return loads;
+    }
+
+  slp_tree child;
+  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (root), i, child)
+    {
+      loads = linear_loads_p (child, linear);
+      if (!*linear)
+	return loads;
+    }
+
+  *linear = true;
+  return loads;
+}
+
+/* This function attempts to make a node rooted in NODE linear.  If the node
+   is already linear then the node itself is returned in RESULT.
+
+   If the node is not linear then a new VEC_PERM_EXPR node is created with a
+   lane permute that when applied will make the node linear.   If such a
+   permute cannot be created then FALSE is returned from the function.
+
+   Here linearity is defined as having a sequential, monotonically increasing
+   load position inside the load permute generated by the loads reachable from
+   NODE.  */
+
+static bool
+vect_slp_make_linear (slp_tree parent, slp_tree node, slp_tree *result)
+{
+  bool is_linear = false;
+  unsigned x, val;
+  load_permutation_t load_perm = linear_loads_p (node, &is_linear);
+  if (is_linear)
+    {
+      *result = node;
+      return true;
+    }
+
+  /* Attempt to linearise the permute.  */
+  vec<std::pair<unsigned, unsigned> > zipped;
+  zipped.create (load_perm.length ());
+  FOR_EACH_VEC_ELT (load_perm, x, val)
+    zipped.quick_push (std::make_pair (val, x));
+  typedef const std::pair<unsigned, unsigned>* cmp_t;
+  zipped.qsort ([](const void *a, const void *b) -> int
+    { return (int)((cmp_t)a)->first - (int)((cmp_t)b)->first; });
+
+  /* Verify if we have a linear permute sequence.  */
+  if (zipped.length () > 0)
+    {
+      unsigned leader = zipped[0].first;
+      for (x = 1; x < zipped.length (); x++)
+	if(!(is_linear = (zipped[x].first == ++leader)))
+	  break;
+    }
+  else
+    is_linear = true;
+
+  if (!is_linear)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			"Loads could not be made linear %p\n",
+			node);
+      zipped.release ();
+      return false;
+  }
+
+  for (x = 0; x < zipped.length (); x++)
+    zipped[x].first = 0;
+
+  /* Create the new permute node and store it instead.  */
+  slp_tree vnode = vect_create_new_slp_node (vNULL, 1);
+  SLP_TREE_CODE (vnode) = VEC_PERM_EXPR;
+  SLP_TREE_LANE_PERMUTATION (vnode) = zipped;
+  SLP_TREE_VECTYPE (vnode) = SLP_TREE_VECTYPE (parent);
+  SLP_TREE_CHILDREN (vnode).quick_push (node);
+  SLP_TREE_REF_COUNT (vnode) = 1;
+  SLP_TREE_LANES (vnode) = SLP_TREE_LANES (node);
+  SLP_TREE_REPRESENTATIVE (vnode) = SLP_TREE_REPRESENTATIVE (parent);
+  *result = vnode;
+  return is_linear;
+}
+
 /*******************************************************************************
  * Simple vector pattern matcher
  ******************************************************************************/
@@ -532,6 +640,93 @@ complex_pattern::validate_p ()
   return true;
 }
 
+
+/*******************************************************************************
+ * complex_add_pattern class
+ ******************************************************************************/
+
+class complex_add_pattern : public complex_pattern
+{
+  protected:
+    complex_add_pattern (slp_tree *node, vec_info *vinfo)
+      : complex_pattern (node, vinfo)
+    {
+      this->m_arity = 2;
+      this->m_num_args = 2;
+    }
+
+  public:
+    static vect_pattern* create (slp_tree *node, vec_info *vinfo)
+    {
+       return new complex_add_pattern (node, vinfo);
+    }
+
+    const char* get_name ()
+    {
+      return "Complex Addition";
+    }
+
+    bool matches ();
+    bool matches (complex_operation_t op, vec<slp_tree> ops);
+};
+
+/* Pattern matcher for trying to match complex addition pattern in SLP tree.
+
+   If no match is found then IFN is set to IFN_LAST.
+   This function matches the patterns shaped as:
+
+   c[i] = a[i] - b[i+1];
+   c[i+1] = a[i+1] + b[i];
+
+   If a match occurred then TRUE is returned, else FALSE.  */
+
+bool
+complex_add_pattern::matches (complex_operation_t op, vec<slp_tree> ops)
+{
+  this->m_ifn = IFN_LAST;
+  this->m_ops.safe_splice (ops);
+
+  /* Find the two components.  Rotation in the complex plane will modify
+     the operations:
+
+      * Rotation  0: + +
+      * Rotation 90: - +
+      * Rotation 180: - -
+      * Rotation 270: + -
+
+      Rotation 0 and 180 can be handled by normal SIMD code, so we don't need
+      to care about them here.  */
+  if (op == MINUS_PLUS)
+    this->m_ifn = IFN_COMPLEX_ADD_ROT90;
+  else if (op == PLUS_MINUS)
+    this->m_ifn = IFN_COMPLEX_ADD_ROT270;
+
+  /* Verify that there is a permute, otherwise this isn't a pattern we
+     support.  */
+  bool is_linear = false;
+  bool all_linear = true;
+  unsigned x;
+  for (x = 0; x < this->m_ops.length () && all_linear; x++)
+    {
+      linear_loads_p (this->m_ops[x], &is_linear);
+      all_linear = all_linear && is_linear;
+    }
+
+  if (this->m_ifn == IFN_LAST || all_linear)
+    return false;
+
+  save_match ();
+  return true;
+}
+
+bool
+complex_add_pattern::matches ()
+{
+  complex_operation_t op
+    = vect_detect_pair_op (*this->m_node, true, &this->m_ops);
+  return matches (op, this->m_ops);
+}
+
 /*******************************************************************************
  * complex_operations_pattern class
  ******************************************************************************/
@@ -580,6 +775,13 @@ complex_operations_pattern::matches ()
   if (op == CMPLX_NONE)
     return false;
 
+  /* Check which pattern this may be.  Match longest pattern first.  */
+  this->m_patt = complex_add_pattern::create (this->m_node, this->m_vinfo);
+  if (this->m_patt->matches (op, this->m_ops))
+    return true;
+
+  delete this->m_patt;
+
   this->m_patt = NULL;
   return false;
 }

