public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/9] Native complex operations
@ 2023-07-17  9:02 Sylvain Noiry
  2023-07-17  9:02 ` [PATCH 1/9] Native complex operations: Conditional lowering Sylvain Noiry
                   ` (8 more replies)
  0 siblings, 9 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-07-17  9:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Hi,

I have recently started a discussion about exposing complex operations directly
to the backends, to better exploit ISAs with complex instructions. The title of
the original message is "[RFC] Exposing complex numbers to target backends" [1].

This message starts a series of 9 patches covering the implementation that I've
done. 8 patches are about generic code, split by feature. The last one is an
experimental update of the x86 backend which exploits the newly exposed
complex operations.

My original work was on the KVX backend from Kalray, where the ISA has complex
instructions, so I obtained huge performance gains there, on par with code
that uses builtins. On x86 there are gains without -ffast-math because fewer
calls to helpers are performed, but gains are marginal with -ffast-math due to
the lack of complex instructions.

[1] https://gcc.gnu.org/pipermail/gcc/2023-July/241998.html

Summary of the 9 patches:
  1/9: Conditional lowering of complex operations using the backend + update
       of the TREE complex constants
  2/9: Move read_complex_part and write_complex_part to target hooks to let
       the backend decide
  3/9: Add a gen_rtx_complex target hook to let the backend use its preferred
       representation of complex values in rtl
  4/9: Support and optimize the use of classical registers to represent
       complex values
  5/9: Expose the conjugate operation down to the backend
  6/9: Expose and optimize complex rotations using internal functions and
       conditional lowering
  7/9: Allow the vectorizer to work on complex types like it does on scalars
  8/9: Add explicit vectors of complex. This remains optional
  9/9: Experimental update on the x86 backend to exploit some of the previous
       features

The following sections explain the features added by each patch and
illustrate them with examples on KVX, because that backend supports all the
new features. All examples are compiled with -O2 -ffast-math.

Patches 1 to 4 are required to have the minimal set of features which allows a
backend to exploit native complex operations.

PATCH 1/9:
  - Change the TREE complex constants by adding a new field called "both" in
    the tree_complex struct, which holds a vector of the real and imaginary
    parts. This makes the handling of constants during the cplxlower and
    expand passes easier. Any change to one part will also affect the vector,
    so very few changes are needed elsewhere.
  - Check the optab for a complex pattern for almost all operations in the
    cplxlower pass. Lowering is performed only if no optab code is found
    (see the sketch below). Some conditions on the presence of constants in
    the operands were also added, which can be a subject of discussion.
  - Add a complex component for both parts in the cplxlower pass. When an
    operation is lowered, the both part is recreated using a COMPLEX_EXPR.
    When an operation is kept non-lowered, the real and imaginary parts are
    extracted using REALPART_EXPR and IMAGPART_EXPR.
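
  As a sketch, the new constant field and the optab check driving the
  conditional lowering look like this (condensed from the patch below; the
  real check also filters on lattice values and constant operands):

    /* tree-core.h: the constant node now carries a ready-made vector.  */
    struct GTY(()) tree_complex {
      struct tree_typed typed;
      tree real;
      tree imag;
      tree both;   /* vector constant holding { real, imag } */
    };

    /* cplxlower: keep the operation native if the backend has a pattern.  */
    optab op = optab_for_tree_code (code, inner_type, optab_default);
    bool native = (op != unknown_optab
                   && optab_handler (op, TYPE_MODE (type)) != CODE_FOR_nothing);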

PATCH 2/9:
  - Move the inner implementation of read_complex_part and write_complex_part
    to target hooks. This allows each backend to have its own implementation,
    while the default ones are almost the same as before. Going back to
    standard functions may be a point to discuss if no incompatible change is
    made by a target to the default implementation.
  - Change the signature of read_complex_part and write_complex_part to allow
    requesting both parts at once. This affects all the calls to these
    functions (see the signatures below).
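
  For reference, the default implementations added in targhooks.cc keep the
  previous expr.cc behaviour and have the following signatures (bodies elided
  here):

    rtx default_read_complex_part (rtx cplx, complex_part_t part);
    void default_write_complex_part (rtx cplx, rtx val, complex_part_t part,
                                     bool undefined_p);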

PATCH 3/9:
  - Add a new target hook to replace gen_rtx_CONCAT when a complex element 
    needs to be created. The default implementation uses gen_rtx_CONCAT, but 
    the KVX implementation simply creates a register with a complex mode
    (see the sketch below). A previous attempt was to deal with
    generating_concat_p in gen_rtx_reg, but no good solution was found.
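
  A minimal sketch of what the default hook amounts to, assuming it receives
  the complex mode and both parts (the exact signature is the one defined in
  the patch):

    static rtx
    default_gen_rtx_complex (machine_mode mode, rtx real, rtx imag)
    {
      /* Classical representation: a CONCAT of the two parts.  */
      return gen_rtx_CONCAT (mode, real, imag);
    }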

PATCH 4/9:
  - Adapt and optimize for the use of native complex operations in rtl,
    as well as registers of complex modes. After this patch, it's now possible
    to re-implement the three new hooks and write complex patterns (a pattern
    sketch is shown after the example below).
  
  Considering the following example:
  
    _Complex float mul(_Complex float a, _Complex float b)
    { 
      return a * b;
    }

  Previously, the generated code was:
    mul:
        copyw $r3 = $r0
        extfz $r5 = $r0, 32+32-1, 32 ; extract imag part
        ;;      # (end cycle 0)
        fmulw $r4 = $r3, $r1         ; float mul
        ;;      # (end cycle 1)
        fmulw $r2 = $r5, $r1         ; float mul
        extfz $r1 = $r1, 32+32-1, 32 ; extract imag part
        ;;      # (end cycle 2)      
        ffmsw $r4 = $r5, $r1         ; float FMS
        ;;      # (end cycle 5)
        ffmaw $r2 = $r3, $r1         ; float FMA
        ;;      # (end cycle 6)
        insf $r0 = $r4, 32+0-1, 0    ; insert real part
        ;;      # (end cycle 9)
        insf $r0 = $r2, 32+32-1, 32  ; insert imag part
        ret
        ;;      # (end cycle 10)

  The KVX has a complex float multiplication instruction, so now the result is:
    mul:
	fmulwc $r0 = $r0, $r1        ; float complex mul
	ret
	;;	# (end cycle 0)
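
  The pattern behind this result could look roughly as follows (a hedged
  sketch of a KVX-like pattern; the predicates, constraints, and output
  template of the real backend may differ):

    (define_insn "mulsc3"
      [(set (match_operand:SC 0 "register_operand" "=r")
            (mult:SC (match_operand:SC 1 "register_operand" "r")
                     (match_operand:SC 2 "register_operand" "r")))]
      ""
      "fmulwc %0 = %1, %2")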

  Similar results are obtained for additions, subtractions, and negations.

  Moreover, if complex FMA and FMS patterns are defined, these operations will
  be caught exactly like scalar ones. Indeed, most passes of GCC work out of
  the box for complex types, as the example below illustrates.
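
  An illustrative example (assumed, not taken from the original benchmarks):
  with a complex FMA pattern defined, code like the following is expected to
  map to a single instruction:

    _Complex float cfma (_Complex float a, _Complex float b,
                         _Complex float c)
    {
      return a * b + c;   /* caught as a complex FMA by the existing passes */
    }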


PATCH 5/9:
  - Expose the complex conjugate operation to the backend. This is done through a
    new conj optab and a new conj rtl operation. CONJ_EXPR can now be kept and 
    expanded if a conj pattern exists.

  Considering the following example:
    
    _Complex float conjugate(_Complex float a)
      { 
	return ~a;
      }

  Previously, the generated code was:
    conjugate:
        extfz $r1 = $r0, 32+32-1, 32 ; extract imag part
        copyw $r2 = $r0
        ;;      # (end cycle 0)
        fnegw $r1 = $r1		     ; float negate
        insf $r0 = $r2, 32+0-1, 0    ; insert real part
        ;;      # (end cycle 1)
        insf $r0 = $r1, 32+32-1, 32  ; insert imag part
        ret
        ;;      # (end cycle 2)

  Now, considering that the imag part is in the upper part of a 64-bit register, 
  the generated code is:
    conjugate:
	fnegd $r0 = $r0              ; double negate
	ret
	;;	# (end cycle 0)

  The KVX can also conjugate one operand before doing an operation. The backend
  programmer only has to write the pattern, and the combiner pass is then able
  to catch it. For the following example:
 
     _Complex float mulc(_Complex float a, _Complex float b)
      { 
        return a * ~b;
      }

  The generated code is:
    mulc:
	fmulwc.c $r0 = $r1, $r0 ; complex mul with conj
	ret
	;;	# (end cycle 0)

PATCH 6/9:
  - Catch complex rotation by 90° and 270° in fold-const.cc like before, but 
    now convert them into the new COMPLEX_ROT90 and COMPLEX_ROT270 internal 
    functions
  - Add crot90 and crot270 optabs to expose these operation the backends.
  - Conditionnaly lower COMPLEX_ROT90/COMPLEX_ROT270 by checking if 
    crot90/crot270 are in the optab
  - convert a + crot90/270(b) into cadd90/270(a, b) in a similar way than FMAs.
    This approach is different than the one implement a few years ago by 
    Tamar Christina. The main advantage is that it does not try to recognize 
    a rotation among the lowered operations, so there are less misses.

  Considering the following example:
    _Complex float crot90(_Complex float a)
      { 
	return a * I;
      }

  Previously the generated code was:
    crot90:
        extfz $r1 = $r0, 32+32-1, 32 ; extract imag part
        copyw $r2 = $r0	
        ;;      # (end cycle 0)
        fnegw $r1 = $r1		     ; float negate
        ;;      # (end cycle 1)
        insf $r0 = $r1, 32+0-1, 0    ; insert real part
        ;;      # (end cycle 5)
        insf $r0 = $r2, 32+32-1, 32  ; insert imag part
        ret
        ;;      # (end cycle 6)

  Even if the KVX does not have a single instruction for the complex rotation,
  a define_expand pattern can already bring gains:
    crot90:
	fnegd $r0 = $r0		     	    ; double negate
	;;	# (end cycle 0)
	sbmm8 $r0 = $r0, 0x0804020180402010 ; swap real imag
	ret
	;;	# (end cycle 1)
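
  The cadd90 fusion mentioned above can be illustrated with the following
  assumed example; with a cadd90 pattern defined, the rotation and the
  addition are expected to merge into a single operation:

    /* I requires <complex.h>.  */
    _Complex float cadd90 (_Complex float a, _Complex float b)
    {
      return a + b * I;   /* b * I is caught as COMPLEX_ROT90 (b) */
    }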


PATCH 7/9:
  - Add vectors of complex inner types
  - Adapt or duplicate several functions and target hooks to deal with vectors
    of complex and not just scalars.

  After these changes, the vectorizer works out of the box for native complex
  operations. Considering the following example:
    void
    fmacplx (float complex a[restrict N], float complex b[restrict N],
	         float complex c[restrict N], float complex d[restrict N])
    {
      for (int i = 0; i < N; i++)
	d[i] = c[i] + a[i] * b[i];
    }

  The vectorizer has done its job and the KVX has some SIMD complex 
  instructions, so the result is:
    fmacplx:
        make $r4 = 0
        make $r5 = 32
        ;;      # (end cycle 0)
        loopdo $r5, .L56               ; begin hardware loop
        ;;      # (end cycle 1)
    .L53:
        lq.xs $r10r11 = $r4[$r1]       ; load 128-bit
        ;;      # (end cycle 0)
        lq.xs $r8r9 = $r4[$r0]	       ; load 128-bit
        ;;      # (end cycle 1)
        lq.xs $r6r7 = $r4[$r2]	       ; load 128-bit
        ;;      # (end cycle 2)
        ffmawcp $r6r7 = $r10r11, $r8r9 ; float complex FMA two lanes
        ;;      # (end cycle 5)
        sq.xs $r4[$r3] = $r6r7	       ; store 128-bit
        addd $r4 = $r4, 1
        ;;      # (end cycle 8)
        # loopdo end
    .L56:
        ret

PATCH 8/9:
  - Allow the creation of explicit complex vectors in C using
    __attribute__ ((vector_size ())) (see the sketch below)

  This patch adds a feature to C, so it's optional and has implications
  beyond GCC itself.
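
  A sketch of the intended usage, assuming vector_size keeps its usual
  meaning of a total size in bytes:

    /* Vector of 4 complex floats (32 bytes).  */
    typedef _Complex float cplx4
      __attribute__ ((vector_size (4 * sizeof (_Complex float))));

    cplx4 vadd (cplx4 a, cplx4 b)
    {
      return a + b;
    }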

PATCH 9/9:
  - Experimental support of complex operations in the x86 backend
  - Only in SCmode for now
  - Add (scalar) addition, subtraction, negation, conjugate, move, and
    multiplication

  Small examples can show interesting gains even if there is no complex
  instruction on x86, because the patterns can be implemented using multiple
  instructions in a define_expand. Considering the same complex multiplication
  as the first example compiled with -mavx, the generated code is:
    mul:
	vshufps	$20, %xmm1, %xmm1, %xmm1
	vshufps	$68, %xmm0, %xmm0, %xmm0 
	vmulps	%xmm1, %xmm0, %xmm0
	vshufps	$13, %xmm0, %xmm0, %xmm1
	vshufps	$8, %xmm0, %xmm0, %xmm0
	vaddsubps %xmm1, %xmm0, %xmm0
	ret

  However, complete tests like FFTs only show marginal gains, although huge
  gains can still be obtained without -ffast-math. This support is still
  experimental, and I am not an expert on the x86 backend, so improvements
  are certainly possible.

Thanks,

Sylvain

* [PATCH 1/9] Native complex operations: Conditional lowering
  2023-07-17  9:02 [PATCH 0/9] Native complex operations Sylvain Noiry
@ 2023-07-17  9:02 ` Sylvain Noiry
  2023-07-17  9:02 ` [PATCH 2/9] Native complex operations: Move functions to hooks Sylvain Noiry
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-07-17  9:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Allow the cplxlower pass to identify if an operation does not need
to be lowered through optabs. In this case, lowering is not performed.
The cplxlower pass now has to handle a mix of lowered and non-lowered
operations. Quick access to both parts of a complex constant is
also implemented.

gcc/lto/ChangeLog:

	* lto-common.cc (compare_tree_sccs_1): Handle both parts of a
	  complex constant

gcc/ChangeLog:

	* coretypes.h: Add enum for complex parts
	* gensupport.cc (match_pattern): Add complex types
	* lto-streamer-out.cc (DFS::DFS_write_tree_body): Handle both
	parts of a complex constant
	(hash_tree): Likewise
	* tree-complex.cc (get_component_var): Support handling of
	both parts of a complex
	(get_component_ssa_name): Likewise
	(set_component_ssa_name): Likewise
	(extract_component): Likewise
	(update_complex_components): Likewise
	(update_complex_components_on_edge): Likewise
	(update_complex_assignment): Likewise
	(update_phi_components): Likewise
	(expand_complex_move): Likewise
	(expand_complex_asm): Update with complex_part_t
	(complex_component_cst_p): New: check if a complex
	component is a constant
	(target_native_complex_operation): New: Check if complex
	operation is supported natively by the backend, through
	the optab
	(expand_complex_operations_1): Conditionally lower ops
	(tree_lower_complex): Support handling of both parts of
	 a complex
	* tree-core.h (struct GTY): Add field for both parts of
	the tree_complex struct
	* tree-streamer-in.cc (lto_input_ts_complex_tree_pointers):
	Handle both parts of a complex constant
	* tree-streamer-out.cc (write_ts_complex_tree_pointers):
	Likewise
	* tree.cc (build_complex): Likewise
	* tree.h (TREE_COMPLEX_BOTH_PARTS): New macro
	(COMPLEX_INTEGER_TYPE_P): New macro
	(COMPLEX_TYPE_MODE): New macro
	(type_has_mode_precision_p): Add special case for complex
---
 gcc/coretypes.h          |   9 +
 gcc/gensupport.cc        |   2 +
 gcc/lto-streamer-out.cc  |   2 +
 gcc/lto/lto-common.cc    |   2 +
 gcc/tree-complex.cc      | 434 +++++++++++++++++++++++++++++----------
 gcc/tree-core.h          |   1 +
 gcc/tree-streamer-in.cc  |   1 +
 gcc/tree-streamer-out.cc |   1 +
 gcc/tree.cc              |   8 +
 gcc/tree.h               |  15 +-
 10 files changed, 363 insertions(+), 112 deletions(-)

diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index ca8837cef67..a000c104b53 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -443,6 +443,15 @@ enum optimize_size_level
   OPTIMIZE_SIZE_MAX
 };
 
+/* part of a complex */
+
+typedef enum
+{
+  REAL_P = 0,
+  IMAG_P = 1,
+  BOTH_P = 2
+} complex_part_t;
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
    is a pointer to a pointer, the second either NULL if the pointer to
    pointer points into a GC object or the actual pointer address if
diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index 959d1d9c83c..9aa2ba69fcd 100644
--- a/gcc/gensupport.cc
+++ b/gcc/gensupport.cc
@@ -3746,9 +3746,11 @@ match_pattern (optab_pattern *p, const char *name, const char *pat)
 		    break;
 		if (*p == 0
 		    && (! force_int || mode_class[i] == MODE_INT
+			|| mode_class[i] == MODE_COMPLEX_INT
 			|| mode_class[i] == MODE_VECTOR_INT)
 		    && (! force_partial_int
 			|| mode_class[i] == MODE_INT
+			|| mode_class[i] == MODE_COMPLEX_INT
 			|| mode_class[i] == MODE_PARTIAL_INT
 			|| mode_class[i] == MODE_VECTOR_INT)
 		    && (! force_float
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index 5ffa8954022..38c48e44867 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -985,6 +985,7 @@ DFS::DFS_write_tree_body (struct output_block *ob,
     {
       DFS_follow_tree_edge (TREE_REALPART (expr));
       DFS_follow_tree_edge (TREE_IMAGPART (expr));
+      DFS_follow_tree_edge (TREE_COMPLEX_BOTH_PARTS (expr));
     }
 
   if (CODE_CONTAINS_STRUCT (code, TS_DECL_MINIMAL))
@@ -1417,6 +1418,7 @@ hash_tree (struct streamer_tree_cache_d *cache, hash_map<tree, hashval_t> *map,
     {
       visit (TREE_REALPART (t));
       visit (TREE_IMAGPART (t));
+      visit (TREE_COMPLEX_BOTH_PARTS (t));
     }
 
   if (CODE_CONTAINS_STRUCT (code, TS_DECL_MINIMAL))
diff --git a/gcc/lto/lto-common.cc b/gcc/lto/lto-common.cc
index 703e665b698..f647ee62f9e 100644
--- a/gcc/lto/lto-common.cc
+++ b/gcc/lto/lto-common.cc
@@ -1408,6 +1408,8 @@ compare_tree_sccs_1 (tree t1, tree t2, tree **map)
     {
       compare_tree_edges (TREE_REALPART (t1), TREE_REALPART (t2));
       compare_tree_edges (TREE_IMAGPART (t1), TREE_IMAGPART (t2));
+      compare_tree_edges (TREE_COMPLEX_BOTH_PARTS (t1),
+			  TREE_COMPLEX_BOTH_PARTS (t2));
     }
 
   if (CODE_CONTAINS_STRUCT (code, TS_DECL_MINIMAL))
diff --git a/gcc/tree-complex.cc b/gcc/tree-complex.cc
index 688fe13989c..63753e4acf4 100644
--- a/gcc/tree-complex.cc
+++ b/gcc/tree-complex.cc
@@ -42,6 +42,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfganal.h"
 #include "gimple-fold.h"
 #include "diagnostic-core.h"
+#include "target.h"
+#include "memmodel.h"
+#include "optabs-tree.h"
+#include "internal-fn.h"
 
 
 /* For each complex ssa name, a lattice value.  We're interested in finding
@@ -74,7 +78,7 @@ static vec<complex_lattice_t> complex_lattice_values;
    the hashtable.  */
 static int_tree_htab_type *complex_variable_components;
 
-/* For each complex SSA_NAME, a pair of ssa names for the components.  */
+/* For each complex SSA_NAME, three ssa names for the components.  */
 static vec<tree> complex_ssa_name_components;
 
 /* Vector of PHI triplets (original complex PHI and corresponding real and
@@ -476,17 +480,27 @@ create_one_component_var (tree type, tree orig, const char *prefix,
 /* Retrieve a value for a complex component of VAR.  */
 
 static tree
-get_component_var (tree var, bool imag_p)
+get_component_var (tree var, complex_part_t part)
 {
-  size_t decl_index = DECL_UID (var) * 2 + imag_p;
+  size_t decl_index = DECL_UID (var) * 3 + part;
   tree ret = cvc_lookup (decl_index);
 
   if (ret == NULL)
     {
-      ret = create_one_component_var (TREE_TYPE (TREE_TYPE (var)), var,
-				      imag_p ? "CI" : "CR",
-				      imag_p ? "$imag" : "$real",
-				      imag_p ? IMAGPART_EXPR : REALPART_EXPR);
+      switch (part)
+	{
+	case REAL_P:
+	  ret = create_one_component_var (TREE_TYPE (TREE_TYPE (var)), var,
+					  "CR", "$real", REALPART_EXPR);
+	  break;
+	case IMAG_P:
+	  ret = create_one_component_var (TREE_TYPE (TREE_TYPE (var)), var,
+					  "CI", "$imag", IMAGPART_EXPR);
+	  break;
+	case BOTH_P:
+	  ret = var;
+	  break;
+	}
       cvc_insert (decl_index, ret);
     }
 
@@ -496,13 +510,15 @@ get_component_var (tree var, bool imag_p)
 /* Retrieve a value for a complex component of SSA_NAME.  */
 
 static tree
-get_component_ssa_name (tree ssa_name, bool imag_p)
+get_component_ssa_name (tree ssa_name, complex_part_t part)
 {
   complex_lattice_t lattice = find_lattice_value (ssa_name);
   size_t ssa_name_index;
   tree ret;
 
-  if (lattice == (imag_p ? ONLY_REAL : ONLY_IMAG))
+  if (((lattice == ONLY_IMAG) && (part == REAL_P))
+      || ((lattice == ONLY_REAL) && (part == IMAG_P)))
+
     {
       tree inner_type = TREE_TYPE (TREE_TYPE (ssa_name));
       if (SCALAR_FLOAT_TYPE_P (inner_type))
@@ -511,14 +527,33 @@ get_component_ssa_name (tree ssa_name, bool imag_p)
 	return build_int_cst (inner_type, 0);
     }
 
-  ssa_name_index = SSA_NAME_VERSION (ssa_name) * 2 + imag_p;
+  if (part == BOTH_P)
+    return ssa_name;
+
+  ssa_name_index = SSA_NAME_VERSION (ssa_name) * 3 + part;
+
+  /* increase size of dynamic array if needed */
+  if (ssa_name_index >= complex_ssa_name_components.length ())
+    {
+      complex_ssa_name_components.safe_grow_cleared
+	(2 * complex_ssa_name_components.length (), true);
+      complex_lattice_values.safe_grow_cleared
+	(2 * complex_lattice_values.length (), true);
+    }
+
   ret = complex_ssa_name_components[ssa_name_index];
   if (ret == NULL)
     {
       if (SSA_NAME_VAR (ssa_name))
-	ret = get_component_var (SSA_NAME_VAR (ssa_name), imag_p);
+	ret = get_component_var (SSA_NAME_VAR (ssa_name), part);
       else
-	ret = TREE_TYPE (TREE_TYPE (ssa_name));
+	{
+	  if (part == BOTH_P)
+	    ret = TREE_TYPE (ssa_name);
+	  else
+	    ret = TREE_TYPE (TREE_TYPE (ssa_name));
+	}
+
       ret = make_ssa_name (ret);
 
       /* Copy some properties from the original.  In particular, whether it
@@ -542,7 +577,7 @@ get_component_ssa_name (tree ssa_name, bool imag_p)
    gimple_seq of stuff that needs doing.  */
 
 static gimple_seq
-set_component_ssa_name (tree ssa_name, bool imag_p, tree value)
+set_component_ssa_name (tree ssa_name, complex_part_t part, tree value)
 {
   complex_lattice_t lattice = find_lattice_value (ssa_name);
   size_t ssa_name_index;
@@ -553,14 +588,24 @@ set_component_ssa_name (tree ssa_name, bool imag_p, tree value)
   /* We know the value must be zero, else there's a bug in our lattice
      analysis.  But the value may well be a variable known to contain
      zero.  We should be safe ignoring it.  */
-  if (lattice == (imag_p ? ONLY_REAL : ONLY_IMAG))
+  if (((lattice == ONLY_IMAG) && (part == REAL_P))
+      || ((lattice == ONLY_REAL) && (part == IMAG_P)))
     return NULL;
 
   /* If we've already assigned an SSA_NAME to this component, then this
      means that our walk of the basic blocks found a use before the set.
      This is fine.  Now we should create an initialization for the value
      we created earlier.  */
-  ssa_name_index = SSA_NAME_VERSION (ssa_name) * 2 + imag_p;
+  ssa_name_index = SSA_NAME_VERSION (ssa_name) * 3 + part;
+
+  /* increase size of dynamic array if needed */
+  if (ssa_name_index >= complex_ssa_name_components.length ())
+    {
+      size_t old_size = complex_ssa_name_components.length ();
+      complex_ssa_name_components.safe_grow (2 * old_size, true);
+      complex_lattice_values.safe_grow (2 * old_size, true);
+    }
+
   comp = complex_ssa_name_components[ssa_name_index];
   if (comp)
     ;
@@ -584,7 +629,7 @@ set_component_ssa_name (tree ssa_name, bool imag_p, tree value)
 	  && (!SSA_NAME_VAR (value) || DECL_IGNORED_P (SSA_NAME_VAR (value)))
 	  && !DECL_IGNORED_P (SSA_NAME_VAR (ssa_name)))
 	{
-	  comp = get_component_var (SSA_NAME_VAR (ssa_name), imag_p);
+	  comp = get_component_var (SSA_NAME_VAR (ssa_name), part);
 	  replace_ssa_name_symbol (value, comp);
 	}
 
@@ -595,7 +640,7 @@ set_component_ssa_name (tree ssa_name, bool imag_p, tree value)
   /* Finally, we need to stabilize the result by installing the value into
      a new ssa name.  */
   else
-    comp = get_component_ssa_name (ssa_name, imag_p);
+    comp = get_component_ssa_name (ssa_name, part);
 
   /* Do all the work to assign VALUE to COMP.  */
   list = NULL;
@@ -612,13 +657,14 @@ set_component_ssa_name (tree ssa_name, bool imag_p, tree value)
    Emit any new code before gsi.  */
 
 static tree
-extract_component (gimple_stmt_iterator *gsi, tree t, bool imagpart_p,
+extract_component (gimple_stmt_iterator * gsi, tree t, complex_part_t part,
 		   bool gimple_p, bool phiarg_p = false)
 {
   switch (TREE_CODE (t))
     {
     case COMPLEX_CST:
-      return imagpart_p ? TREE_IMAGPART (t) : TREE_REALPART (t);
+      return (part == BOTH_P) ? t : (part == IMAG_P) ?
+	TREE_IMAGPART (t) : TREE_REALPART (t);
 
     case COMPLEX_EXPR:
       gcc_unreachable ();
@@ -629,7 +675,7 @@ extract_component (gimple_stmt_iterator *gsi, tree t, bool imagpart_p,
 	t = unshare_expr (t);
 	TREE_TYPE (t) = inner_type;
 	TREE_OPERAND (t, 1) = TYPE_SIZE (inner_type);
-	if (imagpart_p)
+	if (part == IMAG_P)
 	  TREE_OPERAND (t, 2) = size_binop (PLUS_EXPR, TREE_OPERAND (t, 2),
 					    TYPE_SIZE (inner_type));
 	if (gimple_p)
@@ -646,10 +692,11 @@ extract_component (gimple_stmt_iterator *gsi, tree t, bool imagpart_p,
     case VIEW_CONVERT_EXPR:
     case MEM_REF:
       {
-	tree inner_type = TREE_TYPE (TREE_TYPE (t));
-
-	t = build1 ((imagpart_p ? IMAGPART_EXPR : REALPART_EXPR),
-		    inner_type, unshare_expr (t));
+	if (part == BOTH_P)
+	  t = unshare_expr (t);
+	else
+	  t = build1 (((part == IMAG_P) ? IMAGPART_EXPR : REALPART_EXPR),
+		      (TREE_TYPE (TREE_TYPE (t))), unshare_expr (t));
 
 	if (gimple_p)
 	  t = force_gimple_operand_gsi (gsi, t, true, NULL, true,
@@ -659,10 +706,12 @@ extract_component (gimple_stmt_iterator *gsi, tree t, bool imagpart_p,
       }
 
     case SSA_NAME:
-      t = get_component_ssa_name (t, imagpart_p);
-      if (TREE_CODE (t) == SSA_NAME && SSA_NAME_DEF_STMT (t) == NULL)
-	gcc_assert (phiarg_p);
-      return t;
+      {
+	t = get_component_ssa_name (t, part);
+	if (TREE_CODE (t) == SSA_NAME && SSA_NAME_DEF_STMT (t) == NULL)
+	  gcc_assert (phiarg_p);
+	return t;
+      }
 
     default:
       gcc_unreachable ();
@@ -673,18 +722,29 @@ extract_component (gimple_stmt_iterator *gsi, tree t, bool imagpart_p,
 
 static void
 update_complex_components (gimple_stmt_iterator *gsi, gimple *stmt, tree r,
-			   tree i)
+			   tree i, tree b = NULL)
 {
   tree lhs;
   gimple_seq list;
 
+  gcc_assert (b || (r && i));
   lhs = gimple_get_lhs (stmt);
+  if (!b)
+    b = lhs;
+  if (!r)
+    r = build1 (REALPART_EXPR, TREE_TYPE (TREE_TYPE (b)), unshare_expr (b));
+  if (!i)
+    i = build1 (IMAGPART_EXPR, TREE_TYPE (TREE_TYPE (b)), unshare_expr (b));
+
+  list = set_component_ssa_name (lhs, REAL_P, r);
+  if (list)
+    gsi_insert_seq_after (gsi, list, GSI_CONTINUE_LINKING);
 
-  list = set_component_ssa_name (lhs, false, r);
+  list = set_component_ssa_name (lhs, IMAG_P, i);
   if (list)
     gsi_insert_seq_after (gsi, list, GSI_CONTINUE_LINKING);
 
-  list = set_component_ssa_name (lhs, true, i);
+  list = set_component_ssa_name (lhs, BOTH_P, b);
   if (list)
     gsi_insert_seq_after (gsi, list, GSI_CONTINUE_LINKING);
 }
@@ -694,11 +754,11 @@ update_complex_components_on_edge (edge e, tree lhs, tree r, tree i)
 {
   gimple_seq list;
 
-  list = set_component_ssa_name (lhs, false, r);
+  list = set_component_ssa_name (lhs, REAL_P, r);
   if (list)
     gsi_insert_seq_on_edge (e, list);
 
-  list = set_component_ssa_name (lhs, true, i);
+  list = set_component_ssa_name (lhs, IMAG_P, i);
   if (list)
     gsi_insert_seq_on_edge (e, list);
 }
@@ -707,19 +767,24 @@ update_complex_components_on_edge (edge e, tree lhs, tree r, tree i)
 /* Update an assignment to a complex variable in place.  */
 
 static void
-update_complex_assignment (gimple_stmt_iterator *gsi, tree r, tree i)
+update_complex_assignment (gimple_stmt_iterator * gsi, tree r, tree i,
+			   tree b = NULL)
 {
   gimple *old_stmt = gsi_stmt (*gsi);
-  gimple_assign_set_rhs_with_ops (gsi, COMPLEX_EXPR, r, i);
+  if (b == NULL)
+    gimple_assign_set_rhs_with_ops (gsi, COMPLEX_EXPR, r, i);
+  else
+    /* dummy assignment, but pr45569.C fails if removed */
+    gimple_assign_set_rhs_from_tree (gsi, b);
+
   gimple *stmt = gsi_stmt (*gsi);
   update_stmt (stmt);
   if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
     bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
 
-  update_complex_components (gsi, gsi_stmt (*gsi), r, i);
+  update_complex_components (gsi, gsi_stmt (*gsi), r, i, b);
 }
 
-
 /* Generate code at the entry point of the function to initialize the
    component variables for a complex parameter.  */
 
@@ -768,7 +833,8 @@ update_phi_components (basic_block bb)
 
 	  for (j = 0; j < 2; j++)
 	    {
-	      tree l = get_component_ssa_name (gimple_phi_result (phi), j > 0);
+	      tree l = get_component_ssa_name (gimple_phi_result (phi),
+					       (complex_part_t) j);
 	      if (TREE_CODE (l) == SSA_NAME)
 		p[j] = create_phi_node (l, bb);
 	    }
@@ -779,7 +845,9 @@ update_phi_components (basic_block bb)
 	      for (j = 0; j < 2; j++)
 		if (p[j])
 		  {
-		    comp = extract_component (NULL, arg, j > 0, false, true);
+		    comp =
+		      extract_component (NULL, arg, (complex_part_t) j, false,
+					 true);
 		    if (TREE_CODE (comp) == SSA_NAME
 			&& SSA_NAME_DEF_STMT (comp) == NULL)
 		      {
@@ -809,13 +877,14 @@ update_phi_components (basic_block bb)
     }
 }
 
+
 /* Expand a complex move to scalars.  */
 
 static void
 expand_complex_move (gimple_stmt_iterator *gsi, tree type)
 {
   tree inner_type = TREE_TYPE (type);
-  tree r, i, lhs, rhs;
+  tree r, i, b, lhs, rhs;
   gimple *stmt = gsi_stmt (*gsi);
 
   if (is_gimple_assign (stmt))
@@ -862,16 +931,13 @@ expand_complex_move (gimple_stmt_iterator *gsi, tree type)
       else
 	{
 	  if (gimple_assign_rhs_code (stmt) != COMPLEX_EXPR)
-	    {
-	      r = extract_component (gsi, rhs, 0, true);
-	      i = extract_component (gsi, rhs, 1, true);
-	    }
+	    update_complex_assignment (gsi, NULL, NULL,
+				       extract_component (gsi, rhs,
+							  BOTH_P, true));
 	  else
-	    {
-	      r = gimple_assign_rhs1 (stmt);
-	      i = gimple_assign_rhs2 (stmt);
-	    }
-	  update_complex_assignment (gsi, r, i);
+	    update_complex_assignment (gsi,
+				       gimple_assign_rhs1 (stmt),
+				       gimple_assign_rhs2 (stmt), NULL);
 	}
     }
   else if (rhs
@@ -883,24 +949,18 @@ expand_complex_move (gimple_stmt_iterator *gsi, tree type)
       location_t loc;
 
       loc = gimple_location (stmt);
-      r = extract_component (gsi, rhs, 0, false);
-      i = extract_component (gsi, rhs, 1, false);
-
-      x = build1 (REALPART_EXPR, inner_type, unshare_expr (lhs));
-      t = gimple_build_assign (x, r);
-      gimple_set_location (t, loc);
-      gsi_insert_before (gsi, t, GSI_SAME_STMT);
+      b = extract_component (gsi, rhs, BOTH_P, false);
 
       if (stmt == gsi_stmt (*gsi))
 	{
-	  x = build1 (IMAGPART_EXPR, inner_type, unshare_expr (lhs));
+	  x = unshare_expr (lhs);
 	  gimple_assign_set_lhs (stmt, x);
-	  gimple_assign_set_rhs1 (stmt, i);
+	  gimple_assign_set_rhs1 (stmt, b);
 	}
       else
 	{
-	  x = build1 (IMAGPART_EXPR, inner_type, unshare_expr (lhs));
-	  t = gimple_build_assign (x, i);
+	  x = unshare_expr (lhs);
+	  t = gimple_build_assign (x, b);
 	  gimple_set_location (t, loc);
 	  gsi_insert_before (gsi, t, GSI_SAME_STMT);
 
@@ -1641,26 +1701,88 @@ expand_complex_asm (gimple_stmt_iterator *gsi)
 		}
 	      /* Make sure to not ICE later, see PR105165.  */
 	      tree zero = build_zero_cst (TREE_TYPE (TREE_TYPE (op)));
-	      set_component_ssa_name (op, false, zero);
-	      set_component_ssa_name (op, true, zero);
+	      set_component_ssa_name (op, REAL_P, zero);
+	      set_component_ssa_name (op, IMAG_P, zero);
+	      set_component_ssa_name (op, BOTH_P, zero);
 	      continue;
 	    }
 	  tree type = TREE_TYPE (op);
 	  tree inner_type = TREE_TYPE (type);
 	  tree r = build1 (REALPART_EXPR, inner_type, op);
 	  tree i = build1 (IMAGPART_EXPR, inner_type, op);
-	  gimple_seq list = set_component_ssa_name (op, false, r);
+	  tree b = op;
+	  gimple_seq list = set_component_ssa_name (op, REAL_P, r);
 
 	  if (list)
 	    gsi_insert_seq_after (gsi, list, GSI_CONTINUE_LINKING);
 
-	  list = set_component_ssa_name (op, true, i);
+	  list = set_component_ssa_name (op, IMAG_P, i);
+	  if (list)
+	    gsi_insert_seq_after (gsi, list, GSI_CONTINUE_LINKING);
+
+	  list = set_component_ssa_name (op, BOTH_P, b);
 	  if (list)
 	    gsi_insert_seq_after (gsi, list, GSI_CONTINUE_LINKING);
 	}
     }
 }
 
+/* Returns true if a complex component is a constant */
+
+static bool
+complex_component_cst_p (tree cplx, complex_part_t part)
+{
+  switch (TREE_CODE (cplx))
+    {
+    case COMPLEX_CST:
+      return true;
+
+    case SSA_NAME:
+      {
+	size_t ssa_name_index = SSA_NAME_VERSION (cplx) * 3 + part;
+	tree val = complex_ssa_name_components[ssa_name_index];
+	return (val) ? CONSTANT_CLASS_P (val) : false;
+      }
+
+    default:
+      return false;
+    }
+}
+
+/* Returns true if the target supports a particular complex operation natively */
+
+static bool
+target_native_complex_operation (enum tree_code code, tree type,
+				 tree inner_type, tree ac, tree bc,
+				 complex_lattice_t al, complex_lattice_t bl)
+{
+  /* Native complex instructions are currently only used when both operands are varying,
+     but a finer grain approach may be interesting */
+  if ((al != VARYING) || ((bl != VARYING) && (bl != UNINITIALIZED)))
+    return false;
+
+  /* do not use native operations when a part of the result is constant */
+  if ((bl == UNINITIALIZED)
+      && (complex_component_cst_p (ac, REAL_P)
+	  || complex_component_cst_p (ac, IMAG_P)))
+    return false;
+  else if ((bl != UNINITIALIZED)
+	   &&
+	   ((complex_component_cst_p (ac, REAL_P)
+	     && complex_component_cst_p (bc, REAL_P))
+	    || (complex_component_cst_p (ac, IMAG_P)
+		&& complex_component_cst_p (bc, IMAG_P))))
+    return false;
+
+  optab op = optab_for_tree_code (code, inner_type, optab_default);
+
+  /* no need to search if operation is not in the optab */
+  if (op == unknown_optab)
+    return false;
+
+  return optab_handler (op, TYPE_MODE (type)) != CODE_FOR_nothing;
+}
+
 /* Process one statement.  If we identify a complex operation, expand it.  */
 
 static void
@@ -1729,14 +1851,17 @@ expand_complex_operations_1 (gimple_stmt_iterator *gsi)
 		 && TREE_CODE (lhs) == SSA_NAME)
 	  {
 	    rhs = gimple_assign_rhs1 (stmt);
+	    enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
 	    rhs = extract_component (gsi, TREE_OPERAND (rhs, 0),
-		                     gimple_assign_rhs_code (stmt)
-				       == IMAGPART_EXPR,
-				     false);
+				     (rhs_code == IMAGPART_EXPR) ? IMAG_P
+				     : (rhs_code == REALPART_EXPR) ? REAL_P
+				     : BOTH_P, false);
 	    gimple_assign_set_rhs_from_tree (gsi, rhs);
 	    stmt = gsi_stmt (*gsi);
 	    update_stmt (stmt);
 	  }
+	else if (is_gimple_call (stmt))
+	  return;
       }
       return;
     }
@@ -1755,19 +1880,6 @@ expand_complex_operations_1 (gimple_stmt_iterator *gsi)
       bc = gimple_cond_rhs (stmt);
     }
 
-  ar = extract_component (gsi, ac, false, true);
-  ai = extract_component (gsi, ac, true, true);
-
-  if (ac == bc)
-    br = ar, bi = ai;
-  else if (bc)
-    {
-      br = extract_component (gsi, bc, 0, true);
-      bi = extract_component (gsi, bc, 1, true);
-    }
-  else
-    br = bi = NULL_TREE;
-
   al = find_lattice_value (ac);
   if (al == UNINITIALIZED)
     al = VARYING;
@@ -1783,44 +1895,142 @@ expand_complex_operations_1 (gimple_stmt_iterator *gsi)
 	bl = VARYING;
     }
 
-  switch (code)
+  if (target_native_complex_operation
+      (code, type, inner_type, ac, bc, al, bl))
     {
-    case PLUS_EXPR:
-    case MINUS_EXPR:
-      expand_complex_addition (gsi, inner_type, ar, ai, br, bi, code, al, bl);
-      break;
+      tree ab, bb, rb;
+      gimple_seq stmts = NULL;
+      location_t loc = gimple_location (gsi_stmt (*gsi));
+
+      ab = extract_component (gsi, ac, BOTH_P, true);
+      if (ac == bc)
+	bb = ab;
+      else if (bc)
+	{
+	  bb = extract_component (gsi, bc, BOTH_P, true);
+	}
+      else
+	bb = NULL_TREE;
 
-    case MULT_EXPR:
-      expand_complex_multiplication (gsi, type, ar, ai, br, bi, al, bl);
-      break;
+      switch (code)
+	{
+	case PLUS_EXPR:
+	case MINUS_EXPR:
+	case MULT_EXPR:
+	  rb = gimple_build (&stmts, loc, code, type, ab, bb);
+	  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+	  update_complex_assignment (gsi, NULL, NULL, rb);
+	  break;
 
-    case TRUNC_DIV_EXPR:
-    case CEIL_DIV_EXPR:
-    case FLOOR_DIV_EXPR:
-    case ROUND_DIV_EXPR:
-    case RDIV_EXPR:
-      expand_complex_division (gsi, type, ar, ai, br, bi, code, al, bl);
-      break;
+	case NEGATE_EXPR:
+	case CONJ_EXPR:
+	  rb = gimple_build (&stmts, loc, code, type, ab);
+	  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+	  update_complex_assignment (gsi, NULL, NULL, rb);
+	  break;
 
-    case NEGATE_EXPR:
-      expand_complex_negation (gsi, inner_type, ar, ai);
-      break;
+	case EQ_EXPR:
+	case NE_EXPR:
+	  /* FIXME */
+	  {
+	    gimple *stmt = gsi_stmt (*gsi);
+	    rb = gimple_build (&stmts, loc, code, type, ab, bb);
+	    switch (gimple_code (stmt))
+	      {
+	      case GIMPLE_RETURN:
+		{
+		  greturn *return_stmt = as_a < greturn * >(stmt);
+		  gimple_return_set_retval (return_stmt,
+					    fold_convert (type, rb));
+		}
+		break;
 
-    case CONJ_EXPR:
-      expand_complex_conjugate (gsi, inner_type, ar, ai);
-      break;
+	      case GIMPLE_ASSIGN:
+		update_complex_assignment (gsi, NULL, NULL, rb);
+		gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+		break;
 
-    case EQ_EXPR:
-    case NE_EXPR:
-      expand_complex_comparison (gsi, ar, ai, br, bi, code);
-      break;
+	      case GIMPLE_COND:
+		{
+		  gcond *cond_stmt = as_a < gcond * >(stmt);
+		  gimple_cond_set_code (cond_stmt, EQ_EXPR);
+		  gimple_cond_set_lhs (cond_stmt, rb);
+		  gimple_cond_set_rhs (cond_stmt, boolean_true_node);
+		}
+		break;
 
-    default:
-      gcc_unreachable ();
+	      default:
+		break;
+	      }
+	    break;
+	  }
+
+
+	  /* not supported yet */
+	case TRUNC_DIV_EXPR:
+	case CEIL_DIV_EXPR:
+	case FLOOR_DIV_EXPR:
+	case ROUND_DIV_EXPR:
+	case RDIV_EXPR:
+
+	default:
+	  gcc_unreachable ();
+	}
+      return;
     }
-}
 
+    ar = extract_component (gsi, ac, REAL_P, true);
+    ai = extract_component (gsi, ac, IMAG_P, true);
+
+    if (ac == bc)
+      br = ar, bi = ai;
+    else if (bc)
+      {
+	br = extract_component (gsi, bc, REAL_P, true);
+	bi = extract_component (gsi, bc, IMAG_P, true);
+      }
+    else
+	br = bi = NULL_TREE;
+
+    switch (code)
+      {
+      case PLUS_EXPR:
+      case MINUS_EXPR:
+	expand_complex_addition (gsi, inner_type, ar, ai, br, bi, code, al,
+				 bl);
+	break;
+
+      case MULT_EXPR:
+	expand_complex_multiplication (gsi, type, ar, ai, br, bi, al, bl);
+	break;
+
+      case TRUNC_DIV_EXPR:
+      case CEIL_DIV_EXPR:
+      case FLOOR_DIV_EXPR:
+      case ROUND_DIV_EXPR:
+      case RDIV_EXPR:
+	expand_complex_division (gsi, type, ar, ai, br, bi, code, al, bl);
+	break;
+
+      case NEGATE_EXPR:
+	expand_complex_negation (gsi, inner_type, ar, ai);
+	break;
+
+      case CONJ_EXPR:
+	expand_complex_conjugate (gsi, inner_type, ar, ai);
+	break;
+
+      case EQ_EXPR:
+      case NE_EXPR:
+	expand_complex_comparison (gsi, ar, ai, br, bi, code);
+	break;
+
+      default:
+	gcc_unreachable ();
+      }
+}
 \f
+
 /* Entry point for complex operation lowering during optimization.  */
 
 static unsigned int
@@ -1845,8 +2055,8 @@ tree_lower_complex (void)
 
   complex_variable_components = new int_tree_htab_type (10);
 
-  complex_ssa_name_components.create (2 * num_ssa_names);
-  complex_ssa_name_components.safe_grow_cleared (2 * num_ssa_names, true);
+  complex_ssa_name_components.create (3 * num_ssa_names);
+  complex_ssa_name_components.safe_grow_cleared (3 * num_ssa_names, true);
 
   update_parameter_components ();
 
@@ -1879,7 +2089,9 @@ tree_lower_complex (void)
 		      || is_gimple_min_invariant (op))
 		    continue;
 		  tree arg = gimple_phi_arg_def (phis_to_revisit[j], l);
-		  op = extract_component (NULL, arg, k > 0, false, false);
+		  op =
+		    extract_component (NULL, arg, (complex_part_t) k, false,
+				       false);
 		  SET_PHI_ARG_DEF (phi, l, op);
 		}
 	    }
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 668808a29d0..da6daf99fc1 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1486,6 +1486,7 @@ struct GTY(()) tree_complex {
   struct tree_typed typed;
   tree real;
   tree imag;
+  tree both;
 };
 
 struct GTY(()) tree_vector {
diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
index 5bead0c3c6a..a1fa2cb9eea 100644
--- a/gcc/tree-streamer-in.cc
+++ b/gcc/tree-streamer-in.cc
@@ -695,6 +695,7 @@ lto_input_ts_complex_tree_pointers (class lto_input_block *ib,
 {
   TREE_REALPART (expr) = stream_read_tree_ref (ib, data_in);
   TREE_IMAGPART (expr) = stream_read_tree_ref (ib, data_in);
+  TREE_COMPLEX_BOTH_PARTS (expr) = stream_read_tree_ref (ib, data_in);
 }
 
 
diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc
index ff9694e17dd..be7314ef748 100644
--- a/gcc/tree-streamer-out.cc
+++ b/gcc/tree-streamer-out.cc
@@ -592,6 +592,7 @@ write_ts_complex_tree_pointers (struct output_block *ob, tree expr)
 {
   stream_write_tree_ref (ob, TREE_REALPART (expr));
   stream_write_tree_ref (ob, TREE_IMAGPART (expr));
+  stream_write_tree_ref (ob, TREE_COMPLEX_BOTH_PARTS (expr));
 }
 
 
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 420857b110c..2bc1b0d1e3f 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -2497,6 +2497,14 @@ build_complex (tree type, tree real, tree imag)
 
   tree t = make_node (COMPLEX_CST);
 
+  /* represent both parts as a constant vector */
+  tree vector_type = build_vector_type (TREE_TYPE (real), 2);
+  tree_vector_builder v (vector_type, 1, 2);
+  v.quick_push (real);
+  v.quick_push (imag);
+  tree both = v.build ();
+
+  TREE_COMPLEX_BOTH_PARTS (t) = both;
   TREE_REALPART (t) = real;
   TREE_IMAGPART (t) = imag;
   TREE_TYPE (t) = type ? type : build_complex_type (TREE_TYPE (real));
diff --git a/gcc/tree.h b/gcc/tree.h
index fa02e2907a1..28716b53120 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -634,6 +634,12 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
 
 /* Nonzero if TYPE represents a complex floating-point type.  */
 
+#define COMPLEX_INTEGER_TYPE_P(TYPE)	\
+  (TREE_CODE (TYPE) == COMPLEX_TYPE	\
+   && TREE_CODE (TREE_TYPE (TYPE)) == INTEGER_TYPE)
+
+/* Nonzero if TYPE represents a complex floating-point type.  */
+
 #define COMPLEX_FLOAT_TYPE_P(TYPE)	\
   (TREE_CODE (TYPE) == COMPLEX_TYPE	\
    && TREE_CODE (TREE_TYPE (TYPE)) == REAL_TYPE)
@@ -1155,6 +1161,7 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
 /* In a COMPLEX_CST node.  */
 #define TREE_REALPART(NODE) (COMPLEX_CST_CHECK (NODE)->complex.real)
 #define TREE_IMAGPART(NODE) (COMPLEX_CST_CHECK (NODE)->complex.imag)
+#define TREE_COMPLEX_BOTH_PARTS(NODE) (COMPLEX_CST_CHECK (NODE)->complex.both)
 
 /* In a VECTOR_CST node.  See generic.texi for details.  */
 #define VECTOR_CST_NELTS(NODE) (TYPE_VECTOR_SUBPARTS (TREE_TYPE (NODE)))
@@ -2214,6 +2221,8 @@ class auto_suppress_location_wrappers
   (as_a <scalar_int_mode> (TYPE_CHECK (NODE)->type_common.mode))
 #define SCALAR_FLOAT_TYPE_MODE(NODE) \
   (as_a <scalar_float_mode> (TYPE_CHECK (NODE)->type_common.mode))
+#define COMPLEX_TYPE_MODE(NODE) \
+  (as_a <complex_mode> (TYPE_CHECK (NODE)->type_common.mode))
 #define SET_TYPE_MODE(NODE, MODE) \
   (TYPE_CHECK (NODE)->type_common.mode = (MODE))
 
@@ -6646,7 +6655,11 @@ extern const builtin_structptr_type builtin_structptr_types[6];
 inline bool
 type_has_mode_precision_p (const_tree t)
 {
-  return known_eq (TYPE_PRECISION (t), GET_MODE_PRECISION (TYPE_MODE (t)));
+  if (TREE_CODE (t) == COMPLEX_TYPE)
+    return known_eq (2*TYPE_PRECISION (TREE_TYPE(t)),
+		     GET_MODE_PRECISION (TYPE_MODE (t)));
+  else
+    return known_eq (TYPE_PRECISION (t), GET_MODE_PRECISION (TYPE_MODE (t)));
 }
 
 /* Helper functions for fndecl_built_in_p.  */
-- 
2.17.1

* [PATCH 2/9] Native complex operations: Move functions to hooks
  2023-07-17  9:02 [PATCH 0/9] Native complex operations Sylvain Noiry
  2023-07-17  9:02 ` [PATCH 1/9] Native complex operations: Conditional lowering Sylvain Noiry
@ 2023-07-17  9:02 ` Sylvain Noiry
  2023-07-17  9:02 ` [PATCH 3/9] Native complex operations: Add gen_rtx_complex hook Sylvain Noiry
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-07-17  9:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Move read_complex_part and write_complex_part to target hooks. Their
signatures also change because the type of the part argument is now
complex_part_t. Calls to these functions are updated accordingly.

gcc/ChangeLog:

	* target.def: Define hooks for read_complex_part and
	write_complex_part
	* targhooks.cc (default_read_complex_part): New: default
	implementation of read_complex_part
	(default_write_complex_part): New: default implementation
	of write_complex_part
	* targhooks.h: Add default_read_complex_part and
	default_write_complex_part
	* doc/tm.texi: Document the new TARGET_READ_COMPLEX_PART
	and TARGET_WRITE_COMPLEX_PART hooks
	* doc/tm.texi.in: Add TARGET_READ_COMPLEX_PART and
	TARGET_WRITE_COMPLEX_PART
	* expr.cc
	(write_complex_part): Call TARGET_WRITE_COMPLEX_PART hook
	(read_complex_part): Call TARGET_READ_COMPLEX_PART hook
	* expr.h: Update function signatures of read_complex_part
	and write_complex_part
	* builtins.cc (expand_ifn_atomic_compare_exchange_into_call):
	Update calls to read_complex_part and write_complex_part
	(expand_ifn_atomic_compare_exchange): Likewise
	* expmed.cc (flip_storage_order): Likewise
	* expr.cc (clear_storage_hints): Likewise
	(emit_move_complex_push): Likewise
	(emit_move_complex_parts): Likewise
	(expand_assignment): Likewise
	(expand_expr_real_2): Likewise
	(expand_expr_real_1): Likewise
	(const_vector_from_tree): Likewise
	* internal-fn.cc (expand_arith_set_overflow): Likewise
	(expand_arith_overflow_result_store): Likewise
	(expand_addsub_overflow): Likewise
	(expand_neg_overflow): Likewise
	(expand_mul_overflow): Likewise
	(expand_arith_overflow): Likewise
	(expand_UADDC): Likewise
---
 gcc/builtins.cc    |   8 +--
 gcc/doc/tm.texi    |  10 +++
 gcc/doc/tm.texi.in |   4 ++
 gcc/expmed.cc      |   4 +-
 gcc/expr.cc        | 164 +++++++++------------------------------------
 gcc/expr.h         |   5 +-
 gcc/internal-fn.cc |  20 +++---
 gcc/target.def     |  18 +++++
 gcc/targhooks.cc   | 139 ++++++++++++++++++++++++++++++++++++++
 gcc/targhooks.h    |   5 ++
 10 files changed, 224 insertions(+), 153 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 6dff5214ff8..37da6bcae6f 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -6347,8 +6347,8 @@ expand_ifn_atomic_compare_exchange_into_call (gcall *call, machine_mode mode)
       if (GET_MODE (boolret) != mode)
 	boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1);
       x = force_reg (mode, x);
-      write_complex_part (target, boolret, true, true);
-      write_complex_part (target, x, false, false);
+      write_complex_part (target, boolret, IMAG_P, true);
+      write_complex_part (target, x, REAL_P, false);
     }
 }
 
@@ -6403,8 +6403,8 @@ expand_ifn_atomic_compare_exchange (gcall *call)
       rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (GET_MODE (boolret) != mode)
 	boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1);
-      write_complex_part (target, boolret, true, true);
-      write_complex_part (target, oldval, false, false);
+      write_complex_part (target, boolret, IMAG_P, true);
+      write_complex_part (target, oldval, REAL_P, false);
     }
 }
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 95ba56e05ae..87997b76338 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4605,6 +4605,16 @@ to return a nonzero value when it is required, the compiler will run out
 of spill registers and print a fatal error message.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_READ_COMPLEX_PART (rtx @var{cplx}, complex_part_t @var{part})
+This hook should return the rtx representing the specified @var{part} of the complex given by @var{cplx}.
+  @var{part} can be the real part, the imaginary part, or both of them.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_WRITE_COMPLEX_PART (rtx @var{cplx}, rtx @var{val}, complex_part_t @var{part}, bool @var{undefined_p})
+This hook should move the rtx value given by @var{val} to the specified @var{part} of the complex given by @var{cplx}.
+  @var{part} can be the real part, the imaginary part, or both of them.
+@end deftypefn
+
 @node Scalar Return
 @subsection How Scalar Function Values Are Returned
 @cindex return values in registers
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 4ac96dc357d..efbf972e6a7 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3390,6 +3390,10 @@ stack.
 
 @hook TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
 
+@hook TARGET_READ_COMPLEX_PART
+
+@hook TARGET_WRITE_COMPLEX_PART
+
 @node Scalar Return
 @subsection How Scalar Function Values Are Returned
 @cindex return values in registers
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index fbd4ce2d42f..2f787cc28f9 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -394,8 +394,8 @@ flip_storage_order (machine_mode mode, rtx x)
 
   if (COMPLEX_MODE_P (mode))
     {
-      rtx real = read_complex_part (x, false);
-      rtx imag = read_complex_part (x, true);
+      rtx real = read_complex_part (x, REAL_P);
+      rtx imag = read_complex_part (x, IMAG_P);
 
       real = flip_storage_order (GET_MODE_INNER (mode), real);
       imag = flip_storage_order (GET_MODE_INNER (mode), imag);
diff --git a/gcc/expr.cc b/gcc/expr.cc
index fff09dc9951..e1a0892b4d9 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -3480,8 +3480,8 @@ clear_storage_hints (rtx object, rtx size, enum block_op_methods method,
 	  zero = CONST0_RTX (GET_MODE_INNER (mode));
 	  if (zero != NULL)
 	    {
-	      write_complex_part (object, zero, 0, true);
-	      write_complex_part (object, zero, 1, false);
+	      write_complex_part (object, zero, REAL_P, true);
+	      write_complex_part (object, zero, IMAG_P, false);
 	      return NULL;
 	    }
 	}
@@ -3646,126 +3646,18 @@ set_storage_via_setmem (rtx object, rtx size, rtx val, unsigned int align,
    If UNDEFINED_P then the value in CPLX is currently undefined.  */
 
 void
-write_complex_part (rtx cplx, rtx val, bool imag_p, bool undefined_p)
+write_complex_part (rtx cplx, rtx val, complex_part_t part, bool undefined_p)
 {
-  machine_mode cmode;
-  scalar_mode imode;
-  unsigned ibitsize;
-
-  if (GET_CODE (cplx) == CONCAT)
-    {
-      emit_move_insn (XEXP (cplx, imag_p), val);
-      return;
-    }
-
-  cmode = GET_MODE (cplx);
-  imode = GET_MODE_INNER (cmode);
-  ibitsize = GET_MODE_BITSIZE (imode);
-
-  /* For MEMs simplify_gen_subreg may generate an invalid new address
-     because, e.g., the original address is considered mode-dependent
-     by the target, which restricts simplify_subreg from invoking
-     adjust_address_nv.  Instead of preparing fallback support for an
-     invalid address, we call adjust_address_nv directly.  */
-  if (MEM_P (cplx))
-    {
-      emit_move_insn (adjust_address_nv (cplx, imode,
-					 imag_p ? GET_MODE_SIZE (imode) : 0),
-		      val);
-      return;
-    }
-
-  /* If the sub-object is at least word sized, then we know that subregging
-     will work.  This special case is important, since store_bit_field
-     wants to operate on integer modes, and there's rarely an OImode to
-     correspond to TCmode.  */
-  if (ibitsize >= BITS_PER_WORD
-      /* For hard regs we have exact predicates.  Assume we can split
-	 the original object if it spans an even number of hard regs.
-	 This special case is important for SCmode on 64-bit platforms
-	 where the natural size of floating-point regs is 32-bit.  */
-      || (REG_P (cplx)
-	  && REGNO (cplx) < FIRST_PSEUDO_REGISTER
-	  && REG_NREGS (cplx) % 2 == 0))
-    {
-      rtx part = simplify_gen_subreg (imode, cplx, cmode,
-				      imag_p ? GET_MODE_SIZE (imode) : 0);
-      if (part)
-        {
-	  emit_move_insn (part, val);
-	  return;
-	}
-      else
-	/* simplify_gen_subreg may fail for sub-word MEMs.  */
-	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
-    }
-
-  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val,
-		   false, undefined_p);
+  targetm.write_complex_part (cplx, val, part, undefined_p);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
    real part if IMAG_P is false, and the imaginary part if it's true.  */
 
 rtx
-read_complex_part (rtx cplx, bool imag_p)
-{
-  machine_mode cmode;
-  scalar_mode imode;
-  unsigned ibitsize;
-
-  if (GET_CODE (cplx) == CONCAT)
-    return XEXP (cplx, imag_p);
-
-  cmode = GET_MODE (cplx);
-  imode = GET_MODE_INNER (cmode);
-  ibitsize = GET_MODE_BITSIZE (imode);
-
-  /* Special case reads from complex constants that got spilled to memory.  */
-  if (MEM_P (cplx) && GET_CODE (XEXP (cplx, 0)) == SYMBOL_REF)
-    {
-      tree decl = SYMBOL_REF_DECL (XEXP (cplx, 0));
-      if (decl && TREE_CODE (decl) == COMPLEX_CST)
-	{
-	  tree part = imag_p ? TREE_IMAGPART (decl) : TREE_REALPART (decl);
-	  if (CONSTANT_CLASS_P (part))
-	    return expand_expr (part, NULL_RTX, imode, EXPAND_NORMAL);
-	}
-    }
-
-  /* For MEMs simplify_gen_subreg may generate an invalid new address
-     because, e.g., the original address is considered mode-dependent
-     by the target, which restricts simplify_subreg from invoking
-     adjust_address_nv.  Instead of preparing fallback support for an
-     invalid address, we call adjust_address_nv directly.  */
-  if (MEM_P (cplx))
-    return adjust_address_nv (cplx, imode,
-			      imag_p ? GET_MODE_SIZE (imode) : 0);
-
-  /* If the sub-object is at least word sized, then we know that subregging
-     will work.  This special case is important, since extract_bit_field
-     wants to operate on integer modes, and there's rarely an OImode to
-     correspond to TCmode.  */
-  if (ibitsize >= BITS_PER_WORD
-      /* For hard regs we have exact predicates.  Assume we can split
-	 the original object if it spans an even number of hard regs.
-	 This special case is important for SCmode on 64-bit platforms
-	 where the natural size of floating-point regs is 32-bit.  */
-      || (REG_P (cplx)
-	  && REGNO (cplx) < FIRST_PSEUDO_REGISTER
-	  && REG_NREGS (cplx) % 2 == 0))
-    {
-      rtx ret = simplify_gen_subreg (imode, cplx, cmode,
-				     imag_p ? GET_MODE_SIZE (imode) : 0);
-      if (ret)
-        return ret;
-      else
-	/* simplify_gen_subreg may fail for sub-word MEMs.  */
-	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
-    }
-
-  return extract_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0,
-			    true, NULL_RTX, imode, imode, false, NULL);
+read_complex_part (rtx cplx, complex_part_t part)
+{
+  return targetm.read_complex_part (cplx, part);
 }
 \f
 /* A subroutine of emit_move_insn_1.  Yet another lowpart generator.
@@ -3936,9 +3828,10 @@ emit_move_complex_push (machine_mode mode, rtx x, rtx y)
     }
 
   emit_move_insn (gen_rtx_MEM (submode, XEXP (x, 0)),
-		  read_complex_part (y, imag_first));
+		  read_complex_part (y, (imag_first) ? IMAG_P : REAL_P));
   return emit_move_insn (gen_rtx_MEM (submode, XEXP (x, 0)),
-			 read_complex_part (y, !imag_first));
+			 read_complex_part (y,
+					    (imag_first) ? REAL_P : IMAG_P));
 }
 
 /* A subroutine of emit_move_complex.  Perform the move from Y to X
@@ -3954,8 +3847,8 @@ emit_move_complex_parts (rtx x, rtx y)
       && REG_P (x) && !reg_overlap_mentioned_p (x, y))
     emit_clobber (x);
 
-  write_complex_part (x, read_complex_part (y, false), false, true);
-  write_complex_part (x, read_complex_part (y, true), true, false);
+  write_complex_part (x, read_complex_part (y, REAL_P), REAL_P, true);
+  write_complex_part (x, read_complex_part (y, IMAG_P), IMAG_P, false);
 
   return get_last_insn ();
 }
@@ -5812,9 +5705,9 @@ expand_assignment (tree to, tree from, bool nontemporal)
 		  if (from_rtx)
 		    {
 		      emit_move_insn (XEXP (to_rtx, 0),
-				      read_complex_part (from_rtx, false));
+				      read_complex_part (from_rtx, REAL_P));
 		      emit_move_insn (XEXP (to_rtx, 1),
-				      read_complex_part (from_rtx, true));
+				      read_complex_part (from_rtx, IMAG_P));
 		    }
 		  else
 		    {
@@ -5836,14 +5729,16 @@ expand_assignment (tree to, tree from, bool nontemporal)
 	    concat_store_slow:;
 	      rtx temp = assign_stack_temp (GET_MODE (to_rtx),
 					    GET_MODE_SIZE (GET_MODE (to_rtx)));
-	      write_complex_part (temp, XEXP (to_rtx, 0), false, true);
-	      write_complex_part (temp, XEXP (to_rtx, 1), true, false);
+	      write_complex_part (temp, XEXP (to_rtx, 0), REAL_P, true);
+	      write_complex_part (temp, XEXP (to_rtx, 1), IMAG_P, false);
 	      result = store_field (temp, bitsize, bitpos,
 				    bitregion_start, bitregion_end,
 				    mode1, from, get_alias_set (to),
 				    nontemporal, reversep);
-	      emit_move_insn (XEXP (to_rtx, 0), read_complex_part (temp, false));
-	      emit_move_insn (XEXP (to_rtx, 1), read_complex_part (temp, true));
+	      emit_move_insn (XEXP (to_rtx, 0),
+			      read_complex_part (temp, REAL_P));
+	      emit_move_insn (XEXP (to_rtx, 1),
+			      read_complex_part (temp, IMAG_P));
 	    }
 	}
       /* For calls to functions returning variable length structures, if TO_RTX
@@ -10322,8 +10217,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	      complex_expr_swap_order:
 		/* Move the imaginary (op1) and real (op0) parts to their
 		   location.  */
-		write_complex_part (target, op1, true, true);
-		write_complex_part (target, op0, false, false);
+		write_complex_part (target, op1, IMAG_P, true);
+		write_complex_part (target, op0, REAL_P, false);
 
 		return target;
 	      }
@@ -10352,8 +10247,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	  }
 
       /* Move the real (op0) and imaginary (op1) parts to their location.  */
-      write_complex_part (target, op0, false, true);
-      write_complex_part (target, op1, true, false);
+      write_complex_part (target, op0, REAL_P, true);
+      write_complex_part (target, op1, IMAG_P, false);
 
       return target;
 
@@ -11508,7 +11403,8 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
 		    rtx parts[2];
 		    for (int i = 0; i < 2; i++)
 		      {
-			rtx op = read_complex_part (op0, i != 0);
+			rtx op =
+			  read_complex_part (op0, (i != 0) ? IMAG_P : REAL_P);
 			if (GET_CODE (op) == SUBREG)
 			  op = force_reg (GET_MODE (op), op);
 			temp = gen_lowpart_common (GET_MODE_INNER (mode1), op);
@@ -12106,11 +12002,11 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
 
     case REALPART_EXPR:
       op0 = expand_normal (treeop0);
-      return read_complex_part (op0, false);
+      return read_complex_part (op0, REAL_P);
 
     case IMAGPART_EXPR:
       op0 = expand_normal (treeop0);
-      return read_complex_part (op0, true);
+      return read_complex_part (op0, IMAG_P);
 
     case RETURN_EXPR:
     case LABEL_EXPR:
@@ -13449,8 +13345,8 @@ const_vector_from_tree (tree exp)
 	builder.quick_push (const_double_from_real_value (TREE_REAL_CST (elt),
 							  inner));
       else if (TREE_CODE (elt) == FIXED_CST)
-	builder.quick_push (CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt),
-							  inner));
+	builder.quick_push (CONST_FIXED_FROM_FIXED_VALUE
+			    (TREE_FIXED_CST (elt), inner));
       else
 	builder.quick_push (immed_wide_int_const (wi::to_poly_wide (elt),
 						  inner));
diff --git a/gcc/expr.h b/gcc/expr.h
index 11bff531862..833ff16bd0d 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -261,9 +261,8 @@ extern rtx_insn *emit_move_insn_1 (rtx, rtx);
 
 extern rtx_insn *emit_move_complex_push (machine_mode, rtx, rtx);
 extern rtx_insn *emit_move_complex_parts (rtx, rtx);
-extern rtx read_complex_part (rtx, bool);
-extern void write_complex_part (rtx, rtx, bool, bool);
-extern rtx read_complex_part (rtx, bool);
+extern rtx read_complex_part (rtx, complex_part_t);
+extern void write_complex_part (rtx, rtx, complex_part_t, bool);
 extern rtx emit_move_resolve_push (machine_mode, rtx);
 
 /* Push a block of length SIZE (perhaps variable)
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index f9aaf66cf2a..8d3d4599256 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -917,9 +917,9 @@ expand_arith_set_overflow (tree lhs, rtx target)
 {
   if (TYPE_PRECISION (TREE_TYPE (TREE_TYPE (lhs))) == 1
       && !TYPE_UNSIGNED (TREE_TYPE (TREE_TYPE (lhs))))
-    write_complex_part (target, constm1_rtx, true, false);
+    write_complex_part (target, constm1_rtx, IMAG_P, false);
   else
-    write_complex_part (target, const1_rtx, true, false);
+    write_complex_part (target, const1_rtx, IMAG_P, false);
 }
 
 /* Helper for expand_*_overflow.  Store RES into the __real__ part
@@ -974,7 +974,7 @@ expand_arith_overflow_result_store (tree lhs, rtx target,
       expand_arith_set_overflow (lhs, target);
       emit_label (done_label);
     }
-  write_complex_part (target, lres, false, false);
+  write_complex_part (target, lres, REAL_P, false);
 }
 
 /* Helper for expand_*_overflow.  Store RES into TARGET.  */
@@ -1019,7 +1019,7 @@ expand_addsub_overflow (location_t loc, tree_code code, tree lhs,
     {
       target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (!is_ubsan)
-	write_complex_part (target, const0_rtx, true, false);
+	write_complex_part (target, const0_rtx, IMAG_P, false);
     }
 
   /* We assume both operands and result have the same precision
@@ -1464,7 +1464,7 @@ expand_neg_overflow (location_t loc, tree lhs, tree arg1, bool is_ubsan,
     {
       target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (!is_ubsan)
-	write_complex_part (target, const0_rtx, true, false);
+	write_complex_part (target, const0_rtx, IMAG_P, false);
     }
 
   enum insn_code icode = optab_handler (negv3_optab, mode);
@@ -1589,7 +1589,7 @@ expand_mul_overflow (location_t loc, tree lhs, tree arg0, tree arg1,
     {
       target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (!is_ubsan)
-	write_complex_part (target, const0_rtx, true, false);
+	write_complex_part (target, const0_rtx, IMAG_P, false);
     }
 
   if (is_ubsan)
@@ -2406,7 +2406,7 @@ expand_mul_overflow (location_t loc, tree lhs, tree arg0, tree arg1,
       do_compare_rtx_and_jump (op1, res, NE, true, mode, NULL_RTX, NULL,
 			       all_done_label, profile_probability::very_unlikely ());
       emit_label (set_noovf);
-      write_complex_part (target, const0_rtx, true, false);
+      write_complex_part (target, const0_rtx, IMAG_P, false);
       emit_label (all_done_label);
     }
 
@@ -2675,7 +2675,7 @@ expand_arith_overflow (enum tree_code code, gimple *stmt)
 	{
 	  /* The infinity precision result will always fit into result.  */
 	  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
-	  write_complex_part (target, const0_rtx, true, false);
+	  write_complex_part (target, const0_rtx, IMAG_P, false);
 	  scalar_int_mode mode = SCALAR_INT_TYPE_MODE (type);
 	  struct separate_ops ops;
 	  ops.code = code;
@@ -2840,8 +2840,8 @@ expand_UADDC (internal_fn ifn, gcall *stmt)
   create_input_operand (&ops[3], op2, mode);
   create_input_operand (&ops[4], op3, mode);
   expand_insn (icode, 5, ops);
-  write_complex_part (target, re, false, false);
-  write_complex_part (target, im, true, false);
+  write_complex_part (target, re, REAL_P, false);
+  write_complex_part (target, im, IMAG_P, false);
 }
 
 /* Expand USUBC STMT.  */
diff --git a/gcc/target.def b/gcc/target.def
index 7d684296c17..9798c0f58e4 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3306,6 +3306,24 @@ a pointer to int.",
  bool, (ao_ref *ref),
  default_ref_may_alias_errno)
 
+/* Returns the value corresponding to the specified part of a complex.  */
+DEFHOOK
+(read_complex_part,
+ "This hook should return the rtx representing the specified @var{part} of the complex given by @var{cplx}.\n\
+  @var{part} can be the real part, the imaginary part, or both of them.",
+ rtx,
+ (rtx cplx, complex_part_t part),
+ default_read_complex_part)
+
+/* Moves a value to the specified part of a complex.  */
+DEFHOOK
+(write_complex_part,
+ "This hook should move the rtx value given by @var{val} to the specified @var{var} of the complex given by @var{cplx}.\n\
+  @var{var} can be the real part, the imaginary part, or both of them.",
+ void,
+ (rtx cplx, rtx val, complex_part_t part, bool undefined_p),
+ default_write_complex_part)
+
 /* Support for named address spaces.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_ADDR_SPACE_"
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index e190369f87a..d33fcbd9a13 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -1532,6 +1532,145 @@ default_preferred_simd_mode (scalar_mode)
   return word_mode;
 }
 
+/* By default, extract one of the components of the complex value CPLX.  Extract
+   the real part if PART is REAL_P, and the imaginary part if it is IMAG_P.  If
+   PART is BOTH_P, return CPLX directly.  */
+
+rtx
+default_read_complex_part (rtx cplx, complex_part_t part)
+{
+  machine_mode cmode;
+  scalar_mode imode;
+  unsigned ibitsize;
+
+  if (part == BOTH_P)
+    return cplx;
+
+  if (GET_CODE (cplx) == CONCAT)
+    return XEXP (cplx, part);
+
+  cmode = GET_MODE (cplx);
+  imode = GET_MODE_INNER (cmode);
+  ibitsize = GET_MODE_BITSIZE (imode);
+
+  /* Special case reads from complex constants that got spilled to memory.  */
+  if (MEM_P (cplx) && GET_CODE (XEXP (cplx, 0)) == SYMBOL_REF)
+    {
+      tree decl = SYMBOL_REF_DECL (XEXP (cplx, 0));
+      if (decl && TREE_CODE (decl) == COMPLEX_CST)
+	{
+	  tree cplx_part =
+	    (part == IMAG_P) ? TREE_IMAGPART (decl) : TREE_REALPART (decl);
+	  if (CONSTANT_CLASS_P (cplx_part))
+	    return expand_expr (cplx_part, NULL_RTX, imode, EXPAND_NORMAL);
+	}
+    }
+
+  /* For MEMs simplify_gen_subreg may generate an invalid new address
+     because, e.g., the original address is considered mode-dependent
+     by the target, which restricts simplify_subreg from invoking
+     adjust_address_nv.  Instead of preparing fallback support for an
+     invalid address, we call adjust_address_nv directly.  */
+  if (MEM_P (cplx))
+    return adjust_address_nv (cplx, imode, (part == IMAG_P)
+			      ? GET_MODE_SIZE (imode) : 0);
+
+  /* If the sub-object is at least word sized, then we know that subregging
+     will work.  This special case is important, since extract_bit_field
+     wants to operate on integer modes, and there's rarely an OImode to
+     correspond to TCmode.  */
+  if (ibitsize >= BITS_PER_WORD
+      /* For hard regs we have exact predicates.  Assume we can split
+	 the original object if it spans an even number of hard regs.
+	 This special case is important for SCmode on 64-bit platforms
+	 where the natural size of floating-point regs is 32-bit.  */
+      || (REG_P (cplx)
+	  && REGNO (cplx) < FIRST_PSEUDO_REGISTER
+	  && REG_NREGS (cplx) % 2 == 0))
+    {
+      rtx ret = simplify_gen_subreg (imode, cplx, cmode, (part == IMAG_P)
+				     ? GET_MODE_SIZE (imode) : 0);
+      if (ret)
+	return ret;
+      else
+	/* simplify_gen_subreg may fail for sub-word MEMs.  */
+	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
+    }
+
+  return extract_bit_field (cplx, ibitsize, (part == IMAG_P) ? ibitsize : 0,
+			    true, NULL_RTX, imode, imode, false, NULL);
+}
+
+/* By default, write to one of the components of the complex value CPLX.  Write
+   VAL to the real part if PART is REAL_P, and to the imaginary part if it is
+   IMAG_P.  If PART is BOTH_P, recurse once for REAL_P and once for IMAG_P.  */
+
+void
+default_write_complex_part (rtx cplx, rtx val, complex_part_t part, bool undefined_p)
+{
+  machine_mode cmode;
+  scalar_mode imode;
+  unsigned ibitsize;
+
+  if (part == BOTH_P)
+    {
+      write_complex_part (cplx, read_complex_part (val, REAL_P), REAL_P, false);
+      write_complex_part (cplx, read_complex_part (val, IMAG_P), IMAG_P, false);
+      return;
+    }
+
+  if (GET_CODE (cplx) == CONCAT)
+    {
+      emit_move_insn (XEXP (cplx, part == IMAG_P), val);
+      return;
+    }
+
+  cmode = GET_MODE (cplx);
+  imode = GET_MODE_INNER (cmode);
+  ibitsize = GET_MODE_BITSIZE (imode);
+
+  /* For MEMs simplify_gen_subreg may generate an invalid new address
+     because, e.g., the original address is considered mode-dependent
+     by the target, which restricts simplify_subreg from invoking
+     adjust_address_nv.  Instead of preparing fallback support for an
+     invalid address, we call adjust_address_nv directly.  */
+  if (MEM_P (cplx))
+    {
+      emit_move_insn (adjust_address_nv (cplx, imode, (part == IMAG_P)
+					 ? GET_MODE_SIZE (imode) : 0), val);
+      return;
+    }
+
+  /* If the sub-object is at least word sized, then we know that subregging
+     will work.  This special case is important, since store_bit_field
+     wants to operate on integer modes, and there's rarely an OImode to
+     correspond to TCmode.  */
+  if (ibitsize >= BITS_PER_WORD
+      /* For hard regs we have exact predicates.  Assume we can split
+	 the original object if it spans an even number of hard regs.
+	 This special case is important for SCmode on 64-bit platforms
+	 where the natural size of floating-point regs is 32-bit.  */
+      || (REG_P (cplx)
+	  && REGNO (cplx) < FIRST_PSEUDO_REGISTER
+	  && REG_NREGS (cplx) % 2 == 0))
+    {
+      rtx cplx_part = simplify_gen_subreg (imode, cplx, cmode,
+					   (part == IMAG_P) ?
+					   GET_MODE_SIZE (imode) : 0);
+      if (cplx_part)
+	{
+	  emit_move_insn (cplx_part, val);
+	  return;
+	}
+      else
+	/* simplify_gen_subreg may fail for sub-word MEMs.  */
+	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
+    }
+
+  store_bit_field (cplx, ibitsize, (part == IMAG_P) ? ibitsize : 0, 0, 0,
+		   imode, val, false, undefined_p);
+}
+
 /* By default do not split reductions further.  */
 
 machine_mode
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 1a0db8dddd5..805abd96938 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -124,6 +124,11 @@ extern opt_machine_mode default_get_mask_mode (machine_mode);
 extern bool default_empty_mask_is_expensive (unsigned);
 extern vector_costs *default_vectorize_create_costs (vec_info *, bool);
 
+extern rtx default_read_complex_part (rtx cplx, complex_part_t part);
+extern void default_write_complex_part (rtx cplx, rtx val,
+					complex_part_t part,
+					bool undefined_p);
+
 /* OpenACC hooks.  */
 extern bool default_goacc_validate_dims (tree, int [], int, unsigned);
 extern int default_goacc_dim_limit (int);
-- 
2.17.1






^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 3/9] Native complex operations: Add gen_rtx_complex hook
  2023-07-17  9:02 [PATCH 0/9] Native complex operations Sylvain Noiry
  2023-07-17  9:02 ` [PATCH 1/9] Native complex operations: Conditional lowering Sylvain Noiry
  2023-07-17  9:02 ` [PATCH 2/9] Native complex operations: Move functions to hooks Sylvain Noiry
@ 2023-07-17  9:02 ` Sylvain Noiry
  2023-07-17  9:02 ` [PATCH 4/9] Native complex operations: Allow native complex regs and ops in rtl Sylvain Noiry
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-07-17  9:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Add a new target hook, gen_rtx_complex, for creating complex elements
during the expand pass.  The default implementation calls gen_rtx_CONCAT,
as before.  Calls to gen_rtx_CONCAT for complex handling are then
replaced by calls to targetm.gen_rtx_complex.
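
As an illustration, a backend whose registers can hold a whole complex
value might override the hook to return a single pseudo instead of a
CONCAT.  The sketch below is not part of this series: the
kvx_gen_rtx_complex name is invented, and the constant paths of
init_emit_once (where no insns may be emitted) are deliberately ignored.

/* Hypothetical backend override, shown only as a sketch.  Allocate one
   pseudo for the whole complex value; clearing generating_concat_p
   keeps gen_reg_rtx from calling this hook back and recursing.  */
static rtx
kvx_gen_rtx_complex (machine_mode mode, rtx real_part, rtx imag_part)
{
  int saved_generating_concat_p = generating_concat_p;
  generating_concat_p = 0;
  rtx res = gen_reg_rtx (mode);
  generating_concat_p = saved_generating_concat_p;

  /* Initialize the parts, if given, through the hooked accessors.  */
  if (real_part)
    write_complex_part (res, real_part, REAL_P, true);
  if (imag_part)
    write_complex_part (res, imag_part, IMAG_P, false);

  return res;
}

#undef TARGET_GEN_RTX_COMPLEX
#define TARGET_GEN_RTX_COMPLEX kvx_gen_rtx_complex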

gcc/ChangeLog:

	* target.def: Add gen_rtx_complex target hook.
	* targhooks.cc (default_gen_rtx_complex): New; default
	implementation for gen_rtx_complex.
	* targhooks.h: Add default_gen_rtx_complex.
	* doc/tm.texi: Document TARGET_GEN_RTX_COMPLEX.
	* doc/tm.texi.in: Add TARGET_GEN_RTX_COMPLEX.
	* emit-rtl.cc (gen_reg_rtx): Replace call to
	gen_rtx_CONCAT by call to gen_rtx_complex.
	(init_emit_once): Likewise.
	* expmed.cc (flip_storage_order): Likewise.
	* optabs.cc (expand_doubleword_mod): Likewise.
---
 gcc/doc/tm.texi    |  6 ++++++
 gcc/doc/tm.texi.in |  2 ++
 gcc/emit-rtl.cc    | 26 +++++++++-----------------
 gcc/expmed.cc      |  2 +-
 gcc/optabs.cc      | 12 +++++++-----
 gcc/target.def     | 10 ++++++++++
 gcc/targhooks.cc   | 27 +++++++++++++++++++++++++++
 gcc/targhooks.h    |  2 ++
 8 files changed, 64 insertions(+), 23 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 87997b76338..b73147aea9f 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4605,6 +4605,12 @@ to return a nonzero value when it is required, the compiler will run out
 of spill registers and print a fatal error message.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_GEN_RTX_COMPLEX (machine_mode @var{mode}, rtx @var{real_part}, rtx @var{imag_part})
+This hook should return an rtx representing a complex of mode @var{mode} built from @var{real_part} and @var{imag_part}.
+  If both arguments are @code{NULL}, the parts are created as registers.
+ The default is @code{gen_rtx_CONCAT}.
+@end deftypefn
+
 @deftypefn {Target Hook} rtx TARGET_READ_COMPLEX_PART (rtx @var{cplx}, complex_part_t @var{part})
 This hook should return the rtx representing the specified @var{part} of the complex given by @var{cplx}.
   @var{part} can be the real part, the imaginary part, or both of them.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index efbf972e6a7..dd39e450903 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3390,6 +3390,8 @@ stack.
 
 @hook TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
 
+@hook TARGET_GEN_RTX_COMPLEX
+
 @hook TARGET_READ_COMPLEX_PART
 
 @hook TARGET_WRITE_COMPLEX_PART
diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index f6276a2d0b6..22012bfea13 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -1190,19 +1190,7 @@ gen_reg_rtx (machine_mode mode)
   if (generating_concat_p
       && (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
 	  || GET_MODE_CLASS (mode) == MODE_COMPLEX_INT))
-    {
-      /* For complex modes, don't make a single pseudo.
-	 Instead, make a CONCAT of two pseudos.
-	 This allows noncontiguous allocation of the real and imaginary parts,
-	 which makes much better code.  Besides, allocating DCmode
-	 pseudos overstrains reload on some machines like the 386.  */
-      rtx realpart, imagpart;
-      machine_mode partmode = GET_MODE_INNER (mode);
-
-      realpart = gen_reg_rtx (partmode);
-      imagpart = gen_reg_rtx (partmode);
-      return gen_rtx_CONCAT (mode, realpart, imagpart);
-    }
+    return targetm.gen_rtx_complex (mode, NULL, NULL);
 
   /* Do not call gen_reg_rtx with uninitialized crtl.  */
   gcc_assert (crtl->emit.regno_pointer_align_length);
@@ -6274,14 +6262,18 @@ init_emit_once (void)
 
   FOR_EACH_MODE_IN_CLASS (mode, MODE_COMPLEX_INT)
     {
-      rtx inner = const_tiny_rtx[0][(int)GET_MODE_INNER (mode)];
-      const_tiny_rtx[0][(int) mode] = gen_rtx_CONCAT (mode, inner, inner);
+      machine_mode imode = GET_MODE_INNER (mode);
+      rtx inner = const_tiny_rtx[0][(int) imode];
+      const_tiny_rtx[0][(int) mode] =
+	targetm.gen_rtx_complex (mode, inner, inner);
     }
 
   FOR_EACH_MODE_IN_CLASS (mode, MODE_COMPLEX_FLOAT)
     {
-      rtx inner = const_tiny_rtx[0][(int)GET_MODE_INNER (mode)];
-      const_tiny_rtx[0][(int) mode] = gen_rtx_CONCAT (mode, inner, inner);
+      machine_mode imode = GET_MODE_INNER (mode);
+      rtx inner = const_tiny_rtx[0][(int) imode];
+      const_tiny_rtx[0][(int) mode] =
+	targetm.gen_rtx_complex (mode, inner, inner);
     }
 
   FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_BOOL)
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 2f787cc28f9..8a18161827b 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -400,7 +400,7 @@ flip_storage_order (machine_mode mode, rtx x)
       real = flip_storage_order (GET_MODE_INNER (mode), real);
       imag = flip_storage_order (GET_MODE_INNER (mode), imag);
 
-      return gen_rtx_CONCAT (mode, real, imag);
+      return targetm.gen_rtx_complex (mode, real, imag);
     }
 
   if (UNLIKELY (reverse_storage_order_supported < 0))
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 4e9f58f8060..18900e8113e 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -1001,16 +1001,18 @@ expand_doubleword_mod (machine_mode mode, rtx op0, rtx op1, bool unsignedp)
 	  machine_mode cmode = TYPE_MODE (ctype);
 	  rtx op00 = operand_subword_force (op0, 0, mode);
 	  rtx op01 = operand_subword_force (op0, 1, mode);
-	  rtx cres = gen_rtx_CONCAT (cmode, gen_reg_rtx (word_mode),
-				     gen_reg_rtx (word_mode));
+	  rtx cres = targetm.gen_rtx_complex (cmode, gen_reg_rtx (word_mode),
+					      gen_reg_rtx (word_mode));
 	  tree lhs = make_tree (ctype, cres);
 	  tree arg0 = make_tree (wtype, op00);
 	  tree arg1 = make_tree (wtype, op01);
 	  expand_addsub_overflow (UNKNOWN_LOCATION, PLUS_EXPR, lhs, arg0,
 				  arg1, true, true, true, false, NULL);
-	  sum = expand_simple_binop (word_mode, PLUS, XEXP (cres, 0),
-				     XEXP (cres, 1), NULL_RTX, 1,
-				     OPTAB_DIRECT);
+	  sum =
+	    expand_simple_binop (word_mode, PLUS,
+				 read_complex_part (cres, REAL_P),
+				 read_complex_part (cres, IMAG_P), NULL_RTX,
+				 1, OPTAB_DIRECT);
 	  if (sum == NULL_RTX)
 	    return NULL_RTX;
 	}
diff --git a/gcc/target.def b/gcc/target.def
index 9798c0f58e4..ee1dfdc7565 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3306,6 +3306,16 @@ a pointer to int.",
  bool, (ao_ref *ref),
  default_ref_may_alias_errno)
 
+/* Return the rtx representation of a complex with a specified mode.  */
+DEFHOOK
+(gen_rtx_complex,
+ "This hook should return an rtx representing a complex of mode @var{machine_mode} built from @var{real_part} and @var{imag_part}.\n\
+  If both arguments are @code{NULL}, create them as registers.\n\
+ The default is @code{gen_rtx_CONCAT}.",
+ rtx,
+ (machine_mode mode, rtx real_part, rtx imag_part),
+ default_gen_rtx_complex)
+
 /* Returns the value corresponding to the specified part of a complex.  */
 DEFHOOK
 (read_complex_part,
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index d33fcbd9a13..4ea40c643a8 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -1532,6 +1532,33 @@ default_preferred_simd_mode (scalar_mode)
   return word_mode;
 }
 
+/* By default, call gen_rtx_CONCAT.  */
+
+rtx
+default_gen_rtx_complex (machine_mode mode, rtx real_part, rtx imag_part)
+{
+  /* For complex modes, don't make a single pseudo.
+     Instead, make a CONCAT of two pseudos.
+     This allows noncontiguous allocation of the real and imaginary parts,
+     which makes much better code.  Besides, allocating DCmode
+     pseudos overstrains reload on some machines like the 386.  */
+  machine_mode imode = GET_MODE_INNER (mode);
+
+  if (real_part == NULL)
+    real_part = gen_reg_rtx (imode);
+  else
+    gcc_assert ((GET_MODE (real_part) == imode)
+		|| (GET_MODE (real_part) == E_VOIDmode));
+
+  if (imag_part == NULL)
+    imag_part = gen_reg_rtx (imode);
+  else
+    gcc_assert ((GET_MODE (imag_part) == imode)
+		|| (GET_MODE (imag_part) == E_VOIDmode));
+
+  return gen_rtx_CONCAT (mode, real_part, imag_part);
+}
+
 /* By default, extract one of the components of the complex value CPLX.  Extract
    the real part if PART is REAL_P, and the imaginary part if it is IMAG_P.  If
    PART is BOTH_P, return CPLX directly.  */
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 805abd96938..811cd6165de 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -124,6 +124,8 @@ extern opt_machine_mode default_get_mask_mode (machine_mode);
 extern bool default_empty_mask_is_expensive (unsigned);
 extern vector_costs *default_vectorize_create_costs (vec_info *, bool);
 
+extern rtx default_gen_rtx_complex (machine_mode mode, rtx real_part,
+				    rtx imag_part);
 extern rtx default_read_complex_part (rtx cplx, complex_part_t part);
 extern void default_write_complex_part (rtx cplx, rtx val,
 					complex_part_t part,
-- 
2.17.1






^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 4/9] Native complex operations: Allow native complex regs and ops in rtl
  2023-07-17  9:02 [PATCH 0/9] Native complex operations Sylvain Noiry
                   ` (2 preceding siblings ...)
  2023-07-17  9:02 ` [PATCH 3/9] Native complex operations: Add gen_rtx_complex hook Sylvain Noiry
@ 2023-07-17  9:02 ` Sylvain Noiry
  2023-07-17  9:02 ` [PATCH 5/9] Native complex operations: Add the conjugate op in optabs Sylvain Noiry
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-07-17  9:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Support registers of complex types in rtl. Also adapt the functions
called during the expand pass to support native complex operations.
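
On the user side the change is transparent; one way to observe it
(assuming a backend that provides a mov pattern for the complex mode,
which this patch alone does not add) is a plain copy, which can now
expand to a single whole-complex move:

/* Assuming a movsc-style pattern in the backend (hypothetical), the
   return expands to one whole-complex move instead of two part moves
   through read_complex_part/write_complex_part.  */
_Complex float
pass_through (_Complex float x)
{
  return x;
}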

gcc/ChangeLog:

	* explow.cc (trunc_int_for_mode): Allow complex int modes.
	* expr.cc (emit_move_complex_parts): Move both parts at the
	same time if it is supported by the backend.
	(emit_move_complex): Do not move via integer if no
	corresponding integer mode exists.  For complex floats, relax
	the constraint on the number of registers for targets with
	pairs of registers, and use native moves if it is supported by
	the backend.
	(expand_expr_real_2): Move both parts at the same time if it
	is supported by the backend.
	(expand_expr_real_1): Update the expansion of complex constants.
	(const_vector_from_tree): Add the expansion of both parts of a
	complex constant.
	* real.h: Update FLOAT_MODE_FORMAT.
	* machmode.h: Add COMPLEX_INT_MODE_P and COMPLEX_FLOAT_MODE_P
	predicates.
	* optabs-libfuncs.cc (gen_int_libfunc): Add support for
	complex modes.
	(gen_intv_fp_libfunc): Likewise.
	* recog.cc (general_operand): Likewise.
---
 gcc/explow.cc          |  2 +-
 gcc/expr.cc            | 84 ++++++++++++++++++++++++++++++++++++------
 gcc/machmode.h         |  6 +++
 gcc/optabs-libfuncs.cc | 29 ++++++++++++---
 gcc/real.h             |  3 +-
 gcc/recog.cc           |  1 +
 6 files changed, 105 insertions(+), 20 deletions(-)

diff --git a/gcc/explow.cc b/gcc/explow.cc
index 6424c0802f0..48572a40eab 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -56,7 +56,7 @@ trunc_int_for_mode (HOST_WIDE_INT c, machine_mode mode)
   int width = GET_MODE_PRECISION (smode);
 
   /* You want to truncate to a _what_?  */
-  gcc_assert (SCALAR_INT_MODE_P (mode));
+  gcc_assert (SCALAR_INT_MODE_P (mode) || COMPLEX_INT_MODE_P (mode));
 
   /* Canonicalize BImode to 0 and STORE_FLAG_VALUE.  */
   if (smode == BImode)
diff --git a/gcc/expr.cc b/gcc/expr.cc
index e1a0892b4d9..e94de8a05b5 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -3847,8 +3847,14 @@ emit_move_complex_parts (rtx x, rtx y)
       && REG_P (x) && !reg_overlap_mentioned_p (x, y))
     emit_clobber (x);
 
-  write_complex_part (x, read_complex_part (y, REAL_P), REAL_P, true);
-  write_complex_part (x, read_complex_part (y, IMAG_P), IMAG_P, false);
+  machine_mode mode = GET_MODE (x);
+  if (optab_handler (mov_optab, mode) != CODE_FOR_nothing)
+    write_complex_part (x, read_complex_part (y, BOTH_P), BOTH_P, false);
+  else
+    {
+      write_complex_part (x, read_complex_part (y, REAL_P), REAL_P, true);
+      write_complex_part (x, read_complex_part (y, IMAG_P), IMAG_P, false);
+    }
 
   return get_last_insn ();
 }
@@ -3868,14 +3874,14 @@ emit_move_complex (machine_mode mode, rtx x, rtx y)
 
   /* See if we can coerce the target into moving both values at once, except
      for floating point where we favor moving as parts if this is easy.  */
-  if (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
+  scalar_int_mode imode;
+  if (!int_mode_for_mode (mode).exists (&imode))
+    try_int = false;
+  else if (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
       && optab_handler (mov_optab, GET_MODE_INNER (mode)) != CODE_FOR_nothing
-      && !(REG_P (x)
-	   && HARD_REGISTER_P (x)
-	   && REG_NREGS (x) == 1)
-      && !(REG_P (y)
-	   && HARD_REGISTER_P (y)
-	   && REG_NREGS (y) == 1))
+      && optab_handler (mov_optab, mode) != CODE_FOR_nothing
+      && !(REG_P (x) && HARD_REGISTER_P (x))
+      && !(REG_P (y) && HARD_REGISTER_P (y)))
     try_int = false;
   /* Not possible if the values are inherently not adjacent.  */
   else if (GET_CODE (x) == CONCAT || GET_CODE (y) == CONCAT)
@@ -10246,9 +10252,14 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	    break;
 	  }
 
-      /* Move the real (op0) and imaginary (op1) parts to their location.  */
-      write_complex_part (target, op0, REAL_P, true);
-      write_complex_part (target, op1, IMAG_P, false);
+      if ((op0 == op1) && (GET_CODE (op0) == CONST_VECTOR))
+	write_complex_part (target, op0, BOTH_P, false);
+      else
+	{
+	  /* Move the real (op0) and imaginary (op1) parts to their location.  */
+	  write_complex_part (target, op0, REAL_P, true);
+	  write_complex_part (target, op1, IMAG_P, false);
+	}
 
       return target;
 
@@ -11001,6 +11012,51 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
 
 	  return original_target;
 	}
+      else if (original_target && (GET_CODE (original_target) == REG)
+	       &&
+	       ((GET_MODE_CLASS (GET_MODE (original_target)) ==
+		 MODE_COMPLEX_INT)
+		|| (GET_MODE_CLASS (GET_MODE (original_target)) ==
+		    MODE_COMPLEX_FLOAT)))
+	{
+	  mode = TYPE_MODE (TREE_TYPE (exp));
+
+	  /* Move both parts at the same time if possible.  */
+	  if (TREE_COMPLEX_BOTH_PARTS (exp) != NULL)
+	    {
+	      op0 =
+		expand_expr (TREE_COMPLEX_BOTH_PARTS (exp), original_target,
+			     mode, EXPAND_NORMAL);
+	      write_complex_part (original_target, op0, BOTH_P, false);
+	    }
+	  else
+	    {
+	      mode = TYPE_MODE (TREE_TYPE (TREE_TYPE (exp)));
+
+	      rtx rtarg = gen_reg_rtx (mode);
+	      rtx itarg = gen_reg_rtx (mode);
+	      op0 =
+		expand_expr (TREE_REALPART (exp), rtarg, mode, EXPAND_NORMAL);
+	      op1 =
+		expand_expr (TREE_IMAGPART (exp), itarg, mode, EXPAND_NORMAL);
+
+	      write_complex_part (original_target, op0, REAL_P, false);
+	      write_complex_part (original_target, op1, IMAG_P, false);
+
+	      return original_target;
+	    }
+	}
+      /* TODO: use a finer-grained approach than just a size of two words.  */
+      else if ((TREE_COMPLEX_BOTH_PARTS (exp) != NULL)
+	       && (known_le (GET_MODE_BITSIZE (mode), 2 * BITS_PER_WORD)))
+	{
+	  op0 =
+	    expand_expr (TREE_COMPLEX_BOTH_PARTS (exp), original_target, mode,
+			 EXPAND_NORMAL);
+	  rtx tmp = gen_reg_rtx (mode);
+	  write_complex_part (tmp, op0, BOTH_P, false);
+	  return tmp;
+	}
 
       /* fall through */
 
@@ -13347,6 +13403,10 @@ const_vector_from_tree (tree exp)
       else if (TREE_CODE (elt) == FIXED_CST)
 	builder.quick_push (CONST_FIXED_FROM_FIXED_VALUE
 			    (TREE_FIXED_CST (elt), inner));
+      else if (TREE_CODE (elt) == COMPLEX_CST)
+	builder.quick_push (expand_expr
+			    (TREE_COMPLEX_BOTH_PARTS (elt), NULL_RTX, mode,
+			     EXPAND_NORMAL));
       else
 	builder.quick_push (immed_wide_int_const (wi::to_poly_wide (elt),
 						  inner));
diff --git a/gcc/machmode.h b/gcc/machmode.h
index a22df60dc20..b1937eafdc3 100644
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -119,6 +119,12 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
    || GET_MODE_CLASS (MODE) == MODE_COMPLEX_FLOAT \
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT)
 
+#define COMPLEX_INT_MODE_P(MODE) \
+   (GET_MODE_CLASS (MODE) == MODE_COMPLEX_INT)
+
+#define COMPLEX_FLOAT_MODE_P(MODE) \
+  (GET_MODE_CLASS (MODE) == MODE_COMPLEX_FLOAT)
+
 /* Nonzero if MODE is a complex mode.  */
 #define COMPLEX_MODE_P(MODE)			\
   (GET_MODE_CLASS (MODE) == MODE_COMPLEX_INT	\
diff --git a/gcc/optabs-libfuncs.cc b/gcc/optabs-libfuncs.cc
index f1abe6916d3..fe390a592eb 100644
--- a/gcc/optabs-libfuncs.cc
+++ b/gcc/optabs-libfuncs.cc
@@ -190,19 +190,34 @@ gen_int_libfunc (optab optable, const char *opname, char suffix,
   int maxsize = 2 * BITS_PER_WORD;
   int minsize = BITS_PER_WORD;
   scalar_int_mode int_mode;
+  complex_mode cplx_int_mode;
+  int bitsize;
+  bool cplx = false;
 
-  if (!is_int_mode (mode, &int_mode))
+  if (is_int_mode (mode, &int_mode))
+    bitsize = GET_MODE_BITSIZE (int_mode);
+  else if (is_complex_int_mode (mode, &cplx_int_mode))
+  {
+    cplx = true;
+    bitsize = GET_MODE_BITSIZE (cplx_int_mode);
+  }
+  else
     return;
+
   if (maxsize < LONG_LONG_TYPE_SIZE)
     maxsize = LONG_LONG_TYPE_SIZE;
   if (minsize > INT_TYPE_SIZE
       && (trapv_binoptab_p (optable)
 	  || trapv_unoptab_p (optable)))
     minsize = INT_TYPE_SIZE;
-  if (GET_MODE_BITSIZE (int_mode) < minsize
-      || GET_MODE_BITSIZE (int_mode) > maxsize)
+
+  if (bitsize < minsize || bitsize > maxsize)
     return;
-  gen_libfunc (optable, opname, suffix, int_mode);
+
+  if (GET_MODE_CLASS (mode) == MODE_INT)
+    gen_libfunc (optable, opname, suffix, int_mode);
+  else if (cplx)
+    gen_libfunc (optable, opname, suffix, cplx_int_mode);
 }
 
 /* Like gen_libfunc, but verify that FP and set decimal prefix if needed.  */
@@ -280,9 +295,11 @@ void
 gen_intv_fp_libfunc (optab optable, const char *name, char suffix,
 		     machine_mode mode)
 {
-  if (DECIMAL_FLOAT_MODE_P (mode) || GET_MODE_CLASS (mode) == MODE_FLOAT)
+  if (DECIMAL_FLOAT_MODE_P (mode) || GET_MODE_CLASS (mode) == MODE_FLOAT
+      || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT)
     gen_fp_libfunc (optable, name, suffix, mode);
-  if (GET_MODE_CLASS (mode) == MODE_INT)
+  if (GET_MODE_CLASS (mode) == MODE_INT
+      || GET_MODE_CLASS (mode) == MODE_COMPLEX_INT)
     {
       int len = strlen (name);
       char *v_name = XALLOCAVEC (char, len + 2);
diff --git a/gcc/real.h b/gcc/real.h
index 9ed6c372b14..53585418e68 100644
--- a/gcc/real.h
+++ b/gcc/real.h
@@ -189,7 +189,8 @@ extern const struct real_format *
 			: (gcc_unreachable (), 0)])
 
 #define FLOAT_MODE_FORMAT(MODE) \
-  (REAL_MODE_FORMAT (as_a <scalar_float_mode> (GET_MODE_INNER (MODE))))
+  (REAL_MODE_FORMAT (as_a <scalar_float_mode> \
+    (GET_MODE_INNER ((COMPLEX_FLOAT_MODE_P (MODE)) ? (GET_MODE_INNER (MODE)) : (MODE)))))
 
 /* The following macro determines whether the floating point format is
    composite, i.e. may contain non-consecutive mantissa bits, in which
diff --git a/gcc/recog.cc b/gcc/recog.cc
index 37432087812..687fe2b1b8a 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -1441,6 +1441,7 @@ general_operand (rtx op, machine_mode mode)
      if the caller wants something floating.  */
   if (GET_MODE (op) == VOIDmode && mode != VOIDmode
       && GET_MODE_CLASS (mode) != MODE_INT
+      && GET_MODE_CLASS (mode) != MODE_COMPLEX_INT
       && GET_MODE_CLASS (mode) != MODE_PARTIAL_INT)
     return false;
 
-- 
2.17.1






^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 5/9] Native complex operations: Add the conjugate op in optabs
  2023-07-17  9:02 [PATCH 0/9] Native complex operations Sylvain Noiry
                   ` (3 preceding siblings ...)
  2023-07-17  9:02 ` [PATCH 4/9] Native complex operations: Allow native complex regs and ops in rtl Sylvain Noiry
@ 2023-07-17  9:02 ` Sylvain Noiry
  2023-07-17  9:02 ` [PATCH 6/9] Native complex operations: Update how complex rotations are handled Sylvain Noiry
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-07-17  9:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Add an optab and rtl operation for the conjugate, called conj,
to expand CONJ_EXPR.
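
For SCmode the new optab would look for a pattern named conjsc2; given
such a pattern (an assumption, no pattern is added by this patch), the
GNU C conjugation below expands to a single conj rtl operation:

/* With a conjsc2 pattern in the backend (an assumption), the GNU C
   operator ~ on a complex value expands through conj_optab instead of
   negating the imaginary part separately.  */
_Complex float
conjugate (_Complex float a)
{
  return ~a;
}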

gcc/ChangeLog:

	* rtl.def: Add a conj operation in rtl.
	* optabs.def: Add a conj optab.
	* optabs-tree.cc (optab_for_tree_code): Use the
	conj_optab to convert a CONJ_EXPR.
	* expr.cc (expand_expr_real_2): Add a case to expand
	native CONJ_EXPR.
	(expand_expr_real_1): Likewise.
---
 gcc/expr.cc        | 17 ++++++++++++++++-
 gcc/optabs-tree.cc |  3 +++
 gcc/optabs.def     |  3 +++
 gcc/rtl.def        |  3 +++
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index e94de8a05b5..be153be0b71 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -10498,6 +10498,18 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	return dst;
       }
 
+    case CONJ_EXPR:
+      op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL);
+      if (modifier == EXPAND_STACK_PARM)
+	target = 0;
+      temp = expand_unop (mode,
+			  optab_for_tree_code (CONJ_EXPR, type,
+					       optab_default),
+			  op0, target, 0);
+      gcc_assert (temp);
+      return REDUCE_BIT_FIELD (temp);
+
+
     default:
       gcc_unreachable ();
     }
@@ -12064,6 +12076,10 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
       op0 = expand_normal (treeop0);
       return read_complex_part (op0, IMAG_P);
 
+    case CONJ_EXPR:
+      op0 = expand_normal (treeop0);
+      return op0;
+
     case RETURN_EXPR:
     case LABEL_EXPR:
     case GOTO_EXPR:
@@ -12087,7 +12103,6 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
     case VA_ARG_EXPR:
     case BIND_EXPR:
     case INIT_EXPR:
-    case CONJ_EXPR:
     case COMPOUND_EXPR:
     case PREINCREMENT_EXPR:
     case PREDECREMENT_EXPR:
diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
index e6ae15939d3..c646b3667d4 100644
--- a/gcc/optabs-tree.cc
+++ b/gcc/optabs-tree.cc
@@ -271,6 +271,9 @@ optab_for_tree_code (enum tree_code code, const_tree type,
 	return TYPE_UNSIGNED (type) ? usneg_optab : ssneg_optab;
       return trapv ? negv_optab : neg_optab;
 
+    case CONJ_EXPR:
+      return conj_optab;
+
     case ABS_EXPR:
       return trapv ? absv_optab : abs_optab;
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 3dae228fba6..31475c8afcc 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -160,6 +160,9 @@ OPTAB_NL(umax_optab, "umax$I$a3", UMAX, "umax", '3', gen_int_libfunc)
 OPTAB_NL(neg_optab, "neg$P$a2", NEG, "neg", '2', gen_int_fp_fixed_libfunc)
 OPTAB_NX(neg_optab, "neg$F$a2")
 OPTAB_NX(neg_optab, "neg$Q$a2")
+OPTAB_NL(conj_optab, "conj$P$a2", CONJ, "conj", '2', gen_int_fp_fixed_libfunc)
+OPTAB_NX(conj_optab, "conj$F$a2")
+OPTAB_NX(conj_optab, "conj$Q$a2")
 OPTAB_VL(negv_optab, "negv$I$a2", NEG, "neg", '2', gen_intv_fp_libfunc)
 OPTAB_VX(negv_optab, "neg$F$a2")
 OPTAB_NL(ssneg_optab, "ssneg$Q$a2", SS_NEG, "ssneg", '2', gen_signed_fixed_libfunc)
diff --git a/gcc/rtl.def b/gcc/rtl.def
index 88e2b198503..4280f727286 100644
--- a/gcc/rtl.def
+++ b/gcc/rtl.def
@@ -460,6 +460,9 @@ DEF_RTL_EXPR(MINUS, "minus", "ee", RTX_BIN_ARITH)
 /* Minus operand 0.  */
 DEF_RTL_EXPR(NEG, "neg", "e", RTX_UNARY)
 
+/* Complex conjugate of operand 0.  */
+DEF_RTL_EXPR(CONJ, "conj", "e", RTX_UNARY)
+
 DEF_RTL_EXPR(MULT, "mult", "ee", RTX_COMM_ARITH)
 
 /* Multiplication with signed saturation */
-- 
2.17.1






^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 6/9] Native complex operations: Update how complex rotations are handled
  2023-07-17  9:02 [PATCH 0/9] Native complex operations Sylvain Noiry
                   ` (4 preceding siblings ...)
  2023-07-17  9:02 ` [PATCH 5/9] Native complex operations: Add the conjugate op in optabs Sylvain Noiry
@ 2023-07-17  9:02 ` Sylvain Noiry
  2023-07-17  9:02 ` [PATCH 7/9] Native complex operations: Vectorization of native complex operations Sylvain Noiry
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-07-17  9:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Catch complex rotations by 90° and 270° in fold-const.cc as before,
but now convert them into the new COMPLEX_ROT90 and COMPLEX_ROT270
internal functions.  Also add crot90 and crot270 optabs to expose these
operations to the backends.  COMPLEX_ROT90/COMPLEX_ROT270 are lowered
conditionally, depending on whether crot90/crot270 appear in the optab.
Finally, convert a + crot90/270(b) into cadd90/270(a, b), in a similar
way to FMAs.
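
A minimal example of what is caught (compiled with -ffast-math, since
the fold is guarded by the NaN and signed-zero checks):

/* b * I folds to COMPLEX_ROT90 (b); the surrounding addition is then
   combined into COMPLEX_ADD_ROT90 (a, b) when the backend provides a
   cadd90 pattern, in the same spirit as FMA formation.  */
_Complex float
add_rot90 (_Complex float a, _Complex float b)
{
  return a + b * 1.0fi;
}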

gcc/ChangeLog:

	* internal-fn.def: Add COMPLEX_ROT90 and COMPLEX_ROT270.
	* fold-const.cc (fold_binary_loc): Update the folding of
	complex rotations to generate calls to COMPLEX_ROT90 and
	COMPLEX_ROT270.
	* optabs.def: Add crot90/crot270 optabs.
	* tree-complex.cc (init_dont_simulate_again): Catch calls
	to COMPLEX_ROT90 and COMPLEX_ROT270.
	(expand_complex_rotation): Conditionally lower complex
	rotations if no pattern is present in the backend.
	(expand_complex_operations_1): Likewise.
	(convert_crot): Likewise.
	* tree-ssa-math-opts.cc (convert_crot_1): Catch complex
	rotations with additions in a similar way to FMAs.
	(math_opts_dom_walker::after_dom_children): Call convert_crot
	if a COMPLEX_ROT90 or COMPLEX_ROT270 is identified.
---
 gcc/fold-const.cc         | 115 ++++++++++++++++++++++++++-------
 gcc/internal-fn.def       |   2 +
 gcc/optabs.def            |   2 +
 gcc/tree-complex.cc       |  79 ++++++++++++++++++++++-
 gcc/tree-ssa-math-opts.cc | 129 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 302 insertions(+), 25 deletions(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index a02ede79fed..f1224b6a548 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -11609,30 +11609,6 @@ fold_binary_loc (location_t loc, enum tree_code code, tree type,
 	}
       else
 	{
-	  /* Fold z * +-I to __complex__ (-+__imag z, +-__real z).
-	     This is not the same for NaNs or if signed zeros are
-	     involved.  */
-	  if (!HONOR_NANS (arg0)
-	      && !HONOR_SIGNED_ZEROS (arg0)
-	      && COMPLEX_FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      && TREE_CODE (arg1) == COMPLEX_CST
-	      && real_zerop (TREE_REALPART (arg1)))
-	    {
-	      tree rtype = TREE_TYPE (TREE_TYPE (arg0));
-	      if (real_onep (TREE_IMAGPART (arg1)))
-		return
-		  fold_build2_loc (loc, COMPLEX_EXPR, type,
-			       negate_expr (fold_build1_loc (loc, IMAGPART_EXPR,
-							     rtype, arg0)),
-			       fold_build1_loc (loc, REALPART_EXPR, rtype, arg0));
-	      else if (real_minus_onep (TREE_IMAGPART (arg1)))
-		return
-		  fold_build2_loc (loc, COMPLEX_EXPR, type,
-			       fold_build1_loc (loc, IMAGPART_EXPR, rtype, arg0),
-			       negate_expr (fold_build1_loc (loc, REALPART_EXPR,
-							     rtype, arg0)));
-	    }
-
 	  /* Optimize z * conj(z) for floating point complex numbers.
 	     Guarded by flag_unsafe_math_optimizations as non-finite
 	     imaginary components don't produce scalar results.  */
@@ -11645,6 +11621,97 @@ fold_binary_loc (location_t loc, enum tree_code code, tree type,
 	      && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0))
 	    return fold_mult_zconjz (loc, type, arg0);
 	}
+
+      /* Fold z * +-I to __complex__ (-+__imag z, +-__real z).
+	 This is not the same for NaNs or if signed zeros are
+	 involved.  */
+      if (!HONOR_NANS (arg0)
+	  && !HONOR_SIGNED_ZEROS (arg0)
+	  && TREE_CODE (arg1) == COMPLEX_CST
+	  && (COMPLEX_FLOAT_TYPE_P (TREE_TYPE (arg0))
+	      && real_zerop (TREE_REALPART (arg1))))
+	{
+	  if (real_onep (TREE_IMAGPART (arg1)))
+	    {
+	      tree rtype = TREE_TYPE (TREE_TYPE (arg0));
+	      tree cplx_build = fold_build2_loc (loc, COMPLEX_EXPR, type,
+						 negate_expr (fold_build1_loc (loc, IMAGPART_EXPR,
+									       rtype, arg0)),
+	      fold_build1_loc (loc, REALPART_EXPR, rtype, arg0));
+	      if (cplx_build && TREE_CODE (TREE_OPERAND (cplx_build, 0)) != NEGATE_EXPR)
+		return cplx_build;
+
+	      if ((TREE_CODE (arg0) == COMPLEX_EXPR) && real_zerop (TREE_OPERAND (arg0, 1)))
+		return fold_build2_loc (loc, COMPLEX_EXPR, type,
+					TREE_OPERAND (arg0, 1), TREE_OPERAND (arg0, 0));
+
+	      if (TREE_CODE (arg0) == CALL_EXPR)
+		{
+		  if (CALL_EXPR_IFN (arg0) == IFN_COMPLEX_ROT90)
+		    return negate_expr (CALL_EXPR_ARG (arg0, 0));
+		  else if (CALL_EXPR_IFN (arg0) == IFN_COMPLEX_ROT270)
+		    return CALL_EXPR_ARG (arg0, 0);
+		}
+	      else if (TREE_CODE (arg0) == NEGATE_EXPR)
+		return build_call_expr_internal_loc (loc, IFN_COMPLEX_ROT270, TREE_TYPE (arg0), 1, TREE_OPERAND (arg0, 0));
+	      else
+		return build_call_expr_internal_loc (loc, IFN_COMPLEX_ROT90, TREE_TYPE (arg0), 1, arg0);
+	    }
+	  else if (real_minus_onep (TREE_IMAGPART (arg1)))
+	    {
+	      if (real_zerop (TREE_OPERAND (arg0, 1)))
+		return fold_build2_loc (loc, COMPLEX_EXPR, type,
+					TREE_OPERAND (arg0, 1), negate_expr (TREE_OPERAND (arg0, 0)));
+
+	      return build_call_expr_internal_loc (loc, IFN_COMPLEX_ROT270, TREE_TYPE (arg0), 1, fold (arg0));
+	    }
+	}
+
+      /* Fold z * +-I to __complex__ (-+__imag z, +-__real z).
+	 This is not the same for NaNs or if signed zeros are
+	 involved.  */
+      if (!HONOR_NANS (arg0)
+	  && !HONOR_SIGNED_ZEROS (arg0)
+	  && TREE_CODE (arg1) == COMPLEX_CST
+	  && (COMPLEX_INTEGER_TYPE_P (TREE_TYPE (arg0))
+	  && integer_zerop (TREE_REALPART (arg1))))
+	{
+	  if (integer_onep (TREE_IMAGPART (arg1)))
+	    {
+	      tree rtype = TREE_TYPE (TREE_TYPE (arg0));
+	      tree cplx_build = fold_build2_loc (loc, COMPLEX_EXPR, type,
+						 negate_expr (fold_build1_loc (loc, IMAGPART_EXPR,
+									       rtype, arg0)),
+	      fold_build1_loc (loc, REALPART_EXPR, rtype, arg0));
+	      if (cplx_build && TREE_CODE (TREE_OPERAND (cplx_build, 0)) != NEGATE_EXPR)
+		return cplx_build;
+
+	      if ((TREE_CODE (arg0) == COMPLEX_EXPR) && integer_zerop (TREE_OPERAND (arg0, 1)))
+		return fold_build2_loc (loc, COMPLEX_EXPR, type,
+					TREE_OPERAND (arg0, 1), TREE_OPERAND (arg0, 0));
+
+	      if (TREE_CODE (arg0) == CALL_EXPR)
+		{
+		  if (CALL_EXPR_IFN (arg0) == IFN_COMPLEX_ROT90)
+		    return negate_expr (CALL_EXPR_ARG (arg0, 0));
+		  else if (CALL_EXPR_IFN (arg0) == IFN_COMPLEX_ROT270)
+		    return CALL_EXPR_ARG (arg0, 0);
+		}
+	      else if (TREE_CODE (arg0) == NEGATE_EXPR)
+		return build_call_expr_internal_loc (loc, IFN_COMPLEX_ROT270, TREE_TYPE (arg0), 1, TREE_OPERAND (arg0, 0));
+	      else
+		return build_call_expr_internal_loc (loc, IFN_COMPLEX_ROT90, TREE_TYPE (arg0), 1, arg0);
+	    }
+	  else if (integer_minus_onep (TREE_IMAGPART (arg1)))
+	    {
+	      if (integer_zerop (TREE_OPERAND (arg0, 1)))
+		return fold_build2_loc (loc, COMPLEX_EXPR, type,
+					TREE_OPERAND (arg0, 1), negate_expr (TREE_OPERAND (arg0, 0)));
+
+	      return build_call_expr_internal_loc (loc, IFN_COMPLEX_ROT270, TREE_TYPE (arg0), 1, fold (arg0));
+	    }
+	}
+
       goto associate;
 
     case BIT_IOR_EXPR:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index ea750a921ed..e3e32603dc1 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -385,6 +385,8 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
 DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
 DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
 DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ROT90, ECF_CONST, crot90, unary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ROT270, ECF_CONST, crot270, unary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 31475c8afcc..afd15b1f30f 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -330,6 +330,8 @@ OPTAB_D (atan_optab, "atan$a2")
 OPTAB_D (atanh_optab, "atanh$a2")
 OPTAB_D (copysign_optab, "copysign$F$a3")
 OPTAB_D (xorsign_optab, "xorsign$F$a3")
+OPTAB_D (crot90_optab, "crot90$a2")
+OPTAB_D (crot270_optab, "crot270$a2")
 OPTAB_D (cadd90_optab, "cadd90$a3")
 OPTAB_D (cadd270_optab, "cadd270$a3")
 OPTAB_D (cmul_optab, "cmul$a3")
diff --git a/gcc/tree-complex.cc b/gcc/tree-complex.cc
index 63753e4acf4..b5aaa206319 100644
--- a/gcc/tree-complex.cc
+++ b/gcc/tree-complex.cc
@@ -241,7 +241,10 @@ init_dont_simulate_again (void)
 	  switch (gimple_code (stmt))
 	    {
 	    case GIMPLE_CALL:
-	      if (gimple_call_lhs (stmt))
+	      if (gimple_call_combined_fn (stmt) == CFN_COMPLEX_ROT90
+		  || gimple_call_combined_fn (stmt) == CFN_COMPLEX_ROT270)
+		saw_a_complex_op = true;
+	      else if (gimple_call_lhs (stmt))
 	        sim_again_p = is_complex_reg (gimple_call_lhs (stmt));
 	      break;
 
@@ -1727,6 +1730,67 @@ expand_complex_asm (gimple_stmt_iterator *gsi)
     }
 }
 
+/* Expand complex rotations represented as internal functions.
+   This function assumes that a lowered complex rotation is still better
+   than a complex multiplication, else the backend would have redefined
+   crot90 and crot270.  */
+
+static void
+expand_complex_rotation (gimple_stmt_iterator *gsi)
+{
+  gimple *stmt = gsi_stmt (*gsi);
+  tree ac = gimple_call_arg (stmt, 0);
+  gimple_seq stmts = NULL;
+  location_t loc = gimple_location (gsi_stmt (*gsi));
+
+  tree lhs = gimple_get_lhs (stmt);
+  tree type = TREE_TYPE (ac);
+  tree inner_type = TREE_TYPE (type);
+
+
+  tree rr, ri, rb;
+  optab op = optab_for_tree_code (MULT_EXPR, inner_type, optab_default);
+  if (optab_handler (op, TYPE_MODE (type)) != CODE_FOR_nothing)
+  {
+    tree cst_i = build_complex (type, build_zero_cst (inner_type), build_one_cst (inner_type));
+    rb = gimple_build (&stmts, loc, MULT_EXPR, type, ac, cst_i);
+
+    gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+
+    gassign* new_assign = gimple_build_assign (lhs, rb);
+    gimple_set_lhs (new_assign, lhs);
+    gsi_replace (gsi, new_assign, true);
+
+    update_complex_assignment (gsi, NULL, NULL, rb);
+  }
+  else
+  {
+    tree ar = extract_component (gsi, ac, REAL_P, true);
+    tree ai = extract_component (gsi, ac, IMAG_P, true);
+
+    if (gimple_call_internal_fn (stmt) == IFN_COMPLEX_ROT90)
+    {
+      rr = gimple_build (&stmts, loc, NEGATE_EXPR, inner_type, ai);
+      ri = ar;
+    }
+    else if (gimple_call_internal_fn (stmt) == IFN_COMPLEX_ROT270)
+    {
+      rr = ai;
+      ri = gimple_build (&stmts, loc, NEGATE_EXPR, inner_type, ar);
+    }
+    else
+      gcc_unreachable ();
+
+    gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+
+    gassign* new_assign = gimple_build_assign (gimple_get_lhs (stmt), COMPLEX_EXPR, rr, ri);
+    gimple_set_lhs (new_assign, gimple_get_lhs (stmt));
+    gsi_replace (gsi, new_assign, true);
+
+    update_complex_assignment (gsi, rr, ri);
+  }
+}
+
 /* Returns true if a complex component is a constant.  */
 
 static bool
@@ -1843,6 +1907,19 @@ expand_complex_operations_1 (gimple_stmt_iterator *gsi)
 	if (gimple_code (stmt) == GIMPLE_COND)
 	  return;
 
+	if (is_gimple_call (stmt)
+	    && (gimple_call_combined_fn (stmt) == CFN_COMPLEX_ROT90
+		|| gimple_call_combined_fn (stmt) == CFN_COMPLEX_ROT270))
+	{
+	  if (!direct_internal_fn_supported_p (gimple_call_internal_fn (stmt), type,
+					      bb_optimization_type (gimple_bb (stmt))))
+	    expand_complex_rotation (gsi);
+	  else
+	    update_complex_components (gsi, stmt, NULL, NULL, gimple_call_lhs (stmt));
+
+	  return;
+	}
+
 	if (TREE_CODE (type) == COMPLEX_TYPE)
 	  expand_complex_move (gsi, type);
 	else if (is_gimple_assign (stmt)
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index 68fc518b1ab..c311e9ab29a 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -3286,6 +3286,119 @@ last_fma_candidate_feeds_initial_phi (fma_deferring_state *state,
   return false;
 }
 
+/* Convert a complex rotation to an addition with one rotated operand,
+   in a similar way to FMAs.  */
+
+static void
+convert_crot_1 (tree crot_result, tree op1, internal_fn cadd_fn)
+{
+  gimple *use_stmt;
+  imm_use_iterator imm_iter;
+  gcall *cadd_stmt;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, crot_result)
+    {
+      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+      tree add_op, result = crot_result;
+
+      if (is_gimple_debug (use_stmt))
+	continue;
+
+      add_op = (gimple_assign_rhs1 (use_stmt) != result)
+			? gimple_assign_rhs1 (use_stmt) : gimple_assign_rhs2 (use_stmt);
+
+
+      cadd_stmt = gimple_build_call_internal (cadd_fn, 2, add_op, op1);
+      gimple_set_lhs (cadd_stmt, gimple_get_lhs (use_stmt));
+      gimple_call_set_nothrow (cadd_stmt, !stmt_can_throw_internal (cfun,
+								    use_stmt));
+      gsi_replace (&gsi, cadd_stmt, true);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Generated COMPLEX_ADD_ROT ");
+	  print_gimple_stmt (dump_file, gsi_stmt (gsi), 0, TDF_NONE);
+	  fprintf (dump_file, "\n");
+	}
+    }
+}
+
+
+/* Convert a complex rotation to an addition with one rotated operand,
+   in a similar way to FMAs.  */
+
+static bool
+convert_crot (gimple *crot_stmt, tree op1, combined_fn crot_kind)
+{
+  internal_fn cadd_fn;
+  switch (crot_kind)
+    {
+    case CFN_COMPLEX_ROT90:
+      cadd_fn = IFN_COMPLEX_ADD_ROT90;
+      break;
+    case CFN_COMPLEX_ROT270:
+      cadd_fn = IFN_COMPLEX_ADD_ROT270;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+
+  tree crot_result = gimple_get_lhs (crot_stmt);
+  /* If there isn't a LHS then this can't be a CADD.  There can be no LHS
+     if the statement was left just for the side-effects.  */
+  if (!crot_result)
+    return false;
+  tree type = TREE_TYPE (crot_result);
+  gimple *use_stmt;
+  use_operand_p use_p;
+  imm_use_iterator imm_iter;
+
+  if (COMPLEX_FLOAT_TYPE_P (type)
+      && flag_fp_contract_mode == FP_CONTRACT_OFF)
+    return false;
+
+  /* We don't want to do bitfield reduction ops.  */
+  if (INTEGRAL_TYPE_P (type)
+      && (!type_has_mode_precision_p (type) || TYPE_OVERFLOW_TRAPS (type)))
+    return false;
+
+  /* If the target doesn't support it, don't generate it. */
+  optimization_type opt_type = bb_optimization_type (gimple_bb (crot_stmt));
+  if (!direct_internal_fn_supported_p (cadd_fn, type, opt_type))
+    return false;
+
+  /* If the crot has zero uses, it is kept around probably because
+     of -fnon-call-exceptions.  Don't optimize it away in that case,
+     it is DCE job.  */
+  if (has_zero_uses (crot_result))
+    return false;
+
+  /* Make sure that the crot statement becomes dead after
+     the transformation, so that all uses are transformed into CADDs.
+     This means we assume that a CADD operation has the same cost
+     as an addition.  */
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, crot_result)
+    {
+      use_stmt = USE_STMT (use_p);
+
+      if (is_gimple_debug (use_stmt))
+	continue;
+
+      if (gimple_bb (use_stmt) != gimple_bb (crot_stmt))
+	return false;
+
+      if (!is_gimple_assign (use_stmt))
+	return false;
+
+      if (gimple_assign_rhs_code (use_stmt) != PLUS_EXPR)
+	return false;
+    }
+
+  convert_crot_1 (crot_result, op1, cadd_fn);
+  return true;
+}
+
 /* Combine the multiplication at MUL_STMT with operands MULOP1 and MULOP2
    with uses in additions and subtractions to form fused multiply-add
    operations.  Returns true if successful and MUL_STMT should be removed.
@@ -5636,6 +5749,22 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
 	      cancel_fma_deferring (&fma_state);
 	      break;
 
+	    case CFN_COMPLEX_ROT90:
+	    case CFN_COMPLEX_ROT270:
+	      if (gimple_call_lhs (stmt)
+		  && convert_crot (stmt,
+				   gimple_call_arg (stmt, 0),
+				   gimple_call_combined_fn (stmt)))
+		{
+		  unlink_stmt_vdef (stmt);
+		  if (gsi_remove (&gsi, true)
+		      && gimple_purge_dead_eh_edges (bb))
+		    *m_cfg_changed_p = true;
+		  release_defs (stmt);
+		  continue;
+		}
+	      break;
+
 	    default:
 	      break;
 	    }
-- 
2.17.1






^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 7/9] Native complex operations: Vectorization of native complex operations
  2023-07-17  9:02 [PATCH 0/9] Native complex operations Sylvain Noiry
                   ` (5 preceding siblings ...)
  2023-07-17  9:02 ` [PATCH 6/9] Native complex operations: Update how complex rotations are handled Sylvain Noiry
@ 2023-07-17  9:02 ` Sylvain Noiry
  2023-07-17  9:02 ` [PATCH 8/9] Native complex operations: Add explicit vector of complex Sylvain Noiry
  2023-07-17  9:02 ` [PATCH 9/9] Native complex operation: Experimental support in x86 backend Sylvain Noiry
  8 siblings, 0 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-07-17  9:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Add vectors of complex types to vectorize native operations.  Because
the vectorizer was designed to work with scalar elements, several
functions and target hooks have to be adapted or duplicated to support
complex types.  After that, the vectorization of native complex
operations follows exactly the same flow as scalar operations.
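
For instance, assuming a backend that advertises a vector-of-complex
mode (say V2SC, named here only for illustration) through the new
hooks, a loop like the following now goes through the usual vectorizer
flow:

/* Each iteration keeps its complex multiply; with native complex
   vectorization it maps to a vector complex multiply (e.g. on a
   hypothetical V2SC mode) instead of being scalarized first.  */
void
cmul_loop (_Complex float *restrict c, const _Complex float *restrict a,
           const _Complex float *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    c[i] = a[i] * b[i];
}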

gcc/ChangeLog:

	* target.def: Add preferred_simd_mode_complex and
	related_mode_complex by duplicating their scalar counterparts.
	* targhooks.h: Add default_preferred_simd_mode_complex and
	default_vectorize_related_mode_complex.
	* targhooks.cc (default_preferred_simd_mode_complex): New;
	default implementation of preferred_simd_mode_complex.
	(default_vectorize_related_mode_complex): New; default
	implementation of related_mode_complex.
	* doc/tm.texi: Document
	TARGET_VECTORIZE_PREFERRED_SIMD_MODE_COMPLEX
	and TARGET_VECTORIZE_RELATED_MODE_COMPLEX.
	* doc/tm.texi.in: Add TARGET_VECTORIZE_PREFERRED_SIMD_MODE_COMPLEX
	and TARGET_VECTORIZE_RELATED_MODE_COMPLEX.
	* emit-rtl.cc (init_emit_once): Add the zero constant for vectors
	of complex modes.
	* genmodes.cc (vector_class): Add case for vectors of complex.
	(complete_mode): Likewise.
	(make_complex_modes): Likewise.
	* gensupport.cc (match_pattern): Likewise.
	* machmode.h: Add vectors of complex in predicates and redefine
	mode_for_vector and related_vector_mode for complex types.
	* mode-classes.def: Add MODE_VECTOR_COMPLEX_INT and
	MODE_VECTOR_COMPLEX_FLOAT classes.
	* simplify-rtx.cc (simplify_context::simplify_binary_operation):
	FIXME: do not simplify binary operations with complex vector
	modes.
	* stor-layout.cc (mode_for_vector): Adapt for complex modes
	using sub-functions calling a common one.
	(related_vector_mode): Implement the function for complex modes.
	* tree-vect-generic.cc (type_for_widest_vector_mode): Add
	cases for complex modes.
	* tree-vect-stmts.cc (get_related_vectype_for_scalar_type):
	Adapt for complex modes.
	* tree.cc (build_vector_type_for_mode): Add cases for complex
	modes.
---
 gcc/doc/tm.texi          | 31 ++++++++++++++++++++++++
 gcc/doc/tm.texi.in       |  4 ++++
 gcc/emit-rtl.cc          | 10 ++++++++
 gcc/genmodes.cc          |  8 +++++++
 gcc/gensupport.cc        |  3 +++
 gcc/machmode.h           | 19 +++++++++++----
 gcc/mode-classes.def     |  2 ++
 gcc/simplify-rtx.cc      |  4 ++++
 gcc/stor-layout.cc       | 43 +++++++++++++++++++++++++++++----
 gcc/target.def           | 39 ++++++++++++++++++++++++++++++
 gcc/targhooks.cc         | 29 ++++++++++++++++++++++
 gcc/targhooks.h          |  4 ++++
 gcc/tree-vect-generic.cc |  4 ++++
 gcc/tree-vect-stmts.cc   | 52 +++++++++++++++++++++++++++-------------
 gcc/tree.cc              |  2 ++
 15 files changed, 230 insertions(+), 24 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index b73147aea9f..955a1f983d0 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6229,6 +6229,13 @@ equal to @code{word_mode}, because the vectorizer can do some
 transformations even in absence of specialized @acronym{SIMD} hardware.
 @end deftypefn
 
+@deftypefn {Target Hook} machine_mode TARGET_VECTORIZE_PREFERRED_SIMD_MODE_COMPLEX (complex_mode @var{mode})
+This hook should return the preferred mode for vectorizing complex
+mode @var{mode}.  The default is
+equal to @code{word_mode}, because the vectorizer can do some
+transformations even in absence of specialized @acronym{SIMD} hardware.
+@end deftypefn
+
 @deftypefn {Target Hook} machine_mode TARGET_VECTORIZE_SPLIT_REDUCTION (machine_mode)
 This hook should return the preferred mode to split the final reduction
 step on @var{mode} to.  The reduction is then carried out reducing upper
@@ -6291,6 +6298,30 @@ requested mode, returning a mode with the same size as @var{vector_mode}
 when @var{nunits} is zero.  This is the correct behavior for most targets.
 @end deftypefn
 
+@deftypefn {Target Hook} opt_machine_mode TARGET_VECTORIZE_RELATED_MODE_COMPLEX (machine_mode @var{vector_mode}, complex_mode @var{element_mode}, poly_uint64 @var{nunits})
+If a piece of code is using vector mode @var{vector_mode} and also wants
+to operate on elements of mode @var{element_mode}, return the vector mode
+it should use for those elements.  If @var{nunits} is nonzero, ensure that
+the mode has exactly @var{nunits} elements, otherwise pick whichever vector
+size pairs the most naturally with @var{vector_mode}.  Return an empty
+@code{opt_machine_mode} if there is no supported vector mode with the
+required properties.
+
+There is no prescribed way of handling the case in which @var{nunits}
+is zero.  One common choice is to pick a vector mode with the same size
+as @var{vector_mode}; this is the natural choice if the target has a
+fixed vector size.  Another option is to choose a vector mode with the
+same number of elements as @var{vector_mode}; this is the natural choice
+if the target has a fixed number of elements.  Alternatively, the hook
+might choose a middle ground, such as trying to keep the number of
+elements as similar as possible while applying maximum and minimum
+vector sizes.
+
+The default implementation uses @code{mode_for_vector} to find the
+requested mode, returning a mode with the same size as @var{vector_mode}
+when @var{nunits} is zero.  This is the correct behavior for most targets.
+@end deftypefn
+
 @deftypefn {Target Hook} opt_machine_mode TARGET_VECTORIZE_GET_MASK_MODE (machine_mode @var{mode})
 Return the mode to use for a vector mask that holds one boolean
 result for each element of vector mode @var{mode}.  The returned mask mode
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index dd39e450903..a8dc1155f13 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4195,12 +4195,16 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 
+@hook TARGET_VECTORIZE_PREFERRED_SIMD_MODE_COMPLEX
+
 @hook TARGET_VECTORIZE_SPLIT_REDUCTION
 
 @hook TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
 
 @hook TARGET_VECTORIZE_RELATED_MODE
 
+@hook TARGET_VECTORIZE_RELATED_MODE_COMPLEX
+
 @hook TARGET_VECTORIZE_GET_MASK_MODE
 
 @hook TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE
diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index 22012bfea13..e454f452d46 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -6276,6 +6276,16 @@ init_emit_once (void)
 	targetm.gen_rtx_complex (mode, inner, inner);
     }
 
+  FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_COMPLEX_INT)
+  {
+    const_tiny_rtx[0][(int) mode] = gen_const_vector (mode, 0);
+  }
+
+  FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_COMPLEX_FLOAT)
+  {
+    const_tiny_rtx[0][(int) mode] = gen_const_vector (mode, 0);
+  }
+
   FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_BOOL)
     {
       const_tiny_rtx[0][(int) mode] = gen_const_vector (mode, 0);
diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
index 55ac2adb559..ab113720948 100644
--- a/gcc/genmodes.cc
+++ b/gcc/genmodes.cc
@@ -142,6 +142,8 @@ vector_class (enum mode_class cl)
     case MODE_UFRACT: return MODE_VECTOR_UFRACT;
     case MODE_ACCUM: return MODE_VECTOR_ACCUM;
     case MODE_UACCUM: return MODE_VECTOR_UACCUM;
+    case MODE_COMPLEX_INT: return MODE_VECTOR_COMPLEX_INT;
+    case MODE_COMPLEX_FLOAT: return MODE_VECTOR_COMPLEX_FLOAT;
     default:
       error ("no vector class for class %s", mode_class_names[cl]);
       return MODE_RANDOM;
@@ -400,6 +402,8 @@ complete_mode (struct mode_data *m)
     case MODE_VECTOR_UFRACT:
     case MODE_VECTOR_ACCUM:
     case MODE_VECTOR_UACCUM:
+    case MODE_VECTOR_COMPLEX_INT:
+    case MODE_VECTOR_COMPLEX_FLOAT:
       /* Vector modes should have a component and a number of components.  */
       validate_mode (m, UNSET, UNSET, SET, SET, UNSET);
       if (m->component->precision != (unsigned int)-1)
@@ -462,6 +466,10 @@ make_complex_modes (enum mode_class cl,
       if (m->boolean)
 	continue;
 
+      /* Skip already-created modes.  */
+      if (m->complex)
+	continue;
+
       m_len = strlen (m->name);
       /* The leading "1 +" is in case we prepend a "C" below.  */
       buf = (char *) xmalloc (1 + m_len + 1);
diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index 9aa2ba69fcd..de798a70cbd 100644
--- a/gcc/gensupport.cc
+++ b/gcc/gensupport.cc
@@ -3747,16 +3747,19 @@ match_pattern (optab_pattern *p, const char *name, const char *pat)
 		if (*p == 0
 		    && (! force_int || mode_class[i] == MODE_INT
 			|| mode_class[i] == MODE_COMPLEX_INT
+			|| mode_class[i] == MODE_VECTOR_COMPLEX_INT
 			|| mode_class[i] == MODE_VECTOR_INT)
 		    && (! force_partial_int
 			|| mode_class[i] == MODE_INT
 			|| mode_class[i] == MODE_COMPLEX_INT
+			|| mode_class[i] == MODE_VECTOR_COMPLEX_INT
 			|| mode_class[i] == MODE_PARTIAL_INT
 			|| mode_class[i] == MODE_VECTOR_INT)
 		    && (! force_float
 			|| mode_class[i] == MODE_FLOAT
 			|| mode_class[i] == MODE_DECIMAL_FLOAT
 			|| mode_class[i] == MODE_COMPLEX_FLOAT
+			|| mode_class[i] == MODE_VECTOR_COMPLEX_FLOAT
 			|| mode_class[i] == MODE_VECTOR_FLOAT)
 		    && (! force_fixed
 			|| mode_class[i] == MODE_FRACT
diff --git a/gcc/machmode.h b/gcc/machmode.h
index b1937eafdc3..e7d67e2dce1 100644
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -110,6 +110,7 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
    || GET_MODE_CLASS (MODE) == MODE_PARTIAL_INT \
    || GET_MODE_CLASS (MODE) == MODE_COMPLEX_INT \
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_BOOL \
+   || GET_MODE_CLASS (MODE) == MODE_VECTOR_COMPLEX_INT \
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_INT)
 
 /* Nonzero if MODE is a floating-point mode.  */
@@ -117,17 +118,22 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
   (GET_MODE_CLASS (MODE) == MODE_FLOAT	\
    || GET_MODE_CLASS (MODE) == MODE_DECIMAL_FLOAT \
    || GET_MODE_CLASS (MODE) == MODE_COMPLEX_FLOAT \
+   || GET_MODE_CLASS (MODE) == MODE_VECTOR_COMPLEX_FLOAT \
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT)
 
-#define COMPLEX_INT_MODE_P(MODE) \
-   (GET_MODE_CLASS (MODE) == MODE_COMPLEX_INT)
+#define COMPLEX_INT_MODE_P(MODE)   	\
+  (GET_MODE_CLASS (MODE) == MODE_VECTOR_COMPLEX_INT \
+   || GET_MODE_CLASS (MODE) == MODE_COMPLEX_INT)
 
-#define COMPLEX_FLOAT_MODE_P(MODE) \
-  (GET_MODE_CLASS (MODE) == MODE_COMPLEX_FLOAT)
+#define COMPLEX_FLOAT_MODE_P(MODE)		\
+   (GET_MODE_CLASS (MODE) == MODE_VECTOR_COMPLEX_FLOAT \
+    || GET_MODE_CLASS (MODE) == MODE_COMPLEX_FLOAT)
 
 /* Nonzero if MODE is a complex mode.  */
 #define COMPLEX_MODE_P(MODE)			\
   (GET_MODE_CLASS (MODE) == MODE_COMPLEX_INT	\
+   || GET_MODE_CLASS (MODE) == MODE_VECTOR_COMPLEX_INT	\
+   || GET_MODE_CLASS (MODE) == MODE_VECTOR_COMPLEX_FLOAT	\
    || GET_MODE_CLASS (MODE) == MODE_COMPLEX_FLOAT)
 
 /* Nonzero if MODE is a vector mode.  */
@@ -138,6 +144,8 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_FRACT	\
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_UFRACT	\
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_ACCUM	\
+   || GET_MODE_CLASS (MODE) == MODE_VECTOR_COMPLEX_INT	\
+   || GET_MODE_CLASS (MODE) == MODE_VECTOR_COMPLEX_FLOAT	\
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_UACCUM)
 
 /* Nonzero if MODE is a scalar integral mode.  */
@@ -927,6 +935,9 @@ extern opt_machine_mode bitwise_mode_for_mode (machine_mode);
 extern opt_machine_mode mode_for_vector (scalar_mode, poly_uint64);
 extern opt_machine_mode related_vector_mode (machine_mode, scalar_mode,
 					     poly_uint64 = 0);
+extern opt_machine_mode mode_for_vector (complex_mode, poly_uint64);
+extern opt_machine_mode related_vector_mode (machine_mode,
+					     complex_mode, poly_uint64 = 0);
 extern opt_machine_mode related_int_vector_mode (machine_mode);
 
 /* A class for iterating through possible bitfield modes.  */
diff --git a/gcc/mode-classes.def b/gcc/mode-classes.def
index de42d7ee6fb..cc6bcaeb026 100644
--- a/gcc/mode-classes.def
+++ b/gcc/mode-classes.def
@@ -32,9 +32,11 @@ along with GCC; see the file COPYING3.  If not see
   DEF_MODE_CLASS (MODE_COMPLEX_FLOAT),					   \
   DEF_MODE_CLASS (MODE_VECTOR_BOOL),	/* vectors of single bits */	   \
   DEF_MODE_CLASS (MODE_VECTOR_INT),	/* SIMD vectors */		   \
+  DEF_MODE_CLASS (MODE_VECTOR_COMPLEX_INT), /* SIMD vectors */		   \
   DEF_MODE_CLASS (MODE_VECTOR_FRACT),	/* SIMD vectors */		   \
   DEF_MODE_CLASS (MODE_VECTOR_UFRACT),	/* SIMD vectors */		   \
   DEF_MODE_CLASS (MODE_VECTOR_ACCUM),	/* SIMD vectors */		   \
   DEF_MODE_CLASS (MODE_VECTOR_UACCUM),	/* SIMD vectors */		   \
   DEF_MODE_CLASS (MODE_VECTOR_FLOAT),                                      \
+  DEF_MODE_CLASS (MODE_VECTOR_COMPLEX_FLOAT),                              \
   DEF_MODE_CLASS (MODE_OPAQUE)          /* opaque modes */
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index d7315d82aa3..0b988bf1484 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -2653,6 +2653,10 @@ simplify_context::simplify_binary_operation (rtx_code code, machine_mode mode,
   gcc_assert (GET_RTX_CLASS (code) != RTX_COMPARE);
   gcc_assert (GET_RTX_CLASS (code) != RTX_COMM_COMPARE);
 
+  /* FIXME */
+  if (VECTOR_MODE_P (mode) && COMPLEX_MODE_P (mode))
+    return NULL_RTX;
+
   /* Make sure the constant is second.  */
   if (GET_RTX_CLASS (code) == RTX_COMM_ARITH
       && swap_commutative_operands_p (op0, op1))
diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc
index a6deed4424b..5a7218999e8 100644
--- a/gcc/stor-layout.cc
+++ b/gcc/stor-layout.cc
@@ -480,8 +480,8 @@ bitwise_type_for_mode (machine_mode mode)
    elements of mode INNERMODE, if one exists.  The returned mode can be
    either an integer mode or a vector mode.  */
 
-opt_machine_mode
-mode_for_vector (scalar_mode innermode, poly_uint64 nunits)
+static opt_machine_mode
+mode_for_vector (machine_mode innermode, poly_uint64 nunits)
 {
   machine_mode mode;
 
@@ -496,8 +496,14 @@ mode_for_vector (scalar_mode innermode, poly_uint64 nunits)
     mode = MIN_MODE_VECTOR_ACCUM;
   else if (SCALAR_UACCUM_MODE_P (innermode))
     mode = MIN_MODE_VECTOR_UACCUM;
-  else
+  else if (SCALAR_INT_MODE_P (innermode))
     mode = MIN_MODE_VECTOR_INT;
+  else if (COMPLEX_FLOAT_MODE_P (innermode))
+    mode = MIN_MODE_VECTOR_COMPLEX_FLOAT;
+  else if (COMPLEX_INT_MODE_P (innermode))
+    mode = MIN_MODE_VECTOR_COMPLEX_INT;
+  else
+    gcc_unreachable ();
 
   /* Only check the broader vector_mode_supported_any_target_p here.
      We'll filter through target-specific availability and
@@ -511,7 +517,7 @@ mode_for_vector (scalar_mode innermode, poly_uint64 nunits)
   /* For integers, try mapping it to a same-sized scalar mode.  */
   if (GET_MODE_CLASS (innermode) == MODE_INT)
     {
-      poly_uint64 nbits = nunits * GET_MODE_BITSIZE (innermode);
+      poly_uint64 nbits = nunits * GET_MODE_BITSIZE (innermode).coeffs[0];
       if (int_mode_for_size (nbits, 0).exists (&mode)
 	  && have_regs_of_mode[mode])
 	return mode;
@@ -520,6 +526,26 @@ mode_for_vector (scalar_mode innermode, poly_uint64 nunits)
   return opt_machine_mode ();
 }
 
+/* Find a mode that is suitable for representing a vector with NUNITS
+   elements of scalar mode INNERMODE, if one exists.  The returned mode
+   can be either an integer mode or a vector mode.  */
+
+opt_machine_mode
+mode_for_vector (scalar_mode innermode, poly_uint64 nunits)
+{
+  return mode_for_vector (machine_mode (innermode), nunits);
+}
+
+/* Find a mode that is suitable for representing a vector with NUNITS
+   elements of complex mode INNERMODE, if one exists.  The returned mode
+   can be either an integer mode or a vector mode.  */
+
+opt_machine_mode
+mode_for_vector (complex_mode innermode, poly_uint64 nunits)
+{
+  return mode_for_vector (machine_mode (innermode), nunits);
+}
+
 /* If a piece of code is using vector mode VECTOR_MODE and also wants
    to operate on elements of mode ELEMENT_MODE, return the vector mode
    it should use for those elements.  If NUNITS is nonzero, ensure that
@@ -540,6 +566,15 @@ related_vector_mode (machine_mode vector_mode, scalar_mode element_mode,
   return targetm.vectorize.related_mode (vector_mode, element_mode, nunits);
 }
 
+opt_machine_mode
+related_vector_mode (machine_mode vector_mode,
+		     complex_mode element_mode, poly_uint64 nunits)
+{
+  gcc_assert (VECTOR_MODE_P (vector_mode));
+  return targetm.vectorize.related_mode_complex (vector_mode, element_mode,
+						 nunits);
+}
+
 /* If a piece of code is using vector mode VECTOR_MODE and also wants
    to operate on integer vectors with the same element size and number
    of elements, return the vector mode it should use.  Return an empty
diff --git a/gcc/target.def b/gcc/target.def
index ee1dfdc7565..246665bf90f 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1943,6 +1943,18 @@ transformations even in absence of specialized @acronym{SIMD} hardware.",
  (scalar_mode mode),
  default_preferred_simd_mode)
 
+/* Returns the preferred mode for SIMD operations for the specified
+   complex mode.  */
+DEFHOOK
+(preferred_simd_mode_complex,
+ "This hook should return the preferred mode for vectorizing complex\n\
+mode @var{mode}.  The default is\n\
+equal to @code{word_mode}, because the vectorizer can do some\n\
+transformations even in absence of specialized @acronym{SIMD} hardware.",
+ machine_mode,
+ (complex_mode mode),
+ default_preferred_simd_mode_complex)
+
 /* Returns the preferred mode for splitting SIMD reductions to.  */
 DEFHOOK
 (split_reduction,
@@ -2017,6 +2029,33 @@ when @var{nunits} is zero.  This is the correct behavior for most targets.",
  (machine_mode vector_mode, scalar_mode element_mode, poly_uint64 nunits),
  default_vectorize_related_mode)
 
+DEFHOOK
+(related_mode_complex,
+ "If a piece of code is using vector mode @var{vector_mode} and also wants\n\
+to operate on elements of mode @var{element_mode}, return the vector mode\n\
+it should use for those elements.  If @var{nunits} is nonzero, ensure that\n\
+the mode has exactly @var{nunits} elements, otherwise pick whichever vector\n\
+size pairs the most naturally with @var{vector_mode}.  Return an empty\n\
+@code{opt_machine_mode} if there is no supported vector mode with the\n\
+required properties.\n\
+\n\
+There is no prescribed way of handling the case in which @var{nunits}\n\
+is zero.  One common choice is to pick a vector mode with the same size\n\
+as @var{vector_mode}; this is the natural choice if the target has a\n\
+fixed vector size.  Another option is to choose a vector mode with the\n\
+same number of elements as @var{vector_mode}; this is the natural choice\n\
+if the target has a fixed number of elements.  Alternatively, the hook\n\
+might choose a middle ground, such as trying to keep the number of\n\
+elements as similar as possible while applying maximum and minimum\n\
+vector sizes.\n\
+\n\
+The default implementation uses @code{mode_for_vector} to find the\n\
+requested mode, returning a mode with the same size as @var{vector_mode}\n\
+when @var{nunits} is zero.  This is the correct behavior for most targets.",
+ opt_machine_mode,
+ (machine_mode vector_mode, complex_mode element_mode, poly_uint64 nunits),
+ default_vectorize_related_mode_complex)
+
 /* Function to get a target mode for a vector mask.  */
 DEFHOOK
 (get_mask_mode,
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index 4ea40c643a8..be3d80a0773 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -1532,6 +1532,15 @@ default_preferred_simd_mode (scalar_mode)
   return word_mode;
 }
 
+/* By default, only attempt to parallelize bitwise operations, and
+   possibly adds/subtracts using bit-twiddling.  */
+
+machine_mode
+default_preferred_simd_mode_complex (complex_mode)
+{
+  return word_mode;
+}
+
 /* By default, call gen_rtx_CONCAT.  */
 
 rtx
@@ -1733,6 +1742,26 @@ default_vectorize_related_mode (machine_mode vector_mode,
   return opt_machine_mode ();
 }
 
+
+/* The default implementation of TARGET_VECTORIZE_RELATED_MODE_COMPLEX.  */
+
+opt_machine_mode
+default_vectorize_related_mode_complex (machine_mode vector_mode,
+					complex_mode element_mode,
+					poly_uint64 nunits)
+{
+  machine_mode result_mode;
+  if ((maybe_ne (nunits, 0U)
+       || multiple_p (GET_MODE_SIZE (vector_mode),
+		      GET_MODE_SIZE (element_mode), &nunits))
+      && mode_for_vector (element_mode, nunits).exists (&result_mode)
+      && VECTOR_MODE_P (result_mode)
+      && targetm.vector_mode_supported_p (result_mode))
+    return result_mode;
+
+  return opt_machine_mode ();
+}
+
 /* By default a vector of integers is used as a mask.  */
 
 opt_machine_mode
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 811cd6165de..2fff5ba4640 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -115,11 +115,15 @@ default_builtin_support_vector_misalignment (machine_mode mode,
 					     const_tree,
 					     int, bool);
 extern machine_mode default_preferred_simd_mode (scalar_mode mode);
+extern machine_mode default_preferred_simd_mode_complex (complex_mode mode);
 extern machine_mode default_split_reduction (machine_mode);
 extern unsigned int default_autovectorize_vector_modes (vector_modes *, bool);
 extern opt_machine_mode default_vectorize_related_mode (machine_mode,
 							scalar_mode,
 							poly_uint64);
+extern opt_machine_mode default_vectorize_related_mode_complex (machine_mode,
+								complex_mode,
+								poly_uint64);
 extern opt_machine_mode default_get_mask_mode (machine_mode);
 extern bool default_empty_mask_is_expensive (unsigned);
 extern vector_costs *default_vectorize_create_costs (vec_info *, bool);
diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index a7e6cb87a5e..718b144ec23 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -1363,6 +1363,10 @@ type_for_widest_vector_mode (tree type, optab op)
     mode = MIN_MODE_VECTOR_ACCUM;
   else if (SCALAR_UACCUM_MODE_P (inner_mode))
     mode = MIN_MODE_VECTOR_UACCUM;
+  else if (COMPLEX_INT_MODE_P (inner_mode))
+    mode = MIN_MODE_VECTOR_COMPLEX_INT;
+  else if (COMPLEX_FLOAT_MODE_P (inner_mode))
+    mode = MIN_MODE_VECTOR_COMPLEX_FLOAT;
   else if (inner_mode == BImode)
     mode = MIN_MODE_VECTOR_BOOL;
   else
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 10e71178ce7..2852832b7db 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12272,18 +12272,27 @@ get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
 				     tree scalar_type, poly_uint64 nunits)
 {
   tree orig_scalar_type = scalar_type;
-  scalar_mode inner_mode;
+  scalar_mode scal_mode;
+  complex_mode cplx_mode;
+  machine_mode inner_mode;
   machine_mode simd_mode;
   tree vectype;
+  bool cplx = false;
 
-  if ((!INTEGRAL_TYPE_P (scalar_type)
+  if (is_complex_int_mode (TYPE_MODE (scalar_type), &cplx_mode)
+      || is_complex_float_mode (TYPE_MODE (scalar_type), &cplx_mode))
+    cplx = true;
+
+  if ((!cplx && !INTEGRAL_TYPE_P (scalar_type)
        && !POINTER_TYPE_P (scalar_type)
        && !SCALAR_FLOAT_TYPE_P (scalar_type))
-      || (!is_int_mode (TYPE_MODE (scalar_type), &inner_mode)
-	  && !is_float_mode (TYPE_MODE (scalar_type), &inner_mode)))
+      || (!cplx && !is_int_mode (TYPE_MODE (scalar_type), &scal_mode)
+      && !is_float_mode (TYPE_MODE (scalar_type), &scal_mode)))
     return NULL_TREE;
 
-  unsigned int nbytes = GET_MODE_SIZE (inner_mode);
+  unsigned int nbytes
+    = (cplx) ? GET_MODE_SIZE (cplx_mode) : GET_MODE_SIZE (scal_mode);
+  inner_mode = (cplx) ? machine_mode (cplx_mode) : machine_mode (scal_mode);
 
   /* Interoperability between modes requires one to be a constant multiple
      of the other, so that the number of vectors required for each operation
@@ -12301,19 +12310,20 @@ get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
      they support the proper result truncation/extension.
      We also make sure to build vector types with INTEGER_TYPE
      component type only.  */
-  if (INTEGRAL_TYPE_P (scalar_type)
-      && (GET_MODE_BITSIZE (inner_mode) != TYPE_PRECISION (scalar_type)
+  if (!cplx && INTEGRAL_TYPE_P (scalar_type)
+      && (GET_MODE_BITSIZE (scal_mode) != TYPE_PRECISION (scalar_type)
 	  || TREE_CODE (scalar_type) != INTEGER_TYPE))
-    scalar_type = build_nonstandard_integer_type (GET_MODE_BITSIZE (inner_mode),
-						  TYPE_UNSIGNED (scalar_type));
+    scalar_type =
+      build_nonstandard_integer_type (GET_MODE_BITSIZE (scal_mode),
+				      TYPE_UNSIGNED (scalar_type));
 
   /* We shouldn't end up building VECTOR_TYPEs of non-scalar components.
      When the component mode passes the above test simply use a type
      corresponding to that mode.  The theory is that any use that
      would cause problems with this will disable vectorization anyway.  */
-  else if (!SCALAR_FLOAT_TYPE_P (scalar_type)
+  else if (!cplx && !SCALAR_FLOAT_TYPE_P (scalar_type)
 	   && !INTEGRAL_TYPE_P (scalar_type))
-    scalar_type = lang_hooks.types.type_for_mode (inner_mode, 1);
+    scalar_type = lang_hooks.types.type_for_mode (scal_mode, 1);
 
   /* We can't build a vector type of elements with alignment bigger than
      their size.  */
@@ -12331,7 +12341,10 @@ get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
   if (prevailing_mode == VOIDmode)
     {
       gcc_assert (known_eq (nunits, 0U));
-      simd_mode = targetm.vectorize.preferred_simd_mode (inner_mode);
+
+      simd_mode = (cplx)
+	? targetm.vectorize.preferred_simd_mode_complex (cplx_mode)
+	: targetm.vectorize.preferred_simd_mode (scal_mode);
       if (SCALAR_INT_MODE_P (simd_mode))
 	{
 	  /* Traditional behavior is not to take the integer mode
@@ -12342,13 +12355,19 @@ get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
 	     Note that nunits == 1 is allowed in order to support single
 	     element vector types.  */
 	  if (!multiple_p (GET_MODE_SIZE (simd_mode), nbytes, &nunits)
-	      || !mode_for_vector (inner_mode, nunits).exists (&simd_mode))
+	      || !((cplx)
+		? mode_for_vector (cplx_mode, nunits).exists (&simd_mode)
+		: mode_for_vector (scal_mode, nunits).exists (&simd_mode)))
 	    return NULL_TREE;
 	}
     }
   else if (SCALAR_INT_MODE_P (prevailing_mode)
-	   || !related_vector_mode (prevailing_mode,
-				    inner_mode, nunits).exists (&simd_mode))
+	   || !((cplx) ? related_vector_mode (prevailing_mode,
+					      cplx_mode, nunits)
+			  .exists (&simd_mode)
+		       : related_vector_mode (prevailing_mode,
+					      scal_mode, nunits)
+			  .exists (&simd_mode)))
     {
       /* Fall back to using mode_for_vector, mostly in the hope of being
 	 able to use an integer mode.  */
@@ -12356,7 +12375,8 @@ get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
 	  && !multiple_p (GET_MODE_SIZE (prevailing_mode), nbytes, &nunits))
 	return NULL_TREE;
 
-      if (!mode_for_vector (inner_mode, nunits).exists (&simd_mode))
+      if (!((cplx) ? mode_for_vector (cplx_mode, nunits).exists (&simd_mode)
+	    : mode_for_vector (scal_mode, nunits).exists (&simd_mode)))
 	return NULL_TREE;
     }
 
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 2bc1b0d1e3f..91d49016e5b 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -10115,6 +10115,8 @@ build_vector_type_for_mode (tree innertype, machine_mode mode)
     case MODE_VECTOR_UFRACT:
     case MODE_VECTOR_ACCUM:
     case MODE_VECTOR_UACCUM:
+    case MODE_VECTOR_COMPLEX_INT:
+    case MODE_VECTOR_COMPLEX_FLOAT:
       nunits = GET_MODE_NUNITS (mode);
       break;
 
-- 
2.17.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 8/9] Native complex operations: Add explicit vector of complex
  2023-07-17  9:02 [PATCH 0/9] Native complex operations Sylvain Noiry
                   ` (6 preceding siblings ...)
  2023-07-17  9:02 ` [PATCH 7/9] Native complex operations: Vectorization of native complex operations Sylvain Noiry
@ 2023-07-17  9:02 ` Sylvain Noiry
  2023-07-17  9:02 ` [PATCH 9/9] Native complex operation: Experimental support in x86 backend Sylvain Noiry
  8 siblings, 0 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-07-17  9:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Allow the creation and use of builtin vectors of complex types in C,
using __attribute__ ((vector_size (...))).
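
For instance, one could then write (a minimal usage sketch; the vector
length of 4 is arbitrary):

  typedef float _Complex vcf4
    __attribute__ ((vector_size (4 * sizeof (float _Complex))));

  vcf4
  vadd (vcf4 a, vcf4 b)
  {
    return a + b;	/* element-wise complex addition */
  }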

gcc/c-family/ChangeLog:

	* c-attribs.cc (vector_mode_valid_p): Add cases for
	vectors of complex
	(handle_mode_attribute): Likewise
	(type_valid_for_vector_size): Likewise
	* c-common.cc (c_common_type_for_mode): Likewise
	(vector_types_compatible_elements_p): Likewise

gcc/ChangeLog:

	* fold-const.cc (fold_binary_loc): Likewise

gcc/c/ChangeLog:

	* c-typeck.cc (build_unary_op): Likewise
---
 gcc/c-family/c-attribs.cc | 12 ++++++++++--
 gcc/c-family/c-common.cc  | 20 +++++++++++++++++++-
 gcc/c/c-typeck.cc         |  8 ++++++--
 gcc/fold-const.cc         |  1 +
 4 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index e2792ca6898..d4de85160c1 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -2019,6 +2019,8 @@ vector_mode_valid_p (machine_mode mode)
   /* Doh!  What's going on?  */
   if (mclass != MODE_VECTOR_INT
       && mclass != MODE_VECTOR_FLOAT
+      && mclass != MODE_VECTOR_COMPLEX_INT
+      && mclass != MODE_VECTOR_COMPLEX_FLOAT
       && mclass != MODE_VECTOR_FRACT
       && mclass != MODE_VECTOR_UFRACT
       && mclass != MODE_VECTOR_ACCUM
@@ -2125,6 +2127,8 @@ handle_mode_attribute (tree *node, tree name, tree args,
 
 	case MODE_VECTOR_INT:
 	case MODE_VECTOR_FLOAT:
+	case MODE_VECTOR_COMPLEX_INT:
+	case MODE_VECTOR_COMPLEX_FLOAT:
 	case MODE_VECTOR_FRACT:
 	case MODE_VECTOR_UFRACT:
 	case MODE_VECTOR_ACCUM:
@@ -4361,9 +4365,13 @@ type_valid_for_vector_size (tree type, tree atname, tree args,
 
   if ((!INTEGRAL_TYPE_P (type)
        && !SCALAR_FLOAT_TYPE_P (type)
+       && !COMPLEX_INTEGER_TYPE_P (type)
+       && !COMPLEX_FLOAT_TYPE_P (type)
        && !FIXED_POINT_TYPE_P (type))
-      || (!SCALAR_FLOAT_MODE_P (orig_mode)
-	  && GET_MODE_CLASS (orig_mode) != MODE_INT
+      || ((!SCALAR_FLOAT_MODE_P (orig_mode)
+	   && GET_MODE_CLASS (orig_mode) != MODE_INT)
+	  && (!COMPLEX_FLOAT_MODE_P (orig_mode)
+	      && GET_MODE_CLASS (orig_mode) != MODE_COMPLEX_INT)
 	  && !ALL_SCALAR_FIXED_POINT_MODE_P (orig_mode))
       || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type))
       || TREE_CODE (type) == BOOLEAN_TYPE)
diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 6ab63dae997..9574c074d26 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -2430,7 +2430,23 @@ c_common_type_for_mode (machine_mode mode, int unsignedp)
 	      : make_signed_type (precision));
     }
 
-  if (COMPLEX_MODE_P (mode))
+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
+	   && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
+    {
+      unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+						    GET_MODE_NUNITS (mode));
+      tree bool_type = build_nonstandard_boolean_type (elem_bits);
+      return build_vector_type_for_mode (bool_type, mode);
+    }
+  else if (VECTOR_MODE_P (mode)
+	   && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
+    {
+      machine_mode inner_mode = GET_MODE_INNER (mode);
+      tree inner_type = c_common_type_for_mode (inner_mode, unsignedp);
+      if (inner_type != NULL_TREE)
+	return build_vector_type_for_mode (inner_type, mode);
+    }
+  else if (COMPLEX_MODE_P (mode))
     {
       machine_mode inner_mode;
       tree inner_type;
@@ -8104,9 +8120,11 @@ vector_types_compatible_elements_p (tree t1, tree t2)
 
   gcc_assert ((INTEGRAL_TYPE_P (t1)
 	       || c1 == REAL_TYPE
+	       || c1 == COMPLEX_TYPE
 	       || c1 == FIXED_POINT_TYPE)
 	      && (INTEGRAL_TYPE_P (t2)
 		  || c2 == REAL_TYPE
+		  || c2 == COMPLEX_TYPE
 		  || c2 == FIXED_POINT_TYPE));
 
   t1 = c_common_signed_type (t1);
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index 7cf411155c6..68a9646cf5b 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -4584,7 +4584,9 @@ build_unary_op (location_t location, enum tree_code code, tree xarg,
       /* ~ works on integer types and non float vectors. */
       if (typecode == INTEGER_TYPE
 	  || (gnu_vector_type_p (TREE_TYPE (arg))
-	      && !VECTOR_FLOAT_TYPE_P (TREE_TYPE (arg))))
+	      && !VECTOR_FLOAT_TYPE_P (TREE_TYPE (arg))
+	      && !COMPLEX_INTEGER_TYPE_P (TREE_TYPE (TREE_TYPE (arg)))
+	      && !COMPLEX_FLOAT_TYPE_P (TREE_TYPE (TREE_TYPE (arg)))))
 	{
 	  tree e = arg;
 
@@ -4607,7 +4609,9 @@ build_unary_op (location_t location, enum tree_code code, tree xarg,
 	  if (!noconvert)
 	    arg = default_conversion (arg);
 	}
-      else if (typecode == COMPLEX_TYPE)
+      else if (typecode == COMPLEX_TYPE
+	  || COMPLEX_INTEGER_TYPE_P (TREE_TYPE (TREE_TYPE (arg)))
+	  || COMPLEX_FLOAT_TYPE_P (TREE_TYPE (TREE_TYPE (arg))))
 	{
 	  code = CONJ_EXPR;
 	  pedwarn (location, OPT_Wpedantic,
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index f1224b6a548..9e9f711e82d 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -11109,6 +11109,7 @@ fold_binary_loc (location_t loc, enum tree_code code, tree type,
 	     to __complex__ ( x, y ).  This is not the same for SNaNs or
 	     if signed zeros are involved.  */
 	  if (!HONOR_SNANS (arg0)
+	      && !(VECTOR_TYPE_P (TREE_TYPE (arg0)))
 	      && !HONOR_SIGNED_ZEROS (arg0)
 	      && COMPLEX_FLOAT_TYPE_P (TREE_TYPE (arg0)))
 	    {
-- 
2.17.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 9/9] Native complex operation: Experimental support in x86 backend
  2023-07-17  9:02 [PATCH 0/9] Native complex operations Sylvain Noiry
                   ` (7 preceding siblings ...)
  2023-07-17  9:02 ` [PATCH 8/9] Native complex operations: Add explicit vector of complex Sylvain Noiry
@ 2023-07-17  9:02 ` Sylvain Noiry
  2023-09-12 10:07   ` [PATCH v2 0/11] Native complex operations Sylvain Noiry
  8 siblings, 1 reply; 24+ messages in thread
From: Sylvain Noiry @ 2023-07-17  9:02 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Add experimental support for native complex operation handling in the
x86 backend. For now it only supports add, sub, mul, conj, neg, and
mov in SCmode (complex float). Performance gains are still marginal on
this target because there are no dedicated instructions to speed up
complex operations, only some SIMD tricks.
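
Concretely (an illustrative example, not taken from the patch), plain
C code such as the following now expands through the new SCmode
patterns (here mulsc3) instead of being lowered to scalar operations:

  float _Complex
  cmul (float _Complex a, float _Complex b)
  {
    return a * b;	/* expands via the mulsc3 pattern */
  }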

gcc/ChangeLog:

	* config/i386/i386.cc (classify_argument): Align complex
	element to the whole size, not size of the parts
	(ix86_return_in_memory): Handle complex modes like a scalar
	with the same size
	(ix86_class_max_nregs): Likewise
	(ix86_hard_regno_nregs): Likewise
	(function_value_ms_64): Add case for SCmode
	(ix86_build_const_vector): Likewise
	(ix86_build_signbit_mask): Likewise
	(x86_gen_rtx_complex): New: Implement the gen_rtx_complex
	hook, use registers of complex modes to represent complex
	elements in rtl
	(x86_read_complex_part): New: Implement the read_complex_part
	hook, handle registers of complex modes
	(x86_write_complex_part): New: Implement the write_complex_part
	hook, handle registers of complex modes
	* config/i386/i386.h: Add SCmode in several predicates
	* config/i386/sse.md: Add pattern for some complex operations in
	SCmode. This includes movsc, addsc3, subsc3, negsc2, mulsc3,
	and conjsc2
---
 gcc/config/i386/i386.cc | 296 +++++++++++++++++++++++++++++++++++++++-
 gcc/config/i386/i386.h  |  11 +-
 gcc/config/i386/sse.md  | 144 +++++++++++++++++++
 3 files changed, 440 insertions(+), 11 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index f0d6167e667..a65ac92a4a9 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -2339,8 +2339,8 @@ classify_argument (machine_mode mode, const_tree type,
 	mode_alignment = 128;
       else if (mode == XCmode)
 	mode_alignment = 256;
-      if (COMPLEX_MODE_P (mode))
-	mode_alignment /= 2;
+      /*if (COMPLEX_MODE_P (mode))
+	mode_alignment /= 2;*/
       /* Misaligned fields are always returned in memory.  */
       if (bit_offset % mode_alignment)
 	return 0;
@@ -3007,6 +3007,7 @@ pass_in_reg:
     case E_V4BFmode:
     case E_V2SImode:
     case E_V2SFmode:
+    case E_SCmode:
     case E_V1TImode:
     case E_V1DImode:
       if (!type || !AGGREGATE_TYPE_P (type))
@@ -3257,6 +3258,7 @@ pass_in_reg:
     case E_V4BFmode:
     case E_V2SImode:
     case E_V2SFmode:
+    case E_SCmode:
     case E_V1TImode:
     case E_V1DImode:
       if (!type || !AGGREGATE_TYPE_P (type))
@@ -4158,8 +4160,8 @@ function_value_ms_64 (machine_mode orig_mode, machine_mode mode,
 	      && !INTEGRAL_TYPE_P (valtype)
 	      && !VECTOR_FLOAT_TYPE_P (valtype))
 	    break;
-	  if ((SCALAR_INT_MODE_P (mode) || VECTOR_MODE_P (mode))
-	      && !COMPLEX_MODE_P (mode))
+	  if ((SCALAR_INT_MODE_P (mode) || VECTOR_MODE_P (mode)))
+	     // && !COMPLEX_MODE_P (mode))
 	    regno = FIRST_SSE_REG;
 	  break;
 	case 8:
@@ -4266,7 +4268,7 @@ ix86_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED)
 	       || INTEGRAL_TYPE_P (type)
 	       || VECTOR_FLOAT_TYPE_P (type))
 	      && (SCALAR_INT_MODE_P (mode) || VECTOR_MODE_P (mode))
-	      && !COMPLEX_MODE_P (mode)
+	      //&& !COMPLEX_MODE_P (mode)
 	      && (GET_MODE_SIZE (mode) == 16 || size == 16))
 	    return false;
 
@@ -15722,6 +15724,7 @@ ix86_build_const_vector (machine_mode mode, bool vect, rtx value)
     case E_V8SFmode:
     case E_V4SFmode:
     case E_V2SFmode:
+    case E_SCmode:
     case E_V8DFmode:
     case E_V4DFmode:
     case E_V2DFmode:
@@ -15770,6 +15773,7 @@ ix86_build_signbit_mask (machine_mode mode, bool vect, bool invert)
     case E_V8SFmode:
     case E_V4SFmode:
     case E_V2SFmode:
+    case E_SCmode:
     case E_V2SImode:
       vec_mode = mode;
       imode = SImode;
@@ -19821,7 +19825,8 @@ ix86_class_max_nregs (reg_class_t rclass, machine_mode mode)
   else
     {
       if (COMPLEX_MODE_P (mode))
-	return 2;
+	return CEIL (GET_MODE_SIZE (mode), UNITS_PER_WORD);
+	//return 2;
       else
 	return 1;
     }
@@ -20157,7 +20162,8 @@ ix86_hard_regno_nregs (unsigned int regno, machine_mode mode)
       return CEIL (GET_MODE_SIZE (mode), UNITS_PER_WORD);
     }
   if (COMPLEX_MODE_P (mode))
-    return 2;
+    return 1;
+    //return 2;
   /* Register pair for mask registers.  */
   if (mode == P2QImode || mode == P2HImode)
     return 2;
@@ -23613,6 +23619,273 @@ ix86_preferred_simd_mode (scalar_mode mode)
     }
 }
 
+static rtx
+x86_gen_rtx_complex (machine_mode mode, rtx real_part, rtx imag_part)
+{
+  machine_mode imode = GET_MODE_INNER (mode);
+
+  if ((real_part == imag_part) && (real_part == CONST0_RTX (imode)))
+    {
+      if (CONST_DOUBLE_P (real_part))
+       return const_double_from_real_value (dconst0, mode);
+      else if (CONST_INT_P (real_part))
+       return GEN_INT (0);
+      else
+       gcc_unreachable ();
+    }
+
+  bool saved_generating_concat_p = generating_concat_p;
+  generating_concat_p = false;
+  rtx complex_reg = gen_reg_rtx (mode);
+  generating_concat_p = saved_generating_concat_p;
+
+  if (real_part)
+    {
+      gcc_assert (imode == GET_MODE (real_part));
+      write_complex_part (complex_reg, real_part, REAL_P, false);
+    }
+
+  if (imag_part)
+    {
+      gcc_assert (imode == GET_MODE (imag_part));
+      write_complex_part (complex_reg, imag_part, IMAG_P, false);
+    }
+
+  return complex_reg;
+}
+
+static rtx
+x86_read_complex_part (rtx cplx, complex_part_t part)
+{
+  machine_mode cmode;
+  scalar_mode imode;
+  unsigned ibitsize;
+
+  if (GET_CODE (cplx) == CONCAT)
+    return XEXP (cplx, part);
+
+  cmode = GET_MODE (cplx);
+  imode = GET_MODE_INNER (cmode);
+  ibitsize = GET_MODE_BITSIZE (imode);
+
+  if (COMPLEX_MODE_P (cmode) && (part == BOTH_P))
+    return cplx;
+
+  /* For constants under 32 bits, vector constants are folded during
+     expand, so we need to compensate, as cplx is then an integer
+     constant.  In this case cmode and imode are equal.  */
+  if (cmode == imode)
+    ibitsize /= 2;
+
+  if (cmode == E_VOIDmode)
+    return cplx;	/* FIXME: case used when initialising a mock value in a complex register.  */
+
+  if ((cmode == E_DCmode) && (GET_CODE (cplx) == CONST_DOUBLE))	/* FIXME: stop generating DCmode const_doubles, as there are no patterns for them and it is weird.  */
+    return CONST0_RTX (E_DFmode);
+  /* FIXME: verify SCmode const_doubles as well.  */
+
+  /* Special case reads from complex constants that got spilled to memory.  */
+  if (MEM_P (cplx) && GET_CODE (XEXP (cplx, 0)) == SYMBOL_REF)
+    {
+      tree decl = SYMBOL_REF_DECL (XEXP (cplx, 0));
+      if (decl && TREE_CODE (decl) == COMPLEX_CST)
+	{
+	  tree cplx_part = (part == IMAG_P) ? TREE_IMAGPART (decl)
+			  : (part == REAL_P) ? TREE_REALPART (decl)
+			  : TREE_COMPLEX_BOTH_PARTS (decl);
+	if (CONSTANT_CLASS_P (cplx_part))
+	  return expand_expr (cplx_part, NULL_RTX, imode, EXPAND_NORMAL);
+	}
+    }
+
+  /* For MEMs simplify_gen_subreg may generate an invalid new address
+     because, e.g., the original address is considered mode-dependent
+     by the target, which restricts simplify_subreg from invoking
+     adjust_address_nv.  Instead of preparing fallback support for an
+     invalid address, we call adjust_address_nv directly.  */
+  if (MEM_P (cplx))
+    {
+      if (part == BOTH_P)
+       return adjust_address_nv (cplx, cmode, 0);
+      else
+       return adjust_address_nv (cplx, imode, (part == IMAG_P)
+				 ? GET_MODE_SIZE (imode) : 0);
+    }
+
+  /* If the sub-object is at least word sized, then we know that subregging
+     will work.  This special case is important, since extract_bit_field
+     wants to operate on integer modes, and there's rarely an OImode to
+     correspond to TCmode.  */
+  if (ibitsize >= BITS_PER_WORD
+      /* For hard regs we have exact predicates.  Assume we can split
+	 the original object if it spans an even number of hard regs.
+	 This special case is important for SCmode on 64-bit platforms
+	 where the natural size of floating-point regs is 32-bit.  */
+      || (REG_P (cplx)
+	  && REGNO (cplx) < FIRST_PSEUDO_REGISTER
+	  && REG_NREGS (cplx) % 2 == 0))
+    {
+      rtx ret = simplify_gen_subreg (imode, cplx, cmode, (part == IMAG_P)
+				     ? GET_MODE_SIZE (imode) : 0);
+      if (ret)
+       return ret;
+      else
+       /* simplify_gen_subreg may fail for sub-word MEMs.  */
+       gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
+    }
+
+  if (part == BOTH_P)
+    return extract_bit_field (cplx, 2 * ibitsize, 0, true, NULL_RTX, cmode,
+			      cmode, false, NULL);
+  else
+    return extract_bit_field (cplx, ibitsize, (part == IMAG_P) ? ibitsize : 0,
+			      true, NULL_RTX, imode, imode, false, NULL);
+}
+
+static void
+x86_write_complex_part (rtx cplx, rtx val, complex_part_t part, bool undefined_p)
+{
+  machine_mode cmode;
+  scalar_mode imode;
+  unsigned ibitsize;
+
+  cmode = GET_MODE (cplx);
+  imode = GET_MODE_INNER (cmode);
+  ibitsize = GET_MODE_BITSIZE (imode);
+
+  /* Special case for constants.  */
+  if (GET_CODE (val) == CONST_VECTOR)
+    {
+      if (part == BOTH_P)
+	{
+	  machine_mode temp_mode = E_BLKmode;
+	  switch (cmode)
+	    {
+	    case E_CQImode:
+	      temp_mode = E_HImode;
+	      break;
+	    case E_CHImode:
+	      temp_mode = E_SImode;
+	      break;
+	    case E_CSImode:
+	      temp_mode = E_DImode;
+	      break;
+	    case E_SCmode:
+	      temp_mode = E_DFmode;
+	      break;
+	    case E_CDImode:
+	      temp_mode = E_TImode;
+	      break;
+	    case E_DCmode:
+	    default:
+	      break;
+	    }
+
+	  if (temp_mode != E_BLKmode)
+	    {
+	      rtx temp_reg = gen_reg_rtx (temp_mode);
+	      store_bit_field (temp_reg, GET_MODE_BITSIZE (temp_mode), 0, 0,
+			       0, GET_MODE (val), val, false, undefined_p);
+	      emit_move_insn (cplx,
+			      simplify_gen_subreg (cmode, temp_reg, temp_mode,
+						   0));
+	    }
+	  else
+	    {
+	      /* Write the real and imaginary parts separately.  */
+	      gcc_assert (GET_CODE (val) == CONST_VECTOR);
+	      write_complex_part (cplx, const_vector_elt (val, 0), REAL_P, false);
+	      write_complex_part (cplx, const_vector_elt (val, 1), IMAG_P, false);
+	    }
+	}
+      else
+	write_complex_part (cplx,
+			    const_vector_elt (val,
+			    ((part == REAL_P) ? 0 : 1)),
+			    part, false);
+      return;
+    }
+
+  if ((part == BOTH_P) && !MEM_P (cplx)
+      /*&& (optab_handler (mov_optab, cmode) != CODE_FOR_nothing)*/)
+    {
+      write_complex_part (cplx, read_complex_part (val, REAL_P), REAL_P, undefined_p);
+      write_complex_part (cplx, read_complex_part (val, IMAG_P), IMAG_P, undefined_p);
+      //emit_move_insn (cplx, val);
+      return;
+    }
+
+  if ((GET_CODE (val) == CONST_DOUBLE) || (GET_CODE (val) == CONST_INT))
+    {
+      if (part == REAL_P)
+	{
+	  emit_move_insn (gen_lowpart (imode, cplx), val);
+	  return;
+	}
+      else if (part == IMAG_P)
+	{
+	  /* Cannot set the highpart of a pseudo register.  */
+	  if (REGNO (cplx) < FIRST_PSEUDO_REGISTER)
+	    {
+	      emit_move_insn (gen_highpart (imode, cplx), val);
+	      return;
+	    }
+	}
+      else
+	gcc_unreachable ();
+    }
+
+  if (GET_CODE (cplx) == CONCAT)
+    {
+      emit_move_insn (XEXP (cplx, part), val);
+      return;
+    }
+
+  /* For MEMs simplify_gen_subreg may generate an invalid new address
+     because, e.g., the original address is considered mode-dependent
+     by the target, which restricts simplify_subreg from invoking
+     adjust_address_nv.  Instead of preparing fallback support for an
+     invalid address, we call adjust_address_nv directly.  */
+  if (MEM_P (cplx))
+    {
+      if (part == BOTH_P)
+       emit_move_insn (adjust_address_nv (cplx, cmode, 0), val);
+      else
+       emit_move_insn (adjust_address_nv (cplx, imode, (part == IMAG_P)
+					  ? GET_MODE_SIZE (imode) : 0), val);
+      return;
+    }
+
+  /* If the sub-object is at least word sized, then we know that subregging
+     will work.  This special case is important, since store_bit_field
+     wants to operate on integer modes, and there's rarely an OImode to
+     correspond to TCmode.  */
+  if (ibitsize >= BITS_PER_WORD
+      /* For hard regs we have exact predicates.  Assume we can split
+	 the original object if it spans an even number of hard regs.
+	 This special case is important for SCmode on 64-bit platforms
+	 where the natural size of floating-point regs is 32-bit.  */
+      || (REG_P (cplx)
+	  && REGNO (cplx) < FIRST_PSEUDO_REGISTER
+	  && REG_NREGS (cplx) % 2 == 0))
+    {
+      rtx cplx_part = simplify_gen_subreg (imode, cplx, cmode,
+					   (part == IMAG_P)
+					   ? GET_MODE_SIZE (imode) : 0);
+      if (cplx_part)
+	{
+	  emit_move_insn (cplx_part, val);
+	  return;
+	}
+      else
+       /* simplify_gen_subreg may fail for sub-word MEMs.  */
+       gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
+    }
+
+  store_bit_field (cplx, ibitsize, (part == IMAG_P) ? ibitsize : 0, 0, 0,
+		   imode, val, false, undefined_p);
+}
+
 /* If AVX is enabled then try vectorizing with both 256bit and 128bit
    vectors.  If AVX512F is enabled then try vectorizing with 512bit,
    256bit and 128bit vectors.  */
@@ -25621,6 +25894,15 @@ ix86_libgcc_floating_mode_supported_p
 #undef TARGET_IFUNC_REF_LOCAL_OK
 #define TARGET_IFUNC_REF_LOCAL_OK ix86_ifunc_ref_local_ok
 
+#undef TARGET_GEN_RTX_COMPLEX
+#define TARGET_GEN_RTX_COMPLEX x86_gen_rtx_complex
+
+#undef TARGET_READ_COMPLEX_PART
+#define TARGET_READ_COMPLEX_PART x86_read_complex_part
+
+#undef TARGET_WRITE_COMPLEX_PART
+#define TARGET_WRITE_COMPLEX_PART x86_write_complex_part
+
 #if !TARGET_MACHO && !TARGET_DLLIMPORT_DECL_ATTRIBUTES
 # undef TARGET_ASM_RELOC_RW_MASK
 # define TARGET_ASM_RELOC_RW_MASK ix86_reloc_rw_mask
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index aea3209d5a3..86157b97b25 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1054,7 +1054,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
    || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode	\
    || (MODE) == V2DImode || (MODE) == V2QImode				\
    || (MODE) == DFmode	|| (MODE) == DImode				\
-   || (MODE) == HFmode || (MODE) == BFmode)
+   || (MODE) == HFmode || (MODE) == BFmode				\
+   || (MODE) == SCmode)
 
 #define VALID_SSE_REG_MODE(MODE)					\
   ((MODE) == V1TImode || (MODE) == TImode				\
@@ -1063,7 +1064,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
    || (MODE) == TFmode || (MODE) == TDmode)
 
 #define VALID_MMX_REG_MODE_3DNOW(MODE) \
-  ((MODE) == V2SFmode || (MODE) == SFmode)
+  ((MODE) == V2SFmode || (MODE) == SFmode || (MODE) == SCmode)
 
 /* To match ia32 psABI, V4HFmode should be added here.  */
 #define VALID_MMX_REG_MODE(MODE)					\
@@ -1106,13 +1107,15 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
    || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode	\
    || (MODE) == V16SFmode \
    || (MODE) == V32HFmode || (MODE) == V16HFmode || (MODE) == V8HFmode  \
-   || (MODE) == V32BFmode || (MODE) == V16BFmode || (MODE) == V8BFmode)
+   || (MODE) == V32BFmode || (MODE) == V16BFmode || (MODE) == V8BFmode	\
+   || (MODE) == SCmode)
 
 #define X87_FLOAT_MODE_P(MODE)	\
   (TARGET_80387 && ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode))
 
 #define SSE_FLOAT_MODE_P(MODE) \
-  ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
+  ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode) \
+   || (TARGET_SSE2 && (MODE) == SCmode))
 
 #define SSE_FLOAT_MODE_SSEMATH_OR_HF_P(MODE)				\
   ((SSE_FLOAT_MODE_P (MODE) && TARGET_SSE_MATH)				\
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6bf9c99a2c1..b2b354c439e 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -30209,3 +30209,147 @@
   "vcvtneo<bf16_ph>2ps\t{%1, %0|%0, %1}"
   [(set_attr "prefix" "vex")
    (set_attr "mode" "<sseinsnmode>")])
+
+(define_expand "movsc"
+  [(match_operand:SC 0 "nonimmediate_operand" "")
+   (match_operand:SC 1 "nonimmediate_operand" "")]
+  ""
+  {
+    emit_insn (gen_movv2sf (simplify_gen_subreg (V2SFmode, operands[0], SCmode, 0),
+			    simplify_gen_subreg (V2SFmode, operands[1], SCmode, 0)));
+    DONE;
+  }
+)
+
+(define_expand "addsc3"
+  [(match_operand:SC 0 "register_operand" "=r")
+   (match_operand:SC 1 "register_operand" "r")
+   (match_operand:SC 2 "register_operand" "r")]
+  ""
+  {
+    emit_insn (gen_addv2sf3 (simplify_gen_subreg (V2SFmode, operands[0], SCmode, 0),
+			     simplify_gen_subreg (V2SFmode, operands[1], SCmode, 0),
+			     simplify_gen_subreg (V2SFmode, operands[2], SCmode, 0)));
+    DONE;
+  }
+)
+
+(define_expand "subsc3"
+  [(match_operand:SC 0 "register_operand" "=r")
+   (match_operand:SC 1 "register_operand" "r")
+   (match_operand:SC 2 "register_operand" "r")]
+  ""
+  {
+    emit_insn (gen_subv2sf3 (simplify_gen_subreg (V2SFmode, operands[0], SCmode, 0),
+			     simplify_gen_subreg (V2SFmode, operands[1], SCmode, 0),
+			     simplify_gen_subreg (V2SFmode, operands[2], SCmode, 0)));
+    DONE;
+  }
+)
+
+(define_expand "negsc2"
+  [(match_operand:SC 0 "register_operand" "=r")
+   (match_operand:SC 1 "register_operand" "r")]
+  ""
+  {
+    emit_insn (gen_negv2sf2 (simplify_gen_subreg (V2SFmode, operands[0], SCmode, 0),
+                             simplify_gen_subreg (V2SFmode, operands[1], SCmode, 0)));
+    DONE;
+  }
+)
+
+(define_expand "sse_shufsc"
+  [(match_operand:V4SF 0 "register_operand")
+   (match_operand:SC 1 "register_operand")
+   (match_operand:SC 2 "vector_operand")
+   (match_operand:SI 3 "const_int_operand")]
+  "TARGET_SSE"
+{
+  int mask = INTVAL (operands[3]);
+  emit_insn (gen_sse_shufsc_sc (operands[0],
+						     operands[1],
+						     operands[2],
+						     GEN_INT ((mask >> 0) & 3),
+						     GEN_INT ((mask >> 2) & 3),
+						     GEN_INT (((mask >> 4) & 3) + 4),
+						     GEN_INT (((mask >> 6) & 3) + 4)));
+  DONE;
+})
+
+(define_insn "sse_shufsc_sc"
+  [(set (match_operand:V4SF 0 "register_operand" "=x,v")
+	(vec_select:V4SF
+	  (vec_concat:V4SF
+	    (match_operand:V2SF 1 "register_operand" "0,v")
+	    (match_operand:V2SF 2 "vector_operand" "xBm,vm"))
+	  (parallel [(match_operand 3 "const_0_to_3_operand")
+		     (match_operand 4 "const_0_to_3_operand")
+		     (match_operand 5 "const_4_to_7_operand")
+		     (match_operand 6 "const_4_to_7_operand")])))]
+  "TARGET_SSE"
+{
+  int mask = 0;
+  mask |= INTVAL (operands[3]) << 0;
+  mask |= INTVAL (operands[4]) << 2;
+  mask |= (INTVAL (operands[5]) - 4) << 4;
+  mask |= (INTVAL (operands[6]) - 4) << 6;
+  operands[3] = GEN_INT (mask);
+
+  switch (which_alternative)
+    {
+    case 0:
+      return "shufps\t{%3, %2, %0|%0, %2, %3}";
+    case 1:
+      return "vshufps\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseshuf")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix" "orig,maybe_evex")
+   (set_attr "mode" "V4SF")])
+
+(define_expand "mulsc3"
+  [(match_operand:SC 0 "register_operand" "=r")
+   (match_operand:SC 1 "register_operand" "r")
+   (match_operand:SC 2 "register_operand" "r")]
+  "TARGET_SSE3"
+  {
+    rtx a = gen_reg_rtx (V4SFmode);
+    rtx b = gen_reg_rtx (V4SFmode);
+    emit_insn (gen_sse_shufsc (a,
+                                    simplify_gen_subreg (V2SFmode, operands[1], SCmode, 0),
+                                    simplify_gen_subreg (V2SFmode, operands[1], SCmode, 0),
+                                    GEN_INT (0b01000100)));
+    emit_insn (gen_sse_shufsc (b,
+                                    simplify_gen_subreg (V2SFmode, operands[2], SCmode, 0),
+                                    simplify_gen_subreg (V2SFmode, operands[2], SCmode, 0),
+                                    GEN_INT (0b00010100)));
+    emit_insn (gen_mulv4sf3 (a, a, b));
+    emit_insn (gen_sse_shufps (b,
+                                    a,
+                                    a,
+                                    GEN_INT (0b00001101)));
+    emit_insn (gen_sse_shufps (a,
+                                    a,
+                                    a,
+                                    GEN_INT (0b00001000)));
+    emit_insn (gen_vec_addsubv2sf3 (simplify_gen_subreg (V2SFmode, operands[0], SCmode, 0),
+				    simplify_gen_subreg (V2SFmode, a, V4SFmode, 0),
+				    simplify_gen_subreg (V2SFmode, b, V4SFmode, 0)));
+    DONE;
+  }
+)
+
+(define_expand "conjsc2"
+  [(match_operand:SC 0 "register_operand" "=r")
+   (match_operand:SC 1 "register_operand" "r")]
+  ""
+  {
+    emit_insn (gen_negdf2 (simplify_gen_subreg (DFmode, operands[0], SCmode, 0),
+			   simplify_gen_subreg (DFmode, operands[1], SCmode, 0)));
+    DONE;
+  }
+)
-- 
2.17.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 0/11] Native complex operations
  2023-07-17  9:02 ` [PATCH 9/9] Native complex operation: Experimental support in x86 backend Sylvain Noiry
@ 2023-09-12 10:07   ` Sylvain Noiry
  2023-09-12 10:07     ` [PATCH v2 01/11] Native complex ops : Conditional lowering Sylvain Noiry
                       ` (10 more replies)
  0 siblings, 11 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-09-12 10:07 UTC (permalink / raw)
  To: gcc-patches

I have updated the series of patches. Most changes consist of bug fixes.

However, two new patches add features:

PATCH 9/11: Remove useless special cases

    This patch removes two special cases for complex types which are now
    handled well enough by the general case. Don't hesitate to tell me
    if you think I'm wrong.

PATCH 10/11: Add a fast complex multiplication pattern

    When the target machine does not have a dedicated instruction for a
    floating-point operation, gcc may expand it into a series of basic
    operations, with the IEEE checks added automatically. However, it may
    be interesting for a backend developer to write their own fast path
    for an emulated operation, without having to perform the IEEE checks
    manually. This is what a fast pattern provides. For example, it is
    possible to write a fast emulated complex multiplication pattern and
    still let gcc check whether the result is correct, calling the helper
    otherwise.
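
    For reference, the recovery logic gcc wraps around such a fast
    multiply looks roughly like this (a sketch of the semantics without
    -ffast-math, not the actual generated code; __mulsc3 is the existing
    libgcc helper):

      #include <complex.h>
      #include <math.h>

      float _Complex __mulsc3 (float, float, float, float);

      static float _Complex
      checked_cmul (float ar, float ai, float br, float bi)
      {
        /* Fast naive multiply first...  */
        float xr = ar * br - ai * bi;
        float xi = ar * bi + ai * br;
        /* ...then recover through the libgcc helper if both parts
           came out as NaN.  */
        if (isnan (xr) && isnan (xi))
          return __mulsc3 (ar, ai, br, bi);
        return CMPLXF (xr, xi);
      }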

The experimental x86 support is now patch number 11.

Thanks,

Sylvain

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 01/11] Native complex ops : Conditional lowering
  2023-09-12 10:07   ` [PATCH v2 0/11] Native complex operations Sylvain Noiry
@ 2023-09-12 10:07     ` Sylvain Noiry
  2023-09-12 10:07     ` [PATCH v2 02/11] Native complex ops: Move functions to hooks Sylvain Noiry
                       ` (9 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-09-12 10:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Summary:
Allow the cplxlower pass to check, through the optabs, whether an
operation needs to be lowered. If the target supports the operation
natively, lowering is not performed. The cplxlower pass therefore has
to handle a mix of lowered and non-lowered operations. Quick access to
both parts of a complex constant is also implemented.
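
The decision the pass now makes, sketched as illustrative pseudo-code
(simplified; the actual check is in target_native_complex_operation):

  /* Lower a complex operation only when the target has no native
     pattern for it; otherwise keep the complex statement intact.  */
  optab op = optab_for_tree_code (code, type, optab_default);
  if (op == unknown_optab
      || optab_handler (op, TYPE_MODE (type)) == CODE_FOR_nothing)
    /* lower to operations on the real and imaginary parts */;
  else
    /* leave the statement for the backend to expand natively */;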

gcc/lto/ChangeLog:

	* lto-common.cc (compare_tree_sccs_1): Handle both parts of a
	  complex constant

gcc/ChangeLog:

	* coretypes.h: Add enum for complex parts
	* gensupport.cc (match_pattern): Add complex types
	* lto-streamer-out.cc (DFS::DFS_write_tree_body):
	(hash_tree): Handle both parts of a complex constant
	* tree-complex.cc (get_component_var): Support handling of
	both parts of a complex
	(get_component_ssa_name): Likewise
	(set_component_ssa_name): Likewise
	(extract_component): Likewise
	(update_complex_components): Likewise
	(update_complex_components_on_edge): Likewise
	(update_complex_assignment): Likewise
	(update_phi_components): Likewise
	(expand_complex_move): Likewise
	(expand_complex_asm): Update with complex_part_t
	(complex_component_cst_p): New: check if a complex
	component is a constant
	(target_native_complex_operation): New: Check if complex
	operation is supported natively by the backend, through
	the optab
	(expand_complex_operations_1): Conditionally lower ops
	(tree_lower_complex): Support handling of both parts of
	 a complex
	* tree-core.h (struct GTY): Add field for both parts of
	the tree_complex struct
	* tree-streamer-in.cc (lto_input_ts_complex_tree_pointers):
	Handle both parts of a complex constant
	* tree-streamer-out.cc (write_ts_complex_tree_pointers):
	Likewise
	* tree.cc (build_complex): likewise
	* tree.h (class auto_suppress_location_wrappers):
	(type_has_mode_precision_p): Add special case for complex
	* tree-dfa.cc (get_ref_base_and_extent): Handle REALPART_EXPR
	and IMAGPART_EXPR
---
 gcc/coretypes.h          |  11 ++
 gcc/gensupport.cc        |   2 +
 gcc/lto-streamer-out.cc  |   2 +
 gcc/lto/lto-common.cc    |   2 +
 gcc/tree-complex.cc      | 401 ++++++++++++++++++++++++++++++---------
 gcc/tree-core.h          |   1 +
 gcc/tree-dfa.cc          |   3 +
 gcc/tree-streamer-in.cc  |   1 +
 gcc/tree-streamer-out.cc |   1 +
 gcc/tree.cc              |   8 +
 gcc/tree.h               |  15 +-
 11 files changed, 358 insertions(+), 89 deletions(-)

diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index f86dc169a40..76f49f25cad 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -448,6 +448,17 @@ enum optimize_size_level
   OPTIMIZE_SIZE_MAX
 };
 
+/* Part of a complex.  */
+
+enum complex_part_e
+{
+  REAL_P = 0,
+  IMAG_P = 1,
+  BOTH_P = 2
+};
+
+typedef enum complex_part_e complex_part_t;
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
    is a pointer to a pointer, the second either NULL if the pointer to
    pointer points into a GC object or the actual pointer address if
diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index f7164b3214d..54f7b3cfe81 100644
--- a/gcc/gensupport.cc
+++ b/gcc/gensupport.cc
@@ -3746,9 +3746,11 @@ match_pattern (optab_pattern *p, const char *name, const char *pat)
 		    break;
 		if (*p == 0
 		    && (! force_int || mode_class[i] == MODE_INT
+			|| mode_class[i] == MODE_COMPLEX_INT
 			|| mode_class[i] == MODE_VECTOR_INT)
 		    && (! force_partial_int
 			|| mode_class[i] == MODE_INT
+			|| mode_class[i] == MODE_COMPLEX_INT
 			|| mode_class[i] == MODE_PARTIAL_INT
 			|| mode_class[i] == MODE_VECTOR_INT)
 		    && (! force_float
diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index 5ffa8954022..38c48e44867 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -985,6 +985,7 @@ DFS::DFS_write_tree_body (struct output_block *ob,
     {
       DFS_follow_tree_edge (TREE_REALPART (expr));
       DFS_follow_tree_edge (TREE_IMAGPART (expr));
+      DFS_follow_tree_edge (TREE_COMPLEX_BOTH_PARTS (expr));
     }
 
   if (CODE_CONTAINS_STRUCT (code, TS_DECL_MINIMAL))
@@ -1417,6 +1418,7 @@ hash_tree (struct streamer_tree_cache_d *cache, hash_map<tree, hashval_t> *map,
     {
       visit (TREE_REALPART (t));
       visit (TREE_IMAGPART (t));
+      visit (TREE_COMPLEX_BOTH_PARTS (t));
     }
 
   if (CODE_CONTAINS_STRUCT (code, TS_DECL_MINIMAL))
diff --git a/gcc/lto/lto-common.cc b/gcc/lto/lto-common.cc
index 703e665b698..f647ee62f9e 100644
--- a/gcc/lto/lto-common.cc
+++ b/gcc/lto/lto-common.cc
@@ -1408,6 +1408,8 @@ compare_tree_sccs_1 (tree t1, tree t2, tree **map)
     {
       compare_tree_edges (TREE_REALPART (t1), TREE_REALPART (t2));
       compare_tree_edges (TREE_IMAGPART (t1), TREE_IMAGPART (t2));
+      compare_tree_edges (TREE_COMPLEX_BOTH_PARTS (t1),
+			  TREE_COMPLEX_BOTH_PARTS (t2));
     }
 
   if (CODE_CONTAINS_STRUCT (code, TS_DECL_MINIMAL))
diff --git a/gcc/tree-complex.cc b/gcc/tree-complex.cc
index 688fe13989c..d889a99d513 100644
--- a/gcc/tree-complex.cc
+++ b/gcc/tree-complex.cc
@@ -42,6 +42,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfganal.h"
 #include "gimple-fold.h"
 #include "diagnostic-core.h"
+#include "target.h"
+#include "memmodel.h"
+#include "optabs-tree.h"
+#include "internal-fn.h"
 
 
 /* For each complex ssa name, a lattice value.  We're interested in finding
@@ -74,7 +78,7 @@ static vec<complex_lattice_t> complex_lattice_values;
    the hashtable.  */
 static int_tree_htab_type *complex_variable_components;
 
-/* For each complex SSA_NAME, a pair of ssa names for the components.  */
+/* For each complex SSA_NAME, three ssa names for the components.  */
 static vec<tree> complex_ssa_name_components;
 
 /* Vector of PHI triplets (original complex PHI and corresponding real and
@@ -476,17 +480,27 @@ create_one_component_var (tree type, tree orig, const char *prefix,
 /* Retrieve a value for a complex component of VAR.  */
 
 static tree
-get_component_var (tree var, bool imag_p)
+get_component_var (tree var, complex_part_t part)
 {
-  size_t decl_index = DECL_UID (var) * 2 + imag_p;
+  size_t decl_index = DECL_UID (var) * 3 + part;
   tree ret = cvc_lookup (decl_index);
 
   if (ret == NULL)
     {
-      ret = create_one_component_var (TREE_TYPE (TREE_TYPE (var)), var,
-				      imag_p ? "CI" : "CR",
-				      imag_p ? "$imag" : "$real",
-				      imag_p ? IMAGPART_EXPR : REALPART_EXPR);
+      switch (part)
+	{
+	case REAL_P:
+	  ret = create_one_component_var (TREE_TYPE (TREE_TYPE (var)), var,
+					  "CR", "$real", REALPART_EXPR);
+	  break;
+	case IMAG_P:
+	  ret = create_one_component_var (TREE_TYPE (TREE_TYPE (var)), var,
+					  "CI", "$imag", IMAGPART_EXPR);
+	  break;
+	case BOTH_P:
+	  ret = var;
+	  break;
+	}
       cvc_insert (decl_index, ret);
     }
 
@@ -496,13 +510,15 @@ get_component_var (tree var, bool imag_p)
 /* Retrieve a value for a complex component of SSA_NAME.  */
 
 static tree
-get_component_ssa_name (tree ssa_name, bool imag_p)
+get_component_ssa_name (tree ssa_name, complex_part_t part)
 {
   complex_lattice_t lattice = find_lattice_value (ssa_name);
   size_t ssa_name_index;
   tree ret;
 
-  if (lattice == (imag_p ? ONLY_REAL : ONLY_IMAG))
+  if (((lattice == ONLY_IMAG) && (part == REAL_P))
+      || ((lattice == ONLY_REAL) && (part == IMAG_P)))
+
     {
       tree inner_type = TREE_TYPE (TREE_TYPE (ssa_name));
       if (SCALAR_FLOAT_TYPE_P (inner_type))
@@ -511,14 +527,33 @@ get_component_ssa_name (tree ssa_name, bool imag_p)
 	return build_int_cst (inner_type, 0);
     }
 
-  ssa_name_index = SSA_NAME_VERSION (ssa_name) * 2 + imag_p;
+  if (part == BOTH_P)
+    return ssa_name;
+
+  ssa_name_index = SSA_NAME_VERSION (ssa_name) * 3 + part;
+  unsigned length = complex_ssa_name_components.length ();
+
+  /* Increase size of dynamic array if needed.  */
+  if (ssa_name_index >= length)
+    {
+      size_t new_size = 2 * length;
+      complex_ssa_name_components.safe_grow_cleared (new_size, true);
+      complex_lattice_values.safe_grow_cleared (new_size, true);
+    }
+
   ret = complex_ssa_name_components[ssa_name_index];
   if (ret == NULL)
     {
       if (SSA_NAME_VAR (ssa_name))
-	ret = get_component_var (SSA_NAME_VAR (ssa_name), imag_p);
+	ret = get_component_var (SSA_NAME_VAR (ssa_name), part);
       else
-	ret = TREE_TYPE (TREE_TYPE (ssa_name));
+	{
+	  if (part == BOTH_P)
+	    ret = TREE_TYPE (ssa_name);
+	  else
+	    ret = TREE_TYPE (TREE_TYPE (ssa_name));
+	}
+
       ret = make_ssa_name (ret);
 
       /* Copy some properties from the original.  In particular, whether it
@@ -542,7 +577,7 @@ get_component_ssa_name (tree ssa_name, bool imag_p)
    gimple_seq of stuff that needs doing.  */
 
 static gimple_seq
-set_component_ssa_name (tree ssa_name, bool imag_p, tree value)
+set_component_ssa_name (tree ssa_name, complex_part_t part, tree value)
 {
   complex_lattice_t lattice = find_lattice_value (ssa_name);
   size_t ssa_name_index;
@@ -553,14 +588,25 @@ set_component_ssa_name (tree ssa_name, bool imag_p, tree value)
   /* We know the value must be zero, else there's a bug in our lattice
      analysis.  But the value may well be a variable known to contain
      zero.  We should be safe ignoring it.  */
-  if (lattice == (imag_p ? ONLY_REAL : ONLY_IMAG))
+  if (((lattice == ONLY_IMAG) && (part == REAL_P))
+      || ((lattice == ONLY_REAL) && (part == IMAG_P)))
     return NULL;
 
   /* If we've already assigned an SSA_NAME to this component, then this
      means that our walk of the basic blocks found a use before the set.
      This is fine.  Now we should create an initialization for the value
      we created earlier.  */
-  ssa_name_index = SSA_NAME_VERSION (ssa_name) * 2 + imag_p;
+  ssa_name_index = SSA_NAME_VERSION (ssa_name) * 3 + part;
+  unsigned length = complex_ssa_name_components.length ();
+
+  /* Increase size of dynamic array if needed.  */
+  if (ssa_name_index >= length)
+    {
+      size_t new_size = 2 * length;
+      complex_ssa_name_components.safe_grow (new_size, true);
+      complex_lattice_values.safe_grow (new_size, true);
+    }
+
   comp = complex_ssa_name_components[ssa_name_index];
   if (comp)
     ;
@@ -584,7 +630,7 @@ set_component_ssa_name (tree ssa_name, bool imag_p, tree value)
 	  && (!SSA_NAME_VAR (value) || DECL_IGNORED_P (SSA_NAME_VAR (value)))
 	  && !DECL_IGNORED_P (SSA_NAME_VAR (ssa_name)))
 	{
-	  comp = get_component_var (SSA_NAME_VAR (ssa_name), imag_p);
+	  comp = get_component_var (SSA_NAME_VAR (ssa_name), part);
 	  replace_ssa_name_symbol (value, comp);
 	}
 
@@ -595,7 +641,7 @@ set_component_ssa_name (tree ssa_name, bool imag_p, tree value)
   /* Finally, we need to stabilize the result by installing the value into
      a new ssa name.  */
   else
-    comp = get_component_ssa_name (ssa_name, imag_p);
+    comp = get_component_ssa_name (ssa_name, part);
 
   /* Do all the work to assign VALUE to COMP.  */
   list = NULL;
@@ -612,13 +658,14 @@ set_component_ssa_name (tree ssa_name, bool imag_p, tree value)
    Emit any new code before gsi.  */
 
 static tree
-extract_component (gimple_stmt_iterator *gsi, tree t, bool imagpart_p,
+extract_component (gimple_stmt_iterator *gsi, tree t, complex_part_t part,
 		   bool gimple_p, bool phiarg_p = false)
 {
   switch (TREE_CODE (t))
     {
     case COMPLEX_CST:
-      return imagpart_p ? TREE_IMAGPART (t) : TREE_REALPART (t);
+      return (part == BOTH_P) ? t : (part == IMAG_P) ?
+	TREE_IMAGPART (t) : TREE_REALPART (t);
 
     case COMPLEX_EXPR:
       gcc_unreachable ();
@@ -627,11 +674,14 @@ extract_component (gimple_stmt_iterator *gsi, tree t, bool imagpart_p,
       {
 	tree inner_type = TREE_TYPE (TREE_TYPE (t));
 	t = unshare_expr (t);
-	TREE_TYPE (t) = inner_type;
-	TREE_OPERAND (t, 1) = TYPE_SIZE (inner_type);
-	if (imagpart_p)
-	  TREE_OPERAND (t, 2) = size_binop (PLUS_EXPR, TREE_OPERAND (t, 2),
-					    TYPE_SIZE (inner_type));
+	if (part != BOTH_P)
+	  {
+	    TREE_TYPE (t) = inner_type;
+	    TREE_OPERAND (t, 1) = TYPE_SIZE (inner_type);
+	    if (part == IMAG_P)
+	      TREE_OPERAND (t, 2) = size_binop (PLUS_EXPR, TREE_OPERAND (t, 2),
+						TYPE_SIZE (inner_type));
+	  }
 	if (gimple_p)
 	  t = force_gimple_operand_gsi (gsi, t, true, NULL, true,
 					GSI_SAME_STMT);
@@ -646,10 +696,11 @@ extract_component (gimple_stmt_iterator *gsi, tree t, bool imagpart_p,
     case VIEW_CONVERT_EXPR:
     case MEM_REF:
       {
-	tree inner_type = TREE_TYPE (TREE_TYPE (t));
-
-	t = build1 ((imagpart_p ? IMAGPART_EXPR : REALPART_EXPR),
-		    inner_type, unshare_expr (t));
+	if (part == BOTH_P)
+	  t = unshare_expr (t);
+	else
+	  t = build1 (((part == IMAG_P) ? IMAGPART_EXPR : REALPART_EXPR),
+		      (TREE_TYPE (TREE_TYPE (t))), unshare_expr (t));
 
 	if (gimple_p)
 	  t = force_gimple_operand_gsi (gsi, t, true, NULL, true,
@@ -659,10 +710,12 @@ extract_component (gimple_stmt_iterator *gsi, tree t, bool imagpart_p,
       }
 
     case SSA_NAME:
-      t = get_component_ssa_name (t, imagpart_p);
-      if (TREE_CODE (t) == SSA_NAME && SSA_NAME_DEF_STMT (t) == NULL)
-	gcc_assert (phiarg_p);
-      return t;
+      {
+	t = get_component_ssa_name (t, part);
+	if (TREE_CODE (t) == SSA_NAME && SSA_NAME_DEF_STMT (t) == NULL)
+	  gcc_assert (phiarg_p);
+	return t;
+      }
 
     default:
       gcc_unreachable ();
@@ -673,18 +726,29 @@ extract_component (gimple_stmt_iterator *gsi, tree t, bool imagpart_p,
 
 static void
 update_complex_components (gimple_stmt_iterator *gsi, gimple *stmt, tree r,
-			   tree i)
+			   tree i, tree b = NULL)
 {
   tree lhs;
   gimple_seq list;
 
+  gcc_assert (b || (r && i));
   lhs = gimple_get_lhs (stmt);
+  if (!b)
+    b = lhs;
+  if (!r)
+    r = build1 (REALPART_EXPR, TREE_TYPE (TREE_TYPE (b)), unshare_expr (b));
+  if (!i)
+    i = build1 (IMAGPART_EXPR, TREE_TYPE (TREE_TYPE (b)), unshare_expr (b));
+
+  list = set_component_ssa_name (lhs, REAL_P, r);
+  if (list)
+    gsi_insert_seq_after (gsi, list, GSI_CONTINUE_LINKING);
 
-  list = set_component_ssa_name (lhs, false, r);
+  list = set_component_ssa_name (lhs, IMAG_P, i);
   if (list)
     gsi_insert_seq_after (gsi, list, GSI_CONTINUE_LINKING);
 
-  list = set_component_ssa_name (lhs, true, i);
+  list = set_component_ssa_name (lhs, BOTH_P, b);
   if (list)
     gsi_insert_seq_after (gsi, list, GSI_CONTINUE_LINKING);
 }
@@ -694,11 +758,11 @@ update_complex_components_on_edge (edge e, tree lhs, tree r, tree i)
 {
   gimple_seq list;
 
-  list = set_component_ssa_name (lhs, false, r);
+  list = set_component_ssa_name (lhs, REAL_P, r);
   if (list)
     gsi_insert_seq_on_edge (e, list);
 
-  list = set_component_ssa_name (lhs, true, i);
+  list = set_component_ssa_name (lhs, IMAG_P, i);
   if (list)
     gsi_insert_seq_on_edge (e, list);
 }
@@ -707,19 +771,24 @@ update_complex_components_on_edge (edge e, tree lhs, tree r, tree i)
 /* Update an assignment to a complex variable in place.  */
 
 static void
-update_complex_assignment (gimple_stmt_iterator *gsi, tree r, tree i)
+update_complex_assignment (gimple_stmt_iterator *gsi, tree r, tree i,
+			   tree b = NULL)
 {
   gimple *old_stmt = gsi_stmt (*gsi);
-  gimple_assign_set_rhs_with_ops (gsi, COMPLEX_EXPR, r, i);
+  if (b == NULL)
+    gimple_assign_set_rhs_with_ops (gsi, COMPLEX_EXPR, r, i);
+  else
+    /* Dummy assignment, but pr45569.C fails if removed.  */
+    gimple_assign_set_rhs_from_tree (gsi, b);
+
   gimple *stmt = gsi_stmt (*gsi);
   update_stmt (stmt);
   if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
     bitmap_set_bit (need_eh_cleanup, gimple_bb (stmt)->index);
 
-  update_complex_components (gsi, gsi_stmt (*gsi), r, i);
+  update_complex_components (gsi, gsi_stmt (*gsi), r, i, b);
 }
 
-
 /* Generate code at the entry point of the function to initialize the
    component variables for a complex parameter.  */
 
@@ -768,7 +837,8 @@ update_phi_components (basic_block bb)
 
 	  for (j = 0; j < 2; j++)
 	    {
-	      tree l = get_component_ssa_name (gimple_phi_result (phi), j > 0);
+	      tree l = get_component_ssa_name (gimple_phi_result (phi),
+					       (complex_part_t) j);
 	      if (TREE_CODE (l) == SSA_NAME)
 		p[j] = create_phi_node (l, bb);
 	    }
@@ -779,7 +849,9 @@ update_phi_components (basic_block bb)
 	      for (j = 0; j < 2; j++)
 		if (p[j])
 		  {
-		    comp = extract_component (NULL, arg, j > 0, false, true);
+		    comp =
+		      extract_component (NULL, arg, (complex_part_t) j, false,
+					 true);
 		    if (TREE_CODE (comp) == SSA_NAME
 			&& SSA_NAME_DEF_STMT (comp) == NULL)
 		      {
@@ -815,7 +887,7 @@ static void
 expand_complex_move (gimple_stmt_iterator *gsi, tree type)
 {
   tree inner_type = TREE_TYPE (type);
-  tree r, i, lhs, rhs;
+  tree r, i, b, lhs, rhs;
   gimple *stmt = gsi_stmt (*gsi);
 
   if (is_gimple_assign (stmt))
@@ -862,16 +934,13 @@ expand_complex_move (gimple_stmt_iterator *gsi, tree type)
       else
 	{
 	  if (gimple_assign_rhs_code (stmt) != COMPLEX_EXPR)
-	    {
-	      r = extract_component (gsi, rhs, 0, true);
-	      i = extract_component (gsi, rhs, 1, true);
-	    }
+	    update_complex_assignment (gsi, NULL, NULL,
+				       extract_component (gsi, rhs,
+							  BOTH_P, true));
 	  else
-	    {
-	      r = gimple_assign_rhs1 (stmt);
-	      i = gimple_assign_rhs2 (stmt);
-	    }
-	  update_complex_assignment (gsi, r, i);
+	    update_complex_assignment (gsi,
+				       gimple_assign_rhs1 (stmt),
+				       gimple_assign_rhs2 (stmt), NULL);
 	}
     }
   else if (rhs
@@ -883,24 +952,18 @@ expand_complex_move (gimple_stmt_iterator *gsi, tree type)
       location_t loc;
 
       loc = gimple_location (stmt);
-      r = extract_component (gsi, rhs, 0, false);
-      i = extract_component (gsi, rhs, 1, false);
-
-      x = build1 (REALPART_EXPR, inner_type, unshare_expr (lhs));
-      t = gimple_build_assign (x, r);
-      gimple_set_location (t, loc);
-      gsi_insert_before (gsi, t, GSI_SAME_STMT);
+      b = extract_component (gsi, rhs, BOTH_P, false);
 
       if (stmt == gsi_stmt (*gsi))
 	{
-	  x = build1 (IMAGPART_EXPR, inner_type, unshare_expr (lhs));
+	  x = unshare_expr (lhs);
 	  gimple_assign_set_lhs (stmt, x);
-	  gimple_assign_set_rhs1 (stmt, i);
+	  gimple_assign_set_rhs1 (stmt, b);
 	}
       else
 	{
-	  x = build1 (IMAGPART_EXPR, inner_type, unshare_expr (lhs));
-	  t = gimple_build_assign (x, i);
+	  x = unshare_expr (lhs);
+	  t = gimple_build_assign (x, b);
 	  gimple_set_location (t, loc);
 	  gsi_insert_before (gsi, t, GSI_SAME_STMT);
 
@@ -1641,26 +1704,101 @@ expand_complex_asm (gimple_stmt_iterator *gsi)
 		}
 	      /* Make sure to not ICE later, see PR105165.  */
 	      tree zero = build_zero_cst (TREE_TYPE (TREE_TYPE (op)));
-	      set_component_ssa_name (op, false, zero);
-	      set_component_ssa_name (op, true, zero);
+	      set_component_ssa_name (op, REAL_P, zero);
+	      set_component_ssa_name (op, IMAG_P, zero);
+	      set_component_ssa_name (op, BOTH_P, zero);
 	      continue;
 	    }
 	  tree type = TREE_TYPE (op);
 	  tree inner_type = TREE_TYPE (type);
 	  tree r = build1 (REALPART_EXPR, inner_type, op);
 	  tree i = build1 (IMAGPART_EXPR, inner_type, op);
-	  gimple_seq list = set_component_ssa_name (op, false, r);
+	  tree b = op;
+	  gimple_seq list = set_component_ssa_name (op, REAL_P, r);
 
 	  if (list)
 	    gsi_insert_seq_after (gsi, list, GSI_CONTINUE_LINKING);
 
-	  list = set_component_ssa_name (op, true, i);
+	  list = set_component_ssa_name (op, IMAG_P, i);
+	  if (list)
+	    gsi_insert_seq_after (gsi, list, GSI_CONTINUE_LINKING);
+
+	  list = set_component_ssa_name (op, BOTH_P, b);
 	  if (list)
 	    gsi_insert_seq_after (gsi, list, GSI_CONTINUE_LINKING);
 	}
     }
 }
 
+/* Returns true if a complex component is a constant.  */
+
+static bool
+complex_component_cst_p (tree cplx, complex_part_t part)
+{
+  switch (TREE_CODE (cplx))
+    {
+    case COMPLEX_CST:
+      return true;
+
+    case SSA_NAME:
+      {
+	size_t ssa_name_index = SSA_NAME_VERSION (cplx) * 3 + part;
+	tree val = complex_ssa_name_components[ssa_name_index];
+	return (val) ? CONSTANT_CLASS_P (val) : false;
+      }
+
+    default:
+      return false;
+    }
+}
+
+/* Returns true if the target supports a particular complex operation
+   natively.  */
+
+static bool
+target_native_complex_operation (enum tree_code code, tree type,
+				 tree inner_type, tree ac, tree bc,
+				 complex_lattice_t al, complex_lattice_t bl)
+{
+  /* Lower trivial complex operations.
+     -------------------------------------------------------------------
+     |    a  \  b    | UNINITIALIZED | ONLY_REAL | ONLY_IMAG | VARYING |
+     -------------------------------------------------------------------
+     | UNINITIALIZED | xxxxxxxxxxxxx | xxxxxxxxx | xxxxxxxxx | xxxxxxx |
+     -------------------------------------------------------------------
+     |   ONLY_REAL   |     lower     |   lower   |   native  | native  |
+     -------------------------------------------------------------------
+     |   ONLY_IMAG   |     lower     |   native  |   lower   | native  |
+     -------------------------------------------------------------------
+     |    VARYING    |     native    |   native  |   native  | native  |
+     ------------------------------------------------------------------- */
+  if (((al != VARYING) && (bl != VARYING))
+      && ((bl == UNINITIALIZED)
+       || (al == bl)))
+    return false;
+
+  /* Do not use native operations when a part of the result is constant.  */
+  if ((bl == UNINITIALIZED)
+      && (complex_component_cst_p (ac, REAL_P)
+	  || complex_component_cst_p (ac, IMAG_P)))
+    return false;
+  else if ((bl != UNINITIALIZED)
+	   &&
+	   ((complex_component_cst_p (ac, REAL_P)
+	     && complex_component_cst_p (bc, REAL_P))
+	    || (complex_component_cst_p (ac, IMAG_P)
+		&& complex_component_cst_p (bc, IMAG_P))))
+    return false;
+
+  optab op = optab_for_tree_code (code, inner_type, optab_default);
+
+  /* No need to search if operation is not in the optab.  */
+  if (op == unknown_optab)
+    return false;
+
+  return optab_handler (op, TYPE_MODE (type)) != CODE_FOR_nothing;
+}
+
 /* Process one statement.  If we identify a complex operation, expand it.  */
 
 static void
@@ -1729,14 +1867,17 @@ expand_complex_operations_1 (gimple_stmt_iterator *gsi)
 		 && TREE_CODE (lhs) == SSA_NAME)
 	  {
 	    rhs = gimple_assign_rhs1 (stmt);
+	    enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
 	    rhs = extract_component (gsi, TREE_OPERAND (rhs, 0),
-		                     gimple_assign_rhs_code (stmt)
-				       == IMAGPART_EXPR,
-				     false);
+				     (rhs_code == IMAGPART_EXPR) ? IMAG_P
+				     : (rhs_code == REALPART_EXPR) ? REAL_P
+				     : BOTH_P, false);
 	    gimple_assign_set_rhs_from_tree (gsi, rhs);
 	    stmt = gsi_stmt (*gsi);
 	    update_stmt (stmt);
 	  }
+	else if (is_gimple_call (stmt))
+	  return;
       }
       return;
     }
@@ -1755,19 +1896,6 @@ expand_complex_operations_1 (gimple_stmt_iterator *gsi)
       bc = gimple_cond_rhs (stmt);
     }
 
-  ar = extract_component (gsi, ac, false, true);
-  ai = extract_component (gsi, ac, true, true);
-
-  if (ac == bc)
-    br = ar, bi = ai;
-  else if (bc)
-    {
-      br = extract_component (gsi, bc, 0, true);
-      bi = extract_component (gsi, bc, 1, true);
-    }
-  else
-    br = bi = NULL_TREE;
-
   al = find_lattice_value (ac);
   if (al == UNINITIALIZED)
     al = VARYING;
@@ -1783,6 +1911,102 @@ expand_complex_operations_1 (gimple_stmt_iterator *gsi)
 	bl = VARYING;
     }
 
+  if (target_native_complex_operation
+      (code, type, inner_type, ac, bc, al, bl))
+    {
+      tree ab, bb, rb;
+      gimple_seq stmts = NULL;
+      location_t loc = gimple_location (gsi_stmt (*gsi));
+
+      ab = extract_component (gsi, ac, BOTH_P, true);
+      if (ac == bc)
+	bb = ab;
+      else if (bc)
+	{
+	  bb = extract_component (gsi, bc, BOTH_P, true);
+	}
+      else
+	bb = NULL_TREE;
+
+      switch (code)
+	{
+	case PLUS_EXPR:
+	case MINUS_EXPR:
+	case MULT_EXPR:
+	  rb = gimple_build (&stmts, loc, code, type, ab, bb);
+	  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+	  update_complex_assignment (gsi, NULL, NULL, rb);
+	  break;
+
+	case NEGATE_EXPR:
+	case CONJ_EXPR:
+	  rb = gimple_build (&stmts, loc, code, type, ab);
+	  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+	  update_complex_assignment (gsi, NULL, NULL, rb);
+	  break;
+
+	case EQ_EXPR:
+	case NE_EXPR:
+	  {
+	    gimple *stmt = gsi_stmt (*gsi);
+	    rb = gimple_build (&stmts, loc, code, type, ab, bb);
+	    switch (gimple_code (stmt))
+	      {
+	      case GIMPLE_RETURN:
+		{
+		  greturn *return_stmt = as_a <greturn *> (stmt);
+		  gimple_return_set_retval (return_stmt,
+					    fold_convert (type, rb));
+		}
+		break;
+
+	      case GIMPLE_ASSIGN:
+		update_complex_assignment (gsi, NULL, NULL, rb);
+		gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+		break;
+
+	      case GIMPLE_COND:
+		{
+		  gcond *cond_stmt = as_a <gcond *> (stmt);
+		  gimple_cond_set_code (cond_stmt, EQ_EXPR);
+		  gimple_cond_set_lhs (cond_stmt, rb);
+		  gimple_cond_set_rhs (cond_stmt, boolean_true_node);
+		}
+		break;
+
+	      default:
+		break;
+	      }
+	    break;
+	  }
+
+
+	/* Not supported yet.  */
+	case TRUNC_DIV_EXPR:
+	case CEIL_DIV_EXPR:
+	case FLOOR_DIV_EXPR:
+	case ROUND_DIV_EXPR:
+	case RDIV_EXPR:
+
+	default:
+	  gcc_unreachable ();
+	}
+      return;
+    }
+
+  ar = extract_component (gsi, ac, REAL_P, true);
+  ai = extract_component (gsi, ac, IMAG_P, true);
+
+  if (ac == bc)
+    br = ar, bi = ai;
+  else if (bc)
+    {
+      br = extract_component (gsi, bc, REAL_P, true);
+      bi = extract_component (gsi, bc, IMAG_P, true);
+    }
+  else
+    br = bi = NULL_TREE;
+
   switch (code)
     {
     case PLUS_EXPR:
@@ -1819,8 +2043,8 @@ expand_complex_operations_1 (gimple_stmt_iterator *gsi)
       gcc_unreachable ();
     }
 }
-
 \f
+
 /* Entry point for complex operation lowering during optimization.  */
 
 static unsigned int
@@ -1845,8 +2069,8 @@ tree_lower_complex (void)
 
   complex_variable_components = new int_tree_htab_type (10);
 
-  complex_ssa_name_components.create (2 * num_ssa_names);
-  complex_ssa_name_components.safe_grow_cleared (2 * num_ssa_names, true);
+  complex_ssa_name_components.create (3 * num_ssa_names);
+  complex_ssa_name_components.safe_grow_cleared (3 * num_ssa_names, true);
 
   update_parameter_components ();
 
@@ -1879,7 +2103,8 @@ tree_lower_complex (void)
 		      || is_gimple_min_invariant (op))
 		    continue;
 		  tree arg = gimple_phi_arg_def (phis_to_revisit[j], l);
-		  op = extract_component (NULL, arg, k > 0, false, false);
+		  op = extract_component (NULL, arg, (complex_part_t) k,
+					  false, false);
 		  SET_PHI_ARG_DEF (phi, l, op);
 		}
 	    }
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 91551fde900..19293e03af6 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1487,6 +1487,7 @@ struct GTY(()) tree_complex {
   struct tree_typed typed;
   tree real;
   tree imag;
+  tree both;
 };
 
 struct GTY(()) tree_vector {
diff --git a/gcc/tree-dfa.cc b/gcc/tree-dfa.cc
index ad8cfedec8c..42c254cbfed 100644
--- a/gcc/tree-dfa.cc
+++ b/gcc/tree-dfa.cc
@@ -394,6 +394,9 @@ get_ref_base_and_extent (tree exp, poly_int64_pod *poffset,
       size_tree = TREE_OPERAND (exp, 1);
       exp = TREE_OPERAND (exp, 0);
     }
+  else if (TREE_CODE (exp) == REALPART_EXPR
+	   || TREE_CODE (exp) == IMAGPART_EXPR)
+    exp = TREE_OPERAND (exp, 0);
   else if (!VOID_TYPE_P (TREE_TYPE (exp)))
     {
       machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
index 5bead0c3c6a..a1fa2cb9eea 100644
--- a/gcc/tree-streamer-in.cc
+++ b/gcc/tree-streamer-in.cc
@@ -695,6 +695,7 @@ lto_input_ts_complex_tree_pointers (class lto_input_block *ib,
 {
   TREE_REALPART (expr) = stream_read_tree_ref (ib, data_in);
   TREE_IMAGPART (expr) = stream_read_tree_ref (ib, data_in);
+  TREE_COMPLEX_BOTH_PARTS (expr) = stream_read_tree_ref (ib, data_in);
 }
 
 
diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc
index ff9694e17dd..be7314ef748 100644
--- a/gcc/tree-streamer-out.cc
+++ b/gcc/tree-streamer-out.cc
@@ -592,6 +592,7 @@ write_ts_complex_tree_pointers (struct output_block *ob, tree expr)
 {
   stream_write_tree_ref (ob, TREE_REALPART (expr));
   stream_write_tree_ref (ob, TREE_IMAGPART (expr));
+  stream_write_tree_ref (ob, TREE_COMPLEX_BOTH_PARTS (expr));
 }
 
 
diff --git a/gcc/tree.cc b/gcc/tree.cc
index b34d75f8c85..73b72f80d25 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -2500,6 +2500,14 @@ build_complex (tree type, tree real, tree imag)
 
   tree t = make_node (COMPLEX_CST);
 
+  /* Represent both parts as a constant vector.  */
+  tree vector_type = build_vector_type (TREE_TYPE (real), 2);
+  tree_vector_builder v (vector_type, 1, 2);
+  v.quick_push (real);
+  v.quick_push (imag);
+  tree both = v.build ();
+
+  TREE_COMPLEX_BOTH_PARTS (t) = both;
   TREE_REALPART (t) = real;
   TREE_IMAGPART (t) = imag;
   TREE_TYPE (t) = type ? type : build_complex_type (TREE_TYPE (real));
diff --git a/gcc/tree.h b/gcc/tree.h
index 54cf8282cb2..572d90283af 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -647,6 +647,12 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
 
 #define SCALAR_FLOAT_TYPE_P(TYPE) (TREE_CODE (TYPE) == REAL_TYPE)
 
+/* Nonzero if TYPE represents a complex integer type.  */
+
+#define COMPLEX_INTEGER_TYPE_P(TYPE)	\
+  (TREE_CODE (TYPE) == COMPLEX_TYPE	\
+   && TREE_CODE (TREE_TYPE (TYPE)) == INTEGER_TYPE)
+
 /* Nonzero if TYPE represents a complex floating-point type.  */
 
 #define COMPLEX_FLOAT_TYPE_P(TYPE)	\
@@ -1170,6 +1176,7 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
 /* In a COMPLEX_CST node.  */
 #define TREE_REALPART(NODE) (COMPLEX_CST_CHECK (NODE)->complex.real)
 #define TREE_IMAGPART(NODE) (COMPLEX_CST_CHECK (NODE)->complex.imag)
+#define TREE_COMPLEX_BOTH_PARTS(NODE) (COMPLEX_CST_CHECK (NODE)->complex.both)
 
 /* In a VECTOR_CST node.  See generic.texi for details.  */
 #define VECTOR_CST_NELTS(NODE) (TYPE_VECTOR_SUBPARTS (TREE_TYPE (NODE)))
@@ -2234,6 +2241,8 @@ class auto_suppress_location_wrappers
   (as_a <scalar_int_mode> (TYPE_CHECK (NODE)->type_common.mode))
 #define SCALAR_FLOAT_TYPE_MODE(NODE) \
   (as_a <scalar_float_mode> (TYPE_CHECK (NODE)->type_common.mode))
+#define COMPLEX_TYPE_MODE(NODE) \
+  (as_a <complex_mode> (TYPE_CHECK (NODE)->type_common.mode))
 #define SET_TYPE_MODE(NODE, MODE) \
   (TYPE_CHECK (NODE)->type_common.mode = (MODE))
 
@@ -6733,7 +6742,11 @@ extern const builtin_structptr_type builtin_structptr_types[6];
 inline bool
 type_has_mode_precision_p (const_tree t)
 {
-  return known_eq (TYPE_PRECISION (t), GET_MODE_PRECISION (TYPE_MODE (t)));
+  if (TREE_CODE (t) == COMPLEX_TYPE)
+    return known_eq (2 * TYPE_PRECISION (TREE_TYPE (t)),
+		     GET_MODE_PRECISION (TYPE_MODE (t)));
+  else
+    return known_eq (TYPE_PRECISION (t), GET_MODE_PRECISION (TYPE_MODE (t)));
 }
 
 /* Helper functions for fndecl_built_in_p.  */
-- 
2.17.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 02/11] Native complex ops: Move functions to hooks
  2023-09-12 10:07   ` [PATCH v2 0/11] Native complex operations Sylvain Noiry
  2023-09-12 10:07     ` [PATCH v2 01/11] Native complex ops: Conditional lowering Sylvain Noiry
@ 2023-09-12 10:07     ` Sylvain Noiry
  2023-09-12 10:07     ` [PATCH v2 03/11] Native complex ops: Add gen_rtx_complex hook Sylvain Noiry
                       ` (8 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-09-12 10:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Summary:
Move read_complex_part and write_complex_part to target hooks. Their
signatures also change: the part argument now has type complex_part_t.
Calls to these functions are updated accordingly.
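
For illustration only (not part of this patch; the kvx_* names are
hypothetical stand-ins for functions a backend would define), a target
opts in by overriding the new hooks in its target file in the usual way:

  /* Hypothetical backend override; kvx_read_complex_part and
     kvx_write_complex_part would be defined elsewhere in the backend.
     The default implementations keep the historical expr.cc
     behaviour.  */
  #undef TARGET_READ_COMPLEX_PART
  #define TARGET_READ_COMPLEX_PART kvx_read_complex_part
  #undef TARGET_WRITE_COMPLEX_PART
  #define TARGET_WRITE_COMPLEX_PART kvx_write_complex_part

  struct gcc_target targetm = TARGET_INITIALIZER;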

gcc/ChangeLog:

	* target.def: Define hooks for read_complex_part and
	write_complex_part.
	* targhooks.cc (default_read_complex_part): New: default
	implementation of read_complex_part.
	(default_write_complex_part): New: default implementation
	of write_complex_part.
	* targhooks.h: Add default_read_complex_part and
	default_write_complex_part.
	* doc/tm.texi: Document the new TARGET_READ_COMPLEX_PART
	and TARGET_WRITE_COMPLEX_PART hooks.
	* doc/tm.texi.in: Add TARGET_READ_COMPLEX_PART and
	TARGET_WRITE_COMPLEX_PART.
	* expr.cc (write_complex_part): Call the
	TARGET_WRITE_COMPLEX_PART hook.
	(read_complex_part): Call the TARGET_READ_COMPLEX_PART hook.
	(clear_storage_hints): Update calls to read_complex_part
	and write_complex_part.
	(emit_move_complex_push): Likewise.
	(emit_move_complex_parts): Likewise.
	(expand_assignment): Likewise.
	(expand_expr_real_2): Likewise.
	(expand_expr_real_1): Likewise.
	(const_vector_from_tree): Likewise.
	* expr.h: Update function signatures of read_complex_part
	and write_complex_part.
	* builtins.cc (expand_ifn_atomic_compare_exchange_into_call):
	Update calls to read_complex_part and write_complex_part.
	(expand_ifn_atomic_compare_exchange): Likewise.
	* expmed.cc (flip_storage_order): Likewise.
	* internal-fn.cc (expand_arith_set_overflow): Likewise.
	(expand_arith_overflow_result_store): Likewise.
	(expand_addsub_overflow): Likewise.
	(expand_neg_overflow): Likewise.
	(expand_mul_overflow): Likewise.
	(expand_arith_overflow): Likewise.
	(expand_UADDC): Likewise.
---
 gcc/builtins.cc    |   8 +--
 gcc/doc/tm.texi    |  10 +++
 gcc/doc/tm.texi.in |   4 ++
 gcc/expmed.cc      |   4 +-
 gcc/expr.cc        | 165 +++++++++------------------------------------
 gcc/expr.h         |   5 +-
 gcc/internal-fn.cc |  16 ++---
 gcc/target.def     |  18 +++++
 gcc/targhooks.cc   | 139 ++++++++++++++++++++++++++++++++++++++
 gcc/targhooks.h    |   4 ++
 10 files changed, 221 insertions(+), 152 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 3b453b3ec8c..b5cb652c413 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -6349,8 +6349,8 @@ expand_ifn_atomic_compare_exchange_into_call (gcall *call, machine_mode mode)
       if (GET_MODE (boolret) != mode)
 	boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1);
       x = force_reg (mode, x);
-      write_complex_part (target, boolret, true, true);
-      write_complex_part (target, x, false, false);
+      write_complex_part (target, boolret, IMAG_P, true);
+      write_complex_part (target, x, REAL_P, false);
     }
 }
 
@@ -6405,8 +6405,8 @@ expand_ifn_atomic_compare_exchange (gcall *call)
       rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (GET_MODE (boolret) != mode)
 	boolret = convert_modes (mode, GET_MODE (boolret), boolret, 1);
-      write_complex_part (target, boolret, true, true);
-      write_complex_part (target, oldval, false, false);
+      write_complex_part (target, boolret, IMAG_P, true);
+      write_complex_part (target, oldval, REAL_P, false);
     }
 }
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index ff69207fb9f..c4f935b5746 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4620,6 +4620,16 @@ to return a nonzero value when it is required, the compiler will run out
 of spill registers and print a fatal error message.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_READ_COMPLEX_PART (rtx @var{cplx}, complex_part_t @var{part})
+This hook should return the rtx representing the specified @var{part} of the complex given by @var{cplx}.
+  @var{part} can be the real part, the imaginary part, or both of them.
+@end deftypefn
+
+@deftypefn {Target Hook} void TARGET_WRITE_COMPLEX_PART (rtx @var{cplx}, rtx @var{val}, complex_part_t @var{part}, bool @var{undefined_p})
+This hook should move the rtx value given by @var{val} to the specified @var{part} of the complex given by @var{cplx}.
+  @var{part} can be the real part, the imaginary part, or both of them.  If @var{undefined_p} is true, the value in @var{cplx} is currently undefined.
+@end deftypefn
+
 @node Scalar Return
 @subsection How Scalar Function Values Are Returned
 @cindex return values in registers
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index cad6308a87c..b8970761c8d 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3392,6 +3392,10 @@ stack.
 
 @hook TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
 
+@hook TARGET_READ_COMPLEX_PART
+
+@hook TARGET_WRITE_COMPLEX_PART
+
 @node Scalar Return
 @subsection How Scalar Function Values Are Returned
 @cindex return values in registers
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index b294eabb08d..973c16a14d3 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -394,8 +394,8 @@ flip_storage_order (machine_mode mode, rtx x)
 
   if (COMPLEX_MODE_P (mode))
     {
-      rtx real = read_complex_part (x, false);
-      rtx imag = read_complex_part (x, true);
+      rtx real = read_complex_part (x, REAL_P);
+      rtx imag = read_complex_part (x, IMAG_P);
 
       real = flip_storage_order (GET_MODE_INNER (mode), real);
       imag = flip_storage_order (GET_MODE_INNER (mode), imag);
diff --git a/gcc/expr.cc b/gcc/expr.cc
index d5b6494b4fc..12b74273144 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -3475,8 +3475,8 @@ clear_storage_hints (rtx object, rtx size, enum block_op_methods method,
 	  zero = CONST0_RTX (GET_MODE_INNER (mode));
 	  if (zero != NULL)
 	    {
-	      write_complex_part (object, zero, 0, true);
-	      write_complex_part (object, zero, 1, false);
+	      write_complex_part (object, zero, REAL_P, true);
+	      write_complex_part (object, zero, IMAG_P, false);
 	      return NULL;
 	    }
 	}
@@ -3641,126 +3641,18 @@ set_storage_via_setmem (rtx object, rtx size, rtx val, unsigned int align,
    If UNDEFINED_P then the value in CPLX is currently undefined.  */
 
 void
-write_complex_part (rtx cplx, rtx val, bool imag_p, bool undefined_p)
+write_complex_part (rtx cplx, rtx val, complex_part_t part, bool undefined_p)
 {
-  machine_mode cmode;
-  scalar_mode imode;
-  unsigned ibitsize;
-
-  if (GET_CODE (cplx) == CONCAT)
-    {
-      emit_move_insn (XEXP (cplx, imag_p), val);
-      return;
-    }
-
-  cmode = GET_MODE (cplx);
-  imode = GET_MODE_INNER (cmode);
-  ibitsize = GET_MODE_BITSIZE (imode);
-
-  /* For MEMs simplify_gen_subreg may generate an invalid new address
-     because, e.g., the original address is considered mode-dependent
-     by the target, which restricts simplify_subreg from invoking
-     adjust_address_nv.  Instead of preparing fallback support for an
-     invalid address, we call adjust_address_nv directly.  */
-  if (MEM_P (cplx))
-    {
-      emit_move_insn (adjust_address_nv (cplx, imode,
-					 imag_p ? GET_MODE_SIZE (imode) : 0),
-		      val);
-      return;
-    }
-
-  /* If the sub-object is at least word sized, then we know that subregging
-     will work.  This special case is important, since store_bit_field
-     wants to operate on integer modes, and there's rarely an OImode to
-     correspond to TCmode.  */
-  if (ibitsize >= BITS_PER_WORD
-      /* For hard regs we have exact predicates.  Assume we can split
-	 the original object if it spans an even number of hard regs.
-	 This special case is important for SCmode on 64-bit platforms
-	 where the natural size of floating-point regs is 32-bit.  */
-      || (REG_P (cplx)
-	  && REGNO (cplx) < FIRST_PSEUDO_REGISTER
-	  && REG_NREGS (cplx) % 2 == 0))
-    {
-      rtx part = simplify_gen_subreg (imode, cplx, cmode,
-				      imag_p ? GET_MODE_SIZE (imode) : 0);
-      if (part)
-        {
-	  emit_move_insn (part, val);
-	  return;
-	}
-      else
-	/* simplify_gen_subreg may fail for sub-word MEMs.  */
-	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
-    }
-
-  store_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0, 0, 0, imode, val,
-		   false, undefined_p);
+  targetm.write_complex_part (cplx, val, part, undefined_p);
 }
 
 /* Extract one of the components of the complex value CPLX.  Extract the
    real part if IMAG_P is false, and the imaginary part if it's true.  */
 
 rtx
-read_complex_part (rtx cplx, bool imag_p)
-{
-  machine_mode cmode;
-  scalar_mode imode;
-  unsigned ibitsize;
-
-  if (GET_CODE (cplx) == CONCAT)
-    return XEXP (cplx, imag_p);
-
-  cmode = GET_MODE (cplx);
-  imode = GET_MODE_INNER (cmode);
-  ibitsize = GET_MODE_BITSIZE (imode);
-
-  /* Special case reads from complex constants that got spilled to memory.  */
-  if (MEM_P (cplx) && GET_CODE (XEXP (cplx, 0)) == SYMBOL_REF)
-    {
-      tree decl = SYMBOL_REF_DECL (XEXP (cplx, 0));
-      if (decl && TREE_CODE (decl) == COMPLEX_CST)
-	{
-	  tree part = imag_p ? TREE_IMAGPART (decl) : TREE_REALPART (decl);
-	  if (CONSTANT_CLASS_P (part))
-	    return expand_expr (part, NULL_RTX, imode, EXPAND_NORMAL);
-	}
-    }
-
-  /* For MEMs simplify_gen_subreg may generate an invalid new address
-     because, e.g., the original address is considered mode-dependent
-     by the target, which restricts simplify_subreg from invoking
-     adjust_address_nv.  Instead of preparing fallback support for an
-     invalid address, we call adjust_address_nv directly.  */
-  if (MEM_P (cplx))
-    return adjust_address_nv (cplx, imode,
-			      imag_p ? GET_MODE_SIZE (imode) : 0);
-
-  /* If the sub-object is at least word sized, then we know that subregging
-     will work.  This special case is important, since extract_bit_field
-     wants to operate on integer modes, and there's rarely an OImode to
-     correspond to TCmode.  */
-  if (ibitsize >= BITS_PER_WORD
-      /* For hard regs we have exact predicates.  Assume we can split
-	 the original object if it spans an even number of hard regs.
-	 This special case is important for SCmode on 64-bit platforms
-	 where the natural size of floating-point regs is 32-bit.  */
-      || (REG_P (cplx)
-	  && REGNO (cplx) < FIRST_PSEUDO_REGISTER
-	  && REG_NREGS (cplx) % 2 == 0))
-    {
-      rtx ret = simplify_gen_subreg (imode, cplx, cmode,
-				     imag_p ? GET_MODE_SIZE (imode) : 0);
-      if (ret)
-        return ret;
-      else
-	/* simplify_gen_subreg may fail for sub-word MEMs.  */
-	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
-    }
-
-  return extract_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0,
-			    true, NULL_RTX, imode, imode, false, NULL);
+read_complex_part (rtx cplx, complex_part_t part)
+{
+  return targetm.read_complex_part (cplx, part);
 }
 \f
 /* A subroutine of emit_move_insn_1.  Yet another lowpart generator.
@@ -3931,9 +3823,10 @@ emit_move_complex_push (machine_mode mode, rtx x, rtx y)
     }
 
   emit_move_insn (gen_rtx_MEM (submode, XEXP (x, 0)),
-		  read_complex_part (y, imag_first));
+		  read_complex_part (y, (imag_first) ? IMAG_P : REAL_P));
   return emit_move_insn (gen_rtx_MEM (submode, XEXP (x, 0)),
-			 read_complex_part (y, !imag_first));
+			 read_complex_part (y,
+					    (imag_first) ? REAL_P : IMAG_P));
 }
 
 /* A subroutine of emit_move_complex.  Perform the move from Y to X
@@ -3949,8 +3842,8 @@ emit_move_complex_parts (rtx x, rtx y)
       && REG_P (x) && !reg_overlap_mentioned_p (x, y))
     emit_clobber (x);
 
-  write_complex_part (x, read_complex_part (y, false), false, true);
-  write_complex_part (x, read_complex_part (y, true), true, false);
+  write_complex_part (x, read_complex_part (y, REAL_P), REAL_P, true);
+  write_complex_part (x, read_complex_part (y, IMAG_P), IMAG_P, false);
 
   return get_last_insn ();
 }
@@ -5807,9 +5700,9 @@ expand_assignment (tree to, tree from, bool nontemporal)
 		  if (from_rtx)
 		    {
 		      emit_move_insn (XEXP (to_rtx, 0),
-				      read_complex_part (from_rtx, false));
+				      read_complex_part (from_rtx, REAL_P));
 		      emit_move_insn (XEXP (to_rtx, 1),
-				      read_complex_part (from_rtx, true));
+				      read_complex_part (from_rtx, IMAG_P));
 		    }
 		  else
 		    {
@@ -5831,14 +5724,16 @@ expand_assignment (tree to, tree from, bool nontemporal)
 	    concat_store_slow:;
 	      rtx temp = assign_stack_temp (GET_MODE (to_rtx),
 					    GET_MODE_SIZE (GET_MODE (to_rtx)));
-	      write_complex_part (temp, XEXP (to_rtx, 0), false, true);
-	      write_complex_part (temp, XEXP (to_rtx, 1), true, false);
+	      write_complex_part (temp, XEXP (to_rtx, 0), REAL_P, true);
+	      write_complex_part (temp, XEXP (to_rtx, 1), IMAG_P, false);
 	      result = store_field (temp, bitsize, bitpos,
 				    bitregion_start, bitregion_end,
 				    mode1, from, get_alias_set (to),
 				    nontemporal, reversep);
-	      emit_move_insn (XEXP (to_rtx, 0), read_complex_part (temp, false));
-	      emit_move_insn (XEXP (to_rtx, 1), read_complex_part (temp, true));
+	      emit_move_insn (XEXP (to_rtx, 0),
+			      read_complex_part (temp, REAL_P));
+	      emit_move_insn (XEXP (to_rtx, 1),
+			      read_complex_part (temp, IMAG_P));
 	    }
 	}
       /* For calls to functions returning variable length structures, if TO_RTX
@@ -10317,8 +10212,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	      complex_expr_swap_order:
 		/* Move the imaginary (op1) and real (op0) parts to their
 		   location.  */
-		write_complex_part (target, op1, true, true);
-		write_complex_part (target, op0, false, false);
+		write_complex_part (target, op1, IMAG_P, true);
+		write_complex_part (target, op0, REAL_P, false);
 
 		return target;
 	      }
@@ -10346,9 +10241,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	    break;
 	  }
 
-      /* Move the real (op0) and imaginary (op1) parts to their location.  */
-      write_complex_part (target, op0, false, true);
-      write_complex_part (target, op1, true, false);
+      /* Temporarily use a CONCAT to pass both real and imag parts in one call.  */
+      write_complex_part (target, gen_rtx_CONCAT (GET_MODE (target), op0, op1), BOTH_P, true);
 
       return target;
 
@@ -11550,7 +11444,8 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
 		    rtx parts[2];
 		    for (int i = 0; i < 2; i++)
 		      {
-			rtx op = read_complex_part (op0, i != 0);
+			rtx op = read_complex_part (op0, (i != 0) ? IMAG_P
+						    : REAL_P);
 			if (GET_CODE (op) == SUBREG)
 			  op = force_reg (GET_MODE (op), op);
 			temp = gen_lowpart_common (GET_MODE_INNER (mode1), op);
@@ -12150,11 +12045,11 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
 
     case REALPART_EXPR:
       op0 = expand_normal (treeop0);
-      return read_complex_part (op0, false);
+      return read_complex_part (op0, REAL_P);
 
     case IMAGPART_EXPR:
       op0 = expand_normal (treeop0);
-      return read_complex_part (op0, true);
+      return read_complex_part (op0, IMAG_P);
 
     case RETURN_EXPR:
     case LABEL_EXPR:
@@ -13494,8 +13389,8 @@ const_vector_from_tree (tree exp)
 	builder.quick_push (const_double_from_real_value (TREE_REAL_CST (elt),
 							  inner));
       else if (TREE_CODE (elt) == FIXED_CST)
-	builder.quick_push (CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt),
-							  inner));
+	builder.quick_push (CONST_FIXED_FROM_FIXED_VALUE
+			    (TREE_FIXED_CST (elt), inner));
       else
 	builder.quick_push (immed_wide_int_const (wi::to_poly_wide (elt),
 						  inner));
diff --git a/gcc/expr.h b/gcc/expr.h
index 11bff531862..833ff16bd0d 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -261,9 +261,8 @@ extern rtx_insn *emit_move_insn_1 (rtx, rtx);
 
 extern rtx_insn *emit_move_complex_push (machine_mode, rtx, rtx);
 extern rtx_insn *emit_move_complex_parts (rtx, rtx);
-extern rtx read_complex_part (rtx, bool);
-extern void write_complex_part (rtx, rtx, bool, bool);
-extern rtx read_complex_part (rtx, bool);
+extern rtx read_complex_part (rtx, complex_part_t);
+extern void write_complex_part (rtx, rtx, complex_part_t, bool);
 extern rtx emit_move_resolve_push (machine_mode, rtx);
 
 /* Push a block of length SIZE (perhaps variable)
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 0fd34359247..a01b7160303 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -919,9 +919,9 @@ expand_arith_set_overflow (tree lhs, rtx target)
 {
   if (TYPE_PRECISION (TREE_TYPE (TREE_TYPE (lhs))) == 1
       && !TYPE_UNSIGNED (TREE_TYPE (TREE_TYPE (lhs))))
-    write_complex_part (target, constm1_rtx, true, false);
+    write_complex_part (target, constm1_rtx, IMAG_P, false);
   else
-    write_complex_part (target, const1_rtx, true, false);
+    write_complex_part (target, const1_rtx, IMAG_P, false);
 }
 
 /* Helper for expand_*_overflow.  Store RES into the __real__ part
@@ -976,7 +976,7 @@ expand_arith_overflow_result_store (tree lhs, rtx target,
       expand_arith_set_overflow (lhs, target);
       emit_label (done_label);
     }
-  write_complex_part (target, lres, false, false);
+  write_complex_part (target, lres, REAL_P, false);
 }
 
 /* Helper for expand_*_overflow.  Store RES into TARGET.  */
@@ -1051,7 +1051,7 @@ expand_addsub_overflow (location_t loc, tree_code code, tree lhs,
     {
       target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (!is_ubsan)
-	write_complex_part (target, const0_rtx, true, false);
+	write_complex_part (target, const0_rtx, IMAG_P, false);
     }
 
   /* We assume both operands and result have the same precision
@@ -1496,7 +1496,7 @@ expand_neg_overflow (location_t loc, tree lhs, tree arg1, bool is_ubsan,
     {
       target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (!is_ubsan)
-	write_complex_part (target, const0_rtx, true, false);
+	write_complex_part (target, const0_rtx, IMAG_P, false);
     }
 
   enum insn_code icode = optab_handler (negv3_optab, mode);
@@ -1621,7 +1621,7 @@ expand_mul_overflow (location_t loc, tree lhs, tree arg0, tree arg1,
     {
       target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
       if (!is_ubsan)
-	write_complex_part (target, const0_rtx, true, false);
+	write_complex_part (target, const0_rtx, IMAG_P, false);
     }
 
   if (is_ubsan)
@@ -2444,7 +2444,7 @@ expand_mul_overflow (location_t loc, tree lhs, tree arg0, tree arg1,
       do_compare_rtx_and_jump (op1, res, NE, true, mode, NULL_RTX, NULL,
 			       all_done_label, profile_probability::very_unlikely ());
       emit_label (set_noovf);
-      write_complex_part (target, const0_rtx, true, false);
+      write_complex_part (target, const0_rtx, IMAG_P, false);
       emit_label (all_done_label);
     }
 
@@ -2713,7 +2713,7 @@ expand_arith_overflow (enum tree_code code, gimple *stmt)
 	{
 	  /* The infinity precision result will always fit into result.  */
 	  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
-	  write_complex_part (target, const0_rtx, true, false);
+	  write_complex_part (target, const0_rtx, IMAG_P, false);
 	  scalar_int_mode mode = SCALAR_INT_TYPE_MODE (type);
 	  struct separate_ops ops;
 	  ops.code = code;
diff --git a/gcc/target.def b/gcc/target.def
index 42622177ef9..f99df939776 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3313,6 +3313,24 @@ a pointer to int.",
  bool, (ao_ref *ref),
  default_ref_may_alias_errno)
 
+/* Returns the value corresponding to the specified part of a complex.  */
+DEFHOOK
+(read_complex_part,
+ "This hook should return the rtx representing the specified @var{part} of the complex given by @var{cplx}.\n\
+  @var{part} can be the real part, the imaginary part, or both of them.",
+ rtx,
+ (rtx cplx, complex_part_t part),
+ default_read_complex_part)
+
+/* Moves a value to the specified part of a complex.  */
+DEFHOOK
+(write_complex_part,
+ "This hook should move the rtx value given by @var{val} to the specified @var{part} of the complex given by @var{cplx}.\n\
+  @var{part} can be the real part, the imaginary part, or both of them.  If @var{undefined_p} is true, the value in @var{cplx} is currently undefined.",
+ void,
+ (rtx cplx, rtx val, complex_part_t part, bool undefined_p),
+ default_write_complex_part)
+
 /* Support for named address spaces.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_ADDR_SPACE_"
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index 4f5b240f8d6..df852eb18e3 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -1533,6 +1533,149 @@ default_preferred_simd_mode (scalar_mode)
   return word_mode;
 }
 
+/* By default, extract one of the components of the complex value CPLX.
+   Extract the real part if PART is REAL_P, and the imaginary part if it
+   is IMAG_P.  If PART is BOTH_P, return CPLX directly.  */
+
+rtx
+default_read_complex_part (rtx cplx, complex_part_t part)
+{
+  machine_mode cmode;
+  scalar_mode imode;
+  unsigned ibitsize;
+
+  if (part == BOTH_P)
+    return cplx;
+
+  if (GET_CODE (cplx) == CONCAT)
+    return XEXP (cplx, part);
+
+  cmode = GET_MODE (cplx);
+  imode = GET_MODE_INNER (cmode);
+  ibitsize = GET_MODE_BITSIZE (imode);
+
+  /* Special case reads from complex constants that got spilled to memory.  */
+  if (MEM_P (cplx) && GET_CODE (XEXP (cplx, 0)) == SYMBOL_REF)
+    {
+      tree decl = SYMBOL_REF_DECL (XEXP (cplx, 0));
+      if (decl && TREE_CODE (decl) == COMPLEX_CST)
+	{
+	  tree cplx_part =
+	    (part == IMAG_P) ? TREE_IMAGPART (decl) : TREE_REALPART (decl);
+	  if (CONSTANT_CLASS_P (cplx_part))
+	    return expand_expr (cplx_part, NULL_RTX, imode, EXPAND_NORMAL);
+	}
+    }
+
+  /* For MEMs simplify_gen_subreg may generate an invalid new address
+     because, e.g., the original address is considered mode-dependent
+     by the target, which restricts simplify_subreg from invoking
+     adjust_address_nv.  Instead of preparing fallback support for an
+     invalid address, we call adjust_address_nv directly.  */
+  if (MEM_P (cplx))
+    return adjust_address_nv (cplx, imode, (part == IMAG_P)
+			      ? GET_MODE_SIZE (imode) : 0);
+
+  /* If the sub-object is at least word sized, then we know that subregging
+     will work.  This special case is important, since extract_bit_field
+     wants to operate on integer modes, and there's rarely an OImode to
+     correspond to TCmode.  */
+  if (ibitsize >= BITS_PER_WORD
+      /* For hard regs we have exact predicates.  Assume we can split
+	 the original object if it spans an even number of hard regs.
+	 This special case is important for SCmode on 64-bit platforms
+	 where the natural size of floating-point regs is 32-bit.  */
+      || (REG_P (cplx)
+	  && REGNO (cplx) < FIRST_PSEUDO_REGISTER
+	  && REG_NREGS (cplx) % 2 == 0))
+    {
+      rtx ret = simplify_gen_subreg (imode, cplx, cmode, (part == IMAG_P)
+				     ? GET_MODE_SIZE (imode) : 0);
+      if (ret)
+	return ret;
+      else
+	/* simplify_gen_subreg may fail for sub-word MEMs.  */
+	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
+    }
+
+  return extract_bit_field (cplx, ibitsize, (part == IMAG_P) ? ibitsize : 0,
+			    true, NULL_RTX, imode, imode, false, NULL);
+}
+
+/* By default, write to one of the components of the complex value CPLX.
+   Write VAL to the real part if PART is REAL_P, and the imaginary part if
+   it is IMAG_P.  If PART is BOTH_P, call recursively with REAL_P and
+   IMAG_P.  If UNDEFINED_P, the value in CPLX is currently undefined.  */
+
+void
+default_write_complex_part (rtx cplx, rtx val, complex_part_t part,
+			    bool undefined_p)
+{
+  machine_mode cmode;
+  scalar_mode imode;
+  unsigned ibitsize;
+
+  if (part == BOTH_P)
+    {
+      write_complex_part (cplx, read_complex_part (val, REAL_P), REAL_P,
+			  undefined_p);
+      write_complex_part (cplx, read_complex_part (val, IMAG_P), IMAG_P,
+			  false);
+      return;
+    }
+
+  if (GET_CODE (cplx) == CONCAT)
+    {
+      emit_move_insn (XEXP (cplx, part == IMAG_P), val);
+      return;
+    }
+
+  cmode = GET_MODE (cplx);
+  imode = GET_MODE_INNER (cmode);
+  ibitsize = GET_MODE_BITSIZE (imode);
+
+  /* For MEMs simplify_gen_subreg may generate an invalid new address
+     because, e.g., the original address is considered mode-dependent
+     by the target, which restricts simplify_subreg from invoking
+     adjust_address_nv.  Instead of preparing fallback support for an
+     invalid address, we call adjust_address_nv directly.  */
+  if (MEM_P (cplx))
+    {
+      emit_move_insn (adjust_address_nv (cplx, imode, (part == IMAG_P)
+					 ? GET_MODE_SIZE (imode) : 0), val);
+      return;
+    }
+
+  /* If the sub-object is at least word sized, then we know that subregging
+     will work.  This special case is important, since store_bit_field
+     wants to operate on integer modes, and there's rarely an OImode to
+     correspond to TCmode.  */
+  if (ibitsize >= BITS_PER_WORD
+      /* For hard regs we have exact predicates.  Assume we can split
+	 the original object if it spans an even number of hard regs.
+	 This special case is important for SCmode on 64-bit platforms
+	 where the natural size of floating-point regs is 32-bit.  */
+      || (REG_P (cplx)
+	  && REGNO (cplx) < FIRST_PSEUDO_REGISTER
+	  && REG_NREGS (cplx) % 2 == 0))
+    {
+      rtx cplx_part = simplify_gen_subreg (imode, cplx, cmode,
+					   (part == IMAG_P) ?
+					   GET_MODE_SIZE (imode) : 0);
+      if (cplx_part)
+	{
+	  emit_move_insn (cplx_part, val);
+	  return;
+	}
+      else
+	/* simplify_gen_subreg may fail for sub-word MEMs.  */
+	gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
+    }
+
+  store_bit_field (cplx, ibitsize, (part == IMAG_P) ? ibitsize : 0, 0, 0,
+		   imode, val, false, undefined_p);
+}
+
 /* By default do not split reductions further.  */
 
 machine_mode
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 189549cb1c7..dcacc725e27 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -124,6 +124,11 @@ extern opt_machine_mode default_get_mask_mode (machine_mode);
 extern bool default_empty_mask_is_expensive (unsigned);
 extern vector_costs *default_vectorize_create_costs (vec_info *, bool);
 
+extern rtx default_read_complex_part (rtx cplx, complex_part_t part);
+extern void default_write_complex_part (rtx cplx, rtx val,
+					complex_part_t part,
+					bool undefined_p);
+
 /* OpenACC hooks.  */
 extern bool default_goacc_validate_dims (tree, int [], int, unsigned);
 extern int default_goacc_dim_limit (int);
-- 
2.17.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 03/11] Native complex ops: Add gen_rtx_complex hook
  2023-09-12 10:07   ` [PATCH v2 0/11] Native complex operations Sylvain Noiry
  2023-09-12 10:07     ` [PATCH v2 01/11] Native complex ops: Conditional lowering Sylvain Noiry
  2023-09-12 10:07     ` [PATCH v2 02/11] Native complex ops: Move functions to hooks Sylvain Noiry
@ 2023-09-12 10:07     ` Sylvain Noiry
  2023-09-12 10:07     ` [PATCH v2 04/11] Native complex ops: Allow native complex regs and ops in rtl Sylvain Noiry
                       ` (7 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-09-12 10:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Summary:
Add a new target hook, gen_rtx_complex, for building complex elements
during the expand pass. The default implementation calls gen_rtx_CONCAT,
as before. Calls to gen_rtx_CONCAT for complex handling are then
replaced by calls to targetm.gen_rtx_complex.
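
For reference, here is a minimal sketch of the default behaviour
described above; the actual default_gen_rtx_complex added by this patch
may differ in details:

  /* Hypothetical sketch, consistent with the documentation below: build
     a CONCAT, creating any missing part as a fresh pseudo register.  */
  static rtx
  sketch_gen_rtx_complex (machine_mode mode, rtx real_part, rtx imag_part)
  {
    machine_mode imode = GET_MODE_INNER (mode);
    rtx real = real_part ? real_part : gen_reg_rtx (imode);
    rtx imag = imag_part ? imag_part : gen_reg_rtx (imode);

    return gen_rtx_CONCAT (mode, real, imag);
  }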

gcc/ChangeLog:

	* target.def: Add gen_rtx_complex target hook.
	* targhooks.cc (default_gen_rtx_complex): New: default
	implementation of gen_rtx_complex.
	* targhooks.h: Add default_gen_rtx_complex.
	* doc/tm.texi: Document TARGET_GEN_RTX_COMPLEX.
	* doc/tm.texi.in: Add TARGET_GEN_RTX_COMPLEX.
	* emit-rtl.cc (gen_reg_rtx): Replace call to
	gen_rtx_CONCAT with a call to gen_rtx_complex.
	(init_emit_once): Likewise.
	* expmed.cc (flip_storage_order): Likewise.
	* optabs.cc (expand_doubleword_mod): Likewise.
---
 gcc/doc/tm.texi    |  6 ++++++
 gcc/doc/tm.texi.in |  2 ++
 gcc/emit-rtl.cc    | 26 +++++++++-----------------
 gcc/expmed.cc      |  2 +-
 gcc/optabs.cc      | 11 ++++++-----
 gcc/target.def     | 10 ++++++++++
 gcc/targhooks.cc   | 27 +++++++++++++++++++++++++++
 gcc/targhooks.h    |  2 ++
 8 files changed, 63 insertions(+), 23 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c4f935b5746..470497a3ade 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4620,6 +4620,12 @@ to return a nonzero value when it is required, the compiler will run out
 of spill registers and print a fatal error message.
 @end deftypefn
 
+@deftypefn {Target Hook} rtx TARGET_GEN_RTX_COMPLEX (machine_mode @var{mode}, rtx @var{real_part}, rtx @var{imag_part})
+This hook should return an rtx representing a complex value of mode @var{mode} built from @var{real_part} and @var{imag_part}.
+If both arguments are @code{NULL}, create the parts as new pseudo registers.
+The default implementation calls @code{gen_rtx_CONCAT}.
+@end deftypefn
+
 @deftypefn {Target Hook} rtx TARGET_READ_COMPLEX_PART (rtx @var{cplx}, complex_part_t @var{part})
 This hook should return the rtx representing the specified @var{part} of the complex given by @var{cplx}.
   @var{part} can be the real part, the imaginary part, or both of them.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index b8970761c8d..27a0b321fe0 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3392,6 +3392,8 @@ stack.
 
 @hook TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
 
+@hook TARGET_GEN_RTX_COMPLEX
+
 @hook TARGET_READ_COMPLEX_PART
 
 @hook TARGET_WRITE_COMPLEX_PART
diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index f6276a2d0b6..22012bfea13 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -1190,19 +1190,7 @@ gen_reg_rtx (machine_mode mode)
   if (generating_concat_p
       && (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
 	  || GET_MODE_CLASS (mode) == MODE_COMPLEX_INT))
-    {
-      /* For complex modes, don't make a single pseudo.
-	 Instead, make a CONCAT of two pseudos.
-	 This allows noncontiguous allocation of the real and imaginary parts,
-	 which makes much better code.  Besides, allocating DCmode
-	 pseudos overstrains reload on some machines like the 386.  */
-      rtx realpart, imagpart;
-      machine_mode partmode = GET_MODE_INNER (mode);
-
-      realpart = gen_reg_rtx (partmode);
-      imagpart = gen_reg_rtx (partmode);
-      return gen_rtx_CONCAT (mode, realpart, imagpart);
-    }
+    return targetm.gen_rtx_complex (mode, NULL, NULL);
 
   /* Do not call gen_reg_rtx with uninitialized crtl.  */
   gcc_assert (crtl->emit.regno_pointer_align_length);
@@ -6274,14 +6262,18 @@ init_emit_once (void)
 
   FOR_EACH_MODE_IN_CLASS (mode, MODE_COMPLEX_INT)
     {
-      rtx inner = const_tiny_rtx[0][(int)GET_MODE_INNER (mode)];
-      const_tiny_rtx[0][(int) mode] = gen_rtx_CONCAT (mode, inner, inner);
+      machine_mode imode = GET_MODE_INNER (mode);
+      rtx inner = const_tiny_rtx[0][(int) imode];
+      const_tiny_rtx[0][(int) mode] =
+	targetm.gen_rtx_complex (mode, inner, inner);
     }
 
   FOR_EACH_MODE_IN_CLASS (mode, MODE_COMPLEX_FLOAT)
     {
-      rtx inner = const_tiny_rtx[0][(int)GET_MODE_INNER (mode)];
-      const_tiny_rtx[0][(int) mode] = gen_rtx_CONCAT (mode, inner, inner);
+      machine_mode imode = GET_MODE_INNER (mode);
+      rtx inner = const_tiny_rtx[0][(int) imode];
+      const_tiny_rtx[0][(int) mode] =
+	targetm.gen_rtx_complex (mode, inner, inner);
     }
 
   FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_BOOL)
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 973c16a14d3..ce935951781 100644
--- a/gcc/expmed.cc
+++ b/gcc/expmed.cc
@@ -400,7 +400,7 @@ flip_storage_order (machine_mode mode, rtx x)
       real = flip_storage_order (GET_MODE_INNER (mode), real);
       imag = flip_storage_order (GET_MODE_INNER (mode), imag);
 
-      return gen_rtx_CONCAT (mode, real, imag);
+      return targetm.gen_rtx_complex (mode, real, imag);
     }
 
   if (UNLIKELY (reverse_storage_order_supported < 0))
diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index 32ff379ffc3..429a20f9cd7 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -1001,16 +1001,17 @@ expand_doubleword_mod (machine_mode mode, rtx op0, rtx op1, bool unsignedp)
 	  machine_mode cmode = TYPE_MODE (ctype);
 	  rtx op00 = operand_subword_force (op0, 0, mode);
 	  rtx op01 = operand_subword_force (op0, 1, mode);
-	  rtx cres = gen_rtx_CONCAT (cmode, gen_reg_rtx (word_mode),
-				     gen_reg_rtx (word_mode));
+	  rtx cres = targetm.gen_rtx_complex (cmode, gen_reg_rtx (word_mode),
+					      gen_reg_rtx (word_mode));
 	  tree lhs = make_tree (ctype, cres);
 	  tree arg0 = make_tree (wtype, op00);
 	  tree arg1 = make_tree (wtype, op01);
 	  expand_addsub_overflow (UNKNOWN_LOCATION, PLUS_EXPR, lhs, arg0,
 				  arg1, true, true, true, false, NULL);
-	  sum = expand_simple_binop (word_mode, PLUS, XEXP (cres, 0),
-				     XEXP (cres, 1), NULL_RTX, 1,
-				     OPTAB_DIRECT);
+	  sum = expand_simple_binop (word_mode, PLUS,
+				     read_complex_part (cres, REAL_P),
+				     read_complex_part (cres, IMAG_P),
+				     NULL_RTX, 1, OPTAB_DIRECT);
 	  if (sum == NULL_RTX)
 	    return NULL_RTX;
 	}
diff --git a/gcc/target.def b/gcc/target.def
index f99df939776..d63dacbbb8f 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3313,6 +3313,16 @@ a pointer to int.",
  bool, (ao_ref *ref),
  default_ref_may_alias_errno)
 
+/* Return the rtx representation of a complex with a specified mode.  */
+DEFHOOK
+(gen_rtx_complex,
+ "This hook should return an rtx representing a complex of mode @var{machine_mode} built from @var{real_part} and @var{imag_part}.\n\
+  If both arguments are @code{NULL}, create them as registers.\n\
+ The default is @code{gen_rtx_CONCAT}.",
+ rtx,
+ (machine_mode mode, rtx real_part, rtx imag_part),
+ default_gen_rtx_complex)
+
 /* Returns the value corresponding to the specified part of a complex.  */
 DEFHOOK
 (read_complex_part,
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index df852eb18e3..f6e7bc6c141 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -1533,6 +1533,33 @@ default_preferred_simd_mode (scalar_mode)
   return word_mode;
 }
 
+/* By default, call gen_rtx_CONCAT.  */
+
+rtx
+default_gen_rtx_complex (machine_mode mode, rtx real_part, rtx imag_part)
+{
+  /* For complex modes, don't make a single pseudo.
+     Instead, make a CONCAT of two pseudos.
+     This allows noncontiguous allocation of the real and imaginary parts,
+     which makes much better code.  Besides, allocating DCmode
+     pseudos overstrains reload on some machines like the 386.  */
+  machine_mode imode = GET_MODE_INNER (mode);
+
+  if (real_part == NULL)
+    real_part = gen_reg_rtx (imode);
+  else
+    gcc_assert ((GET_MODE (real_part) == imode)
+		|| (GET_MODE (real_part) == E_VOIDmode));
+
+  if (imag_part == NULL)
+    imag_part = gen_reg_rtx (imode);
+  else
+    gcc_assert ((GET_MODE (imag_part) == imode)
+		|| (GET_MODE (imag_part) == E_VOIDmode));
+
+  return gen_rtx_CONCAT (mode, real_part, imag_part);
+}
+
 /* By default, extract one of the components of the complex value CPLX.  Extract the
    real part if part is REAL_P, and the imaginary part if it is IMAG_P. If part is
    BOTH_P, return cplx directly.  */
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index dcacc725e27..cf37eea24b5 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -124,6 +124,8 @@ extern opt_machine_mode default_get_mask_mode (machine_mode);
 extern bool default_empty_mask_is_expensive (unsigned);
 extern vector_costs *default_vectorize_create_costs (vec_info *, bool);
 
+extern rtx default_gen_rtx_complex (machine_mode mode, rtx real_part,
+				    rtx imag_part);
 extern rtx default_read_complex_part (rtx cplx, complex_part_t part);
 extern void default_write_complex_part (rtx cplx, rtx val,
 					complex_part_t part);
-- 
2.17.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 04/11] Native complex ops: Allow native complex regs and ops in rtl
  2023-09-12 10:07   ` [PATCH v2 0/11] Native complex operations Sylvain Noiry
                       ` (2 preceding siblings ...)
  2023-09-12 10:07     ` [PATCH v2 03/11] Native complex ops: Add gen_rtx_complex hook Sylvain Noiry
@ 2023-09-12 10:07     ` Sylvain Noiry
  2023-09-12 10:07     ` [PATCH v2 05/11] Native complex ops: Add the conjugate op in optabs Sylvain Noiry
                       ` (6 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-09-12 10:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Summary:
Support registers of complex types in rtl. Also adapt the functions
called during the expand pass to support native complex operations.
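
As an illustration, once the backend also provides a native mov pattern
for a complex mode (movsc for SCmode, following the mov$a optab naming),
a plain copy such as the hypothetical test case below goes through the
new BOTH_P path and is expanded as one native move instead of two
part-wise moves:

_Complex float
copy_sc (_Complex float x)
{
  return x;   /* Single SCmode move when movsc is available.  */
}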

gcc/ChangeLog:

	* explow.cc (trunc_int_for_mode): Allow complex int modes
	* expr.cc (emit_move_complex_parts): Move both parts at the
	same time if it is supported by the backend
	(emit_move_complex): Do not move via integer if no
	corresponding integer mode exists. For complex floats, relax
	the constraint on the number of registers for targets with
	register pairs, and use native moves if supported by the
	backend.
	(expand_expr_real_2): Move both parts at the same time if it
	is supported by the backend
	(expand_expr_real_1): Update the expansion of complex constants
	(const_vector_from_tree): Expand both parts of a complex
	constant
	* real.h: Update FLOAT_MODE_FORMAT
	* machmode.h: Add COMPLEX_INT_MODE_P and COMPLEX_FLOAT_MODE_P
	predicates
	* optabs-libfuncs.cc (gen_int_libfunc): Add support for
	complex modes
	(gen_intv_fp_libfunc): Likewise
	* recog.cc (general_operand): Likewise
	* cse.cc (try_const_anchors): Likewise
	* emit-rtl.cc: (validate_subreg): Likewise
---
 gcc/cse.cc               |  2 +-
 gcc/doc/tm.texi          |  2 +-
 gcc/emit-rtl.cc          |  2 +-
 gcc/explow.cc            |  2 +-
 gcc/expr.cc              | 70 ++++++++++++++++++++++++++++++++++------
 gcc/internal-fn.cc       |  4 +--
 gcc/machmode.h           |  8 +++++
 gcc/optabs-libfuncs.cc   | 25 ++++++++++----
 gcc/real.h               |  3 +-
 gcc/recog.cc             |  1 +
 gcc/target.def           |  2 +-
 gcc/targhooks.cc         |  8 ++---
 gcc/targhooks.h          |  3 +-
 gcc/tree-ssa-forwprop.cc |  1 +
 14 files changed, 105 insertions(+), 28 deletions(-)

diff --git a/gcc/cse.cc b/gcc/cse.cc
index c46870059e6..5ce6c692070 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -1313,7 +1313,7 @@ try_const_anchors (rtx src_const, machine_mode mode)
   unsigned lower_old, upper_old;
 
   /* CONST_INT may be in various modes, avoid non-scalar-int mode. */
-  if (!SCALAR_INT_MODE_P (mode))
+  if (!(SCALAR_INT_MODE_P (mode) || COMPLEX_INT_MODE_P (mode)))
     return NULL_RTX;
 
   if (!compute_const_anchors (src_const, &lower_base, &lower_offs,
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 470497a3ade..1e87f798449 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -4631,7 +4631,7 @@ This hook should return the rtx representing the specified @var{part} of the com
   @var{part} can be the real part, the imaginary part, or both of them.
 @end deftypefn
 
-@deftypefn {Target Hook} void TARGET_WRITE_COMPLEX_PART (rtx @var{cplx}, rtx @var{val}, complex_part_t @var{part})
+@deftypefn {Target Hook} void TARGET_WRITE_COMPLEX_PART (rtx @var{cplx}, rtx @var{val}, complex_part_t @var{part}, bool @var{undefined_p})
 This hook should move the rtx value given by @var{val} to the specified @var{var} of the complex given by @var{cplx}.
   @var{var} can be the real part, the imaginary part, or both of them.
 @end deftypefn
diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index 22012bfea13..f7c33c4afb1 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -946,7 +946,7 @@ validate_subreg (machine_mode omode, machine_mode imode,
      if this ought to be represented at all -- why can't this all be hidden
      in post-reload splitters that make arbitrarily mode changes to the
      registers themselves.  */
-  else if (VECTOR_MODE_P (omode)
+  else if ((VECTOR_MODE_P (omode) || COMPLEX_MODE_P (omode))
 	   && GET_MODE_INNER (omode) == GET_MODE_INNER (imode))
     ;
   /* Subregs involving floating point modes are not allowed to
diff --git a/gcc/explow.cc b/gcc/explow.cc
index 6424c0802f0..48572a40eab 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -56,7 +56,7 @@ trunc_int_for_mode (HOST_WIDE_INT c, machine_mode mode)
   int width = GET_MODE_PRECISION (smode);
 
   /* You want to truncate to a _what_?  */
-  gcc_assert (SCALAR_INT_MODE_P (mode));
+  gcc_assert (SCALAR_INT_MODE_P (mode) || COMPLEX_INT_MODE_P (mode));
 
   /* Canonicalize BImode to 0 and STORE_FLAG_VALUE.  */
   if (smode == BImode)
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 12b74273144..01462486631 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -3842,8 +3842,14 @@ emit_move_complex_parts (rtx x, rtx y)
       && REG_P (x) && !reg_overlap_mentioned_p (x, y))
     emit_clobber (x);
 
-  write_complex_part (x, read_complex_part (y, REAL_P), REAL_P, true);
-  write_complex_part (x, read_complex_part (y, IMAG_P), IMAG_P, false);
+  machine_mode mode = GET_MODE (x);
+  if (optab_handler (mov_optab, mode) != CODE_FOR_nothing)
+    write_complex_part (x, read_complex_part (y, BOTH_P), BOTH_P, true);
+  else
+    {
+      write_complex_part (x, read_complex_part (y, REAL_P), REAL_P, true);
+      write_complex_part (x, read_complex_part (y, IMAG_P), IMAG_P, false);
+    }
 
   return get_last_insn ();
 }
@@ -3863,14 +3869,14 @@ emit_move_complex (machine_mode mode, rtx x, rtx y)
 
   /* See if we can coerce the target into moving both values at once, except
      for floating point where we favor moving as parts if this is easy.  */
-  if (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
+  scalar_int_mode imode;
+  if (!int_mode_for_mode (mode).exists (&imode))
+    try_int = false;
+  else if (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT
       && optab_handler (mov_optab, GET_MODE_INNER (mode)) != CODE_FOR_nothing
-      && !(REG_P (x)
-	   && HARD_REGISTER_P (x)
-	   && REG_NREGS (x) == 1)
-      && !(REG_P (y)
-	   && HARD_REGISTER_P (y)
-	   && REG_NREGS (y) == 1))
+      && optab_handler (mov_optab, mode) != CODE_FOR_nothing
+      && !(REG_P (x) && HARD_REGISTER_P (x))
+      && !(REG_P (y) && HARD_REGISTER_P (y)))
     try_int = false;
   /* Not possible if the values are inherently not adjacent.  */
   else if (GET_CODE (x) == CONCAT || GET_CODE (y) == CONCAT)
@@ -11044,6 +11050,48 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
 
 	  return original_target;
 	}
+      else if (original_target
+	       && REG_P (original_target)
+	       && (GET_MODE_CLASS (GET_MODE (original_target))
+		   == MODE_COMPLEX_INT
+		   || GET_MODE_CLASS (GET_MODE (original_target))
+		      == MODE_COMPLEX_FLOAT))
+	{
+	  mode = TYPE_MODE (TREE_TYPE (exp));
+
+	  /* Move both parts at the same time if it is possible.  */
+	  if (TREE_COMPLEX_BOTH_PARTS (exp) != NULL)
+	    {
+	      op0 = expand_expr (TREE_COMPLEX_BOTH_PARTS (exp),
+				 original_target, mode, EXPAND_NORMAL);
+	      write_complex_part (original_target, op0, BOTH_P, false);
+	    }
+	  else
+	    {
+	      mode = TYPE_MODE (TREE_TYPE (TREE_TYPE (exp)));
+
+	      rtx rtarg = gen_reg_rtx (mode);
+	      rtx itarg = gen_reg_rtx (mode);
+	      op0 = expand_expr (TREE_REALPART (exp), rtarg, mode,
+				 EXPAND_NORMAL);
+	      op1 = expand_expr (TREE_IMAGPART (exp), itarg, mode,
+				 EXPAND_NORMAL);
+
+	      write_complex_part (original_target, op0, REAL_P, true);
+	      write_complex_part (original_target, op1, IMAG_P, false);
+	    }
+	  return original_target;
+	}
+      else if ((TREE_COMPLEX_BOTH_PARTS (exp) != NULL)
+	       && (known_le (GET_MODE_BITSIZE (mode), 2 * BITS_PER_WORD)))
+	{
+	  op0 = expand_expr (TREE_COMPLEX_BOTH_PARTS (exp),
+			     original_target, mode,
+			     EXPAND_NORMAL);
+	  rtx tmp = gen_reg_rtx (mode);
+	  write_complex_part (tmp, op0, BOTH_P, false);
+	  return tmp;
+	}
 
       /* fall through */
 
@@ -13391,6 +13439,10 @@ const_vector_from_tree (tree exp)
       else if (TREE_CODE (elt) == FIXED_CST)
 	builder.quick_push (CONST_FIXED_FROM_FIXED_VALUE
 			    (TREE_FIXED_CST (elt), inner));
+      else if (TREE_CODE (elt) == COMPLEX_CST)
+	builder.quick_push (expand_expr
+			    (TREE_COMPLEX_BOTH_PARTS (elt), NULL_RTX, mode,
+			     EXPAND_NORMAL));
       else
 	builder.quick_push (immed_wide_int_const (wi::to_poly_wide (elt),
 						  inner));
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index a01b7160303..c1c8e456320 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -2878,8 +2878,8 @@ expand_UADDC (internal_fn ifn, gcall *stmt)
   create_input_operand (&ops[3], op2, mode);
   create_input_operand (&ops[4], op3, mode);
   expand_insn (icode, 5, ops);
-  write_complex_part (target, re, false, false);
-  write_complex_part (target, im, true, false);
+  write_complex_part (target, re, REAL_P, false);
+  write_complex_part (target, im, IMAG_P, false);
 }
 
 /* Expand USUBC STMT.  */
diff --git a/gcc/machmode.h b/gcc/machmode.h
index a22df60dc20..fd87af7c74a 100644
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -119,6 +119,14 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
    || GET_MODE_CLASS (MODE) == MODE_COMPLEX_FLOAT \
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT)
 
+/* Nonzero if MODE is a complex integer mode.  */
+#define COMPLEX_INT_MODE_P(MODE) \
+   (GET_MODE_CLASS (MODE) == MODE_COMPLEX_INT)
+
+/* Nonzero if MODE is a complex floating-point mode.  */
+#define COMPLEX_FLOAT_MODE_P(MODE) \
+  (GET_MODE_CLASS (MODE) == MODE_COMPLEX_FLOAT)
+
 /* Nonzero if MODE is a complex mode.  */
 #define COMPLEX_MODE_P(MODE)			\
   (GET_MODE_CLASS (MODE) == MODE_COMPLEX_INT	\
diff --git a/gcc/optabs-libfuncs.cc b/gcc/optabs-libfuncs.cc
index f1abe6916d3..4bb56b2f0d5 100644
--- a/gcc/optabs-libfuncs.cc
+++ b/gcc/optabs-libfuncs.cc
@@ -190,19 +190,30 @@ gen_int_libfunc (optab optable, const char *opname, char suffix,
   int maxsize = 2 * BITS_PER_WORD;
   int minsize = BITS_PER_WORD;
   scalar_int_mode int_mode;
+  complex_mode cplx_int_mode;
+  int bitsize;
 
-  if (!is_int_mode (mode, &int_mode))
+  if (is_int_mode (mode, &int_mode))
+    bitsize = GET_MODE_BITSIZE (int_mode);
+  else if (is_complex_int_mode (mode, &cplx_int_mode))
+    bitsize = GET_MODE_BITSIZE (cplx_int_mode);
+  else
     return;
+
   if (maxsize < LONG_LONG_TYPE_SIZE)
     maxsize = LONG_LONG_TYPE_SIZE;
   if (minsize > INT_TYPE_SIZE
       && (trapv_binoptab_p (optable)
 	  || trapv_unoptab_p (optable)))
     minsize = INT_TYPE_SIZE;
-  if (GET_MODE_BITSIZE (int_mode) < minsize
-      || GET_MODE_BITSIZE (int_mode) > maxsize)
+
+  if (bitsize < minsize || bitsize > maxsize)
     return;
-  gen_libfunc (optable, opname, suffix, int_mode);
+
+  if (GET_MODE_CLASS (mode) == MODE_INT)
+    gen_libfunc (optable, opname, suffix, int_mode);
+  else
+    gen_libfunc (optable, opname, suffix, cplx_int_mode);
 }
 
 /* Like gen_libfunc, but verify that FP and set decimal prefix if needed.  */
@@ -280,9 +291,11 @@ void
 gen_intv_fp_libfunc (optab optable, const char *name, char suffix,
 		     machine_mode mode)
 {
-  if (DECIMAL_FLOAT_MODE_P (mode) || GET_MODE_CLASS (mode) == MODE_FLOAT)
+  if (DECIMAL_FLOAT_MODE_P (mode) || GET_MODE_CLASS (mode) == MODE_FLOAT
+      || GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT)
     gen_fp_libfunc (optable, name, suffix, mode);
-  if (GET_MODE_CLASS (mode) == MODE_INT)
+  if (GET_MODE_CLASS (mode) == MODE_INT
+      || GET_MODE_CLASS (mode) == MODE_COMPLEX_INT)
     {
       int len = strlen (name);
       char *v_name = XALLOCAVEC (char, len + 2);
diff --git a/gcc/real.h b/gcc/real.h
index 9ed6c372b14..53585418e68 100644
--- a/gcc/real.h
+++ b/gcc/real.h
@@ -189,7 +189,8 @@ extern const struct real_format *
 			: (gcc_unreachable (), 0)])
 
 #define FLOAT_MODE_FORMAT(MODE) \
-  (REAL_MODE_FORMAT (as_a <scalar_float_mode> (GET_MODE_INNER (MODE))))
+  (REAL_MODE_FORMAT (as_a <scalar_float_mode> \
+    (GET_MODE_INNER ((COMPLEX_FLOAT_MODE_P (MODE)) ? (GET_MODE_INNER (MODE)) : (MODE)))))
 
 /* The following macro determines whether the floating point format is
    composite, i.e. may contain non-consecutive mantissa bits, in which
diff --git a/gcc/recog.cc b/gcc/recog.cc
index 92f151248a6..8f53e93f566 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -1441,6 +1441,7 @@ general_operand (rtx op, machine_mode mode)
      if the caller wants something floating.  */
   if (GET_MODE (op) == VOIDmode && mode != VOIDmode
       && GET_MODE_CLASS (mode) != MODE_INT
+      && GET_MODE_CLASS (mode) != MODE_COMPLEX_INT
       && GET_MODE_CLASS (mode) != MODE_PARTIAL_INT)
     return false;
 
diff --git a/gcc/target.def b/gcc/target.def
index d63dacbbb8f..4eafff1d21b 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3338,7 +3338,7 @@ DEFHOOK
  "This hook should move the rtx value given by @var{val} to the specified @var{var} of the complex given by @var{cplx}.\n\
   @var{var} can be the real part, the imaginary part, or both of them.",
  void,
- (rtx cplx, rtx val, complex_part_t part),
+ (rtx cplx, rtx val, complex_part_t part, bool undefined_p),
  default_write_complex_part)
 
 /* Support for named address spaces.  */
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index f6e7bc6c141..d89668cd1ab 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -1634,7 +1634,7 @@ default_read_complex_part (rtx cplx, complex_part_t part)
    BOTH_P, call recursively with REAL_P and IMAG_P.  */
 
 void
-default_write_complex_part (rtx cplx, rtx val, complex_part_t part)
+default_write_complex_part (rtx cplx, rtx val, complex_part_t part, bool undefined_p)
 {
   machine_mode cmode;
   scalar_mode imode;
@@ -1642,8 +1642,8 @@ default_write_complex_part (rtx cplx, rtx val, complex_part_t part)
 
   if (part == BOTH_P)
     {
-      write_complex_part (cplx, read_complex_part (val, REAL_P), REAL_P);
-      write_complex_part (cplx, read_complex_part (val, IMAG_P), IMAG_P);
+      write_complex_part (cplx, read_complex_part (val, REAL_P), REAL_P, false);
+      write_complex_part (cplx, read_complex_part (val, IMAG_P), IMAG_P, false);
       return;
     }
 
@@ -1696,7 +1696,7 @@ default_write_complex_part (rtx cplx, rtx val, complex_part_t part)
     }
 
   store_bit_field (cplx, ibitsize, (part == IMAG_P) ? ibitsize : 0, 0, 0,
-		   imode, val, false);
+		   imode, val, false, undefined_p);
 }
 
 /* By default do not split reductions further.  */
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index cf37eea24b5..f3ae17998de 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -128,7 +128,8 @@ extern rtx default_gen_rtx_complex (machine_mode mode, rtx real_part,
 				    rtx imag_part);
 extern rtx default_read_complex_part (rtx cplx, complex_part_t part);
 extern void default_write_complex_part (rtx cplx, rtx val,
-					complex_part_t part);
+					complex_part_t part,
+					bool undefined_p);
 
 /* OpenACC hooks.  */
 extern bool default_goacc_validate_dims (tree, int [], int, unsigned);
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 047f9237dd4..30e99f812f1 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -3812,6 +3812,7 @@ pass_forwprop::execute (function *fun)
 		}
 	      else
 		gsi_next (&gsi);
+	      gsi_next (&gsi);
 	    }
 	  else if (code == CONSTRUCTOR
 		   && VECTOR_TYPE_P (TREE_TYPE (rhs))
-- 
2.17.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 05/11] Native complex ops: Add the conjugate op in optabs
  2023-09-12 10:07   ` [PATCH v2 0/11] Native complex operations Sylvain Noiry
                       ` (3 preceding siblings ...)
  2023-09-12 10:07     ` [PATCH v2 04/11] Native complex ops: Allow native complex regs and ops in rtl Sylvain Noiry
@ 2023-09-12 10:07     ` Sylvain Noiry
  2023-09-12 10:07     ` [PATCH v2 06/11] Native complex ops: Update how complex rotations are handled Sylvain Noiry
                       ` (5 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-09-12 10:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Summary:
Add an optab and an rtl operation for the complex conjugate, called
conj, to expand CONJ_EXPR.
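
For example, on a target providing a conjsc2 pattern (the name follows
from the conj$a2 optab entry added below), the hypothetical test case
below now expands to the native conjugate instruction instead of a
separate negation of the imaginary part:

#include <complex.h>

float complex
conjugate (float complex x)
{
  return conjf (x);   /* CONJ_EXPR, expanded through conj_optab.  */
}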

gcc/ChangeLog:

	* rtl.def: Add a conj operation in rtl
	* optabs.def: Add a conj optab
	* optabs-tree.cc (optab_for_tree_code): use the
	conj_optab to convert a CONJ_EXPR
	* expr.cc (expand_expr_real_2): Add a case to expand
	native CONJ_EXPR
	(expand_expr_real_1): Likewise
---
 gcc/expr.cc        | 17 ++++++++++++++++-
 gcc/optabs-tree.cc |  3 +++
 gcc/optabs.def     |  3 +++
 gcc/rtl.def        |  3 +++
 4 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 01462486631..937c2375133 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -10487,6 +10487,18 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	return dst;
       }
 
+    case CONJ_EXPR:
+      op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL);
+      if (modifier == EXPAND_STACK_PARM)
+	target = 0;
+      temp = expand_unop (mode,
+			  optab_for_tree_code (CONJ_EXPR, type,
+					       optab_default),
+			  op0, target, 0);
+      gcc_assert (temp);
+      return REDUCE_BIT_FIELD (temp);
+
+
     default:
       gcc_unreachable ();
     }
@@ -12099,6 +12111,10 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
       op0 = expand_normal (treeop0);
       return read_complex_part (op0, IMAG_P);
 
+    case CONJ_EXPR:
+      op0 = expand_normal (treeop0);
+      return op0;
+
     case RETURN_EXPR:
     case LABEL_EXPR:
     case GOTO_EXPR:
@@ -12122,7 +12138,6 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode,
     case VA_ARG_EXPR:
     case BIND_EXPR:
     case INIT_EXPR:
-    case CONJ_EXPR:
     case COMPOUND_EXPR:
     case PREINCREMENT_EXPR:
     case PREDECREMENT_EXPR:
diff --git a/gcc/optabs-tree.cc b/gcc/optabs-tree.cc
index 40bfbb1a5ad..ee5d52a7d50 100644
--- a/gcc/optabs-tree.cc
+++ b/gcc/optabs-tree.cc
@@ -271,6 +271,9 @@ optab_for_tree_code (enum tree_code code, const_tree type,
 	return TYPE_UNSIGNED (type) ? usneg_optab : ssneg_optab;
       return trapv ? negv_optab : neg_optab;
 
+    case CONJ_EXPR:
+      return conj_optab;
+
     case ABS_EXPR:
       return trapv ? absv_optab : abs_optab;
 
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 2ccbe4197b7..8405d365c97 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -162,6 +162,9 @@ OPTAB_NL(umax_optab, "umax$I$a3", UMAX, "umax", '3', gen_int_libfunc)
 OPTAB_NL(neg_optab, "neg$P$a2", NEG, "neg", '2', gen_int_fp_fixed_libfunc)
 OPTAB_NX(neg_optab, "neg$F$a2")
 OPTAB_NX(neg_optab, "neg$Q$a2")
+OPTAB_NL(conj_optab, "conj$P$a2", CONJ, "conj", '2', gen_int_fp_fixed_libfunc)
+OPTAB_NX(conj_optab, "conj$F$a2")
+OPTAB_NX(conj_optab, "conj$Q$a2")
 OPTAB_VL(negv_optab, "negv$I$a2", NEG, "neg", '2', gen_intv_fp_libfunc)
 OPTAB_VX(negv_optab, "neg$F$a2")
 OPTAB_NL(ssneg_optab, "ssneg$Q$a2", SS_NEG, "ssneg", '2', gen_signed_fixed_libfunc)
diff --git a/gcc/rtl.def b/gcc/rtl.def
index 88e2b198503..0312b3ea262 100644
--- a/gcc/rtl.def
+++ b/gcc/rtl.def
@@ -460,6 +460,9 @@ DEF_RTL_EXPR(MINUS, "minus", "ee", RTX_BIN_ARITH)
 /* Minus operand 0.  */
 DEF_RTL_EXPR(NEG, "neg", "e", RTX_UNARY)
 
+/* Complex conjugate of operand 0.  */
+DEF_RTL_EXPR(CONJ, "conj", "e", RTX_UNARY)
+
 DEF_RTL_EXPR(MULT, "mult", "ee", RTX_COMM_ARITH)
 
 /* Multiplication with signed saturation */
-- 
2.17.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 06/11] Native complex ops: Update how complex rotations are handled
  2023-09-12 10:07   ` [PATCH v2 0/11] Native complex operations Sylvain Noiry
                       ` (4 preceding siblings ...)
  2023-09-12 10:07     ` [PATCH v2 05/11] Native complex ops: Add the conjugate op in optabs Sylvain Noiry
@ 2023-09-12 10:07     ` Sylvain Noiry
  2023-09-12 10:07     ` [PATCH v2 07/11] Native complex ops: Vectorization of native complex operations Sylvain Noiry
                       ` (4 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-09-12 10:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Summary:
Catch complex rotations by 90° and 270° in fold-const.cc as before, but
now convert them into the new COMPLEX_ROT90 and COMPLEX_ROT270 internal
functions. Also add crot90 and crot270 optabs to expose these operations
to the backends, and lower COMPLEX_ROT90/COMPLEX_ROT270 conditionally,
depending on whether crot90/crot270 patterns exist in the optab. Finally,
convert a + crot90/270(b) into cadd90/270(a, b), in a similar way to FMAs.
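
For example, on a target implementing crot90 and cadd90, the two
hypothetical functions below (compiled with -ffast-math, so that the
NaN and signed-zero conditions of the fold are met) become a single
rotation and a single combined add-rotate:

#include <complex.h>

float complex
rot90 (float complex a)
{
  return a * I;        /* folded to COMPLEX_ROT90 (a)  */
}

float complex
fused (float complex a, float complex b)
{
  return a + b * I;    /* combined into COMPLEX_ADD_ROT90 (a, b)  */
}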

gcc/ChangeLog:

	* internal-fn.def: Add COMPLEX_ROT90 and COMPLEX_ROT270
	* fold-const.cc (fold_binary_loc): Update the folding of
	complex rotations to generate calls to COMPLEX_ROT90 and
	COMPLEX_ROT270
	* optabs.def: add crot90/crot270 optabs
	* tree-complex.cc (init_dont_simulate_again): Catch calls
	to COMPLEX_ROT90 and COMPLEX_ROT270
	(expand_complex_rotation): Conditionally lower complex
	rotations if no pattern is present in the backend
	(expand_complex_operations_1): Likewise
	(convert_crot): Likewise
	* tree-ssa-math-opts.cc (convert_crot_1): Combine complex
	rotations with additions, in a similar way to FMAs.
	(math_opts_dom_walker::after_dom_children): Call convert_crot
	if a COMPLEX_ROT90 or COMPLEX_ROT270 is identified
---
 gcc/fold-const.cc         | 145 +++++++++++++++++++++++++++++++-------
 gcc/internal-fn.def       |   2 +
 gcc/optabs.def            |   2 +
 gcc/tree-complex.cc       |  83 +++++++++++++++++++++-
 gcc/tree-ssa-math-opts.cc | 128 +++++++++++++++++++++++++++++++++
 5 files changed, 335 insertions(+), 25 deletions(-)

diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index d19b4666c65..dc05599c7fe 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -11865,30 +11865,6 @@ fold_binary_loc (location_t loc, enum tree_code code, tree type,
 	}
       else
 	{
-	  /* Fold z * +-I to __complex__ (-+__imag z, +-__real z).
-	     This is not the same for NaNs or if signed zeros are
-	     involved.  */
-	  if (!HONOR_NANS (arg0)
-	      && !HONOR_SIGNED_ZEROS (arg0)
-	      && COMPLEX_FLOAT_TYPE_P (TREE_TYPE (arg0))
-	      && TREE_CODE (arg1) == COMPLEX_CST
-	      && real_zerop (TREE_REALPART (arg1)))
-	    {
-	      tree rtype = TREE_TYPE (TREE_TYPE (arg0));
-	      if (real_onep (TREE_IMAGPART (arg1)))
-		return
-		  fold_build2_loc (loc, COMPLEX_EXPR, type,
-			       negate_expr (fold_build1_loc (loc, IMAGPART_EXPR,
-							     rtype, arg0)),
-			       fold_build1_loc (loc, REALPART_EXPR, rtype, arg0));
-	      else if (real_minus_onep (TREE_IMAGPART (arg1)))
-		return
-		  fold_build2_loc (loc, COMPLEX_EXPR, type,
-			       fold_build1_loc (loc, IMAGPART_EXPR, rtype, arg0),
-			       negate_expr (fold_build1_loc (loc, REALPART_EXPR,
-							     rtype, arg0)));
-	    }
-
 	  /* Optimize z * conj(z) for floating point complex numbers.
 	     Guarded by flag_unsafe_math_optimizations as non-finite
 	     imaginary components don't produce scalar results.  */
@@ -11901,6 +11877,127 @@ fold_binary_loc (location_t loc, enum tree_code code, tree type,
 	      && operand_equal_p (arg0, TREE_OPERAND (arg1, 0), 0))
 	    return fold_mult_zconjz (loc, type, arg0);
 	}
+
+      /* Fold z * +-I to __complex__ (-+__imag z, +-__real z).
+	 This is not the same for NaNs or if signed zeros are
+	 involved.  */
+      if (!HONOR_NANS (arg0)
+	  && !HONOR_SIGNED_ZEROS (arg0)
+	  && TREE_CODE (arg1) == COMPLEX_CST
+	  && (COMPLEX_FLOAT_TYPE_P (TREE_TYPE (arg0))
+	      && real_zerop (TREE_REALPART (arg1))))
+	{
+	  if (real_onep (TREE_IMAGPART (arg1)))
+	    {
+	      tree rtype = TREE_TYPE (TREE_TYPE (arg0));
+	      tree cplx_build = fold_build2_loc (loc, COMPLEX_EXPR, type,
+						 negate_expr (fold_build1_loc
+							      (loc,
+							       IMAGPART_EXPR,
+							       rtype, arg0)),
+						 fold_build1_loc (loc,
+								  REALPART_EXPR,
+								  rtype,
+								  arg0));
+	      if (cplx_build
+		  && TREE_CODE (TREE_OPERAND (cplx_build, 0)) != NEGATE_EXPR)
+		return cplx_build;
+
+	      if ((TREE_CODE (arg0) == COMPLEX_EXPR)
+		  && real_zerop (TREE_OPERAND (arg0, 1)))
+		return fold_build2_loc (loc, COMPLEX_EXPR, type,
+					TREE_OPERAND (arg0, 1),
+					TREE_OPERAND (arg0, 0));
+
+	      if (TREE_CODE (arg0) == CALL_EXPR)
+		{
+		  if (CALL_EXPR_IFN (arg0) == IFN_COMPLEX_ROT90)
+		    return negate_expr (CALL_EXPR_ARG (arg0, 0));
+		  else if (CALL_EXPR_IFN (arg0) == IFN_COMPLEX_ROT270)
+		    return CALL_EXPR_ARG (arg0, 0);
+		}
+	      else if (TREE_CODE (arg0) == NEGATE_EXPR)
+		return build_call_expr_internal_loc (loc, IFN_COMPLEX_ROT270,
+						     TREE_TYPE (arg0), 1,
+						     TREE_OPERAND (arg0, 0));
+	      else
+		return build_call_expr_internal_loc (loc, IFN_COMPLEX_ROT90,
+						     TREE_TYPE (arg0), 1,
+						     arg0);
+	    }
+	  else if (real_minus_onep (TREE_IMAGPART (arg1)))
+	    {
+	      if (real_zerop (TREE_OPERAND (arg0, 1)))
+		return fold_build2_loc (loc, COMPLEX_EXPR, type,
+					TREE_OPERAND (arg0, 1),
+					negate_expr (TREE_OPERAND (arg0, 0)));
+
+	      return build_call_expr_internal_loc (loc, IFN_COMPLEX_ROT270,
+						   TREE_TYPE (arg0), 1,
+						   fold (arg0));
+	    }
+	}
+
+      /* Likewise fold z * +-I for complex integer types to
+	 __complex__ (-+__imag z, +-__real z); the NaN and signed
+	 zero caveats do not apply to integers.  */
+      if (!HONOR_NANS (arg0)
+	  && !HONOR_SIGNED_ZEROS (arg0)
+	  && TREE_CODE (arg1) == COMPLEX_CST
+	  && (COMPLEX_INTEGER_TYPE_P (TREE_TYPE (arg0))
+	      && integer_zerop (TREE_REALPART (arg1))))
+	{
+	  if (integer_onep (TREE_IMAGPART (arg1)))
+	    {
+	      tree rtype = TREE_TYPE (TREE_TYPE (arg0));
+	      tree cplx_build = fold_build2_loc (loc, COMPLEX_EXPR, type,
+						 negate_expr (fold_build1_loc
+							      (loc,
+							       IMAGPART_EXPR,
+							       rtype, arg0)),
+						 fold_build1_loc (loc,
+								  REALPART_EXPR,
+								  rtype,
+								  arg0));
+	      if (cplx_build
+		  && TREE_CODE (TREE_OPERAND (cplx_build, 0)) != NEGATE_EXPR)
+		return cplx_build;
+
+	      if ((TREE_CODE (arg0) == COMPLEX_EXPR)
+		  && integer_zerop (TREE_OPERAND (arg0, 1)))
+		return fold_build2_loc (loc, COMPLEX_EXPR, type,
+					TREE_OPERAND (arg0, 1),
+					TREE_OPERAND (arg0, 0));
+
+	      if (TREE_CODE (arg0) == CALL_EXPR)
+		{
+		  if (CALL_EXPR_IFN (arg0) == IFN_COMPLEX_ROT90)
+		    return negate_expr (CALL_EXPR_ARG (arg0, 0));
+		  else if (CALL_EXPR_IFN (arg0) == IFN_COMPLEX_ROT270)
+		    return CALL_EXPR_ARG (arg0, 0);
+		}
+	      else if (TREE_CODE (arg0) == NEGATE_EXPR)
+		return build_call_expr_internal_loc (loc, IFN_COMPLEX_ROT270,
+						     TREE_TYPE (arg0), 1,
+						     TREE_OPERAND (arg0, 0));
+	      else
+		return build_call_expr_internal_loc (loc, IFN_COMPLEX_ROT90,
+						     TREE_TYPE (arg0), 1,
+						     arg0);
+	    }
+	  else if (integer_minus_onep (TREE_IMAGPART (arg1)))
+	    {
+	      if (integer_zerop (TREE_OPERAND (arg0, 1)))
+		return fold_build2_loc (loc, COMPLEX_EXPR, type,
+					TREE_OPERAND (arg0, 1),
+					negate_expr (TREE_OPERAND (arg0, 0)));
+
+	      return build_call_expr_internal_loc (loc, IFN_COMPLEX_ROT270,
+						   TREE_TYPE (arg0), 1,
+						   fold (arg0));
+	    }
+	}
+
       goto associate;
 
     case BIT_IOR_EXPR:
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index a2023ab9c3d..0ac6cd98a4f 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -390,6 +390,8 @@ DEF_INTERNAL_FLT_FN (SCALB, ECF_CONST, scalb, binary)
 DEF_INTERNAL_FLT_FLOATN_FN (FMIN, ECF_CONST, fmin, binary)
 DEF_INTERNAL_FLT_FLOATN_FN (FMAX, ECF_CONST, fmax, binary)
 DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONST, xorsign, binary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ROT90, ECF_CONST, crot90, unary)
+DEF_INTERNAL_OPTAB_FN (COMPLEX_ROT270, ECF_CONST, crot270, unary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 8405d365c97..d146cac5eec 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -334,6 +334,8 @@ OPTAB_D (atan_optab, "atan$a2")
 OPTAB_D (atanh_optab, "atanh$a2")
 OPTAB_D (copysign_optab, "copysign$F$a3")
 OPTAB_D (xorsign_optab, "xorsign$F$a3")
+OPTAB_D (crot90_optab, "crot90$a2")
+OPTAB_D (crot270_optab, "crot270$a2")
 OPTAB_D (cadd90_optab, "cadd90$a3")
 OPTAB_D (cadd270_optab, "cadd270$a3")
 OPTAB_D (cmul_optab, "cmul$a3")
diff --git a/gcc/tree-complex.cc b/gcc/tree-complex.cc
index d889a99d513..d814e407af6 100644
--- a/gcc/tree-complex.cc
+++ b/gcc/tree-complex.cc
@@ -241,7 +241,10 @@ init_dont_simulate_again (void)
 	  switch (gimple_code (stmt))
 	    {
 	    case GIMPLE_CALL:
-	      if (gimple_call_lhs (stmt))
+	      if (gimple_call_combined_fn (stmt) == CFN_COMPLEX_ROT90
+		  || gimple_call_combined_fn (stmt) == CFN_COMPLEX_ROT270)
+		saw_a_complex_op = true;
+	      else if (gimple_call_lhs (stmt))
 	        sim_again_p = is_complex_reg (gimple_call_lhs (stmt));
 	      break;
 
@@ -1730,6 +1733,69 @@ expand_complex_asm (gimple_stmt_iterator *gsi)
     }
 }
 
+/* Expand complex rotations represented as internal functions.
+   This function assumes that a lowered complex rotation is still better
+   than a complex multiplication, else the backend would have defined
+   crot90 and crot270.  */
+
+static void
+expand_complex_rotation (gimple_stmt_iterator *gsi)
+{
+  gimple *stmt = gsi_stmt (*gsi);
+  tree ac = gimple_call_arg (stmt, 0);
+  gimple_seq stmts = NULL;
+  location_t loc = gimple_location (gsi_stmt (*gsi));
+
+  tree lhs = gimple_get_lhs (stmt);
+  tree type = TREE_TYPE (ac);
+  tree inner_type = TREE_TYPE (type);
+
+
+  tree rr, ri, rb;
+  optab op = optab_for_tree_code (MULT_EXPR, inner_type, optab_default);
+  if (optab_handler (op, TYPE_MODE (type)) != CODE_FOR_nothing)
+    {
+      tree cst_i = build_complex (type, build_zero_cst (inner_type),
+				  build_one_cst (inner_type));
+      rb = gimple_build (&stmts, loc, MULT_EXPR, type, ac, cst_i);
+
+      gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+
+      gassign *new_assign = gimple_build_assign (lhs, rb);
+      gimple_set_lhs (new_assign, lhs);
+      gsi_replace (gsi, new_assign, true);
+
+      update_complex_assignment (gsi, NULL, NULL, rb);
+    }
+  else
+    {
+      tree ar = extract_component (gsi, ac, REAL_P, true);
+      tree ai = extract_component (gsi, ac, IMAG_P, true);
+
+      if (gimple_call_internal_fn (stmt) == IFN_COMPLEX_ROT90)
+	{
+	  rr = gimple_build (&stmts, loc, NEGATE_EXPR, inner_type, ai);
+	  ri = ar;
+	}
+      else if (gimple_call_internal_fn (stmt) == IFN_COMPLEX_ROT270)
+	{
+	  rr = ai;
+	  ri = gimple_build (&stmts, loc, NEGATE_EXPR, inner_type, ar);
+	}
+      else
+	gcc_unreachable ();
+
+      gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+
+      gassign *new_assign =
+	gimple_build_assign (gimple_get_lhs (stmt), COMPLEX_EXPR, rr, ri);
+      gimple_set_lhs (new_assign, gimple_get_lhs (stmt));
+      gsi_replace (gsi, new_assign, true);
+
+      update_complex_assignment (gsi, rr, ri);
+    }
+}
+
 /* Returns true if a complex component is a constant.  */
 
 static bool
@@ -1859,6 +1925,21 @@ expand_complex_operations_1 (gimple_stmt_iterator *gsi)
 	if (gimple_code (stmt) == GIMPLE_COND)
 	  return;
 
+	if (is_gimple_call (stmt)
+	    && (gimple_call_combined_fn (stmt) == CFN_COMPLEX_ROT90
+		|| gimple_call_combined_fn (stmt) == CFN_COMPLEX_ROT270))
+	  {
+	    if (!direct_internal_fn_supported_p
+		(gimple_call_internal_fn (stmt), type,
+		 bb_optimization_type (gimple_bb (stmt))))
+	      expand_complex_rotation (gsi);
+	    else
+	      update_complex_components (gsi, stmt, NULL, NULL,
+					 gimple_call_lhs (stmt));
+
+	    return;
+	  }
+
 	if (TREE_CODE (type) == COMPLEX_TYPE)
 	  expand_complex_move (gsi, type);
 	else if (is_gimple_assign (stmt)
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index 95c22694368..74b9a993e2d 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -3291,6 +3291,118 @@ last_fma_candidate_feeds_initial_phi (fma_deferring_state *state,
   return false;
 }
 
+/* Replace each addition that uses CROT_RESULT with a call to the
+   combined add-and-rotate internal function CADD_FN on OP1.  */
+
+static void
+convert_crot_1 (tree crot_result, tree op1, internal_fn cadd_fn)
+{
+  gimple *use_stmt;
+  imm_use_iterator imm_iter;
+  gcall *cadd_stmt;
+
+  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, crot_result)
+  {
+    gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+    tree add_op, result = crot_result;
+
+    if (is_gimple_debug (use_stmt))
+      continue;
+
+    add_op = (gimple_assign_rhs1 (use_stmt) != result)
+      ? gimple_assign_rhs1 (use_stmt) : gimple_assign_rhs2 (use_stmt);
+
+
+    cadd_stmt = gimple_build_call_internal (cadd_fn, 2, add_op, op1);
+    gimple_set_lhs (cadd_stmt, gimple_get_lhs (use_stmt));
+    gimple_call_set_nothrow (cadd_stmt, !stmt_can_throw_internal (cfun,
+								  use_stmt));
+    gsi_replace (&gsi, cadd_stmt, true);
+
+    if (dump_file && (dump_flags & TDF_DETAILS))
+      {
+	fprintf (dump_file, "Generated COMPLEX_ADD_ROT ");
+	print_gimple_stmt (dump_file, gsi_stmt (gsi), 0, TDF_NONE);
+	fprintf (dump_file, "\n");
+      }
+  }
+}
+
+
+/* Combine the complex rotation at CROT_STMT of operand OP1 with its
+   uses in additions.  Returns true if CROT_STMT can be removed.  */
+
+static bool
+convert_crot (gimple *crot_stmt, tree op1, combined_fn crot_kind)
+{
+  internal_fn cadd_fn;
+  switch (crot_kind)
+    {
+    case CFN_COMPLEX_ROT90:
+      cadd_fn = IFN_COMPLEX_ADD_ROT90;
+      break;
+    case CFN_COMPLEX_ROT270:
+      cadd_fn = IFN_COMPLEX_ADD_ROT270;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+
+  tree crot_result = gimple_get_lhs (crot_stmt);
+  /* If there isn't a LHS then this can't be a CADD.  There can be no LHS
+     if the statement was left just for the side-effects.  */
+  if (!crot_result)
+    return false;
+  tree type = TREE_TYPE (crot_result);
+  gimple *use_stmt;
+  use_operand_p use_p;
+  imm_use_iterator imm_iter;
+
+  if (COMPLEX_FLOAT_TYPE_P (type) && flag_fp_contract_mode == FP_CONTRACT_OFF)
+    return false;
+
+  /* We don't want to do bitfield reduction ops.  */
+  if (INTEGRAL_TYPE_P (type)
+      && (!type_has_mode_precision_p (type) || TYPE_OVERFLOW_TRAPS (type)))
+    return false;
+
+  /* If the target doesn't support it, don't generate it. */
+  optimization_type opt_type = bb_optimization_type (gimple_bb (crot_stmt));
+  if (!direct_internal_fn_supported_p (cadd_fn, type, opt_type))
+    return false;
+
+  /* If the crot has zero uses, it is kept around probably because
+     of -fnon-call-exceptions.  Don't optimize it away in that case,
+     it is DCE job.  */
+  if (has_zero_uses (crot_result))
+    return false;
+
+  /* Make sure that the crot statement becomes dead after
+     the transformation, that is, that all uses are transformed into
+     CADDs.  This means we assume that a combined add-and-rotate
+     operation has the same cost as an addition.  */
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, crot_result)
+  {
+    use_stmt = USE_STMT (use_p);
+
+    if (is_gimple_debug (use_stmt))
+      continue;
+
+    if (gimple_bb (use_stmt) != gimple_bb (crot_stmt))
+      return false;
+
+    if (!is_gimple_assign (use_stmt))
+      return false;
+
+    if (gimple_assign_rhs_code (use_stmt) != PLUS_EXPR)
+      return false;
+  }
+
+  convert_crot_1 (crot_result, op1, cadd_fn);
+  return true;
+}
+
 /* Combine the multiplication at MUL_STMT with operands MULOP1 and MULOP2
    with uses in additions and subtractions to form fused multiply-add
    operations.  Returns true if successful and MUL_STMT should be removed.
@@ -5839,6 +5951,22 @@ math_opts_dom_walker::after_dom_children (basic_block bb)
 	      cancel_fma_deferring (&fma_state);
 	      break;
 
+	    case CFN_COMPLEX_ROT90:
+	    case CFN_COMPLEX_ROT270:
+	      if (gimple_call_lhs (stmt)
+		  && convert_crot (stmt,
+				   gimple_call_arg (stmt, 0),
+				   gimple_call_combined_fn (stmt)))
+		{
+		  unlink_stmt_vdef (stmt);
+		  if (gsi_remove (&gsi, true)
+		      && gimple_purge_dead_eh_edges (bb))
+		    *m_cfg_changed_p = true;
+		  release_defs (stmt);
+		  continue;
+		}
+	      break;
+
 	    default:
 	      break;
 	    }
-- 
2.17.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 07/11] Native complex ops: Vectorization of native complex operations
  2023-09-12 10:07   ` [PATCH v2 0/11] Native complex operations Sylvain Noiry
                       ` (5 preceding siblings ...)
  2023-09-12 10:07     ` [PATCH v2 06/11] Native complex ops: Update how complex rotations are handled Sylvain Noiry
@ 2023-09-12 10:07     ` Sylvain Noiry
  2023-09-12 10:07     ` [PATCH v2 08/11] Native complex ops: Add explicit vector of complex Sylvain Noiry
                       ` (3 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-09-12 10:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Summary:
Add vectors of complex types so that native complex operations can be
vectorized. Because the vectorizer was designed to work with scalar
elements, several functions and target hooks have to be adapted or
duplicated to support complex types. After that, the vectorization of
native complex operations follows exactly the same flow as scalar
operations.
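
For instance, on a target that defines vector modes of SCmode and
implements the two new hooks, a hypothetical loop like the one below is
vectorized over complex elements, just as the equivalent float loop
would be:

void
vec_add (_Complex float *restrict dst, _Complex float *restrict a,
	 _Complex float *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = a[i] + b[i];   /* vectorized using a V4SC-like mode  */
}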

gcc/ChangeLog:

	* target.def: Add preferred_simd_mode_complex and
	related_mode_complex by duplicating their scalar counterparts
	* targhooks.h: Add default_preferred_simd_mode_complex and
	default_vectorize_related_mode_complex
	* targhooks.cc (default_preferred_simd_mode_complex): New:
	Default implementation of preferred_simd_mode_complex
	(default_vectorize_related_mode_complex): New: Default
	implementation of related_mode_complex
	* doc/tm.texi: Document
	TARGET_VECTORIZE_PREFERRED_SIMD_MODE_COMPLEX
	and TARGET_VECTORIZE_RELATED_MODE_COMPLEX
	* doc/tm.texi.in: Add TARGET_VECTORIZE_PREFERRED_SIMD_MODE_COMPLEX
	and TARGET_VECTORIZE_RELATED_MODE_COMPLEX
	* emit-rtl.cc (init_emit_once): Add the zero constant for vectors
	of complex modes
	* genmodes.cc (vector_class): Add case for vectors of complex
	(complete_mode): Likewise
	(make_complex_modes): Likewise
	* gensupport.cc (match_pattern): Likewise
	* machmode.h: Add vectors of complex in predicates and redefine
	mode_for_vector and related_vector_mode for complex types
	* mode-classes.def: Add MODE_VECTOR_COMPLEX_INT and
	MODE_VECTOR_COMPLEX_FLOAT classes
	* stor-layout.cc (mode_for_vector): Adapt for complex modes
	by routing the scalar and complex variants through a common
	implementation
	(int_mode_for_mode): Add case for complex vectors
	(related_vector_mode): Implement the function for complex modes
	* tree-vect-generic.cc (type_for_widest_vector_mode): Add
	cases for complex modes
	* tree-vect-stmts.cc (get_related_vectype_for_scalar_type):
	Adapt for complex modes
	* tree.cc (build_vector_type_for_mode): Add cases for complex
	modes
---
 gcc/doc/tm.texi          | 31 ++++++++++++++++++++++
 gcc/doc/tm.texi.in       |  4 +++
 gcc/emit-rtl.cc          | 10 +++++++
 gcc/genmodes.cc          |  8 ++++++
 gcc/gensupport.cc        |  3 +++
 gcc/machmode.h           | 19 +++++++++++---
 gcc/mode-classes.def     |  2 ++
 gcc/stor-layout.cc       | 45 ++++++++++++++++++++++++++++---
 gcc/target.def           | 39 +++++++++++++++++++++++++++
 gcc/targhooks.cc         | 29 ++++++++++++++++++++
 gcc/targhooks.h          |  4 +++
 gcc/tree-vect-generic.cc |  4 +++
 gcc/tree-vect-stmts.cc   | 57 ++++++++++++++++++++++++++++------------
 gcc/tree.cc              |  2 ++
 14 files changed, 232 insertions(+), 25 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 1e87f798449..f7a8a5351e2 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6251,6 +6251,13 @@ equal to @code{word_mode}, because the vectorizer can do some
 transformations even in absence of specialized @acronym{SIMD} hardware.
 @end deftypefn
 
+@deftypefn {Target Hook} machine_mode TARGET_VECTORIZE_PREFERRED_SIMD_MODE_COMPLEX (complex_mode @var{mode})
+This hook should return the preferred mode for vectorizing complex
+mode @var{mode}.  The default is
+equal to @code{word_mode}, because the vectorizer can do some
+transformations even in absence of specialized @acronym{SIMD} hardware.
+@end deftypefn
+
 @deftypefn {Target Hook} machine_mode TARGET_VECTORIZE_SPLIT_REDUCTION (machine_mode)
 This hook should return the preferred mode to split the final reduction
 step on @var{mode} to.  The reduction is then carried out reducing upper
@@ -6313,6 +6320,30 @@ requested mode, returning a mode with the same size as @var{vector_mode}
 when @var{nunits} is zero.  This is the correct behavior for most targets.
 @end deftypefn
 
+@deftypefn {Target Hook} opt_machine_mode TARGET_VECTORIZE_RELATED_MODE_COMPLEX (machine_mode @var{vector_mode}, complex_mode @var{element_mode}, poly_uint64 @var{nunits})
+If a piece of code is using vector mode @var{vector_mode} and also wants
+to operate on elements of mode @var{element_mode}, return the vector mode
+it should use for those elements.  If @var{nunits} is nonzero, ensure that
+the mode has exactly @var{nunits} elements, otherwise pick whichever vector
+size pairs the most naturally with @var{vector_mode}.  Return an empty
+@code{opt_machine_mode} if there is no supported vector mode with the
+required properties.
+
+There is no prescribed way of handling the case in which @var{nunits}
+is zero.  One common choice is to pick a vector mode with the same size
+as @var{vector_mode}; this is the natural choice if the target has a
+fixed vector size.  Another option is to choose a vector mode with the
+same number of elements as @var{vector_mode}; this is the natural choice
+if the target has a fixed number of elements.  Alternatively, the hook
+might choose a middle ground, such as trying to keep the number of
+elements as similar as possible while applying maximum and minimum
+vector sizes.
+
+The default implementation uses @code{mode_for_vector} to find the
+requested mode, returning a mode with the same size as @var{vector_mode}
+when @var{nunits} is zero.  This is the correct behavior for most targets.
+@end deftypefn
+
 @deftypefn {Target Hook} opt_machine_mode TARGET_VECTORIZE_GET_MASK_MODE (machine_mode @var{mode})
 Return the mode to use for a vector mask that holds one boolean
 result for each element of vector mode @var{mode}.  The returned mask mode
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 27a0b321fe0..68650d70e02 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4197,12 +4197,16 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_PREFERRED_SIMD_MODE
 
+@hook TARGET_VECTORIZE_PREFERRED_SIMD_MODE_COMPLEX
+
 @hook TARGET_VECTORIZE_SPLIT_REDUCTION
 
 @hook TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES
 
 @hook TARGET_VECTORIZE_RELATED_MODE
 
+@hook TARGET_VECTORIZE_RELATED_MODE_COMPLEX
+
 @hook TARGET_VECTORIZE_GET_MASK_MODE
 
 @hook TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE
diff --git a/gcc/emit-rtl.cc b/gcc/emit-rtl.cc
index f7c33c4afb1..b0d556e45aa 100644
--- a/gcc/emit-rtl.cc
+++ b/gcc/emit-rtl.cc
@@ -6276,6 +6276,16 @@ init_emit_once (void)
 	targetm.gen_rtx_complex (mode, inner, inner);
     }
 
+  FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_COMPLEX_INT)
+  {
+    const_tiny_rtx[0][(int) mode] = gen_const_vector (mode, 0);
+  }
+
+  FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_COMPLEX_FLOAT)
+  {
+    const_tiny_rtx[0][(int) mode] = gen_const_vector (mode, 0);
+  }
+
   FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_BOOL)
     {
       const_tiny_rtx[0][(int) mode] = gen_const_vector (mode, 0);
diff --git a/gcc/genmodes.cc b/gcc/genmodes.cc
index 55ac2adb559..775878b3f7e 100644
--- a/gcc/genmodes.cc
+++ b/gcc/genmodes.cc
@@ -142,6 +142,8 @@ vector_class (enum mode_class cl)
     case MODE_UFRACT: return MODE_VECTOR_UFRACT;
     case MODE_ACCUM: return MODE_VECTOR_ACCUM;
     case MODE_UACCUM: return MODE_VECTOR_UACCUM;
+    case MODE_COMPLEX_INT: return MODE_VECTOR_COMPLEX_INT;
+    case MODE_COMPLEX_FLOAT: return MODE_VECTOR_COMPLEX_FLOAT;
     default:
       error ("no vector class for class %s", mode_class_names[cl]);
       return MODE_RANDOM;
@@ -400,6 +402,8 @@ complete_mode (struct mode_data *m)
     case MODE_VECTOR_UFRACT:
     case MODE_VECTOR_ACCUM:
     case MODE_VECTOR_UACCUM:
+    case MODE_VECTOR_COMPLEX_INT:
+    case MODE_VECTOR_COMPLEX_FLOAT:
       /* Vector modes should have a component and a number of components.  */
       validate_mode (m, UNSET, UNSET, SET, SET, UNSET);
       if (m->component->precision != (unsigned int)-1)
@@ -462,6 +466,10 @@ make_complex_modes (enum mode_class cl,
       if (m->boolean)
 	continue;
 
+      /* Skip already created mode.  */
+      if (m->complex)
+	continue;
+
       m_len = strlen (m->name);
       /* The leading "1 +" is in case we prepend a "C" below.  */
       buf = (char *) xmalloc (1 + m_len + 1);
diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index 54f7b3cfe81..6be704feee1 100644
--- a/gcc/gensupport.cc
+++ b/gcc/gensupport.cc
@@ -3747,16 +3747,19 @@ match_pattern (optab_pattern *p, const char *name, const char *pat)
 		if (*p == 0
 		    && (! force_int || mode_class[i] == MODE_INT
 			|| mode_class[i] == MODE_COMPLEX_INT
+			|| mode_class[i] == MODE_VECTOR_COMPLEX_INT
 			|| mode_class[i] == MODE_VECTOR_INT)
 		    && (! force_partial_int
 			|| mode_class[i] == MODE_INT
 			|| mode_class[i] == MODE_COMPLEX_INT
+			|| mode_class[i] == MODE_VECTOR_COMPLEX_INT
 			|| mode_class[i] == MODE_PARTIAL_INT
 			|| mode_class[i] == MODE_VECTOR_INT)
 		    && (! force_float
 			|| mode_class[i] == MODE_FLOAT
 			|| mode_class[i] == MODE_DECIMAL_FLOAT
 			|| mode_class[i] == MODE_COMPLEX_FLOAT
+			|| mode_class[i] == MODE_VECTOR_COMPLEX_FLOAT
 			|| mode_class[i] == MODE_VECTOR_FLOAT)
 		    && (! force_fixed
 			|| mode_class[i] == MODE_FRACT
diff --git a/gcc/machmode.h b/gcc/machmode.h
index fd87af7c74a..a32624da863 100644
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -110,6 +110,7 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
    || GET_MODE_CLASS (MODE) == MODE_PARTIAL_INT \
    || GET_MODE_CLASS (MODE) == MODE_COMPLEX_INT \
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_BOOL \
+   || GET_MODE_CLASS (MODE) == MODE_VECTOR_COMPLEX_INT \
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_INT)
 
 /* Nonzero if MODE is a floating-point mode.  */
@@ -117,19 +118,24 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
   (GET_MODE_CLASS (MODE) == MODE_FLOAT	\
    || GET_MODE_CLASS (MODE) == MODE_DECIMAL_FLOAT \
    || GET_MODE_CLASS (MODE) == MODE_COMPLEX_FLOAT \
+   || GET_MODE_CLASS (MODE) == MODE_VECTOR_COMPLEX_FLOAT \
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_FLOAT)
 
 /* Nonzero if MODE is a complex integer mode.  */
-#define COMPLEX_INT_MODE_P(MODE) \
-   (GET_MODE_CLASS (MODE) == MODE_COMPLEX_INT)
+#define COMPLEX_INT_MODE_P(MODE)   	\
+  (GET_MODE_CLASS (MODE) == MODE_VECTOR_COMPLEX_INT \
+   || GET_MODE_CLASS (MODE) == MODE_COMPLEX_INT)
 
 /* Nonzero if MODE is a complex floating-point mode.  */
-#define COMPLEX_FLOAT_MODE_P(MODE) \
-  (GET_MODE_CLASS (MODE) == MODE_COMPLEX_FLOAT)
+#define COMPLEX_FLOAT_MODE_P(MODE)		\
+   (GET_MODE_CLASS (MODE) == MODE_VECTOR_COMPLEX_FLOAT \
+    || GET_MODE_CLASS (MODE) == MODE_COMPLEX_FLOAT)
 
 /* Nonzero if MODE is a complex mode.  */
 #define COMPLEX_MODE_P(MODE)			\
   (GET_MODE_CLASS (MODE) == MODE_COMPLEX_INT	\
+   || GET_MODE_CLASS (MODE) == MODE_VECTOR_COMPLEX_INT	\
+   || GET_MODE_CLASS (MODE) == MODE_VECTOR_COMPLEX_FLOAT	\
    || GET_MODE_CLASS (MODE) == MODE_COMPLEX_FLOAT)
 
 /* Nonzero if MODE is a vector mode.  */
@@ -140,6 +146,8 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_FRACT	\
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_UFRACT	\
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_ACCUM	\
+   || GET_MODE_CLASS (MODE) == MODE_VECTOR_COMPLEX_INT	\
+   || GET_MODE_CLASS (MODE) == MODE_VECTOR_COMPLEX_FLOAT	\
    || GET_MODE_CLASS (MODE) == MODE_VECTOR_UACCUM)
 
 /* Nonzero if MODE is a scalar integral mode.  */
@@ -929,6 +937,9 @@ extern opt_machine_mode bitwise_mode_for_mode (machine_mode);
 extern opt_machine_mode mode_for_vector (scalar_mode, poly_uint64);
 extern opt_machine_mode related_vector_mode (machine_mode, scalar_mode,
 					     poly_uint64 = 0);
+extern opt_machine_mode mode_for_vector (complex_mode, poly_uint64);
+extern opt_machine_mode related_vector_mode (machine_mode,
+					     complex_mode, poly_uint64 = 0);
 extern opt_machine_mode related_int_vector_mode (machine_mode);
 
 /* A class for iterating through possible bitfield modes.  */
diff --git a/gcc/mode-classes.def b/gcc/mode-classes.def
index de42d7ee6fb..cc6bcaeb026 100644
--- a/gcc/mode-classes.def
+++ b/gcc/mode-classes.def
@@ -32,9 +32,11 @@ along with GCC; see the file COPYING3.  If not see
   DEF_MODE_CLASS (MODE_COMPLEX_FLOAT),					   \
   DEF_MODE_CLASS (MODE_VECTOR_BOOL),	/* vectors of single bits */	   \
   DEF_MODE_CLASS (MODE_VECTOR_INT),	/* SIMD vectors */		   \
+  DEF_MODE_CLASS (MODE_VECTOR_COMPLEX_INT), /* SIMD vectors */		   \
   DEF_MODE_CLASS (MODE_VECTOR_FRACT),	/* SIMD vectors */		   \
   DEF_MODE_CLASS (MODE_VECTOR_UFRACT),	/* SIMD vectors */		   \
   DEF_MODE_CLASS (MODE_VECTOR_ACCUM),	/* SIMD vectors */		   \
   DEF_MODE_CLASS (MODE_VECTOR_UACCUM),	/* SIMD vectors */		   \
   DEF_MODE_CLASS (MODE_VECTOR_FLOAT),                                      \
+  DEF_MODE_CLASS (MODE_VECTOR_COMPLEX_FLOAT),                              \
   DEF_MODE_CLASS (MODE_OPAQUE)          /* opaque modes */
diff --git a/gcc/stor-layout.cc b/gcc/stor-layout.cc
index ba375fa423c..450067d3562 100644
--- a/gcc/stor-layout.cc
+++ b/gcc/stor-layout.cc
@@ -378,6 +378,8 @@ int_mode_for_mode (machine_mode mode)
 
     case MODE_COMPLEX_INT:
     case MODE_COMPLEX_FLOAT:
+    case MODE_VECTOR_COMPLEX_INT:
+    case MODE_VECTOR_COMPLEX_FLOAT:
     case MODE_FLOAT:
     case MODE_DECIMAL_FLOAT:
     case MODE_FRACT:
@@ -480,8 +482,8 @@ bitwise_type_for_mode (machine_mode mode)
    elements of mode INNERMODE, if one exists.  The returned mode can be
    either an integer mode or a vector mode.  */
 
-opt_machine_mode
-mode_for_vector (scalar_mode innermode, poly_uint64 nunits)
+static opt_machine_mode
+mode_for_vector (machine_mode innermode, poly_uint64 nunits)
 {
   machine_mode mode;
 
@@ -496,8 +498,14 @@ mode_for_vector (scalar_mode innermode, poly_uint64 nunits)
     mode = MIN_MODE_VECTOR_ACCUM;
   else if (SCALAR_UACCUM_MODE_P (innermode))
     mode = MIN_MODE_VECTOR_UACCUM;
-  else
+  else if (SCALAR_INT_MODE_P (innermode))
     mode = MIN_MODE_VECTOR_INT;
+  else if (COMPLEX_FLOAT_MODE_P (innermode))
+    mode = MIN_MODE_VECTOR_COMPLEX_FLOAT;
+  else if (COMPLEX_INT_MODE_P (innermode))
+    mode = MIN_MODE_VECTOR_COMPLEX_INT;
+  else
+    gcc_unreachable ();
 
   /* Only check the broader vector_mode_supported_any_target_p here.
      We'll filter through target-specific availability and
@@ -511,7 +519,7 @@ mode_for_vector (scalar_mode innermode, poly_uint64 nunits)
   /* For integers, try mapping it to a same-sized scalar mode.  */
   if (GET_MODE_CLASS (innermode) == MODE_INT)
     {
-      poly_uint64 nbits = nunits * GET_MODE_BITSIZE (innermode);
+      poly_uint64 nbits = nunits * GET_MODE_BITSIZE (innermode).coeffs[0];
       if (int_mode_for_size (nbits, 0).exists (&mode)
 	  && have_regs_of_mode[mode])
 	return mode;
@@ -520,6 +528,26 @@ mode_for_vector (scalar_mode innermode, poly_uint64 nunits)
   return opt_machine_mode ();
 }
 
+/* Find a mode that is suitable for representing a vector with NUNITS
+   elements of scalar mode INNERMODE, if one exists.  The returned mode
+   can be either an integer mode or a vector mode.  */
+
+opt_machine_mode
+mode_for_vector (scalar_mode innermode, poly_uint64 nunits)
+{
+  return mode_for_vector (machine_mode (innermode), nunits);
+}
+
+/* Find a mode that is suitable for representing a vector with NUNITS
+   elements of complex mode INNERMODE, if one exists.  The returned mode
+   can be either an integer mode or a vector mode.  */
+
+opt_machine_mode
+mode_for_vector (complex_mode innermode, poly_uint64 nunits)
+{
+  return mode_for_vector (machine_mode (innermode), nunits);
+}
+
 /* If a piece of code is using vector mode VECTOR_MODE and also wants
    to operate on elements of mode ELEMENT_MODE, return the vector mode
    it should use for those elements.  If NUNITS is nonzero, ensure that
@@ -540,6 +568,15 @@ related_vector_mode (machine_mode vector_mode, scalar_mode element_mode,
   return targetm.vectorize.related_mode (vector_mode, element_mode, nunits);
 }
 
+opt_machine_mode
+related_vector_mode (machine_mode vector_mode,
+		     complex_mode element_mode, poly_uint64 nunits)
+{
+  gcc_assert (VECTOR_MODE_P (vector_mode));
+  return targetm.vectorize.related_mode_complex (vector_mode, element_mode,
+						 nunits);
+}
+
 /* If a piece of code is using vector mode VECTOR_MODE and also wants
    to operate on integer vectors with the same element size and number
    of elements, return the vector mode it should use.  Return an empty
diff --git a/gcc/target.def b/gcc/target.def
index 4eafff1d21b..665e81e9ef1 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1943,6 +1943,18 @@ transformations even in absence of specialized @acronym{SIMD} hardware.",
  (scalar_mode mode),
  default_preferred_simd_mode)
 
+/* Returns the preferred mode for SIMD operations for the specified
+   complex mode.  */
+DEFHOOK
+(preferred_simd_mode_complex,
+ "This hook should return the preferred mode for vectorizing complex\n\
+mode @var{mode}.  The default is\n\
+equal to @code{word_mode}, because the vectorizer can do some\n\
+transformations even in absence of specialized @acronym{SIMD} hardware.",
+ machine_mode,
+ (complex_mode mode),
+ default_preferred_simd_mode_complex)
+
 /* Returns the preferred mode for splitting SIMD reductions to.  */
 DEFHOOK
 (split_reduction,
@@ -2017,6 +2029,33 @@ when @var{nunits} is zero.  This is the correct behavior for most targets.",
  (machine_mode vector_mode, scalar_mode element_mode, poly_uint64 nunits),
  default_vectorize_related_mode)
 
+DEFHOOK
+(related_mode_complex,
+ "If a piece of code is using vector mode @var{vector_mode} and also wants\n\
+to operate on elements of mode @var{element_mode}, return the vector mode\n\
+it should use for those elements.  If @var{nunits} is nonzero, ensure that\n\
+the mode has exactly @var{nunits} elements, otherwise pick whichever vector\n\
+size pairs the most naturally with @var{vector_mode}.  Return an empty\n\
+@code{opt_machine_mode} if there is no supported vector mode with the\n\
+required properties.\n\
+\n\
+There is no prescribed way of handling the case in which @var{nunits}\n\
+is zero.  One common choice is to pick a vector mode with the same size\n\
+as @var{vector_mode}; this is the natural choice if the target has a\n\
+fixed vector size.  Another option is to choose a vector mode with the\n\
+same number of elements as @var{vector_mode}; this is the natural choice\n\
+if the target has a fixed number of elements.  Alternatively, the hook\n\
+might choose a middle ground, such as trying to keep the number of\n\
+elements as similar as possible while applying maximum and minimum\n\
+vector sizes.\n\
+\n\
+The default implementation uses @code{mode_for_vector} to find the\n\
+requested mode, returning a mode with the same size as @var{vector_mode}\n\
+when @var{nunits} is zero.  This is the correct behavior for most targets.",
+ opt_machine_mode,
+ (machine_mode vector_mode, complex_mode element_mode, poly_uint64 nunits),
+ default_vectorize_related_mode_complex)
+
 /* Function to get a target mode for a vector mask.  */
 DEFHOOK
 (get_mask_mode,
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index d89668cd1ab..463b53e0baa 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -1533,6 +1533,15 @@ default_preferred_simd_mode (scalar_mode)
   return word_mode;
 }
 
+/* By default, only attempt to parallelize bitwise operations, and
+   possibly adds/subtracts using bit-twiddling.  */
+
+machine_mode
+default_preferred_simd_mode_complex (complex_mode)
+{
+  return word_mode;
+}
+
 /* By default, call gen_rtx_CONCAT.  */
 
 rtx
@@ -1734,6 +1743,26 @@ default_vectorize_related_mode (machine_mode vector_mode,
   return opt_machine_mode ();
 }
 
+
+/* The default implementation of TARGET_VECTORIZE_RELATED_MODE_COMPLEX.  */
+
+opt_machine_mode
+default_vectorize_related_mode_complex (machine_mode vector_mode,
+					complex_mode element_mode,
+					poly_uint64 nunits)
+{
+  machine_mode result_mode;
+  if ((maybe_ne (nunits, 0U)
+       || multiple_p (GET_MODE_SIZE (vector_mode),
+		      GET_MODE_SIZE (element_mode), &nunits))
+      && mode_for_vector (element_mode, nunits).exists (&result_mode)
+      && VECTOR_MODE_P (result_mode)
+      && targetm.vector_mode_supported_p (result_mode))
+    return result_mode;
+
+  return opt_machine_mode ();
+}
+
 /* By default a vector of integers is used as a mask.  */
 
 opt_machine_mode
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index f3ae17998de..c7ca6b55e31 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -115,11 +115,15 @@ default_builtin_support_vector_misalignment (machine_mode mode,
 					     const_tree,
 					     int, bool);
 extern machine_mode default_preferred_simd_mode (scalar_mode mode);
+extern machine_mode default_preferred_simd_mode_complex (complex_mode mode);
 extern machine_mode default_split_reduction (machine_mode);
 extern unsigned int default_autovectorize_vector_modes (vector_modes *, bool);
 extern opt_machine_mode default_vectorize_related_mode (machine_mode,
 							scalar_mode,
 							poly_uint64);
+extern opt_machine_mode default_vectorize_related_mode_complex (machine_mode,
+								complex_mode,
+								poly_uint64);
 extern opt_machine_mode default_get_mask_mode (machine_mode);
 extern bool default_empty_mask_is_expensive (unsigned);
 extern vector_costs *default_vectorize_create_costs (vec_info *, bool);
diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index a7e6cb87a5e..718b144ec23 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -1363,6 +1363,10 @@ type_for_widest_vector_mode (tree type, optab op)
     mode = MIN_MODE_VECTOR_ACCUM;
   else if (SCALAR_UACCUM_MODE_P (inner_mode))
     mode = MIN_MODE_VECTOR_UACCUM;
+  else if (COMPLEX_INT_MODE_P (inner_mode))
+    mode = MIN_MODE_VECTOR_COMPLEX_INT;
+  else if (COMPLEX_FLOAT_MODE_P (inner_mode))
+    mode = MIN_MODE_VECTOR_COMPLEX_FLOAT;
   else if (inner_mode == BImode)
     mode = MIN_MODE_VECTOR_BOOL;
   else
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index cd7c1090d88..507cfd04897 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -12732,18 +12732,31 @@ get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
 				     tree scalar_type, poly_uint64 nunits)
 {
   tree orig_scalar_type = scalar_type;
-  scalar_mode inner_mode;
+  scalar_mode scal_mode;
+  complex_mode cplx_mode;
+  machine_mode inner_mode;
   machine_mode simd_mode;
   tree vectype;
+  bool cplx = false;
 
-  if ((!INTEGRAL_TYPE_P (scalar_type)
+  if (!INTEGRAL_TYPE_P (scalar_type)
        && !POINTER_TYPE_P (scalar_type)
-       && !SCALAR_FLOAT_TYPE_P (scalar_type))
-      || (!is_int_mode (TYPE_MODE (scalar_type), &inner_mode)
-	  && !is_float_mode (TYPE_MODE (scalar_type), &inner_mode)))
+       && !SCALAR_FLOAT_TYPE_P (scalar_type)
+       && !COMPLEX_INTEGER_TYPE_P (scalar_type)
+       && !COMPLEX_FLOAT_TYPE_P (scalar_type))
     return NULL_TREE;
 
-  unsigned int nbytes = GET_MODE_SIZE (inner_mode);
+  if (is_complex_int_mode (TYPE_MODE (scalar_type), &cplx_mode)
+      || is_complex_float_mode (TYPE_MODE (scalar_type), &cplx_mode))
+    cplx = true;
+
+  if (!cplx && !is_int_mode (TYPE_MODE (scalar_type), &scal_mode)
+      && !is_float_mode (TYPE_MODE (scalar_type), &scal_mode))
+    return NULL_TREE;
+
+  unsigned int nbytes =
+    (cplx) ? GET_MODE_SIZE (cplx_mode) : GET_MODE_SIZE (scal_mode);
+  inner_mode = (cplx) ? machine_mode (cplx_mode) : machine_mode (scal_mode);
 
   /* Interoperability between modes requires one to be a constant multiple
      of the other, so that the number of vectors required for each operation
@@ -12761,19 +12774,20 @@ get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
      they support the proper result truncation/extension.
      We also make sure to build vector types with INTEGER_TYPE
      component type only.  */
-  if (INTEGRAL_TYPE_P (scalar_type)
-      && (GET_MODE_BITSIZE (inner_mode) != TYPE_PRECISION (scalar_type)
+  if (!cplx && INTEGRAL_TYPE_P (scalar_type)
+      && (GET_MODE_BITSIZE (scal_mode) != TYPE_PRECISION (scalar_type)
 	  || TREE_CODE (scalar_type) != INTEGER_TYPE))
-    scalar_type = build_nonstandard_integer_type (GET_MODE_BITSIZE (inner_mode),
-						  TYPE_UNSIGNED (scalar_type));
+    scalar_type =
+      build_nonstandard_integer_type (GET_MODE_BITSIZE (scal_mode),
+				      TYPE_UNSIGNED (scalar_type));
 
   /* We shouldn't end up building VECTOR_TYPEs of non-scalar components.
      When the component mode passes the above test simply use a type
      corresponding to that mode.  The theory is that any use that
      would cause problems with this will disable vectorization anyway.  */
-  else if (!SCALAR_FLOAT_TYPE_P (scalar_type)
+  else if (!cplx && !SCALAR_FLOAT_TYPE_P (scalar_type)
 	   && !INTEGRAL_TYPE_P (scalar_type))
-    scalar_type = lang_hooks.types.type_for_mode (inner_mode, 1);
+    scalar_type = lang_hooks.types.type_for_mode (scal_mode, 1);
 
   /* We can't build a vector type of elements with alignment bigger than
      their size.  */
@@ -12791,7 +12805,10 @@ get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
   if (prevailing_mode == VOIDmode)
     {
       gcc_assert (known_eq (nunits, 0U));
-      simd_mode = targetm.vectorize.preferred_simd_mode (inner_mode);
+
+      simd_mode = (cplx)
+	? targetm.vectorize.preferred_simd_mode_complex (cplx_mode)
+	: targetm.vectorize.preferred_simd_mode (scal_mode);
       if (SCALAR_INT_MODE_P (simd_mode))
 	{
 	  /* Traditional behavior is not to take the integer mode
@@ -12802,13 +12819,18 @@ get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
 	     Note that nunits == 1 is allowed in order to support single
 	     element vector types.  */
 	  if (!multiple_p (GET_MODE_SIZE (simd_mode), nbytes, &nunits)
-	      || !mode_for_vector (inner_mode, nunits).exists (&simd_mode))
+	      || !((cplx)
+		   ? mode_for_vector (cplx_mode, nunits).exists (&simd_mode)
+		   : mode_for_vector (scal_mode, nunits).exists (&simd_mode)))
 	    return NULL_TREE;
 	}
     }
   else if (SCALAR_INT_MODE_P (prevailing_mode)
-	   || !related_vector_mode (prevailing_mode,
-				    inner_mode, nunits).exists (&simd_mode))
+	   || !((cplx) ? related_vector_mode (prevailing_mode,
+					      cplx_mode,
+					      nunits).exists (&simd_mode) :
+		related_vector_mode (prevailing_mode, scal_mode,
+				     nunits).exists (&simd_mode)))
     {
       /* Fall back to using mode_for_vector, mostly in the hope of being
 	 able to use an integer mode.  */
@@ -12816,7 +12838,8 @@ get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
 	  && !multiple_p (GET_MODE_SIZE (prevailing_mode), nbytes, &nunits))
 	return NULL_TREE;
 
-      if (!mode_for_vector (inner_mode, nunits).exists (&simd_mode))
+      if (!((cplx) ? mode_for_vector (cplx_mode, nunits).exists (&simd_mode)
+	    : mode_for_vector (scal_mode, nunits).exists (&simd_mode)))
 	return NULL_TREE;
     }
 
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 73b72f80d25..8dcad519164 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -10170,6 +10170,8 @@ build_vector_type_for_mode (tree innertype, machine_mode mode)
     case MODE_VECTOR_UFRACT:
     case MODE_VECTOR_ACCUM:
     case MODE_VECTOR_UACCUM:
+    case MODE_VECTOR_COMPLEX_INT:
+    case MODE_VECTOR_COMPLEX_FLOAT:
       nunits = GET_MODE_NUNITS (mode);
       break;
 
-- 
2.17.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 08/11] Native complex ops: Add explicit vector of complex
  2023-09-12 10:07   ` [PATCH v2 0/11] Native complex operations Sylvain Noiry
                       ` (6 preceding siblings ...)
  2023-09-12 10:07     ` [PATCH v2 07/11] Native complex ops: Vectorization of native complex operations Sylvain Noiry
@ 2023-09-12 10:07     ` Sylvain Noiry
  2023-09-12 17:25       ` Joseph Myers
  2023-09-12 10:07     ` [PATCH v2 09/11] Native complex ops: remove useless special cases Sylvain Noiry
                       ` (2 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Sylvain Noiry @ 2023-09-12 10:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Summary:
Allow the creation and usage of builtin vectors of complex types
in C, using __attribute__ ((vector_size ())).
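
For illustration, here is a minimal sketch of the intended usage (the
typedef name is hypothetical, not taken from the patch):

    /* A vector of two complex floats, i.e. 16 bytes.  */
    typedef float _Complex vcf2 __attribute__ ((vector_size (16)));

    vcf2 add (vcf2 a, vcf2 b)
    {
      return a + b;	/* Element-wise complex addition.  */
    }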

gcc/c-family/ChangeLog:

        * c-attribs.cc (vector_mode_valid_p): Add cases for
        vectors of complex.
        (handle_mode_attribute): Likewise.
        (type_valid_for_vector_size): Likewise.
        * c-common.cc (c_common_type_for_mode): Likewise.
        (vector_types_compatible_elements_p): Likewise.

gcc/ChangeLog:

        * fold-const.cc (fold_binary_loc): Likewise.

gcc/c/ChangeLog:

        * c-typeck.cc (build_unary_op): Likewise.
---
 gcc/c-family/c-attribs.cc | 12 ++++++++++--
 gcc/c-family/c-common.cc  | 21 +++++++++++++++++++--
 gcc/c/c-typeck.cc         |  8 ++++++--
 gcc/fold-const.cc         |  1 +
 4 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index e0c4259c905..b3ca5219730 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -2019,6 +2019,8 @@ vector_mode_valid_p (machine_mode mode)
   /* Doh!  What's going on?  */
   if (mclass != MODE_VECTOR_INT
       && mclass != MODE_VECTOR_FLOAT
+      && mclass != MODE_VECTOR_COMPLEX_INT
+      && mclass != MODE_VECTOR_COMPLEX_FLOAT
       && mclass != MODE_VECTOR_FRACT
       && mclass != MODE_VECTOR_UFRACT
       && mclass != MODE_VECTOR_ACCUM
@@ -2125,6 +2127,8 @@ handle_mode_attribute (tree *node, tree name, tree args,
 
 	case MODE_VECTOR_INT:
 	case MODE_VECTOR_FLOAT:
+	case MODE_VECTOR_COMPLEX_INT:
+	case MODE_VECTOR_COMPLEX_FLOAT:
 	case MODE_VECTOR_FRACT:
 	case MODE_VECTOR_UFRACT:
 	case MODE_VECTOR_ACCUM:
@@ -4361,9 +4365,13 @@ type_valid_for_vector_size (tree type, tree atname, tree args,
 
   if ((!INTEGRAL_TYPE_P (type)
        && !SCALAR_FLOAT_TYPE_P (type)
+       && !COMPLEX_INTEGER_TYPE_P (type)
+       && !COMPLEX_FLOAT_TYPE_P (type)
        && !FIXED_POINT_TYPE_P (type))
-      || (!SCALAR_FLOAT_MODE_P (orig_mode)
-	  && GET_MODE_CLASS (orig_mode) != MODE_INT
+      || ((!SCALAR_FLOAT_MODE_P (orig_mode)
+	   && GET_MODE_CLASS (orig_mode) != MODE_INT)
+	  && (!COMPLEX_FLOAT_MODE_P (orig_mode)
+	      && GET_MODE_CLASS (orig_mode) != MODE_COMPLEX_INT)
 	  && !ALL_SCALAR_FIXED_POINT_MODE_P (orig_mode))
       || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type))
       || TREE_CODE (type) == BOOLEAN_TYPE
diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 73e739c503d..f236fae94d4 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -2441,7 +2441,23 @@ c_common_type_for_mode (machine_mode mode, int unsignedp)
 	      : make_signed_type (precision));
     }
 
-  if (COMPLEX_MODE_P (mode))
+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL
+      && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
+    {
+      unsigned int elem_bits = vector_element_size (GET_MODE_BITSIZE (mode),
+						    GET_MODE_NUNITS (mode));
+      tree bool_type = build_nonstandard_boolean_type (elem_bits);
+      return build_vector_type_for_mode (bool_type, mode);
+    }
+  else if (VECTOR_MODE_P (mode)
+	   && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
+    {
+      machine_mode inner_mode = GET_MODE_INNER (mode);
+      tree inner_type = c_common_type_for_mode (inner_mode, unsignedp);
+      if (inner_type != NULL_TREE)
+	return build_vector_type_for_mode (inner_type, mode);
+    }
+  else if (COMPLEX_MODE_P (mode))
     {
       machine_mode inner_mode;
       tree inner_type;
@@ -8360,10 +8376,11 @@ vector_types_compatible_elements_p (tree t1, tree t2)
 
   gcc_assert ((INTEGRAL_TYPE_P (t1)
 	       || c1 == REAL_TYPE
+	       || c1 == COMPLEX_TYPE
 	       || c1 == FIXED_POINT_TYPE)
 	      && (INTEGRAL_TYPE_P (t2)
 		  || c2 == REAL_TYPE
-		  || c2 == FIXED_POINT_TYPE));
+		  || c2 == COMPLEX_TYPE || c2 == FIXED_POINT_TYPE));
 
   t1 = c_common_signed_type (t1);
   t2 = c_common_signed_type (t2);
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index e55e887da14..25e7f68b5ab 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -4576,7 +4576,9 @@ build_unary_op (location_t location, enum tree_code code, tree xarg,
       if (typecode == INTEGER_TYPE
 	  || typecode == BITINT_TYPE
 	  || (gnu_vector_type_p (TREE_TYPE (arg))
-	      && !VECTOR_FLOAT_TYPE_P (TREE_TYPE (arg))))
+	      && !VECTOR_FLOAT_TYPE_P (TREE_TYPE (arg))
+	      && !COMPLEX_INTEGER_TYPE_P (TREE_TYPE (TREE_TYPE (arg)))
+	      && !COMPLEX_FLOAT_TYPE_P (TREE_TYPE (TREE_TYPE (arg)))))
 	{
 	  tree e = arg;
 
@@ -4599,7 +4601,9 @@ build_unary_op (location_t location, enum tree_code code, tree xarg,
 	  if (!noconvert)
 	    arg = default_conversion (arg);
 	}
-      else if (typecode == COMPLEX_TYPE)
+      else if (typecode == COMPLEX_TYPE
+	       || COMPLEX_INTEGER_TYPE_P (TREE_TYPE (TREE_TYPE (arg)))
+	       || COMPLEX_FLOAT_TYPE_P (TREE_TYPE (TREE_TYPE (arg))))
 	{
 	  code = CONJ_EXPR;
 	  pedwarn (location, OPT_Wpedantic,
diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
index dc05599c7fe..5c7b58136eb 100644
--- a/gcc/fold-const.cc
+++ b/gcc/fold-const.cc
@@ -11365,6 +11365,7 @@ fold_binary_loc (location_t loc, enum tree_code code, tree type,
 	     to __complex__ ( x, y ).  This is not the same for SNaNs or
 	     if signed zeros are involved.  */
 	  if (!HONOR_SNANS (arg0)
+	      && !(VECTOR_TYPE_P (TREE_TYPE (arg0)))
 	      && !HONOR_SIGNED_ZEROS (arg0)
 	      && COMPLEX_FLOAT_TYPE_P (TREE_TYPE (arg0)))
 	    {
-- 
2.17.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 09/11] Native complex ops: remove useless special cases
  2023-09-12 10:07   ` [PATCH v2 0/11] Native complex operations Sylvain Noiry
                       ` (7 preceding siblings ...)
  2023-09-12 10:07     ` [PATCH v2 08/11] Native complex ops: Add explicit vector of complex Sylvain Noiry
@ 2023-09-12 10:07     ` Sylvain Noiry
  2023-09-12 10:07     ` [PATCH v2 10/11] Native complex ops: Add a fast complex multiplication pattern Sylvain Noiry
  2023-09-12 10:07     ` [PATCH v2 11/11] Native complex ops: Experimental support in x86 backend Sylvain Noiry
  10 siblings, 0 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-09-12 10:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Summary:
Remove two special cases which are now useless with the new complex
handling.

gcc/ChangeLog:

        * tree-ssa-forwprop.cc (pass_forwprop::execute): Remove
        two special cases.
---
 gcc/tree-ssa-forwprop.cc | 133 +--------------------------------------
 1 file changed, 3 insertions(+), 130 deletions(-)

diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 30e99f812f1..0c968f6ca32 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -3670,61 +3670,8 @@ pass_forwprop::execute (function *fun)
 		       != TARGET_MEM_REF)
 		   && !stmt_can_throw_internal (fun, stmt))
 	    {
-	      /* Rewrite loads used only in real/imagpart extractions to
-	         component-wise loads.  */
-	      use_operand_p use_p;
-	      imm_use_iterator iter;
-	      bool rewrite = true;
-	      FOR_EACH_IMM_USE_FAST (use_p, iter, lhs)
-		{
-		  gimple *use_stmt = USE_STMT (use_p);
-		  if (is_gimple_debug (use_stmt))
-		    continue;
-		  if (!is_gimple_assign (use_stmt)
-		      || (gimple_assign_rhs_code (use_stmt) != REALPART_EXPR
-			  && gimple_assign_rhs_code (use_stmt) != IMAGPART_EXPR)
-		      || TREE_OPERAND (gimple_assign_rhs1 (use_stmt), 0) != lhs)
-		    {
-		      rewrite = false;
-		      break;
-		    }
-		}
-	      if (rewrite)
-		{
-		  gimple *use_stmt;
-		  FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
-		    {
-		      if (is_gimple_debug (use_stmt))
-			{
-			  if (gimple_debug_bind_p (use_stmt))
-			    {
-			      gimple_debug_bind_reset_value (use_stmt);
-			      update_stmt (use_stmt);
-			    }
-			  continue;
-			}
-
-		      tree new_rhs = build1 (gimple_assign_rhs_code (use_stmt),
-					     TREE_TYPE (TREE_TYPE (rhs)),
-					     unshare_expr (rhs));
-		      gimple *new_stmt
-			= gimple_build_assign (gimple_assign_lhs (use_stmt),
-					       new_rhs);
-
-		      location_t loc = gimple_location (use_stmt);
-		      gimple_set_location (new_stmt, loc);
-		      gimple_stmt_iterator gsi2 = gsi_for_stmt (use_stmt);
-		      unlink_stmt_vdef (use_stmt);
-		      gsi_remove (&gsi2, true);
-
-		      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
-		    }
-
-		  release_defs (stmt);
-		  gsi_remove (&gsi, true);
-		}
-	      else
-		gsi_next (&gsi);
+	      /* Special case removed due to better complex processing.  */
+	      gsi_next (&gsi);
 	    }
 	  else if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE
 		   && (TYPE_MODE (TREE_TYPE (lhs)) == BLKmode
@@ -3739,81 +3686,7 @@ pass_forwprop::execute (function *fun)
 	    optimize_vector_load (&gsi);
 
 	  else if (code == COMPLEX_EXPR)
-	    {
-	      /* Rewrite stores of a single-use complex build expression
-	         to component-wise stores.  */
-	      use_operand_p use_p;
-	      gimple *use_stmt, *def1, *def2;
-	      tree rhs2;
-	      if (single_imm_use (lhs, &use_p, &use_stmt)
-		  && gimple_store_p (use_stmt)
-		  && !gimple_has_volatile_ops (use_stmt)
-		  && is_gimple_assign (use_stmt)
-		  && (TREE_CODE (gimple_assign_lhs (use_stmt))
-		      != TARGET_MEM_REF))
-		{
-		  tree use_lhs = gimple_assign_lhs (use_stmt);
-		  if (auto_var_p (use_lhs))
-		    DECL_NOT_GIMPLE_REG_P (use_lhs) = 1;
-		  tree new_lhs = build1 (REALPART_EXPR,
-					 TREE_TYPE (TREE_TYPE (use_lhs)),
-					 unshare_expr (use_lhs));
-		  gimple *new_stmt = gimple_build_assign (new_lhs, rhs);
-		  location_t loc = gimple_location (use_stmt);
-		  gimple_set_location (new_stmt, loc);
-		  gimple_set_vuse (new_stmt, gimple_vuse (use_stmt));
-		  gimple_set_vdef (new_stmt, make_ssa_name (gimple_vop (fun)));
-		  SSA_NAME_DEF_STMT (gimple_vdef (new_stmt)) = new_stmt;
-		  gimple_set_vuse (use_stmt, gimple_vdef (new_stmt));
-		  gimple_stmt_iterator gsi2 = gsi_for_stmt (use_stmt);
-		  gsi_insert_before (&gsi2, new_stmt, GSI_SAME_STMT);
-
-		  new_lhs = build1 (IMAGPART_EXPR,
-				    TREE_TYPE (TREE_TYPE (use_lhs)),
-				    unshare_expr (use_lhs));
-		  gimple_assign_set_lhs (use_stmt, new_lhs);
-		  gimple_assign_set_rhs1 (use_stmt, gimple_assign_rhs2 (stmt));
-		  update_stmt (use_stmt);
-
-		  release_defs (stmt);
-		  gsi_remove (&gsi, true);
-		}
-	      /* Rewrite a component-wise load of a complex to a complex
-		 load if the components are not used separately.  */
-	      else if (TREE_CODE (rhs) == SSA_NAME
-		       && has_single_use (rhs)
-		       && ((rhs2 = gimple_assign_rhs2 (stmt)), true)
-		       && TREE_CODE (rhs2) == SSA_NAME
-		       && has_single_use (rhs2)
-		       && (def1 = SSA_NAME_DEF_STMT (rhs),
-			   gimple_assign_load_p (def1))
-		       && (def2 = SSA_NAME_DEF_STMT (rhs2),
-			   gimple_assign_load_p (def2))
-		       && (gimple_vuse (def1) == gimple_vuse (def2))
-		       && !gimple_has_volatile_ops (def1)
-		       && !gimple_has_volatile_ops (def2)
-		       && !stmt_can_throw_internal (fun, def1)
-		       && !stmt_can_throw_internal (fun, def2)
-		       && gimple_assign_rhs_code (def1) == REALPART_EXPR
-		       && gimple_assign_rhs_code (def2) == IMAGPART_EXPR
-		       && operand_equal_p (TREE_OPERAND (gimple_assign_rhs1
-								 (def1), 0),
-					   TREE_OPERAND (gimple_assign_rhs1
-								 (def2), 0)))
-		{
-		  tree cl = TREE_OPERAND (gimple_assign_rhs1 (def1), 0);
-		  gimple_assign_set_rhs_from_tree (&gsi, unshare_expr (cl));
-		  gcc_assert (gsi_stmt (gsi) == stmt);
-		  gimple_set_vuse (stmt, gimple_vuse (def1));
-		  gimple_set_modified (stmt, true);
-		  gimple_stmt_iterator gsi2 = gsi_for_stmt (def1);
-		  gsi_remove (&gsi, false);
-		  gsi_insert_after (&gsi2, stmt, GSI_SAME_STMT);
-		}
-	      else
-		gsi_next (&gsi);
-	      gsi_next (&gsi);
-	    }
+	    gsi_next (&gsi);
 	  else if (code == CONSTRUCTOR
 		   && VECTOR_TYPE_P (TREE_TYPE (rhs))
 		   && TYPE_MODE (TREE_TYPE (rhs)) == BLKmode
-- 
2.17.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 10/11] Native complex ops: Add a fast complex multiplication pattern
  2023-09-12 10:07   ` [PATCH v2 0/11] Native complex operations Sylvain Noiry
                       ` (8 preceding siblings ...)
  2023-09-12 10:07     ` [PATCH v2 09/11] Native complex ops: remove useless special cases Sylvain Noiry
@ 2023-09-12 10:07     ` Sylvain Noiry
  2023-09-12 10:07     ` [PATCH v2 11/11] Native complex ops: Experimental support in x86 backend Sylvain Noiry
  10 siblings, 0 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-09-12 10:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Summary:
Add a new fast_mult_optab to define a pattern corresponding to
the fast path of an IEEE-compliant multiplication. This lets the
backend programmer change the fast path without having to handle
the IEEE checks manually.
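
For reference, the fast path such a pattern implements is the textbook
complex product, without the NaN recovery that the __mulsc3 libcall
performs. A sketch of the intended semantics (not code from the patch):

    /* (a + bi) * (c + di) = (a*c - b*d) + (a*d + b*c)i  */
    float _Complex
    fast_mul (float _Complex x, float _Complex y)
    {
      float a = __real__ x, b = __imag__ x;
      float c = __real__ y, d = __imag__ y;
      return __builtin_complex (a * c - b * d, a * d + b * c);
    }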

gcc/ChangeLog:

        * internal-fn.def: Add a FAST_MULT internal fn.
        * optabs.def: Add fast_mult_optab.
        * tree-complex.cc (expand_complex_multiplication_components):
        Adapt the complex multiplication expand to generate the
        FAST_MULT internal fn.
        (expand_complex_multiplication): Likewise.
        (expand_complex_operations_1): Likewise.
---
 gcc/internal-fn.def |  1 +
 gcc/optabs.def      |  1 +
 gcc/tree-complex.cc | 70 +++++++++++++++++++++++++++++----------------
 3 files changed, 47 insertions(+), 25 deletions(-)

diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 0ac6cd98a4f..f1046996a48 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -396,6 +396,7 @@ DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT90, ECF_CONST, cadd90, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_ADD_ROT270, ECF_CONST, cadd270, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL, ECF_CONST, cmul, binary)
 DEF_INTERNAL_OPTAB_FN (COMPLEX_MUL_CONJ, ECF_CONST, cmul_conj, binary)
+DEF_INTERNAL_OPTAB_FN (FAST_MULT, ECF_CONST, fast_mul, binary)
 DEF_INTERNAL_OPTAB_FN (VEC_ADDSUB, ECF_CONST, vec_addsub, binary)
 DEF_INTERNAL_WIDENING_OPTAB_FN (VEC_WIDEN_PLUS,
 				ECF_CONST | ECF_NOTHROW,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index d146cac5eec..a90b6ee6440 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -344,6 +344,7 @@ OPTAB_D (cmla_optab, "cmla$a4")
 OPTAB_D (cmla_conj_optab, "cmla_conj$a4")
 OPTAB_D (cmls_optab, "cmls$a4")
 OPTAB_D (cmls_conj_optab, "cmls_conj$a4")
+OPTAB_D (fast_mul_optab, "fast_mul$a3")
 OPTAB_D (cos_optab, "cos$a2")
 OPTAB_D (cosh_optab, "cosh$a2")
 OPTAB_D (exp10_optab, "exp10$a2")
diff --git a/gcc/tree-complex.cc b/gcc/tree-complex.cc
index d814e407af6..16759f1f3ba 100644
--- a/gcc/tree-complex.cc
+++ b/gcc/tree-complex.cc
@@ -1138,25 +1138,36 @@ expand_complex_libcall (gimple_stmt_iterator *gsi, tree type, tree ar, tree ai,
 
 static void
 expand_complex_multiplication_components (gimple_seq *stmts, location_t loc,
-					  tree type, tree ar, tree ai,
-					  tree br, tree bi,
-					  tree *rr, tree *ri)
+					  tree type, tree ac, tree ar,
+					  tree ai, tree bc, tree br, tree bi,
+					  tree *rr, tree *ri,
+					  bool fast_mult)
 {
-  tree t1, t2, t3, t4;
+  tree inner_type = TREE_TYPE (type);
+  if (!fast_mult)
+    {
+      tree t1, t2, t3, t4;
 
-  t1 = gimple_build (stmts, loc, MULT_EXPR, type, ar, br);
-  t2 = gimple_build (stmts, loc, MULT_EXPR, type, ai, bi);
-  t3 = gimple_build (stmts, loc, MULT_EXPR, type, ar, bi);
+      t1 = gimple_build (stmts, loc, MULT_EXPR, inner_type, ar, br);
+      t2 = gimple_build (stmts, loc, MULT_EXPR, inner_type, ai, bi);
+      t3 = gimple_build (stmts, loc, MULT_EXPR, inner_type, ar, bi);
 
-  /* Avoid expanding redundant multiplication for the common
-     case of squaring a complex number.  */
-  if (ar == br && ai == bi)
-    t4 = t3;
-  else
-    t4 = gimple_build (stmts, loc, MULT_EXPR, type, ai, br);
+      /* Avoid expanding redundant multiplication for the common
+	 case of squaring a complex number.  */
+      if (ar == br && ai == bi)
+	t4 = t3;
+      else
+	t4 = gimple_build (stmts, loc, MULT_EXPR, inner_type, ai, br);
 
-  *rr = gimple_build (stmts, loc, MINUS_EXPR, type, t1, t2);
-  *ri = gimple_build (stmts, loc, PLUS_EXPR, type, t3, t4);
+      *rr = gimple_build (stmts, loc, MINUS_EXPR, inner_type, t1, t2);
+      *ri = gimple_build (stmts, loc, PLUS_EXPR, inner_type, t3, t4);
+    }
+  else
+    {
+      tree rc = gimple_build (stmts, loc, CFN_FAST_MULT, type, ac, bc);
+      *rr = gimple_build (stmts, loc, REALPART_EXPR, inner_type, rc);
+      *ri = gimple_build (stmts, loc, IMAGPART_EXPR, inner_type, rc);
+    }
 }
 
 /* Expand complex multiplication to scalars:
@@ -1165,13 +1176,18 @@ expand_complex_multiplication_components (gimple_seq *stmts, location_t loc,
 
 static void
 expand_complex_multiplication (gimple_stmt_iterator *gsi, tree type,
-			       tree ar, tree ai, tree br, tree bi,
+			       tree ac, tree ar, tree ai,
+			       tree bc, tree br, tree bi,
 			       complex_lattice_t al, complex_lattice_t bl)
 {
   tree rr, ri;
   tree inner_type = TREE_TYPE (type);
   location_t loc = gimple_location (gsi_stmt (*gsi));
   gimple_seq stmts = NULL;
+  bool fast_mult = direct_internal_fn_supported_p (IFN_FAST_MULT, type,
+						   bb_optimization_type
+						   (gimple_bb
+						    (gsi_stmt (*gsi))));
 
   if (al < bl)
     {
@@ -1232,9 +1248,10 @@ expand_complex_multiplication (gimple_stmt_iterator *gsi, tree type,
 	    {
 	      /* If we are not worrying about NaNs expand to
 		 (ar*br - ai*bi) + i(ar*bi + br*ai) directly.  */
-	      expand_complex_multiplication_components (&stmts, loc, inner_type,
-							ar, ai, br, bi,
-							&rr, &ri);
+	      expand_complex_multiplication_components (&stmts, loc, type,
+							ac, ar, ai, bc, br,
+							bi, &rr, &ri,
+							fast_mult);
 	      break;
 	    }
 
@@ -1245,8 +1262,9 @@ expand_complex_multiplication (gimple_stmt_iterator *gsi, tree type,
 
 	  tree tmpr, tmpi;
 	  expand_complex_multiplication_components (&stmts, loc,
-						    inner_type, ar, ai,
-						    br, bi, &tmpr, &tmpi);
+						    type, ac, ar, ai,
+						    bc, br, bi, &tmpr, &tmpi,
+						    fast_mult);
 	  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
 	  stmts = NULL;
 
@@ -1297,10 +1315,11 @@ expand_complex_multiplication (gimple_stmt_iterator *gsi, tree type,
 	}
       else
 	/* If we are not worrying about NaNs expand to
-	  (ar*br - ai*bi) + i(ar*bi + br*ai) directly.  */
+	   (ar*br - ai*bi) + i(ar*bi + br*ai) directly.  */
 	expand_complex_multiplication_components (&stmts, loc,
-						  inner_type, ar, ai,
-						  br, bi, &rr, &ri);
+						  type, ac, ar, ai,
+						  bc, br, bi, &rr, &ri,
+						  fast_mult);
       break;
 
     default:
@@ -2096,7 +2115,8 @@ expand_complex_operations_1 (gimple_stmt_iterator *gsi)
       break;
 
     case MULT_EXPR:
-      expand_complex_multiplication (gsi, type, ar, ai, br, bi, al, bl);
+      expand_complex_multiplication (gsi, type, ac, ar, ai, bc, br, bi, al,
+				     bl);
       break;
 
     case TRUNC_DIV_EXPR:
-- 
2.17.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 11/11] Native complex ops: Experimental support in x86 backend
  2023-09-12 10:07   ` [PATCH v2 0/11] Native complex operations Sylvain Noiry
                       ` (9 preceding siblings ...)
  2023-09-12 10:07     ` [PATCH v2 10/11] Native complex ops: Add a fast complex multiplication pattern Sylvain Noiry
@ 2023-09-12 10:07     ` Sylvain Noiry
  10 siblings, 0 replies; 24+ messages in thread
From: Sylvain Noiry @ 2023-09-12 10:07 UTC (permalink / raw)
  To: gcc-patches; +Cc: Sylvain Noiry

Summary:
Add experimental support for native complex operation handling in
the x86 backend. For now it only supports add, sub, mul, conj, neg,
and mov in SCmode (complex float). Performance gains are still
marginal on this target because there are no dedicated instructions
to speed up complex operations, apart from some SIMD tricks.
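
As an example of what now maps onto the new patterns, a plain SCmode
operation like the one below is expected to expand through the addsc3
pattern instead of being lowered to its scalar parts (a sketch,
assuming the operation is kept unlowered, e.g. at -O2 -ffast-math):

    float _Complex
    cadd (float _Complex a, float _Complex b)
    {
      return a + b;	/* Should expand via the addsc3 pattern.  */
    }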

gcc/ChangeLog:

	* config/i386/i386.cc (classify_argument): Align complex
	elements to the whole size, not the size of the parts.
	(ix86_return_in_memory): Handle complex modes like a scalar
	of the same size.
	(ix86_class_max_nregs): Likewise.
	(ix86_hard_regno_nregs): Likewise.
	(function_value_ms_64): Add a case for SCmode.
	(ix86_build_const_vector): Likewise.
	(ix86_build_signbit_mask): Likewise.
	(x86_gen_rtx_complex): New: implement the gen_rtx_complex
	hook; use registers of complex modes to represent complex
	elements in RTL.
	(x86_read_complex_part): New: implement the read_complex_part
	hook; handle registers of complex modes.
	(x86_write_complex_part): New: implement the write_complex_part
	hook; handle registers of complex modes.
	* config/i386/i386.h: Add SCmode to several predicates.
	* config/i386/sse.md: Add patterns for some complex operations
	in SCmode: movsc, addsc3, subsc3, negsc2, mulsc3, and conjsc2.
---
 gcc/config/i386/i386.cc | 296 +++++++++++++++++++++++++++++++++++++++-
 gcc/config/i386/i386.h  |  11 +-
 gcc/config/i386/sse.md  | 144 +++++++++++++++++++
 3 files changed, 440 insertions(+), 11 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 477e6cecc38..77bf80b64b1 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -2348,8 +2348,8 @@ classify_argument (machine_mode mode, const_tree type,
 	mode_alignment = 128;
       else if (mode == XCmode)
 	mode_alignment = 256;
-      if (COMPLEX_MODE_P (mode))
-	mode_alignment /= 2;
+      /*if (COMPLEX_MODE_P (mode))
+	mode_alignment /= 2;*/
       /* Misaligned fields are always returned in memory.  */
       if (bit_offset % mode_alignment)
 	return 0;
@@ -3023,6 +3023,7 @@ pass_in_reg:
     case E_V4BFmode:
     case E_V2SImode:
     case E_V2SFmode:
+    case E_SCmode:
     case E_V1TImode:
     case E_V1DImode:
       if (!type || !AGGREGATE_TYPE_P (type))
@@ -3273,6 +3274,7 @@ pass_in_reg:
     case E_V4BFmode:
     case E_V2SImode:
     case E_V2SFmode:
+    case E_SCmode:
     case E_V1TImode:
     case E_V1DImode:
       if (!type || !AGGREGATE_TYPE_P (type))
@@ -4187,8 +4189,8 @@ function_value_ms_64 (machine_mode orig_mode, machine_mode mode,
 	      && !INTEGRAL_TYPE_P (valtype)
 	      && !VECTOR_FLOAT_TYPE_P (valtype))
 	    break;
-	  if ((SCALAR_INT_MODE_P (mode) || VECTOR_MODE_P (mode))
-	      && !COMPLEX_MODE_P (mode))
+	  if ((SCALAR_INT_MODE_P (mode) || VECTOR_MODE_P (mode)))
+	     // && !COMPLEX_MODE_P (mode))
 	    regno = FIRST_SSE_REG;
 	  break;
 	case 8:
@@ -4295,7 +4297,7 @@ ix86_return_in_memory (const_tree type, const_tree fntype ATTRIBUTE_UNUSED)
 	       || INTEGRAL_TYPE_P (type)
 	       || VECTOR_FLOAT_TYPE_P (type))
 	      && (SCALAR_INT_MODE_P (mode) || VECTOR_MODE_P (mode))
-	      && !COMPLEX_MODE_P (mode)
+	      //&& !COMPLEX_MODE_P (mode)
 	      && (GET_MODE_SIZE (mode) == 16 || size == 16))
 	    return false;
 
@@ -15752,6 +15754,7 @@ ix86_build_const_vector (machine_mode mode, bool vect, rtx value)
     case E_V8SFmode:
     case E_V4SFmode:
     case E_V2SFmode:
+    case E_SCmode:
     case E_V8DFmode:
     case E_V4DFmode:
     case E_V2DFmode:
@@ -15800,6 +15803,7 @@ ix86_build_signbit_mask (machine_mode mode, bool vect, bool invert)
     case E_V8SFmode:
     case E_V4SFmode:
     case E_V2SFmode:
+    case E_SCmode:
     case E_V2SImode:
       vec_mode = mode;
       imode = SImode;
@@ -19894,7 +19898,8 @@ ix86_class_max_nregs (reg_class_t rclass, machine_mode mode)
   else
     {
       if (COMPLEX_MODE_P (mode))
-	return 2;
+	return CEIL (GET_MODE_SIZE (mode), UNITS_PER_WORD);
+	//return 2;
       else
 	return 1;
     }
@@ -20230,7 +20235,8 @@ ix86_hard_regno_nregs (unsigned int regno, machine_mode mode)
       return CEIL (GET_MODE_SIZE (mode), UNITS_PER_WORD);
     }
   if (COMPLEX_MODE_P (mode))
-    return 2;
+    return 1;
+    //return 2;
   /* Register pair for mask registers.  */
   if (mode == P2QImode || mode == P2HImode)
     return 2;
@@ -23757,6 +23763,273 @@ ix86_preferred_simd_mode (scalar_mode mode)
     }
 }
 
+static rtx
+x86_gen_rtx_complex (machine_mode mode, rtx real_part, rtx imag_part)
+{
+  machine_mode imode = GET_MODE_INNER (mode);
+
+  if ((real_part == imag_part) && (real_part == CONST0_RTX (imode)))
+    {
+      if (CONST_DOUBLE_P (real_part))
+       return const_double_from_real_value (dconst0, mode);
+      else if (CONST_INT_P (real_part))
+       return GEN_INT (0);
+      else
+       gcc_unreachable ();
+    }
+
+  bool saved_generating_concat_p = generating_concat_p;
+  generating_concat_p = false;
+  rtx complex_reg = gen_reg_rtx (mode);
+  generating_concat_p = saved_generating_concat_p;
+
+  if (real_part)
+    {
+      gcc_assert (imode == GET_MODE (real_part));
+      write_complex_part (complex_reg, real_part, REAL_P, false);
+    }
+
+  if (imag_part)
+    {
+      gcc_assert (imode == GET_MODE (imag_part));
+      write_complex_part (complex_reg, imag_part, IMAG_P, false);
+    }
+
+  return complex_reg;
+}
+
+static rtx
+x86_read_complex_part (rtx cplx, complex_part_t part)
+{
+  machine_mode cmode;
+  scalar_mode imode;
+  unsigned ibitsize;
+
+  if (GET_CODE (cplx) == CONCAT)
+    return XEXP (cplx, part);
+
+  cmode = GET_MODE (cplx);
+  imode = GET_MODE_INNER (cmode);
+  ibitsize = GET_MODE_BITSIZE (imode);
+
+  if (COMPLEX_MODE_P (cmode) && (part == BOTH_P))
+    return cplx;
+
+  /* For constants under 32 bits, vector constants are folded during
+   * expand, so we need to compensate, as cplx is an integer constant.
+   * In this case cmode and imode are equal.  */
+  if (cmode == imode)
+    ibitsize /= 2;
+
+  if (cmode == E_VOIDmode)
+    return cplx;               /* FIXME: case used when initialising a mock value in a complex register.  */
+
+  if ((cmode == E_DCmode) && (GET_CODE (cplx) == CONST_DOUBLE))        /* FIXME: stop generation of DC const_double, because there are no patterns for it.  */
+    return CONST0_RTX (E_DFmode);
+  /* Verify SC const_double as well.  */
+
+  /* Special case reads from complex constants that got spilled to memory.  */
+  if (MEM_P (cplx) && GET_CODE (XEXP (cplx, 0)) == SYMBOL_REF)
+    {
+      tree decl = SYMBOL_REF_DECL (XEXP (cplx, 0));
+      if (decl && TREE_CODE (decl) == COMPLEX_CST)
+	{
+	  tree cplx_part = (part == IMAG_P) ? TREE_IMAGPART (decl)
+			  : (part == REAL_P) ? TREE_REALPART (decl)
+			  : TREE_COMPLEX_BOTH_PARTS (decl);
+	if (CONSTANT_CLASS_P (cplx_part))
+	  return expand_expr (cplx_part, NULL_RTX, imode, EXPAND_NORMAL);
+	}
+    }
+
+  /* For MEMs simplify_gen_subreg may generate an invalid new address
+     because, e.g., the original address is considered mode-dependent
+     by the target, which restricts simplify_subreg from invoking
+     adjust_address_nv.  Instead of preparing fallback support for an
+     invalid address, we call adjust_address_nv directly.  */
+  if (MEM_P (cplx))
+    {
+      if (part == BOTH_P)
+       return adjust_address_nv (cplx, cmode, 0);
+      else
+       return adjust_address_nv (cplx, imode, (part == IMAG_P)
+				 ? GET_MODE_SIZE (imode) : 0);
+    }
+
+  /* If the sub-object is at least word sized, then we know that subregging
+     will work.  This special case is important, since extract_bit_field
+     wants to operate on integer modes, and there's rarely an OImode to
+     correspond to TCmode.  */
+  if (ibitsize >= BITS_PER_WORD
+      /* For hard regs we have exact predicates.  Assume we can split
+	 the original object if it spans an even number of hard regs.
+	 This special case is important for SCmode on 64-bit platforms
+	 where the natural size of floating-point regs is 32-bit.  */
+      || (REG_P (cplx)
+	  && REGNO (cplx) < FIRST_PSEUDO_REGISTER
+	  && REG_NREGS (cplx) % 2 == 0))
+    {
+      rtx ret = simplify_gen_subreg (imode, cplx, cmode, (part == IMAG_P)
+				     ? GET_MODE_SIZE (imode) : 0);
+      if (ret)
+       return ret;
+      else
+       /* simplify_gen_subreg may fail for sub-word MEMs.  */
+       gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
+    }
+
+  if (part == BOTH_P)
+    return extract_bit_field (cplx, 2 * ibitsize, 0, true, NULL_RTX, cmode,
+			      cmode, false, NULL);
+  else
+    return extract_bit_field (cplx, ibitsize, (part == IMAG_P) ? ibitsize : 0,
+			      true, NULL_RTX, imode, imode, false, NULL);
+}
+
+static void
+x86_write_complex_part (rtx cplx, rtx val, complex_part_t part, bool undefined_p)
+{
+  machine_mode cmode;
+  scalar_mode imode;
+  unsigned ibitsize;
+
+  cmode = GET_MODE (cplx);
+  imode = GET_MODE_INNER (cmode);
+  ibitsize = GET_MODE_BITSIZE (imode);
+
+  /* Special case for constants.  */
+  if (GET_CODE (val) == CONST_VECTOR)
+    {
+      if (part == BOTH_P)
+	{
+	  machine_mode temp_mode = E_BLKmode;
+	  switch (cmode)
+	    {
+	    case E_CQImode:
+	      temp_mode = E_HImode;
+	      break;
+	    case E_CHImode:
+	      temp_mode = E_SImode;
+	      break;
+	    case E_CSImode:
+	      temp_mode = E_DImode;
+	      break;
+	    case E_SCmode:
+	      temp_mode = E_DFmode;
+	      break;
+	    case E_CDImode:
+	      temp_mode = E_TImode;
+	      break;
+	    case E_DCmode:
+	    default:
+	      break;
+	    }
+
+	  if (temp_mode != E_BLKmode)
+	    {
+	      rtx temp_reg = gen_reg_rtx (temp_mode);
+	      store_bit_field (temp_reg, GET_MODE_BITSIZE (temp_mode), 0, 0,
+			       0, GET_MODE (val), val, false, undefined_p);
+	      emit_move_insn (cplx,
+			      simplify_gen_subreg (cmode, temp_reg, temp_mode,
+						   0));
+	    }
+	  else
+	    {
+	      /* Write the real and imaginary parts separately.  */
+	      gcc_assert (GET_CODE (val) == CONST_VECTOR);
+	      write_complex_part (cplx, const_vector_elt (val, 0), REAL_P, false);
+	      write_complex_part (cplx, const_vector_elt (val, 1), IMAG_P, false);
+	    }
+	}
+      else
+	write_complex_part (cplx,
+			    const_vector_elt (val,
+			    ((part == REAL_P) ? 0 : 1)),
+			    part, false);
+      return;
+    }
+
+  if ((part == BOTH_P) && !MEM_P (cplx)
+      /*&& (optab_handler (mov_optab, cmode) != CODE_FOR_nothing)*/)
+    {
+      write_complex_part (cplx, read_complex_part(cplx, REAL_P), REAL_P, undefined_p);
+      write_complex_part (cplx, read_complex_part(cplx, IMAG_P), IMAG_P, undefined_p);
+      //emit_move_insn (cplx, val);
+      return;
+    }
+
+  if ((GET_CODE (val) == CONST_DOUBLE) || (GET_CODE (val) == CONST_INT))
+    {
+      if (part == REAL_P)
+	{
+	  emit_move_insn (gen_lowpart (imode, cplx), val);
+	  return;
+	}
+      else if (part == IMAG_P)
+	{
+	  /* Cannot set the highpart of a pseudo register.  */
+	  if (REGNO (cplx) < FIRST_PSEUDO_REGISTER)
+	    {
+	      emit_move_insn (gen_highpart (imode, cplx), val);
+	      return;
+	    }
+	}
+      else
+	gcc_unreachable ();
+    }
+
+  if (GET_CODE (cplx) == CONCAT)
+    {
+      emit_move_insn (XEXP (cplx, part), val);
+      return;
+    }
+
+  /* For MEMs simplify_gen_subreg may generate an invalid new address
+     because, e.g., the original address is considered mode-dependent
+     by the target, which restricts simplify_subreg from invoking
+     adjust_address_nv.  Instead of preparing fallback support for an
+     invalid address, we call adjust_address_nv directly.  */
+  if (MEM_P (cplx))
+    {
+      if (part == BOTH_P)
+       emit_move_insn (adjust_address_nv (cplx, cmode, 0), val);
+      else
+       emit_move_insn (adjust_address_nv (cplx, imode, (part == IMAG_P)
+					  ? GET_MODE_SIZE (imode) : 0), val);
+      return;
+    }
+
+  /* If the sub-object is at least word sized, then we know that subregging
+     will work.  This special case is important, since store_bit_field
+     wants to operate on integer modes, and there's rarely an OImode to
+     correspond to TCmode.  */
+  if (ibitsize >= BITS_PER_WORD
+      /* For hard regs we have exact predicates.  Assume we can split
+	 the original object if it spans an even number of hard regs.
+	 This special case is important for SCmode on 64-bit platforms
+	 where the natural size of floating-point regs is 32-bit.  */
+      || (REG_P (cplx)
+	  && REGNO (cplx) < FIRST_PSEUDO_REGISTER
+	  && REG_NREGS (cplx) % 2 == 0))
+    {
+      rtx cplx_part = simplify_gen_subreg (imode, cplx, cmode,
+					   (part == IMAG_P)
+					   ? GET_MODE_SIZE (imode) : 0);
+      if (cplx_part)
+	{
+	  emit_move_insn (cplx_part, val);
+	  return;
+	}
+      else
+       /* simplify_gen_subreg may fail for sub-word MEMs.  */
+       gcc_assert (MEM_P (cplx) && ibitsize < BITS_PER_WORD);
+    }
+
+  store_bit_field (cplx, ibitsize, (part == IMAG_P) ? ibitsize : 0, 0, 0,
+		   imode, val, false, undefined_p);
+}
+
 /* If AVX is enabled then try vectorizing with both 256bit and 128bit
    vectors.  If AVX512F is enabled then try vectorizing with 512bit,
    256bit and 128bit vectors.  */
@@ -25792,6 +26065,15 @@ ix86_libgcc_floating_mode_supported_p
 #undef TARGET_IFUNC_REF_LOCAL_OK
 #define TARGET_IFUNC_REF_LOCAL_OK ix86_ifunc_ref_local_ok
 
+#undef TARGET_GEN_RTX_COMPLEX
+#define TARGET_GEN_RTX_COMPLEX x86_gen_rtx_complex
+
+#undef TARGET_READ_COMPLEX_PART
+#define TARGET_READ_COMPLEX_PART x86_read_complex_part
+
+#undef TARGET_WRITE_COMPLEX_PART
+#define TARGET_WRITE_COMPLEX_PART x86_write_complex_part
+
 #if !TARGET_MACHO && !TARGET_DLLIMPORT_DECL_ATTRIBUTES
 # undef TARGET_ASM_RELOC_RW_MASK
 # define TARGET_ASM_RELOC_RW_MASK ix86_reloc_rw_mask
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 3e8488f2ae8..faa058f3ec0 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1058,7 +1058,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
    || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode	\
    || (MODE) == V2DImode || (MODE) == V2QImode				\
    || (MODE) == DFmode	|| (MODE) == DImode				\
-   || (MODE) == HFmode || (MODE) == BFmode)
+   || (MODE) == HFmode || (MODE) == BFmode				\
+   || (MODE) == SCmode)
 
 #define VALID_SSE_REG_MODE(MODE)					\
   ((MODE) == V1TImode || (MODE) == TImode				\
@@ -1067,7 +1068,7 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
    || (MODE) == TFmode || (MODE) == TDmode)
 
 #define VALID_MMX_REG_MODE_3DNOW(MODE) \
-  ((MODE) == V2SFmode || (MODE) == SFmode)
+  ((MODE) == V2SFmode || (MODE) == SFmode || (MODE) == SCmode)
 
 /* To match ia32 psABI, V4HFmode should be added here.  */
 #define VALID_MMX_REG_MODE(MODE)					\
@@ -1110,13 +1111,15 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
    || (MODE) == V16SImode || (MODE) == V32HImode || (MODE) == V8DFmode	\
    || (MODE) == V16SFmode \
    || (MODE) == V32HFmode || (MODE) == V16HFmode || (MODE) == V8HFmode  \
-   || (MODE) == V32BFmode || (MODE) == V16BFmode || (MODE) == V8BFmode)
+   || (MODE) == V32BFmode || (MODE) == V16BFmode || (MODE) == V8BFmode	\
+   || (MODE) == SCmode)
 
 #define X87_FLOAT_MODE_P(MODE)	\
   (TARGET_80387 && ((MODE) == SFmode || (MODE) == DFmode || (MODE) == XFmode))
 
 #define SSE_FLOAT_MODE_P(MODE) \
-  ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode))
+  ((TARGET_SSE && (MODE) == SFmode) || (TARGET_SSE2 && (MODE) == DFmode) \
+   || (TARGET_SSE2 && (MODE) == SCmode))
 
 #define SSE_FLOAT_MODE_SSEMATH_OR_HF_P(MODE)				\
   ((SSE_FLOAT_MODE_P (MODE) && TARGET_SSE_MATH)				\
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 80b43fd7db7..06281eb0fd6 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -30504,3 +30504,147 @@
   "TARGET_AVXVNNIINT16"
   "vpdp<vpdpwprodtype>\t{%3, %2, %0|%0, %2, %3}"
    [(set_attr "prefix" "vex")])
+
+(define_expand "movsc"
+  [(match_operand:SC 0 "nonimmediate_operand" "")
+   (match_operand:SC 1 "nonimmediate_operand" "")]
+  ""
+  {
+    emit_insn (gen_movv2sf (simplify_gen_subreg (V2SFmode, operands[0], SCmode, 0),
+			    simplify_gen_subreg (V2SFmode, operands[1], SCmode, 0)));
+    DONE;
+  }
+)
+
+(define_expand "addsc3"
+  [(match_operand:SC 0 "register_operand" "=r")
+   (match_operand:SC 1 "register_operand" "r")
+   (match_operand:SC 2 "register_operand" "r")]
+  ""
+  {
+    emit_insn (gen_addv2sf3 (simplify_gen_subreg (V2SFmode, operands[0], SCmode, 0),
+			     simplify_gen_subreg (V2SFmode, operands[1], SCmode, 0),
+			     simplify_gen_subreg (V2SFmode, operands[2], SCmode, 0)));
+    DONE;
+  }
+)
+
+(define_expand "subsc3"
+  [(match_operand:SC 0 "register_operand" "=r")
+   (match_operand:SC 1 "register_operand" "r")
+   (match_operand:SC 2 "register_operand" "r")]
+  ""
+  {
+    emit_insn (gen_subv2sf3 (simplify_gen_subreg (V2SFmode, operands[0], SCmode, 0),
+			     simplify_gen_subreg (V2SFmode, operands[1], SCmode, 0),
+			     simplify_gen_subreg (V2SFmode, operands[2], SCmode, 0)));
+    DONE;
+  }
+)
+
+(define_expand "negsc2"
+  [(match_operand:SC 0 "register_operand" "=r")
+   (match_operand:SC 1 "register_operand" "r")]
+  ""
+  {
+    emit_insn (gen_negv2sf2 (simplify_gen_subreg (V2SFmode, operands[0], SCmode, 0),
+                             simplify_gen_subreg (V2SFmode, operands[1], SCmode, 0)));
+    DONE;
+  }
+)
+
+(define_expand "sse_shufsc"
+  [(match_operand:V4SF 0 "register_operand")
+   (match_operand:SC 1 "register_operand")
+   (match_operand:SC 2 "vector_operand")
+   (match_operand:SI 3 "const_int_operand")]
+  "TARGET_SSE"
+{
+  int mask = INTVAL (operands[3]);
+  emit_insn (gen_sse_shufsc_sc (operands[0],
+						     operands[1],
+						     operands[2],
+						     GEN_INT ((mask >> 0) & 3),
+						     GEN_INT ((mask >> 2) & 3),
+						     GEN_INT (((mask >> 4) & 3) + 4),
+						     GEN_INT (((mask >> 6) & 3) + 4)));
+  DONE;
+})
+
+(define_insn "sse_shufsc_sc"
+  [(set (match_operand:V4SF 0 "register_operand" "=x,v")
+	(vec_select:V4SF
+	  (vec_concat:V4SF
+	    (match_operand:V2SF 1 "register_operand" "0,v")
+	    (match_operand:V2SF 2 "vector_operand" "xBm,vm"))
+	  (parallel [(match_operand 3 "const_0_to_3_operand")
+		     (match_operand 4 "const_0_to_3_operand")
+		     (match_operand 5 "const_4_to_7_operand")
+		     (match_operand 6 "const_4_to_7_operand")])))]
+  "TARGET_SSE"
+{
+  int mask = 0;
+  mask |= INTVAL (operands[3]) << 0;
+  mask |= INTVAL (operands[4]) << 2;
+  mask |= (INTVAL (operands[5]) - 4) << 4;
+  mask |= (INTVAL (operands[6]) - 4) << 6;
+  operands[3] = GEN_INT (mask);
+
+  switch (which_alternative)
+    {
+    case 0:
+      return "shufps\t{%3, %2, %0|%0, %2, %3}";
+    case 1:
+      return "vshufps\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+    default:
+      gcc_unreachable ();
+    }
+}
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "type" "sseshuf")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix" "orig,maybe_evex")
+   (set_attr "mode" "V4SF")])
+
+(define_expand "mulsc3"
+  [(match_operand:SC 0 "register_operand" "=r")
+   (match_operand:SC 1 "register_operand" "r")
+   (match_operand:SC 2 "register_operand" "r")]
+  "TARGET_SSE3"
+  {
+    rtx a = gen_reg_rtx (V4SFmode);
+    rtx b = gen_reg_rtx (V4SFmode);
+    emit_insn (gen_sse_shufsc (a,
+                                    simplify_gen_subreg (V2SFmode, operands[1], SCmode, 0),
+                                    simplify_gen_subreg (V2SFmode, operands[1], SCmode, 0),
+                                    GEN_INT (0b01000100)));
+    emit_insn (gen_sse_shufsc (b,
+                                    simplify_gen_subreg (V2SFmode, operands[2], SCmode, 0),
+                                    simplify_gen_subreg (V2SFmode, operands[2], SCmode, 0),
+                                    GEN_INT (0b00010100)));
+    emit_insn (gen_mulv4sf3 (a, a, b));
+    emit_insn (gen_sse_shufps (b,
+                                    a,
+                                    a,
+                                    GEN_INT (0b00001101)));
+    emit_insn (gen_sse_shufps (a,
+                                    a,
+                                    a,
+                                    GEN_INT (0b00001000)));
+    emit_insn (gen_vec_addsubv2sf3 (simplify_gen_subreg (V2SFmode, operands[0], SCmode, 0),
+				    simplify_gen_subreg (V2SFmode, a, V4SFmode, 0),
+				    simplify_gen_subreg (V2SFmode, b, V4SFmode, 0)));
+    DONE;
+  }
+)
+
+(define_expand "conjsc2"
+  [(match_operand:SC 0 "register_operand" "=r")
+   (match_operand:SC 1 "register_operand" "r")]
+  ""
+  {
+    emit_insn (gen_negdf2 (simplify_gen_subreg (DFmode, operands[0], SCmode, 0),
+			   simplify_gen_subreg (DFmode, operands[1], SCmode, 0)));
+    DONE;
+  }
+)
-- 
2.17.1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 08/11] Native complex ops: Add explicit vector of complex
  2023-09-12 10:07     ` [PATCH v2 08/11] Native complex ops: Add explicit vector of complex Sylvain Noiry
@ 2023-09-12 17:25       ` Joseph Myers
  2023-09-13  6:48         ` Richard Biener
  0 siblings, 1 reply; 24+ messages in thread
From: Joseph Myers @ 2023-09-12 17:25 UTC (permalink / raw)
  To: Sylvain Noiry; +Cc: gcc-patches

On Tue, 12 Sep 2023, Sylvain Noiry via Gcc-patches wrote:

> Summary:
> Allow the creation and usage of builtins vectors of complex
> in C, using __attribute__ ((vector_size ()))

If you're adding a new language feature like this, you need to update 
extend.texi to explain the valid uses of the attribute for complex types, 
and (under "Vector Extensions") the valid uses of the resulting vectors.  
You also need to add testcases to the testsuite for such vectors - both 
execution tests covering valid uses of the vectors, and tests that invalid 
declarations or uses of such vectors (uses with any operator, or other 
operand to such operator, that aren't valid) are properly rejected.  Go 
through all cases of operators, with one or two complex vector operands, 
of the same or different types, and with different choices for what type 
the other operand might be when one has complex vector type, and make sure 
they are all properly tested and do have the desired and documented 
semantics.

If the intended semantics are the same for C and C++, the tests should be 
c-c++-common tests.  Any cases where the intended semantics are different 
will need separate tests for each language or appropriately conditional 
test assertions in c-c++-common.
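
A minimal sketch of the sort of execution test described above
(hypothetical: it assumes the proposed semantics, i.e. that
vector_size composes with _Complex float, and it uses GNU imaginary
constants, so as written it is C-only):

typedef _Complex float vcf2
  __attribute__ ((vector_size (2 * sizeof (_Complex float))));

int
main (void)
{
  vcf2 a = { 1.0f + 2.0fi, 3.0f + 4.0fi };
  vcf2 b = { 5.0f + 6.0fi, 7.0f + 8.0fi };
  vcf2 c = a + b;
  if (__real__ c[0] != 6.0f || __imag__ c[0] != 8.0f
      || __real__ c[1] != 10.0f || __imag__ c[1] != 12.0f)
    __builtin_abort ();
  return 0;
}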

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH v2 08/11] Native complex ops: Add explicit vector of complex
  2023-09-12 17:25       ` Joseph Myers
@ 2023-09-13  6:48         ` Richard Biener
  0 siblings, 0 replies; 24+ messages in thread
From: Richard Biener @ 2023-09-13  6:48 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Sylvain Noiry, gcc-patches

On Tue, Sep 12, 2023 at 7:26 PM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Tue, 12 Sep 2023, Sylvain Noiry via Gcc-patches wrote:
>
> > Summary:
> > Allow the creation and usage of builtin vectors of complex
> > in C, using __attribute__ ((vector_size ()))
>
> If you're adding a new language feature like this, you need to update
> extend.texi to explain the valid uses of the attribute for complex types,
> and (under "Vector Extensions") the valid uses of the resulting vectors.
> You also need to add testcases to the testsuite for such vectors - both
> execution tests covering valid uses of the vectors, and tests that invalid
> declarations or uses of such vectors (uses with any operator, or other
> operand to such operator, that aren't valid) are properly rejected - go
> through all cases of operators, with one or two complex vector operands,
> of the same or different types, and with different choices for what type
> the other operand might be when one has complex vector type, and make sure
> they are all properly tested and do have the desired and documented
> semantics.
>
> If the intended semantics are the same for C and C++, the tests should be
> c-c++-common tests.  Any cases where the intended semantics are different
> will need separate tests for each language or appropriately conditional
> test assertions in c-c++-common.

And to add: in other related discussions we have always rejected adding
vector types of composite types.  I realize that if the hardware supports
vector complex arithmetic instructions, this might be the first genuinely
good reason to allow them.

Richard.

> --
> Joseph S. Myers
> joseph@codesourcery.com


end of thread

Thread overview: 24+ messages
2023-07-17  9:02 [PATCH 0/9] Native complex operations Sylvain Noiry
2023-07-17  9:02 ` [PATCH 1/9] Native complex operations: Conditional lowering Sylvain Noiry
2023-07-17  9:02 ` [PATCH 2/9] Native complex operations: Move functions to hooks Sylvain Noiry
2023-07-17  9:02 ` [PATCH 3/9] Native complex operations: Add gen_rtx_complex hook Sylvain Noiry
2023-07-17  9:02 ` [PATCH 4/9] Native complex operations: Allow native complex regs and ops in rtl Sylvain Noiry
2023-07-17  9:02 ` [PATCH 5/9] Native complex operations: Add the conjugate op in optabs Sylvain Noiry
2023-07-17  9:02 ` [PATCH 6/9] Native complex operations: Update how complex rotations are handled Sylvain Noiry
2023-07-17  9:02 ` [PATCH 7/9] Native complex operations: Vectorization of native complex operations Sylvain Noiry
2023-07-17  9:02 ` [PATCH 8/9] Native complex operations: Add explicit vector of complex Sylvain Noiry
2023-07-17  9:02 ` [PATCH 9/9] Native complex operation: Experimental support in x86 backend Sylvain Noiry
2023-09-12 10:07   ` [PATCH v2 0/11] Native complex operations Sylvain Noiry
2023-09-12 10:07     ` [PATCH v2 01/11] Native complex ops : Conditional lowering Sylvain Noiry
2023-09-12 10:07     ` [PATCH v2 02/11] Native complex ops: Move functions to hooks Sylvain Noiry
2023-09-12 10:07     ` [PATCH v2 03/11] Native complex ops: Add gen_rtx_complex hook Sylvain Noiry
2023-09-12 10:07     ` [PATCH v2 04/11] Native complex ops: Allow native complex regs and ops in rtl Sylvain Noiry
2023-09-12 10:07     ` [PATCH v2 05/11] Native complex ops: Add the conjugate op in optabs Sylvain Noiry
2023-09-12 10:07     ` [PATCH v2 06/11] Native complex ops: Update how complex rotations are handled Sylvain Noiry
2023-09-12 10:07     ` [PATCH v2 07/11] Native complex ops: Vectorization of native complex operations Sylvain Noiry
2023-09-12 10:07     ` [PATCH v2 08/11] Native complex ops: Add explicit vector of complex Sylvain Noiry
2023-09-12 17:25       ` Joseph Myers
2023-09-13  6:48         ` Richard Biener
2023-09-12 10:07     ` [PATCH v2 09/11] Native complex ops: remove useless special cases Sylvain Noiry
2023-09-12 10:07     ` [PATCH v2 10/11] Native complex ops: Add a fast complex multiplication pattern Sylvain Noiry
2023-09-12 10:07     ` [PATCH v2 11/11] Native complex ops: Experimental support in x86 backend Sylvain Noiry
