public inbox for gcc-patches@gcc.gnu.org
* [PATCH] vect: while_ult for integer mask
@ 2022-09-28 15:05 Andrew Stubbs
  2022-09-29  7:52 ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Stubbs @ 2022-09-28 15:05 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1239 bytes --]

This patch is a prerequisite for some amdgcn patches I'm working on to 
support shorter vector lengths (a fixed 64 lanes tends to miss 
optimization opportunities, and masking is not yet supported everywhere).

The problem is that, unlike AArch64, I'm not using different mask modes 
for different-sized vectors, so all loops end up using the while_ultsidi 
pattern, regardless of vector length.  In theory I could use SImode for 
V32, HImode for V16, etc., but there's no mode small enough to fit V4 or 
V2, so something else is needed.  Moving to vector masks in the backend 
is not a natural fit for GCN, and would be a huge task in any case.

This patch adds an additional length operand so that we can distinguish 
the different uses in the back end and don't end up with more lanes 
enabled than there ought to be.

I've made the extra operand conditional on the mode so that I don't have 
to modify the AArch64 backend; that backend uses the while_<cond> family 
of patterns in many places, via iterators, so it would end up touching a 
lot of code just to add an inactive operand, and I don't have a way to 
test it properly.  I've confirmed that AArch64 still builds and expands 
while_ult correctly in a simple example.

OK for mainline?

Thanks

Andrew

[-- Attachment #2: 220928-while_ult-length.patch --]
[-- Type: text/plain, Size: 4556 bytes --]

vect: while_ult for integer masks

Add a vector length parameter needed by amdgcn without breaking aarch64.

All amdgcn vector masks are DImode, regardless of vector length, so we can't
tell what length is implied simply from the operator mode.  (Even if we used
different integer modes, there's no mode small enough to differentiate a 2- or
4-lane mask.)  Without knowing the intended length we end up using a mask with
too many lanes enabled, which leads to undefined behaviour.

The extra operand is not added for vector mask types so AArch64 does not need
to be adjusted.

gcc/ChangeLog:

	* config/gcn/gcn-valu.md (while_ultsidi): Limit mask length using
	operand 3.
	* doc/md.texi (while_ult): Document new operand 3 usage.
	* internal-fn.cc (expand_while_optab_fn): Set operand 3 when lhs_type
	maps to a non-vector mode.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 3bfdf8213fc..dec81e863f7 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -3052,7 +3052,8 @@ (define_expand "vcondu<V_ALL:mode><V_INT:mode>_exec"
 (define_expand "while_ultsidi"
   [(match_operand:DI 0 "register_operand")
    (match_operand:SI 1 "")
-   (match_operand:SI 2 "")]
+   (match_operand:SI 2 "")
+   (match_operand:SI 3 "")]
   ""
   {
     if (GET_CODE (operands[1]) != CONST_INT
@@ -3077,6 +3078,11 @@ (define_expand "while_ultsidi"
 			      : ~((unsigned HOST_WIDE_INT)-1 << diff));
 	emit_move_insn (operands[0], gen_rtx_CONST_INT (VOIDmode, mask));
       }
+    if (INTVAL (operands[3]) < 64)
+      emit_insn (gen_anddi3 (operands[0], operands[0],
+			     gen_rtx_CONST_INT (VOIDmode,
+						~((unsigned HOST_WIDE_INT)-1
+						  << INTVAL (operands[3])))));
     DONE;
   })
 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index d46963f468c..d8e2a5a83f4 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4950,9 +4950,10 @@ This pattern is not allowed to @code{FAIL}.
 @cindex @code{while_ult@var{m}@var{n}} instruction pattern
 @item @code{while_ult@var{m}@var{n}}
 Set operand 0 to a mask that is true while incrementing operand 1
-gives a value that is less than operand 2.  Operand 0 has mode @var{n}
-and operands 1 and 2 are scalar integers of mode @var{m}.
-The operation is equivalent to:
+gives a value that is less than operand 2, for a vector length up to operand 3.
+Operand 0 has mode @var{n} and operands 1 to 3 are scalar integers of mode
+@var{m}.  Operand 3 should be omitted when @var{n} is a vector mode.  The
+operation for vector modes is equivalent to:
 
 @smallexample
 operand0[0] = operand1 < operand2;
@@ -4960,6 +4961,14 @@ for (i = 1; i < GET_MODE_NUNITS (@var{n}); i++)
   operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
 @end smallexample
 
+And for non-vector modes the operation is equivalent to:
+
+@smallexample
+operand0[0] = operand1 < operand2;
+for (i = 1; i < operand3; i++)
+  operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
+@end smallexample
+
 @cindex @code{check_raw_ptrs@var{m}} instruction pattern
 @item @samp{check_raw_ptrs@var{m}}
 Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 651d99eaeb9..c306240c2ac 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3664,7 +3664,7 @@ expand_direct_optab_fn (internal_fn fn, gcall *stmt, direct_optab optab,
 static void
 expand_while_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 {
-  expand_operand ops[3];
+  expand_operand ops[4];
   tree rhs_type[2];
 
   tree lhs = gimple_call_lhs (stmt);
@@ -3680,10 +3680,24 @@ expand_while_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
       create_input_operand (&ops[i + 1], rhs_rtx, TYPE_MODE (rhs_type[i]));
     }
 
+  int opcnt;
+  if (!VECTOR_MODE_P (TYPE_MODE (lhs_type)))
+    {
+      /* When the mask is an integer mode the exact vector length may not
+	 be clear to the backend, so we pass it in operand[3].
+         Use the vector in arg2 for the most reliable intended size.  */
+      tree type = TREE_TYPE (gimple_call_arg (stmt, 2));
+      create_integer_operand (&ops[3], TYPE_VECTOR_SUBPARTS (type));
+      opcnt = 4;
+    }
+  else
+    /* The mask has a vector type so the length operand is unnecessary.  */
+    opcnt = 3;
+
   insn_code icode = convert_optab_handler (optab, TYPE_MODE (rhs_type[0]),
 					   TYPE_MODE (lhs_type));
 
-  expand_insn (icode, 3, ops);
+  expand_insn (icode, opcnt, ops);
   if (!rtx_equal_p (lhs_rtx, ops[0].value))
     emit_move_insn (lhs_rtx, ops[0].value);
 }

Thread overview: 10+ messages
2022-09-28 15:05 [PATCH] vect: while_ult for integer mask Andrew Stubbs
2022-09-29  7:52 ` Richard Biener
2022-09-29  9:24   ` Richard Sandiford
2022-09-29  9:37     ` Richard Biener
2022-09-29  9:38     ` juzhe.zhong
2022-09-29  9:56     ` Andrew Stubbs
2022-09-29 10:17       ` Richard Sandiford
2022-09-29 13:46         ` Richard Biener
2022-10-03 14:27           ` Andrew Stubbs
2022-09-29  9:50   ` Andrew Stubbs
